Variable

From Testiwiki
[[Category:Universal product]]
<noinclude>
[[Category:Guidebook]]
[[Category:Universal object]]
[[Category:Open policy practice]]
[[Category:Decision analysis and risk management]]
{{variable|moderator=Jouni}}
{{Guidebook}}
[[category:Glossary term]]
<section begin=glossary />
:'''Variable''' is a description of a particular piece of reality. It can be a description of a physical phenomenon, or a description of value judgements. Also decisions included in an assessment are described as variables. Variables are continuously existing descriptions of reality, which develop in time as knowledge about the topic increases. Variables are therefore not tied to any single assessment, but can be included in several assessments. A variable is the basic building block of describing reality.<section end=glossary />

A variable can describe a physical phenomenon, e.g. the yearly average of PM<sub>2.5</sub> concentration in Kuopio in 2006, or a value judgement, e.g. willingness to pay to avoid lung cancer. Variables (the scopes of variables) can be more general or more specific and hierarchically related, e.g. the yearly average of PM<sub>2.5</sub> concentration in Finland in 2006 (a general variable) and the daily average of PM<sub>2.5</sub> concentration in Kuopio on January 1st, 2006 (a more specific one).
  
In order to make coherent descriptions of reality in assessments, the assessments must have a certain clear structure. Because we also want to produce descriptions that are coherent between assessments, there must be a universal structure for all assessments. Variables with a certain set of attributes, and the linkages between these variables, form this universal structure. For further details, see [[Guidance and methods for indicator selection and specification]] and [[heande:Heande:Structures of the building blocks of open risk assessments|Building blocks of risk assessments]]. The universal assessment structure is essential for the coherent inclusion of causality in assessments, enabling collective structured learning and collaborative work, as well as combining value judgements with descriptions of physical reality.

== Question ==
  
'''Variable structure'''

What should be the structure of a variable such that it

* is able to systematically handle all kinds of information about the particular piece of reality that the variable is describing, and especially
** is generic enough to be a standard building block in decision support work (including interpretation of scientific information and political discussions),
* is able to systematically describe causal relationships between phenomena and the variables that describe them,
* enables both quantitative and qualitative descriptions,
* is suitable for all kinds of variables, especially physical phenomena, decisions, and value judgements,
* inherits its main structure from [[universal object]]s,
* complies with the [[PSSP]] ontology,
* can be operationalised in a computational model system,
* results in variables that are independent of the assessment(s) they belong to,
* results in variables that pass the [[Plausibility test|clairvoyant test]],
* can be implemented on a website, and
* is easy enough to be usable and understood by interested non-experts?
  
In the new risk assessment method, variables have a specified structure with four basic attributes (and possibly some sub-attributes). The attributes of variables are the same as for other objects in the information structure of the pyrkilo method, i.e. [[Help:Risk assessment structure | assessments]] and [[Help:Class | classes]].

== Answer ==

{{Variable attributes}}

A variable is implemented as a web page in the Opasnet wiki web-workspace. A variable page has the following structure.
 
 
The '''Name''' attribute is the identifier of the variable, which already more or less describes what real-world entity the variable describes. Variable names should be chosen so that they are descriptive, unambiguous and not easily confused with other variables. A good variable name could be e.g. ''daily average of PM<sub>2.5</sub> concentration in Helsinki''.
 
 
 
The '''Scope''' attribute defines the boundaries of the variable: what does it describe, and what does it not? The boundaries can be e.g. spatial, temporal or abstract. In the example variable above, the geographical boundary restricts the coverage of the variable to Helsinki, and the considered phenomenon is restricted to daily averages of PM<sub>2.5</sub>. There could also be further boundaries defined in the scope of the variable that are not explicitly mentioned in its name.
 
 
 
{{Help:Variable definition}}
 
 
 
The '''Result''' attribute is an answer to the question presented in the scope of the variable. A result is preferably a probability distribution (which can in a special case be a single number), but a result can also be non-numerical, such as "very good". It should be noted that the result is the distribution itself, although it can be expressed as some description of the distribution, such as a mean and standard deviation. The result should be described in such detail that the full distribution can be reproduced from the information presented under this attribute. A technically straightforward way to do this is to provide a large random sample from the distribution.
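As a hedged sketch of the point above (plain Python with hypothetical numbers; the actual Opasnet workflow uses [[R]]), storing a large random sample makes the full distribution, and any summary of it, reproducible:

```python
import random
import statistics

random.seed(0)

# Hypothetical result: the result is a probability distribution, represented
# not by summary numbers but by a large random sample drawn from it.
sample = [random.lognormvariate(2.0, 0.5) for _ in range(10000)]

# Any description of the distribution (mean, sd, quantiles) can be
# reproduced afterwards from the stored sample.
mean = statistics.fmean(sample)
sd = statistics.stdev(sample)
p95 = sorted(sample)[int(0.95 * len(sample))]
print(len(sample))
```

The distribution and its parameters here are illustrative only; the design point is that the sample, not the summary, is the stored result.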
 
 
 
The result may take a different value at different ''locations'', such as geographical positions, population subgroups, or other determinants. Then, the result is described as
 
 
 
  R|x<sub>1</sub>,x<sub>2</sub>,...
 
 
 
where R is the result and x<sub>1</sub> and x<sub>2</sub> define the location. A ''dimension'' is a property along which there are multiple locations, and the result of the variable may take different values as the location changes. In this case, x<sub>1</sub> and x<sub>2</sub> are dimensions, and particular values of x<sub>1</sub> and x<sub>2</sub> are locations. A variable can have zero, one, or more dimensions. Even if a dimension is continuous, it is usually operationalised in practice as a list of discrete locations. Such a list is called an ''index'', and each location is called a ''row'' of the index.
 
 
 
Uncertainty about the true value of the variable is one dimension. The index of the uncertainty dimension is called the ''Sample'' index, and it contains a list of integers 1,2,3... . Uncertainty is operationalised as a sequence of random samples from the probability distribution of the result. The i<sup>th</sup> random sample is located in the i<sup>th</sup> row of the Sample index.
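The storage convention above (locations along dimensions, plus a Sample index for uncertainty) can be sketched as follows. This is a plain-Python illustration with hypothetical dimension names and distributions, not the actual Opasnet/[[R]] implementation:

```python
import random

random.seed(1)

# Illustrative sketch: a result R|x1,x2 stored as rows, one per combination
# of locations, with a Sample index (1, 2, 3, ...) holding random draws
# from the result distribution.
dimensions = {
    "Area": ["Kuopio", "Helsinki"],   # locations of dimension x1 (hypothetical)
    "Sex": ["Female", "Male"],        # locations of dimension x2 (hypothetical)
}
N = 3  # size of the Sample index (in practice thousands of draws)

rows = []
for area in dimensions["Area"]:
    for sex in dimensions["Sex"]:
        for i in range(1, N + 1):     # the i-th draw goes to row i of Sample
            rows.append({
                "Area": area,
                "Sex": sex,
                "Sample": i,
                "Result": random.gauss(10.0, 2.0),  # hypothetical distribution
            })

# The distribution for one combination of locations is recovered by
# selecting its rows across the Sample index.
kuopio_f = [r["Result"] for r in rows
            if r["Area"] == "Kuopio" and r["Sex"] == "Female"]
print(len(rows), len(kuopio_f))
```

With two dimensions of two locations each and a Sample index of size 3, the result table has 2 × 2 × 3 = 12 rows.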
 
 
 
 
 
'''General attribute structure'''
 
 
 
Each attribute may contain three kinds of information:
 
* Actual content (only this will have an impact on other objects)
 
* Narrative description (to help understand the actual content); includes uncertainty analysis.
 
* Discussion (argumentation about issues in the actual content)
 
 
 
For a detailed description of discussions, see [[Help:Dispute]].
 
 
 
'''Connection to the PSSP structure'''
 
 
 
A universal information structure called PSSP (Purpose, Structure, State, Performance) has been suggested. PSSP describes the attributes of universal objects, whereas the pyrkilo method is intended for describing particular objects in the context of risk assessment. The variable structure is closely connected to PSSP, and the relationships can be described in the following way.
 
  
 
{|{{prettytable}}
! PSSP
! Variable structure
|-----
| Purpose
| The purpose of a variable is to describe a particular piece of reality.
|-----
| Structure
| Scope, Unit, and Definition describe the structure of the variable.
|-----
| State
| Result is an expression of the state of the variable.
|-----
| Performance
| Performance is an expression of the uncertainty of the variable, i.e. how well the variable fulfills its purpose of describing the piece of reality defined in its scope. At the variable level, performance is evaluated separately for the result (parameter uncertainty) and the definition (model uncertainty). The performance of the scope of a variable, however, cannot be evaluated at the variable level, only at the assessment level.
|}

{|{{prettytable}}
|+The attributes of a variable.
! [[Attribute]]
! Sub-attribute
! Comments specific to the variable attributes
|-----
| '''Name'''
|
| An identifier for the variable. Each Opasnet page has two kinds of identifiers: the name of the page (e.g. Variable) and the page identifier (e.g. Op_en2022). The former is used e.g. in links, the latter in [[R]] code.
|-----
| '''Question'''
|
| Gives the question that is to be answered. It defines the scope of the variable. The question should be defined in such a way that it is relevant in many different situations, i.e. makes the variable re-usable. (Compare to an [[assessment]] question, which is more specific to time, place and user need.)
|-----
| '''Answer'''
|
| Presents an understandable and useful answer to the question. Its essence is often a machine-readable and human-readable probability distribution (which can in a special case be a single number), but an answer can also be non-numerical, such as "very valuable", or a descriptive table like the one on this page. The units of interconnected variables need to be coherent with each other, given the functions describing their causal relations. The units of variables can therefore be used to check the coherence of the causal network description; this is the so-called [[Plausibility test|unit test]]. Typically the answer contains [[R]] code that fetches the ovariable created under Rationale/Calculations and evaluates it.
|-----
| rowspan="5" | '''Rationale'''
|
| Contains anything that is necessary to convince a critical reader that the answer is credible and usable. It presents the reader the information required to derive the answer and explains how the answer is formed. Typically it has the following sub-attributes, but others are also possible. Rationale may also contain lengthy discussions about relevant topics.
|-----
| Data
| Data tells about direct observations (or expert judgements) about the variable itself.
|-----
| Dependencies
| Dependencies {{reslink|Dependencies instead of causality}} tell what we know about how upstream variables (i.e. causal parents) affect the variable. In other words, we attempt to estimate the answer indirectly based on information about the causal parents. Sometimes reverse inference is also possible based on causal children. Dependencies list the causal parents and express their functional relationships (the variable as a function of its parents) or probabilistic relationships (the conditional probability of the variable given its parents).
|-----
| Calculations
| Calculations {{reslink|Discussion on formula attribute}} is an operationalisation of how to calculate or derive the answer. It uses algebra, computer code, or other explicit methods if possible. Typically it is [[R]] code that produces and stores the necessary [[ovariable]]s to compute the current best answer to the question.
|-----
| Data not used
| Data not used lists data that are relevant to the research question but for some reason were not used in producing the current answer. It may be that the data were found after the synthesis and an update has not yet been done, or it has been unclear how to merge them with the existing data. In any case, it is important to differentiate and be explicit about whether data are irrelevant (and therefore removed from the page) or relevant but not used (and therefore waiting for further work).
|}
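The Dependencies and Calculations attributes describe how a variable's answer is derived from its causal parents, evaluated sample by sample so that uncertainty propagates through the network. A minimal sketch of such propagation (plain Python with hypothetical variables and numbers; the actual Opasnet workflow uses [[R]] ovariables):

```python
import random

random.seed(2)
N = 1000  # shared Sample index across all variables

# Parent variables: uncertainty represented as N random draws each
# (hypothetical distributions, for illustration only).
exposure = [random.gauss(8.0, 1.5) for _ in range(N)]   # e.g. ug/m^3
erf = [random.gauss(0.01, 0.002) for _ in range(N)]     # e.g. cases per ug/m^3

# Dependencies: the child variable is a function of its causal parents,
# applied sample by sample so its uncertainty reflects the parents'.
cases = [e * r for e, r in zip(exposure, erf)]

mean_cases = sum(cases) / N
print(round(mean_cases, 3))
```

The key design point is the shared Sample index: because parents and child use the same draws, correlations between upstream variables would also propagate correctly if they were present.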
  
In addition, it is practical to have additional subtitles on a variable page. These are not attributes, though:

* See also
* Keywords (not always used)
* References
* Related files

'''There are different kinds of variables'''

Although all variables share the same basic structure, it is useful to distinguish different kinds of variables based on their use or position in a risk assessment.

* '''Endpoint variables''' are variables that describe phenomena that are outcomes of the assessed causal network, i.e. there are no variables downstream of an endpoint variable within the scope of the assessment. In practice, endpoint variables are most often also chosen as indicators.

* '''Intermediate variables''' include all other variables besides endpoint variables.

* '''Key variables''' {{disclink|Key variable}} are variables that are particularly important for carrying out the assessment successfully and/or assessing the endpoints adequately.
 
* '''Indicator''' is a variable that is particularly important in relation to the interests of the intended users of the assessment output or other stakeholders. Indicators are used as a means of effective communication of the assessment results. Communication here refers to conveying information about certain phenomena of interest to the intended target audience of the assessment output, but also to monitoring the status of those phenomena, e.g. when evaluating the effectiveness of actions taken to influence them. In the context of integrated assessment, indicators can generally be considered pieces of information serving the purpose of communicating the most essential aspects of a particular risk assessment to meet the needs of the users of the assessment. Indicators can be endpoint variables, but also any other variables located anywhere in the causal network.
 
* '''Decision variables''' are possible decisions that are under consideration within a risk assessment. The main interest of the assessment is then the comparison of outcomes resulting from the different decision options. More about [[Help:Decision variable|decision variables]] can be found on a separate page.
 
 
 
 
 
Variables are versatile objects. They are able to describe all of the following aspects of reality:
 
* '''Causal relationships''' linking variables in the different steps in the causal chain from source to impact (mainly in the definition/causality attribute);
 
* Different environmental, social, economic and infrastructural '''contexts''' in which risks might arise and play out (mainly in the  scope attribute);
 
* Physical and chemical '''processes''' that generate, transform and transport the hazards (agents) from source to the target organs in the human body (mainly as variables that are defined as functions);
 
* '''Indicators''' to describe and communicate the causal chain and impacts (variables selected for reporting);
 
* Different '''policy measures''' that might be taken to address the risks, and thus different assessment scenarios that might be compared (decision variables);
 
* '''Appraisal''' of the impacts (and the policy scenarios to which they relate), in the light of agreed value systems and rules for evaluation (variables describing value judgements or derived from value judgement variables);
 
* '''Adaptation and feedback loops''' arising as a result of adaptation to the risks, at both individual and institutional level. A feedback loop is described as a variable that is indirectly dependent on the result of itself '''at a previous time point'''.
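The feedback-loop bullet above can be sketched as a variable whose value at time t depends on its own value at t-1. The following is a hypothetical adaptation example (names, numbers and the adaptation rule are illustrative assumptions, not from the source):

```python
# Hypothetical feedback loop: exposure at time t depends on exposure at
# t-1, because people adapt (e.g. reduce outdoor activity) when the
# previous period's exposure was high.
def adapt(prev_exposure, baseline=10.0, adaptation=0.3):
    """Exposure at t: baseline reduced in proportion to exposure at t-1."""
    return baseline - adaptation * prev_exposure

exposure = [10.0]  # initial value at t = 0
for t in range(1, 20):
    exposure.append(adapt(exposure[-1]))

# With adaptation < 1 the loop converges to the fixed point
# baseline / (1 + adaptation).
print(round(exposure[-1], 4))
```

This also shows why the text says the dependence is on the result "at a previous time point": a variable depending on its own result at the same time point would be circular, whereas the time lag makes the loop computable step by step.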
 
 
 
Ideally, all variables in the full-chain can be expressed quantitatively. In order to use the full chain approach quantitatively in an integrated assessment, it is necessary to acquire data for the variables, or to estimate these variables by modelling the underlying causal processes.
 
 
 
'''Proxies are not indicators'''
 
 
 
The term indicator is sometimes also (mistakenly, in the eyes of the new risk assessment method) used to mean a proxy. Proxies are used as replacements for the actual objects of interest in a description when adequate information about the actual object of interest is not available. Proxies are indirect representations of the object of interest that usually have some identified correlation with the actual object of interest. At least within the context of the new risk assessment method, proxy and indicator have clearly different meanings, and they should not be confused with each other. The figure below attempts to clarify the difference between proxies and indicators:
 
 
 
 
 
<center>
 
[[Image:Indicators and proxies.PNG]]
 
</center>
 
 
 
 
 
In the example, a proxy (PM<sub>10</sub> site concentration) is used to indirectly represent and replace the actual object of interest (exposure to traffic PM<sub>2.5</sub>). Mortality due to traffic PM<sub>2.5</sub> is identified as a variable of specific interest to be reported to the target audience, i.e. selected as an indicator. The other two nodes in the graph are considered ordinary variables. The graph above was made with Analytica; here is [[:media:Indicator guidance.ANA|the original Analytica file]].
 
 
 
'''Specifying indicators and other variables'''
 
 
 
When the endpoints, indicators and key variables have been identified, they should be specified in more detail. Additional variables are created and specified in addition to the endpoints, indicators and key variables as is necessary to complete the causal network. Specifying these variables means defining the contents of the attributes of each variable. The four plausibility tests are very useful in specifying variables.
 
 
 
{{Help:Plausibility tests}}
 
 
 
The specification of variables proceeds in iterative steps, going into more detail as the overall understanding of the assessed phenomena increases. First, it is most crucial to specify the scopes (and names) of the variables and their causal relations. As part of the specification process, in particular for the name and scope attributes, the '''[[Help:Plausibility tests|clairvoyant test]]''' can be applied. The test helps to ensure the clarity and unambiguity of the variable scope.
 
 
 
Addressing causalities means in practice that all changes in any variable description should be reflected in all the variables that the particular variable is causally linked to. At this point, the '''[[Help:Plausibility tests|causality test]]''' can be used, although not always quantitatively. In the early phases of the process, it is probably most convenient to describe causal networks as diagrams, representing the indicators, endpoints, key variables and other variables as nodes (or boxes) and causal relations as arrows pointing from ''upstream'' variables to ''downstream'' variables. In graphical representations of causal networks, the arrows only state that a causal relation exists between particular variables; more detailed definitions of the relations should be described within the definition attribute of each variable, according to how well the causal relation is known or understood.
 
 
 
Once a relatively complete and coherent graphical representation of the causal network has been created, the specification process for the identified indicators may continue in more detail. The indicators, the ''leading variables'', are of crucial importance in the assessment process. If, during the specification process, it turns out that an indicator conflicts with one or several of the properties of good indicators, such as calibration, it may be necessary to consider revising the scope of the indicator or choosing another ''leading variable'' in the source-impact chain to replace it. This may naturally bring about a partial revision of the whole causal network, affecting a number of key variables, endpoints and indicators. For example, it may happen that no applicable exposure-response function is available for calculating the health impact from intake of ozone. In this case, the exposure-response indicator may be replaced with an intake fraction indicator, affecting both the ''downstream'' and ''upstream'' variables in the causal network, e.g. by bringing about a need to change the units in which the variables are described.
 
 
 
The description, unit and definition attributes are specified as is explained in the previous section. The '''[[Help:Plausibility tests|unit test]]''' can be applied to check the calculability, and thus descriptive coherence, of the causal network. When all the variables in the network appear to pass the required tests, the indicator and variable results can be computed across the network and the first round of iteration is done. Improvement of the description takes place through deliberation and re-specification of the variables, especially definition and result attributes, until an adequate level of quality of description throughout the network has been reached. The discussion attribute provides the place for deliberating and documenting deliberation throughout the process.
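The unit test mentioned above can be made mechanical once each variable declares its unit and each causal link declares how parent units must combine. The following is a deliberately tiny sketch (hypothetical variables, units and unit algebra; not the actual Opasnet implementation):

```python
# Hypothetical unit test: each variable declares a unit, and the causal
# link intake = concentration * breathing_rate implies a unit equation
# that can be checked mechanically.
units = {
    "concentration": "ug/m3",
    "breathing_rate": "m3/d",
    "intake": "ug/d",
}

def product_unit(u1, u2):
    """Toy unit algebra for single-fraction units: (a/b) * (b/c) -> a/c."""
    n1, d1 = u1.split("/")
    n2, d2 = u2.split("/")
    if d1 == n2:
        return f"{n1}/{d2}"
    # No cancellation found; toy notation, kept unsimplified.
    return f"{n1}*{n2}/{d1}*{d2}"

# Unit test: the derived unit of the child must match its declared unit.
derived = product_unit(units["concentration"], units["breathing_rate"])
assert derived == units["intake"], f"unit mismatch: {derived}"
print("unit test passed:", derived)
```

A real implementation would need a full unit algebra (powers, prefixes, multiple factors), but the principle is the same: unit coherence across every causal link is a necessary condition for a calculable network.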
 
 
 
'''Importance of indicators in the assessment process'''
 
 
 
Indicators have a special role in making the assessment. As mentioned above, indicators are the variables of most interest from the point of view of the use, users and other audiences of the assessment. The idea behind indicator selection, specification and use is thus to highlight the most important and/or significant parts of the source-impact chain, which are to be assessed and subsequently reported. The selected set of indicators guides the assessment process to address the relevant issues within the assessment scope, according to the purpose of the assessment. It could be said that indicators are the ''leading variables'' in carrying out the assessment; other variables are subsidiary to specifying the indicators.
 
 
 
However, within the context of integrated risk assessment, selecting and specifying indicators may sound more straightforward than it actually is. Perhaps identification of indicators, and specification of the causal network in line with the identified indicators, would better capture the essence of the process. Instead of merely picking from a predefined set of indicators, selection here refers to identifying the most interesting phenomena within the scope of the assessment in order to describe and report them as indicators. Specification of indicators is then similar to specification of all other variables, although indicators are considered primarily while other variables are considered secondarily, mainly in relation to the indicators.
 
  
In principle, any variable could be chosen as an indicator, and the set(s) of indicators could be composed of any types of indicators across the full-chain description. In practice, the generally relevant types of indicators, such as performance indicators, can be somewhat predefined, and even some detailed indicators can be defined in relation to commonly existing purposes and user needs. This kind of generality also helps bring coherence between assessments.

== Rationale ==
  
[[File:Information_flow_within_open_policy_practice.png|thumb|450px]]

The structure is based on extensive discussions between Mikko Pohjola and Jouni Tuomisto in 2006-2008 and intensive application in Opasnet ever since.

For a more detailed description of variables as information objects, see [[knowledge crystal]].

'''On the generalizability of variables'''

Aim: Variables must be generalizable so that they can be used without additional knowledge of the context. In other words, the context must be described well enough inside the variable.
  
&rarr; Because of this, the variables must be estimates of the truth, not deliberate under- or overestimates. Biased estimates are common in risk assessment, because assessments usually want to avoid false negative results much more than false positive results. In other words, it is considered much worse to miss a risk that exists than to suspect a risk where there is none.
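The distinction above — store the unbiased distribution in the variable, apply risk aversion only at the decision stage — can be sketched as follows (plain Python, hypothetical numbers; the actual Opasnet workflow uses [[R]]):

```python
import random

random.seed(3)

# The variable stores a best-estimate distribution of the risk
# (hypothetical normal distribution, no deliberate conservative bias).
risk = sorted(random.gauss(100.0, 20.0) for _ in range(10000))

best_estimate = sum(risk) / len(risk)       # unbiased summary of the variable
conservative = risk[int(0.95 * len(risk))]  # a risk-averse decision may use
                                            # e.g. the 95th percentile instead
print(round(best_estimate, 1), round(conservative, 1))
```

Because the variable stores the full distribution, each user can apply their own degree of risk aversion; a variable that stored only a pre-inflated "conservative" number would serve one decision and bias all others.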
:&rarr; Decisions may be based on risk aversion, but the estimates of variables must be best estimates, because you cannot know which decisions will be based on the variable.

'''Technical issues in Mediawiki'''

* Each variable is a page in the ''Variable'' namespace. The '''name''' of the variable is also the name of the page. However, draft variables may be parts of other pages.

* The '''scope''' is the first paragraph(s) on the page, before the first sub-title. Scope starts with the word '''Scope''' on the previous line (wiki code <nowiki>'''Scope'''<br></nowiki>). The name should be repeated in the beginning of the scope in bold, followed by the text "describes..." and then a description of the scope (whenever the contents fit this format). Subtitles are NOT used with Scope; this way, it is located above the table of contents.

* All other attributes ('''unit, definition, result''') are second-level (==) sub-titles on the page.

* A description of the attribute content is added at the end of that content; discussions on the content are added to the Talk page, each discussion under its own descriptive title.

* References to external sources are added to the text with the <nowiki><ref>Reference information</ref></nowiki> tag. The references are located at the end of the page under the subtitle References. However, a reference is not an attribute of the variable, even though it is technically similar.

* In the formula, computer code for specific software may be used. The following are in use:

** Analytica_id: Identifier of the respective node in an Analytica model. <anacode>Place your Analytica code here. Use double Enter to make a line break.</anacode>

** <rcode>Place your R code here. Use double Enter to make a line break.</rcode>

== See also ==

* [[Ovariable]]
* [[:Category:Variables | List of all variables]] in Opasnet
* [[Universal object]]
* [[Open assessment]]
* [[:heande:Heande:Structures of the building blocks of open risk assessments]]
* [[:heande:Help:Open risk assessment]]
* [[Seven challenges in integrated assessment: From properties to collaboration]]
* [http://en.opasnet.org/w/index.php?title=Variable&oldid=5596 A previous version of this page] contains much of the discussion from the Intarese deliverables D17 and D18, which has been edited with a hard hand.

== References ==

<references/>

== Related files ==

</noinclude>
 

Latest revision as of 13:33, 16 December 2015

