Free Republic
Browse · Search
News/Activism
Topics · Post Article

To: cornelis; Alamo-Girl; occamsrapier
I just realized I made a mistake. There are a few different types of correlation in probability theory, and I gave you the definition of Pearson correlation. There are slightly different measures of the extent to which random variables are related. For instance, there is Spearman correlation, which measures the extent to which the rankings of observations of two random variables are related.

Nevertheless, my basic point stands: correlation, however you measure it, is a property of random variables. Anything non-random is, by definition, not correlated with anything.

234 posted on 11/03/2005 5:03:01 PM PST by curiosity (Cronyism is not conservative)
[ Post Reply | Private Reply | To 214 | View Replies ]


To: curiosity; Alamo-Girl; betty boop
Close, but needs some elucidation. Didact_Mode=[FULL]

Correlation does exist between random variables; "random variable", however, is just another name (used by probability theorists rather than analysts) for "measurable function." There is nothing "random" about random variables.

The mean of a random variable, g(x), is defined to be the average of g(x) over its domain (what x ranges over). If x ranges over a discrete set (like the spots on a die) with all outcomes equally likely, the average is just the sum of the values of g divided by the number of elements in the set. For a fair die, (1+2+3+4+5+6)/6 = 3.5. Note that the mean, or average, or expected value need not be one of the possible values. If x is continuous, an integral replaces the sum.

The variance of a random variable is the average of its square minus the square of its average; equivalently, it is the average of the squared deviation from the mean. It is a measure of the spread of the variable (the square root of the variance is the standard deviation). For the die, one gets (1+4+9+16+25+36)/6 - 3.5**2 = 2 11/12.
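The die calculations above can be sketched in a few lines of Python (a minimal illustration assuming a fair die, i.e. all six faces equally likely):

```python
# Mean and variance of a fair die, following the definitions above.
faces = [1, 2, 3, 4, 5, 6]
n = len(faces)

mean = sum(faces) / n                            # (1+2+3+4+5+6)/6 = 3.5
mean_of_squares = sum(x * x for x in faces) / n  # (1+4+9+16+25+36)/6
variance = mean_of_squares - mean ** 2           # 91/6 - 3.5**2 = 2 11/12
std_dev = variance ** 0.5

print(mean)      # 3.5
print(variance)  # 2.9166... (= 35/12)
```

Note that 3.5 is not itself a possible roll, matching the point that the expected value need not be one of the possible values.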

The co-variance between two variables is like the variance, but instead of squaring one variable, one takes the product of the two. Let the variables be f(x) and g(x), their respective means f_bar and g_bar, and their standard deviations f_sdv and g_sdv. Then the co-variance is just the average of (f(x)-f_bar)*(g(x)-g_bar), and the correlation coefficient is that value divided by f_sdv*g_sdv.

If the correlation coefficient is positive, the variables tend to increase together; if negative, one variable tends to increase while the other decreases. Perfect correlation is +1 and perfect anti-correlation is -1; 0 is perfectly uncorrelated. The correlation coefficient is actually the cosine of the angle between the random variables in the appropriate vector space.
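Here is a small sketch of those formulas on the die, using two illustrative functions of the roll (f(x) = x and g(x) = 7 - x are my choices, not from the post; since g decreases exactly as f increases, we expect perfect anti-correlation):

```python
# Covariance and correlation coefficient for two functions of a die roll.
faces = [1, 2, 3, 4, 5, 6]
n = len(faces)

f = [x for x in faces]       # f(x) = x
g = [7 - x for x in faces]   # g(x) = 7 - x, decreases as x increases

f_bar = sum(f) / n
g_bar = sum(g) / n

# Average of (f - f_bar)*(g - g_bar), per the definition above.
cov = sum((fi - f_bar) * (gi - g_bar) for fi, gi in zip(f, g)) / n
f_sdv = (sum((fi - f_bar) ** 2 for fi in f) / n) ** 0.5
g_sdv = (sum((gi - g_bar) ** 2 for gi in g) / n) ** 0.5

corr = cov / (f_sdv * g_sdv)
print(corr)  # -1.0: perfect anti-correlation
```

The result -1 is exactly the "perfect anti-correlation" case described above: the cosine of the angle between the two (centered) variables, which here point in opposite directions.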

If the random variables are produced by a statistical sampling procedure, there are tests to see whether the magnitude of the correlation coefficient is meaningful. Any real-world sample will in general deviate somewhat from its true distribution, and thus statistical tests are needed.

With real-world measurements, small deviations from a zero correlation coefficient are insignificant; large ones are not. There are no examples of large correlations without some causation, though it can be difficult to find out what the causal relation actually is. Example: the number of heart attacks per year (in the US) is proportional to the number of mangoes eaten, but mangoes are not believed to directly cause heart attacks. Both the number of heart attacks and the number of mangoes eaten are related to the population size; a better measure would be heart attacks per 100,000 population (which is closer to what is actually used).

238 posted on 11/03/2005 6:57:12 PM PST by Doctor Stochastic (Vegetabilisch = chaotisch ist der Charakter der Modernen. - Friedrich Schlegel)
[ Post Reply | Private Reply | To 234 | View Replies ]



FreeRepublic, LLC, PO BOX 9771, FRESNO, CA 93794
FreeRepublic.com is powered by software copyright 2000-2008 John Robinson