A scatter plot is used to determine whether a relationship exists between two sets of data.

In statistics, there is a difference between the terms "correlation" and "association".
Association refers to any relationship between two variables, be it linear or non-linear.
Correlation refers specifically to a linear relationship between two quantitative variables.

Hint:
People may say there is "a strong correlation between hair color and IQ scores." What they mean to say is "a strong association between hair color and IQ scores", which, BTW, is a ridiculous statement. "Association" is a vague term describing a relationship, while "correlation" is a very precise term describing a linear relationship between quantitative (number) variables.
(Hair color is not a quantitative (number) variable; it is qualitative, so "correlation" does not apply.)

Since we are working with a "line" of best fit, we are working with a "linear" relationship. We will be using the term "correlation".
(Note: Your teacher may be using the more general term "association".)

Definition:
Correlation measures the strength of the linear relationship between two quantitative (number) variables.

When attempting to find a correlation, remember that:
1) "correlation" applies only to quantitative (number) variables.
2) while a correlation can be calculated for any pair of variables, it only measures the strength of the linear relationship, and will be misleading if the relationship is not linear.
3) outliers can distort a correlation (if an outlier is present, report the correlation with, and without, the outlier).
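The page does not give a formula for measuring correlation. One standard measure is Pearson's correlation coefficient r, a number from -1 to 1 whose sign gives the direction of the linear relationship and whose size gives its strength. The Python sketch below computes r from its usual definition (the function name `pearson_r` and all data values are made up for illustration) and checks points 2 and 3 from the list: a perfectly curved pattern can give r = 0, and a single outlier can change r sharply.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired quantitative data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)


# Point 2: a perfect U-shaped (non-linear) pattern, y = x^2, still gives r = 0.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x * x for x in curve_x]
print("curved pattern: ", round(pearson_r(curve_x, curve_y), 3))  # 0.0

# Point 3: report r with and without an outlier (hypothetical data).
clean_x = [1, 2, 3, 4, 5]
clean_y = [2, 4, 6, 8, 10]
with_outlier_x = clean_x + [6]
with_outlier_y = clean_y + [0]  # the point (6, 0) is the outlier
print("without outlier:", round(pearson_r(clean_x, clean_y), 3))  # 1.0
print("with outlier:   ", round(pearson_r(with_outlier_x, with_outlier_y), 3))  # 0.143
```

Reporting both values, as point 3 recommends, shows how one point dragged r from a perfect 1.0 down to about 0.14.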


Correlation and Lines of Best Fit

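[Figure: scatter plot of pairs of blue socks owned (x) against baskets scored (y), with a rising line of best fit]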
The scatter plot above displays the relationship between the number of baskets scored at the big homecoming game and the number of pairs of blue socks owned by the players. The dots appear to cluster around a straight line rising across the graph.

A line of best fit has been found to model the pattern seen in this graph. Notice that the slope of the line is positive: as the number of pairs of blue socks (the x-values) increases, the number of baskets made in the big game (the y-values) increases.

With the dots clustered closely around the increasing line of best fit in this scatter plot, it can be said that there is a high positive linear correlation for this data.
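The page does not say how its line of best fit was found. A common method is least-squares regression, which picks the slope and intercept that make the sum of the squared vertical distances from the points to the line as small as possible. The sketch below assumes that method; the function name `least_squares_line` is hypothetical, and the blue-socks data values are invented, since the graph's actual data are not given.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: pairs of blue socks owned (x) and baskets scored (y)
socks = [1, 2, 3, 4, 5, 6]
baskets = [2, 3, 5, 6, 8, 9]

slope, intercept = least_squares_line(socks, baskets)
print(f"y = {slope:.3f}x + {intercept:.3f}")  # positive slope: a rising line
predicted = slope * 7 + intercept  # predicted baskets for a player with 7 pairs
print(f"predicted baskets for 7 pairs: {predicted:.1f}")
```

Because the data are made up, the numbers themselves mean nothing; the point is that a rising cluster of dots produces a positive slope.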

There are different types of linear correlations and different strengths to these correlations.

Positive Linear Correlation:
(or Positive Linear Association)

A positive correlation indicates the extent to which both variables increase together: the y-values increase as the x-values increase. The graph of such data will resemble a line rising from left to right, and the slope of the line will be a positive number.

[Figure: points clustered tightly about a rising line]
These data points can be described as clustering about a rising straight line with a positive slope. The extent of the positive relationship will be strong.
 
[Figure: points loosely scattered about a rising line]
These data points are not clustered to clearly show a straight line. They "tend" to be rising, but the extent of the positive relationship will be less strong (weaker).

 

Negative Linear Correlation:
(or Negative Linear Association)

A negative correlation indicates the extent to which one data value increases as the other decreases. The y values will decrease as the x values increase. The graph of such data will resemble a line falling from left to right. The slope of the line will be a negative number.

[Figure: points clustered tightly about a falling line]
These data points can be described as clustering about a falling straight line with a negative slope. The extent of the negative relationship will be strong.
 
[Figure: points loosely scattered about a falling line]
These data points are not clustered to clearly show a straight line. They "tend" to be falling, but the extent of the negative relationship will be less strong (weaker).

 

No Linear Correlation:
(or No Linear Association)

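If there is no apparent relationship between x and y, the data are said to have no correlation, and the x and y values may be described as independent. (Strictly speaking, a lack of correlation rules out only a linear relationship; a curved relationship can still be present.)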

[Figure: points scattered with no rising or falling pattern]
There is no way of knowing from these data points if the pattern is rising or falling. A straight line cannot be found. There is no implication of a relationship.
 
[Figure: points scattered about a horizontal line]
Be careful here! While a straight line can be drawn through these points, the line is horizontal, with a slope of zero (meaning no change). This indicates that the value of x has no influence on the value of y.
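The descriptions above (strong or weaker; rising, falling, or neither) can be tied to Pearson's correlation coefficient r, a number from -1 to 1 whose sign gives the direction and whose size gives the strength. The sketch below is one way to turn r into words; the function name `describe_correlation` is hypothetical, and the cutoffs 0.7 and 0.3 are an assumed rule of thumb, since textbooks and teachers draw these lines in different places.

```python
def describe_correlation(r, strong_cutoff=0.7, weak_cutoff=0.3):
    """Return a plain-language label for a correlation coefficient r.

    The default cutoffs are an assumed rule of thumb, not a universal standard.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < weak_cutoff:
        return "no (or very little) linear correlation"
    # Sign gives the direction: positive = rising, negative = falling
    direction = "positive" if r > 0 else "negative"
    strength = "strong" if size >= strong_cutoff else "weak"
    return f"{strength} {direction} linear correlation"


for r in (0.92, 0.45, -0.88, -0.40, 0.05):
    print(r, "->", describe_correlation(r))
```

For the horizontal-line case, r works out to 0 exactly when the least-squares line is horizontal, because the least-squares slope equals r multiplied by the ratio of the standard deviations of y and x.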



Closeness of a Fit:


[Figure: points lying close to a rising line of best fit]

When a line is a "good fit", the distances from the plotted points to the line will be SHORT. The shorter the distances, the better the fit and the more reliable the predictions made with this line.
 
[Figure: points lying farther from a rising line of best fit]

While this is still a positive correlation, this line is less reliable at representing the data because some points lie farther from the line. It is weaker for making predictions.
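One way to put a number on "closeness of fit" is to measure each point's vertical distance from the line (its residual) and add up the squares of those distances; a smaller total means a closer fit. In the sketch below the function names, the line y = 2x, and both data sets are hypothetical, chosen so that one set hugs the line and the other strays from it.

```python
def residuals(xs, ys, slope, intercept):
    """Vertical distance from each point to the line y = slope*x + intercept."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, slope, intercept):
    """Total of the squared residuals; smaller means the points sit closer to the line."""
    return sum(res * res for res in residuals(xs, ys, slope, intercept))


xs = [1, 2, 3, 4, 5]
close_ys = [2.1, 3.9, 6.0, 8.2, 9.8]  # hypothetical points near the line y = 2x
loose_ys = [3, 2, 7, 10, 8]           # hypothetical points farther from the same line

close_ssr = sum_squared_residuals(xs, close_ys, slope=2, intercept=0)
loose_ssr = sum_squared_residuals(xs, loose_ys, slope=2, intercept=0)
print(f"close fit: {close_ssr:.2f}")  # 0.10
print(f"loose fit: {loose_ssr:.2f}")  # 14.00
```

Squaring keeps distances above and below the line from cancelling out; it is the same quantity that a least-squares line of best fit makes as small as possible.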

 


 

Beware:

It is fairly easy to find a situation where a change in one variable appears to predict a similar change in the other variable. When such situations are found, be careful not to assume that the change in one variable causes the change in the other variable.

In our example at the top of the page, it is highly unlikely that owning blue socks is influencing how many baskets are made in a basketball game. Yet, the graph indicates a statistical connection (correlation or association) between the data sets. Correlation does not imply "causation".

Keep in mind that there may be other factors influencing both variables in a similar manner, or the connection might simply be a coincidence.


 




NOTE: The re-posting of materials (in part or whole) from this site to the Internet is copyright violation
and is not considered "fair use" for educators. Please read the "Terms of Use".