A scatter plot is a graph of plotted points that shows a relationship between two sets of quantitative data (referred to as bivariate data). Scatter plots are composed of "dots" (points) on a set of coordinate axes. Do NOT connect the dots!

Statisticians and quality control technicians spend a good deal of time gathering sets of data to determine if relationships exist between the sets. Scatter plots are a popular and effective way of graphing data to display patterns, trends, relationships, and the occasional extraordinary value (an outlier) located apart from the other values. Let's see an example.

 Does studying for that Final Exam really help your score? Does one event really affect the other?

The table below lists, for fifteen students, the number of hours each spent studying for the Final Exam and the score each earned on that exam.

The scatter plot below displays this data. The plot appears to show that the longer students studied, the higher their examination scores.

According to this survey of 15 students studying for the same examination, it appears that the answer to our initial question is "yes", studying does affect your score. At least, the answer is "yes", for this particular group of students.

Student   Hours Studying   Final Exam Score
1         1                50
2         2                70
3         2                68
4         3                60
5         4                55
6         4                75
7         5                90
8         6                70
9         6                80
10        7                75
11        7                80
12        7                90
13        8                85
14        8                98
15        9                95

NOTE: A scatter plot is not necessarily a function. It is often the case that the same x-value has more than one corresponding y-value, such as (6,70) and (6,80).
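
The NOTE can be checked directly against the table. The following is a minimal Python sketch, not part of the original lesson; the names `hours`, `scores`, `scores_by_hours` and `repeated` are illustrative assumptions. It groups the scores by hours studied and lists every x-value that has more than one y-value.

```python
from collections import defaultdict

# Data from the table: hours studied (x) and Final Exam score (y), students 1-15.
hours = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 7, 7, 8, 8, 9]
scores = [50, 70, 68, 60, 55, 75, 90, 70, 80, 75, 80, 90, 85, 98, 95]

# Collect every score recorded for each number of hours.
scores_by_hours = defaultdict(list)
for x, y in zip(hours, scores):
    scores_by_hours[x].append(y)

# An x-value with two or more y-values means the data is not a function.
repeated = {x: ys for x, ys in scores_by_hours.items() if len(ys) > 1}

for x, ys in sorted(repeated.items()):
    print(f"x = {x} hours has scores {ys}")
```

For this data set, several x-values repeat, including x = 6 with the scores 70 and 80 mentioned in the NOTE.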

Notice how the data in the graph could resemble a straight line rising from left to right. When working with scatter plots, it is often useful to represent the data with the equation of a straight line, called a "line of best fit", or a "trend" line. Such a line may pass through some of the points, none of the points, or all of the points on the scatter plot.

 What is the line of best fit for our "studying affects scoring" problem?

When finding the line of best fit "by hand", different students may arrive at different answers. So whose answer is the best? You will need a graphing calculator to get the "best" answer.

The graphing calculator computed the line of best fit for this data. The equation is:

y = 4.609662577x + 51.78911043
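
The calculator's result can be reproduced with the standard least-squares formulas for slope and intercept. The sketch below is an illustration in plain Python, not a description of the calculator's internal procedure; the function name `line_of_best_fit` and the variable names are assumptions introduced here.

```python
# Data from the table: hours studied (x) and Final Exam score (y), students 1-15.
hours = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7, 7, 7, 8, 8, 9]
scores = [50, 70, 68, 60, 55, 75, 90, 70, 80, 75, 80, 90, 85, 98, 95]


def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Standard least-squares formulas for a straight line.
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept


slope, intercept = line_of_best_fit(hours, scores)
print(f"y = {slope:.9f}x + {intercept:.8f}")
```

With the fifteen data points from the table, the slope and intercept agree with the calculator's equation to the digits shown.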

Based upon this equation, we can predict scores given any number of hours spent studying.

Interpolate:
If you are making predictions for values that fall within the plotted values, you are said to be interpolating. For this problem, our plotted values range from x = 1 to x = 9.

Example: Predict the final examination score of a student studying for 5½ hours.
(Substitute the number of hours into the equation for x.)

y = 4.609662577(5.5) + 51.78911043 ≈ 77.14

Score: approximately 77

Extrapolate:
If you are making predictions for values that fall outside the plotted values, you are said to be extrapolating. Be careful when extrapolating: the farther you go from the plotted values, the less reliable your prediction becomes. For this problem, values outside the plotted range are x greater than 9 or x less than 1.

Example: Predict the final examination score of a student studying for 12 hours.
(Substitute the number of hours into the equation for x.)

y = 4.609662577(12) + 51.78911043 ≈ 107.11

Score: approximately 107

WOW!!! Great score!!
But is it realistic? It is very likely that the top score is 100.
So, in addition to yielding less reliable predictions, extrapolating may also give completely unrealistic predictions.
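
Both predictions can be reproduced with a short Python sketch. It uses the calculator's equation and the plotted range x = 1 to x = 9 from this problem; the function name `predict_score` and the constant names are hypothetical, chosen only for illustration. The function returns the predicted score and whether the prediction is an interpolation or an extrapolation.

```python
# Line of best fit reported by the graphing calculator.
SLOPE = 4.609662577
INTERCEPT = 51.78911043

# Range of hours actually plotted in this problem (x = 1 to x = 9).
X_MIN, X_MAX = 1, 9


def predict_score(hours_studied):
    """Return (predicted score, kind of prediction) for the given hours of study."""
    score = SLOPE * hours_studied + INTERCEPT
    if X_MIN <= hours_studied <= X_MAX:
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return score, kind


for h in (5.5, 12):
    score, kind = predict_score(h)
    print(f"{h} hours: about {round(score)} ({kind})")
```

The sketch labels the 5½-hour prediction (about 77) as an interpolation and the 12-hour prediction (about 107) as an extrapolation. It does not cap the score, so a value above a likely maximum of 100 is a signal to question the prediction rather than to trust it.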