Continuous Data Use Scatter and Line

Use scatterplots to show relationships between pairs of continuous variables. These graphs display symbols at the X, Y coordinates of the data points for the paired variables. Scatterplots are also known as scattergrams and scatter charts.

The pattern of dots on a scatterplot allows you to determine whether a relationship or correlation exists between two continuous variables. If a relationship exists, the scatterplot indicates its direction and whether it is a linear or curved relationship.

Fitted line plots are a special type of scatterplot that displays the data points along with a fitted line for a simple regression model. This graph allows you to evaluate how well the model fits the data.

Use scatterplots for the following tasks:

  • Examine the relationship between two variables.
  • Check for outliers and unusual observations.
  • Create a time series plot with irregular time-dependent data.
  • Evaluate the fit of a regression model.

At a minimum, scatterplots require two continuous variables. To learn about other graphs, read my Guide to Data Types and How to Graph Them.

Example Scatterplot

During an experiment, I measured the Body Mass Index (BMI) and body fat percentage of adolescent girls. I graphed these two variables in a scatterplot to assess the relationship between them.

Fitted line plot that fits the curved relationship between BMI and body fat percentage.

Scatterplots typically contain the following elements:

  • X-axis representing values of a continuous variable. By custom, this is the independent variable when you can classify one of the variables as such.
  • Y-axis representing values of a continuous variable. Traditionally, this is the dependent variable.
  • Symbols plotted at the (X, Y) coordinates of your data. Optionally, the graph can use different colored/shaped symbols to represent separate groups on the same chart.
  • Optionally, you can overlay fit lines to determine how well a model fits the data.
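The elements above can be mimicked without any plotting library. The following sketch uses a hypothetical `text_scatter` helper, not any real package's API. It places a character at each (X, Y) coordinate, puts larger Y values higher up, and uses a different symbol for each group. The two groups are invented data.

```python
def text_scatter(points, width=40, height=12):
    """Render (x, y, symbol) triples on a character grid.

    Larger y values appear higher up, mirroring a conventional Y-axis.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero range when all x (or y) values are identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y, symbol in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y_max - y) / y_span * (height - 1))  # row 0 is the top
        grid[row][col] = symbol
    lines = ["|" + "".join(r) for r in grid]  # "|" stands in for the Y-axis
    lines.append("+" + "-" * width)           # bottom line stands in for the X-axis
    return "\n".join(lines)

# Hypothetical data: two groups with the same slope, drawn with different symbols.
group_a = [(x, 1.0 * x + 1, "o") for x in range(1, 9)]
group_b = [(x, 1.0 * x + 4, "x") for x in range(1, 9)]
print(text_scatter(group_a + group_b))
```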

For the BMI and body fat data, the scatterplot displays a moderately strong, positive relationship. As BMI increases, body fat percentage also tends to increase. The relationship appears to curve slightly because it flattens out at higher BMI values. To model the curvature, I included a squared term in the model. The fitted line follows the curvature of the data, indicating a good fit.
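Including a squared term means fitting y = b0 + b1*x + b2*x^2 by least squares. The sketch below solves the normal equations in plain Python. The `least_squares` helper and the noise-free curve are assumptions for illustration, not the BMI data or the software used for the graph. A negative b2 is what lets the fitted curve flatten out at higher X values.

```python
def least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y for coefficient vector b.

    rows: design-matrix rows, one list of predictor values per observation.
    Uses Gaussian elimination with partial pivoting; fine for a few predictors.
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

# Hypothetical curved data that rises and then flattens: y = 1 + 2x - 0.1x^2.
x = [float(v) for v in range(11)]
y = [1 + 2 * xi - 0.1 * xi ** 2 for xi in x]

# The squared term is its own column alongside the intercept and x.
rows = [[1.0, xi, xi ** 2] for xi in x]
b0, b1, b2 = least_squares(rows, y)
print(f"y = {b0:.3f} + {b1:.3f}*x + {b2:.3f}*x^2")
```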

Interpreting Scatterplots and Assessing Relationships between Variables

Scatterplots display the direction, strength, and linearity of the relationship between two variables.

Positive and Negative Correlation and Relationships

Values tending to rise together indicate a positive correlation. For instance, height and weight have a positive correlation.

This scatterplot displays a positive correlation between height and weight.

However, if one variable increases as the other decreases, it's a negative correlation, as shown below.

Scatterplot that displays the negative relationship between flash recovery time and battery voltage.

Strength of Relationships

Stronger relationships produce a tighter clustering of data points. Be aware that changes in scaling can change the apparent strength of the relationship. Correlation coefficients provide an objective assessment of strength independent of graph scaling.

In the two graphs below, the data points in the top graph cluster more tightly than the data points in the bottom graph. Consequently, the first dataset displays a stronger relationship.

Fitted line plot for a model with a high R-squared and low-variability data.

Fitted line plot for a model with a low R-squared and high-variability data.

Stronger relationships produce correlation coefficients closer to -1 or +1 and regression models with higher R-squared values.

Related post: Interpreting Correlation Coefficients

Linear and Curved Relationships

Determine whether your data have a linear or curved relationship. When a relationship between two variables is curved, it affects the type of correlation you can use to assess its strength and how you can model it using regression analysis.

An example regression model to illustrate when to use regression.

Adding a fit line highlights how well the model fits your data. When a relationship exists, you might want to model it using regression analysis.

Related post: Modeling Curvature Using Regression

Determine Whether the Relationship Changes between Groups

When your data have groups, you can determine whether the relationship between two variables differs between the groups. To make these comparisons, you'll need a categorical variable that defines the groups. All groups must use the same X and Y measurements.

In this scatterplot, the slope of the relationship is the same for the two groups, but the output values of group B are consistently higher for any given input value.

Scatterplot for comparing whether the constants are different.

In this scatterplot, the slope for group B is steeper than for group A. As the input value increases, the output for group B increases more quickly than the output for group A.

Scatterplot for comparing whether two regression models are different.

Use indicator variables and interaction terms in a regression model to test the statistical significance of these differences. Click the link below for details.
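Here's a minimal sketch of that model structure: y = b0 + b1*x + b2*G + b3*(x*G), where G is an indicator equal to 0 for group A and 1 for group B. Then b2 estimates the difference in constants, as in the first scatterplot above, and b3 estimates the difference in slopes, as in the second. The data are invented and noise-free, so the fit recovers the coefficients exactly. With real data, statistical software adds the standard errors and p-values that this coefficient-only sketch omits. The `least_squares` helper is hypothetical.

```python
def least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    k = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b

# Hypothetical data: both groups measured at the same x values.
# Group A: y = 2 + 1.5x. Group B: y = 5 + 2.5x (higher constant, steeper slope).
data = []
for g in (0, 1):
    for xi in [1, 2, 3, 4, 5, 6]:
        yi = (2.0 + 1.5 * xi) if g == 0 else (5.0 + 2.5 * xi)
        data.append((xi, g, yi))

# Design matrix columns: intercept, x, indicator G, interaction x*G.
rows = [[1.0, xi, g, xi * g] for xi, g, _ in data]
ys = [yi for _, _, yi in data]
b0, b1, b2, b3 = least_squares(rows, ys)
print(f"constant difference (b2) = {b2:.3f}")
print(f"slope difference (b3)    = {b3:.3f}")
```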

Related post: Comparing Regression Lines with Hypothesis Tests

Find Outliers and Unusual Observations with Scatterplots

Scatterplots can help you find multiple types of outliers.

Some outliers have extreme values. These outliers lie far from the other data points, as shown below.

Scatterplot that displays an outlier.

Unusual observations have values that are not necessarily extreme, but they do not fit the observed relationship. In the scatterplot below, the circled point has X and Y values that are not unusual. However, the combination of the two values clearly does not fit the overall relationship.

Scatterplot that displays an unusual value that does not fit the relationship.

Related post: Five Ways to Find Outliers in Your Data

Trends Over Time

Typically, analysts use time series plots to display data over time. However, you can also use scatterplots for this purpose. Scatterplots are a perfect choice for time-related data when your observations occur at irregular intervals. When creating a scatterplot for time data, be sure to connect the data points with a line!
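Here's a minimal sketch of preparing irregular time data for a scatterplot. It sorts the observations chronologically so the connecting line joins neighbors in time, and it converts dates to elapsed days so uneven gaps keep their true width on the X-axis. The readings and dates are hypothetical.

```python
from datetime import date

# Hypothetical readings taken at irregular intervals, recorded out of order.
readings = [
    (date(2023, 3, 14), 7.2),
    (date(2023, 1, 2), 5.1),
    (date(2023, 1, 9), 5.8),
    (date(2023, 6, 30), 8.9),
]

# Sort chronologically so the connect line follows time, not entry order.
readings.sort(key=lambda r: r[0])
start = readings[0][0]

# Elapsed days as X: a 7-day gap and a 108-day gap get proportional spacing.
points = [((d - start).days, value) for d, value in readings]
print(points)  # [(0, 5.1), (7, 5.8), (71, 7.2), (179, 8.9)]
```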

Use Scatterplots with the Appropriate Hypothesis Tests

You can use scatterplots to display the relationships between continuous variables. However, if you plan to use your sample to infer the characteristics of an entire population, be sure to perform the necessary hypothesis tests and assess statistical significance.

Related post: Descriptive versus Inferential Statistics

Graphs can be subjective because your software lets you edit their properties, such as the graph's scaling. Altering these settings can change the appearance of scatterplots and the conclusions you draw from them. On the other hand, hypothesis tests present an objective evaluation of statistical significance. They also account for the possibility of random error explaining the observed patterns and differences.

Correlation and regression analysis are the primary methods for statistically assessing relationships between continuous data.

Source: https://statisticsbyjim.com/graphs/scatterplots/