Which error bar? When?

Is this the right page for you?

If you are looking for how to add error bars, go to the descriptive stats page. If you want to understand which error bars to use when, keep reading!

Which type of error bars/descriptive stats do you need?

Standard Deviation (SD)

What does it tell you?
Standard deviation shows us the distance from the mean for a “typical” data point.

A low SD means less variability or variation in the data while high SD indicates more spread out data. The SD can be used to make inferences based on the premise that what is true for a randomly selected sample will be true, more or less, for the population from which the sample is chosen.

Standard deviation is a property of the population or thing being measured and will not get dramatically smaller even when you increase sample size a lot.

This is helpful when ... you want to describe the shape and spread of your data.

In a normally distributed population we expect about two thirds (roughly 68%) of the data points to fall within one SD of the mean. You can add SD error bars to a graph to help visually assess the distribution. When comparing two or more groups, SD error bars can also be useful in visualizing an effect size, or how much difference there is between groups. All else being equal, if you have non-overlapping error bars with SD, you have a very big difference between groups.
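The "two thirds" figure can be checked directly. The short sketch below uses Python's standard library (our choice for illustration, not something DataClassroom requires) to compute the share of a normal distribution that lies within one and two SDs of the mean.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of the distribution within one and two SDs of the mean.
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
within_two_sd = standard_normal.cdf(2.0) - standard_normal.cdf(-2.0)

print(f"{within_one_sd:.3f}")  # 0.683
print(f"{within_two_sd:.3f}")  # 0.954
```

Real data are rarely perfectly normal, so treat these as rough expectations rather than guarantees.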

What is it?
To calculate the standard deviation, take the difference between each data point and the mean, square each difference, add the squares together, divide that sum by the sample size minus one, and then take the square root of the result.
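As a rough illustration, the calculation can be written out step by step in Python. The function name sample_sd and the heights data are made up for this example; the result is compared with the standard library's statistics.stdev, which also uses the sample size minus one.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation, dividing by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    # Squared distance of each data point from the mean.
    squared_diffs = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (n - 1))

# Hypothetical measurements, invented for this example.
heights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(sample_sd(heights))
print(statistics.stdev(heights))  # should agree with the line above
```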

Standard Error of Mean (SEM)

What does it tell you?
SEM tells you how far your sample mean is likely to be from the true population mean.

Essentially this is a prediction of how far from your calculated mean the mean of another random sampling of data would be. In a sense it is a way to evaluate how strong your estimate of the mean is given the variation of your data around that mean. You could say that it is a way of estimating the precision of your estimate for the mean.

Because sample size is in the denominator of the equation for SEM, the standard error of the mean will become smaller as sample size gets larger. That makes sense because as you collect more data you should be able to get a better estimate of the true mean of the variable of interest in your study population.

This is helpful when ... you are comparing the means of two or more groups and want to see an estimate of precision around the means you have calculated for each group.

Note that in situations where the SEM will work, it is also possible to use 2SEM (two times the SEM) or 95% Confidence Intervals as your error bars. Both 2SEM and 95% CIs are better choices if you want to make the connection between your graph and the results of a statistical test obvious for your audience. Unfortunately, the common belief that "if the SEM bars do not overlap, the difference between the groups is statistically significant" is NOT always true.

What is it?
Standard Error of the Mean is the Standard Deviation divided by the square root of the sample size.
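Here is a minimal sketch of that formula in Python, with made-up data. To show the effect of sample size, the second data set simply repeats the first one four times. That is an artificial trick used only for illustration: the spread stays similar while the sample size grows, so the SEM shrinks.

```python
import math
import statistics

def sem(values):
    """Standard error of the mean: SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical data, invented for this example.
small_sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
# Same values repeated four times: similar spread, four times the sample size.
large_sample = small_sample * 4

print(statistics.stdev(small_sample), sem(small_sample))
print(statistics.stdev(large_sample), sem(large_sample))
```

The two SD values come out close to each other, while the SEM of the larger sample is roughly half that of the smaller one.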

95% Confidence Interval (CI)

What does it tell you?
The 95% confidence interval can loosely be interpreted as the range that has a 95% chance of containing the true mean.

A more technically correct meaning is that if you were to repeat the experiment many times and calculate a 95% confidence interval each time, you would expect about 95% of those confidence intervals to contain the true mean.

This is helpful when ... you are comparing the means of two or more groups and want to see if those groups have statistically different means.

Note that while it is true that if you have non-overlapping 95% confidence intervals then you will have a statistically significant difference between groups (P<0.05), the converse is not always true. You can still have a statistically significant difference between groups with some degree of overlap in the 95% confidence intervals. Also note that sometimes 2SEM is used in place of 95% confidence intervals. If you compare the formulas for 2SEM and 95% CIs you will see that they are almost, but not perfectly, equivalent.

What is it?
95% confidence intervals are placed above and below the sample mean and have the value of 1.96 times the standard deviation divided by the square root of the sample size. In other words, each bar extends 1.96 times the SEM from the mean.

That 1.96 value is determined by your cutoff for statistical significance (alpha level). With a 0.05 cutoff the value is 1.96.
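The sketch below shows one way to compute this interval in Python and where the 1.96 comes from. It assumes the normal approximation described on this page; the function name ci95 and the data are made up for illustration. For small samples, many statistics tools use a slightly larger multiplier taken from the t distribution, which this sketch does not do.

```python
import math
import statistics

def ci95(values, alpha=0.05):
    """Confidence interval for the mean using the normal approximation."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    # Critical value for a two-sided test; about 1.96 when alpha is 0.05.
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sem
    return mean - half_width, mean + half_width

# Hypothetical data, invented for this example.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

lower, upper = ci95(data)
print(lower, upper)
```

Replacing z with 2 gives the 2SEM bars, which is why the two kinds of error bar look almost the same.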

Box and Whisker

What does it tell you?
A Box and Whisker plot visually summarizes the overall shape and spread of the data. It clearly highlights the median and shows where the middle 50% of the data are.

The box itself shows exactly where the middle 50% of the data are. This is often referred to as the interquartile range (IQR) and it goes from the 25th percentile (quartile 1) to the 75th percentile (quartile 3). The line that divides the box into two parts is the median value in the data. The whiskers above and below the box can show a few different things. The default in DataClassroom is to have the whiskers show the full range from the highest value to the lowest value in the data. If you choose to show outliers, then the whiskers will show either (depending on your choice) a value of 1.5 times the interquartile range (IQR) or two standard deviations above and below the mean.

This is helpful when ... you want to display a measure of central tendency for data that are not normally distributed.

In those cases the mean is not a good estimate of the middle of the data and a visualization that highlights the median and tells you something about the shape of the data, like the box and whiskers, is a better choice.

What is it?
The line in the middle is the median. The box is the middle 50% of the data and the whiskers are (usually) showing the highest and lowest values. Sometimes the whiskers show something else like 1.5 times the IQR.

The median is not a calculation of the average (mean) of the data. Rather, when all the data are sorted in order, it is the middle value. If there is an even number of data points, then the median is the average of the two middle values. The 25th and 75th percentiles make up the lower and upper edges of the box. They are the medians of the lower and upper halves of the data.
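As a rough sketch, the numbers behind a box and whisker plot can be computed in Python using the "median of each half" rule described above. The function name box_summary is made up for this example. When there is an odd number of data points, this sketch leaves the middle value out of both halves; that is one common convention and an assumption here, and other tools may give slightly different quartiles.

```python
import statistics

def box_summary(values):
    """Min, quartiles, median, and max (needs at least two data points)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # With an odd count, skip the middle value so it is in neither half.
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return {
        "min": data[0],
        "q1": statistics.median(lower_half),
        "median": statistics.median(data),
        "q3": statistics.median(upper_half),
        "max": data[-1],
    }

# Hypothetical data, invented for this example.
summary = box_summary([9.0, 2.0, 5.0, 4.0, 7.0, 4.0, 5.0, 4.0])
iqr = summary["q3"] - summary["q1"]  # height of the box
print(summary, iqr)
```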

Median Line

What does it tell you?
The median line tells you exactly where the middle value in your data is.

This is helpful when ... you simply want to highlight the median value itself and not the overall shape or spread of the data. It is great for emphasizing a comparison between two groups when the data are not normally distributed.

What is it?

The median is the middle value of your data.

This is not a calculation of the average (mean) of the data. Rather, when all the data are sorted in order, it is the middle value. If there is an even number of data points, then the median is the average of the two middle values.
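The rule for odd and even numbers of data points can be sketched in a few lines of Python. The function and the example values are made up for illustration; the last data set includes one extreme value to show why the median can be a better "middle" than the mean when data are skewed.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5

# Hypothetical skewed data: one extreme value pulls the mean, not the median.
skewed = [1, 2, 3, 4, 100]
print(median(skewed), sum(skewed) / len(skewed))  # 3 22.0
```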