Which distribution is the most spread out




















Due to the difficulty in answering this question, some texts suggest that for an even-length list of ordinal data, one should instead simply choose the lower of the two middle values to be the median.

The mode is the most frequent data value in the population or sample. There can be more than one mode, although in the case where there are no repeated data values, we say there is no mode. Modes can be used even for nominal data. The midrange is just the average of the highest and lowest data values. While easily understood, it is strongly affected by extreme values in the data set, and does not reliably find the center of a distribution. In addition to knowing where the center is for a given distribution, we often want to know how "spread out" the distribution is -- this gives us a measure of the variability of values taken from this distribution.

The below graphic shows the general shape of three symmetric unimodal distributions with identical measures of center, but very different amounts of "spread". Just as there were multiple measures of center, there are multiple measures of spread -- each having some advantages in certain situations and disadvantages in others:. The range is technically the difference between the highest and lowest values of a distribution, although it is often reported by simply listing the minimum and maximum values seen.

It is strongly affected by extreme values present in the distribution. Another measure of spread is given by the mean absolute deviation , which is the average distance to the mean. However you should study the following step-by-step example to help you understand how the standard deviation measures variation from the mean. The calculator instructions appear at the end of this example. In a fifth grade class, the teacher was interested in the average age and the sample standard deviation of the ages of her students.

The ages are rounded to the nearest half year:. The variance may be calculated by using a table. Then the standard deviation is calculated by taking the square root of the variance.

We will explain the parts of the table after calculating s. The sample variance, s 2 , is equal to the sum of the last column 9. The sample standard deviation s is equal to the square root of the sample variance:. Typically, you do the calculation for the standard deviation on your calculator or computer. The intermediate results are not rounded. This is done for accuracy. Use your calculator or computer to find the mean and standard deviation.

Then find the value that is two standard deviations above the mean. The deviations show how spread out the data are about the mean. The data value A positive deviation occurs when the data value is greater than the mean, whereas a negative deviation occurs when the data value is less than the mean. The deviation is —1. If you add the deviations, the sum is always zero. So you cannot simply add the deviations to get the spread of the data.

By squaring the deviations, you make them positive numbers, and the sum will also be positive. The variance, then, is the average squared deviation. The variance is a squared measure and does not have the same units as the data.

Taking the square root solves the problem. The standard deviation measures the spread in the same units as the data. For the sample variance, we divide by the sample size minus one n — 1. Why not divide by n? The answer has to do with the population variance. The sample variance is an estimate of the population variance. Based on the theoretical mathematics that lies behind these calculations, dividing by n — 1 gives a better estimate of the population variance. Your concentration should be on what the standard deviation tells us about the data.

The standard deviation is a number which measures how far the data are spread from the mean. Let a calculator or computer do the arithmetic.

The variability in data depends upon the method by which the outcomes are obtained; for example, by measuring or by random sampling. When the standard deviation is zero, there is no spread; that is, the all the data values are equal to each other. The standard deviation is small when the data are all concentrated close to the mean, and is larger when the data values show more variation from the mean. The standard deviation, when first presented, can seem unclear. You will find that in symmetrical distributions, the standard deviation can be very helpful but in skewed distributions, the standard deviation may not be much help.

The reason is that the two sides of a skewed distribution have different spreads. In a skewed distribution, it is better to look at the first quartile, the median, the third quartile, the smallest value, and the largest value. Because numbers can be confusing, always graph your data. Display your data in a histogram or a box plot. The long left whisker in the box plot is reflected in the left side of the histogram. The histogram, box plot, and chart all reflect this. There are a substantial number of A and B grades 80s, 90s, and The histogram clearly shows this.

The following data show the different types of pet food stores in the area carry. Recall that for grouped data we do not know individual data values, so we cannot describe the typical value of the data with precision. In other words, we cannot find the exact mean, median, or mode. Just as we could not find the exact mean, neither can we find the exact standard deviation. Remember that standard deviation describes numerically the expected deviation a data value has from the mean.

Find the standard deviation for the data in Figure. This means that a randomly selected data value would be expected to be 3. If we look at the first class, we see that the class midpoint is equal to one. This is almost two full standard deviations from the mean since 7. It is usually best to use technology when performing the calculations.

Input the midpoint values into L1 and the frequencies into L2. Select 2 nd then 1 then , 2 nd then 2 Enter. The standard deviation is useful when comparing data values that come from different data sets.

If the data sets have different means and standard deviations, then comparing the data values directly can be misleading. In symbols, the formulas become:. Two students, John and Ali, from different high schools, wanted to find out who had the highest GPA when compared to his school.

Which student had the highest GPA when compared to his school? Pay careful attention to signs when comparing and interpreting the answer. For John,. For Ali,. Two swimmers, Angie and Beth, from different teams, wanted to find out who had the fastest time for the 50 meter freestyle when compared to her team. Which swimmer had the fastest time when compared to her team? The following lists give a few facts that provide a little more insight into what the standard deviation tells us about the distribution of the data.

King, Bill. The standard deviation can help you calculate the spread of data. There are different equations to use if are calculating the standard deviation of a sample or of a population. Two baseball players, Fredo and Karl, on different teams wanted to find out who had the higher batting average when compared to his team. Which baseball player had the higher batting average when compared to his team?

In the figure below, this corresponds to the region shaded pink. A set of data is normally distributed with a mean of 5. What percent of the data is less than 5? A normal distribution is symmetric about the mean. So, half of the data will be less than the mean and half of the data will be greater than the mean. The life of a fully-charged cell phone battery is normally distributed with a mean of 14 hours with a standard deviation of 1 hour. What is the probability that a battery lasts at least 13 hours?

The mean is 14 and the standard deviation is 1. The interval from 13 to 14 hours represents one standard deviation to the left of the mean. The average weight of a raspberry is 4.

What is the probability that a randomly selected raspberry would weigh at least 3. The mean is 4. In the above example, we have an even number of scores students, rather than an odd number, such as 99 students. However, if we had an odd number of scores say, 99 students , we would only need to take one score for each quartile that is, the 25th, 50th and 75th scores. You should recognize that the second quartile is also the median.

Quartiles are a useful measure of spread because they are much less affected by outliers or a skewed data set than the equivalent measures of mean and standard deviation. A common way of expressing quartiles is as an interquartile range. The interquartile range describes the difference between the third quartile Q3 and the first quartile Q1 , telling us about the range of the middle half of the scores in the distribution.

Hence, for our students:. However, it should be noted that in journals and other publications you will usually see the interquartile range reported as 45 to 71, rather than the calculated range.



0コメント

  • 1000 / 1000