Figuring out how big a random sample needs to be, both to provide a sharp estimate of a mean and to detect important differences between control groups and treatment groups, is genuinely difficult. Even experienced researchers struggle to get it right, and it is harder still to explain to the people funding the research why the sample needs to be as big as it does. Samples cost money, often big money, and every extra person or item included in a sample adds time and expense. You don't want a sample that is too big, because you pay for more precision than you need; you don't want one that is too small, either, because an inconclusive study wastes everything you spent.
The application shows three probability curves plotted against sample size. Two of them, Success and Tolerance, increase with sample size. The third, labeled Power, plots the probability of making a "Type II" error (the "Beta" that is one minus the power), and this decreases with sample size. So Success and Tolerance become more likely the larger the sample, and this kind of error becomes less likely.
You will notice that the probabilities change dramatically with the assumptions made about the population to be sampled. What we are trying to do is take a set of N measurements, such as the diameters of N = 10 manhole covers, and use them to estimate the mean diameter in the whole pile of manhole covers. By the way, you have to decide what units to measure in. I don't care whether it is yards, pounds, meters, or nanoseconds, just so long as the units are consistent.
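To make that estimation step concrete, here is a minimal Python sketch. The ten diameter values are made up purely for illustration; the point is just the arithmetic of the sample mean and sample standard deviation:

```python
import math

# Ten hypothetical manhole-cover diameters, in inches (made-up data).
diameters = [18.05, 17.92, 18.10, 17.88, 18.02, 17.95, 18.07, 17.99, 18.04, 17.96]

n = len(diameters)
mean = sum(diameters) / n
# Sample standard deviation (divide by n - 1, since the mean was estimated).
sd = math.sqrt(sum((x - mean) ** 2 for x in diameters) / (n - 1))

print(f"n = {n}, mean = {mean:.3f} in, sd = {sd:.3f} in")
```

With these made-up numbers the mean comes out very close to 18 inches with a small spread; the rest of this walkthrough is about deciding how large n has to be before such estimates can be trusted.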
If the manhole covers are supposed to be 1.5 feet (18 inches) across, we probably want to estimate that mean well enough that we don't have a lot of covers dropping under the street when we try to place them (landing on Norton down there, which sort of defeats the purpose of making them round), and we don't want them so big that they stick up on one edge and dislocate your wheel alignment. This would be a quality control application, but you can figure that the same theory applies to a lot of other estimation problems.
Now, we could just measure one manhole cover right off the top of the pile, and if it came out at 18.05 inches, some people would call that enough work. For the purposes of the software, and for good practice, we set a minimum sample size of five; if you can get by with fewer than that, you probably don't need software anyway. By the way, the upper limit on sample size is about 500.
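One way to see why a single cover off the top of the pile is not enough: the standard error of the sample mean shrinks like one over the square root of n. A quick sketch, assuming a population SD of 0.38 inches (the figure used later in this walkthrough):

```python
import math

sigma = 0.38  # assumed population SD of cover diameters, in inches

def std_error(n):
    """Standard error of the sample mean for n measurements: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

for n in (1, 5, 33, 500):
    print(f"n = {n:3d}: standard error = {std_error(n):.3f} in")
```

A single measurement carries the full population SD as its uncertainty, five measurements cut it by more than half, and the returns diminish quickly after that, which is one reason an upper limit of 500 costs little in practice.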
Now, let's say that we want 95% of manhole covers to be within a half inch of 18 inches, either way, but we sure don't want to go measure all of them. How do we decide how many manhole covers we need to measure? (We can use JMSL to randomize the selection, too, so that the sample is representative, but that is a different issue.)
We set the Delta / Confidence Interval Width slider to about 1.0, for the one-inch total width, taking in the range from 17.5 to 18.5. That's our specification. (The slider defaults to 1.00; click to the left or right of the slider thumb to make fine adjustments.)

The Standard Deviation slider is your guess, based on the best available information, at the population standard deviation. Let's suppose the foreman of the casting shop has told us the covers are milled so that the SD is 3/8ths of an inch. Move the Standard Deviation slider down to 0.38.
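For comparison, the textbook confidence-interval sample-size formula with known sigma is n = (2·z·sigma / width)², where z is the standard normal quantile for the chosen confidence level. This is standard formula machinery, not the application's algorithm, and it answers a much weaker question than the Tolerance curve does, so the number it gives is much smaller:

```python
import math

# Textbook CI-based sample size: n = (2 * z * sigma / width)^2.
z = 1.96       # standard normal 97.5% point, for 95% confidence
sigma = 0.38   # assumed population SD, in inches
width = 1.0    # total CI width: 17.5 to 18.5 inches

n = math.ceil((2.0 * z * sigma / width) ** 2)
print(n)
```

The tiny answer here is not a contradiction: pinning down the population mean to within half an inch is far easier than guaranteeing that 95% of the individual covers fall in that interval, which is what the Tolerance curve asks for.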
Assume that the measurements of the manhole cover diameters are normally distributed, too. Now, let's use the default standard confidence level of 95%, which means that if we follow these procedures every time, we'll be where we want to be in 95 out of 100 sets of measurements. The catch is that we never know which 95 are the real McCoy. Notice that this also sets the alpha level to 0.05, since alpha is simply one minus the confidence level.
We are now set to examine the charts and investigate how big a sample we need under these various assumptions. Move the cursor over the Tolerance curve (blue) and watch the reported Probability in the text area below the chart. Follow the Tolerance curve until the Probability is about 0.95. You should see "Sample Size = 33" in the dialog at the bottom of the window. This says that if we want 95% of our population to be within half an inch on either side of our mean estimate, and we want to be right 95% of the time, we need to measure 33 manhole covers.
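The Tolerance curve involves two-sided normal tolerance limits. A standard way to approximate the tolerance factor k, such that the interval "sample mean ± k times sample SD" covers a given proportion of the population with a given confidence, is Howe's formula, sketched below with a Wilson-Hilferty chi-square quantile approximation. This is textbook machinery, not necessarily the application's exact algorithm, so don't expect it to reproduce the chart's numbers digit for digit:

```python
import math

def norm_ppf(p):
    """Standard normal quantile by bisection on the erf-based CDF."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 0.5 * (1.0 + math.erf(mid / math.sqrt(2.0))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def chi2_ppf(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = norm_ppf(p)
    return df * (1.0 - 2.0 / (9.0 * df) + z * math.sqrt(2.0 / (9.0 * df))) ** 3

def tolerance_factor(n, coverage=0.95, conf=0.95):
    """Howe's approximation to the two-sided normal tolerance factor k:
    x_bar +/- k * s covers `coverage` of the population with confidence `conf`."""
    df = n - 1
    z = norm_ppf((1.0 + coverage) / 2.0)
    return z * math.sqrt(df * (1.0 + 1.0 / n) / chi2_ppf(1.0 - conf, df))

print(round(tolerance_factor(30), 2))  # ~2.55, close to published 95/95 tables
```

Larger samples give smaller k, which is exactly why following the Tolerance curve to the right brings the required interval width down.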
Suppose we relax our requirements and say we can get by with a confidence level of just 90%. Adjust the top slider so that alpha = 0.10 and Confidence = 0.90. Notice first that the Success Probability curve (red) shifts to the left: we lowered our required confidence to 90%, so smaller sample sizes now meet the same probability of success. This is extra information that helps us evaluate the process. The Tolerance curve (blue) also moves to the left. Following it as before to a Probability of 0.95, we find that only 24 samples are required.
But what about our assumption on the standard deviation? Suppose the machinists are kidding us and the real SD is more like 3/4 of an inch rather than 3/8ths? Set the SD slider to 0.75 and move the alpha slider back to 0.05 (confidence level of 95%). Find the 0.95 Tolerance probability, and you should see a sample size of about 102. So, if the standard deviation is twice as large, we need a sample size almost three times larger (102 vs. 33) to obtain a 0.95 tolerance limit. This makes sense: if the sizes of the covers have more variation, we need to measure more covers to estimate the mean to a given level of accuracy, and more covers are going to be either too big or too small.
The green line on the chart is the Power, or Operating Characteristic, curve. This is the classical way of designing an experiment, where we are testing an "alternative" hypothesis against the proverbial "null" hypothesis. Here, we are designing an experiment to ensure that, at a given alpha level, we control the probability of making a "Type II" error, which is deciding that we have not found an important effect when there really is one. In this case, the "Delta" defines how big an effect we would not want to miss, and the Power curve tells us how big the sample size must be to control the probability of this sort of error (the "Beta" that goes with "Alpha" in classical experimental design).
For example, using the settings from the last exercise, put the cursor on the green Power line at a Probability of 0.10, that is, a 10% chance of a Type II error, or 90% power. You need a sample size of 56 to achieve this. Using this application is certainly a lot more fun than looking up numbers in a big heavy book.
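The textbook z-approximation for this kind of calculation, for a two-sided one-sample test, is n ≈ ((z for alpha + z for beta) · sigma / delta)². This is generic formula machinery, not the application's own computation (which may use the t distribution and its own definition of delta), so it will not reproduce the chart's numbers exactly:

```python
import math

def power_sample_size(delta, sigma, z_alpha=1.9600, z_beta=1.2816):
    """z-approximation to the sample size for a two-sided one-sample test:
    detect a true shift of `delta` using the given normal quantiles
    (defaults: z_alpha for alpha = 0.05 two-sided, z_beta for beta = 0.10)."""
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Detect a half-inch shift with SD = 0.75 inches, alpha = 0.05, beta = 0.10.
print(power_sample_size(0.5, 0.75))
```

Whatever the exact numbers, the qualitative behavior matches the chart: smaller effects to detect, larger SDs, or smaller allowed error probabilities all push the required sample size up.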