So what is different about inferential statistics? Instead of simply attempting to summarize and organize a distribution of data, inferential statistics are used to extend results generated from a subgroup of observations to a larger context - in other words, the purpose of inferential statistics is to generalize from samples to populations. This section will describe the fundamental concepts of the process of using inferential statistics.

A research hypothesis is the focal point of statistical inference. A research hypothesis makes a prediction, based on a theoretical foundation, that identifies an anticipated outcome - for example, higher reading scores due to a new instructional technique. The research hypothesis is generated from a broader research question. An example of a research question might ask whether there is a difference between a new way of teaching and a traditional method. Research questions, in turn, are narrower investigations of a more general research purpose, which seeks to address an important problem in the field. So, you can think of a research hypothesis as the endpoint of a series of increasingly narrow, focused statements about a research topic.

A research hypothesis aids the research study by providing a very specific and precise prediction to test. The test of the hypothesis involves sampling and then collecting data. The nature of the sample affects the ability of the researcher to generalize the results. A full introduction to sampling techniques is beyond the scope of this introduction, but the aspect of sampling that matters most for statistical purposes is representation. A quantitative researcher intends to apply her results to a larger group than the one she studied. The larger group is called the population. In order to generalize, the group involved in the study, called the sample, must resemble the population on the variables that are important to the research.

The simplest, but usually costliest, way to sample is randomly - the equivalent of pulling names out of a hat. Because this process is usually costly in terms of time, money, and logistics, a compromise is often made for practical reasons. The compromise, and the act of sampling itself, introduce error into the process; in other words, the smaller sample may differ from the population in some important way. A researcher cannot know whether this has actually happened, though. Instead, the researcher uses statistics to describe the sample's characteristics and then infers from those sample characteristics what the properties of the larger population are. For example, testing a new reading method on a properly selected subset of all third-grade students allows a researcher to make statements about the value of the reading program for all third-grade students.

Numbers pertaining to the sample are called statistics, and numbers pertaining to the population are called parameters. Statisticians use different symbols to distinguish the two sets of numbers. Latin symbols (e.g., X̄, s, r) are used to represent sample statistics, and Greek symbols (e.g., μ, σ, ρ) [named mu, sigma, and rho] are used to represent population parameters.
Here are several relationships to remember:

- Symbols for the mean: X̄ (sample mean) is used to estimate μ (population mean)
- Symbols for the standard deviation: s (sample standard deviation) is used to estimate σ (population standard deviation)
- Symbols for the Pearson correlation coefficient: r (sample correlation) is used to estimate ρ (population correlation)

Here are some examples of null hypotheses:

Comparison of treatment (t) and control (c) group means

H₀: μ_t = μ_c

Comparison of pre-test (pre) and post-test (post) means

H₀: μ_pre = μ_post

Test of linear relationship between age (a) and experience (e)

H₀: ρ_ae = 0

Notice that null hypotheses assume equality among groups, between variables, or over time. Researchers then collect data to see if there is evidence to the contrary. In this way, the null hypothesis provides a benchmark for assessing whether observed differences are due to chance or some other factor/variable (i.e., systematic differences).

Here are a few examples of research hypotheses, also called alternative hypotheses. These are directly testable because they refer to the sample statistics, represented by Latin symbols, rather than the population parameters. Notice that these research hypotheses contradict the equality represented by the null hypotheses. Research hypotheses can be directional (i.e., predicting that one statistic is greater than or less than the other) or non-directional (i.e., predicting that the two statistics are different but not specifying how they differ).

Directional hypothesis that treatment (t) group mean is greater than control (c) group mean

H₁: X̄_t > X̄_c

Non-directional hypothesis that pre-test (pre) and post-test (post) means are different (i.e., not equal)

H₁: X̄_pre ≠ X̄_post

Directional hypothesis that a direct (i.e., positive) linear relationship between age (a) and experience (e) exists

H₁: r_ae > 0
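The choice between a directional and a non-directional hypothesis has a practical consequence: a directional hypothesis corresponds to a one-tailed test, while a non-directional hypothesis corresponds to a two-tailed test, and the critical value differs even though α stays the same. A minimal sketch using Python's standard library (the standard normal (z) distribution is used here purely for illustration):

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

# Non-directional (two-tailed): alpha is split between both tails
two_tailed = z.inv_cdf(1 - alpha / 2)   # ≈ 1.96

# Directional (one-tailed): all of alpha sits in one tail
one_tailed = z.inv_cdf(1 - alpha)       # ≈ 1.64

print(round(two_tailed, 2), round(one_tailed, 2))  # 1.96 1.64
```

Because the one-tailed critical value is smaller, a directional hypothesis makes it easier to reject the null hypothesis - but only for differences in the predicted direction.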

Well-written hypotheses should have the following characteristics. They should:

- Be declarative statements making specific predictions.
- Identify a specific expected relationship.
- Have a firm theory or literature base.
- Be concise and to the point.
- Be testable - allowing for the collection of data measuring variables in a systematic, unambiguous way.

A statistically significant result is one that is likely to be due to a systematic (i.e., identifiable) difference or relationship, not one that is likely to occur due to chance. No matter how carefully designed the research project is, there is always the possibility that the result is due to something other than the hypothesized factor. The need to control all possible alternative explanations of the observed phenomenon cannot be emphasized enough. Alternative explanations can stem from an unrepresentative sample, some other type of validity threat, or an unknown, confounding factor. The ideal situation is one in which all other possible explanations are ruled out so that the only viable explanation is the research hypothesis.

The level that demarcates statistical significance (called alpha and designated with the Greek letter α) is completely under the control of the researcher. Norms for different fields exist; for example, α=.05 is generally used in educational research. But what does α=.05 actually mean? The level of statistical significance is the level of risk that the researcher is willing to accept that the decision to reject the null hypothesis is wrong - that is, the risk of attributing a difference to the hypothesized factor when no difference actually exists. In other words, the level of statistical significance is the risk associated with rejecting a true null hypothesis. Selecting α=.05 indicates that the researcher is willing to risk being wrong in the decision to reject the null hypothesis 5 times out of 100, or 1 time out of 20. Referring back to the normal curve, α=.05 divides the area under the curve into two sections - one section where the null hypothesis is retained and another section where the null hypothesis is rejected. Rejecting a true null hypothesis is called committing a Type I error.
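The meaning of α=.05 can be checked by simulation: when the null hypothesis really is true, about 5% of samples should still produce a "significant" result just by chance. A minimal sketch (the population mean and standard deviation here are arbitrary, and σ is treated as known to keep the test simple):

```python
import random
from statistics import NormalDist

random.seed(1)
alpha = 0.05
crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical z ≈ 1.96

n, mu, sigma = 30, 100, 15   # the null hypothesis is true: we sample from N(mu, sigma)
trials, rejections = 10_000, 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    z = (x_bar - mu) / (sigma / n ** 0.5)  # z test with sigma treated as known
    if abs(z) > crit:
        rejections += 1  # a Type I error: rejecting a true null hypothesis

print(rejections / trials)  # close to alpha
```

Across many repetitions the rejection rate hovers around .05 - exactly the long-run risk the researcher agreed to accept when setting α.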

Another type of error that can be made is retaining a false null hypothesis. This is called Type II error. There is also a probability level associated with this type of error, called beta and designated with the Greek letter, β. Associated with β is a probability known as the power of the test, which equals 1 - β. Like the chance of committing a Type I error, the chance of committing a Type II error is also under the control of the researcher. Unlike the Type I error level, which is set directly by the researcher, the Type II error level is determined by a combination of parameters, including the α level, sample size, and anticipated size of the results. Visit this site (http://wise.cgu.edu/power/power_applet.html) or this site (http://www.intuitor.com/statistics/T1T2Errors.html) to explore how these elements are related.
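The relationship among α, sample size, effect size, and power can also be explored with a quick Monte Carlo sketch; this example estimates power when the true mean really does differ from the value stated in the null hypothesis (all numbers are illustrative, and σ is fixed at 1 for simplicity):

```python
import random
from statistics import NormalDist

random.seed(2)

def estimated_power(n, effect, alpha=0.05, trials=5_000):
    """Monte Carlo power estimate for a two-tailed z test with a known sigma of 1."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(effect, 1) for _ in range(n)]  # null says the mean is 0
        z = (sum(sample) / n) / (1 / n ** 0.5)
        if abs(z) > crit:
            hits += 1  # correctly rejecting a false null hypothesis
    return hits / trials

# Power (1 - beta) rises with sample size for the same effect size
p_small = estimated_power(n=25, effect=0.5)
p_large = estimated_power(n=100, effect=0.5)
print(p_small, p_large)
```

With 25 cases per sample, power is only moderate (roughly .70), so β is substantial; with 100 cases, power approaches 1, which is why increasing the sample size is the most common way to reduce the chance of a Type II error.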

Here is a table to help you understand Type I and Type II errors. See page 149 for another version of the same table. The decision you will make as a researcher is whether to reject or retain the null hypothesis based on the evidence that you've collected from the sample. This decision is similar, in theory, to the decision a juror makes about the guilt or innocence of a person on trial based on the evidence presented in the case.

Decision (action) | Null is true (not guilty) | Null is false (guilty) |

Reject the null hypothesis (convict) | Type I error (convict the innocent); level of statistical significance, α | Correct decision (convict the guilty); power of the test, 1 - β |

Retain the null hypothesis (acquit) | Correct decision (acquit the innocent) | Type II error (acquit the guilty); chance of Type II error, β |

Remember that the null hypothesis represents the true state of nature (e.g., the characteristics of the population), which cannot be discovered directly. The decision, or action, is the choice made by the researcher (or the juror) based on the collected evidence. If an error was made, the decision tells you which type it was: rejecting the null hypothesis can only produce a Type I error, and retaining it can only produce a Type II error. The catch is that you can never know without a doubt whether an error, or a correct decision, was made.

Which error is more serious? Does the seriousness of the error depend on the consequences of the decision/action taken? How does this relate to conducting research in an educational setting?

The general steps of hypothesis testing are as follows:

- State the null and research hypotheses.
- Establish the level of statistical significance (alpha level, level of risk for committing a Type I error).
- Select the appropriate test statistic (see the flowchart inside the back cover or visit here for a similar, computer-based form of the flowchart: https://usffiles.usfca.edu/FacStaff/baab/www/lessons/DecisionTree.html).
- Check the test's assumptions and then compute the test statistic based on the sample data (obtained value).
- Determine the critical value for the test statistic.
- Compare the obtained value with the critical value.
- Either reject or retain the null hypothesis based on the following.
- If the obtained value is greater than the critical value (comparing absolute values for a two-tailed test), reject the null hypothesis - the evidence supports the research hypothesis.
- If the obtained value is less than or equal to the critical value, retain the null hypothesis - the evidence does not support the research hypothesis.
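The steps above can be sketched end-to-end. This example runs an independent-samples t test by hand; the two groups of scores are made up, and the critical value comes from a t table (α=.05, two-tailed, df = n₁ + n₂ - 2 = 18):

```python
# Two hypothetical groups of reading scores (illustrative data only)
treatment = [78, 85, 90, 72, 88, 81, 79, 92, 84, 77]
control   = [70, 75, 80, 68, 74, 72, 69, 78, 71, 73]

n1, n2 = len(treatment), len(control)
m1 = sum(treatment) / n1
m2 = sum(control) / n2

# Pooled variance (assumes roughly equal group variances)
ss1 = sum((x - m1) ** 2 for x in treatment)
ss2 = sum((x - m2) ** 2 for x in control)
sp2 = (ss1 + ss2) / (n1 + n2 - 2)

# Obtained value: independent-samples t statistic
t_obtained = (m1 - m2) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5

# Critical value from a t table: df = 18, alpha = .05, two-tailed
t_critical = 2.101

if abs(t_obtained) > t_critical:
    decision = "reject the null hypothesis"
else:
    decision = "retain the null hypothesis"

print(round(t_obtained, 2), decision)  # 4.1 reject the null hypothesis
```

Here the obtained value exceeds the critical value, so the null hypothesis of equal means is rejected and the evidence supports the research hypothesis that the treatment group scored higher.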

It often helps to see a process unfold through an animation. To view what happens during the sampling process, visit this site (http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html), read the instructions, and experiment with drawing different sized samples from different types of population distributions.
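You can reproduce the gist of that animation in a few lines of code. This sketch draws repeated samples from a skewed population and shows that the sample means cluster tightly around the population mean (the population and sample sizes are arbitrary choices for illustration):

```python
import random
from statistics import mean, stdev

random.seed(3)

# A skewed "population": exponential with mean 10
population_mean = 10

def sample_mean(n):
    return mean(random.expovariate(1 / population_mean) for _ in range(n))

# Sampling distribution of the mean for samples of size 25
means = [sample_mean(25) for _ in range(2_000)]

print(round(mean(means), 1))   # close to the population mean of 10
print(round(stdev(means), 2))  # close to sigma / sqrt(n) = 10 / 5 = 2
```

Even though the population itself is strongly skewed, the distribution of sample means is centered on the parameter and is much less variable - the basic fact that makes inference from a sample to a population possible.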

Effect Size

Determining whether to reject or retain a null hypothesis is only part of the goal. If your hypothesis test suggested that you should reject the null hypothesis, then you found a statistically significant result. But is that result really important? This question is answered by calculating another statistic, called an effect size. See Module 32 for examples of different types of effect size calculations. Based on Jacob Cohen's work, the following strengths of effect sizes have been determined for educational research:

small effect | .00 to .20 |
medium effect | .20 to .50 |
large effect | .50 and higher |

One of the types of effect sizes is Cohen's d, which for the difference between two group means is calculated using the following formula:

d = (Mean₁ - Mean₂) / s, where s is the pooled standard deviation of the two groups.

It doesn't really matter which mean is Mean 1 and which is Mean 2; the sign of d only indicates the direction of the difference, and the strength of the effect is judged by its absolute value.
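Cohen's d is straightforward to compute by hand. This sketch uses the form in which the two standard deviations are pooled directly (which assumes equal group sizes); the means and standard deviations are made-up values:

```python
def cohens_d(mean1, mean2, sd1, sd2):
    """Cohen's d with the two standard deviations pooled (equal group sizes assumed)."""
    pooled_sd = ((sd1 ** 2 + sd2 ** 2) / 2) ** 0.5
    return (mean1 - mean2) / pooled_sd

# Illustrative values: treatment mean 85 (sd 10), control mean 80 (sd 10)
d = cohens_d(85, 80, 10, 10)
print(round(d, 2))  # 0.5, the boundary between a medium and large effect in the table above
```

Because d is expressed in standard-deviation units, it can be compared across studies that used different measurement scales.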

To calculate the effect size, you can also use the effect size calculator at http://web.uccs.edu/lbecker/Psy590/escalc3.htm. Entering two means and standard deviations into the online calculator gives a result such as d = .53, which would generally be considered a large effect.