⚖️ T-Test

Independent / Paired / Welch t-test + effect size

What Is a T-Test?

The t-test is one of the most widely used statistical procedures in science. Developed by William Sealy Gosset in 1908 (published under the pseudonym "Student"), it determines whether the means of two groups are significantly different from each other. The core logic is straightforward: is the observed difference between group means large enough to be unlikely due to random chance alone?

T-tests are appropriate when you are comparing means from continuous, approximately normally distributed data. They form the basis for more complex analyses like ANOVA and are a staple of biomedical research, psychology, education, and quality control.

Three Types of T-Tests

1. Independent Two-Sample T-Test

Use this when comparing means from two separate groups of subjects. For example: treatment group vs. control group, male vs. female, drug A vs. drug B. The key assumption is that observations in one group are completely independent of those in the other.

The classic independent t-test assumes equal variances in both groups. When this assumption is violated (which is common in practice), use the Welch t-test instead, which adjusts the degrees of freedom to account for unequal variances.
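Both variants are one call away in SciPy; the only difference is the `equal_var` flag. A minimal sketch with made-up example data (the group values below are hypothetical, not from this page):

```python
from scipy import stats

# Hypothetical scores for two separate groups of subjects.
treatment = [23.1, 25.4, 22.8, 26.0, 24.5, 23.9, 25.1, 24.2]
control   = [21.0, 22.3, 20.8, 21.9, 22.5, 21.4, 20.6, 22.1]

# Classic independent t-test: assumes equal variances in both groups.
t_classic, p_classic = stats.ttest_ind(treatment, control, equal_var=True)

# Welch t-test: drops the equal-variance assumption (often the safer default).
t_welch, p_welch = stats.ttest_ind(treatment, control, equal_var=False)
```

With equal group sizes the two t-statistics coincide; only the degrees of freedom differ.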

2. Paired (Dependent) T-Test

Use this when the same subjects are measured twice — before and after treatment, left eye vs. right eye, or two matched conditions. The paired t-test works on the differences within each pair, which removes inter-subject variability and increases statistical power.

Example: You measure blood glucose levels in 20 patients before and after administering a drug. The paired t-test examines whether the mean change in glucose is significantly different from zero.
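A before/after design like this maps directly onto `scipy.stats.ttest_rel`, which tests whether the mean paired difference is zero. The glucose readings below are invented for illustration:

```python
from scipy import stats

# Hypothetical glucose readings (mg/dL) for the same patients, before and after.
before = [148, 152, 160, 145, 155, 150, 158, 149, 153, 147]
after  = [140, 146, 151, 141, 147, 144, 150, 142, 145, 140]

# Paired t-test on the within-patient differences.
t_stat, p_value = stats.ttest_rel(before, after)
```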

3. Welch's T-Test

Welch's t-test is a modification of the independent t-test that does not assume equal variances. It uses the Welch-Satterthwaite equation to estimate degrees of freedom. In modern practice, many statisticians recommend using Welch's test by default, since it performs well even when variances happen to be equal.
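The Welch-Satterthwaite degrees of freedom mentioned above can be computed directly from the sample standard deviations and sizes; a short sketch:

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom from sample SDs and sizes."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
```

When the variances and sample sizes are equal, this reduces to the classic n1 + n2 - 2; otherwise it is smaller, which makes the test appropriately more conservative.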

Understanding p-Values

The p-value answers a specific question: If the null hypothesis were true (i.e., no real difference exists), how likely would we be to observe a difference at least as extreme as what we found? A small p-value (conventionally p < 0.05) suggests the observed difference is unlikely under the null hypothesis.

However, p-values have important limitations. They do not tell you the probability that the null hypothesis is true. They do not tell you the size or importance of the effect. A very large sample can produce a tiny p-value for a trivially small difference. This is why effect sizes are essential.
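The definition above translates to a tail probability of the t-distribution. A two-tailed p-value is just twice the upper-tail area beyond |t|:

```python
from scipy import stats

def two_tailed_p(t_stat, df):
    """Two-tailed p-value: probability of observing |T| >= |t| under the null."""
    return 2 * stats.t.sf(abs(t_stat), df)
```

For example, at df = 10 the critical value for p = .05 is about 2.228, so `two_tailed_p(2.228, 10)` comes out very close to 0.05.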

Effect Size: Cohen's d

Cohen's d quantifies the magnitude of the difference in standard deviation units. It answers: "How big is the difference, practically speaking?"

| Cohen's d | Interpretation | Example |
|---|---|---|
| 0.2 | Small | Barely noticeable in practice |
| 0.5 | Medium | Visible to careful observers |
| 0.8 | Large | Obvious to anyone |
| > 1.2 | Very large | Dramatic, often clinically meaningful |

Always report both the p-value and the effect size. A statistically significant result with a tiny effect size may not be practically meaningful. Conversely, a non-significant result with a large effect size may indicate insufficient sample size rather than absence of an effect.
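For two independent groups, Cohen's d is the mean difference divided by the pooled standard deviation. A minimal stdlib-only sketch:

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # sample variances
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd
```

A shift of exactly one pooled SD gives d = 1, a "very large" effect by the table above.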

Assumptions of the T-Test

  1. Continuous data. The dependent variable must be measured on an interval or ratio scale.
  2. Independence. For the independent t-test, observations within and between groups must be independent.
  3. Normality. The data (or differences, for paired tests) should be approximately normally distributed. With n > 30 per group, the Central Limit Theorem makes the t-test robust to moderate non-normality.
  4. Equal variances (for classic independent t-test only). Use Levene's test to check, or simply default to Welch's test.
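The normality and equal-variance assumptions can both be checked in SciPy: Shapiro-Wilk tests normality within each group, and Levene's test compares variances. A sketch with hypothetical data:

```python
from scipy import stats

group_a = [23.1, 25.4, 22.8, 26.0, 24.5, 23.9, 25.1, 24.2]  # hypothetical data
group_b = [21.0, 22.3, 20.8, 21.9, 22.5, 21.4, 20.6, 22.1]

# Shapiro-Wilk: the null hypothesis is that the data are normally distributed,
# so a small p-value flags a normality problem.
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Levene's test: the null hypothesis is that the variances are equal.
_, p_levene = stats.levene(group_a, group_b)
```

Note that with small samples these tests have little power, which is another argument for simply defaulting to Welch's test.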

When the T-Test Is Not Appropriate

If your data is strongly skewed and sample sizes are small, consider the Mann-Whitney U test (for independent samples) or Wilcoxon signed-rank test (for paired samples) as non-parametric alternatives. If you are comparing more than two groups, use ANOVA instead.
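Both non-parametric alternatives are also available in SciPy; the skewed example data below is hypothetical:

```python
from scipy import stats

# Independent samples with a strong right skew -> Mann-Whitney U test.
group_a = [1.2, 3.4, 2.2, 8.9, 2.5]
group_b = [0.8, 1.1, 1.9, 1.4, 1.0]
u_stat, p_mw = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')

# Paired samples -> Wilcoxon signed-rank test on the within-pair differences.
before = [10, 12, 9, 14, 11, 13]
after  = [8, 11, 8, 12, 10, 11]
w_stat, p_w = stats.wilcoxon(before, after)
```

These tests compare distributions via ranks rather than means, so they are robust to skew and outliers at the cost of some power when the data really are normal.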

Step-by-Step: How to Use This Calculator

  1. Choose your test type: Independent, Paired, or Welch.
  2. Enter or paste your data for each group. Numbers may be separated by commas, spaces, tabs, or newlines.
  3. Click "Calculate" to get the t-statistic, degrees of freedom, p-value, and Cohen's d.
  4. Interpret: Check if p < 0.05, then look at Cohen's d for practical significance.
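Parsing input that mixes commas, spaces, tabs, and newlines, as in step 2, takes only one regular expression. A sketch of how such a parser might look (this is an illustration, not this calculator's actual code):

```python
import re

def parse_numbers(text):
    """Split raw input on commas, spaces, tabs, or newlines and convert to floats."""
    tokens = re.split(r'[,\s]+', text.strip())
    return [float(tok) for tok in tokens if tok]
```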

Frequently Asked Questions

Should I use a one-tailed or two-tailed test?

Use a two-tailed test unless you have a strong, pre-specified directional hypothesis before collecting data. In practice, most journals and reviewers expect two-tailed tests. One-tailed tests cut your p-value in half but are only justified when you can genuinely rule out an effect in the opposite direction.
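The "cut in half" relationship is easiest to see in code: the one-tailed p-value is a single tail area of the t-distribution, taken in the pre-specified direction. A sketch:

```python
from scipy import stats

def one_tailed_p(t_stat, df, expected_direction='greater'):
    """One-tailed p-value; only valid when the direction was fixed in advance."""
    if expected_direction == 'greater':
        return stats.t.sf(t_stat, df)   # P(T >= t)
    return stats.t.cdf(t_stat, df)      # P(T <= t)
```

If the effect lands in the predicted direction, this is half the two-tailed p-value; if it lands in the opposite direction, the one-tailed p-value exceeds 0.5 no matter how large the effect.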

My sample sizes are unequal. Is that a problem?

Unequal sample sizes are common and acceptable. However, they can make the classic t-test sensitive to variance differences. The Welch t-test handles unequal n's and unequal variances gracefully, which is why many statisticians recommend it as the default.

What if my data is not normally distributed?

For moderate deviations from normality with n > 30, the t-test is reasonably robust due to the Central Limit Theorem. For small samples with clear non-normality, use the Mann-Whitney U test (independent samples) or Wilcoxon signed-rank test (paired samples).

How do I report a t-test result in a paper?

Follow APA style: "An independent-samples t-test revealed a significant difference in blood pressure between the treatment (M = 120.3, SD = 8.2) and control groups (M = 132.1, SD = 9.7), t(38) = 4.12, p < .001, Cohen's d = 1.31." Always include means, SDs, t-statistic, df, p-value, and effect size.
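A small hypothetical helper can assemble the APA-style fragment from the computed quantities, including the convention of dropping the leading zero from p-values and writing "p < .001" for very small ones:

```python
def apa_report(m1, sd1, m2, sd2, t, df, p, d):
    """Format a t-test result roughly in APA style (hypothetical helper)."""
    # APA writes exact p-values without a leading zero, and "p < .001" below that.
    p_text = "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".")
    return (f"t({df}) = {t:.2f}, {p_text}, Cohen's d = {d:.2f} "
            f"(M1 = {m1:.1f}, SD1 = {sd1:.1f}; M2 = {m2:.1f}, SD2 = {sd2:.1f})")
```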
