📊 Descriptive Statistics
Mean, median, std dev, box plot
What Is Descriptive Statistics?
Descriptive statistics is the foundation of any quantitative analysis. Before you run a single hypothesis test, before you build a regression model, you need to understand your data. Descriptive statistics gives you that understanding by summarizing raw numbers into meaningful measures of central tendency, spread, and shape.
In practice, descriptive statistics answers two of the most fundamental questions a researcher can ask: what does a typical observation look like, and how much variation exists? Whether you are analyzing patient blood pressure readings, student exam scores, or enzyme activity measurements, these summaries are your first line of insight.
Measures of Central Tendency
The three pillars of central tendency are the mean, median, and mode. The arithmetic mean is the sum of all values divided by the count — it is sensitive to outliers, which can be both a strength and a weakness. The median is the middle value when data is sorted, making it robust against extreme observations. The mode identifies the most frequently occurring value, which is particularly useful for categorical or discrete data.
A practical rule of thumb: when the mean and median are close together, your data is often roughly symmetric. When they diverge substantially, your distribution is likely skewed, and the median may better represent a "typical" value.
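Here is a minimal sketch of these three measures using Python's standard-library statistics module. The module is an assumption for illustration, since the calculator itself runs in JavaScript. The blood pressure readings are made up. Appending one extreme value shows the mean shifting while the median barely moves.

```python
from statistics import mean, median, mode

# Illustrative sketch; made-up systolic blood pressure readings in mmHg
readings = [118, 122, 120, 125, 119, 121, 120]

center = {"mean": mean(readings), "median": median(readings), "mode": mode(readings)}
print(center)  # mean ≈ 120.71, median = 120, mode = 120

# Add one extreme value: the mean jumps, the median barely moves
with_outlier = readings + [210]
print(mean(with_outlier), median(with_outlier))  # 131.875 120.5
```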
Measures of Dispersion
Central tendency alone tells an incomplete story. Two datasets can have identical means but vastly different spreads. The standard deviation (SD) measures the typical distance of data points from the mean. Formally, it is the square root of the average squared deviation from the mean, with N−1 in the denominator for a sample. A small SD means data clusters tightly; a large SD means it is spread out.
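To make the definition concrete, here is a sketch with a hypothetical helper, sample_sd. It takes the square root of the summed squared deviations divided by n − 1 and checks the result against Python's statistics.stdev. The two made-up datasets share a mean of 50 but differ tenfold in spread.

```python
import math
from statistics import mean, stdev

def sample_sd(xs):
    """Hypothetical helper: sqrt of summed squared deviations divided by n - 1."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

tight = [48, 49, 50, 51, 52]  # made-up data, same mean as below
wide = [30, 40, 50, 60, 70]

for name, xs in [("tight", tight), ("wide", wide)]:
    print(name, mean(xs), round(sample_sd(xs), 3), round(stdev(xs), 3))
# tight 50 1.581 1.581
# wide 50 15.811 15.811
```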
The variance is simply the square of the standard deviation. It is less intuitive because it is expressed in squared units (for example, mmHg²). Even so, variance has useful mathematical properties that make it central to ANOVA and regression analysis. The range (max minus min) gives the crudest measure of spread. The interquartile range (IQR = Q3 − Q1) provides a robust alternative that is insensitive to extreme values, because it depends only on the middle half of the data.
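The remaining spread measures can be sketched the same way. This sketch assumes Python's statistics module and its "inclusive" quartile method; other tools interpolate differently. The hypothetical spread helper shows that making the largest value far more extreme inflates the range but leaves the IQR unchanged.

```python
from statistics import variance, quantiles

def spread(xs):
    """Hypothetical helper returning sample variance, range and IQR."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    return {
        "variance": variance(xs),   # squared units; n - 1 divisor
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,             # depends only on the middle half of the data
    }

data = [2, 4, 4, 5, 7, 9, 10, 12, 40]  # made-up values
extreme = data[:-1] + [400]             # make the largest value far more extreme

for label, xs in [("data", data), ("extreme", extreme)]:
    s = spread(xs)
    print(label, s["range"], s["iqr"])
# data 38 6.0
# extreme 398 6.0
```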
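The coefficient of variation (CV = SD / mean × 100%) lets you compare variability across datasets with different units or magnitudes. It is meaningful only for ratio-scale data with a positive mean; it does not apply, for example, to temperatures in °C. In biomedical research, an assay with a CV under 10% is often considered precise, although acceptable limits vary by assay and field.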
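Here is a sketch of the CV using a hypothetical cv_percent helper. It refuses non-positive means, since the ratio is not meaningful there. The replicate assay readings are invented. Rescaling the same values from milligrams to grams leaves the CV unchanged, which is what makes it comparable across units.

```python
from statistics import mean, stdev

def cv_percent(xs):
    """Hypothetical helper: coefficient of variation (sample SD / mean x 100)."""
    m = mean(xs)
    if m <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return stdev(xs) / m * 100

replicates_mg = [98, 102, 100, 101, 99]           # invented assay replicates
replicates_g = [x / 1000 for x in replicates_mg]  # same values in grams

print(round(cv_percent(replicates_mg), 2))  # 1.58
print(round(cv_percent(replicates_g), 2))   # 1.58, unchanged by the unit switch
```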
Quartiles and Percentiles
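Quartiles divide your sorted data into four equal parts. Q1 (25th percentile) marks the boundary below which 25% of values fall. Q2 is the median (50th percentile). Q3 (75th percentile) is the upper boundary. Together, Q1 and Q3 define the interquartile range, which captures the central 50% of your data. Software packages use slightly different interpolation rules for quartiles, so Q1 and Q3 may differ a little between tools, especially for small samples.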
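Different interpolation rules can give different quartiles for the same data. This sketch uses Python's statistics.quantiles to compare two of its methods on eight made-up values. The methods disagree on Q1 and Q3, but both return the median as Q2. The sketch does not reproduce the calculator's own method.

```python
from statistics import median, quantiles

# Illustrative sketch with made-up data; not the calculator's own algorithm
data = [1, 2, 3, 4, 5, 6, 7, 8]

exclusive = quantiles(data, n=4)                      # default method="exclusive"
inclusive = quantiles(data, n=4, method="inclusive")

print(exclusive)     # [2.25, 4.5, 6.75]
print(inclusive)     # [2.75, 4.5, 6.25]
print(median(data))  # 4.5, the same as Q2 under both methods
```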
How to Read a Box Plot
A box plot (box-and-whisker plot) provides a visual summary of your data's distribution in a single graphic. The box spans from Q1 to Q3, with a line at the median. The whiskers typically extend to the most extreme data points lying within 1.5 × IQR of the box. Any points beyond the whiskers are plotted individually as potential outliers.
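The box-plot rule above can be sketched as follows. The box_plot_summary helper is hypothetical and uses the "inclusive" quartile method. It places each whisker at the most extreme observation inside the 1.5 × IQR fences and reports anything outside them as a potential outlier.

```python
from statistics import quantiles

def box_plot_summary(xs, k=1.5):
    """Hypothetical helper: box-plot summary plus Tukey-style outliers."""
    q1, q2, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(x for x in xs if x < low_fence or x > high_fence),
    }

data = [2, 4, 4, 5, 7, 9, 10, 12, 40]  # made-up values
print(box_plot_summary(data))
# q1 = 4.0, median = 7.0, q3 = 10.0, whiskers 2 and 12, outliers [40]
```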
Box plots are especially powerful when comparing distributions across groups. At a glance, you can assess whether groups differ in central tendency, spread, or skewness — all without assuming a specific distribution shape.
Skewness and Kurtosis
Skewness quantifies the asymmetry of your distribution. A positive skew means a long right tail (e.g., income data); a negative skew means a long left tail (e.g., age at retirement). A common rule of thumb treats skewness values between −0.5 and +0.5 as approximately symmetric.
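Kurtosis measures the "tailedness" of a distribution. High kurtosis (leptokurtic) indicates heavier tails than a normal distribution, meaning extreme values occur more often. Such distributions frequently, but not necessarily, also have a sharp peak. Low kurtosis (platykurtic) indicates light tails. The normal distribution has a kurtosis of 3 (excess kurtosis of 0). Many programs report excess kurtosis, so check which convention your output uses.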
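Here is a sketch of the moment-based formulas. Skewness is the third central moment divided by the second central moment raised to the 1.5 power. Excess kurtosis is the fourth central moment divided by the squared second moment, minus 3. The helper name is hypothetical, and these are the uncorrected sample versions. Many packages apply small-sample adjustments, so their values will differ slightly.

```python
from statistics import fmean

def moment_skew_kurt(xs):
    """Hypothetical helper: uncorrected moment skewness (g1) and excess kurtosis (g2)."""
    n = len(xs)
    m = fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in xs) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3      # 0 for a normal distribution
    return skew, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]            # made-up, evenly spaced
right_tailed = [1, 1, 1, 2, 2, 3, 10]  # made-up, one long right tail

print([round(v, 3) for v in moment_skew_kurt(symmetric)])  # [0.0, -1.3]
print([round(v, 3) for v in moment_skew_kurt(right_tailed)])
```

The evenly spaced data has zero skew and negative excess kurtosis (light tails), while the single large value pushes the second dataset's skewness positive.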
How to Interpret Results from This Calculator
- Compare the mean and median. If they are similar, your data is likely symmetric. If the mean is much larger than the median, you likely have positive skew; if it is much smaller, negative skew.
- Examine the standard deviation relative to the mean. A CV above 30% often signals high variability that warrants investigation.
- Look at the box plot. Are there outliers? Is the box symmetric, or is one whisker much longer?
- Consider sample size. With small samples (n < 30), every statistic is estimated less precisely, so report confidence intervals when possible.
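The checklist above can be sketched as a small report function. The function is hypothetical. It expresses the mean-median gap in SD units rather than applying a cutoff, and it uses the 30% CV and n < 30 thresholds from the text. The length-of-stay values are invented.

```python
from statistics import mean, median, stdev

def quick_checks(xs):
    """Hypothetical helper applying the rules of thumb from the checklist."""
    n = len(xs)
    m, med, sd = mean(xs), median(xs), stdev(xs)
    cv = sd / m * 100 if m > 0 else None  # CV only makes sense for a positive mean
    return {
        "n": n,
        # Positive gap hints at right (positive) skew, negative at left skew
        "mean_minus_median_in_sd": (m - med) / sd if sd > 0 else 0.0,
        "cv_percent": cv,
        "high_variability": cv is not None and cv > 30,  # 30% threshold from the text
        "small_sample": n < 30,                          # n < 30 threshold from the text
    }

stay_days = [2, 3, 3, 4, 4, 5, 6, 21]  # invented hospital length-of-stay data
for key, value in quick_checks(stay_days).items():
    print(key, value)
```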
Frequently Asked Questions
When should I use the mean vs. the median?
Use the mean when your data is approximately symmetric and free of extreme outliers. Use the median when your data is skewed or contains outliers (e.g., salary data, hospital length of stay). In biomedical research, it is good practice to report both so that readers can judge the shape of the distribution.
What is the difference between population and sample standard deviation?
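Population SD divides by N (the total count), while sample SD divides by N−1 (Bessel's correction, which makes the sample variance an unbiased estimate of the population variance). Because researchers almost always work with samples rather than entire populations, the sample SD (with N−1) is the default in this calculator and in most statistical software. One notable exception is NumPy, whose std function divides by N unless you pass ddof=1.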
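This sketch computes both divisors by hand on made-up values and compares them with Python's statistics.pstdev (population, N) and statistics.stdev (sample, N−1). The two results differ by a factor of √(N/(N−1)), which shrinks toward 1 as N grows.

```python
import math
from statistics import pstdev, stdev

xs = [4, 8, 6, 5, 3, 7]  # made-up sample
n = len(xs)
m = sum(xs) / n
ss = sum((x - m) ** 2 for x in xs)  # sum of squared deviations

pop_sd = math.sqrt(ss / n)          # divide by N
samp_sd = math.sqrt(ss / (n - 1))   # divide by N - 1 (Bessel's correction)

print(round(pop_sd, 4), round(pstdev(xs), 4))   # 1.7078 1.7078
print(round(samp_sd, 4), round(stdev(xs), 4))   # 1.8708 1.8708
```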
How many data points do I need for reliable descriptive statistics?
There is no strict minimum, but with fewer than 5 observations, percentiles and box plots become unreliable. For meaningful skewness and kurtosis estimates, aim for at least 20–30 observations. For publication-quality descriptive tables, some journals expect at least n=10 per group, so check your target journal's guidelines.
Can I paste data from Excel or Google Sheets?
Yes. Copy a column of numbers from your spreadsheet and paste directly into the text area above. The calculator accepts numbers separated by commas, spaces, tabs, or newlines.
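Here is a sketch of how this kind of tolerant parsing can work. It assumes a regular-expression split on commas and any whitespace. parse_pasted is a hypothetical name and is not the calculator's actual JavaScript code. One design consequence: because commas act as separators, a value written with a decimal comma (e.g. 3,5) would be split into two numbers.

```python
import re

def parse_pasted(text):
    """Hypothetical parser: split on commas, spaces, tabs or newlines, then convert to float."""
    tokens = re.split(r"[,\s]+", text.strip())
    return [float(t) for t in tokens if t]  # skip empty tokens, e.g. from empty input

pasted = "12.5, 13\t14.2\n15  16\r\n17"  # mix of separators, as from a spreadsheet
print(parse_pasted(pasted))  # [12.5, 13.0, 14.2, 15.0, 16.0, 17.0]
```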
Is my data stored or sent to a server?
No. All computations run entirely in your browser using JavaScript. Your data never leaves your device. There is no server-side processing, no cookies, and no tracking.
This tool is free forever. If it saved you time, consider buying me a coffee.