Descriptive Statistics: Definition, Overview, Types, and Examples

What Are Descriptive Statistics?

Descriptive statistics are brief summary measures that describe a given dataset, which can represent either an entire population or a sample of one. They are commonly broken down into measures of central tendency and measures of variability (spread). Measures of central tendency include the mean, median, and mode, while measures of variability include the standard deviation, variance, and minimum and maximum values. Kurtosis and skewness, which describe the shape of a distribution, are often reported alongside them.

Key Takeaways

Descriptive statistics summarize or describe the characteristics of a dataset.
Descriptive statistics consist of three basic categories of measures: measures of central tendency, measures of variability (or spread), and frequency distribution.
Measures of central tendency describe the center of the dataset (mean, median, mode).
Measures of variability describe the dispersion of the dataset (variance, standard deviation).
Measures of frequency distribution describe the occurrence of data within the dataset (count).

Jessica Olah / Investopedia

Understanding Descriptive Statistics

Descriptive statistics help describe and explain the features of a specific dataset by giving short summaries about the sample and measures of the data. The most recognized descriptive statistics are measures of center: the mean, median, and mode, which are used at almost all levels of math and statistics to define and describe a dataset.

The mean, or the average, is calculated by adding all the figures within the dataset and then dividing by the number of figures within the set. The mode of a dataset is the value appearing most often. The median is the value situated in the middle of the dataset once the figures are sorted; it separates the higher figures from the lower figures. There are also less common types of descriptive statistics that are still very important.

For example, the sum of the following dataset is 20: (2, 2, 3, 5, 8). The mean is 4 (20/5). The mode is 2. The median is 3.
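The arithmetic above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# The example dataset from the text.
data = [2, 2, 3, 5, 8]

total = sum(data)                 # add all the figures
mean = statistics.mean(data)      # total divided by the number of figures
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(total, mean, median, mode)
```

The printed values should match the figures quoted in the example: a sum of 20, a mean of 4, a median of 3, and a mode of 2.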

People use descriptive statistics to condense hard-to-understand quantitative information from a large dataset into bite-sized descriptions.

Important

Descriptive statistics, especially in fields such as medicine, often visually depict data using scatter plots, histograms, line graphs, or stem and leaf displays.

Types of Descriptive Statistics

Descriptive statistics are generally grouped into measures of central tendency, measures of variability (also known as measures of dispersion), and frequency distribution.

Central Tendency

Measures of central tendency describe the center of a dataset. They measure what is typical and can be represented by the mean, mode, or median, depending on the skew of the values. Central tendency helps identify when a new value is far from normal and makes it possible to quickly summarize a large number of values from a specific time period or group.

Measures of Variability

Measures of variability (or measures of spread) aid in analyzing how dispersed the distribution is for a dataset. For example, while the measures of central tendency may give a person the average of a dataset, they do not describe how the data is distributed within the set.

So, while the average of the data might be 65 out of 100, there can still be data points at both 1 and 100. Measures of variability help communicate this by describing the shape and spread of the dataset. Range, quartiles, absolute deviation, and variance are all examples of measures of variability.

Consider the following dataset: 5, 19, 24, 62, 91, 100. The range of that dataset is 95, which is calculated by subtracting the lowest number (5) in the dataset from the highest (100).
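As a minimal sketch of how these measures might be computed for the dataset above, the following uses Python's standard `statistics` module. Showing both population and sample variants is an assumption made for illustration; the text does not say whether the six values are a population or a sample.

```python
import statistics

# The example dataset from the text.
data = [5, 19, 24, 62, 91, 100]

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Population variance and standard deviation (divide by n).
pop_var = statistics.pvariance(data)
pop_std = statistics.pstdev(data)

# Sample variance and standard deviation (divide by n - 1).
sample_var = statistics.variance(data)
sample_std = statistics.stdev(data)

print("range:", data_range)
print("population variance / std dev:", pop_var, pop_std)
print("sample variance / std dev:", sample_var, sample_std)
```

The range matches the 95 computed in the text. The sample variance is always somewhat larger than the population variance because it divides by one fewer observation.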

Distribution

Distribution (or frequency distribution) refers to the number of times a data point occurs. Alternatively, it can be how many times a data point fails to occur. Consider this dataset: male, male, female, female, female, other. The distribution of this data can be classified as:

The number of males in the dataset is 2.
The number of females in the dataset is 3.
The number of individuals identifying as other is 1.
The number of non-males is 4.
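The tallies listed above can be reproduced with a small sketch. It assumes the categories are stored as plain strings and uses `collections.Counter` from the Python standard library.

```python
from collections import Counter

# The example dataset from the text.
responses = ["male", "male", "female", "female", "female", "other"]

# How many times each category occurs.
counts = Counter(responses)

# How many times a category fails to occur: everything that is not "male".
non_male = len(responses) - counts["male"]

for category, count in counts.items():
    print(category, count)
print("non-male", non_male)
```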

Univariate vs. Bivariate

In descriptive statistics, univariate analysis examines only one variable. It is used to identify characteristics of a single trait and is not used to analyze relationships or causation.

For example, imagine a room full of high school students. Say you wanted to gather the average age of the individuals in the room. This univariate data is only dependent on one factor: each person’s age. By gathering this one piece of information from each person, adding the ages together, and dividing by the total number of people, you can determine the average age.

Bivariate analysis, on the other hand, attempts to link two variables by searching for correlation. Two types of data are collected, and the relationship between the two pieces of information is analyzed together. Because more than one variable is analyzed, this approach is a simple case of what is more broadly called multivariate analysis.

Let’s say each high school student in the example above takes a college assessment test, and we want to see whether older students are testing better than younger students. In addition to gathering the ages of the students, we need to find out each student’s test score. Then, using data analytics, we mathematically or graphically depict whether there is a relationship between student age and test scores.
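One common way to depict such a relationship mathematically is the Pearson correlation coefficient. The sketch below is only an illustration: the ages and test scores are hypothetical values invented for the example, and `pearson_r` is a helper written here, not something named in the article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: one (age, score) pair per student.
ages = [14, 15, 15, 16, 17, 18]
scores = [1010, 1050, 1020, 1100, 1150, 1190]

r = pearson_r(ages, scores)
print(f"correlation between age and score: {r:.2f}")
```

A value near 1 would suggest that older students in this made-up sample tend to score higher, a value near -1 the opposite, and a value near 0 little linear relationship. Correlation alone does not establish causation.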

Fast Fact

Preparing and reporting financial statements is an example of descriptive statistics. Analyzing that financial information to make decisions about the future is inferential statistics.

Descriptive Statistics and Visualizations

One essential aspect of descriptive statistics is graphical representation. Visualizing data distributions effectively can be incredibly powerful, and this is done in several ways.

A histogram is like a bar chart for numbers. It groups data into ranges (called bins) and uses bars to indicate the number of values that fall into each range. The bar representation makes it easy to see the overall shape of the data and quickly ascertain if values are clustered in the middle, spread out, or skewed to one side.
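The binning step behind a histogram can be sketched without any plotting library. The example below reuses the dataset 5, 19, 24, 62, 91, 100; the bin width of 25 and the helper name `bin_counts` are assumptions chosen for illustration.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        # Values at or above the top edge are counted in the last bin.
        index = min(index, n_bins - 1)
        if index >= 0:  # values below `start` are ignored
            counts[index] += 1
    return counts

data = [5, 19, 24, 62, 91, 100]
start, width, n_bins = 0, 25, 4
counts = bin_counts(data, start, width, n_bins)

# Print a simple text histogram, one bar per bin.
for i, count in enumerate(counts):
    low = start + i * width
    print(f"{low:>3}-{low + width:<3} | {'#' * count}")
```

With these bins, three values land in the lowest range, none in the second, one in the third, and two in the highest, which shows at a glance that the data is not clustered in the middle.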

A boxplot (also known as a box-and-whisker plot) provides a quick snapshot of how data is spread out. It displays the median (the middle value), the quartiles (which divide the data into four parts), and any outliers that are far from the rest. Boxplots are especially useful when you want to compare several groups side by side to see how their distributions differ.
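The numbers a boxplot draws (minimum, first quartile, median, third quartile, maximum) can be computed directly. This is a minimal sketch using `statistics.quantiles` from the Python standard library with its default method; quartile conventions differ between tools, so other software may report slightly different quartiles for the same data.

```python
import statistics

# Example dataset reused from earlier in the article.
data = [5, 19, 24, 62, 91, 100]

# quantiles(..., n=4) returns the three cut points that split the data
# into four parts: first quartile, median, third quartile.
q1, q2, q3 = statistics.quantiles(data, n=4)

summary = {
    "min": min(data),
    "q1": q1,
    "median": q2,
    "q3": q3,
    "max": max(data),
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```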

Descriptive Statistics and Outliers

Whenever descriptive statistics are being discussed, it’s important to note outliers. Outliers are data points that significantly differ from other observations in a dataset. These could be errors, anomalies, or rare events within the data.

Detecting and managing outliers is a step in descriptive statistics to ensure accurate and reliable data analysis. To identify outliers, you can use graphical techniques (such as boxplots or scatter plots) or statistical methods (such as Z-score or the IQR method). These approaches help pinpoint observations that deviate substantially from the overall pattern of the data.
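A minimal sketch of the IQR method follows. It assumes the widely used convention of flagging values more than 1.5 times the interquartile range beyond the quartiles. The multiplier, the sample data, and the helper name `iqr_outliers` are illustrative assumptions, not taken from the article.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical measurements with one value far from the rest.
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]

print(iqr_outliers(data))
```

In this made-up sample only the value 95 falls outside the fences. Whether a flagged point is an error or a genuine rare event still has to be judged from context.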

The presence of outliers can have a notable impact on descriptive statistics, skewing results and affecting the interpretation of data. Outliers can disproportionately influence measures of central tendency, such as the mean, pulling it toward their extreme values. For example, the mean of the dataset (1, 1, 1, 997) is 250, even though that figure is hardly representative of the dataset. This distortion can lead to misleading conclusions about the typical behavior of the dataset.
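The effect is easy to see by comparing the mean and the median of the (1, 1, 1, 997) dataset. The sketch below uses Python's standard `statistics` module; the comparison with the median is added here for illustration.

```python
import statistics

data = [1, 1, 1, 997]

mean_with_outlier = statistics.mean(data)      # pulled toward 997
median_with_outlier = statistics.median(data)  # stays at the typical value

# The same statistic with the extreme value set aside.
mean_without_outlier = statistics.mean([v for v in data if v != 997])

print(mean_with_outlier, median_with_outlier, mean_without_outlier)
```

The mean is 250 with the outlier and 1 without it, while the median is 1 either way, which is why the median is often preferred for data with extreme values.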

Depending on the context, outliers can often be treated by removing them (if they are genuinely erroneous or irrelevant). Alternatively, outliers may hold important information and should be kept for the value they may be able to demonstrate. As you analyze your data, consider the relevance of what outliers can contribute and whether it makes more sense to just strike those data points from your descriptive statistic calculations.

Descriptive Statistics vs. Inferential Statistics

Descriptive statistics have a different function from inferential statistics, which are methods that use data to make decisions or to apply the characteristics of one dataset to another.

Imagine another example where a company sells hot sauce. The company gathers data such as the count of sales, average quantity purchased per transaction, and average sales per day of the week. All of this information is descriptive, as it tells a story of what actually happened in the past. In this case, it is not being used beyond being informational.

Now, let’s say that the company wants to roll out a new hot sauce. It gathers the same sales data above, but it uses the information to make predictions about what the sales of the new hot sauce will be. Using descriptive statistics to draw conclusions about a different dataset turns the exercise into inferential statistics. We are no longer simply summarizing data; we are using it to predict what will happen regarding an entirely different body of data (in this case, the new hot sauce product).

Explain Like I’m 5

A student’s grade-point average (GPA), for example, provides a good understanding of descriptive statistics. The idea of a GPA is that it takes data points from a range of individual course grades and averages them together to provide a general understanding of a student’s overall academic performance. A student’s personal GPA reflects their mean academic performance.

What Do Descriptive Statistics Do?

Descriptive statistics are a means of describing features of a dataset by generating summaries about data samples. For example, a population census may include descriptive statistics regarding the ratio of men and women in a specific city.

What Are Examples of Descriptive Statistics?

In recapping a Major League Baseball season, for example, descriptive statistics might include team batting averages, the number of runs allowed per team, and the average wins per division.

What Is the Main Purpose of Descriptive Statistics?

The main purpose of descriptive statistics is to provide information about a dataset. In the example above, there are dozens of baseball teams, hundreds of players, and thousands of games. Descriptive statistics summarize large amounts of data into useful bits of information.

What Are the Types of Descriptive Statistics?

The three main types of descriptive statistics are frequency distribution, central tendency, and variability of a dataset. Frequency distribution records how often data occurs, central tendency records the data’s center point of distribution, and variability of a dataset records its degree of dispersion.

Can Descriptive Statistics Be Used to Make Inferences or Predictions?

Technically speaking, descriptive statistics only serve to help understand historical data attributes. Inferential statistics, a separate branch of statistics, are used to understand how variables interact with one another in a dataset and possibly predict what might happen in the future.

The Bottom Line

Descriptive statistics refer to the analysis, summary, and communication of findings that describe a dataset. While not often used in decision-making, descriptive statistics do provide a clear, high-level snapshot of the data through measures such as the mean, median, mode, variance, range, and count.
