R - Mean, Median and Mode

Measures of central tendency are statistical values that help describe the center of a data set. They are used to summarize a data set by identifying the single value that is most typical/representative of the collected data.

The three most common measures of central tendency are:

  1. Mean: The mean is the average of all the values in a data set. It is calculated by adding all the values and dividing by the number of values.
  2. Median: The median is the middle value in a data set when all the values are arranged in increasing or decreasing order. If the number of values is odd, there is a single middle value. If the number of values is even, the median is the average of the two middle values.
  3. Mode: The mode is the most frequent value in a data set.
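The three definitions above can be worked through by hand as a rough sketch. The vector `values` and the `manual_*` names below are made up for illustration, and the data has an odd number of values so that a single middle value exists.

```r
# Hypothetical sample values, chosen only for illustration
values <- c(7, 3, 9, 3, 8)

# Mean: add all the values and divide by how many there are
manual_mean <- sum(values) / length(values)

# Median: sort the values, then take the middle one (odd count here)
sorted_values <- sort(values)
manual_median <- sorted_values[(length(sorted_values) + 1) / 2]

# Mode: count each distinct value and keep the most frequent
counts <- table(values)
manual_mode <- as.numeric(names(counts)[counts == max(counts)])

print(manual_mean)    # [1] 6
print(manual_median)  # [1] 7
print(manual_mode)    # [1] 3
```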

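In R, the mean and median have built-in functions, but the statistical mode does not:

  1. mean(): Calculates the mean of a data set.
  2. median(): Calculates the median of a data set.
  3. mode(): Does not calculate the statistical mode. It returns the storage mode (internal type) of an object, such as "numeric". To find the most frequent value, you need a custom function, as shown in the Mode section.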
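A quick check shows what base R's mode() reports. It describes how an object is stored, not which value occurs most often. The vector `x` here is a made-up example.

```r
# Hypothetical vector in which 4 is clearly the most frequent value
x <- c(2, 4, 4, 6)

# mode() describes the type of the object; it does not return 4
storage <- mode(x)
print(storage)        # [1] "numeric"

# The same function applied to text and logical values
print(mode("a"))      # [1] "character"
print(mode(TRUE))     # [1] "logical"
```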

The choice of which measure of central tendency to use depends on the data set and the purpose of the analysis. The mean is the most commonly used measure, but it is sensitive to outliers: a single extreme value can pull it away from the bulk of the data. The median is much less sensitive to outliers, which makes it a better choice for skewed data, but it reflects only the middle position and ignores how far the other values lie from it. The mode is also largely unaffected by outliers, but it can be misleading when there are several modes, and it may not exist at all when no value repeats.
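To illustrate the effect of an outlier, the sketch below compares the mean and median before and after one extreme value is added. The numbers are invented for the example.

```r
# Hypothetical salaries (in thousands); 400 is an outlier added at the end
typical <- c(30, 32, 35, 38, 40)
with_outlier <- c(typical, 400)

print(mean(typical))         # [1] 35
print(median(typical))       # [1] 35

# One extreme value drags the mean far upward; the median barely moves
print(mean(with_outlier))    # [1] 95.83333
print(median(with_outlier))  # [1] 36.5
```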

Mean

  1. The mean, also known as the average, is calculated by summing all data points and dividing by the total number of data points.
  2. It represents the "center" of the data and is sensitive to extreme values (outliers).

    data <- c(12, 15, 18, 22, 30, 42, 50)

    # Calculate the mean
    mean_value <- mean(data)
    print(mean_value)
    # Output: [1] 27

Median

  1. The median is the middle value in a dataset when it's ordered from lowest to highest. If there's an even number of data points, the median is the average of the two middle values.
  2. It is less affected by extreme values than the mean and provides a better representation of the central value for skewed data.

    data <- c(12, 15, 18, 22, 30, 42, 50)

    # Calculate the median
    median_value <- median(data)
    print(median_value)
    # Output: [1] 22
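The even-count case can be sketched the same way. The vector `even_data` is a hypothetical six-value data set, and the manual calculation is shown only to mirror the definition; median() gives the same result directly.

```r
# Hypothetical data with an even number of values (six)
even_data <- c(12, 15, 18, 22, 30, 42)

sorted_data <- sort(even_data)
n <- length(sorted_data)

# Average the two middle values: positions n/2 and n/2 + 1
manual_median <- (sorted_data[n / 2] + sorted_data[n / 2 + 1]) / 2

print(manual_median)      # [1] 20
print(median(even_data))  # [1] 20
```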

Mode

The mode is the most frequently occurring value(s) in a dataset. A dataset can have zero modes (no repeated values), one mode (unimodal), or multiple modes (multimodal).

In R, there isn't a built-in function specifically for mode, but you can create a custom function to find it.

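    data <- c(12, 15, 18, 22, 30, 30, 42, 50, 50)

    # Custom mode function: returns every value tied for the highest frequency
    custom_mode <- function(x) {
      uniq_x <- unique(x)
      # Count occurrences in the same order as uniq_x, so the two vectors
      # line up even when x is not sorted (table(x) would sort by value)
      freq_x <- tabulate(match(x, uniq_x))
      mode_values <- uniq_x[freq_x == max(freq_x)]
      return(mode_values)
    }

    # Calculate the mode
    mode_value <- custom_mode(data)
    print(mode_value)
    # Output: [1] 30 50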

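In the example above, the dataset has two modes: 30 and 50. Note that if no value repeats, this function returns every value, because all of them are tied with a frequency of one.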
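If you would rather report "no mode" when nothing repeats, one possible variant is sketched below. The name `find_modes` is hypothetical (it is not part of base R), and the function assumes a non-empty input vector.

```r
# Hypothetical helper (not part of base R): returns every most-frequent
# value, or an empty vector when no value repeats. Assumes x is non-empty.
find_modes <- function(x) {
  uniq_x <- unique(x)
  # Counts are aligned with uniq_x (order of first appearance)
  counts <- tabulate(match(x, uniq_x))
  if (max(counts) == 1) {
    return(uniq_x[0])  # zero-length vector of the same type as x
  }
  uniq_x[counts == max(counts)]
}

print(find_modes(c(1, 2, 3, 4)))         # numeric(0)   (no mode)
print(find_modes(c(5, 3, 3, 9)))         # [1] 3        (unimodal)
print(find_modes(c(50, 30, 30, 50, 7)))  # [1] 50 30    (multimodal)
print(find_modes(c("b", "a", "b")))      # [1] "b"
```

Modes are returned in order of first appearance rather than sorted, and the last call shows that the same approach works for non-numeric data.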

Conclusion

These central tendency measures are essential for summarizing and understanding the characteristics of a dataset. Depending on the nature of your data and the specific questions you want to answer, you may choose to use mean, median, or mode (or a combination) to describe the central value of your data.