### Sample mean


The **sample mean** or **empirical mean** and the **sample covariance** are statistics computed from a collection of data on one or more random variables. The sample mean is a vector each of whose elements is the sample mean of one of the random variables – that is, each of whose elements is the arithmetic average of the observed values of one of the variables. The sample covariance matrix is a square matrix whose *i, j* element is the sample covariance (an estimate of the population covariance) between the sets of observed values of two of the variables and whose *i, i* element is the sample variance of the observed values of one of the variables. If only one variable has had values observed, then the sample mean is a single number (the arithmetic average of the observed values of that variable) and the sample covariance matrix is also simply a single value (the sample variance of the observed values of that variable).

## Sample mean

Let $x_{ij}$ be the *i*th independently drawn observation (*i* = 1, ..., *N*) on the *j*th random variable (*j* = 1, ..., *K*). These observations can be arranged into *N* column vectors, each with *K* entries, with the *K*×1 column vector giving the *i*th observations of all variables being denoted $\mathbf{x}_i$ (*i* = 1, ..., *N*).

The **sample mean vector** $\mathbf{\bar{x}}$ is a column vector whose *j*th element $\bar{x}_{j}$ is the average value of the *N* observations of the *j*th variable:

- $\bar{x}_{j}=\frac{1}{N}\sum_{i=1}^{N}x_{ij},\quad j=1,\ldots,K.$

Thus, the sample mean vector contains the average of the observations for each variable, and is written

- $\mathbf{\bar{x}}=\frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i.$
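The averaging above can be sketched in plain Python using only the standard library. The helper name `sample_mean` and the list-of-lists layout, in which `observations[i][j]` holds $x_{ij}$, are illustrative assumptions rather than part of any library.

```python
from typing import List


def sample_mean(observations: List[List[float]]) -> List[float]:
    """Return the K-element sample mean vector.

    `observations` holds the N observation vectors x_1, ..., x_N, each a list
    of K values (one per random variable), so observations[i][j] is x_ij.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("need at least one observation")
    k = len(observations[0])
    # For each variable j, average its N observed values.
    return [sum(x[j] for x in observations) / n for j in range(k)]


# Three observations (N = 3) on two variables (K = 2).
data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print(sample_mean(data))  # [3.0, 6.0]
```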

## Sample covariance

The **sample covariance matrix** is a *K*-by-*K* matrix $\mathbf{Q}=\left[q_{jk}\right]$ with entries

- $q_{jk}=\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{ij}-\bar{x}_j\right)\left(x_{ik}-\bar{x}_k\right),$

where $q_{jk}$ is an estimate of the covariance between the *j*th variable and the *k*th variable of the population underlying the data.
In terms of the observation vectors, the sample covariance is

- $\mathbf{Q} = \frac{1}{N-1}\sum_{i=1}^N (\mathbf{x}_i-\mathbf{\bar{x}})(\mathbf{x}_i-\mathbf{\bar{x}})^\mathrm{T}.$
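As a minimal sketch in plain Python, with a hypothetical helper `sample_covariance`, the matrix $\mathbf{Q}$ can be accumulated as a sum of outer products of the centered observation vectors and then scaled by $1/(N-1)$. Entry `[j][k]` of the result is the same number the element-wise formula for $q_{jk}$ gives.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(observations: Matrix) -> Matrix:
    """K-by-K sample covariance matrix Q with the 1/(N - 1) normalization.

    observations[i][j] is x_ij, the i-th observation of variable j.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    k = len(observations[0])
    mean = [sum(x[j] for x in observations) / n for j in range(k)]
    q = [[0.0] * k for _ in range(k)]
    for x in observations:
        # Deviation of this observation vector from the sample mean vector.
        d = [x[j] - mean[j] for j in range(k)]
        # Add the outer product (x_i - xbar)(x_i - xbar)^T.
        for j in range(k):
            for m in range(k):
                q[j][m] += d[j] * d[m]
    return [[v / (n - 1) for v in row] for row in q]


data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print(sample_covariance(data))  # [[4.0, 8.0], [8.0, 16.0]]
```

The diagonal entries are the sample variances of each variable, and the matrix is symmetric because $q_{jk}=q_{kj}$.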

Alternatively, arrange the observation vectors as the columns of a matrix

- $\mathbf{F} = \begin{bmatrix}\mathbf{x}_1 & \mathbf{x}_2 & \dots & \mathbf{x}_N\end{bmatrix},$

which has *K* rows and *N* columns.
The sample covariance matrix can then be computed as

- $\mathbf{Q} = \frac{1}{N-1}(\mathbf{F} - \mathbf{\bar{x}}\,\mathbf{1}_N^\mathrm{T})(\mathbf{F} - \mathbf{\bar{x}}\,\mathbf{1}_N^\mathrm{T})^\mathrm{T},$

where $\mathbf{1}_N$ is an *N*×1 vector of ones.
If the observations are arranged as rows instead of columns, so that $\mathbf{\bar{x}}$ is now a 1×*K* row vector and $\mathbf{M}=\mathbf{F}^\mathrm{T}$ is an *N*×*K* matrix whose column *j* is the vector of *N* observations on variable *j*, then applying transposes in the appropriate places yields

- $\mathbf{Q} = \frac{1}{N-1}(\mathbf{M} - \mathbf{1}_N \mathbf{\bar{x}})^\mathrm{T}(\mathbf{M} - \mathbf{1}_N \mathbf{\bar{x}}).$
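The column layout ($\mathbf{F}$) and the row layout ($\mathbf{M}$) should give the same $\mathbf{Q}$, which a short sketch can check. The helpers `transpose`, `matmul`, `cov_from_rows` and `cov_from_columns` are hypothetical names written with plain lists for illustration; in practice a linear-algebra library would normally do the matrix products.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Row-by-column dot products; adequate for small illustrative matrices.
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def cov_from_rows(m: Matrix) -> Matrix:
    """Q = (M - 1_N xbar)^T (M - 1_N xbar) / (N - 1) for an N-by-K matrix M."""
    n, k = len(m), len(m[0])
    xbar = [sum(row[j] for row in m) / n for j in range(k)]  # 1-by-K row vector
    # Subtracting 1_N xbar removes the mean of each column from every row.
    centered = [[row[j] - xbar[j] for j in range(k)] for row in m]
    prod = matmul(transpose(centered), centered)
    return [[v / (n - 1) for v in row] for row in prod]


def cov_from_columns(f: Matrix) -> Matrix:
    """Q = (F - xbar 1_N^T)(F - xbar 1_N^T)^T / (N - 1) for a K-by-N matrix F."""
    k, n = len(f), len(f[0])
    xbar = [sum(r) / n for r in f]  # K-by-1 column vector, stored as a flat list
    centered = [[v - xbar[j] for v in f[j]] for j in range(k)]
    prod = matmul(centered, transpose(centered))
    return [[v / (n - 1) for v in row] for row in prod]


M = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]  # N = 3 rows, K = 2 columns
F = transpose(M)                            # K = 2 rows, N = 3 columns
print(cov_from_rows(M))     # [[4.0, 8.0], [8.0, 16.0]]
print(cov_from_columns(F))  # [[4.0, 8.0], [8.0, 16.0]]
```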

## Discussion

The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector $\mathbf{X}$, whose *j*th element (*j* = 1, ..., *K*) is one of the random variables.^{[1]} The sample covariance matrix has $N-1$ in the denominator rather than $N$ because of a variant of Bessel's correction. In short, the sample covariance relies on the difference between each observation and the sample mean, but the sample mean is slightly correlated with each observation because it is defined in terms of all observations. As a result, deviations from the sample mean are on average slightly smaller than deviations from the population mean, and dividing by $N-1$ instead of $N$ compensates for this exactly in expectation. If the population mean $\operatorname{E}(\mathbf{X})$ is known, the analogous unbiased estimate

- $q_{jk}=\frac{1}{N}\sum_{i=1}^N \left(x_{ij}-\operatorname{E}(X_j)\right)\left(x_{ik}-\operatorname{E}(X_k)\right),$

using the population mean, has $N$ in the denominator. This is an example of why in probability and statistics it is essential to distinguish between random variables (upper case letters) and realizations of the random variables (lower case letters).
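The bias argument can be checked exactly for the variance (the diagonal case $j = k$) on a small hypothetical population. The sketch assumes $X$ uniform on $\{0, 1, 2\}$, so $\operatorname{E}(X) = 1$ and $\operatorname{Var}(X) = 2/3$; it enumerates every equally likely i.i.d. sample of size $N = 3$ and averages each estimator with exact rational arithmetic. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: X uniform on {0, 1, 2}.
support = [Fraction(0), Fraction(1), Fraction(2)]
mu = sum(support) / len(support)                                # E(X) = 1
sigma2 = sum((v - mu) ** 2 for v in support) / len(support)     # Var(X) = 2/3


def exact_expectation(estimator, n):
    """Exact mean of estimator(sample) over all equally likely i.i.d.
    samples of size n drawn from `support`."""
    samples = list(product(support, repeat=n))
    return sum(estimator(s) for s in samples) / len(samples)


def var_n_minus_1(s):   # subtract the sample mean, divide by N - 1
    m = sum(s) / len(s)
    return sum((v - m) ** 2 for v in s) / (len(s) - 1)


def var_n(s):           # subtract the sample mean, divide by N
    m = sum(s) / len(s)
    return sum((v - m) ** 2 for v in s) / len(s)


def var_known_mean(s):  # subtract the population mean, divide by N
    return sum((v - mu) ** 2 for v in s) / len(s)


n = 3
print(sigma2)                                  # 2/3
print(exact_expectation(var_n_minus_1, n))     # 2/3
print(exact_expectation(var_n, n))             # 4/9
print(exact_expectation(var_known_mean, n))    # 2/3
```

Only the estimator that subtracts the sample mean and divides by $N$ comes out too small, by the factor $(N-1)/N$ (here $4/9 = (2/3)\cdot(2/3)$).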

The maximum likelihood estimate of the covariance

- $q_{jk}=\frac{1}{N}\sum_{i=1}^N \left(x_{ij}-\bar{x}_j\right)\left(x_{ik}-\bar{x}_k\right)$

for the Gaussian distribution case has *N* in the denominator as well. The ratio of 1/*N* to 1/(*N* − 1) approaches 1 for large *N*, so the maximum likelihood estimate approximately equals the unbiased estimate when the sample is large.
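A short sketch of the relationship between the two estimates for a single pair of variables. The helper `cov_pair` and its `ddof` parameter (named by analogy with NumPy's "delta degrees of freedom" convention, though this is plain Python) are assumptions for illustration, and the data are made up.

```python
def cov_pair(xs, ys, ddof):
    """Covariance of paired observations with denominator N - ddof.

    ddof=1 gives the unbiased sample covariance; ddof=0 gives the Gaussian
    maximum likelihood estimate. Both subtract the sample means.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - ddof)


xs = [1.0, 2.0, 4.0, 5.0]
ys = [0.0, 1.0, 1.0, 2.0]
unbiased = cov_pair(xs, ys, ddof=1)  # 4/3, up to floating-point rounding
mle = cov_pair(xs, ys, ddof=0)       # 4/4 = 1.0
print(mle / unbiased)                # close to (N - 1)/N = 3/4

# The ratio (N - 1)/N approaches 1 as the sample size grows.
for n in (4, 10, 100, 1000):
    print(n, (n - 1) / n)
```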

## Variance of the sample mean

For each random variable, the sample mean is a good estimator of the population mean, where a "good" estimator is defined as being efficient and unbiased. Of course, the estimator will likely not be the true value of the population mean, since different samples drawn from the same distribution will give different sample means and hence different estimates of the true mean. Thus the sample mean is a random variable, not a constant, and consequently has its own distribution. For a random sample of *N* observations on the *j*th random variable, the sample mean's distribution itself has mean equal to the population mean $\operatorname{E}(X_j)$ and variance equal to $\frac{\sigma^2_j}{N}$, where $\sigma^2_j$ is the variance of the random variable $X_j$.
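The $\sigma^2_j/N$ behaviour can be illustrated with a small Monte Carlo simulation. This sketch assumes a uniform(0, 1) population (mean $1/2$, variance $1/12$), a fixed seed and 10,000 repetitions; the helper name `sample_mean_variance` and these settings are arbitrary choices.

```python
import random
import statistics


def sample_mean_variance(n, trials, seed=0):
    """Empirical mean and variance of the sample mean of n i.i.d.
    uniform(0, 1) draws, estimated from `trials` independent repetitions."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.random() for _ in range(n)])
             for _ in range(trials)]
    return statistics.mean(means), statistics.variance(means)


sigma2 = 1 / 12  # variance of the uniform(0, 1) distribution
for n in (1, 4, 16):
    m, v = sample_mean_variance(n, trials=10000)
    # m should be near the population mean 0.5, and v near sigma2 / n.
    print(n, round(m, 3), round(v, 5), round(sigma2 / n, 5))
```

The empirical variances are themselves estimates, so they should be close to, but not exactly equal to, $\sigma^2/N$.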

## Weighted samples

In a weighted sample, each vector $\mathbf{x}_{i}$ (each set of single observations on each of the *K* random variables) is assigned a weight $w_i \geq 0$. Without loss of generality, assume that the weights are normalized:

- $\sum_{i=1}^{N}w_i = 1.$

(If they are not, divide the weights by their sum.) Then the weighted mean vector $\mathbf{\bar{x}}$ is given by

- $\mathbf{\bar{x}}=\sum_{i=1}^N w_i \mathbf{x}_i,$

and the elements $q_{jk}$ of the weighted covariance matrix $\mathbf{Q}$ are^{[2]}

- $q_{jk}=\frac{\sum_{i=1}^{N}w_i}{\left(\sum_{i=1}^{N}w_i\right)^2-\sum_{i=1}^{N}w_i^2}\sum_{i=1}^N w_i \left(x_{ij}-\bar{x}_j\right)\left(x_{ik}-\bar{x}_k\right).$

If all weights are the same, $w_{i}=1/N$, the weighted mean and covariance reduce to the sample mean and covariance above.
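A minimal sketch of the weighted mean and covariance, assuming plain Python lists and a hypothetical helper `weighted_mean_cov`. It normalizes the weights first, so the prefactor in the covariance formula becomes $1/\left(1-\sum_i w_i^2\right)$; with equal weights this equals $N/(N-1)$ and the result matches the ordinary sample covariance.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def weighted_mean_cov(observations: Matrix,
                      weights: Sequence[float]) -> Tuple[List[float], Matrix]:
    """Weighted mean vector and weighted covariance matrix.

    observations[i][j] is x_ij; weights[i] is the (possibly unnormalized)
    weight of observation vector i.
    """
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    w = [wi / total for wi in weights]  # normalize so that the weights sum to 1
    k = len(observations[0])
    mean = [sum(wi * x[j] for wi, x in zip(w, observations)) for j in range(k)]
    # With normalized weights the prefactor is 1 / (1 - sum of w_i^2).
    denom = 1.0 - sum(wi * wi for wi in w)
    if denom <= 0:
        raise ValueError("undefined when all weight sits on one observation")
    cov = [[sum(wi * (x[j] - mean[j]) * (x[m] - mean[m])
                for wi, x in zip(w, observations)) / denom
            for m in range(k)]
           for j in range(k)]
    return mean, cov


data = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
# Equal weights: agrees, up to rounding, with the sample mean [3.0, 6.0]
# and the sample covariance [[4.0, 8.0], [8.0, 16.0]].
print(weighted_mean_cov(data, [1.0, 1.0, 1.0]))
# Unequal weights pull the mean toward the heavily weighted observation.
print(weighted_mean_cov(data, [1.0, 1.0, 2.0]))
```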

## Criticism

The sample mean and sample covariance are widely used in statistics and its applications, and are probably the most common measures of location and dispersion, respectively: they are easy to calculate and have desirable properties.

However, they have drawbacks; notably, they are not robust statistics, meaning that they are sensitive to outliers. Because robustness is often desirable, particularly in real-world applications, robust alternatives may be preferable, notably quantile-based statistics such as the sample median for location^{[3]} and the interquartile range (IQR) for dispersion. Other alternatives include trimming and Winsorising, as in the trimmed mean and the Winsorized mean.
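The effect of a single outlier on the mean, compared with these robust alternatives, can be sketched with Python's standard `statistics` module. The data and the 10% trimming proportion are made up, and `trimmed_mean` and `winsorized_mean` are hypothetical helpers written for this example (conventions for trimming vary between sources). The quartiles come from `statistics.quantiles`, whose default `"exclusive"` method is only one of several quartile conventions.

```python
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping the lowest and highest `proportion` of values."""
    xs = sorted(values)
    cut = int(len(xs) * proportion)
    return statistics.fmean(xs[cut:len(xs) - cut])


def winsorized_mean(values, proportion):
    """Mean after clamping the lowest and highest `proportion` of values
    to the nearest value that is kept."""
    xs = sorted(values)
    cut = int(len(xs) * proportion)
    lo, hi = xs[cut], xs[len(xs) - 1 - cut]
    return statistics.fmean([min(max(v, lo), hi) for v in xs])


data = [2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0, 6.0, 7.0, 100.0]  # one outlier
q1, _, q3 = statistics.quantiles(data, n=4)
print(statistics.fmean(data))      # 13.9, pulled up by the outlier
print(statistics.median(data))     # 4.5
print(q3 - q1)                     # interquartile range
print(trimmed_mean(data, 0.1))     # 4.625 (drops 2.0 and 100.0)
print(winsorized_mean(data, 0.1))  # 4.7 (2.0 becomes 3.0, 100.0 becomes 7.0)
```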