In statistics, an Lestimator is an estimator which is an Lstatistic – a linear combination of order statistics of the measurements. This can be as little as a single point, as in the median (of an odd number of values), or as many as all points, as in the mean.
The main benefits of Lestimators are that they are often extremely simple, and often robust statistics: assuming sorted data, they are very easy to calculate and interpret, and are often resistant to outliers. They thus are useful in robust statistics, as descriptive statistics, in statistics education, and when computation is difficult. However, they are inefficient, and in modern robust statistics Mestimators are preferred, though these are much more difficult computationally. In many circumstances Lestimators are reasonably efficient, and thus adequate for initial estimation.
Contents

Examples 1

Robustness 2

Applications 3

Advantages 4

Efficiency 5

See also 6

References 7
Examples
A basic example is the median. Given n values x_1, \ldots, x_n, if n=2k+1 is odd, the median equals x_{(k+1)}, the (n+1)/2th order statistic; if n=2k is even, it is the average of two order statistics: (x_{(k)} + x_{(k+1)})/2. These are both linear combinations of order statistics, and the median is therefore a simple example of an Lestimator.
A more detailed list of examples includes: with a single point, the maximum, the minimum, or any single order statistic or quantile; with one or two points, the median; with two points, the midrange, the range, the midsummary (trimmed midrange, including the midhinge), and the trimmed range (including the interquartile range and interdecile range); with three points, the trimean; with a fixed fraction of the points, the trimmed mean (including interquartile mean) and the Winsorized mean; with all points, the mean.
Note that some of these (such as median, or midrange) are measures of central tendency, and are used as estimators for a location parameter, such as the mean of a normal distribution, while others (such as range or trimmed range) are measures of statistical dispersion, and are used as estimators of a scale parameter, such as the standard deviation of a normal distribution.
Lestimators can also measure the shape of a distribution, beyond location and scale. For example, the midhinge minus the median is a 3term Lestimator that measures the skewness, and other differences of midsummaries give measures of asymmetry at different points in the tail.
Sample Lmoments are Lestimators for the population Lmoment, and have rather complex expressions. Lmoments are generally treated separately; see that article for details.
Robustness
Lestimators are often statistically resistant, having a high breakdown point. This is defined as the fraction of the measurements which can be arbitrarily changed without causing the resulting estimate to tend to infinity (i.e., to "break down"). The breakdown point of an Lestimator is given by the closest order statistic to the minimum or maximum: for instance, the median has a breakdown point of 50% (the highest possible), and a n% trimmed or Winsorized mean has a breakdown point of n%.
Not all Lestimators are robust; if it includes the minimum or maximum, then it has a breakdown point of 0. These nonrobust Lestimators include the minimum, maximum, mean, and midrange. The trimmed equivalents are robust, however.
Robust Lestimators used to measure dispersion, such as the IQR, provide robust measures of scale.
Applications
In practical use in robust statistics, Lestimators have been replaced by Mestimators, which provide robust statistics that also have high relative efficiency, at the cost of being much more computationally complex and opaque.
However, the simplicity of Lestimators means that they are easily interpreted and visualized, and makes them suited for descriptive statistics and statistics education; many can even be computed mentally from a fivenumber summary or sevennumber summary, or visualized from a box plot. Lestimators play a fundamental role in many approaches to nonparametric statistics.
Though nonparametric, Lestimators are frequently used for parameter estimation, as indicated by the name, though they must often be adjusted to yield an unbiased consistent estimator. The choice of Lestimator and adjustment depend on the distribution whose parameter is being estimated.
For example, when estimating a location parameter, for a symmetric distribution a symmetric Lestimator (such as the median or midhinge) will be unbiased. However, if the distribution has skew, symmetric Lestimators will generally be biased and require adjustment. For example, in a skewed distribution, the nonparametric skew (and Pearson's skewness coefficients) measure the bias of the median as an estimator of the mean.
When estimating a scale parameter, such as when using an Lestimator as a robust measures of scale, such as to estimate the population variance or population standard deviation, one generally must multiply by a scale factor to make it an unbiased consistent estimator; see scale parameter: estimation.
For example, dividing the IQR by 2\sqrt{2} \operatorname{erf}^{1}(1/2) \approx 1.349 (using the error function) makes it an unbiased, consistent estimator for the population variance if the data follow a normal distribution.
Lestimators can also be used as statistics in their own right – for example, the median is a measure of location, and the IQR is a measure of dispersion. In these cases, the sample statistics can act as estimators of their own expected value; for example, the sample median is an estimator of the population median.
Advantages
Beyond simplicity, Lestimators are also frequently easy to calculate and robust.
Assuming sorted data, Lestimators involving only a few points can be calculated with far fewer mathematical operations than efficient estimates. Before the advent of electronic calculators and computers, these provided a useful way to extract much of the information from a sample with minimal labour. These remained in practical use through the early and mid 20th century, when automated sorting of punch card data was possible, but computation remained difficult, and is still of use today, for estimates given a list of numerical values in nonmachinereadable form, where data input is more costly than manual sorting. They also allow rapid estimation.
Lestimators are often much more robust than maximally efficient conventional methods – the median is maximally statistically resistant, having a 50% breakdown point, and the X% trimmed midrange has an X% breakdown point, while the sample mean (which is maximally efficient) is minimally robust, breaking down for a single outlier.
Efficiency
While Lestimators are not as efficient as other statistics, they often have reasonably high relative efficiency, and show that a large fraction of the information used in estimation can be obtained using only a few points – as few as one, two, or three. Alternatively, they show that order statistics contain a significant amount of information.
For example, in terms of efficiency, given a sample of a normallydistributed numerical parameter, the arithmetic mean (average) for the population can be estimated with maximum efficiency by computing the sample mean – adding all the members of the sample and dividing by the number of members.
However, for a large data set (over 100 points) from a symmetric population, the mean can be estimated reasonably efficiently relative to the best estimate by Lestimators. Using a single point, this is done by taking the [3] For three points, the trimean (average of median and midhinge) can be used, though the average of the 20th, 50th, and 80th percentile yields 88% efficiency. Using further points yield higher efficiency, though it is notable that only 3 points are needed for very high efficiency.
For estimating the standard deviation of a normal distribution, the scaled [3]
For small samples, Lestimators are also relatively efficient: the midsummary of the 3rd point from each end has an efficiency around 84% for samples of size about 10, and the range divided by \sqrt{n} has reasonably good efficiency for sizes up to 20, though this drops with increasing n and the scale factor can be improved (efficiency 85% for 10 points). Other heuristic estimators for small samples include the range over n (for standard error), and the range squared over the median (for the chisquared of a Poisson distribution).
See also
References
This article was sourced from Creative Commons AttributionShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and USA.gov, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for USA.gov and content contributors is made possible from the U.S. Congress, EGovernment Act of 2002.
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a nonprofit organization.