Exponential smoothing is a rule-of-thumb technique for smoothing time series data, in particular by recursively applying as many as three low-pass filters with exponential window functions. Such techniques have broad application and are not intended to be strictly accurate or reliable in every situation. It is an easily learned and easily applied procedure for approximately calculating or recalling some value, or for making some determination based on prior assumptions by the user, such as seasonality. As with any application of repeated low-pass filtering, the observed phenomenon may be an essentially random process, or it may be an orderly, but noisy, process. Whereas the simple moving average weights past observations equally, exponential window functions assign weights that decrease exponentially over time. The use of three filters is based on empirical evidence and broad application.
Exponential smoothing is commonly applied to smooth data, as many window functions are in signal processing, acting as low-pass filters to remove high-frequency noise. This method echoes Poisson's use of recursive exponential window functions in convolutions in the 19th century, as well as Kolmogorov and Zurbenko's use of recursive moving averages in their studies of turbulence in the 1940s. See Kolmogorov-Zurbenko filter for more information.
The raw data sequence is often represented by \{x_t\} beginning at time t = 0, and the output of the exponential smoothing algorithm is commonly written as \{s_t\}, which may be regarded as a best estimate of what the next value of x will be. When the sequence of observations begins at time t = 0, the simplest form of exponential smoothing is given by the formulae:^{[1]}
\begin{align} s_0& = x_0\\ s_{t}& = \alpha x_{t} + (1-\alpha)s_{t-1},\ t>0 \end{align}
where \alpha is the smoothing factor, and 0 < \alpha < 1.
Contents
1 Background
1.1 Window functions
1.1.1 The simple moving average
1.1.2 The weighted moving average
2 The exponential moving average
2.1 Time Constant
2.2 Choosing the initial smoothed value
2.3 Optimization
2.4 Why is it “exponential”?
2.5 Comparison with moving average
3 Double exponential smoothing
4 Triple exponential smoothing
5 See also
6 Notes
7 External links
Background
Window functions
The simple moving average
Intuitively, the simplest way to smooth a time series is to calculate a simple, or unweighted, moving average. This is known as using a rectangular or "boxcar" window function. The smoothed statistic s_{t} is then just the mean of the last k observations:

s_t = \frac{1}{k} \, \sum_{n=0}^{k-1} x_{t-n} = \frac{x_t + x_{t-1} + x_{t-2} + \cdots + x_{t-k+1}}{k} = s_{t-1} + \frac{x_t - x_{t-k}}{k},
where the choice of an integer k > 1 is arbitrary. A small value of k will have less of a smoothing effect and be more responsive to recent changes in the data, while a larger k will have a greater smoothing effect, and produce a more pronounced lag in the smoothed sequence. One disadvantage of this technique is that it cannot be used on the first k −1 terms of the time series without the addition of values created by some other means. This means effectively extrapolating outside the existing data, and the validity of this section would therefore be questionable and not a direct representation of the data.
It also introduces a phase shift into the data of half the window length. For example, if the data were all the same except for one high data point, the peak in the "smoothed" data would appear half a window length later than when it actually occurred. Where the phase of the result is important, this can be simply corrected by shifting the resulting series back by half the window length.
A major drawback with the SMA is that it lets through a significant amount of the signal shorter than the window length. Worse, it actually inverts it. This can lead to unexpected artifacts, such as peaks in the "smoothed" result appearing where there were troughs in the data. It also leads to the result being less "smooth" than expected since some of the higher frequencies are not properly removed.
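The moving-average formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name is hypothetical, and reporting the first k − 1 positions as None is an assumption made here to reflect that no full window exists for them.

```python
def simple_moving_average(x, k):
    """Return the k-point simple moving average of the sequence x.

    Positions 0 .. k-2 have no full window and are reported as None.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    out = [None] * len(x)
    window_sum = 0.0
    for t, value in enumerate(x):
        window_sum += value
        if t >= k:
            # Drop the observation that has just left the window, which is
            # the running-sum form of s_t = s_{t-1} + (x_t - x_{t-k}) / k.
            window_sum -= x[t - k]
        if t >= k - 1:
            out[t] = window_sum / k
    return out


smoothed = simple_moving_average([1, 2, 3, 4, 5], 3)
```

Keeping a running sum means each new value costs one addition and one subtraction, regardless of k.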
See moving average for more detail.
The weighted moving average
A slightly more intricate method for smoothing a raw time series {x_{t}} is to calculate a weighted moving average by first choosing a set of weighting factors

\lbrace w_1, w_2,\dots,w_k \rbrace such that \sum_{n=1}^k w_n = 1
and then using these weights to calculate the smoothed statistics {s_{t}}:

s_t = \sum_{n=1}^k w_n x_{t+1-n} = w_1x_t + w_2x_{t-1} + \cdots + w_kx_{t-k+1}.
In practice the weighting factors are often chosen to give more weight to the most recent terms in the time series and less weight to older data. Notice that this technique has the same disadvantage as the simple moving average technique (i.e., it cannot be used until at least k observations have been made), and that it entails a more complicated calculation at each step of the smoothing procedure. In addition to this disadvantage, if the data from each stage of the averaging is not available for analysis, it may be difficult if not impossible to reconstruct a changing signal accurately (because older samples may be given less weight). If the number of stages missed is known however, the weighting of values in the average can be adjusted to give equal weight to all missed samples to avoid this issue.
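As a rough sketch of the weighted moving average, the following Python function applies a set of weights to the last k observations. The function name and the example weights are hypothetical; the convention that the first weight multiplies the newest observation follows the formula above.

```python
def weighted_moving_average(x, weights):
    """Weighted moving average of x.

    weights[0] multiplies the newest sample x[t] and weights[-1] the oldest
    sample x[t-k+1]; the weights are required to sum to 1.
    """
    k = len(weights)
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    out = [None] * len(x)          # no output until k observations exist
    for t in range(k - 1, len(x)):
        out[t] = sum(w * x[t - n] for n, w in enumerate(weights))
    return out


# Illustrative weights that favour recent data.
smoothed = weighted_moving_average([1, 2, 3, 4], [0.5, 0.3, 0.2])
```

Unlike the simple moving average, each output here needs k multiplications, which is the more complicated per-step calculation mentioned above.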
The exponential moving average
The use of the exponential window function is first attributed to Poisson^{[2]} as an extension of a numerical analysis technique from the 17th century, and later adopted by the signal processing community in the 1940s. Here, exponential smoothing is the application of the exponential, or Poisson, window function. Exponential smoothing was first suggested in the statistical literature without citation to previous work by Robert Goodell Brown in 1956,^{[3]} and then expanded by Charles C. Holt in 1957.^{[4]} The formulation below, which is the one commonly used, is attributed to Brown and is known as “Brown’s simple exponential smoothing”.^{[5]} All the methods of Holt, Winters and Brown may be seen as a simple application of recursive filtering, first found in the 1940s^{[2]} to convert FIR filters to IIR filters.
The simplest form of exponential smoothing is given by the formula:

s_t = \alpha \cdot x_{t} + (1-\alpha) \cdot s_{t-1}.
where α is the smoothing factor, and 0 < α < 1. In other words, the smoothed statistic s_{t} is a simple weighted average of the current observation x_{t} and the previous smoothed statistic s_{t−1}. The term smoothing factor applied to α here is something of a misnomer, as larger values of α actually reduce the level of smoothing, and in the limiting case with α = 1 the output series is just the same as the original series. Simple exponential smoothing is easily applied, and it produces a smoothed statistic as soon as two observations are available.
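The recursion can be illustrated with a short Python sketch. The function name is hypothetical, and the initialization s_0 = x_0 is taken from the definition given at the start of the article.

```python
def exponential_smoothing(x, alpha):
    """Simple exponential smoothing with s_0 = x_0."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if not x:
        return []
    s = [x[0]]
    for value in x[1:]:
        # Weighted average of the current observation and the previous
        # smoothed statistic.
        s.append(alpha * value + (1 - alpha) * s[-1])
    return s


smoothed = exponential_smoothing([2, 4, 6], 0.5)
```

Only the most recent smoothed value is needed to process the next observation, so the whole history does not have to be stored.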
Values of α close to one have less of a smoothing effect and give greater weight to recent changes in the data, while values of α closer to zero have a greater smoothing effect and are less responsive to recent changes. There is no formally correct procedure for choosing α. Sometimes the statistician’s judgment is used to choose an appropriate factor. Alternatively, a statistical technique may be used to optimize the value of α. For example, the method of least squares might be used to determine the value of α for which the sum of the quantities (s_{n-1} − x_{n})^{2} is minimized, that is, the squared errors made when s_{n-1} is used to predict x_{n}.
Unlike some other smoothing methods, such as the simple moving average, this technique does not require any minimum number of observations to be made before it begins to produce results. In practice, however, a “good average” will not be achieved until several samples have been averaged together; for example, a constant signal will take approximately 3/α stages to reach 95% of the actual value. To accurately reconstruct the original signal without information loss all stages of the exponential moving average must also be available, because older samples decay in weight exponentially. This is in contrast to a simple moving average, in which some samples can be skipped without as much loss of information due to the constant weighting of samples within the average. If a known number of samples will be missed, one can adjust a weighted average for this as well, by giving equal weight to the new sample and all those to be skipped.
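The 3/α figure can be checked with a short derivation. This is a sketch under an assumption not stated above: the smoothed value starts from s_0 = 0 rather than s_0 = x_0, and the input is a constant c, so that there is a transient to measure.

\begin{align} s_t& = \alpha c + (1-\alpha)s_{t-1} \;\Rightarrow\; s_t = c\left[1-(1-\alpha)^{t}\right]\\ s_t \ge 0.95\,c& \iff (1-\alpha)^{t} \le 0.05 \iff t \ge \frac{\ln 0.05}{\ln(1-\alpha)} \approx \frac{3}{\alpha}, \end{align}

where the last step uses \ln 0.05 \approx -3 and \ln(1-\alpha) \approx -\alpha for small \alpha.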
This simple form of exponential smoothing is also known as an exponentially weighted moving average (EWMA). Technically it can also be classified as an Autoregressive integrated moving average (ARIMA) (0,1,1) model with no constant term.^{[6]}
Time Constant
The time constant of an exponential moving average is the amount of time for the smoothed response of a unit step function to reach 1 - 1/e \approx 63.2\,\% of the original signal. The relationship between this time constant, \tau , and the smoothing factor, \alpha , is given by the formula:

\alpha = 1 - e^{-\Delta T \over \tau}
where \Delta T is the sampling time interval of the discrete time implementation. If the sampling time is fast compared to the time constant then

\alpha \approx {\Delta T \over \tau}
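The relationship between the time constant and the smoothing factor can be checked numerically. The sketch below uses hypothetical function names and assumes the smoothed value starts at zero before the unit step is applied.

```python
import math


def alpha_from_time_constant(dt, tau):
    """Smoothing factor for sampling interval dt and time constant tau (same units)."""
    if dt <= 0 or tau <= 0:
        raise ValueError("dt and tau must be positive")
    return 1.0 - math.exp(-dt / tau)


def step_response(alpha, n_steps):
    """Smoothed value after n_steps samples of a unit step, starting from s = 0."""
    s = 0.0
    for _ in range(n_steps):
        s = alpha * 1.0 + (1 - alpha) * s
    return s


alpha = alpha_from_time_constant(dt=0.01, tau=1.0)   # close to dt / tau = 0.01
after_one_tau = step_response(alpha, 100)            # 100 samples of 0.01 span one time constant
```

With these values, after_one_tau agrees with 1 − 1/e up to rounding, and alpha differs from ΔT/τ by well under one percent of ΔT/τ.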
Choosing the initial smoothed value
Note that in the above definition s_{0} is initialized to x_{0}. Because exponential smoothing requires that at each stage we have the previous forecast, it is not obvious how to get the method started. We could assume that the initial forecast is equal to the initial value of demand; however, this approach has a serious drawback. Exponential smoothing puts substantial weight on past observations, so the initial value of demand will have an unreasonably large effect on early forecasts. This problem can be overcome by allowing the process to evolve for a reasonable number of periods (10 or more) and using the average of the demand during those periods as the initial forecast. There are many other ways of setting this initial value, but it is important to note that the smaller the value of α, the more sensitive the forecast will be to the choice of this initial smoothed value s_{0}.^{[7]}
Optimization
For every exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter (α), but for the methods that follow there is usually more than one smoothing parameter.
There are cases where the smoothing parameters may be chosen in a subjective manner — the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.
The unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the sum of squared errors (SSE). The errors are specified as e_{t}=y_{t}-\hat{y}_{t|t-1} for t=1,…,T (the one-step-ahead within-sample forecast errors). Hence we find the values of the unknown parameters and the initial values that minimize
SSE=\sum_{t=1}^T (y_{t}-\hat{y}_{t|t-1})^2=\sum_{t=1}^T e_{t}^2 ^{[8]}
Unlike the regression case (where we have formulae that return the values of the regression coefficients which minimize the SSE) this involves a nonlinear minimization problem and we need to use an optimization tool to perform this.
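As an illustration of the minimization for simple exponential smoothing, the sketch below evaluates the SSE on a grid of α values and keeps the best one. Several things here are assumptions rather than part of the method described above: the initial level is fixed at the first observation instead of being estimated, a plain grid search stands in for a proper optimization tool, and the function names are hypothetical.

```python
def sse(y, alpha):
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    level = y[0]                      # assumed initial value, not estimated
    total = 0.0
    for obs in y[1:]:
        error = obs - level           # the forecast for this step is the previous level
        total += error * error
        level = alpha * obs + (1 - alpha) * level
    return total


def best_alpha(y, steps=99):
    """Grid search over alpha in (0, 1); returns the candidate with the smallest SSE."""
    candidates = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(candidates, key=lambda a: sse(y, a))


# A steadily rising series: the search favours a large alpha, because
# heavier smoothing lags further behind the trend.
alpha_hat = best_alpha(list(range(1, 11)))
```

A finer grid or a one-dimensional numerical optimizer could replace the grid search; the objective function stays the same.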
Why is it “exponential”?
The name 'exponential smoothing' is attributed to the use of the exponential window function during convolution. It is no longer attributed to Holt, Winters & Brown.
By direct substitution of the defining equation for simple exponential smoothing back into itself we find that

\begin{align} s_t& = \alpha x_{t} + (1-\alpha)s_{t-1}\\[3pt] & = \alpha x_{t} + \alpha (1-\alpha)x_{t-1} + (1 - \alpha)^2 s_{t-2}\\[3pt] & = \alpha \left[x_{t} + (1-\alpha)x_{t-1} + (1-\alpha)^2 x_{t-2} + (1-\alpha)^3 x_{t-3} + \cdots + (1-\alpha)^{t-1} x_{1} \right] + (1-\alpha)^{t} x_0. \end{align}
In other words, as time passes the smoothed statistic s_{t} becomes the weighted average of a greater and greater number of the past observations x_{t−n}, and the weights assigned to previous observations are in general proportional to the terms of the geometric progression {1, (1 − α), (1 − α)^{2}, (1 − α)^{3}, ...}. A geometric progression is the discrete version of an exponential function, so this is where the name for this smoothing method originated according to Statistics lore.
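The expansion can be verified numerically: applying the geometric weights directly should give the same value as running the recursion. The following Python sketch uses hypothetical function names and arbitrary example data.

```python
def smooth_recursive(x, alpha):
    """Final smoothed value from the recursion, with s_0 = x_0."""
    s = x[0]
    for value in x[1:]:
        s = alpha * value + (1 - alpha) * s
    return s


def smooth_by_weights(x, alpha):
    """Final smoothed value from the expanded sum, plus the weights used."""
    t = len(x) - 1
    # Weights for x_t, x_{t-1}, ..., x_1 follow the geometric progression.
    weights = [alpha * (1 - alpha) ** n for n in range(t)]
    # The remaining weight (1 - alpha)^t falls on the initial value x_0.
    weights.append((1 - alpha) ** t)
    newest_first = x[::-1]
    value = sum(w * v for w, v in zip(weights, newest_first))
    return value, weights


data = [3.0, 1.0, 4.0, 1.0, 5.0]
recursive_value = smooth_recursive(data, 0.3)
expanded_value, weights = smooth_by_weights(data, 0.3)
```

The two values agree up to rounding, and the weights sum to one, which is why s_{t} is a weighted average of the observations.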
Comparison with moving average
Exponential smoothing and the moving average share the defect of introducing a lag relative to the input data. While this can be corrected by shifting the result by half the window length for a symmetrical kernel, such as a moving average or Gaussian, it is unclear how appropriate this would be for exponential smoothing. They also both have roughly the same distribution of forecast error when α = 2/(k+1). They differ in that exponential smoothing takes into account all past data, whereas the moving average only takes into account k past data points. Computationally speaking, they also differ in that the moving average requires that the past k data points be kept, whereas exponential smoothing only needs the most recent forecast value to be kept.^{[9]}
In the signal processing literature, the use of noncausal (symmetric) filters is commonplace, and the exponential window function is broadly used in this fashion.
Double exponential smoothing
Simple exponential smoothing does not do well when there is a trend in the data.^{[1]} For such situations, several methods were devised under the name "double exponential smoothing" or "second-order exponential smoothing", which is the recursive application of an exponential filter twice. This nomenclature is similar to quadruple exponential smoothing, which also references its recursion depth.^{[10]} The basic idea behind double exponential smoothing is to introduce a term to take into account the possibility of a series exhibiting some form of trend. This slope component is itself updated via exponential smoothing.
One method, sometimes referred to as "Holt-Winters double exponential smoothing",^{[11]} works as follows:^{[12]}
Again, the raw data sequence of observations is represented by {x_{t}}, beginning at time t = 0. We use {s_{t}} to represent the smoothed value for time t, and {b_{t}} is our best estimate of the trend at time t. The output of the algorithm is now written as F_{t+m}, an estimate of the value of x at time t+m, m>0 based on the raw data up to time t. Double exponential smoothing is given by the formulas

\begin{align} s_1& = x_1\\ b_1& = x_1 - x_0\\ \end{align}
And for t > 1 by

\begin{align} s_{t}& = \alpha x_{t} + (1-\alpha)(s_{t-1} + b_{t-1})\\ b_{t}& = \beta (s_t - s_{t-1}) + (1-\beta)b_{t-1}\\ \end{align}
where α is the data smoothing factor, 0 < α < 1, and β is the trend smoothing factor, 0 < β < 1.
To forecast beyond x_{t}

\begin{align} F_{t+m}& = s_t + mb_t \end{align}
Setting the initial trend value (b_{1} in the formulas above) is a matter of preference. An option other than the one listed above is (x_{n} − x_{0})/n for some n > 1.
Note that with this initialization s_{t} and b_{t} are defined only for t ≥ 1, so there is no forecast for time 0 or time 1; the first forecast given by the definition is F_{2} = s_{1} + b_{1}, which is well defined, and further values can then be evaluated.
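The level and trend updates can be sketched in Python as follows. The function names are hypothetical, and the initialization s_1 = x_1, b_1 = x_1 − x_0 is the one given in the formulas above; other choices of initial trend would change only the first line of the loop setup.

```python
def holt_double_exponential_smoothing(x, alpha, beta):
    """Return the final level s_t and trend b_t for the series x."""
    if len(x) < 2:
        raise ValueError("at least two observations are needed to initialize the trend")
    level, trend = x[1], x[1] - x[0]          # s_1 = x_1, b_1 = x_1 - x_0
    for value in x[2:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level, trend


def holt_forecast(level, trend, m):
    """Forecast m > 0 steps beyond the last observation: F_{t+m} = s_t + m * b_t."""
    return level + m * trend


level, trend = holt_double_exponential_smoothing([1, 2, 3, 4, 5], alpha=0.5, beta=0.5)
next_value = holt_forecast(level, trend, 1)
```

For this exactly linear example the level and trend track the data without error, so the one-step forecast continues the line.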
A second method, referred to as either Brown's linear exponential smoothing (LES) or Brown's double exponential smoothing works as follows.^{[13]}

\begin{align} s'_0& = x_0\\ s''_0& = x_0\\ s'_{t}& = \alpha x_{t} + (1-\alpha)s'_{t-1}\\ s''_{t}& = \alpha s'_{t} + (1-\alpha)s''_{t-1}\\ F_{t+m}& = a_t + mb_t, \end{align}
where a_{t}, the estimated level at time t and b_{t}, the estimated trend at time t are:

\begin{align} a_t& = 2s'_t - s''_t\\ b_t& = \frac \alpha {1-\alpha} (s'_t - s''_t). \end{align}
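Brown's method amounts to smoothing the data once, smoothing the result again with the same α, and combining the two. A minimal Python sketch, with a hypothetical function name, might look like this:

```python
def brown_linear_exponential_smoothing(x, alpha, m=1):
    """Forecast m steps ahead with Brown's linear (double) exponential smoothing."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    s1 = s2 = x[0]                                  # s'_0 = s''_0 = x_0
    for value in x[1:]:
        s1 = alpha * value + (1 - alpha) * s1       # singly smoothed series s'
        s2 = alpha * s1 + (1 - alpha) * s2          # doubly smoothed series s''
    level = 2 * s1 - s2                             # a_t
    trend = alpha / (1 - alpha) * (s1 - s2)         # b_t
    return level + m * trend


forecast = brown_linear_exponential_smoothing([0.0, 2.0], alpha=0.5, m=1)
```

In contrast to the two-parameter method above, a single smoothing factor controls both the level and the trend here.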
Triple exponential smoothing
Triple exponential smoothing takes into account seasonal changes as well as trends. Seasonality is defined to be the tendency of time-series data to exhibit behavior that repeats itself every L periods, much like any harmonic function. The term season is used to represent the period of time before behavior begins to repeat itself. There are different types of seasonality: 'multiplicative' and 'additive' in nature, much like addition and multiplication are basic operations in mathematics. It is unclear why the statistical literature chooses to adopt special terminology for this application of common filters which predates the use in Statistics by nearly 150 years.
If every December we sell 10,000 more apartments than we do in November, the seasonality is additive in nature; it can be represented by an 'absolute' increase. However, if we sell 10% more apartments in the summer months than we do in the winter months, the seasonality is multiplicative in nature. Multiplicative seasonality can be represented as a constant factor, not an absolute amount.^{[14]}
Triple exponential smoothing was first suggested by Holt's student, Peter Winters, in 1960 after reading a signal processing book from the 1940s on exponential smoothing.^{[15]} Holt's novel idea was to repeat filtering an odd number of times (ignoring 1). While recursive filtering had been used previously, it was applied twice and four times to coincide with the Hadamard conjecture, while triple application required more than double the operations of singular convolution.
Suppose we have a sequence of observations {x_{t}}, beginning at time t = 0 with a cycle of seasonal change of length L.
The method calculates a trend line for the data as well as seasonal indices that weight the values in the trend line based on where that time point falls in the cycle of length L.
{s_{t}} represents the smoothed value of the constant part for time t. {b_{t}} represents the sequence of best estimates of the linear trend that are superimposed on the seasonal changes. {c_{t}} is the sequence of seasonal correction factors. c_{t} is the expected proportion of the predicted trend at any time t mod L in the cycle that the observations take on. As a rule of thumb, a minimum of two full seasons (or 2L periods) of historical data is needed to initialize a set of seasonal factors.
The output of the algorithm is again written as F_{t+m}, an estimate of the value of x at time t+m, m>0 based on the raw data up to time t. Triple exponential smoothing is given by the formulas^{[1]}

\begin{align} s_0& = x_0\\ s_{t}& = \alpha \frac{x_{t}}{c_{t-L}} + (1-\alpha)(s_{t-1} + b_{t-1})\\ b_{t}& = \beta (s_t - s_{t-1}) + (1-\beta)b_{t-1}\\ c_{t}& = \gamma \frac{x_{t}}{s_{t}}+(1-\gamma)c_{t-L}\\ F_{t+m}& = (s_t + mb_t)c_{t-L+1+(m-1)\mod L}, \end{align}
where α is the data smoothing factor, 0 < α < 1, β is the trend smoothing factor, 0 < β < 1, and γ is the seasonal change smoothing factor, 0 < γ < 1.
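The multiplicative recursion can be sketched in Python as below. Several details are assumptions made for illustration: the initial trend b0 and the L initial seasonal factors c_init are supplied by the caller, c_init[i] is taken to apply to time points with t mod L = i, the seasonal factors are stored in a list that is overwritten in place so that the entry for a given cycle position always plays the role of c_{t−L}, and the function name is hypothetical. The data are assumed to be positive so that the divisions are well defined.

```python
def triple_exponential_smoothing(x, L, alpha, beta, gamma, b0, c_init, m=1):
    """Forecast m > 0 steps ahead with multiplicative seasonal smoothing."""
    if len(c_init) != L:
        raise ValueError("c_init must contain exactly L seasonal factors")
    level, trend = x[0], b0                   # s_0 = x_0
    c = list(c_init)                          # c[t % L]: latest factor for that cycle position
    for t in range(1, len(x)):
        previous_level = level
        c_old = c[t % L]                      # plays the role of c_{t-L}
        level = alpha * x[t] / c_old + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        c[t % L] = gamma * x[t] / level + (1 - gamma) * c_old
    t = len(x) - 1
    # The factor used is the most recent one for the cycle position of time t + m.
    return (level + m * trend) * c[(t + m) % L]


series = [10.0, 20.0, 10.0, 20.0]             # period L = 2, no trend
one_ahead = triple_exponential_smoothing(series, 2, 0.5, 0.5, 0.5, 0.0, [1.0, 2.0], m=1)
two_ahead = triple_exponential_smoothing(series, 2, 0.5, 0.5, 0.5, 0.0, [1.0, 2.0], m=2)
```

In this trend-free example the seasonal factors already match the data, so the forecasts simply repeat the seasonal pattern.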
The general formula for the initial trend estimate b_{0} is:

\begin{align} b_0& = \frac{1}{L} \left(\frac{x_{L+1}-x_1}{L} + \frac{x_{L+2}-x_2}{L} + \ldots + \frac{x_{L+L}-x_L}{L}\right) \end{align}
Setting the initial estimates for the seasonal indices c_{i} for i = 1,2,...,L is a bit more involved. If N is the number of complete cycles present in your data, then:

\begin{align} c_i& = \frac{1}{N} \sum_{j=1}^{N} \frac{x_{L(j-1)+i}}{A_j} \quad \forall i = 1,2,\ldots,L \end{align}
where

\begin{align} A_j& = \frac{\sum_{i=1}^{L} x_{L(j-1)+i}}{L} \quad \forall j = 1,2,\ldots,N \end{align}
Note that A_{j} is the average value of x in the jth cycle of your data.
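The initial trend and seasonal indices can be computed as in the following Python sketch. The function names are hypothetical, and one indexing assumption is made: the formulas above number the observations from 1, so the first element of the Python list is treated as x_1.

```python
def initial_trend(x, L):
    """Average per-period change between the first and second cycles."""
    if len(x) < 2 * L:
        raise ValueError("at least two full cycles (2L observations) are needed")
    # x[L + i] - x[i] corresponds to x_{L+i+1} - x_{i+1} in 1-based notation.
    return sum((x[L + i] - x[i]) / L for i in range(L)) / L


def initial_seasonal_indices(x, L):
    """Average, over complete cycles, of each observation divided by its cycle mean."""
    N = len(x) // L                                   # number of complete cycles
    if N < 1:
        raise ValueError("at least one complete cycle is needed")
    cycle_means = [sum(x[L * j:L * (j + 1)]) / L for j in range(N)]   # A_j
    return [
        sum(x[L * j + i] / cycle_means[j] for j in range(N)) / N
        for i in range(L)
    ]


data = [10.0, 20.0, 12.0, 22.0]                       # two cycles of length L = 2
b0 = initial_trend(data, 2)
c = initial_seasonal_indices(data, 2)
```

Because each cycle is divided by its own mean, the L indices produced this way add up to L.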
See also
Notes

^ ^{a} ^{b} ^{c} "NIST/SEMATECH e-Handbook of Statistical Methods". NIST. Retrieved 2010-05-23.

^ ^{a} ^{b} Oppenheim, Alan V.; Schafer, Ronald W. (1975). Digital Signal Processing.

^ Brown, Robert G. (1956). Exponential Smoothing for Predicting Demand. Cambridge, Massachusetts: Arthur D. Little Inc. p. 15.

^

^ Brown, Robert Goodell (1963). Smoothing, Forecasting and Prediction of Discrete Time Series. Englewood Cliffs, NJ: Prentice-Hall.

^ "Averaging and Exponential Smoothing Models". Retrieved 26 July 2010.

^ Nahmias, Steven (2009). Production and Operations Analysis.

^ https://www.otexts.org/fpp/7/1

^ Nahmias, Steven. Production and Operations Analysis (6th ed.).

^ "Model: Second-Order Exponential Smoothing".

^ Kalekar, Prajakta S. "Time series Forecasting using Holt-Winters Exponential Smoothing" (PDF).

^ "6.4.3.3. Double Exponential Smoothing". itl.nist.gov. Retrieved 25 September 2011.

^ "Averaging and Exponential Smoothing Models". duke.edu. Retrieved 25 September 2011.

^ Kalekar, Prajakta S. "Time series Forecasting using Holt-Winters Exponential Smoothing" (PDF). Retrieved 23 June 2014.

^ Winters, P. R. (April 1960). "Forecasting Sales by Exponentially Weighted Moving Averages".
External links