Makridakis Competitions
The Makridakis Competitions (also known as the M Competitions or M-Competitions) are a series of competitions organized by teams led by forecasting researcher Spyros Makridakis and intended to evaluate and compare the accuracy of different forecasting methods.[1][2][3][4]
The competitions
Summary
No. | Informal name for competition | Year of publication of results | Number of time series used | Number of methods tested | Other features |
---|---|---|---|---|---|
1 | M Competition or M-Competition[1][5] | 1982 | 1001 (used a subsample of 111 for the methods where it was too difficult to run all 1001) | 15 (plus 9 variations) | Not real-time |
2 | M-2 Competition or M2-Competition[1][6] | 1993 | 29 (23 from collaborating companies, 6 from macroeconomic indicators) | 16 (including 5 human forecasters and 11 automatic trend-based methods) plus 2 combined forecasts and 1 overall average | Real-time, many collaborating organizations, competition announced in advance |
3 | M-3 Competition or M3-Competition[1] | 2000 | 3003 | 24 | |
First competition in 1982
The first Makridakis Competition, held in 1982, and known in the forecasting literature as the M-Competition, used 1001 time series and 15 forecasting methods (with another nine variations of those methods included).[1][5] According to a later paper by the authors, the following were the main conclusions of the M-Competition:[1]
- Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.
- The relative ranking of the performance of the various methods varies according to the accuracy measure being used.
- Combining various methods outperforms, on average, the individual methods being combined and does very well in comparison to other methods.
- The accuracy of the various methods depends on the length of the forecasting horizon involved.
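The conclusion about combining methods can be illustrated with a small sketch. It assumes an equal-weight average of two hypothetical methods and uses mean absolute error as the accuracy measure; the method names and numbers are invented for illustration and are not taken from the competition. By the triangle inequality, the absolute error of an equal-weight average can never exceed the average of the individual absolute errors, although it is not guaranteed to beat the best single method.

```python
from statistics import mean


def combine(forecasts_by_method):
    """Equal-weight average of the methods' forecasts at each horizon."""
    return [mean(step) for step in zip(*forecasts_by_method.values())]


def mae(actuals, forecasts):
    """Mean absolute error over all forecast horizons."""
    return mean(abs(a - f) for a, f in zip(actuals, forecasts))


# Hypothetical data: three horizons, two methods (illustrative values only).
actuals = [100.0, 110.0, 120.0]
forecasts_by_method = {
    "method_a": [90.0, 100.0, 115.0],   # tends to under-forecast
    "method_b": [115.0, 118.0, 130.0],  # tends to over-forecast
}

combined = combine(forecasts_by_method)
individual_mae = {
    name: mae(actuals, f) for name, f in forecasts_by_method.items()
}
combined_mae = mae(actuals, combined)
```

In this example the two methods err in opposite directions, so the combined forecast is more accurate than either one. When all methods err in the same direction, the combination only matches the average of their errors.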
The findings of the study have been verified and replicated through the use of new methods by other researchers.[7][8][9]
Newbold (1983) was critical of the M-competition, and argued against the general idea of using a single competition to attempt to settle the complex issue.[10]
Before the first competition: the Makridakis-Hibon study
Before the first M-Competition, Makridakis and Hibon[11] published an article in the Journal of the Royal Statistical Society (JRSS) showing that simple methods perform well in comparison to more complex and statistically sophisticated ones. Statisticians at the time criticized the results, claiming that they were not possible. Their criticism motivated the subsequent M, M2 and M3 Competitions, whose results supported those of the Makridakis and Hibon study.
Second competition, published in 1993
The second competition, called the M-2 Competition or M2-Competition, was conducted on a grander scale. A call to participate was published in the International Journal of Forecasting, announcements were made at the International Symposium on Forecasting, and a written invitation was sent to all known experts on the various time series methods. The M2-Competition was organized in collaboration with four companies, included six macroeconomic series, and was conducted on a real-time basis. The data came from the United States.[1] The results of the competition were published in a 1993 paper.[6] The results were claimed to be statistically identical to those of the M-Competition.[1]
The M2-Competition used far fewer time series than the original M-competition. Whereas the original M-competition had used 1001 time series, the M2-Competition used only 29, including 23 from the four collaborating companies and 6 macroeconomic series.[6] Data from the companies was obfuscated through the use of a constant multiplier in order to preserve proprietary privacy.[6] The purpose of the M2-Competition was to simulate real-world forecasting better in the following respects:[6]
- Allow forecasters to combine their trend-based forecasting method with personal judgment.
- Allow forecasters to ask additional questions requesting data from the companies involved in order to make better forecasts.
- Allow forecasters to learn from one forecasting exercise and revise their forecasts for the next forecasting exercise based on the feedback.
The Competition was organized as follows:[6]
- The first batch of data was sent to participating forecasters in the summer of 1987.
- Forecasters had the option of contacting the companies involved via an intermediary in order to gather additional information they considered relevant to making forecasts.
- In October 1987, forecasters were sent updated data.
- Forecasters were required to send in their forecasts by the end of November 1987.
- A year later, forecasters were sent an analysis of their forecasts and asked to submit their next forecast in November 1988.
- The final analysis and evaluation of the forecasts was done starting April 1991 when the actual, final values of the data including December 1990 were known to the collaborating companies.
In addition to the published results, many of the participants wrote short articles describing their experience participating in the competition and their reflections on what the competition demonstrated. Chris Chatfield praised the design of the competition, but said that despite the organizers' best efforts, forecasters still did not have the kind of inside access to the companies that he felt people would have in real-world forecasting.[12] Fildes and Makridakis (1995) argued that despite the evidence produced by these competitions, the implications continued to be ignored by theoretical statisticians.[13]
Third competition, published in 2000
The third competition, called the M-3 Competition or M3-Competition, was intended to both replicate and extend the features of the M-competition and M2-Competition, through the inclusion of more methods and researchers (particularly researchers in the area of neural networks) and more time series.[1] A total of 3003 time series were used. The paper documenting the results of the competition was published in the International Journal of Forecasting[1] in 2000, and the raw data was also made available on the International Institute of Forecasters website.[4] According to the authors, the conclusions from the M3-Competition were similar to those from the earlier competitions.[1]
The time series included yearly, quarterly, monthly, daily, and other time series. In order to ensure that enough data was available to develop an accurate forecasting model, minimum thresholds were set for the number of observations: 14 for yearly series, 16 for quarterly series, 48 for monthly series, and 60 for other series.[1]
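The minimum-length rule can be written as a simple filter. The following is a minimal sketch: the threshold values come from the text above, while the dictionary keys, the function name and the sample series are assumptions made for illustration and are not part of the competition materials.

```python
# Minimum number of observations per series type, as stated above.
MIN_OBSERVATIONS = {
    "yearly": 14,
    "quarterly": 16,
    "monthly": 48,
    "other": 60,
}


def meets_minimum_length(frequency, values):
    """Return True if a series has at least the minimum number of observations."""
    return len(values) >= MIN_OBSERVATIONS[frequency]


# Hypothetical candidate series: (frequency, observations).
candidates = {
    "series_1": ("yearly", [float(i) for i in range(14)]),     # exactly 14: kept
    "series_2": ("quarterly", [float(i) for i in range(12)]),  # 12 < 16: dropped
    "series_3": ("monthly", [float(i) for i in range(60)]),    # 60 >= 48: kept
}

accepted = [
    name
    for name, (frequency, values) in candidates.items()
    if meets_minimum_length(frequency, values)
]
```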
Time series were in the following domains: micro, industry, macro, finance, demographic, and other.[1][4] Below is the number of time series based on the time interval and the domain:[1][4]
Time interval between successive observations | Micro | Industry | Macro | Finance | Demographic | Other | Total |
---|---|---|---|---|---|---|---|
Yearly | 146 | 102 | 83 | 58 | 245 | 11 | 645 |
Quarterly | 204 | 83 | 336 | 76 | 57 | 0 | 756 |
Monthly | 474 | 334 | 312 | 145 | 111 | 52 | 1428 |
Other | 4 | 0 | 0 | 29 | 0 | 141 | 174 |
Total | 828 | 519 | 731 | 308 | 413 | 204 | 3003 |
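As a consistency check, the row and column totals of the table can be recomputed from the individual cells. The sketch below copies the counts from the table; the variable names are illustrative assumptions.

```python
DOMAINS = ["Micro", "Industry", "Macro", "Finance", "Demographic", "Other"]

# Number of series per time interval, in the column order of DOMAINS.
COUNTS = {
    "Yearly": [146, 102, 83, 58, 245, 11],
    "Quarterly": [204, 83, 336, 76, 57, 0],
    "Monthly": [474, 334, 312, 145, 111, 52],
    "Other": [4, 0, 0, 29, 0, 141],
}

# Sum across domains for each time interval (row totals).
row_totals = {interval: sum(row) for interval, row in COUNTS.items()}

# Sum across time intervals for each domain (column totals).
column_totals = {
    domain: sum(column)
    for domain, column in zip(DOMAINS, zip(*COUNTS.values()))
}

grand_total = sum(row_totals.values())
```

The recomputed margins agree with the totals in the table, including the overall count of 3003 series.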
The five measures used to evaluate the accuracy of different forecasts were: symmetric mean absolute percentage error (also known as symmetric MAPE), average ranking, median symmetric absolute percentage error (also known as median symmetric APE), percentage better, and median relative absolute error (median RAE).[1]
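The accuracy measures listed above can be sketched in a few lines. This is a minimal illustration that follows common textbook definitions, which may differ in detail from the competition's exact procedures. In particular, the choice of a naive benchmark for percentage better and median RAE, the handling of ties in the ranking, and all function names and sample values are assumptions.

```python
from statistics import mean, median


def sape(actual, forecast):
    """Symmetric absolute percentage error of one forecast, in percent."""
    return 200.0 * abs(actual - forecast) / (abs(actual) + abs(forecast))


def smape(actuals, forecasts):
    """Symmetric MAPE: mean of the symmetric APEs."""
    return mean(sape(a, f) for a, f in zip(actuals, forecasts))


def median_sape(actuals, forecasts):
    """Median symmetric APE."""
    return median(sape(a, f) for a, f in zip(actuals, forecasts))


def percentage_better(actuals, forecasts, benchmark):
    """Percentage of cases where the method's error is strictly below the benchmark's."""
    wins = sum(
        abs(a - f) < abs(a - b)
        for a, f, b in zip(actuals, forecasts, benchmark)
    )
    return 100.0 * wins / len(actuals)


def median_rae(actuals, forecasts, benchmark):
    """Median of |error| / |benchmark error|, skipping zero benchmark errors."""
    ratios = [
        abs(a - f) / abs(a - b)
        for a, f, b in zip(actuals, forecasts, benchmark)
        if a != b
    ]
    return median(ratios)


def average_ranks(errors_by_method):
    """Mean rank of each method across series (1 = smallest error).

    Ties are broken by the order in which methods are listed.
    """
    names = list(errors_by_method)
    n_series = len(errors_by_method[names[0]])
    totals = {name: 0 for name in names}
    for i in range(n_series):
        ordered = sorted(names, key=lambda name: errors_by_method[name][i])
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / n_series for name in names}


# Hypothetical values for illustration only.
actuals = [100.0, 200.0, 300.0, 400.0]
method = [110.0, 190.0, 300.0, 420.0]
naive = [90.0, 230.0, 280.0, 400.0]

scores = {
    "smape": smape(actuals, method),
    "median_sape": median_sape(actuals, method),
    "percentage_better": percentage_better(actuals, method, naive),
    "median_rae": median_rae(actuals, method, naive),
}
ranks = average_ranks({"a": [1.0, 5.0, 1.0], "b": [2.0, 3.0, 2.0]})
```

Because the relative ranking of methods can change with the measure used, reporting several measures side by side, as the competition did, gives a fuller picture than any single score.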
A number of other papers have been published with different analyses of the data set from the M3-Competition.[2][3]
Offshoots
NN3-Competition
Although the organizers of the M3-Competition did contact researchers in the area of artificial neural networks (ANNs) to seek their participation in the competition, only one researcher participated, and that researcher's forecasts fared poorly. The reluctance of most ANN researchers to participate at the time was due to the computationally intensive nature of ANN-based forecasting and the large number of time series used for the competition.[1] In 2005, Crone, Nikolopoulos and Hibon organized the NN-3 Competition, using 111 of the time series from the M3-Competition (not the same data, because it was shifted in time, but the same sources). The NN-3 Competition found that the best ANN-based forecasts performed comparably with the best known forecasting methods, but were far more computationally intensive. It was also noted that many ANN-based techniques fared considerably worse than simple forecasting methods, despite greater theoretical potential for good performance.[14]
Reception
In books for mass audiences
Nassim Nicholas Taleb, in his book The Black Swan, references the Makridakis Competitions as follows: "The most interesting test of how academic methods fare in the real world was provided by Spyros Makridakis, who spent part of his career managing competitions between forecasters who practice a "scientific method" called econometrics -- an approach that combines economic theory with statistical measurements. Simply put, he made people forecast in real life and then he judged their accuracy. This led to a series of "M-Competitions" he ran, with assistance from Michele Hibon, of which M3 was the third and most recent one, completed in 1999. Makridakis and Hibon reached the sad conclusion that "statistically sophisticated and complex methods do not necessarily provide more accurate forecasts than simpler ones.""[15]
In the book Everything is Obvious, Duncan Watts cites the work of Makridakis and Hibon as showing that "simple models are about as good as complex models in forecasting economic time series."[16]
References
- Makridakis, Spyros; Hibon, Michele (October–December 2000). "The M-3 Competition: results, conclusions, and implications" (PDF). International Journal of Forecasting (International Institute of Forecasters and Elsevier). doi:10.1016/S0169-2070(00)00057-1. Retrieved April 19, 2014.
- Koning, Alex J.; Franses, Philip Hans; Hibon, Michele; Stekler, H. O. (July–September 2005). "The M3 competition: Statistical tests of the results". International Journal of Forecasting (International Institute of Forecasters in collaboration with Elsevier). doi:10.1016/j.ijforecast.2004.10.003.
- Hyndman, Rob J.; Koehler, Anne B. (October–December 2006). "Another look at measures of forecast accuracy". International Journal of Forecasting (International Institute of Forecasters in collaboration with Elsevier) 22 (4).
- "M3-competition (full data)". International Institute of Forecasters. Retrieved April 19, 2014.
- Makridakis, Spyros; et al. (April–June 1982). "The accuracy of extrapolation (time series) methods: results of a forecasting competition". Journal of Forecasting 1 (2): 111–153. doi:10.1002/for.3980010202.
- Makridakis, Spyros; et al. (April 1993). "The M-2 Competition: a real-time judgmentally based forecasting study". International Journal of Forecasting 9: 5–22. doi:10.1016/0169-2070(93)90044-N.
- Geurts, M. D.; Kelly, J. P. (1986). "Forecasting demand for special services". International Journal of Forecasting 2: 261–272.
- Clemen, Robert T. (1989). "Combining forecasts: A review and annotated bibliography" (PDF). International Journal of Forecasting (International Institute of Forecasters) 5: 559–583. doi:10.1016/0169-2070(89)90012-5.
- Fildes, R.; Hibon, Michele; Makridakis, Spyros; Meade, N. (1998). "Generalising about univariate forecasting methods: further empirical evidence". International Journal of Forecasting. pp. 339–358. doi:10.1016/s0169-2070(98)00009-0.
- Newbold, Paul (1983). "The competition to end all competitions". Journal of Forecasting 2: 276–279.
- Makridakis, Spyros; Hibon, Michele (1979). "Accuracy of Forecasting: An Empirical Investigation". Journal of the Royal Statistical Society 142: 97–145.
- Chatfield, Chris (April 1993). "A personal view of the M2-competition". International Journal of Forecasting (International Institute of Forecasters in collaboration with Elsevier) 9 (1): 23–24. doi:10.1016/0169-2070(93)90045-O.
- Fildes, R.; Makridakis, Spyros (1995). "The impact of empirical accuracy studies on time series analysis and forecasting" (PDF). International Statistical Review 63: 289–308. doi:10.2307/1403481.
- Crone, Sven F.; Nikolopoulos, Konstantinos; Hibon, Michele (2005/6 (published online April 2008)). "Automatic Modelling and Forecasting with Artificial Neural Networks – A forecasting competition evaluation" (PDF). Retrieved April 23, 2014.
- Taleb, Nassim Nicholas. Fooled by Randomness. ISBN 0-8129-7521-9. Page 154, available for online viewing at Google Books.
- Watts, Duncan. Everything is Obvious. ISBN 978-0307951793. Page 315.