first workshop on mining of non-conventional data

mincoda'09

Our workshop includes a competition on time series forecasting. Participants have to forecast three time series: temperature, electricity prices, and ozone levels. Each time series has been divided into two consecutive parts, the first constituting the training set and the second the test set. Test sets will be made public after the submission deadline. Participants have to predict as accurately as possible the daily average values for the temperature time series, and hourly values for the electricity and ozone time series (read on for detailed descriptions of the data, objectives and evaluation). The top-scoring methods that entered the competition, and their results, will be presented at the workshop. The winner will receive free registration to the conference.

Objectives

In this contest we use three time series:
  1. Daily average measurements of temperature between Feb. 16, 2007 and Dec. 21, 2008
    (a total of 675 days = 675 values)
  2. Hourly measurements of electricity prices between Jan 1st, 2003 and Jun. 30th, 2006
    (a total of 24 values/day x 1279 days = 30696 values)
  3. Hourly measurements of ozone levels between Jan 8th, 2003 and Aug. 31st, 2008
    (a total of 24 values/day x 2063 days = 49512 values)
We have divided these three time series into two consecutive chunks, the first part constituting the training data, and the second part constituting the test data. Training sets are available in the training data section below. The test sets will be made public at some point after the submission deadline. There are three prediction objectives:
  1. Predict daily average temperatures, starting 22 Sep. 2008, and ending on 21 Dec. 2008 (91 values)
  2. Predict hourly electricity prices, starting 1 Jun. 2006 at 00:00, and ending on 30 Jun. 2006 23:00 (30 days x 24 values/day = 720 values)
  3. Predict hourly ozone levels, starting 1 Jun. 2008 at 01:00, and ending on 31 Aug. 2008 24:00 (92 days x 24 values/day = 2208 values)
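
The number of values quoted for each objective can be checked with a little date arithmetic. This is only a sketch: the helper name days_inclusive is hypothetical and is not part of the competition material.

```python
from datetime import date


def days_inclusive(start: date, end: date) -> int:
    """Number of calendar days from start to end, counting both endpoints."""
    return (end - start).days + 1


# Objective 1: one daily average per day.
temperature_values = days_inclusive(date(2008, 9, 22), date(2008, 12, 21))

# Objectives 2 and 3: 24 hourly values per day.
electricity_values = 24 * days_inclusive(date(2006, 6, 1), date(2006, 6, 30))
ozone_values = 24 * days_inclusive(date(2008, 6, 1), date(2008, 8, 31))

print(temperature_values, electricity_values, ozone_values)  # 91 720 2208
```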

Training data

A zip file containing the three time series can be downloaded here.
  • Temperature

    The following tab-delimited text file contains data from Feb. 16th, 2007 to Sep 21st, 2008. The first column contains dates and the second column daily average temperatures. The first row contains column titles.

  • Electricity

    The following tab-delimited text file contains data from Jan. 1st, 2003 to May 31st, 2006. Each row contains the data for one day (24 measurements). The first column contains the date. The first row contains the titles for the times of measurement.

  • Ozone

    The following tab-delimited text file contains data from Jan. 8th, 2003 to May 31st, 2008. Each row contains the data for one day (24 measurements). The first column contains the date. The first row contains the titles for the times of measurement. The file contains -1 measurements representing missing values.
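
The hourly files described above could be read along the following lines. This is a minimal sketch under stated assumptions: the function name read_hourly_series, the date format in the sample, and the use of a dot as decimal separator are all illustrative and should be checked against the actual files. Dates are kept as plain strings because the page does not specify their format.

```python
import csv
import io
from typing import Dict, List, Optional, TextIO


def read_hourly_series(
    handle: TextIO, missing: Optional[float] = None
) -> Dict[str, List[Optional[float]]]:
    """Read a tab-delimited file with one row per day.

    The first row (column titles) is skipped. The first column is kept as
    a date string; the remaining columns are parsed as floats. Values
    equal to `missing` (for example -1 in the ozone file) become None.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the row of column titles
    series: Dict[str, List[Optional[float]]] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        day, cells = row[0], row[1:]
        values: List[Optional[float]] = []
        for cell in cells:
            parsed = float(cell)
            values.append(None if missing is not None and parsed == missing else parsed)
        series[day] = values
    return series


# Tiny in-memory example (made-up values, only three columns for brevity).
sample = "date\t1\t2\t3\n2003-01-08\t10.5\t-1\t12\n"
ozone = read_hourly_series(io.StringIO(sample), missing=-1.0)
print(ozone)
```

For a real file, the handle would come from something like open(path, newline=""), where path is wherever the downloaded file was saved.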

Test dataset

A zip file containing the three time series can be downloaded here.

Evaluation

For each of the three objectives, we compute the mean relative error (MRE) as follows, where N is the number of values to predict, pred(i) is the participant's prediction and real(i) is the real value:

MRE = 1/N * sum_{i=1..N} |pred(i) - real(i)| / |real(i)|

The final score is the average of the three MRE scores:

final_score = 1/3 (MRE(1) + MRE(2) + MRE(3))

where MRE(j) refers to the MRE score for objective j.
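
Participants who want to score their own forecasts on held-out training data could use something like the sketch below. It assumes the MRE is the mean of |pred(i) - real(i)| / |real(i)|; the official formula may differ in detail, and the function names are illustrative.

```python
from typing import Sequence


def mean_relative_error(pred: Sequence[float], real: Sequence[float]) -> float:
    """Mean of |pred(i) - real(i)| / |real(i)| over the N values to predict.

    Assumption: absolute values are taken in numerator and denominator.
    A real value of 0 raises ZeroDivisionError, since the relative error
    is undefined there.
    """
    if len(pred) != len(real):
        raise ValueError("pred and real must have the same length")
    if len(real) == 0:
        raise ValueError("need at least one value")
    return sum(abs(p - r) / abs(r) for p, r in zip(pred, real)) / len(real)


def final_score(mre_scores: Sequence[float]) -> float:
    """Average of the three per-objective MRE scores."""
    if len(mre_scores) != 3:
        raise ValueError("expected one MRE per objective (three in total)")
    return sum(mre_scores) / 3


# Made-up numbers, purely for illustration.
example_mre = mean_relative_error([110.0, 90.0], [100.0, 100.0])
print(example_mre)
```

Missing values (such as the -1 entries in the ozone data) would need to be excluded before scoring; how the organisers handle them is not stated here.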

The winner will be the participant with the lowest final score according to the formula above. In case of a tie, the prize will be split evenly among the winners.

Important dates

  • Submission deadline: 20 Jul. 2009 (extended from 5 Jul. 2009)
  • Presentation of results: 13 Nov. 2009 (at the workshop)

Submission instructions

  • Submissions should follow the workshop paper submission instructions
  • Authors should attach three tab-delimited text files containing the predictions, one for each of the data sets. The format of these files should be the same as in the training dataset files, with the exception of temperature, where only daily average values should be reported. Here are three sample files, one for each of the predictions (the values in these files are completely random): temperature, electricity, ozone.
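
An hourly prediction file in the tab-delimited layout described above might be written as in the sketch below. The function name, the column titles, and the date format are assumptions made for illustration; the sample files remain the authoritative reference for the expected layout.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO, Tuple


def write_hourly_predictions(
    handle: TextIO,
    header: Sequence[str],
    rows: Iterable[Tuple[str, Sequence[float]]],
) -> None:
    """Write a title row, then one row per day: date followed by 24 values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for day, values in rows:
        if len(values) != 24:
            raise ValueError(f"{day}: expected 24 hourly values, got {len(values)}")
        writer.writerow([day, *values])


# Illustrative use with an in-memory buffer and placeholder values.
titles = ["date"] + [str(hour) for hour in range(24)]  # hypothetical column titles
buffer = io.StringIO()
write_hourly_predictions(buffer, titles, [("2006-06-01", [1.5] * 24)])
print(buffer.getvalue())
```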

Results

Winners will be announced at the workshop
