Title: An Online Learning System for the Prediction of Electricity Distribution Feeder Failures

Abstract: We use machine learning techniques to construct a failure-susceptibility ranking of the feeder cables that supply electricity to the boroughs of New York City. The electricity system is inherently dynamic, so our failure-susceptibility ranking system must adapt to the latest conditions in real time and update its ranking accordingly. The feeders have a significant failure rate, and many resources are devoted to monitoring, maintaining, and repairing them. The ability to predict failures allows a shift from reactive to proactive maintenance, thus reducing costs. The feature set for each feeder includes a mixture of static data (e.g., the age and composition of each feeder section) and dynamic data (e.g., electrical load data for a feeder and its transformers). The values of the dynamic features are captured at training time and therefore lead to different models depending on the time and day at which each model is trained. Previously, a framework was designed to train models using a new variant of boosting called Martingale Boosting, as well as Support Vector Machines. In that framework, however, an engineer had to decide whether to build a new model from the most recent data or to keep using the latest model for future predictions. To avoid the need for human intervention, we have developed an "online" system that determines which model to use by monitoring the past performance of previously trained models. In our new framework, we treat each batch-trained model as an expert and use a measure of its performance as the basis for rewarding or penalizing its quality score. We measure performance as a normalized average rank of failures: for example, in a ranking of 50 items with actual failures ranked #4 and #20, the performance is 1 - (4 + 20) / (2 * 50) = 0.76. Our approach builds on the notion of learning from expert advice as formulated in the continuous version of the Weighted Majority algorithm. Since each model is analogous to an expert and our system runs live, continually gathering new data and generating new models, we must keep adding new experts to the existing ensemble throughout the algorithm's execution. To avoid monitoring an ever-increasing set of experts, we drop poorly performing experts after each prediction. Our solution had to address two key issues: (1) how often and with what weight to add new experts, and (2) which experts to drop. Our simulations suggest that initializing new models with the median of all current models' weights works best; to drop experts, we use a combination of the model's age and its past performance. Finally, to make predictions we use a weighted average of the top-scoring experts. Our system is currently deployed and being tested by New York City's electricity distribution company. Results are highly encouraging: 75% of the failures in the summer of 2005 were ranked in the top 26%, and 75% of the failures in 2006 were ranked in the top 36%.
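
To make the ensemble mechanics described above concrete, the following is a minimal sketch of one way such a scheme could be realized: experts are penalized multiplicatively in the spirit of the continuous Weighted Majority algorithm, new models enter with the median of the current weights, predictions are a weighted average of the top-scoring experts, and pruning combines weight and age. The class and function names, the beta parameter, the top-k cutoff, and the specific drop rule are illustrative assumptions, not the deployed implementation.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List


def normalized_avg_rank(failure_ranks: List[int], n_items: int) -> float:
    """Performance of a ranking: 1 - (sum of failure ranks) / (num_failures * n_items).

    E.g., failures ranked #4 and #20 out of 50 items give
    1 - (4 + 20) / (2 * 50) = 0.76.
    """
    return 1.0 - sum(failure_ranks) / (len(failure_ranks) * n_items)


@dataclass
class Expert:
    model_id: str
    weight: float = 1.0
    age: int = 0                                             # rounds since the model was trained
    scores: Dict[str, float] = field(default_factory=dict)   # feeder -> susceptibility score


class OnlineEnsemble:
    def __init__(self, beta: float = 0.9, top_k: int = 10, max_experts: int = 50):
        self.beta = beta              # multiplicative penalty factor (assumed value)
        self.top_k = top_k            # number of top-weighted experts used for prediction
        self.max_experts = max_experts
        self.experts: List[Expert] = []

    def add_expert(self, expert: Expert) -> None:
        """New batch-trained models enter with the median of the current weights."""
        if self.experts:
            expert.weight = median(e.weight for e in self.experts)
        self.experts.append(expert)

    def predict(self, feeders: List[str]) -> List[str]:
        """Rank feeders by a weighted average of the top-k experts' scores."""
        top = sorted(self.experts, key=lambda e: e.weight, reverse=True)[: self.top_k]
        total = sum(e.weight for e in top) or 1.0
        combined = {
            f: sum(e.weight * e.scores.get(f, 0.0) for e in top) / total
            for f in feeders
        }
        # Highest combined susceptibility score is ranked first (rank #1).
        return sorted(feeders, key=lambda f: combined[f], reverse=True)

    def update(self, observed_failures: List[str], feeders: List[str]) -> None:
        """Reward or penalize each expert by its own ranking's performance, then prune."""
        n = len(feeders)
        for e in self.experts:
            own_ranking = sorted(feeders, key=lambda f: e.scores.get(f, 0.0), reverse=True)
            ranks = [own_ranking.index(f) + 1 for f in observed_failures]
            perf = normalized_avg_rank(ranks, n)      # in [0, 1], higher is better
            e.weight *= self.beta ** (1.0 - perf)     # continuous Weighted-Majority-style update
            e.age += 1
        # Drop poorly performing experts once the ensemble grows too large:
        # lowest weight first, ties broken toward the oldest model
        # (one possible age-plus-performance rule).
        while len(self.experts) > self.max_experts:
            worst = sorted(self.experts, key=lambda e: (e.weight, -e.age))[0]
            self.experts.remove(worst)
```

In this sketch, an expert's loss per round is 1 minus its normalized average rank of the observed failures, so a model that ranks failures near the top keeps most of its weight, while a model that ranks them low is penalized geometrically.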