A smart elephant for a smart-grid: storage and analytics of (electrical) time-series within Hadoop
Marie-Luce PICARD
EDF R&D, Clamart, France
Abstract: Smart-grid projects involve deploying large numbers of meters. These meters generate massive data streams that utilities will have to handle, and they also create significant business opportunities that will call for innovation.
In France, the EDF Group will have to manage data from the 35 million meters planned for installation within a few years. New perspectives are emerging for business teams dealing with energy management and smart-grid issues, provided that they can exploit metering data in an agile way. Business use cases require efficient analytical and point queries, as well as support for more complex data analysis: classical data-mining tasks (clustering, classification, scoring) for customer insight or the detection of non-technical losses, and in-house methods for specific applications such as forecasting or pattern recognition in time series. Scalable and efficient solutions for storing and processing metering data are therefore needed.
In this session we will describe forward-looking work on designing and implementing a Hadoop-based solution to address these issues. We will focus on various technical aspects (architecture, data structure, Hive and HBase storage). For the analytics part, we will present our use of existing toolkits (e.g. Mahout) and of our own Hadoop On Time-series toolkit, which implements well-known reduction methods (Fourier, SAX) and innovative data-mining algorithms. We will present feedback from experiments, guided by business needs, carried out on 35 million load curves (represented by 1,800 billion records).
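To make the idea of a reduction method concrete, here is a minimal single-machine sketch of SAX (Symbolic Aggregate approXimation) in Python. It is not the implementation used in the Hadoop On Time-series toolkit, which is not shown in this abstract. The function names `paa` and `sax`, the default four-letter alphabet, and the requirement that the series length be a multiple of the segment count are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, pstdev


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width chunk."""
    n = len(series)
    if n % n_segments != 0:
        # Simplifying assumption of this sketch, not a requirement of SAX itself.
        raise ValueError("len(series) must be a multiple of n_segments")
    width = n // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax(series, n_segments, alphabet="abcd"):
    """Reduce a numeric series to a short word over `alphabet`."""
    # Step 1: z-normalize so that load curves of different scale are comparable.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]

    # Step 2: breakpoints cut the standard normal into equiprobable regions,
    # one region per letter.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # Step 3: average each segment (PAA) and map the average to a letter.
    word = []
    for value in paa(normalized, n_segments):
        index = sum(value > b for b in breakpoints)
        word.append(alphabet[index])
    return "".join(word)
```

For example, `sax([1, 2, 3, 4, 5, 6, 7, 8], 4)` returns `"abcd"`: a steadily rising curve of eight readings is summarized by four letters. In a Hadoop setting, a function of this kind could plausibly be applied to each load curve independently, for instance in a map step, although the abstract does not describe how the toolkit distributes the work.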
Biography: Marie-Luce Picard is a project manager and BI expert at EDF R&D. She has been involved in, or has managed, various R&D projects dealing with business intelligence and information systems (advanced documentation systems, data mining for enriching customer knowledge, …). She has also managed the EDF R&D team working on BI and analytics, and was the assistant general manager of the Joint Lab on BI between EDF and Telecom ParisTech (BILab). She currently manages the EDF R&D Big Data project, which addresses the changes to EDF information systems needed to handle the data deluge expected within a few years, affecting all of the company's businesses.