Abstract: This talk presents a tutorial on how to run Hadoop MapReduce jobs on the Lustre file system, along with results from the TeraSort and DFSIO benchmarks.
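As context for the benchmarks mentioned above, TeraSort and DFSIO are typically launched from the standard Hadoop example and test jars. The jar file names and the exact flags vary by Hadoop release and distribution, so the invocations below are an illustrative sketch rather than the exact commands used in the talk:

```shell
# Generate roughly 10 GB of input (100 million 100-byte rows), then sort it.
# Jar paths and names vary by distribution; adjust to your installation.
hadoop jar hadoop-mapreduce-examples.jar teragen 100000000 /benchmarks/tera-in
hadoop jar hadoop-mapreduce-examples.jar terasort /benchmarks/tera-in /benchmarks/tera-out

# DFSIO throughput test: write 16 files of 1 GB each.
# The size flag is -fileSize or -size depending on the Hadoop version.
hadoop jar hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 16 -fileSize 1GB
```

Both benchmarks report elapsed time and aggregate throughput, which makes them convenient for comparing an HDFS-backed cluster against the same cluster running over Lustre.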
Hadoop is becoming increasingly popular for data analytics. The MapReduce framework is an effective way to process large volumes of data and aggregate the results. Many researchers in the HPC field are interested in leveraging the Hadoop MapReduce framework to solve problems that likewise involve large-scale data processing.
However, the default Hadoop deployment mode uses the Hadoop Distributed File System (HDFS) and therefore requires local storage, which presents a challenge in HPC data centers, as much HPC computing infrastructure, e.g. blade systems, has little or no local storage.
The Intel Lustre team developed a Hadoop Lustre adapter to address this challenge. In 2013, Intel engineer Omkar Kulkarni presented the first prototype of the Hadoop Adapter for Lustre and the early work on it.
This year, Intel has integrated and expanded this effort in its HPC distribution for Hadoop. We would like to present a detailed tutorial on how to run Hadoop MapReduce jobs on Lustre. We will present not only the benefits, but also discuss the areas where the Hadoop Lustre Adapter may be useful for HPC customers.
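To sketch what replacing HDFS with Lustre looks like in practice: a Hadoop deployment selects its file system through `core-site.xml`. The `lustre:///` scheme, the adapter class name, and the mount-point property below are illustrative assumptions, not the adapter's confirmed property names; consult the adapter documentation for the exact values in a given release:

```xml
<!-- core-site.xml: point Hadoop at a shared Lustre mount instead of HDFS.
     Scheme, class, and property names here are illustrative assumptions. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>lustre:///</value>
  </property>
  <property>
    <name>fs.lustre.impl</name>
    <value>org.apache.hadoop.fs.LustreFileSystem</value>
  </property>
  <property>
    <!-- Lustre mount point visible on all compute nodes -->
    <name>fs.root.dir</name>
    <value>/mnt/lustre/hadoop</value>
  </property>
</configuration>
```

Because every compute node sees the same Lustre namespace, no HDFS NameNode or DataNode daemons are needed, which is precisely what makes this approach attractive on diskless blade systems.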