TERATEC FORUM, Workshop 7
Performance Comparison of SQL based Big Data Analytics using Lustre and HDFS file systems

The performance benefits of parallel processing technology have led to the migration of existing RDBMS applications to big data technologies such as Hadoop and Hive. This migration brings the additional challenge of matching the performance of parallel RDBMSs while processing data in parallel on clusters of commodity nodes. This raises the need to replace traditional file systems such as HDFS with parallel file systems such as Lustre. Moreover, the convergence of HPC with big data further motivates a unified file system that avoids data transfer across different subsystems.

In this presentation, we share a performance comparison of HDFS and Intel Lustre for FSI, Telecom and Insurance SQL workloads. We evaluate application performance on an integrated stack of Hive and Lustre through Hive extensions such as the Hadoop Adapter for Lustre (HAL) developed by Intel, and compare it against the Hadoop Distributed File System (HDFS).

The environment used for this evaluation will be hosted in the Intel BigData Lab in Swindon (UK). The cluster consists of 16 Intel Ivy Bridge nodes connected by an Intel TrueScale InfiniBand network and set up with CDH 5.2. Another similar cluster will be used to measure HDFS performance for comparison. We use Intel Enterprise Edition for Lustre 2.2 (based on Lustre 2.5) and Hadoop Adapter for Lustre 3.1 for the experiment. Both systems will be evaluated on the performance metric ‘query average response time’ for the FSI workload. Tests will be run for application data volumes varying from 100 GB to 7 TB.
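The ‘query average response time’ metric named above could be computed along the following lines. This is only a minimal sketch, not the harness used in the evaluation: the function names (`time_query`, `average_response_time`, `compare_backends`) and the backend labels are hypothetical, and the way a query is actually submitted to Hive is left to the caller.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List


def time_query(run_query: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one execution of run_query.

    run_query is a hypothetical caller-supplied callable that submits one
    SQL query (for example to Hive) and blocks until the result is back.
    """
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start


def average_response_time(timings: Iterable[float]) -> float:
    """Arithmetic mean of per-query response times, in seconds."""
    values = list(timings)
    if not values:
        raise ValueError("no timings recorded")
    return mean(values)


def compare_backends(results: Dict[str, List[float]]) -> Dict[str, float]:
    """Map each backend label (e.g. "hdfs", "lustre") to its average response time."""
    return {backend: average_response_time(t) for backend, t in results.items()}
```

In use, the same set of workload queries would be timed once per file system and per data volume, and the resulting averages compared side by side.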
© Ter@tec - All rights reserved