CTBT: Science and Technology 2017 Conference

Name: CTBT: Science and Technology 2017 Conference
Start: 2017-06-26T02:00:00+02:00
End: 2017-06-30T20:00:00+02:00
Location: No location set

26–30 Jun 2017

Europe/Vienna timezone

Development of U.S. NDC Performance Metrics Through Large Scale Analysis of System Log Files With Hadoop Distributed File System Based Tools

Not scheduled

Poster 3. Advances in sensors, networks and processing

W. N. Junek (United States National Data Center)

Optimizing the performance of the United States National Data Center (U.S. NDC) geophysical data processing system is critical for efficiently monitoring international compliance to nuclear test ban treaties. The U.S. NDC software stores system performance information for each data processing interval in a collection of semi-structured alphanumeric log files. On average, the system generates 140,000 log files per day which are stored in different directories. Currently, acquisition of process specific performance information or isolation of error messages must be parsed from each log-file individually. This manual parsing process is time consuming and often leads to incomplete collections of system performance information. The U.S. NDC system has been modified to output log files in JavaScript Object Notation (JSON), which is a highly structured data format that can be easily parsed. Here, we show how U.S. NDC system performance information can be parsed from JSON files and analyzed using a collection of Hadoop Distributed File System (HDFS) based tools such as Hive, Zeppelin, and PySpark. The HDFS system architecture is designed to store and process large alphanumeric datasets, which can be easily scaled to accommodate our continuously growing collection of performance information extracted from U.S. NDC log files.

W. N. Junek (United States National Data Center)

T3.1-P20_Junek_posters_Development_of_U.S._NDC_P.pdf

CTBT: Science and Technology 2017 Conference

Development of U.S. NDC Performance Metrics Through Large Scale Analysis of System Log Files With Hadoop Distributed File System Based Tools

Speaker

Description

Author

Presentation materials

Choose timezone

CTBT: Science and Technology 2017 Conference

Speaker

Description

Author

Presentation materials