A Real-Time Intrusion Detection System based on Machine Learning and Big Data Techniques

Farah Jemili

Abstract


Cybersecurity Ventures expects the damage costs of cyber-attacks to rise to $11.5 billion in 2019 and expects a business to fall victim to a cyber-attack every 14 seconds. Note that the time frame for such an event is measured in seconds. With petabytes of data generated each day, detecting attacks at this pace is a challenging task for traditional intrusion detection systems (IDSs). Protecting sensitive information is a major concern for both businesses and governments, so a real-time, large-scale, and effective IDS is needed. In this work we present a cloud-based, fault-tolerant, scalable, and distributed IDS that uses Apache Spark Structured Streaming (through PySpark) and Spark's machine learning library (MLlib) to detect intrusions in real time. To demonstrate the effectiveness of this system, we implement it on the Microsoft Azure cloud, which provides both processing power and storage capacity. A decision tree algorithm is used to predict the nature of incoming data. The MAWILab dataset serves as the data source, giving better insight into the system's capabilities against cyber-attacks. In our experiments, the proposed system reached 99.95% accuracy and processed more than 55,175 events per second on a small cluster.
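The processing pattern described above (events arrive continuously, are grouped into small batches, are labelled by a decision tree, and throughput is reported in events per second) can be illustrated with a minimal standard-library Python sketch. This is a stand-in for illustration only, not the Spark implementation: in the system described here, Structured Streaming would handle the batching and an MLlib decision tree trained on MAWILab would do the classification. The feature names (`packets_per_sec`, `distinct_ports`), the thresholds, and the helper names are all hypothetical.

```python
import time
from typing import Dict, Iterable, Iterator, List, Tuple

Event = Dict[str, float]


def classify(event: Event) -> str:
    """Hand-written two-level decision tree.

    Feature names and thresholds are made up for illustration; a real
    tree would be learned from labelled traffic.
    """
    if event["packets_per_sec"] > 1000.0:
        return "anomalous"
    if event["distinct_ports"] > 50.0:
        return "anomalous"
    return "benign"


def micro_batches(events: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Group an event stream into lists of at most `size` events."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def run(events: Iterable[Event], batch_size: int) -> Tuple[List[str], float]:
    """Label every event batch by batch and return (labels, events per second)."""
    start = time.perf_counter()
    labels: List[str] = []
    for batch in micro_batches(events, batch_size):
        labels.extend(classify(e) for e in batch)
    elapsed = time.perf_counter() - start
    rate = len(labels) / elapsed if elapsed > 0 else float("inf")
    return labels, rate
```

Calling `run(stream, batch_size=1000)` on any iterable of event dictionaries returns the predicted labels and the measured throughput. The rate depends entirely on the machine and the workload, so it is not comparable to the cluster figure reported above.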

