Employing machine learning algorithms to detect unknown scanning and email worms

Shubair Abdulla; Sureswaran Ramadass; Altyeb Altaher; Amer Al-Nassiri

Instructional & Learning Technologies

Research output: Contribution to journal, Article, peer-reviewed

11 Citations (Scopus)

Abstract

We present a worm detection system that leverages the reliability of IP-Flow and the effectiveness of machine learning. A host infected by a scanning or email worm typically initiates a significant amount of traffic that does not rely on DNS to translate names into numeric IP addresses. Based on this observation, we capture and classify NetFlow records to extract a feature pattern for each PC on the network within a given time window. A feature pattern consists of the number of DNS requests, the number of DNS responses, the number of DNS normals, and the number of DNS anomalies. Two learning algorithms, K-Nearest Neighbors (KNN) and Naive Bayes (NB), are used for classification. Cross-validation and the paired t-test are used to compare the performance of the KNN and NB algorithms. Classification accuracy, false alarm rate, and training time serve as the performance metrics for determining which algorithm is superior. The data set used to train and test the algorithms was created from 18 real-life worm variants together with a large volume of benign flows.
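The abstract does not define exactly when a flow counts as a "DNS normal" or a "DNS anomaly", so the minimal Python sketch below assumes one plausible reading: an outgoing connection is "normal" if its destination address is one the host learned through DNS, and an "anomaly" otherwise. It also sketches the paired t statistic used to compare two classifiers from their per-fold cross-validation accuracies. The `Flow` record, the `resolved` mapping, and the functions `feature_patterns` and `paired_t_statistic` are hypothetical names, not the authors' implementation, and KNN and NB themselves are not implemented here; any standard implementation could supply the per-fold accuracies.

```python
import math
import statistics
from collections import defaultdict
from dataclasses import dataclass

DNS_PORT = 53


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow record (not a real NetFlow v5/v9 layout)."""
    src: str
    dst: str
    src_port: int
    dst_port: int


def feature_patterns(flows, local_hosts, resolved):
    """Return {host: [dns_requests, dns_responses, dns_normals, dns_anomalies]}.

    `resolved` maps each local host to the set of IP addresses it is assumed
    to have learned through DNS. Flow records alone do not carry DNS answers,
    so this mapping must come from elsewhere (an assumption of this sketch).
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for f in flows:
        if f.src in local_hosts and f.dst_port == DNS_PORT:
            counts[f.src][0] += 1  # outgoing DNS request
        elif f.dst in local_hosts and f.src_port == DNS_PORT:
            counts[f.dst][1] += 1  # incoming DNS response
        elif f.src in local_hosts:
            if f.dst in resolved.get(f.src, set()):
                counts[f.src][2] += 1  # destination was obtained via DNS
            else:
                counts[f.src][3] += 1  # destination never resolved via DNS
    return dict(counts)


def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic for the per-fold accuracies of two classifiers.

    Positive values favour classifier A. Compare |t| against a Student t
    distribution with k - 1 degrees of freedom, where k is the number of folds.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length lists covering at least 2 folds")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all fold differences are equal; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```

For 10-fold cross-validation, a |t| above roughly 2.262 (9 degrees of freedom) would indicate a significant difference at the two-sided 5% level.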

Original language	English
Pages (from-to)	140-148
Number of pages	9
Journal	International Arab Journal of Information Technology
Volume	11
Issue number	2
Publication status	Published - Mar 2014

Keywords

Email worms
IP-Flow
KNN
NB
Netflow
Scanning worms

ASJC Scopus subject areas

General Computer Science

Cite this

@article{852d522f88184acd96e0cd912218e62e,
  title = "Employing machine learning algorithms to detect unknown scanning and email worms",
  abstract = "We present a worm detection system that leverages the reliability of IP-Flow and the effectiveness of learning machines. Typically, a host infected by a scanning or an email worm initiates a significant amount of traffic that does not rely on DNS to translate names into numeric IP addresses. Based on this fact, we capture and classify NetFlow records to extract feature patterns for each PC on the network within a certain period of time. A feature pattern includes: No of DNS requests, no of DNS responses, no of DNS normals, and no of DNS anomalies. Two learning machines are used, K-Nearest Neighbors (KNN) and Naive Bayes (NB), for the purpose of classification. Solid statistical tests, the cross-validation and paired t-test, are conducted to compare the individual performance between the KNN and NB algorithms. We used the classification accuracy, false alarm rates, and training time as metrics of performance to conclude which algorithm is superior to another. The data set used in training and testing the algorithms is created by using 18 real-life worm variants along with a big amount of benign flows.",
  keywords = "Email worms, IP-Flow, KNN, NB, Netflow, Scanning worms",
  author = "Shubair Abdulla and Sureswaran Ramadass and Altyeb Altaher and Amer Al-Nassiri",
  year = "2014",
  month = mar,
  language = "English",
  volume = "11",
  pages = "140--148",
  journal = "International Arab Journal of Information Technology",
  issn = "1683-3198",
  publisher = "Zarqa University",
  number = "2",
}

TY  - JOUR
T1  - Employing machine learning algorithms to detect unknown scanning and email worms
AU  - Abdulla, Shubair
AU  - Ramadass, Sureswaran
AU  - Altaher, Altyeb
AU  - Al-Nassiri, Amer
PY  - 2014/3
Y1  - 2014/3
N2  - We present a worm detection system that leverages the reliability of IP-Flow and the effectiveness of learning machines. Typically, a host infected by a scanning or an email worm initiates a significant amount of traffic that does not rely on DNS to translate names into numeric IP addresses. Based on this fact, we capture and classify NetFlow records to extract feature patterns for each PC on the network within a certain period of time. A feature pattern includes: No of DNS requests, no of DNS responses, no of DNS normals, and no of DNS anomalies. Two learning machines are used, K-Nearest Neighbors (KNN) and Naive Bayes (NB), for the purpose of classification. Solid statistical tests, the cross-validation and paired t-test, are conducted to compare the individual performance between the KNN and NB algorithms. We used the classification accuracy, false alarm rates, and training time as metrics of performance to conclude which algorithm is superior to another. The data set used in training and testing the algorithms is created by using 18 real-life worm variants along with a big amount of benign flows.
AB  - We present a worm detection system that leverages the reliability of IP-Flow and the effectiveness of learning machines. Typically, a host infected by a scanning or an email worm initiates a significant amount of traffic that does not rely on DNS to translate names into numeric IP addresses. Based on this fact, we capture and classify NetFlow records to extract feature patterns for each PC on the network within a certain period of time. A feature pattern includes: No of DNS requests, no of DNS responses, no of DNS normals, and no of DNS anomalies. Two learning machines are used, K-Nearest Neighbors (KNN) and Naive Bayes (NB), for the purpose of classification. Solid statistical tests, the cross-validation and paired t-test, are conducted to compare the individual performance between the KNN and NB algorithms. We used the classification accuracy, false alarm rates, and training time as metrics of performance to conclude which algorithm is superior to another. The data set used in training and testing the algorithms is created by using 18 real-life worm variants along with a big amount of benign flows.
KW  - Email worms
KW  - IP-Flow
KW  - KNN
KW  - NB
KW  - Netflow
KW  - Scanning worms
UR  - http://www.scopus.com/inward/record.url?scp=84899800864&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84899800864&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:84899800864
SN  - 1683-3198
VL  - 11
SP  - 140
EP  - 148
JO  - International Arab Journal of Information Technology
JF  - International Arab Journal of Information Technology
IS  - 2
ER  - 
