An Improvised k-NN Respecting Diversity of Data for Network Intrusion Detection
Résumé
Network Intrusion Detection is a complex classification problem aimed at discriminating the legitimate
from illegitimate and potentially harmful network
connections over the communication network. What adds to the
complexity of the problem is the near real
-
time response to a threat, imbalanced datasets to deal with and finally the
data being mixed in nature with some features being numeric some discrete an
d some nominal. In this work we have
applied
Synthetic Minority Oversampling Technique
(
SMOTE
)
to balance the dataset and eliminate the skewness of
the class distribution. The success of
k
-
Nearest Neighbour (
k
-
NN
)
depends upon the set of neighbours deemed to be
very close or similar to a data point which is in turn determined by the similarity
/distance
metric empl
oyed, where
most of the metrics employed in literature deal with numeric data only, and either need con
version of categorical
features to numeric features or simply eliminated the categorical features, which often leads to reduction in the results.
As for
this work is considered, we take into consideration both the categories of features simultaneously by
replac
ing
the conventional Euclidean metric with Gower metric, which is better suited for mixed data
. Gower metric provides a
mechanism to deal with heterogeneous features differently and ultimately yields a quantifiable value that determines
the similarit
y of the two instances.
Experimental results show that improvised version of
k
-
NN outperforms its
conventional counterpart in terms of the Accuracy, Detection Rate, Precision, Recall, f
-
Measure, and
Receiver
Operating Characteristic (
ROC
) curve
.