Table 1.
Representative related work and positioning of TAN-IDS.
Table 2.
Related work matrix comparing evaluation capabilities across representative IDS studies.
Fig 1.
Overview of the proposed TAN-IDS framework.
TAN-IDS unifies heterogeneous IDS datasets into a common NetFlow feature space and organizes domain-aware evaluation scenarios, including in-dataset, cross-dataset, mixed-domain, and transfer-aware fine-tuning.
Fig 2.
Detailed pipeline of the TAN-IDS evaluation framework.
The scenario controller defines domain-aware evaluation protocols, including in-dataset, cross-dataset, mixed-domain, and transfer-aware fine-tuning, independently of the classifier choice. Pluggable classifiers (Random Forest, XGBoost, Multilayer Perceptron, with an optional Transformer-based baseline) are trained and evaluated within a unified 8-dimensional NetFlow feature space. All preprocessing statistics are computed exclusively on training splits to prevent data leakage.
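The leakage-avoidance rule stated in this caption (preprocessing statistics computed exclusively on training splits) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation; the function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_standardizer(train):
    # Compute mean/std on the TRAINING split only, so no test-set
    # information leaks into the preprocessing statistics.
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return mu, sigma

def apply_standardizer(x, mu, sigma):
    # Reuse the train-split statistics on any split (train, test, target domain).
    return (x - mu) / sigma

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(100, 8))  # 8-dimensional NetFlow feature space
X_test = rng.normal(5.0, 2.0, size=(20, 8))

mu, sigma = fit_standardizer(X_train)
Z_train = apply_standardizer(X_train, mu, sigma)
Z_test = apply_standardizer(X_test, mu, sigma)  # train stats only, never refit on test
```

The same discipline applies in the cross-dataset scenarios: statistics fitted on the source-domain training split are reused unchanged when scoring the target domain.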
Table 3.
Transfer-aware evaluation scenarios supported by TAN-IDS.
Table 4.
Statistics of the NetFlow datasets used in TAN-IDS.
Table 5.
Mapping and description of the eight common NetFlow features shared between UNSW-NB15 (NetFlow) and NF-CSE-CIC-IDS2018. Non-continuous fields (protocol and TCP flags) are encoded deterministically; ports are treated as bounded numeric variables.
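The deterministic encoding described here might look like the following sketch. The function and dictionary names are hypothetical (the caption does not specify the exact scheme); the protocol numbers are the standard IANA assignments, and ports/flag bytes are clipped to their natural numeric ranges rather than one-hot or learned encodings.

```python
# IANA-assigned protocol numbers: a fixed, deterministic mapping.
PROTO_CODES = {"tcp": 6, "udp": 17, "icmp": 1}

def encode_flow_fields(proto, tcp_flags, src_port, dst_port):
    # Protocol name -> fixed numeric code (0 for unknown protocols).
    proto_code = PROTO_CODES.get(proto.lower(), 0)
    # TCP flags: the flag byte treated as a bounded integer in [0, 255].
    flags = max(0, min(int(tcp_flags), 255))
    # Ports: bounded numeric variables clipped to the valid 16-bit range.
    sp = max(0, min(int(src_port), 65535))
    dp = max(0, min(int(dst_port), 65535))
    return [proto_code, flags, sp, dp]
```

Because the mapping is fixed rather than fitted, the same encoder applies identically across datasets, which is what makes the unified feature space comparable across domains.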
Table 6.
Key hyperparameters for MLP, Random Forest, XGBoost, and FlowTransformer-lite (baseline) in TAN-IDS.
Fig 3.
Performance comparison across evaluation scenarios.
(a–b) In-dataset performance measured by F1-macro and attack recall. (c–d) Cross-dataset evaluation results. Preprocessing statistics are computed on training splits only to avoid data leakage.
Table 7.
In-dataset evaluation results (S1–S8).
Table 8.
Cross-dataset evaluation results (S9–S16).
Fig 4.
Transfer and adaptation performance across domains.
(a–b) Mixed-domain training results measured by F1-macro and attack recall. (c–d) Fine-tuning with limited labeled target-domain data.
Fig 5.
Attack-class precision–recall across cross-dataset and fine-tuning scenarios.
The results illustrate how domain mismatch and limited target-domain adaptation jointly affect attack detection sensitivity and false-alarm propensity.
Table 9.
Mixed-domain training results (S17–S24).
Table 10.
Fine-tuning results with 5% labeled target subset (S25–S32).