Benchmark dataset using Function Call Graphs (FCG) for malware detection and classification
The FCG-MFD dataset is a large-scale benchmark dataset designed for malware family detection and classification using Function Call Graphs (FCG).
It contains 100,000 samples (50K malware + 50K benign) collected from multiple real-world sources.
This dataset enables advanced research in:
- Malware detection
- Family classification
- Graph-based machine learning
- Cyber threat intelligence

Key features:
- ✅ 100,000 total samples
- ✅ 50,000 malware samples
- ✅ 50,000 benign samples
- ✅ 35+ malware families
- ✅ Function Call Graph (FCG) representation
- ✅ Metadata + behavioral features

Malware types covered include:
- Trojan
- Ransomware
- Worm
- Backdoor
- Botnet
- Virus

The dataset is constructed using:
- Malware sources:
  - VirusShare
  - VirusSample
  - MalwareBazaar
  - VX-Underground
  - theZoo
- Analysis tools:
  - VirusTotal
  - Cuckoo Sandbox
- Feature extraction:
  - Function Call Graphs (FCG)
  - Behavioral analysis
  - Metadata (MD5, PE info)
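
To make the FCG representation concrete, the sketch below builds a small directed call graph from `(caller, callee)` pairs and derives a few simple structural features. It is illustrative only: this README does not specify the dataset's on-disk format, so the edge-list input, the function names in `example_edges`, and the helpers `build_fcg`, `degree_features`, and `reachable` are all assumptions made for the example.

```python
def build_fcg(edges):
    """Build a directed call graph as {function: set(callees)}.

    `edges` is an iterable of (caller, callee) pairs. This input format is
    an assumption for illustration, not the dataset's documented format.
    """
    graph = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())  # leaf functions are nodes too
    return graph


def degree_features(graph):
    """Compute basic size and degree statistics of the call graph."""
    out_deg = {node: len(callees) for node, callees in graph.items()}
    in_deg = {node: 0 for node in graph}
    for callees in graph.values():
        for callee in callees:
            in_deg[callee] += 1
    return {
        "num_nodes": len(graph),
        "num_edges": sum(out_deg.values()),
        "max_out_degree": max(out_deg.values(), default=0),
        "max_in_degree": max(in_deg.values(), default=0),
    }


def reachable(graph, start):
    """Return every function reachable from `start` (depth-first search)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


# Hypothetical call edges, invented for this example.
example_edges = [
    ("entry", "decrypt_payload"),
    ("entry", "CreateFileW"),
    ("decrypt_payload", "VirtualAlloc"),
    ("decrypt_payload", "memcpy"),
    ("unused_helper", "memcpy"),
]

fcg = build_fcg(example_edges)
features = degree_features(fcg)
print(features)
print(sorted(reachable(fcg, "entry")))
```

Feature vectors like these can feed a classical classifier, while the adjacency structure itself is the natural input for graph-based models.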

Intended applications:
- Malware detection
- Malware family classification
- Graph Neural Networks (GNN)
- Intrusion Detection Systems (IDS)
- Cybersecurity research
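
For detection or family-classification experiments, a stratified train/test split keeps the class proportions the same in both partitions. The following is a minimal sketch under stated assumptions: the record fields (`md5`, `label`, `family`) and the helper `stratified_split` are hypothetical and do not reflect the dataset's actual schema or any evaluation protocol defined by the authors.

```python
import random
from collections import defaultdict


def stratified_split(records, key, test_fraction=0.2, seed=0):
    """Split records so each value of `key` keeps the same test fraction."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    train, test = [], []
    for label in sorted(groups):  # sorted for a deterministic order
        items = groups[label][:]
        rng.shuffle(items)
        n_test = round(len(items) * test_fraction)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Toy records with placeholder identifiers (hypothetical schema).
records = [
    {"md5": f"malware-{i}", "label": "malware",
     "family": "Trojan" if i % 2 else "Ransomware"}
    for i in range(10)
] + [
    {"md5": f"benign-{i}", "label": "benign", "family": "benign"}
    for i in range(10)
]

train, test = stratified_split(records, key="label", test_fraction=0.2)
print(len(train), len(test))
```

Passing `key="family"` on the malware subset gives the analogous split for family classification.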
If you use this dataset, please cite:
@article{HADI2025104050,
title = {FCG-MFD: Benchmark function call graph-based dataset for malware family detection},
journal = {Journal of Network and Computer Applications},
volume = {233},
pages = {104050},
year = {2025},
issn = {1084-8045},
doi = {10.1016/j.jnca.2024.104050},
url = {https://www.sciencedirect.com/science/article/pii/S1084804524002273},
author = {Hassan Jalil Hadi and Yue Cao and Sifan Li and Naveed Ahmad and Mohammed Ali Alshara},
keywords = {Malware detection, Malware family classification, Function Call Graph, Dataset},
}

CyberSar Lab 🔗 https://cybersar.kaust.edu.sa/
🛡️ Advancing Malware Analysis with Graph-Based Intelligence
