
CyberSaR-KAUST/Function-Call-Graph-Malware-Family-Detection-Dataset


🧬 FCG-MFD: Malware Family Detection Dataset

Benchmark dataset using Function Call Graphs (FCG) for malware detection and classification


📌 Overview

FCG-MFD is a large-scale benchmark for malware family detection and classification based on Function Call Graphs (FCGs).

It contains 100,000 samples (50,000 malware and 50,000 benign) collected from multiple real-world sources.

This dataset enables advanced research in:

  • Malware detection
  • Family classification
  • Graph-based machine learning
  • Cyber threat intelligence

📊 Dataset Highlights

  • ✅ 100,000 total samples
  • ✅ 50,000 malware samples
  • ✅ 50,000 benign samples
  • ✅ 35+ malware families
  • ✅ Function Call Graph (FCG) representation
  • ✅ Metadata + behavioral features
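
Because the dataset is balanced across classes and spans many families, evaluation splits usually need to preserve label proportions. The following is a minimal sketch of a stratified train/test split using only the standard library. The label list is a toy stand-in (50 + 50 instead of 50K + 50K), and `stratified_split` is a hypothetical helper, not part of the dataset's tooling.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each label keeps its proportion."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy stand-in for the balanced dataset (assumption: one label per sample).
labels = ["malware"] * 50 + ["benign"] * 50
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
```

The same function works unchanged if the labels are family names rather than the binary malware/benign label, which keeps rare families represented in both partitions.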

🧬 Malware Categories

  • Trojan
  • Ransomware
  • Worms
  • Backdoor
  • Botnet
  • Virus

🧠 Methodology

The dataset is constructed using:

  • Malware sources:

    • VirusShare
    • VirusSample
    • MalwareBazaar
    • VX-Underground
    • theZoo
  • Analysis tools:

    • VirusTotal
    • Cuckoo Sandbox
  • Feature extraction:

    • Function Call Graphs (FCG)
    • Behavioral analysis
    • Metadata (MD5, PE info)
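
To make the feature-extraction step more concrete, here is a minimal sketch of how a function call graph and its MD5 metadata might be represented in Python. The on-disk format of the dataset is not described here, so the edge list, the function names in it, and the helpers `build_fcg`, `graph_features`, and `sample_md5` are all illustrative assumptions rather than the authors' pipeline.

```python
import hashlib


def sample_md5(data: bytes) -> str:
    """MD5 hex digest of a sample's raw bytes, used as an identifier."""
    return hashlib.md5(data).hexdigest()


def build_fcg(call_edges):
    """Build a directed graph (caller -> set of callees) from edge pairs."""
    graph = {}
    for caller, callee in call_edges:
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())  # keep leaf functions as nodes
    return graph


def graph_features(graph):
    """Compute a few simple structural features of a call graph."""
    in_degree = {node: 0 for node in graph}
    for callees in graph.values():
        for callee in callees:
            in_degree[callee] += 1
    return {
        "nodes": len(graph),
        "edges": sum(len(callees) for callees in graph.values()),
        "max_out_degree": max((len(c) for c in graph.values()), default=0),
        "max_in_degree": max(in_degree.values(), default=0),
        "leaf_functions": sum(1 for c in graph.values() if not c),
    }


# Hypothetical call edges; real FCGs come from disassembling each binary.
edges = [
    ("entry", "decrypt_config"),
    ("entry", "connect_c2"),
    ("connect_c2", "send"),
    ("decrypt_config", "send"),
]
fcg = build_fcg(edges)
features = graph_features(fcg)
empty_digest = sample_md5(b"")
```

Structural counts like these are only a starting point; graph-based models typically consume the adjacency structure directly.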


🧪 Applications

  • Malware detection
  • Malware family classification
  • Graph Neural Networks (GNN)
  • Intrusion Detection Systems (IDS)
  • Cybersecurity research
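
As an illustration of the graph-learning use case, the sketch below runs one round of mean-neighbor aggregation over a toy call graph and pools the result into a graph-level vector. This is a simplified, weight-free stand-in for a GNN layer, not a model from the FCG-MFD paper; the graph, the two-dimensional node features, and the function names are assumptions made for the example.

```python
def mean_aggregate(graph, node_features):
    """One message-passing round: average a node's vector with its callees'."""
    updated = {}
    for node, callees in graph.items():
        neighborhood = [node_features[node]]
        neighborhood += [node_features[c] for c in sorted(callees)]
        dim = len(node_features[node])
        updated[node] = [
            sum(vec[i] for vec in neighborhood) / len(neighborhood)
            for i in range(dim)
        ]
    return updated


def readout(node_features):
    """Graph-level embedding: mean of all node vectors."""
    vectors = list(node_features.values())
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# Toy call graph (caller -> callees) with hypothetical 2-d node features.
graph = {"entry": {"a", "b"}, "a": {"b"}, "b": set()}
node_features = {"entry": [3.0, 0.0], "a": [0.0, 3.0], "b": [0.0, 0.0]}

hidden = mean_aggregate(graph, node_features)
embedding = readout(hidden)
```

A trained GNN would add learnable weight matrices and nonlinearities at each round, and the pooled embedding would feed a classifier over malware families.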


📜 Citation

If you use this dataset, please cite:

@article{HADI2025104050,
title = {FCG-MFD: Benchmark function call graph-based dataset for malware family detection},
journal = {Journal of Network and Computer Applications},
volume = {233},
pages = {104050},
year = {2025},
issn = {1084-8045},
doi = {10.1016/j.jnca.2024.104050},
url = {https://www.sciencedirect.com/science/article/pii/S1084804524002273},
author = {Hassan Jalil Hadi and Yue Cao and Sifan Li and Naveed Ahmad and Mohammed Ali Alshara},
keywords = {Malware detection, Malware family classification, Function Call Graph, Dataset},
}

👨‍💻 Maintainer

CyberSaR Lab 🔗 https://cybersar.kaust.edu.sa/


🛡️ Advancing Malware Analysis with Graph-Based Intelligence
