APT CLASS

Attributing a piece of malware to its creator typically requires threat intelligence to attain a sufficient confidence level. Binary attribution is harder still, because it relies mostly on disassembling binaries to gather relevant features and build a fingerprint that identifies the author.

To date, most research focuses on source code authorship attribution and on applying similar techniques to benign and malicious binaries. However, this approach gives malicious authors an opportunity to attack authorship attribution models, because of the stark differences between source code and binaries, and between benign and malicious authors.

Our survey (joint work with S3Lab) explores the style of threat actors and the adversarial techniques they use to remain anonymous. We examine the adversarial impact on state-of-the-art methods for binary authorship attribution. Through this approach, we identify key findings and explore the open research challenges in identifying authorship style within malicious binaries.

One major challenge is the lack of a ground truth dataset of malware and authors. To mitigate this issue for the community, we publish alongside this survey a meta-information dataset of 15,660 malware samples labeled with 164 threat actor groups. This is the largest and most diverse dataset to date. Additionally, we identify a further 7,485 malicious samples currently linked to unknown groups.
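
As an illustration of how a meta-information dataset of this kind might be summarized, the sketch below counts samples per threat actor group and tallies samples with no known group. The CSV layout, the column names `sha256` and `group`, the helper `count_by_group`, and every row are hypothetical placeholders; the actual format of the released dataset may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: one row per sample, with a hash and a group label.
# Column names and rows are invented placeholders, not taken from the dataset.
SAMPLE_CSV = """sha256,group
hash-0001,GroupA
hash-0002,GroupA
hash-0003,GroupB
hash-0004,
"""


def count_by_group(handle):
    """Return (samples per labeled group, number of samples with no group)."""
    labeled = Counter()
    unlabeled = 0
    for row in csv.DictReader(handle):
        group = (row.get("group") or "").strip()
        if group:
            labeled[group] += 1
        else:
            # An empty label stands in for "linked to an unknown group".
            unlabeled += 1
    return labeled, unlabeled


labeled, unlabeled = count_by_group(io.StringIO(SAMPLE_CSV))
print(labeled.most_common(), unlabeled)
```

To use the same function on a file, pass an open file handle instead of the in-memory string.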

Access

To request access to the dataset, please complete the following form:

> FORM TO REQUEST ACCESS <

We have already granted access to people from the following institutions (alphabetical order):

Amadeus IT Group, Spain
Beijing University of Posts and Telecommunications, China
Ben Gurion University, Israel
Bern University of Applied Sciences, Switzerland
BlackTruffle Security
Cybergeeks[.]tech
Delhi Technological University, India
Fraunhofer FKIE, Germany
Georgia Tech Research Institute, USA
Global Infotek, Inc, USA
Grammatech, USA
Harfanglab, France
HRL Laboratories, USA
IBM (International Business Machines), USA
Indian Institute of Technology Kanpur, India
Information Sciences Institute, University of Southern California, USA
Jinan University, China
Kennesaw State University, USA
Kudu Dynamics, USA
Lancaster University, UK
Nanyang Technological University - NTU Singapore
National University of Sciences and Technology, Islamabad, Pakistan
National University of Singapore
NATO
Naval Research Laboratory, USA
OpenAnalysis Inc
Osaka Electro-Communication University, Japan
Recorded Future, USA
Rice University, USA
Royal Holloway, University of London, UK
Ruhr-Universität Bochum, Germany
Sabancı University, Turkey
Shahid Beheshti University, Iran
TU Wien, Austria
UC Berkeley, USA
University Institute of Information Technology, PMAS, Pakistan
University of Chinese Academy of Sciences, China
University of Illinois, USA
University of Kent, UK
University of New Brunswick, Canada
University of Saskatchewan, Canada
Westphalian University, Germany
Wuhan University, China
Zeropoint Dynamics, USA

Papers

Identifying Authorship Style in Malicious Binaries: Techniques, Challenges & Datasets
Jason Gray, Daniele Sgandurra, Lorenzo Cavallaro
CoRR (arXiv), 2021

@article{gray2021aptclass,
  author        = {Jason Gray and Daniele Sgandurra and Lorenzo Cavallaro},
  title         = {Identifying Authorship Style in Malicious Binaries: Techniques, Challenges \& Datasets},
  journal       = {CoRR},
  volume        = {abs/2101.06124},
  year          = {2021},
  url           = {http://arxiv.org/abs/2101.06124},
  eprint        = {2101.06124},
  archivePrefix = {arXiv}
}

People

Jason Gray, Ph.D. Student, King's College London & Royal Holloway, University of London
Daniele Sgandurra, Senior Lecturer (Associate Professor), Royal Holloway, University of London
Lorenzo Cavallaro, Full Professor of Computer Science, Chair in Cybersecurity (Systems Security), King's College London