Offensive Language Identification Dataset - OLID

This is the homepage for the Offensive Language Identification Dataset (OLID) by Zampieri et al. (2019).
OLID contains a collection of tweets annotated using a three-level annotation model:
A: Offensive Language Detection
B: Categorization of Offensive Language
C: Offensive Language Target Identification
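
The three levels form a hierarchy: level B applies only to tweets marked offensive at level A, and level C applies only to targeted offenses from level B. A minimal sketch of that structure follows. The label codes (OFF/NOT, TIN/UNT, IND/GRP/OTH) are taken from the OLID paper rather than from this page, and the class name OlidAnnotation is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Label codes as described in the OLID paper (assumption: not stated on this page).
LEVEL_A = {"OFF", "NOT"}         # offensive / not offensive
LEVEL_B = {"TIN", "UNT"}         # targeted insult or threat / untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # individual / group / other


@dataclass(frozen=True)
class OlidAnnotation:
    """Hypothetical container for one tweet's hierarchical labels."""

    level_a: str
    level_b: Optional[str] = None
    level_c: Optional[str] = None

    def __post_init__(self) -> None:
        if self.level_a not in LEVEL_A:
            raise ValueError(f"unknown level A label: {self.level_a!r}")
        if self.level_a == "NOT":
            # Non-offensive tweets carry no level B or C label.
            if self.level_b is not None or self.level_c is not None:
                raise ValueError("levels B and C apply only to offensive tweets")
            return
        if self.level_b not in LEVEL_B:
            raise ValueError(f"unknown level B label: {self.level_b!r}")
        if self.level_b == "UNT":
            # Untargeted offenses have no target to identify.
            if self.level_c is not None:
                raise ValueError("level C applies only to targeted offenses")
            return
        if self.level_c not in LEVEL_C:
            raise ValueError(f"unknown level C label: {self.level_c!r}")


example = OlidAnnotation("OFF", "TIN", "GRP")
```

Validating in the constructor keeps inconsistent combinations, such as a non-offensive tweet with a target, from entering downstream code.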

Download OLID v1.0

The complete OLID v1.0 dataset (train, test, and gold labels) is available for download from CodaLab.
Follow the instructions there to download it.
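
Once downloaded, the label distribution can be inspected with a few lines of Python. The sketch below assumes the data is a tab-separated file with a header row and columns named subtask_a, subtask_b, and subtask_c. Those column names and the file layout are assumptions not stated on this page, so check them against the files you receive. The function name count_labels and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_labels(tsv_file, column="subtask_a"):
    """Count label values in one column of a tab-separated file object.

    The column names are assumptions about the OLID file layout.
    """
    # QUOTE_NONE keeps quotation marks inside tweets from being parsed as CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row[column] for row in reader)


# Placeholder rows for illustration only; these are not taken from OLID.
sample = io.StringIO(
    "id\ttweet\tsubtask_a\tsubtask_b\tsubtask_c\n"
    "1\tplaceholder text one\tOFF\tTIN\tIND\n"
    "2\tplaceholder text two\tNOT\tNULL\tNULL\n"
    "3\tplaceholder text three\tOFF\tUNT\tNULL\n"
)
counts = count_labels(sample)
```

For a real file, pass an open file object instead of the in-memory sample, for example one opened with encoding="utf-8" and newline="".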



More information about the OLID dataset can be found in the NAACL 2019 paper "Predicting the Type and Target of Offensive Posts in Social Media".
If you use OLID, please cite this paper:

    title={{Predicting the Type and Target of Offensive Posts in Social Media}}, 
    author={Zampieri, Marcos and Malmasi, Shervin and Nakov, Preslav and Rosenthal, Sara and Farra, Noura and Kumar, Ritesh}, 
    booktitle={Proceedings of NAACL},