This is the homepage for the Offensive Language Identification Dataset (OLID) by Zampieri et al. (2019).
OLID contains a collection of tweets annotated using an annotation model that encompasses the following three levels:
A: Offensive Language Detection
B: Categorization of Offensive Language
C: Offensive Language Target Identification
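The three levels above build on one another: a tweet gets a level B label only if it is offensive at level A, and a level C label only if the offense is targeted at level B. A minimal sketch of that dependency follows. The label codes (OFF/NOT, TIN/UNT, IND/GRP/OTH) are taken from our reading of the paper and are not stated on this page, and the names `OlidAnnotation` and `is_consistent` are hypothetical, so check them against the downloaded data before relying on them.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed label codes (from the paper, not from this page).
LEVEL_A = {"OFF", "NOT"}         # offensive / not offensive
LEVEL_B = {"TIN", "UNT"}         # targeted insult or threat / untargeted
LEVEL_C = {"IND", "GRP", "OTH"}  # individual / group / other


@dataclass(frozen=True)
class OlidAnnotation:
    """Hypothetical container for one tweet's three-level annotation."""
    level_a: str
    level_b: Optional[str] = None
    level_c: Optional[str] = None


def is_consistent(ann: OlidAnnotation) -> bool:
    """Check that lower levels are only filled in when the level above allows it."""
    if ann.level_a not in LEVEL_A:
        return False
    if ann.level_a == "NOT":
        # Non-offensive tweets carry no level B or level C label.
        return ann.level_b is None and ann.level_c is None
    if ann.level_b not in LEVEL_B:
        return False
    if ann.level_b == "UNT":
        # Untargeted offenses have no target to identify.
        return ann.level_c is None
    return ann.level_c in LEVEL_C
```

For example, `OlidAnnotation("OFF", "TIN", "GRP")` passes the check, while `OlidAnnotation("NOT", "TIN")` does not.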
OLID was used in the OffensEval: Identifying and Categorizing Offensive Language in Social Media (SemEval 2019 - Task 6) shared task.
Download OLID v1.0
The complete OLID v1.0 dataset (train, test, and gold labels) is available for download from CodaLab.
Go to https://competitions.codalab.org/competitions/20011 and follow the instructions to download it.
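Once the files are downloaded, a short script can read them. The sketch below assumes the training data is a tab-separated file with a header row containing the columns `id`, `tweet`, `subtask_a`, `subtask_b`, and `subtask_c`, and that missing labels are written as `NULL`. None of these details are stated on this page, so confirm them against the files you actually receive. The helper names and the two sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def read_olid_rows(stream, null_token="NULL"):
    """Read tab-separated rows from a text stream, mapping the null token to None."""
    # QUOTE_NONE keeps quotation marks inside tweets from being treated as CSV quoting.
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        {key: (None if value == null_token else value) for key, value in row.items()}
        for row in reader
    ]


def label_counts(rows, column):
    """Count the non-missing labels in one column."""
    return Counter(row[column] for row in rows if row[column] is not None)


# Made-up placeholder rows in the assumed layout (not real OLID data).
SAMPLE = (
    "id\ttweet\tsubtask_a\tsubtask_b\tsubtask_c\n"
    "1\tplaceholder text one\tNOT\tNULL\tNULL\n"
    "2\tplaceholder text two\tOFF\tTIN\tIND\n"
)

rows = read_olid_rows(io.StringIO(SAMPLE))
print(label_counts(rows, "subtask_a"))
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper, for example `open(path, encoding="utf-8", newline="")`, where `path` points at the file you downloaded.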
Publications
More information about the OLID dataset can be found in the NAACL 2019 paper "Predicting the Type and Target of Offensive Posts in Social Media" (Zampieri et al., 2019).
If you use OLID, please cite this paper:
@inproceedings{zampierietal2019,
title={{Predicting the Type and Target of Offensive Posts in Social Media}},
author={Zampieri, Marcos and Malmasi, Shervin and Nakov, Preslav and Rosenthal, Sara and Farra, Noura and Kumar, Ritesh},
booktitle={Proceedings of NAACL},
year={2019}
}