Farig Sadeque, Stephen Rains, Yotam Shmargad, Kate Kenski, Kevin Coe, and Steven Bethard. 2019. “Incivility Detection in Online Comments.” In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019), pp. 283–291. Minneapolis, Minnesota: Association for Computational Linguistics.
Incivility in public discourse has become a major concern in recent years, as it can negatively affect the quality and tenacity of the discourse. In this paper, we present neural models that learn to detect name-calling and vulgarity in a newspaper comment section. We show that, in contrast to prior work on detecting toxic language, fine-grained incivilities such as name-calling cannot be accurately detected by simple models like logistic regression. We apply the models trained on the newspaper comments data to detect uncivil comments in a Russian troll dataset and find that, despite the change of domain, the model makes accurate predictions.
Vikas Yadav, Farig Sadeque, Bryan Heidorn, and Hong Cui. 2018. “Where Are iSchools Heading?” In Transforming Digital Worlds, edited by Gobinda Chowdhury, Julie McLeod, Val Gillet, and Peter Willett, pp. 665–670. Cham: Springer International Publishing.
iSchools are highly interdisciplinary in nature; hence, their direction and vision have attracted researchers from various disciplines in recent years. In this paper, we analyze the contents of the courses offered by 22 iSchools from different parts of the world. Our system extracts information from the course descriptions of these iSchools and visualizes the current trend toward offering more courses, with substantially more emphasis on computation than on other paradigms. The architecture of our system is simple yet powerful, which may encourage others to apply similar techniques in other iSchool-related research.
Prasha Shrestha, Nicolas Rey-Villamizar, Farig Sadeque, Ted Pedersen, Steven Bethard, and Thamar Solorio. 2016. “Age and Gender Prediction on Health Forum Data.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 3394–3401. Portorož, Slovenia: European Language Resources Association (ELRA).
Health support forums have become a rich source of data that can be used to improve health care outcomes. A user profile, including information such as age and gender, can support targeted analysis of forum data. However, users do not always disclose their age and gender, so it is desirable to be able to extract this information automatically from their content. To the best of our knowledge, no such resource exists for author profiling of health forum data. Here we present a large corpus of close to 85,000 users for author profiling, and we outline our approach and benchmark results for automatically detecting a user's age and gender from their forum posts. We use a mix of features from a user's text as well as forum-specific features to obtain accuracy well above the baseline, showing that both our dataset and our method are useful and valid.
