Artificial Intelligence struggles to moderate COVID Misinformation

Written by Manushi Siriwardana

The spread of COVID-19 misinformation is a major problem that emerged alongside the spread of the virus itself. False information is misinformation, while deliberately false information is disinformation. Symptoms, complications, transmission and vaccines are among the most discussed COVID-19 topics. Figure 1 displays a word cloud generated from fake COVID-19 news shared on Facebook, drawn from the COVID-19 Fake News Infodemic Research Dataset (Saenz et al., 2021).

Figure 1: Word cloud for fake COVID-19 news on Facebook

According to the word cloud, popular words in COVID-19 fake news on Facebook include lockdown, quarantine, hospital, vaccine and mask. COVID-19 misinformation can create fear, panic and chaos, and it can adversely affect people's decisions, behaviour and wellbeing (Loomba et al., 2021). It is therefore worthwhile to detect false information and take appropriate action against it. With virtually unlimited access to information, distinguishing reliable information from false information has become a demanding yet necessary endeavour. Artificial Intelligence (AI) has become an important tool in seeking solutions to many problems that emerged with the viral outbreak. According to IBM Cloud Education (2020), "artificial intelligence is a field, which combines computer science and robust datasets, to enable problem solving" (para. 5). Figure 2 illustrates the terms AI, Machine Learning (ML) and Deep Learning (DL) and the relationship between them.

Figure 2: Artificial Intelligence, Machine Learning and Deep Learning

Source: https://www.geospatialworld.net/blogs/difference-between-ai%EF%BB%BF-machine-learning-and-deep-learning/

Different organizations and parties have been utilizing AI to deal with COVID-19 misinformation (Perry, 2020), and many studies have been conducted to detect false COVID-19 information using AI methods. Nevertheless, there have been challenges and limitations in using AI for this purpose (Al-Rakhami & Al-Amri, 2020). The main purpose of this article is to review the existing literature on the use of AI methods for detecting false information related to COVID-19, discussing their strengths and weaknesses, and to explore the challenges faced in using AI to fight COVID-19 misinformation.

 

Related Work

 

A considerable number of studies on detecting COVID-19 misinformation with AI techniques have accumulated over a short period of time. These include studies using supervised learning techniques as well as unsupervised learning techniques (Al-Rakhami & Al-Amri, 2020; Hussna et al., 2021; Lee et al., 2020; Wani et al., 2021). Al-Rakhami and Al-Amri (2020) used tweet-level and user-level features to determine the credibility of COVID-19 information extracted from Twitter; their ensemble learning model outperformed the individual machine learning models, and they noted the difficulties of obtaining and preparing the dataset. Patwa et al. (2021) applied logistic regression, a Support Vector Machine (SVM) with a linear kernel, a Decision Tree (DT) and Gradient Boosting (GDBT) to detect misinformation in COVID-19-related social media posts and news, obtaining an F1-score of 93.32% with the SVM. They suggested that future studies consider multilingual data and the reasons why information is true or false. Hussna et al. (2021) applied a multinomial Naive Bayes classifier, logistic regression, an SVM classifier and a deep learning based model, DistilBERT, to detect pandemic-related fake news in social media posts. It should be noted that supervised learning requires a dataset labelled with both real and false information.

Lee et al. (2020) proposed an unsupervised approach that identifies COVID-19 misinformation using perplexity. Studies employing unsupervised learning appear to be rare in the literature on COVID-19 misinformation detection, so this area can be further explored in future work. Natural Language Processing (NLP) concepts have been used in many studies; for example, Serrano et al. (2020) used NLP to identify COVID-19 misinformation videos on YouTube from user comments.
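As a concrete illustration of the supervised approaches above, the sketch below trains a multinomial Naive Bayes text classifier (one of the methods applied by Hussna et al., 2021) on a tiny labelled set of posts. The posts, labels and test sentence are invented purely for illustration; a real study would use a large fact-checked dataset.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Train a multinomial Naive Bayes classifier on bag-of-words counts."""
    counts = defaultdict(Counter)   # per-class word counts
    class_totals = Counter(labels)  # per-class document counts
    for doc, label in zip(docs, labels):
        counts[label].update(doc.lower().split())
    vocab = {w for c in counts.values() for w in c}
    return counts, class_totals, vocab

def predict_nb(model, doc):
    """Pick the class with the highest log posterior, using Laplace smoothing."""
    counts, class_totals, vocab = model
    n_docs = sum(class_totals.values())
    best, best_lp = None, float("-inf")
    for label in class_totals:
        lp = math.log(class_totals[label] / n_docs)  # log prior
        total = sum(counts[label].values())
        for w in doc.lower().split():
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Invented toy dataset for illustration only
docs = ["vaccine is safe and effective",
        "masks reduce transmission in hospital",
        "vaccine contains microchips for tracking",
        "drinking bleach cures the virus"]
labels = ["real", "real", "fake", "fake"]
model = train_nb(docs, labels)
print(predict_nb(model, "microchips in the vaccine"))  # classified as fake
```

With only four training posts this is a toy, but it shows the core requirement noted above: supervised methods need examples labelled as real and fake before they can classify anything.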
However, applying NLP to detect COVID-19 misinformation can be difficult for many reasons, including words whose meaning depends on context and the existence of many human languages. Moreover, language use, writing style and content may change depending on where the information is shared. Relying solely on content may therefore be inadequate for identifying misinformation. A suitable combination of features capturing the content, users' reactions, the source of the information and temporal aspects can be used instead. These ideas appear to be inadequately addressed in the existing literature.
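The idea of combining content, user-reaction, source and time features into one vector can be sketched as below. All field names (`text`, `likes`, `shares`, `domain`, `hour`), the vocabulary and the trusted-domain set are hypothetical, chosen only to show how the four feature groups could be concatenated.

```python
def text_features(text, vocab):
    """Bag-of-words counts restricted to a fixed vocabulary (content features)."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def combined_features(post, vocab, trusted_domains):
    """Concatenate content, user-reaction, source and time features.
    All field names are illustrative, not from any particular dataset."""
    content = text_features(post["text"], vocab)
    reaction = [post["likes"], post["shares"]]                   # user reactions
    source = [1 if post["domain"] in trusted_domains else 0]     # source credibility
    time = [post["hour"]]                                        # hour of posting
    return content + reaction + source + time

vocab = ["vaccine", "cure", "lockdown"]
post = {"text": "miracle cure for the virus", "likes": 3,
        "shares": 40, "domain": "example-blog.net", "hour": 2}
print(combined_features(post, vocab, {"who.int", "cdc.gov"}))
# → [0, 1, 0, 3, 40, 0, 2]
```

The resulting vector can then be fed to any of the classifiers discussed above, letting the model weigh content against context rather than relying on text alone.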

 

Challenges faced in utilizing AI for fighting COVID-19 misinformation

 

Numerous problems can arise when applying AI methods against COVID-19 misinformation, at stages ranging from data acquisition through model development to implementation. ML and DL methods rely on data to build models, and to generalise findings to real-world settings the data must be adequate, unbiased and representative. Even at present, less than two years of data are available, while models generally need sufficient data for training, validation and testing. Moreover, model performance and the insights drawn can vary with the extracted data, which may come from different platforms, geographical regions and so on. Preparing and using a dataset may require fact checking and careful handling of legal and ethical issues, which can be challenging.

If a supervised learning technique is to be employed, a dataset labelled with both real and false information is needed; reliable fact-checking websites can be used in preparing such datasets. Improper labelling can produce biased results and misleading outcomes, which can be disastrous given the severity of the disease. Furthermore, imbalanced datasets are a common problem in classification: when classifying COVID-19 information, the proportion of false information available for a study can be much lower than that of true information, or vice versa (Al-Rakhami & Al-Amri, 2020). Resampling techniques can be used to overcome this problem, and unsupervised learning techniques, which do not require labelled datasets, can also be applied where appropriate.
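A minimal sketch of one such resampling technique, random oversampling of the minority class, is shown below. The five posts and their labels are invented for illustration; in practice the examples would be the feature vectors or texts of a real dataset.

```python
import random
from collections import Counter

def oversample(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until all classes
    match the size of the majority class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        out_x.extend(xs + [rng.choice(xs) for _ in range(target - len(xs))])
        out_y.extend([y] * target)
    return out_x, out_y

# Invented imbalanced toy set: four real posts, one fake post
posts = ["post1", "post2", "post3", "post4", "post5"]
labels = ["real", "real", "real", "real", "fake"]
bal_posts, bal_labels = oversample(posts, labels)
print(Counter(bal_labels))  # both classes now have four examples
```

Oversampling duplicates information rather than adding new evidence, so it must be applied only to the training split, never to the validation or test data, or performance estimates will be inflated.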

 

Given the nature of the pandemic and the consequences of incorrect predictions, it is vital that models perform well. Predicting false information as real, or real information as false, can lead to disastrous outcomes: if false information is identified as real, no measures will be taken against it and the incorrectly accepted information will continue to mislead people; if real information is identified as false, unnecessary measures will be taken and people may lose access to vital information. Model performance can also depend on the features used in fitting the models, so another challenging task is selecting a suitable combination of features for detecting COVID-19 misinformation. Singh and Sharma (2021) used a deep learning based multi-modal approach to identify fake images on social media through text and image feature learning. Mazzeo et al. (2021) discussed the importance of Uniform Resource Locator (URL) based features alongside other frequently used features, finding that considering both textual and URL features can improve model efficiency and performance; URLs can help capture the reliability and credibility of sources. According to Gupta et al. (2020), social media is a potential source of inaccurate information on COVID-19, so it is important to consider the reliability and credibility of the source alongside the reliability of the information itself. Figures 3 and 4 compare the number of likes observed for real and fake microblogs on Weibo, obtained from the CHECKED Chinese COVID-19 fake news dataset (Yang et al., 2021). The boxplots show that fake microblogs receive comparatively fewer likes; the number of likes may reflect users' reactions to the information.

Figure 3: Number of likes for real microblogs

Figure 4: Number of likes for fake microblogs

Note: There were many outliers in the data. Therefore, the vertical-axis limits of the plots were chosen so that the patterns of the remaining data points can be examined clearly.
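Because like counts contain heavy outliers like those noted above, robust summaries such as the median and interquartile range are more informative than means when comparing the two groups. The sketch below uses invented like counts, not the actual values from the CHECKED dataset:

```python
import statistics

# Invented like counts for illustration; not real CHECKED dataset values
real_likes = [120, 45, 300, 80, 5000, 60, 210]
fake_likes = [3, 15, 8, 40, 900, 6, 12]

def summarize(likes):
    """Median and interquartile range: both are robust to heavy outliers."""
    q = statistics.quantiles(likes, n=4)
    return {"median": statistics.median(likes), "iqr": q[2] - q[0]}

print(summarize(real_likes))
print(summarize(fake_likes))
```

On such summaries the outliers (here 5000 and 900) barely move the statistics, so the comparison between real and fake engagement levels is not distorted by a few viral posts.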

According to Wu et al. (2019), text content alone can be inadequate for identifying false information, since content may be deliberately crafted to appear true and correct. They further divided misinformation detection methods into content-based, context-based, propagation-based and early-detection approaches. In future studies, features can be selected incorporating these concepts, and an approach using features from all four categories can be employed to examine whether model performance improves. A feature selection method can be used to determine the most appropriate combination of features. Ahmad et al. (2020) used machine learning models and ensemble techniques to classify news articles, evaluating them on four real-world datasets; the ensemble learners outperformed the individual learners on all performance metrics considered. Therefore, ensemble techniques can be applied to COVID-19 misinformation detection to examine whether they yield higher accuracy.
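One of the simplest ensemble schemes is hard voting: each base classifier predicts a label and the majority wins. The sketch below combines three invented rule-based classifiers; the rules are purely illustrative and are not the models used by Ahmad et al. (2020), who ensembled trained learners.

```python
from collections import Counter

def majority_vote(classifiers, doc):
    """Hard-voting ensemble: each base classifier votes, majority label wins."""
    votes = [clf(doc) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three toy base classifiers with invented rules, for illustration only
def keyword_clf(doc):
    return "fake" if "miracle" in doc or "microchip" in doc else "real"

def length_clf(doc):
    return "fake" if len(doc.split()) < 4 else "real"

def shouting_clf(doc):
    return "fake" if doc.isupper() else "real"

clfs = [keyword_clf, length_clf, shouting_clf]
print(majority_vote(clfs, "miracle cure found"))  # two of three vote fake
```

The appeal of voting is that the base classifiers can rely on different feature families (content, context, propagation), so their errors are less likely to coincide, which is one reason ensembles often beat individual learners.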

 

Implementing AI concepts in a real-world setting can be demanding. Black box models can leave users with little understanding of how the algorithms function and limited interpretability (Khemasuwan & Colt, 2021). Users of the developed models generally may not have a comprehensive understanding of AI concepts, which can make it difficult for them to use the models properly, and in some cases users might unknowingly ignore warnings or flags raised against misinformation. Aldwairi and Alwahedi (2018) used features of the title and post to detect fake posts and proposed a browser tool that users can install to identify and filter out sites containing inaccurate information; a similar tool could be used to detect and filter out COVID-19 misinformation. Furthermore, implementing these AI concepts in practice requires studies that validate accuracy and performance on independent and representative data (Khemasuwan & Colt, 2021). Validating results across multiple datasets is important, since model performance and the conclusions drawn can depend on the data used. Hence, developing and implementing AI concepts can be time consuming and may require resources such as high computational power.
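A filtering tool of the kind Aldwairi and Alwahedi (2018) proposed might, at its simplest, flag links whose domain appears on a blocklist of known misinformation sources. The sketch below is a hypothetical minimal version; the blocklisted domains are invented, and a real tool would maintain the list from fact-checking sources and combine it with content-based signals.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; in practice it would be curated from fact-checkers
FLAGGED_DOMAINS = {"covid-miracle-cures.example", "fake-health-news.example"}

def should_flag(url):
    """Flag a link if its domain, or any parent domain, is on the blocklist."""
    host = urlparse(url).netloc.lower()
    parts = host.split(".")
    # Check the host and each parent domain (e.g. a.b.example -> b.example -> example)
    return any(".".join(parts[i:]) in FLAGGED_DOMAINS for i in range(len(parts)))

print(should_flag("https://covid-miracle-cures.example/vaccine-hoax"))  # flagged
print(should_flag("https://www.who.int/emergencies"))                   # not flagged
```

Domain-level flagging is cheap enough to run inside a browser extension, which is why source-credibility features are attractive for real-time deployment even when heavier content models run server-side.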

 

Conclusion

 

Due to various factors, there have been challenges and limitations in using AI to fight COVID-19 misinformation. It is important to apply AI concepts within appropriate contexts and in an appropriate manner to reap the maximum benefits. With continuous research, it may be possible to improve existing methods and to develop new methods for detecting and dealing with COVID-19 misinformation using AI.

 

References

 

Ahmad, I., Yousaf, M., Yousaf, S., & Ahmad, M. O. (2020). Fake news detection using machine learning ensemble methods. Complexity, 2020.

 

Al-Rakhami, M. S., & Al-Amri, A. M. (2020). Lies kill, facts save: detecting COVID-19 misinformation in Twitter. IEEE Access, 8, 155961-155970.

 

Aldwairi, M., & Alwahedi, A. (2018). Detecting fake news in social media networks. Procedia Computer Science, 141, 215-222.

 

Dhande, M. (2020, July 3). What is the difference between AI, machine learning and deep learning? Geospatial World. 

https://www.geospatialworld.net/blogs/difference-between-ai%EF%BB%BF-mac...

 

Gupta, L., Gasparyan, A. Y., Misra, D. P., Agarwal, V., Zimba, O., & Yessirkepov, M. (2020). Information and misinformation on COVID-19: a cross-sectional survey study. Journal of Korean Medical Science, 35(27).

 

Hussna, A. U., Trisha, I. I., Karim, M. S., & Alam, M. G. R. (2021, August). COVID-19 Fake News Prediction On Social Media Data. In 2021 IEEE Region 10 Symposium (TENSYMP) (pp. 1-5). IEEE.

 

IBM Cloud Education. (2020, June 3). Artificial Intelligence (AI). IBM Cloud. https://www.ibm.com/cloud/learn/what-is-artificial-intelligence

 

Khemasuwan, D., & Colt, H. G. (2021). Applications and challenges of AI-based algorithms in the COVID-19 pandemic. BMJ Innovations, 7(2).

 

Lee, N., Bang, Y., Madotto, A., & Fung, P. (2020). Misinformation has high perplexity. arXiv preprint arXiv:2006.04666.

 

Loomba, S., de Figueiredo, A., Piatek, S. J., de Graaf, K., & Larson, H. J. (2021). Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA. Nature Human Behaviour, 5(3), 337-348.

 

Mazzeo, V., Rapisarda, A., & Giuffrida, G. (2021). Detection of fake news on CoViD-19 on Web Search Engines. arXiv preprint arXiv:2103.11804.

 

Patwa, P., Sharma, S., Pykl, S., Guptha, V., Kumari, G., Akhtar, M. S., ... & Chakraborty, T. (2021, February). Fighting an infodemic: Covid-19 fake news dataset. In International Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (pp. 21-29). Springer, Cham.

 

Perry, T. S. (2020, May 12). How Facebook Is Using AI to Fight COVID-19 Misinformation. IEEE Spectrum. https://spectrum.ieee.org/how-facebook-is-using-ai-to-fight-covid19-misinformation

 

Saenz, J. A., Gopal, S. R. K., & Shukla, D. (2021). Covid-19 Fake News Infodemic Research Dataset (CoVID19-FNIR Dataset). IEEE Dataport. https://dx.doi.org/10.21227/b5bt-5244

 

Serrano, J. C. M., Papakyriakopoulos, O., & Hegelich, S. (2020, July). NLP-based feature extraction for the detection of COVID-19 misinformation videos on YouTube. In Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020.

 

Singh, B., & Sharma, D. K. (2021). Predicting image credibility in fake news over social media using multi-modal approach. Neural Computing and Applications, 1-15.

 

Wani, A., Joshi, I., Khandve, S., Wagh, V., & Joshi, R. (2021, February). Evaluating deep learning approaches for covid19 fake news detection. In International Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (pp. 153-163). Springer, Cham.

 

Wu, L., Morstatter, F., Carley, K. M., & Liu, H. (2019). Misinformation in social media: definition, manipulation, and detection. ACM SIGKDD Explorations Newsletter, 21(2), 80-90.

 

Yang, C., Zhou, X., & Zafarani, R. (2021). CHECKED: Chinese COVID-19 fake news dataset. Social Network Analysis and Mining, 11(1), 1-8.