Model for opinion spam detection based on multi-iteration graph structure

Noekhah, Shirin

dc.contributor.author	Noekhah, Shirin
dc.date.accessioned	2024-02-21T23:45:10Z
dc.date.available	2024-02-21T23:45:10Z
dc.date.issued	2020
dc.description	Thesis (PhD. (Computer Science))
dc.description.abstract	Opinion spam detection can be performed with either Machine Learning (ML) or graph-based approaches by analyzing the spamming activities that exist among entities such as reviews, reviewers, groups of reviewers, and products. Compared with ML techniques, which depend on large labeled datasets and human annotators, graph-based approaches consider all entities within a unified structure and reveal the relationships among them. However, existing graph-based techniques do not evaluate the spamming activities of all entities. Moreover, they apply only a small number of features, which cannot capture many behavioral and linguistic aspects of reviews and reviewers. Most existing techniques use Amazon Mechanical Turk (AMT) to produce spam reviews, but reviews produced this way do not reflect the characteristics of real-world spam reviews. This study addresses these issues by developing a graph-based model that uses a multi-iterative algorithm to consider all entities and their relationships simultaneously in the Amazon dataset. In addition to exploiting the most useful set of behavior-based and content-based features, new features were proposed to enhance detection accuracy, such as sentiment-rate difference, review group agreement, rate for a trend change, and reviewer burstiness status. The results showed that adding the proposed features could improve the accuracy of opinion spam detection by 1.9%. The multi-iterative algorithm handles the relationships and features of the different entities: it extracts the implicit and explicit relationships from the graph structure and updates the spamicity score of each entity over a finite number of iterations, based on the effects the entities have on each other.
Furthermore, the model was extended and evaluated on a new labeled synthetic dataset to assess its usefulness for both real-world and synthetic spam reviews. The findings showed that the Multi-iterative Graph-based opinion Spam Detection (MGSD) model could improve on the accuracy of state-of-the-art ML techniques (e.g., Deep semantic frame-based and Deep Level Linguistic models) and graph-based techniques (e.g., NetSpam and Factor graph-based models) by around 5.6% and 4.8%, respectively. In addition, spam detection accuracies of 93% on the synthetic crowdsourced dataset and 95.3% on Ott's crowdsourced dataset were achieved. The proposed model is therefore domain-independent: it not only performs well on real-world opinionated documents but also detects synthetic spam reviews produced by fake reviewers with acceptable accuracy. Finally, the state-of-the-art graph-based methods were implemented on the datasets, and the results showed that MGSD outperformed these techniques with an accuracy of 91.2%.
dc.description.sponsorship	Faculty of Engineering - School of Computing
dc.identifier.uri	http://openscience.utm.my/handle/123456789/1012
dc.language.iso	en
dc.publisher	Universiti Teknologi Malaysia
dc.subject	Spam (Electronic mail)
dc.subject	Spam filtering (Electronic mail)
dc.subject	Electronic mail systems
dc.title	Model for opinion spam detection based on multi-iteration graph structure
dc.type	Thesis
dc.type	Dataset