Model for opinion spam detection based on multi-iteration graph structure
Date
2020
Publisher
Universiti Teknologi Malaysia
Abstract
Opinion spam detection can be performed using either machine learning (ML) or graph-based approaches that analyze the spamming activities among entities such as reviews, reviewers, groups of reviewers, and products. Compared to ML techniques, which depend on large labeled datasets and human annotators, graph-based approaches consider all entities within a unified structure and reveal the relationships among them. However, existing graph-based techniques do not evaluate the spamming activities of all entities. Moreover, they apply only a small number of features, which cannot capture many behavioral and linguistic aspects of reviews and reviewers. Most existing techniques use Amazon Mechanical Turk (AMT) to produce spam reviews, but reviews produced this way do not reflect the characteristics of real-world spam reviews. This study addresses these issues by developing a graph-based model using a multi-iterative algorithm that considers all entities and their relationships simultaneously in the Amazon dataset. In addition to exploiting the most useful behavior-based and content-based features, new features were proposed to enhance detection accuracy, such as sentiment-rate difference, review group agreement, rate for a trend change, and reviewer burstiness status. The results showed that adding the proposed features improved the accuracy of opinion spam detection by 1.9%. The multi-iterative algorithm handles the relationships and features of the different entities: it extracts the implicit and explicit relationships from the graph structure and updates the spamicity score of each entity over a finite number of iterations based on the entities' effects on each other. Furthermore, the model was extended and evaluated on a new labeled synthetic dataset to assess its usefulness for both real-world and synthetic spam reviews.
The findings of this study showed that the Multi-iterative Graph-based opinion Spam Detection (MGSD) model improved on the accuracy of state-of-the-art ML techniques (e.g., Deep semantic frame-based and Deep Level Linguistic models) and graph-based techniques (e.g., NetSpam and Factor graph-based models) by around 5.6% and 4.8%, respectively. In addition, the model achieved an accuracy of 93% for spam detection on the synthetic crowdsourced dataset and 95.3% on Ott's crowdsourced dataset. The proposed model is therefore domain-independent: it not only performs well on real-world opinionated documents but also detects synthetic spam reviews produced by fake reviewers with acceptable accuracy. Finally, the state-of-the-art graph-based methods were implemented on the datasets, and the results showed that MGSD outperformed these techniques with an accuracy of 91.2%.
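The abstract describes spamicity scores being updated over a finite number of iterations according to how entities in the graph affect each other. The following is only a minimal sketch of that general idea, not the MGSD algorithm itself: the function name `propagate_spamicity`, the `damping` weight, the averaging rule, and the restriction to three entity types (reviews, reviewers, products) are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_spamicity(reviews, prior, iterations=10, damping=0.5):
    """Illustrative score propagation on a review-reviewer-product graph.

    reviews: list of (review_id, reviewer_id, product_id) tuples.
    prior:   dict mapping review_id to a feature-based score in [0, 1]
             (hypothetical; stands in for behavior/content features).
    The update rule below is an assumption, not the thesis's formula.
    """
    by_reviewer = defaultdict(list)
    by_product = defaultdict(list)
    for rid, uid, pid in reviews:
        by_reviewer[uid].append(rid)
        by_product[pid].append(rid)

    review_score = dict(prior)
    reviewer_score, product_score = {}, {}
    for _ in range(iterations):
        # A reviewer or product takes the mean score of its reviews.
        reviewer_score = {
            uid: sum(review_score[r] for r in rids) / len(rids)
            for uid, rids in by_reviewer.items()
        }
        product_score = {
            pid: sum(review_score[r] for r in rids) / len(rids)
            for pid, rids in by_product.items()
        }
        # A review mixes its own prior with the scores of its neighbors.
        review_score = {
            rid: damping * prior[rid]
            + (1 - damping) * 0.5 * (reviewer_score[uid] + product_score[pid])
            for rid, uid, pid in reviews
        }
    return review_score, reviewer_score, product_score


if __name__ == "__main__":
    toy_reviews = [("r1", "u1", "p1"), ("r2", "u1", "p2"), ("r3", "u2", "p2")]
    toy_prior = {"r1": 0.9, "r2": 0.8, "r3": 0.1}
    scores, reviewers, products = propagate_spamicity(toy_reviews, toy_prior)
    print(scores, reviewers, products)
```

Because every update is a convex combination of values in [0, 1], the scores stay in that range; a fixed iteration count mirrors the "finite number of iterations" mentioned above, though a convergence threshold would be an equally plausible stopping rule.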
Description
Thesis (PhD. (Computer Science))
Keywords
Spam (Electronic mail), Spam filtering (Electronic mail), Electronic mail systems