Model for opinion spam detection based on multi-iteration graph structure

dc.contributor.authorNoekhah, Shirin
dc.date.accessioned2024-02-21T23:45:10Z
dc.date.available2024-02-21T23:45:10Z
dc.date.issued2020
dc.descriptionThesis (PhD. (Computer Science))
dc.description.abstractOpinion spam detection can be performed with either Machine Learning (ML) or graph-based approaches by analyzing the spamming activities that exist among entities such as reviews, reviewers, groups of reviewers, and products. Compared to machine learning techniques, which depend on large labeled datasets and human annotators, graph-based approaches consider all entities within a unified structure and reveal the relationships among them. However, existing graph-based techniques do not evaluate the spamming activities of all entities. Moreover, they apply only a small number of features, which cannot capture many behavioral and linguistic characteristics of reviews and reviewers. Most existing techniques use Amazon Mechanical Turk (AMT) to produce spam reviews, but reviews produced this way do not reflect the characteristics of real-world spam reviews. This study addresses these issues by developing a graph-based model using a multi-iterative algorithm that considers all entities and their relationships simultaneously in the Amazon dataset. In addition to exploiting the most useful set of behavior-based and content-based features, new features were proposed to enhance detection accuracy, such as sentiment-rate difference, review group agreement, rate for a trend change, and reviewer burstiness status. The results showed that adding the proposed features enhanced the accuracy of opinion spam detection by 1.9%. The multi-iterative algorithm handles the relationships and features of the different entities: it extracts the implicit and explicit relationships based on the graph structure and updates the spamicity score of each entity over a finite number of iterations based on the entities' effects on each other.
Furthermore, the model was extended and evaluated on a new labeled synthetic dataset to assess its usefulness for both real-world and synthetic spam reviews. The findings showed that the Multi-iterative Graph-based opinion Spam Detection (MGSD) model could improve on the accuracy of state-of-the-art ML techniques (e.g., Deep semantic frame-based and Deep Level Linguistic models) and graph-based techniques (e.g., NetSpam and Factor graph-based models) by around 5.6% and 4.8%, respectively. In addition, accuracies of 93% on the synthetic crowdsourced dataset and 95.3% on Ott's crowdsourced dataset were achieved. Therefore, the proposed model is domain-independent, as it not only performs well on real-world opinionated documents but also detects synthetic spam reviews produced by fake reviewers with acceptable accuracy. Finally, the state-of-the-art graph-based methods were implemented on the datasets, and the results showed that MGSD outperformed these techniques with an accuracy of 91.2%.
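The abstract describes updating the spamicity score of each entity over a finite number of iterations based on the entities' effects on each other. The following is a minimal sketch of that general idea only, not the MGSD update rule, which this record does not state. The function name propagate_spamicity, the mixing weight alpha, the use of simple averages, and the restriction to reviews, reviewers, and products (no reviewer groups) are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_spamicity(reviews, priors, iterations=10, alpha=0.5):
    """Illustrative iterative score propagation on a review graph.

    reviews: dict mapping review_id -> (reviewer_id, product_id)
    priors:  dict mapping review_id -> feature-based prior in [0, 1]
             (hypothetical; stands in for behavior/content features)
    alpha:   weight kept on the review's own prior (assumed value)
    """
    # Index the graph: which reviews belong to each reviewer and product.
    by_reviewer = defaultdict(list)
    by_product = defaultdict(list)
    for rid, (reviewer, product) in reviews.items():
        by_reviewer[reviewer].append(rid)
        by_product[product].append(rid)

    review_score = dict(priors)
    reviewer_score, product_score = {}, {}

    for _ in range(iterations):
        # Reviewers and products take the mean score of their reviews.
        reviewer_score = {
            u: sum(review_score[r] for r in rs) / len(rs)
            for u, rs in by_reviewer.items()
        }
        product_score = {
            p: sum(review_score[r] for r in rs) / len(rs)
            for p, rs in by_product.items()
        }
        # Each review mixes its prior with its neighbors' current scores.
        review_score = {
            rid: alpha * priors[rid]
            + (1 - alpha) * 0.5 * (reviewer_score[u] + product_score[p])
            for rid, (u, p) in reviews.items()
        }

    return review_score, reviewer_score, product_score


# Toy graph: reviewer u1 wrote two suspicious reviews, u2 one benign review.
toy_reviews = {"r1": ("u1", "p1"), "r2": ("u1", "p2"), "r3": ("u2", "p1")}
toy_priors = {"r1": 0.9, "r2": 0.8, "r3": 0.1}
review_s, reviewer_s, product_s = propagate_spamicity(toy_reviews, toy_priors)
```

Because every update is a weighted average of values in [0, 1], the scores stay in that range, and the fixed iteration count mirrors the "finite number of iterations" mentioned in the abstract.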
dc.description.sponsorshipFaculty of Engineering - School of Computing
dc.identifier.urihttp://openscience.utm.my/handle/123456789/1012
dc.language.isoen
dc.publisherUniversiti Teknologi Malaysia
dc.subjectSpam (Electronic mail)
dc.subjectSpam filtering (Electronic mail)
dc.subjectElectronic mail systems
dc.titleModel for opinion spam detection based on multi-iteration graph structure
dc.typeThesis
dc.typeDataset
Files
Original bundle
Name:
ShirinNoekhahPSC2020_B.pdf
Size:
220.19 KB
Format:
Adobe Portable Document Format
Description:
Samples of the Expert Product Reviews
Name:
ShirinNoekhahPSC2020_C.pdf
Size:
18.51 KB
Format:
Adobe Portable Document Format
Description:
List of Data mining Features
License bundle
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed to upon submission