Large-Scale Automatic Classification of Phishing Pages

Colin Whittaker; Brian Ryner; Marria Nazif

NDSS '10 (2010)

Abstract

Phishing websites, fraudulent sites that trick viewers into interacting with them, continue to cost Internet users over a billion dollars each year. In this paper, we describe the design and performance characteristics of a scalable machine learning classifier we developed to detect phishing websites. We use this classifier to maintain Google's phishing blacklist automatically. Our classifier analyzes millions of pages a day, examining the URL and the contents of a page to determine whether or not a page is phishing. Unlike previous work in this field, we train the classifier on a noisy dataset consisting of millions of samples from previously collected live classification data. Despite the noise in the training data, our classifier learns a robust model for identifying phishing pages which correctly classifies more than 90% of phishing pages several weeks after training concludes.

Research Areas

Anti abuse
