Quantifying comedy on YouTube: why the number of o’s in your LOL matters
February 9, 2012
Posted by Sanketh Shetty, YouTube Slam Team, Google Research
In a previous post, we talked about quantifying musical talent using machine learning on acoustic features for YouTube Music Slam. We wondered if we could do the same for funny videos, i.e., answer questions such as: is a video funny, how funny do viewers think it is, and why is it funny? We noticed a few audiovisual patterns across comedy videos on YouTube, such as shaky camera motion or audible laughter, which we can detect automatically. While content-based features worked well for music, identifying humor from such features alone is AI-complete. Humor preference is also subjective, perhaps even more so than musical taste.
Fortunately, at YouTube, we have more to work with. We focused on videos uploaded in the comedy category. We captured the uploader’s belief in the funniness of their video via features based on its title, description and tags. Viewers’ reactions, in the form of comments, further validate a video’s comedic value. To this end we computed additional text features based on words associated with amusement in comments. These included (a) sounds associated with laughter such as hahaha, with culture-dependent variants such as hehehe, jajaja, kekeke, (b) web acronyms such as lol, lmao, rofl, (c) the word funny and its synonyms, and (d) emoticons such as :), ;-), xP. We then trained classifiers to identify funny videos and to tell us why they are funny by categorizing them into genres such as “funny pets”, “spoofs or parodies”, “standup”, “pranks”, and “funny commercials”.
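To make the comment features concrete, here is a minimal sketch of how the four amusement categories could be counted with regular expressions. The patterns, the synonym list, and the function name `amusement_counts` are illustrative assumptions, not the features we actually shipped.

```python
import re
from collections import Counter

# Hypothetical patterns, one per amusement category described above.
AMUSEMENT_PATTERNS = {
    # Two or more laughter syllables: hahaha, hehehe, jajaja, kekeke.
    "laughter": re.compile(r"\b(?:ha|he|ja|ke){2,}h?\b", re.IGNORECASE),
    # Web acronyms.
    "acronym": re.compile(r"\b(?:lol|lmao|rofl)\b", re.IGNORECASE),
    # "funny" plus a small, assumed synonym list.
    "funny_word": re.compile(r"\b(?:funny|hilarious|amusing)\b", re.IGNORECASE),
    # Only the emoticons named in the post.
    "emoticon": re.compile(r":\)|;-\)|xP"),
}


def amusement_counts(comments):
    """Count matches of each amusement category across a list of comments."""
    counts = Counter()
    for comment in comments:
        for name, pattern in AMUSEMENT_PATTERNS.items():
            counts[name] += len(pattern.findall(comment))
    return counts


if __name__ == "__main__":
    sample = ["hahaha that was funny lol :)", "jajaja LMAO", "meh"]
    print(dict(amusement_counts(sample)))
```

Counts like these, aggregated per video, could then serve as inputs to a classifier alongside the title, description and tag features.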
Next we needed an algorithm to rank these funny videos by comedic potential, e.g. is “Charlie bit my finger” funnier than “David after dentist”? Raw view count on its own is insufficient as a ranking metric since it is biased by video age and exposure. We noticed that viewers emphasize their reaction to funny videos in several ways: e.g. capitalization (LOL), elongation (loooooool), repetition (lolololol), exclamation (lolllll!!!!!), and combinations thereof. If a viewer writes “loooooool” rather than “loool”, does it mean they were more amused? We designed features to quantify the degree of emphasis on words associated with amusement in viewer comments. We then trained a passive-aggressive ranking algorithm using human-annotated pairwise ground truth and a combination of text and audiovisual features. As with Music Slam, we used this ranker to populate candidates for human voting in our Comedy Slam.
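The two ideas above can be sketched together in a few lines. The code below is an illustration under stated assumptions, not our production system: `emphasis_features` is a hypothetical helper that scores only “lol”-style tokens on the four kinds of emphasis listed above, and `pa_rank_update` applies the standard PA-I passive-aggressive update to the difference between two feature vectors, with the aggressiveness parameter `C` chosen arbitrarily. The real ranker also used audiovisual features, which are omitted here.

```python
import re

# Matches lol-style tokens (lol, loool, lololol, lolllll) and trailing "!".
LOL_RE = re.compile(r"\b(l+(?:o+l+)+)\b(!*)", re.IGNORECASE)


def emphasis_features(comment):
    """Return [capitalization, elongation, repetition, exclamation],
    summed over every lol-style token in the comment."""
    feats = [0.0, 0.0, 0.0, 0.0]
    for match in LOL_RE.finditer(comment):
        token, bangs = match.group(1), match.group(2)
        # Collapse runs of the same letter: "loooool" -> "lol".
        collapsed = re.sub(r"(.)\1+", r"\1", token.lower())
        feats[0] += 1.0 if token.isupper() else 0.0       # LOL
        feats[1] += len(token) - len(collapsed)           # extra letters
        feats[2] += collapsed.count("ol") - 1             # extra "ol" cycles
        feats[3] += len(bangs)                            # trailing "!"
    return feats


def pa_rank_update(w, x_funnier, x_other, C=1.0):
    """One PA-I step on a human-labeled pair: push w so that
    x_funnier outscores x_other by a margin of at least 1."""
    d = [a - b for a, b in zip(x_funnier, x_other)]
    margin = sum(wi * di for wi, di in zip(w, d))
    loss = max(0.0, 1.0 - margin)
    norm_sq = sum(di * di for di in d)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)  # passive: the pair is already ranked correctly
    tau = min(C, loss / norm_sq)  # aggressive: smallest step that fixes it
    return [wi + tau * di for wi, di in zip(w, d)]


def score(w, x):
    """Linear ranking score."""
    return sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    a = emphasis_features("LOOOOL!!!")
    b = emphasis_features("lol")
    w = pa_rank_update([0.0] * 4, a, b)
    print(a, b, score(w, a) > score(w, b))
```

In this sketch each comment is its own feature vector; in practice the features would be aggregated over all comments on a video before comparing two videos.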
So far, more than 75,000 people have cast more than 700,000 votes, making comedy our most popular slam category. Give it a try!
Further reading:
- “Opinion Mining and Sentiment Analysis,” by Bo Pang and Lillian Lee.
- “A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews,” by Oren Tsur, Dmitry Davidov, and Ari Rappoport.
- “That’s What She Said: Double Entendre Identification,” by Chloe Kiddon and Yuriy Brun.