II-1 Scraping, Cleaning & Static Feature Extraction

Contents

Slides


Code Demos

Demo 1: Web Scraping Example

Demo 2: Twitter Scraping Example

Demo 3: Scraping with Python (additional material)

Demo 4: Regular Expressions

Demo 5: Basic Text Cleaning

Demo 6: Static Feature Extraction


Exercises

Exercise 1: Scraping 💡 Solution

Exercise 2: Regular Expressions 💡 Solution

Exercise 3: Static Feature Extraction 💡 Solution


Data

Twitter data set

Twitter corpus

Twitter document-feature matrix (dfm)

Static features

Global polarity clues (extracted from here)


Further Reading

Wickham, H., and Grolemund, G. (2017): R for Data Science

rtweet Tutorial

CSS Tutorial

SelectorGadget

Python Web Scraping Tutorial (beautifulsoup)

Python Web Scraping (tweepy)

Regex Guide

Regex Cheatsheet

Regex Checking Interface

Somewhat Ugly But Super Extensive Regex Guide