Social Media Political Analysis 🇺🇸

Springboard Open-ended Capstone

Objectives

Problems & Questions

How can we better develop educational materials to meet kids where they are?

Is it worth it to spend money to advertise to youth for political campaigns - are they engaging with current events?
What politics & policies are The Youth™ talking about & why?

Goals

to analyze how age/youth impacts political indoctrination and participation
to track social impacts of political events
to understand colloquial knowledge of political concepts

Overview:

Use NewsAPI to find top news by day
Parse news story title & article into individual words/phrases
Count most important individual words & phrases
Use top 3 most important words & phrases to search TikTok & Twitter
Count number of tweets & TikToks mentioning key words & phrases

Installation & Use

config.ini should have the following layout and info:

[database]
  host = <hostname>
  database = <db name>
  user = <db username>
  password = <db password>

[newsAuth]
  api_key = <NewsAPI.org API key>

[tiktokAuth]
  s_v_web_id = <s_v_web_id>
  tt_web_id = <tt_web_id>

[twitterAuth]
  access_token = <access_token>
  access_token_secret = <access_token_secret>
  consumer_key = <consumer_key>
  consumer_secret = <consumer_secret>

Note: headers and keys/variables in config.ini file don't need to be stored as strings, but when calling them in program, enclose references with quotes

To scrape the web without getting blocked:

Clone the following url into your project directory using Git or checkout with SVN: https://github.com/tamimibrahim17/List-of-user-agents.git. These .txt files contain User Agents and are specified by browser (shout out Timam Ibrahim!). They will be randomized to avoid detection by web browsers.

To find your `s_v_web_id` & `tt_web_id` for TikTokAPI access:

Go to the TikTok website & login
If using Google Chrome, open the developer console
Go to the 'Application' tab
Find & click 'Cookies' in the left-hand panel →
On the resulting screen, look for s_v_web_id and tt_web_id under the 'name' column

Find more information about .ini configuration files in Python documentation: https://docs.python.org/3/library/configparser.html

Exploratory Data Analysis

This step was guided by my anticipation that the data will be used for trend graphing, sentiment analysis, age inference, and correlation between user characteristics and extent of participation in responding to political events. With this in mind, I plan to optimize query speed via limiting storage by geographic location of both users and events to the US (though this categorization may be loose at times because of US involvement on the world stage). My PostgreSQL database will also be sharded by datetime, as the analytical window references 3 days before and 3 days after the political event of interest.

At this point, I've found that my GET requests for popular news articles return various formats which must be converted, unpacked, or otherwise transformed. ~~Because of this added step, I opted to keep the article text in a separate table within the database.~~ The article text in the news table will serve as data sources from which to extract our top 3 keywords (per article) using Term Frequency-Inverse Document Frequency (TF-IDF) calculations. These keywords will be applied to our searches for related TikToks and tweets. For some resources on this, check out TF-IDF in the real world and a step-by-step guide by Prachi Prakash.

With data from social media adding a pop-cultural context to political news, we inch closer to an understanding of TikTok and Twitter as novel forms of youth political engagement!

Getting Tweets

This project uses Tweepy's tweet search method to search for tweets within the past seven (7) days using the keywords produced from the .get_all_news() method. A separate Tweepy Stream Listener subclass catches tweets (statuses) that contain our keywords of interest as they are tweeted. The max stream rate for Twitter's API (upon which Tweepy is based) is 450.

Setting Up Kafka Streaming Application (Scala ==> Python)

application.conf file should look like:

com.ram.batch {
spark {
  app-name = <app name>
  master = "local"
  log-level = "INFO"
}
postgres { 
  url = "postgresql://localhost/<db name>"
  username = <db user>
  password = <db pwd>
}
}

Note: these are strings and must be enclosed in quotation marks.

Make sure you customize your connection string url to the database you use
The tweetStream Python class invoked in my pipeline includes a Kafka broker: self.producer = KakfaProducer(bootstrap_servers='localhost')
The broker & the Kafka streaming application are two separate files & entities

Scala applications require 'application'.sbt files that include the name of the app, the version, the scalaVersion, and library dependencies:

```
name := "Tweet Stream"

version := "1.0"

scalaVersion := "3.0.2"

libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.1"
```

Find your Spark version by using spark-submit --version on the command line.

M1 Processor Issues

I had to find a compatible SDK to install sbt:

  curl -s "https://get.sdkman.io" | zsh 
  source "$HOME/.sdkman/bin/sdkman-init.sh" 
  sdk version
  sdk install java
  sdk install sbt
  sbt compile

Then finally run sbt compile on the command line from my project directory.

Common Tweet Streaming Issues:

Make sure any json file that is being used to store tweets is opened with the 'a' designator for 'append' or else each tweet will overwrite the last
If sbt won't compile, ensure that your .sbt file dependencies are the correct versions
Make sure your application directory mirrors what is found in the Spark documentation so it can compile properly.

Getting TikToks

This project uses Avilash Kumar's TikTokAPI. Refer to their GitHub for further information.

Common TikTok Streaming Issues:

TBA

Check out this project's slide deck ⤵

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
List-of-user-agents		List-of-user-agents
helper_functions		helper_functions
tests		tests
.gitignore		.gitignore
README.md		README.md
__init__.py		__init__.py
database.py		database.py
errors_exceptions.py		errors_exceptions.py
keywords.py		keywords.py
news.py		news.py
poetry.lock		poetry.lock
political_analysis.ipynb		political_analysis.ipynb
political_analysis.py		political_analysis.py
political_analysis.toml		political_analysis.toml
tiktok.py		tiktok.py
timer.py		timer.py
tweets.py		tweets.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Social Media Political Analysis 🇺🇸

Springboard Open-ended Capstone

Objectives

Problems & Questions

Goals

Overview:

Installation & Use

To scrape the web without getting blocked:

To find your `s_v_web_id` & `tt_web_id` for TikTokAPI access:

Exploratory Data Analysis

Getting Tweets

Setting Up Kafka Streaming Application (Scala ==> Python)

Common Tweet Streaming Issues:

Getting TikToks

Common TikTok Streaming Issues:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Social Media Political Analysis 🇺🇸

Springboard Open-ended Capstone

Objectives

Problems & Questions

Goals

Overview:

Installation & Use

To scrape the web without getting blocked:

To find your s_v_web_id & tt_web_id for TikTokAPI access:

Exploratory Data Analysis

Getting Tweets

Setting Up Kafka Streaming Application (Scala ==> Python)

Common Tweet Streaming Issues:

Getting TikToks

Common TikTok Streaming Issues:

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

To find your `s_v_web_id` & `tt_web_id` for TikTokAPI access:

Packages