nhanle1111/BDA594-NhanLe

Student Information

Name: Nhan Le

Class: BDA 594

This repository showcases two applied data analytics projects completed for SDSU’s BDA 594: Big Data Analytics course. Each project focuses on real-world datasets and demonstrates skills in data cleaning, exploratory data analysis, visualization, natural language processing, and social network analysis. Together, they highlight a well-rounded set of capabilities relevant to data analyst and data science roles.

📊 1. R Data Analysis Project: Public Health & Text Mining

This project explores public health trends in San Diego County using the dataset Leading_Causes_of_Death_in_SD_2011_2016.csv. Key components include:

  • Importing, cleaning, and exploring a multi-year mortality dataset

  • Computing summary statistics and identifying patterns across regions and time

  • Creating clear visualizations with ggplot2 to communicate insights

  • Performing text mining using custom corpora, including:

    • Class definitions of “Big Data”
    • A historical text (England Opium Monopoly.txt)

  • Generating two sets of word clouds using tm, wordcloud, and RColorBrewer

  • Building reproducible R scripts covering data wrangling, visualization, and NLP

This project demonstrates proficiency in R programming, EDA, statistical reasoning, and natural language processing.
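
The text-mining step comes down to counting cleaned terms, which is the input a word cloud is drawn from. The project's own scripts are written in R with tm and wordcloud; the Python sketch below is only an illustrative stand-in for that idea. The function name term_frequencies, the tiny stop-word list, and the sample sentence are all assumptions made for this example and are not taken from the repository.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); real text-mining packages ship much larger lists.
STOP_WORDS = {"a", "an", "and", "is", "of", "that", "the", "to", "too"}


def term_frequencies(text, stop_words=STOP_WORDS, min_length=2):
    """Lowercase the text, keep alphabetic tokens, drop stop words, count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if len(t) >= min_length and t not in stop_words]
    return Counter(kept)


# Made-up sample sentence, not one of the project's corpora.
sample = "Big data is data that is too large and too complex to process."
freqs = term_frequencies(sample)

# The most frequent terms are the ones a word cloud would draw largest.
print(freqs.most_common(3))
```

A word cloud then scales each term by its count, so the cleaning choices (stop words, minimum length) directly shape what the final image emphasizes.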

🕸 2. Social Network Analysis with Gephi

This project constructs and analyzes a social network based on Twitter conversations about vaccine-exemption topics. The workflow includes:

  • Cleaning raw tweet data using OpenRefine, including Clojure-based text extraction
  • Extracting user mentions and retweets to build a usable EdgeList
  • Final data preparation and validation in Excel
  • Importing the network into Gephi and generating interactive network layouts
  • Applying layout algorithms such as Fruchterman-Reingold and ForceAtlas
  • Computing key network metrics: in-degree, out-degree, modularity, density, and network diameter
  • Producing final visualizations for both In-Degree and Out-Degree networks

This project highlights skills in social network analysis, graph theory, data cleaning, and visualization using OpenRefine and Gephi.
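
The edge-list and metric steps above can be sketched in a few lines. The project itself used OpenRefine, Excel, and Gephi for this work, so the Python below is only a minimal illustration under stated assumptions: the helper names build_edge_list and degree_metrics are hypothetical, the three tweets are made up, and retweets are assumed to appear in the "RT @user:" form so that a single mention pattern captures both mentions and retweets.

```python
import re
from collections import Counter

# Matches @handles; a retweet written as "RT @user: ..." is caught as well.
MENTION = re.compile(r"@(\w+)")


def build_edge_list(tweets):
    """Return directed (source, target) pairs: author -> each user they mention."""
    edges = []
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # skip self-loops
                edges.append((author.lower(), target.lower()))
    return edges


def degree_metrics(edges):
    """Compute in-degree, out-degree, and directed density from an edge list."""
    unique = set(edges)  # collapse repeated edges before counting
    nodes = {node for edge in unique for node in edge}
    in_degree = Counter(target for _, target in unique)
    out_degree = Counter(source for source, _ in unique)
    n = len(nodes)
    # Directed density: edges present divided by the n * (n - 1) possible edges.
    density = len(unique) / (n * (n - 1)) if n > 1 else 0.0
    return in_degree, out_degree, density


# Made-up tweets as (author, text) pairs, not data from the project.
tweets = [
    ("alice", "RT @bob: new exemption rules announced"),
    ("carol", "@bob @alice have you seen this?"),
    ("bob", "Thanks @carol"),
]

edges = build_edge_list(tweets)
in_degree, out_degree, density = degree_metrics(edges)
print(edges)
print(in_degree.most_common(1), round(density, 3))
```

In this reading, a high in-degree marks an account that many others mention or retweet, while a high out-degree marks an account that mentions many others, which is why the two networks are visualized separately.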

About

This repository was created as a learning project for SDSU's BDA 594 class.
