| 1 | | -# CodeSamples |
| 1 | +# CodeSamples |
| 2 | + |
| 3 | +This repository contains two code samples. The first, `VehicularAccidents_311ServiceRequestAnalysis`, adapts work I completed for a final project in my master’s program. The second, `CapstoneDataProcessing`, contains the core code for my capstone project, which is still in progress as of November 2025. |
| 4 | + |
| 5 | +Together, these notebooks show how I work end to end with messy public-sector data: acquiring and cleaning datasets, joining across sources, building features, doing exploratory analysis, and organizing code in a way that is easy for others to follow. |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +## VehicularAccidents_311ServiceRequestAnalysis |
| 10 | + |
| 11 | +#### NYC Accidents & 311 Complaints.ipynb |
| 12 | + |
| 13 | +**Overall goal** |
| 14 | +Explore how New York City vehicular accidents and 311 service complaints line up in space and time, to see where traffic safety issues and resident-reported problems overlap. |
| 15 | + |
| 16 | +**What this notebook does** |
| 17 | +1. Loads open data on motor vehicle collisions and 311 complaints for New York City. |
| 18 | +2. Cleans and standardizes fields such as dates, times, locations, and complaint categories. |
| 19 | +3. Joins and aggregates the datasets by location and time (for example, neighborhood or borough and time of day). |
| 20 | +4. Builds features that describe accident patterns and complaint patterns, such as counts, rates, and complaint types. |
| 21 | +5. Uses charts and tables to highlight hotspots and possible relationships between accidents and specific 311 complaint categories. |
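
The join-and-aggregate step above can be illustrated with a small sketch. It uses only the Python standard library, and the sample records, field names (`borough`, `timestamp`, `category`), and helper names are hypothetical stand-ins rather than the notebook's actual columns or code. It assumes the join key is borough plus hour of day.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; the real open-data columns are named differently.
collisions = [
    {"borough": "BROOKLYN", "timestamp": "2023-05-01 08:15"},
    {"borough": "brooklyn ", "timestamp": "2023-05-01 08:50"},
    {"borough": "QUEENS", "timestamp": "2023-05-01 17:05"},
]
complaints = [
    {"borough": "Brooklyn", "timestamp": "2023-05-01 08:30", "category": "Street Condition"},
    {"borough": "QUEENS", "timestamp": "2023-05-01 09:00", "category": "Noise"},
]


def bucket(record):
    """Return a (borough, hour) key after standardizing both fields."""
    borough = record["borough"].strip().upper()
    hour = datetime.strptime(record["timestamp"], "%Y-%m-%d %H:%M").hour
    return borough, hour


def count_by_bucket(records):
    """Count records per (borough, hour) bucket."""
    return Counter(bucket(r) for r in records)


def join_counts(collision_records, complaint_records):
    """Outer-join the two count tables on (borough, hour)."""
    collision_counts = count_by_bucket(collision_records)
    complaint_counts = count_by_bucket(complaint_records)
    keys = sorted(set(collision_counts) | set(complaint_counts))
    return [
        {
            "borough": borough,
            "hour": hour,
            # Counter returns 0 for buckets that appear in only one dataset.
            "collisions": collision_counts[(borough, hour)],
            "complaints": complaint_counts[(borough, hour)],
        }
        for borough, hour in keys
    ]


summary = join_counts(collisions, complaints)
for row in summary:
    print(row)
```

An outer join keeps buckets that appear in only one dataset, which matters when looking for places where complaints are filed but no collisions are recorded, or the reverse.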
| 22 | + |
| 23 | +**Skills showcased** |
| 24 | +1. Working with real-world city open data in Jupyter. |
| 25 | +2. Data cleaning and feature engineering on multi-source datasets. |
| 26 | +3. Joining and aggregating data across time and geography. |
| 27 | +4. Exploratory data analysis and clear visualization to tell a story around safety and service delivery. |
| 28 | + |
| 29 | +--- |
| 30 | + |
| 31 | +## CapstoneDataProcessing (PROJECT IN PROGRESS) |
| 32 | + |
| 33 | +The notebooks in this folder support my capstone project, which looks at the relationship between urban trees and building energy performance in New York City. The focus is on building a reliable, analysis-ready dataset from multiple open data sources. NOTE: This project is currently in progress and has been simplified to serve as a sample, so it is still a bit rough. |
| 34 | + |
| 35 | +#### BuildingWork_Part1.ipynb |
| 36 | + |
| 37 | +**Overall goal** |
| 38 | +Create a building-level panel dataset of New York City energy and water benchmarking results that can be linked to building geometry and later joined to tree canopy measures. |
| 39 | + |
| 40 | +**What this notebook does** |
| 41 | +1. Pulls several years of Local Law 84 benchmarking data for New York City buildings. |
| 42 | +2. Cleans and standardizes building identifiers across years, including handling missing or inconsistent IDs. |
| 43 | +3. Geocodes buildings and assigns them to tax lots and building footprints. |
| 44 | +4. Joins in additional building attributes such as zoning, height, and other physical characteristics. |
| 45 | +5. Outputs a consistent building-level file that is ready for downstream spatial analysis and modeling. |
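
As a rough illustration of the identifier-cleaning step above, the sketch below normalizes building IDs before stacking years into a panel. It assumes the identifier is a 10-digit BBL (1-digit borough code, 5-digit block, 4-digit lot); the function names, field names, and sample values are hypothetical, and the notebook's own rules may differ.

```python
import re


def normalize_bbl(raw):
    """Return a 10-digit BBL string, or None if the value is unusable."""
    if raw is None:
        return None
    text = str(raw).strip()
    text = re.sub(r"\.0+$", "", text)  # drop float artifacts such as "1000010001.0"
    digits = re.sub(r"\D", "", text)   # drop dashes, spaces, and other separators
    # A valid BBL has 10 digits and starts with a borough code from 1 to 5.
    if len(digits) != 10 or digits[0] not in "12345":
        return None
    return digits


def build_panel(records):
    """Key each record by (bbl, year); count rows dropped for bad identifiers."""
    panel = {}
    dropped = 0
    for record in records:
        bbl = normalize_bbl(record["bbl"])
        if bbl is None:
            dropped += 1
            continue
        panel[(bbl, record["year"])] = record["site_eui"]
    return panel, dropped


# Hypothetical rows showing the same building reported in different formats.
records = [
    {"year": 2021, "bbl": "1-00001-0001", "site_eui": 80.5},
    {"year": 2022, "bbl": 1000010001.0, "site_eui": 78.1},
    {"year": 2022, "bbl": "Not Available", "site_eui": 90.0},
]

panel, dropped = build_panel(records)
print(panel)
print("rows dropped:", dropped)
```

Counting dropped rows rather than discarding them silently makes it easier to report how much of each year's data survives the cleaning step.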
| 46 | + |
| 47 | +**Sources used in this notebook include** |
| 48 | +1. New York City Open Data building energy and water disclosure datasets for Local Law 84 (multiple years). |
| 49 | +2. New York City Open Data building historic records. |
| 50 | +3. New York City Open Data building elevation and subgrade data. |
| 51 | +4. New York City Planning MapPLUTO building footprint and tax lot data. |
| 52 | +5. New York City Planning tree canopy change data derived from LIDAR. |
| 53 | +6. New York City Planning Labs geosearch service for address cleaning and geocoding. |
| 54 | + |
| 55 | +**Skills showcased** |
| 56 | +1. Building robust ETL pipelines for large civic datasets. |
| 57 | +2. Cleaning and reconciling multi-year records at the building level. |
| 58 | +3. Basic geospatial processing and joining tabular data to spatial layers. |
| 59 | +4. Preparing high-quality, reusable datasets for analysis. |
| 60 | + |
| 61 | +--- |
| 62 | + |
| 63 | +#### TreeWork_Part2.ipynb |
| 64 | + |
| 65 | +**Overall goal** |
| 66 | +Construct a tree-level dataset that tracks which street trees exist in each year and when they are removed, based on inventory and work order records. |
| 67 | + |
| 68 | +**What this notebook does** |
| 69 | +1. Loads New York City street tree inventory data and forestry work order records from open data. |
| 70 | +2. Cleans and aligns tree identifiers, locations, and key fields across the two datasets. |
| 71 | +3. Uses work order histories to infer tree removal dates and statuses. |
| 72 | +4. Restricts the analysis to closed work order tickets, so that removal status is inferred only from completed work. |
| 73 | +5. Builds a yearly tree-level file that can be joined to nearby buildings for panel analysis. |
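
The removal-inference and yearly-panel steps above might look something like the following sketch. Everything here is an assumption for illustration: the field names (`first_seen_year`, `type`, `status`, `closed_date`), the status and type labels, and the convention that a tree counts as absent starting in the year its removal order closed.

```python
# Hypothetical inventory and work order records.
inventory = [
    {"tree_id": 1, "first_seen_year": 2015},
    {"tree_id": 2, "first_seen_year": 2018},
]
work_orders = [
    {"tree_id": 1, "type": "Tree Removal", "status": "Closed", "closed_date": "2020-06-12"},
    {"tree_id": 2, "type": "Tree Removal", "status": "Open", "closed_date": None},
    {"tree_id": 2, "type": "Prune", "status": "Closed", "closed_date": "2019-03-02"},
]


def removal_years(orders):
    """Map tree_id to the earliest year of a closed removal order."""
    years = {}
    for order in orders:
        # Only closed removal tickets count as evidence of removal.
        if order["status"] != "Closed" or order["type"] != "Tree Removal":
            continue
        year = int(order["closed_date"][:4])
        tree_id = order["tree_id"]
        years[tree_id] = min(year, years.get(tree_id, year))
    return years


def yearly_panel(trees, orders, years):
    """Build one row per tree per year with a presence flag."""
    removed = removal_years(orders)
    rows = []
    for tree in trees:
        removed_year = removed.get(tree["tree_id"])
        for year in years:
            if year < tree["first_seen_year"]:
                continue  # the tree is not yet in the inventory
            # Assumed convention: absent from the removal year onward.
            present = removed_year is None or year < removed_year
            rows.append({"tree_id": tree["tree_id"], "year": year, "present": present})
    return rows


panel = yearly_panel(inventory, work_orders, range(2018, 2022))
for row in panel:
    print(row)
```

In this sample, tree 2 stays present in every year because its removal order is still open, which is the effect of restricting the logic to closed tickets.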
| 74 | + |
| 75 | +**Sources used in this notebook include** |
| 76 | +1. New York City Open Data street tree inventory and point location data. |
| 77 | +2. New York City Open Data tree work order and forestry service request records. |
| 78 | +3. New York City Open Data geographic reference layers for streets and neighborhoods. |
| 79 | +4. New York City planning and open data portals for building and location reference context. |
| 80 | + |
| 81 | +**Skills showcased** |
| 82 | +1. Working with operational city data that is noisy and event based. |
| 83 | +2. Designing rule-based logic to infer entity status over time (for example, whether a tree is still present). |
| 84 | +3. Combining event histories with inventory data to build panel-style datasets. |
| 85 | +4. Preparing features that can be linked to other spatial units such as buildings. |
| 86 | + |
| 87 | +--- |
| 88 | + |
| 89 | +#### Analysis_Part3.ipynb |
| 90 | + |
| 91 | +**Overall goal** |
| 92 | +Bring together the building and tree datasets from Parts 1 and 2 to study how changes in tree canopy relate to building energy performance. This analysis is in progress for this semester. |
| 93 | + |
| 94 | +**What this notebook does** |
| 95 | +1. Merges the building-level dataset with nearby tree and canopy information. |
| 96 | +2. Filters to buildings with usable energy-use data and building characteristics. |
| 97 | +3. Creates final features that capture tree canopy exposure and changes over time at the building level. |
| 98 | +4. Runs exploratory analysis and first-pass models to examine relationships between canopy metrics and building energy intensity. |
| 99 | +5. Documents open questions and next steps for improving the models and interpretation. |
| 100 | + |
| 101 | +**Status and skills showcased** |
| 102 | +1. This analysis is still in progress for the current semester, and I am continuing to refine the modeling and diagnostics. |
| 103 | +2. Demonstrates the ability to stitch multi-step data pipelines into a single analysis. |
| 104 | +3. Shows familiarity with panel-style data and modeling energy outcomes as a function of environmental and built-environment features. |
| 105 | +4. Highlights communication of intermediate results, limitations, and planned future work. |