Commit f444fb5

committed
adding inprogress capstone data
1 parent 991ef94 commit f444fb5

File tree

9 files changed

+34075
-45
lines changed

CapstoneDataProcessing/.ipynb_checkpoints/Analysis_Part3-checkpoint.ipynb

Lines changed: 6088 additions & 0 deletions
Large diffs are not rendered by default.

CapstoneDataProcessing/.ipynb_checkpoints/BuildingWork_Part1-checkpoint.ipynb

Lines changed: 9375 additions & 0 deletions
Large diffs are not rendered by default.

CapstoneDataProcessing/.ipynb_checkpoints/TreeWork_Part2-checkpoint.ipynb

Lines changed: 1498 additions & 0 deletions
Large diffs are not rendered by default.

CapstoneDataProcessing/Analysis_Part3.ipynb

Lines changed: 6088 additions & 0 deletions
Large diffs are not rendered by default.

CapstoneDataProcessing/BuildingWork_Part1.ipynb

Lines changed: 9375 additions & 0 deletions
Large diffs are not rendered by default.

CapstoneDataProcessing/TreeWork_Part2.ipynb

Lines changed: 1498 additions & 0 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 105 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1 +1,105 @@
1-
# CodeSamples
1+
# CodeSamples
2+
3+
This repository contains two different code samples. The first, `VehicularAccidents_311ServiceRequestAnalysis`, adapts work I completed for a final project in my master’s program. The second folder, `CapstoneDataProcessing`, is the core of my code for my capstone project, which is still in progress as of November 2025.
4+
5+
Together, these notebooks show how I work end to end with messy public-sector data: acquiring and cleaning datasets, joining across sources, building features, doing exploratory analysis, and organizing code in a way that is easy for others to follow.
6+
7+
---
8+
9+
## VehicularAccidents_311ServiceRequestAnalysis
10+
11+
#### NYC Accidents & 311 Complaints.ipynb
12+
13+
**Overall goal**
14+
Explore how New York City vehicular accidents and 311 service complaints line up in space and time, to see where traffic safety issues and resident-reported problems overlap.
15+
16+
**What this notebook does**
17+
1. Loads open data on motor vehicle collisions and 311 complaints for New York City.
18+
2. Cleans and standardizes fields such as dates, times, locations, and complaint categories.
19+
3. Joins and aggregates the datasets by location and time (for example, neighborhood or borough and time of day).
20+
4. Builds features that describe accident patterns and complaint patterns, such as counts, rates, and complaint types.
21+
5. Uses charts and tables to highlight hotspots and possible relationships between accidents and specific 311 complaint categories.
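
The join-and-aggregate step above can be sketched in plain Python. This is a minimal illustration, not the notebook's actual code (which may well use pandas): the field names `borough`, `crash_time`, and `created`, the helper names, and the sample rows are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_by_borough_hour(records, time_field):
    """Count records per (borough, hour-of-day) key."""
    counts = Counter()
    for rec in records:
        # Standardize the borough label: trim whitespace, upper-case.
        borough = (rec.get("borough") or "").strip().upper()
        if not borough:
            continue  # skip rows with no usable location
        hour = datetime.fromisoformat(rec[time_field]).hour
        counts[(borough, hour)] += 1
    return counts


def join_counts(collisions, complaints):
    """Outer-join two count tables on (borough, hour), filling gaps with 0."""
    keys = sorted(set(collisions) | set(complaints))
    return [
        {
            "borough": borough,
            "hour": hour,
            "collisions": collisions.get((borough, hour), 0),
            "complaints": complaints.get((borough, hour), 0),
        }
        for borough, hour in keys
    ]


# Hypothetical sample rows; real open-data exports use different columns.
sample_collisions = [
    {"borough": "Brooklyn", "crash_time": "2024-03-01T08:15:00"},
    {"borough": "brooklyn ", "crash_time": "2024-03-02T08:40:00"},
    {"borough": "Queens", "crash_time": "2024-03-01T17:05:00"},
    {"borough": None, "crash_time": "2024-03-01T09:00:00"},
]
sample_complaints = [
    {"borough": "BROOKLYN", "created": "2024-03-01T08:30:00"},
    {"borough": "Bronx", "created": "2024-03-01T12:00:00"},
]

table = join_counts(
    count_by_borough_hour(sample_collisions, "crash_time"),
    count_by_borough_hour(sample_complaints, "created"),
)
for row in table:
    print(row)
```

An outer join is used here so that borough-hour cells with complaints but no collisions (or the reverse) stay visible instead of being dropped.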
22+
23+
**Skills showcased**
24+
1. Working with real-world city open data in Jupyter.
25+
2. Data cleaning and feature engineering on multi-source datasets.
26+
3. Joining and aggregating data across time and geography.
27+
4. Exploratory data analysis and clear visualization to tell a story around safety and service delivery.
28+
29+
---
30+
31+
## CapstoneDataProcessing (PROJECT IN PROGRESS)
32+
33+
The notebooks in this folder support my capstone project, which looks at the relationship between urban trees and building energy performance in New York City. The focus is on building a reliable, analysis-ready dataset from multiple open data sources. NOTE: This project is currently in progress and has been simplified to serve as a sample. However, it is still a bit rough.
34+
35+
#### BuildingWork_Part1.ipynb
36+
37+
**Overall goal**
38+
Create a building-level panel dataset of New York City energy and water benchmarking results that can be linked to building geometry and later joined to tree canopy measures.
39+
40+
**What this notebook does**
41+
1. Pulls several years of Local Law 84 benchmarking data for New York City buildings.
42+
2. Cleans and standardizes building identifiers across years, including handling missing or inconsistent IDs.
43+
3. Geocodes buildings and assigns them to tax lots and building footprints.
44+
4. Joins in additional building attributes such as zoning, height, and other physical characteristics.
45+
5. Outputs a consistent building-level file that is ready for downstream spatial analysis and modeling.
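
As a rough sketch of the identifier-cleaning step, the snippet below normalizes a borough-block-lot (BBL) value to its 10-digit form and assembles a small building-by-year panel. Using BBL as the join key is an assumption made for illustration; the README does not say which identifier the notebook reconciles. The helper names, the `bbl` and `site_eui` fields, and the sample values are hypothetical.

```python
import re


def normalize_bbl(raw):
    """Return a 10-digit BBL string, or None when the value is unusable."""
    if raw is None:
        return None
    text = str(raw).strip()
    # Drop a trailing ".0" left behind when IDs were parsed as floats.
    text = re.sub(r"\.0+$", "", text)
    # Remove separators such as dashes or spaces.
    digits = re.sub(r"[^0-9]", "", text)
    # A BBL is 1 borough digit (1-5) + 5 block digits + 4 lot digits.
    if len(digits) != 10 or digits[0] not in "12345":
        return None
    return digits


def build_panel(rows_by_year):
    """Map each valid BBL to a {year: value} dict, skipping bad IDs."""
    panel = {}
    for year, rows in rows_by_year.items():
        for row in rows:
            bbl = normalize_bbl(row.get("bbl"))
            if bbl is None:
                continue  # missing or inconsistent identifier
            panel.setdefault(bbl, {})[year] = row["site_eui"]
    return panel


# Hypothetical rows showing the kinds of inconsistencies that can appear.
rows_by_year = {
    2021: [
        {"bbl": "1-00001-0001", "site_eui": 80.5},
        {"bbl": "Not Available", "site_eui": 60.0},
    ],
    2022: [
        {"bbl": 1000010001.0, "site_eui": 78.2},
    ],
}

panel = build_panel(rows_by_year)
print(panel)
```

Rows whose identifier cannot be normalized are dropped in this sketch; a real pipeline might instead route them to a manual-review or geocoding step.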
46+
47+
**Sources used in this notebook include**
48+
1. New York City Open Data building energy and water disclosure datasets for Local Law 84 (multiple years).
49+
2. New York City Open Data building historic records.
50+
3. New York City Open Data building elevation and subgrade data.
51+
4. New York City Planning MapPLUTO building footprint and tax lot data.
52+
5. New York City Planning tree canopy change data derived from LIDAR.
53+
6. New York City Planning Labs geosearch service for address cleaning and geocoding.
54+
55+
**Skills showcased**
56+
1. Building robust ETL pipelines for large civic datasets.
57+
2. Cleaning and reconciling multi-year records at the building level.
58+
3. Basic geospatial processing and joining tabular data to spatial layers.
59+
4. Preparing high-quality, reusable datasets for analysis.
60+
61+
---
62+
63+
#### TreeWork_Part2.ipynb
64+
65+
**Overall goal**
66+
Construct a tree-level dataset that tracks which street trees exist in each year and when they are removed, based on inventory and work order records.
67+
68+
**What this notebook does**
69+
1. Loads New York City street tree inventory data and forestry work order records from open data.
70+
2. Cleans and aligns tree identifiers, locations, and key fields across the two datasets.
71+
3. Uses work order histories to infer tree removal dates and statuses.
72+
4. Restricts analysis to closed tickets to ensure reliable outcomes.
73+
5. Builds a yearly tree-level file that can be joined to nearby buildings for panel analysis.
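
One possible form of the removal-inference rule is sketched below. It assumes a tree counts as removed from the year of its earliest closed removal work order onward. That rule, the field names (`tree_id`, `status`, `work_type`, `closed_date`), and the status and work-type labels are illustrative assumptions, not the notebook's confirmed logic.

```python
from datetime import date


def removal_dates(work_orders):
    """Map tree_id -> earliest closed removal date; open tickets are ignored."""
    removed = {}
    for wo in work_orders:
        if wo["status"] != "Closed" or wo["work_type"] != "Tree Removal":
            continue
        closed = date.fromisoformat(wo["closed_date"])
        tree = wo["tree_id"]
        if tree not in removed or closed < removed[tree]:
            removed[tree] = closed
    return removed


def yearly_presence(tree_ids, removed, years):
    """One row per (tree, year); a tree is absent from its removal year on."""
    rows = []
    for tree in tree_ids:
        for year in years:
            gone = tree in removed and removed[tree].year <= year
            rows.append({"tree_id": tree, "year": year, "present": not gone})
    return rows


# Hypothetical inventory and work orders.
tree_ids = ["T1", "T2"]
work_orders = [
    {"tree_id": "T1", "status": "Closed", "work_type": "Tree Pruning",
     "closed_date": "2020-05-02"},
    {"tree_id": "T1", "status": "Closed", "work_type": "Tree Removal",
     "closed_date": "2021-06-10"},
    {"tree_id": "T2", "status": "Open", "work_type": "Tree Removal",
     "closed_date": None},
]

removed = removal_dates(work_orders)
panel = yearly_presence(tree_ids, removed, [2020, 2021, 2022])
for row in panel:
    print(row)
```

Filtering on closed tickets before parsing dates also means open tickets with no close date never reach the date parser.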
74+
75+
**Sources used in this notebook include**
76+
1. New York City Open Data street tree inventory and point location data.
77+
2. New York City Open Data tree work order and forestry service request records.
78+
3. New York City Open Data geographic reference layers for streets and neighborhoods.
79+
4. New York City planning and open data portals for reference building and location context.
80+
81+
**Skills showcased**
82+
1. Working with operational city data that is noisy and event based.
83+
2. Designing rule-based logic to infer entity status over time (for example, whether a tree is still present).
84+
3. Combining event histories with inventory data to build panel-style datasets.
85+
4. Preparing features that can be linked to other spatial units such as buildings.
86+
87+
---
88+
89+
#### Analysis_Part3.ipynb
90+
91+
**Overall goal**
92+
Bring together the building and tree datasets from Parts 1 and 2 to study how changes in tree canopy relate to building energy performance. This analysis is in progress for this semester.
93+
94+
**What this notebook does**
95+
1. Merges the building-level dataset with nearby tree and canopy information.
96+
2. Filters to buildings with usable energy-use data and building characteristics.
97+
3. Creates final features that capture tree canopy exposure and changes over time at the building level.
98+
4. Runs exploratory analysis and first-pass models to examine relationships between canopy metrics and building energy intensity.
99+
5. Documents open questions and next steps for improving the models and interpretation.
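
The merge and feature steps could look roughly like the sketch below, which counts trees within a fixed radius of a building for each year and records the year-over-year change. It assumes coordinates are already in a projected system with linear units (so straight-line distance is meaningful) and uses a simple tree count as a stand-in for canopy exposure. The radius, field names, and sample points are hypothetical.

```python
import math


def trees_within(building, trees, radius):
    """Count trees whose straight-line distance to the building is <= radius."""
    return sum(
        1
        for t in trees
        if math.hypot(t["x"] - building["x"], t["y"] - building["y"]) <= radius
    )


def canopy_features(building, trees_by_year, radius=100.0):
    """Per-year nearby tree count plus change from the previous year."""
    rows, prev = [], None
    for year in sorted(trees_by_year):
        n = trees_within(building, trees_by_year[year], radius)
        rows.append({
            "year": year,
            "nearby_trees": n,
            "change": None if prev is None else n - prev,
        })
        prev = n
    return rows


# Hypothetical projected coordinates (same linear unit as the radius).
building = {"x": 0.0, "y": 0.0}
trees_by_year = {
    2020: [{"x": 30.0, "y": 40.0}, {"x": 60.0, "y": 70.0}, {"x": 300.0, "y": 0.0}],
    2021: [{"x": 30.0, "y": 40.0}, {"x": 300.0, "y": 0.0}],
}

features = canopy_features(building, trees_by_year)
for row in features:
    print(row)
```

A brute-force distance check like this is fine for a handful of buildings; at city scale a spatial index or a geospatial library would likely be needed.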
100+
101+
**Status and skills showcased**
102+
1. This analysis is still in progress for the current semester, and I am continuing to refine the modeling and diagnostics.
103+
2. Demonstrates the ability to stitch multi-step data pipelines into a single analysis.
104+
3. Shows familiarity with panel-style data and modeling energy outcomes as a function of environmental and built-environment features.
105+
4. Highlights communication of intermediate results, limitations, and planned future work.

VehicularAccidents_311ServiceRequestAnalysis/.ipynb_checkpoints/NYC Accidents & 311 Complaints-checkpoint.ipynb

Lines changed: 24 additions & 22 deletions
@@ -11,10 +11,12 @@
1111
},
1212
{
1313
"cell_type": "markdown",
14-
"id": "eaa8a1b0",
14+
"id": "e4360856",
1515
"metadata": {},
1616
"source": [
17-
"###### NOTE: This code sample was taken from a part of my final project for one of my classes this semester (Spring 2025). The analysis, initially completed in R, was a custom analysis using publicly sourced data. This is not to be considered a full report / analysis, but only a partial one in order to provide a sample. Thanks. "
17+
"###### NOTE: This code sample was taken from a part of my final project for one of my classes this semester (Spring 2025). The analysis, initially completed in R, was a custom analysis using publicly sourced data. This is not to be considered a full report / analysis, but only a partial one in order to provide a sample. Thanks. \n",
18+
"\n",
19+
"##### The GitHub link to this file is here: https://github.com/jhnboyy/CodeSamples/blob/991ef94e90bb1cf96189fe68f3e488e31d5b0901/VehicularAccidents_311ServiceRequestAnalysis/NYC%20Accidents%20%26%20311%20Complaints.ipynb"
1820
]
1921
},
2022
{
@@ -2349,7 +2351,7 @@
23492351
{
23502352
"cell_type": "code",
23512353
"execution_count": 21,
2352-
"id": "7d3e3bb4",
2354+
"id": "d529c8fa",
23532355
"metadata": {},
23542356
"outputs": [
23552357
{
@@ -2396,7 +2398,7 @@
23962398
{
23972399
"cell_type": "code",
23982400
"execution_count": 20,
2399-
"id": "b1a84fea",
2401+
"id": "62b1c764",
24002402
"metadata": {},
24012403
"outputs": [
24022404
{
@@ -2432,7 +2434,7 @@
24322434
{
24332435
"cell_type": "code",
24342436
"execution_count": null,
2435-
"id": "15e921cd",
2437+
"id": "3a19c7f2",
24362438
"metadata": {},
24372439
"outputs": [],
24382440
"source": []
@@ -3197,7 +3199,7 @@
31973199
{
31983200
"cell_type": "code",
31993201
"execution_count": 33,
3200-
"id": "fe8ac782",
3202+
"id": "354a989a",
32013203
"metadata": {
32023204
"scrolled": true
32033205
},
@@ -3536,7 +3538,7 @@
35363538
{
35373539
"cell_type": "code",
35383540
"execution_count": null,
3539-
"id": "0a8f104c",
3541+
"id": "258e417e",
35403542
"metadata": {},
35413543
"outputs": [],
35423544
"source": [
@@ -4213,7 +4215,7 @@
42134215
{
42144216
"cell_type": "code",
42154217
"execution_count": 70,
4216-
"id": "5d53e98f",
4218+
"id": "444f050f",
42174219
"metadata": {},
42184220
"outputs": [
42194221
{
@@ -4258,7 +4260,7 @@
42584260
},
42594261
{
42604262
"cell_type": "markdown",
4261-
"id": "e4ddc484",
4263+
"id": "f4ad2de0",
42624264
"metadata": {},
42634265
"source": [
42644266
"#### Overdispersion Test"
@@ -4267,7 +4269,7 @@
42674269
{
42684270
"cell_type": "code",
42694271
"execution_count": 72,
4270-
"id": "db9b6878",
4272+
"id": "3e321870",
42714273
"metadata": {},
42724274
"outputs": [
42734275
{
@@ -4318,7 +4320,7 @@
43184320
{
43194321
"cell_type": "code",
43204322
"execution_count": null,
4321-
"id": "2b439094",
4323+
"id": "467f9389",
43224324
"metadata": {},
43234325
"outputs": [],
43244326
"source": [
@@ -4328,7 +4330,7 @@
43284330
},
43294331
{
43304332
"cell_type": "markdown",
4331-
"id": "f9ab2813",
4333+
"id": "0078463b",
43324334
"metadata": {},
43334335
"source": [
43344336
"#### Splitting the data into Training and Test Sets "
@@ -4337,7 +4339,7 @@
43374339
{
43384340
"cell_type": "code",
43394341
"execution_count": 116,
4340-
"id": "491a61f5",
4342+
"id": "c321bbf5",
43414343
"metadata": {},
43424344
"outputs": [],
43434345
"source": [
@@ -4354,7 +4356,7 @@
43544356
},
43554357
{
43564358
"cell_type": "markdown",
4357-
"id": "879c836c",
4359+
"id": "897b6e8f",
43584360
"metadata": {},
43594361
"source": [
43604362
"#### Modeling Start "
@@ -4363,7 +4365,7 @@
43634365
{
43644366
"cell_type": "code",
43654367
"execution_count": 122,
4366-
"id": "d0007c85",
4368+
"id": "a48fcacb",
43674369
"metadata": {},
43684370
"outputs": [
43694371
{
@@ -4409,7 +4411,7 @@
44094411
{
44104412
"cell_type": "code",
44114413
"execution_count": 121,
4412-
"id": "fa5f9ecf",
4414+
"id": "1d41c76d",
44134415
"metadata": {},
44144416
"outputs": [
44154417
{
@@ -4484,7 +4486,7 @@
44844486
{
44854487
"cell_type": "code",
44864488
"execution_count": 123,
4487-
"id": "f0535e26",
4489+
"id": "71351ba2",
44884490
"metadata": {},
44894491
"outputs": [
44904492
{
@@ -4513,7 +4515,7 @@
45134515
{
45144516
"cell_type": "code",
45154517
"execution_count": 125,
4516-
"id": "4d60ba2f",
4518+
"id": "82c911a2",
45174519
"metadata": {},
45184520
"outputs": [
45194521
{
@@ -4563,7 +4565,7 @@
45634565
{
45644566
"cell_type": "code",
45654567
"execution_count": 126,
4566-
"id": "1f5cac7b",
4568+
"id": "a9907dc9",
45674569
"metadata": {},
45684570
"outputs": [
45694571
{
@@ -4689,7 +4691,7 @@
46894691
{
46904692
"cell_type": "code",
46914693
"execution_count": null,
4692-
"id": "5e19aa88",
4694+
"id": "0b172286",
46934695
"metadata": {},
46944696
"outputs": [],
46954697
"source": [
@@ -4725,7 +4727,7 @@
47254727
{
47264728
"cell_type": "code",
47274729
"execution_count": 129,
4728-
"id": "c5a5e642",
4730+
"id": "193e767b",
47294731
"metadata": {},
47304732
"outputs": [
47314733
{
@@ -4764,7 +4766,7 @@
47644766
{
47654767
"cell_type": "code",
47664768
"execution_count": null,
4767-
"id": "5c1129f5",
4769+
"id": "de94b43f",
47684770
"metadata": {},
47694771
"outputs": [],
47704772
"source": [
