This folder contains all of the data used in The Pudding essay A Tale of Two Cities published in March 2018.
Below you'll find metadata for each file.
- What is this?: Data representing the 5 business types that over-index the most in each of Seattle's and New York City's neighborhoods (respectively).
- Source(s) & Methods: Each of Seattle and New York City's neighborhoods were identified using Zillow's neighborhood-level shapefiles. We then used thesed neighborhood names to query the Yelp API for any available Yelp business categories using a radius of 2 miles (3210m). Using the latitude/longitude coordinates, we mapped each business to a neighborhood, and compared the ratio of businesses from each category in each neighborhood to their ratio in the city as a whole (e.g., if hairdressers made up 10% of a particular neighborhood's businesses but only 5% of the businesses in the city overall, they were judged to "over-index" in that neighborhood). If a single business had more than one category (e.g.,
Outdoor Wear,Sports Wear, andBikes) that business would be repeated three times, once for each category. Thus, a single business could be counted in more than one category, but never more than once in the same category. - Last Modified: Seattle - February 15, 2018. New York City - March 10, 2018.
- Contact Information: Amber Thomas
- Spatial Applicability: These data apply to businesses inside of the Seattle and New York City limits, as defined by Zillow's neighborhood-level shapefiles.
- Temporal Applicability: These data represent a collection of all operating businesses with at least one review on Yelp at the time of collection (Seattle: February 13, 2018 - February 15, 2018, New York City: March 6, 2018 - March 10, 2018).
- Observations (Rows): Each row represents a single category within a single neighborhood.
- Variables (Columns):
| Header | Description | Data Type |
|---|---|---|
neighborhood |
The plain text name of a neighborhood (as defined by Zillow) | text |
yelpAlias |
The Yelp-specified string to denote a specific category. These strings contain no spaces or special characters. Full list | text |
yelpTitle |
The cleaned and human-readable version of the yelpAlias. Full list |
text |
nCount |
The number of businesses in a specific neighborhood that match a specific yelpAlias. |
number |
neighborhoodTotal |
The total number of businesses operating in that neighborhood. | number |
cCount |
The number of businesses in the entire city that match a specific yelpAlias. |
number |
cityTotal |
The total number of businesses operating in the entire city. | number |
incidence |
The amount that a business-type over-indexes in a neighborhood compared to the city. This is calculated as (nCount / cCount) * (cityTotal / neighborhoodTotal). Note: Any business-type that did not make up at least 1% of the neighborhood's businesses was removed from the analysis. |
number |
rank |
A ranking of the businesses that over-index the most in a specific neighborhood. The business that over-indexes the most has rank = 1. |
number |
- Other Notes: Any use of these data needs to include a link back to Yelp as attribution.