{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Wrangling OpenStreetMap Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OpenStreetMap is a community-built map of the world, similar in nature to how content on Wiki sites is generated and maintained. \n", "\n", "[http://www.openstreetmap.org](http://www.openstreetmap.org)\n", "\n", "Users can map things such as polylines of roads, draw polygons of buildings or areas of interest, or insert nodes for landmarks. These map elements can be further tagged with details such as street addresses or amenity type. Map data is stored in an XML format. More details about the OSM XML can be found here:\n", "\n", "[http://wiki.openstreetmap.org/wiki/OSM_XML](http://wiki.openstreetmap.org/wiki/OSM_XML)\n", "\n", "Some highlights of the OSM XML format relevant to this project are:\n", "\n", "* OSM XML is a list of instances of data primitives (nodes, ways, and relations) found within a given bounding box\n", "* nodes represent dimensionless points on the map\n", "* ways contain node references to form either a polyline or polygon on the map\n", "* nodes and ways both contain child tag elements that represent key-value pairs of descriptive information about a given node or way\n", "\n", "As with any user-generated content, there is likely going to be dirty data. In this project I'll attempt to do some auditing, cleaning, and data summarizing tasks with Python and MongoDB." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chosen Map Area" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this project, I chose to analyze data from Volusia and Flagler Counties, Florida. I grew up in this relatively rural area. I figure that my familiarity with the area and my hunch that this rural area has yet to be thoroughly audited on the OpenStreetMap platform make it a good candidate for analysis. 
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
View Larger Map" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.display import HTML\n", "HTML('
View Larger Map')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I used the Overpass API to download the OpenStreetMap XML for the corresponding bounding box:\n", "\n", "[http://overpass-api.de/api/map?bbox=-81.5600,28.8400,-80.7400,29.6713](http://overpass-api.de/api/map?bbox=-81.5600,28.8400,-80.7400,29.6713)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import requests\n", "\n", "url = 'http://overpass-api.de/api/map?bbox=-81.5600,28.8400,-80.7400,29.6713'\n", "filename = 'volusia_flagler.osm'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While easy to use, the Python requests library does not have save functionality. The below code is modified from this stackoverflow post:\n", "\n", "[http://stackoverflow.com/a/16696317](http://stackoverflow.com/a/16696317)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def download_file(url, local_filename):\n", " # stream = True allows downloading of large files; prevents loading entire file into memory\n", " r = requests.get(url, stream = True)\n", " with open(local_filename, 'wb') as f:\n", " for chunk in r.iter_content(chunk_size=1024): \n", " if chunk: # filter out keep-alive new chunks\n", " f.write(chunk)\n", " f.flush()\n", " \n", "download_file(url, filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Auditing the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the OSM XML file downloaded, I will parse through it with ElementTree and find the number of each type of element. In this project, I will use iterative parsing as the XML download can be too large to work with in memory." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'bounds': 1,\n", " 'member': 26682,\n", " 'meta': 1,\n", " 'nd': 362449,\n", " 'node': 308736,\n", " 'note': 1,\n", " 'osm': 1,\n", " 'relation': 331,\n", " 'tag': 214998,\n", " 'way': 28371}\n" ] } ], "source": [ "import xml.etree.ElementTree as ET\n", "import pprint\n", "\n", "tags = {}\n", "\n", "for event, elem in ET.iterparse(filename):\n", " if elem.tag in tags:\n", " tags[elem.tag] += 1\n", " else:\n", " tags[elem.tag] = 1\n", " \n", "pprint.pprint(tags) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here I have built three regular expressions: ```lower```, ```lower_colon```, and ```problemchars```. \n", "\n", "* ```lower```: matches strings containing lower case characters\n", "* ```lower_colon```: matches strings containing lower case characters and a single colon within the string\n", "* ```problemchars```: matches characters that cannot be used within keys in MongoDB\n", "\n", "Here is a sample of OSM XML:\n", "\n", "```xml\n", "\n", " \n", " \n", "\n", "```\n", "\n", "Within the ```node``` element there are two ```tag``` children. The key for both of these children begins with ```addr:```. Later in this notebook I will use the ```lower_colon``` regex to help find these keys so I can build a single ```address``` document within a larger json document." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'lower': 79451, 'lower_colon': 128361, 'other': 7185, 'problemchars': 1}\n" ] } ], "source": [ "import re\n", "\n", "lower = re.compile(r'^([a-z]|_)*$')\n", "lower_colon = re.compile(r'^([a-z]|_)*:([a-z]|_)*$')\n", "problemchars = re.compile(r'[=\\+/&<>;\\'\"\\?%#$@\\,\\. 
\\t\\r\\n]')\n", "\n", "def key_type(element, keys):\n", " if element.tag == \"tag\":\n", " \n", " if problemchars.search(element.attrib['k']):\n", " keys['problemchars'] += 1\n", " elif lower.search(element.attrib['k']):\n", " keys['lower'] += 1\n", " elif lower_colon.search(element.attrib['k']):\n", " keys['lower_colon'] += 1 \n", " else:\n", " keys['other'] += 1\n", " \n", " return keys\n", "\n", "def process_map(filename):\n", " keys = {\"lower\": 0, \"lower_colon\": 0, \"problemchars\": 0, \"other\": 0}\n", " \n", " for _, element in ET.iterparse(filename):\n", " keys = key_type(element, keys)\n", "\n", " return keys\n", "\n", "keys = process_map(filename)\n", "pprint.pprint(keys)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I will redefine ```process_map``` to build a set of unique userid's found within the XML. I will then output the length of this set, representing the number of unique users making edits in the chosen map area." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "262" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def process_map(filename):\n", " users = set()\n", " for _, element in ET.iterparse(filename):\n", " if \"uid\" in element.attrib and element.tag in ('node', 'way'):\n", " users.add(element.attrib[\"uid\"])\n", "\n", " return users\n", "\n", "users = process_map(filename)\n", "len(users)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problems With the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Street Names" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The majority of this project will be devoted to auditing and cleaning street names seen within the OSM XML. Street types used by users in the process of mapping are quite often abbreviated. I will attempt to find these abbreviations and replace them with their full text form. 
The plan of action is as follows:\n", "\n", "* Build a regex to match the last token in a string (with an optional '.') as this is typically where you would find the street type in an address\n", "* Build a list of expected street types that do not need to be cleaned\n", "* Parse through the XML looking for ```tag``` elements with ```k=\"addr:street\"``` attributes\n", "* Perform a search using the regex on the value of the ```v``` attribute of these elements (the street name string)\n", "* Build a dictionary with keys that are matches to the regex (street types) and a set of street names where the particular key was found as the value. This will allow us to determine what needs to be cleaned.\n", "* Build a second dictionary that contains a map from an offending street type to a clean street type\n", "* Build a second regex that will match these offending street types anywhere in a string\n", "* Build a function that will return a clean string using the mapping dictionary and this second regex\n", "\n", "The first step is to build a regex to match the last token in a string optionally ending with a period. I will also build a list of street types I expect to see in a clean street name." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from collections import defaultdict\n", "\n", "street_type_re = re.compile(r'\\b\\S+\\.?$', re.IGNORECASE)\n", "\n", "expected_street_types = [\"Avenue\", \"Boulevard\", \"Commons\", \"Court\", \"Drive\", \"Lane\", \"Parkway\", \n", " \"Place\", \"Road\", \"Square\", \"Street\", \"Trail\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ```audit_string``` function will take in the dictionary of street types we are building, a string to audit, a regex to match against that string, and the list of expected street types.\n", "\n", "The function will search the string for the regex. 
If there is a match and the match is not in our list of expected street types, add the match as a key to the dictionary and add the string to the set." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def audit_string(match_set_dict, string_to_audit, regex, expected_matches):\n", " \n", " m = regex.search(string_to_audit)\n", " \n", " if m:\n", " match_string = m.group()\n", " \n", " if match_string not in expected_matches:\n", " match_set_dict[match_string].add(string_to_audit)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I will define an ```audit``` function to do the parsing and auditing of the street names.\n", "\n", "I have defined this function so that it not only audits ```tag``` elements where ```k=\"addr:street\"```, but whichever ```tag``` elements match the ```tag_filter``` function. The ```audit``` function also takes in a regex and the list of expected matches." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def audit(osmfile, tag_filter, regex, expected_matches = []):\n", " \n", " osm_file = open(osmfile, \"r\")\n", " match_sets = defaultdict(set)\n", " \n", " # iteratively parse the mapping xml\n", " for event, elem in ET.iterparse(osm_file, events=(\"start\",)):\n", " # node and way tags are of special interest\n", " if elem.tag == \"node\" or elem.tag == \"way\":\n", " # iterate the \"tag\" tags within a node or way\n", " for tag in elem.iter(\"tag\"): \n", " if tag_filter(tag):\n", " audit_string(match_sets, tag.attrib['v'], regex, expected_matches)\n", " \n", " return match_sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function ```is_street_name``` determines if an element contains an attribute ```k=\"addr:street\"```. I will use ```is_street_name``` as the ```tag_filter``` when I call the ```audit``` function to audit street names." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def is_street_name(elem):\n", " return (elem.attrib['k'] == \"addr:street\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's pretty-print the output of ```audit```:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'1': set(['US 1']),\n", " '100': set(['W. Highway 100']),\n", " 'Ave': set(['Central Ave',\n", " 'E Arizona Ave',\n", " 'E Euclid Ave',\n", " 'E MIchigan Ave',\n", " 'E Michigan Ave',\n", " 'E Minnesota Ave',\n", " 'E Pennsylania Ave',\n", " 'E Pennsylvania Ave',\n", " 'E Stetson Ave',\n", " 'E University Ave',\n", " 'N Amelia Ave',\n", " 'Rhode Island Ave',\n", " 'W Minnesota Ave',\n", " 'W Pennsylvania Ave']),\n", " 'BLVD': set(['North Clyde Morris BLVD']),\n", " 'Blvd': set(['Commerce Blvd',\n", " 'Harley Strickland Blvd',\n", " 'Howland Blvd',\n", " 'Mahogany Blvd',\n", " 'N Woodland Blvd',\n", " 'S Clyde Morris Blvd',\n", " 'S Woodland Blvd',\n", " 'Seasame Blvd',\n", " 'Town Center Blvd',\n", " 'W International Speedway Blvd',\n", " 'W Intl Speedway Blvd',\n", " 'West Granada Blvd']),\n", " 'Blvd.': set(['West International Speedway Blvd.']),\n", " 'Cir': set(['Fraternity Cir']),\n", " 'Circle': set(['Huntington Village Circle']),\n", " 'Dr': set(['Cypress Edge Dr',\n", " 'E Bert Fish Dr',\n", " 'Flagler Plaza Dr',\n", " 'N Bert FIsh Dr',\n", " 'N Bert Fish Dr',\n", " 'Rymfire Dr']),\n", " 'East': set(['Palm Coast Pkwy East']),\n", " 'Ln': set(['Bainbridge Ln']),\n", " 'N': set(['Old Kings Rd N']),\n", " 'Pkwy': set(['City Center Pkwy', 'Palm Coast Pkwy']),\n", " 'Pky': set(['Pine Lakes Pky']),\n", " 'Rd': set(['W Highbanks Rd']),\n", " 'Rd.': set(['North Nova Rd.']),\n", " 'Run': set(['Wolf Pack Run']),\n", " 'South': set(['Ibis Court South']),\n", " 'Speedway': set([\"W. 
Int'l Speedway\"]),\n", " 'St': set(['10th St']),\n", " 'St.': set(['Third St.']),\n", " 'Way': set(['Kings Way']),\n", " 'West': set(['Palm Coast Pkwy West'])}\n" ] } ], "source": [ "street_types = audit(filename, tag_filter = is_street_name, regex = street_type_re, \n", " expected_matches = expected_street_types)\n", "\n", "pprint.pprint(dict(street_types))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I have a list of some abbreviated street types (as well as clean street types I did not expect, cardinal directions, and highway numbers). This is by no means a comprehensive list of all of the abbreviated street types used within the XML as all of these matches occur only as the last token at the end of a street name, but it is a very good first swipe at the problem.\n", "\n", "To replace these abbreviated street types, I will define an ```update``` function that takes a string to update, a mapping dictionary, and a regex to search." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def update(string_to_update, mapping, regex):\n", "\n", " m = regex.search(string_to_update)\n", " \n", " if m:\n", " match = m.group()\n", " \n", " if match in mapping:\n", " string_to_update = re.sub(regex, mapping[match], string_to_update)\n", " \n", " return string_to_update" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the results of the audit, I will build a dictionary to map abbreviations to their full, clean representations." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [ "map_street_types = \\\n", " {\n", " \"Ave\" : \"Avenue\",\n", " \"BLVD\" : \"Boulevard\",\n", " \"Blvd\" : \"Boulevard\",\n", " \"Blvd.\" : \"Boulevard\",\n", " \"Cir\" : \"Circle\",\n", " \"Dr\" : \"Drive\",\n", " \"Ln\" : \"Lane\",\n", " \"Pkwy\" : \"Parkway\",\n", " \"Rd\" : \"Road\",\n", " \"Rd.\" : \"Road\",\n", " \"St\" : \"Street\",\n", " \"St.\" : \"Street\"\n", " }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I now want to replace the keys of the map anywhere in the string. I'll build a new regex to do so." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Take the keys from the map and create a string joined by a pipe\n", "# Replace '.' with the empty string, as the regex will handle the optional periods \n", "bad_streets = \"|\".join(map_street_types.keys()).replace('.', '')\n", "\n", "# The pipe will cause the regex to search for any of the keys, lazily matching the first it finds\n", "street_type_updater_re = re.compile(r'\\b(' + bad_streets + r')\\b\\.?', re.IGNORECASE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see how this works, I will traverse the ```street_types``` dictionary from above" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Bainbridge Ln => Bainbridge Lane\n", "Third St. => Third Street\n", "W Highbanks Rd => W Highbanks Road\n", "North Clyde Morris BLVD => North Clyde Morris Boulevard\n", "North Nova Rd. 
=> North Nova Road\n", "Rymfire Dr => Rymfire Drive\n", "N Bert Fish Dr => N Bert Fish Drive\n", "N Bert FIsh Dr => N Bert FIsh Drive\n", "Cypress Edge Dr => Cypress Edge Drive\n", "Flagler Plaza Dr => Flagler Plaza Drive\n", "E Bert Fish Dr => E Bert Fish Drive\n", "Palm Coast Pkwy => Palm Coast Parkway\n", "City Center Pkwy => City Center Parkway\n", "Fraternity Cir => Fraternity Circle\n", "10th St => 10th Street\n", "West International Speedway Blvd. => West International Speedway Boulevard\n", "W International Speedway Blvd => W International Speedway Boulevard\n", "Seasame Blvd => Seasame Boulevard\n", "N Woodland Blvd => N Woodland Boulevard\n", "S Woodland Blvd => S Woodland Boulevard\n", "Harley Strickland Blvd => Harley Strickland Boulevard\n", "Howland Blvd => Howland Boulevard\n", "Commerce Blvd => Commerce Boulevard\n", "West Granada Blvd => West Granada Boulevard\n", "Town Center Blvd => Town Center Boulevard\n", "Mahogany Blvd => Mahogany Boulevard\n", "W Intl Speedway Blvd => W Intl Speedway Boulevard\n", "S Clyde Morris Blvd => S Clyde Morris Boulevard\n", "E Minnesota Ave => E Minnesota Avenue\n", "E Euclid Ave => E Euclid Avenue\n", "E Arizona Ave => E Arizona Avenue\n", "Central Ave => Central Avenue\n", "E MIchigan Ave => E MIchigan Avenue\n", "E Stetson Ave => E Stetson Avenue\n", "W Pennsylvania Ave => W Pennsylvania Avenue\n", "E Michigan Ave => E Michigan Avenue\n", "W Minnesota Ave => W Minnesota Avenue\n", "N Amelia Ave => N Amelia Avenue\n", "E University Ave => E University Avenue\n", "E Pennsylvania Ave => E Pennsylvania Avenue\n", "Rhode Island Ave => Rhode Island Avenue\n", "E Pennsylania Ave => E Pennsylania Avenue\n" ] } ], "source": [ "for street_type, ways in street_types.iteritems():\n", " if street_type in map_street_types:\n", " for name in ways:\n", " better_name = update(name, map_street_types, street_type_updater_re)\n", " print name, \"=>\", better_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks 
like the abbreviated street types were updated as expected.\n", "\n", "Upon closer inspection, I see another problem: cardinal directions. North, South, East, and West are often abbreviated. Let's apply similar techniques to replace these abbreviated cardinal directions.\n", "\n", "First, I will create a new regex matching a single character from the set NSEW at the beginning of a string, followed by an optional period." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cardinal_dir_re = re.compile(r'^[NSEW]\\b\\.?', re.IGNORECASE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To audit, I can use the same function with this new regex:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'E': set(['E Arizona Ave',\n", " 'E Bert Fish Dr',\n", " 'E Euclid Ave',\n", " 'E MIchigan Ave',\n", " 'E Michigan Ave',\n", " 'E Minnesota Ave',\n", " 'E Pennsylania Ave',\n", " 'E Pennsylvania Ave',\n", " 'E Stetson Ave',\n", " 'E University Ave']),\n", " 'N': set(['N Amelia Ave',\n", " 'N Bert FIsh Dr',\n", " 'N Bert Fish Dr',\n", " 'N Woodland Blvd']),\n", " 'S': set(['S Clyde Morris Blvd', 'S Woodland Blvd']),\n", " 'W': set(['W Highbanks Rd',\n", " 'W International Speedway Blvd',\n", " 'W Intl Speedway Blvd',\n", " 'W Minnesota Ave',\n", " 'W Pennsylvania Ave']),\n", " 'W.': set(['W. Highway 100', \"W. Int'l Speedway\"])}\n" ] } ], "source": [ "cardinal_directions = audit(filename, is_street_name, cardinal_dir_re)\n", "\n", "pprint.pprint(dict(cardinal_directions))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like we found E, N, S, W, and W. at the beginning of the street names. 
Informative, but I can just create an exhaustive mapping for this issue" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "map_cardinal_directions = \\\n", " {\n", " \"E\" : \"East\",\n", " \"E.\" : \"East\",\n", " \"N\" : \"North\",\n", " \"N.\" : \"North\",\n", " \"S\" : \"South\",\n", " \"S.\" : \"South\",\n", " \"W\" : \"West\",\n", " \"W.\" : \"West\"\n", " }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I now want to replace the keys anywhere in the string. I'll build a new regex to do so" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [], "source": [ "bad_directions = \"|\".join(map_cardinal_directions.keys()).replace('.', '')\n", "cardinal_dir_updater_re = re.compile(r'\\b(' + bad_directions + r')\\b\\.?', re.IGNORECASE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, I will traverse the ```cardinal_directions``` dictionary and apply the updates for both street type and cardinal direction" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "W. Int'l Speedway => W. Int'l Speedway => West Int'l Speedway\n", "W. Highway 100 => W. 
Highway 100 => West Highway 100\n", "S Woodland Blvd => S Woodland Boulevard => South Woodland Boulevard\n", "S Clyde Morris Blvd => S Clyde Morris Boulevard => South Clyde Morris Boulevard\n", "E Minnesota Ave => E Minnesota Avenue => East Minnesota Avenue\n", "E Pennsylania Ave => E Pennsylania Avenue => East Pennsylania Avenue\n", "E Arizona Ave => E Arizona Avenue => East Arizona Avenue\n", "E MIchigan Ave => E MIchigan Avenue => East MIchigan Avenue\n", "E Stetson Ave => E Stetson Avenue => East Stetson Avenue\n", "E Michigan Ave => E Michigan Avenue => East Michigan Avenue\n", "E University Ave => E University Avenue => East University Avenue\n", "E Pennsylvania Ave => E Pennsylvania Avenue => East Pennsylvania Avenue\n", "E Bert Fish Dr => E Bert Fish Drive => East Bert Fish Drive\n", "E Euclid Ave => E Euclid Avenue => East Euclid Avenue\n", "W International Speedway Blvd => W International Speedway Boulevard => West International Speedway Boulevard\n", "W Pennsylvania Ave => W Pennsylvania Avenue => West Pennsylvania Avenue\n", "W Highbanks Rd => W Highbanks Road => West Highbanks Road\n", "W Minnesota Ave => W Minnesota Avenue => West Minnesota Avenue\n", "W Intl Speedway Blvd => W Intl Speedway Boulevard => West Intl Speedway Boulevard\n", "N Woodland Blvd => N Woodland Boulevard => North Woodland Boulevard\n", "N Amelia Ave => N Amelia Avenue => North Amelia Avenue\n", "N Bert Fish Dr => N Bert Fish Drive => North Bert Fish Drive\n", "N Bert FIsh Dr => N Bert FIsh Drive => North Bert FIsh Drive\n" ] } ], "source": [ "for cardinal_direction, ways in cardinal_directions.iteritems():\n", " if cardinal_direction in map_cardinal_directions:\n", " for name in ways:\n", " better_name = update(name, map_street_types, street_type_updater_re)\n", " best_name = update(better_name, map_cardinal_directions, cardinal_dir_updater_re)\n", " print name, \"=>\", better_name, \"=>\", best_name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Lack of 
Street Address Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Besides dirty data within the ```addr:street``` field, there is an apparent lack of data on street addresses altogether. Here I will count the total number of ```node``` and ```way``` elements that contain a ```tag``` child with ```k=\"addr:street\"```." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "190" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "osm_file = open(filename, \"r\")\n", "address_count = 0\n", "\n", "for event, elem in ET.iterparse(osm_file, events=(\"start\",)):\n", " if elem.tag == \"node\" or elem.tag == \"way\":\n", " for tag in elem.iter(\"tag\"): \n", " if is_street_name(tag):\n", " address_count += 1\n", "\n", "address_count" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even for this relatively rural area, this is a very small number of locations on the map to have their street addresses tagged." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparing for MongoDB" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To load the XML data into MongoDB, I will have to transform the data into JSON documents structured like this:\n", "\n", "```json\n", "{\n", " \"id\": \"2406124091\",\n", " \"type\": \"node\",\n", " \"visible\":\"true\",\n", " \"created\": {\n", " \"version\":\"2\",\n", " \"changeset\":\"17206049\",\n", " \"timestamp\":\"2013-08-03T16:43:42Z\",\n", " \"user\":\"linuxUser16\",\n", " \"uid\":\"1219059\"\n", " },\n", " \"pos\": [41.9757030, -87.6921867],\n", " \"address\": {\n", " \"housenumber\": \"5157\",\n", " \"postcode\": \"60625\",\n", " \"street\": \"North Lincoln Ave\"\n", " },\n", " \"amenity\": \"restaurant\",\n", " \"cuisine\": \"mexican\",\n", " \"name\": \"La Cabana De Don Luis\",\n", " \"phone\": \"1 (773)-271-5176\"\n", "}\n", "```\n", "\n", "The transform will follow these 
rules:\n", "\n", "* Process only 2 types of top-level tags: ```node``` and ```way```\n", "* All attributes of ```node``` and ```way``` should be turned into regular key/value pairs, except:\n", "    * The following attributes should be added under a key ```created```: ```version```, ```changeset```, ```timestamp```, ```user```, ```uid```\n", "    * Attributes for latitude and longitude should be added to a ```pos``` array, for use in geospatial indexing. Make sure the values inside the ```pos``` array are floats and not strings. \n", "* If a second-level ```tag``` \"k\" value contains problematic characters, it should be ignored\n", "* If a second-level ```tag``` \"k\" value starts with \"addr:\", it should be added to a dictionary ```address```\n", "* If a second-level ```tag``` \"k\" value does not start with \"addr:\", but contains \":\", you can process it the same as any other tag.\n", "* If there is a second \":\" that separates the type/direction of a street, the tag should be ignored, for example:\n", "\n", "```xml\n", "\n", "\n", "\n", "\n", "\n", "\n", "```\n", "should be turned into:\n", "\n", "```json\n", "{\n", " \"address\": {\n", " \"housenumber\": 5158,\n", " \"street\": \"North Lincoln Avenue\"\n", " },\n", " \"amenity\": \"pharmacy\"\n", "}\n", "```\n", "\n", "* For \"way\" specifically:\n", "\n", "```xml\n", "\n", "\n", "```\n", "should be turned into:\n", "\n", "```json\n", "{\n", " \"node_refs\": [\"305896090\", \"1719825889\"]\n", "}\n", "```\n", "\n", "To do this transformation, I will define a function ```shape_element``` that processes an element. Within this function I will use the ```update``` function with the regexes and mapping dictionaries defined above to clean street addresses. Additionally, I will store ```timestamp``` as a Python ```datetime``` rather than as a string. 
The format of the ```timestamp``` can be found here:\n", "\n", "http://overpass-api.de/output_formats.html" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from datetime import datetime\n", "\n", "def shape_element(element):\n", " node = {}\n", " CREATED = [\"version\", \"changeset\", \"timestamp\", \"user\", \"uid\"]\n", " \n", " if element.tag == \"node\" or element.tag == \"way\" :\n", " node['type'] = element.tag\n", "\n", " # Parse attributes\n", " for a in element.attrib:\n", " \n", " # Parse details of data creation\n", " if a in CREATED:\n", " if 'created' not in node:\n", " node['created'] = {}\n", "\n", " if a == \"timestamp\":\n", " node['created'][a] = datetime.strptime(element.attrib[a], '%Y-%m-%dT%H:%M:%SZ')\n", " else:\n", " node['created'][a] = element.attrib[a]\n", " \n", " # Parse coordinates\n", " elif a in ['lat', 'lon']:\n", " if 'pos' not in node:\n", " node['pos'] = [None, None]\n", "\n", " if a == 'lat':\n", " node['pos'][0] = float(element.attrib[a])\n", " else:\n", " node['pos'][1] = float(element.attrib[a])\n", " \n", " else:\n", " node[a] = element.attrib[a]\n", "\n", " # Iterate tag children\n", " for tag in element.iter(\"tag\"):\n", " if not problemchars.search(tag.attrib['k']):\n", "\n", " # Tags with single colon and beginning with addr\n", " if lower_colon.search(tag.attrib['k']) and tag.attrib['k'].find('addr') == 0:\n", " if 'address' not in node:\n", " node['address'] = {}\n", "\n", " sub_attr = tag.attrib['k'].split(':', 1)\n", "\n", " if is_street_name(tag):\n", " # Do some cleaning\n", " better_name = update(tag.attrib['v'], map_street_types, street_type_updater_re)\n", " best_name = update(better_name, map_cardinal_directions, cardinal_dir_updater_re)\n", "\n", " node['address'][sub_attr[1]] = best_name\n", " else: \n", " node['address'][sub_attr[1]] = tag.attrib['v']\n", "\n", " # All other tags that don't begin with \"addr\"\n", " elif not 
tag.attrib['k'].find('addr') == 0:\n", " if tag.attrib['k'] not in node:\n", " node[tag.attrib['k']] = tag.attrib['v']\n", " else:\n", " node[\"tag:\" + tag.attrib['k']] = tag.attrib['v']\n", " \n", " # Iterate nd children building a list\n", " for nd in element.iter(\"nd\"):\n", " if 'node_refs' not in node:\n", " node['node_refs'] = []\n", " \n", " node['node_refs'].append(nd.attrib['ref'])\n", "\n", " return node\n", " else:\n", " return None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now parse the XML, shape the elements, and write them to a JSON file." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import json\n", "from bson import json_util \n", "\n", "def process_map(file_in, pretty = False):\n", "\n", " file_out = \"{0}.json\".format(file_in)\n", " \n", " with open(file_out, \"wb\") as fo:\n", " for _, element in ET.iterparse(file_in):\n", " el = shape_element(element)\n", " if el:\n", " if pretty:\n", " fo.write(json.dumps(el, indent=2, default=json_util.default)+\"\\n\")\n", " else:\n", " fo.write(json.dumps(el, default=json_util.default) + \"\\n\")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [], "source": [ "process_map(filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview of the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the size of this file" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The downloaded file is 71.758865 MB\n" ] } ], "source": [ "import os\n", "print \"The downloaded file is {} MB\".format(os.path.getsize(filename)/1.0e6) # convert from bytes to megabytes" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The 
json file is 
77.760245 MB\n" ] } ], "source": [ "print \"The json file is {} MB\".format(os.path.getsize(filename + \".json\")/1.0e6) # convert from bytes to megabytes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To import this json file to MongoDB, I will use the ```subprocess``` module to run shell commands.\n", "\n", "The first task is to execute ```mongod``` to run MongoDB" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import signal\n", "import subprocess\n", "\n", "# The os.setsid() is passed in the argument preexec_fn so\n", "# it's run after the fork() and before exec() to run the shell.\n", "pro = subprocess.Popen(\"mongod\", preexec_fn = os.setsid) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, connect to the database with ```pymongo```" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from pymongo import MongoClient\n", "\n", "db_name = \"osm\"\n", "\n", "client = MongoClient('localhost:27017')\n", "db = client[db_name]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The recommended method of importing large amounts of data is with ```mongoimport```.\n", "\n", "First I will build a mongoimport command, then use ```subprocess.call``` to execute" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dropping collection\n", "Executing: mongoimport --db osm --collection volusia_flagler --file /Users/jasondamiani/Developer/IPython/Notebooks/volusia_flagler.osm.json\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Build mongoimport command\n", "collection = filename[:filename.find(\".\")]\n", "working_directory = \"/Users/jasondamiani/Developer/IPython/Notebooks/\"\n", "json_file = filename + 
\".json\"\n", "\n", "mongoimport_cmd = \"mongoimport --db \" + db_name + \\\n", " \" --collection \" + collection + \\\n", " \" --file \" + working_directory + json_file\n", "\n", "# Before importing, drop collection if it exists\n", "if collection in db.collection_names():\n", " print \"dropping collection\"\n", " db[collection].drop()\n", "\n", "# Execute the command\n", "print \"Executing: \" + mongoimport_cmd\n", "subprocess.call(mongoimport_cmd.split())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After importing, get the collection from the database" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [], "source": [ "volusia_flagler = db[collection]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Number of Documents" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "337107" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "volusia_flagler.find().count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Number of Unique Users" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "262" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(volusia_flagler.distinct('created.user'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top Contributing User" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[{u'_id': u'SteveDorries', u'count': 88239}]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "volusia_flagler.aggregate([{\"$group\" : {\"_id\" : \"$created.user\", \"count\" : {\"$sum\" : 1}}}, \\\n", " {\"$sort\" : {\"count\" : -1}}, \\\n", " {\"$limit\" : 1}])['result']" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "#### Number of Nodes and Ways" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[{u'_id': u'way', u'count': 28371}, {u'_id': u'node', u'count': 308736}]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "volusia_flagler.aggregate({\"$group\" : {\"_id\" : \"$type\", \"count\" : {\"$sum\" : 1}}})['result']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Most Referenced Node" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{u'_id': ObjectId('54d47c3a32f1da11a31e33e8'),\n", " u'created': {u'changeset': u'5347582',\n", " u'timestamp': datetime.datetime(2010, 7, 29, 15, 14, 53),\n", " u'uid': u'168862',\n", " u'user': u'SteveDorries',\n", " u'version': u'3'},\n", " u'id': u'97386075',\n", " u'pos': [29.5729028, -81.5138062],\n", " u'type': u'node'}\n" ] } ], "source": [ "node_id = volusia_flagler.aggregate([{\"$unwind\" : \"$node_refs\"}, \\\n", " {\"$group\" : {\"_id\" : \"$node_refs\", \"count\" : {\"$sum\" : 1}}}, \\\n", " {\"$sort\" : {\"count\" : -1}}, \\\n", " {\"$limit\" : 1}])['result'][0]['_id']\n", "\n", "pprint.pprint(volusia_flagler.find({\"id\" : node_id})[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Number of Documents Containing a Street Address" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "195" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "volusia_flagler.find({\"address.street\" : {\"$exists\" : 1}}).count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Zip Codes" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "data": { 
"text/plain": [ "[{u'_id': u'32720', u'count': 60},\n", " {u'_id': u'32114', u'count': 28},\n", " {u'_id': u'32164', u'count': 15},\n", " {u'_id': u'32137', u'count': 15},\n", " {u'_id': u'32724', u'count': 12},\n", " {u'_id': u'32118', u'count': 11},\n", " {u'_id': u'32167', u'count': 8},\n", " {u'_id': u'32174', u'count': 8},\n", " {u'_id': u'32127', u'count': 4},\n", " {u'_id': u'32110', u'count': 4},\n", " {u'_id': u'32763', u'count': 4},\n", " {u'_id': u'32168', u'count': 4},\n", " {u'_id': u'32725', u'count': 3},\n", " {u'_id': u'32119', u'count': 2},\n", " {u'_id': u'32713', u'count': 2},\n", " {u'_id': u'32764', u'count': 2},\n", " {u'_id': u'32738', u'count': 2},\n", " {u'_id': u'32117', u'count': 2},\n", " {u'_id': u'32723', u'count': 1},\n", " {u'_id': u'32720-1917', u'count': 1},\n", " {u'_id': u'32763-9124', u'count': 1},\n", " {u'_id': u'32128', u'count': 1},\n", " {u'_id': u'32118-4101', u'count': 1}]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "volusia_flagler.aggregate([{\"$match\" : {\"address.postcode\" : {\"$exists\" : 1}}}, \\\n", " {\"$group\" : {\"_id\" : \"$address.postcode\", \"count\" : {\"$sum\" : 1}}}, \\\n", " {\"$sort\" : {\"count\" : -1}}])['result']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These all appear to be valid ZIP codes for the map region, although there are three occurences of ZIP+4 codes in the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top 5 Most Common Cities" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[{u'_id': u'Daytona Beach', u'count': 40},\n", " {u'_id': u'Palm Coast', u'count': 30},\n", " {u'_id': u'Ormond Beach', u'count': 5},\n", " {u'_id': u'Orange City', u'count': 4},\n", " {u'_id': u'Bunnell', u'count': 4}]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"volusia_flagler.aggregate([{\"$match\" : {\"address.city\" : {\"$exists\" : 1}}}, \\\n", " {\"$group\" : {\"_id\" : \"$address.city\", \"count\" : {\"$sum\" : 1}}}, \\\n", " {\"$sort\" : {\"count\" : -1}}, \\\n", " {\"$limit\" : 5}])['result']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top 10 Amenities" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[{u'_id': u'place_of_worship', u'count': 387},\n", " {u'_id': u'parking', u'count': 248},\n", " {u'_id': u'school', u'count': 94},\n", " {u'_id': u'restaurant', u'count': 83},\n", " {u'_id': u'fast_food', u'count': 77},\n", " {u'_id': u'fire_station', u'count': 62},\n", " {u'_id': u'university', u'count': 55},\n", " {u'_id': u'grave_yard', u'count': 45},\n", " {u'_id': u'fuel', u'count': 43},\n", " {u'_id': u'emergency_phone', u'count': 34}]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "volusia_flagler.aggregate([{\"$match\" : {\"amenity\" : {\"$exists\" : 1}}}, \\\n", " {\"$group\" : {\"_id\" : \"$amenity\", \"count\" : {\"$sum\" : 1}}}, \\\n", " {\"$sort\" : {\"count\" : -1}}, \\\n", " {\"$limit\" : 10}])['result']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top Religions with Denominations" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{u'_id': {u'religion': u'christian'}, u'count': 176},\n", " {u'_id': {u'denomination': u'baptist', u'religion': u'christian'},\n", " u'count': 99},\n", " {u'_id': {u'denomination': u'methodist', u'religion': u'christian'},\n", " u'count': 31},\n", " {u'_id': {u'denomination': u'lutheran', u'religion': u'christian'},\n", " u'count': 16},\n", " {u'_id': {u'denomination': u'catholic', u'religion': u'christian'},\n", " u'count': 14},\n", " {u'_id': {}, u'count': 13},\n", " {u'_id': {u'denomination': 
u'presbyterian', u'religion': u'christian'},\n", " u'count': 12},\n", " {u'_id': {u'denomination': u'jehovahs_witness', u'religion': u'christian'},\n", " u'count': 12},\n", " {u'_id': {u'denomination': u'pentecostal', u'religion': u'christian'},\n", " u'count': 5},\n", " {u'_id': {u'denomination': u'mormon', u'religion': u'christian'},\n", " u'count': 2},\n", " {u'_id': {u'denomination': u'united_methodist', u'religion': u'christian'},\n", " u'count': 1},\n", " {u'_id': {u'denomination': u'seventh_day_adventist',\n", " u'religion': u'christian'},\n", " u'count': 1},\n", " {u'_id': {u'denomination': u'disciples_of_christ',\n", " u'religion': u'christian'},\n", " u'count': 1},\n", " {u'_id': {u'religion': u'muslim'}, u'count': 1},\n", " {u'_id': {u'denomination': u'evangelical', u'religion': u'christian'},\n", " u'count': 1},\n", " {u'_id': {u'denomination': u'Lutheran', u'religion': u'christian'},\n", " u'count': 1},\n", " {u'_id': {u'denomination': u'episcopal', u'religion': u'christian'},\n", " u'count': 1}]\n" ] } ], "source": [ "religions = \\\n", "volusia_flagler.aggregate([{\"$match\" : {\"amenity\" : \"place_of_worship\"}}, \\\n", " {\"$group\" : {\"_id\" : {\"religion\" : \"$religion\", \"denomination\" : \"$denomination\"}, \"count\" : {\"$sum\" : 1}}}, \\\n", " {\"$sort\" : {\"count\" : -1}}])['result']\n", "\n", "pprint.pprint(religions)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Top 10 Leisures" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[{u'_id': u'pitch', u'count': 184},\n", " {u'_id': u'park', u'count': 73},\n", " {u'_id': u'sports_centre', u'count': 33},\n", " {u'_id': u'swimming_pool', u'count': 27},\n", " {u'_id': u'playground', u'count': 22},\n", " {u'_id': u'stadium', u'count': 13},\n", " {u'_id': u'golf_course', u'count': 12},\n", " {u'_id': u'slipway', u'count': 6},\n", " {u'_id': u'garden', u'count': 5},\n", " {u'_id': u'marina', 
u'count': 2}]" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "volusia_flagler.aggregate([{\"$match\" : {\"leisure\" : {\"$exists\" : 1}}}, \\\n", " {\"$group\" : {\"_id\" : \"$leisure\", \"count\" : {\"$sum\" : 1}}}, \\\n", " {\"$sort\" : {\"count\" : -1}}, \\\n", " {\"$limit\" : 10}])['result']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Other Ideas About the Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on my exploration of the OpenStreetMap data, I believe the data structure is flexible enough to hold a wide range of user-generated quantitative and qualitative data beyond what is needed to define a virtual map. Extending this open source project to include data such as user reviews of establishments, subjective boundaries of good and bad neighborhoods, housing price data, school reviews, walkability/bikeability, quality of mass transit, and so on would form a solid foundation for robust recommender systems. These recommender systems could aid users in anything from finding a new home or apartment to deciding where to spend a weekend afternoon.\n", "\n", "Unfortunately, at least for the area of the world that I analyzed, the mapping data appears far too incomplete to support such recommender systems. I believe that the OpenStreetMap project would greatly benefit from visualizing data on content generation within its maps. For example, a heat map layer could be overlaid on the map showing how frequently or how recently certain regions of the map have been updated. 
These map layers could help guide users toward areas of the map that need attention, helping to complete the data set more fully.\n", "\n", "Next I will cover a couple of queries that are aligned with these ideas about the velocity and volume of content generation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Count of Elements Created by Day of Week" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will use the ```$dayOfWeek``` operator to extract the day of the week from the ```created.timestamp``` field. 1 is Sunday, 7 is Saturday: \n", "\n", "http://docs.mongodb.org/manual/reference/operator/aggregation/dayOfWeek/" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[{u'_id': 1, u'count': 16926},\n", " {u'_id': 2, u'count': 89939},\n", " {u'_id': 3, u'count': 26663},\n", " {u'_id': 4, u'count': 30427},\n", " {u'_id': 5, u'count': 53204},\n", " {u'_id': 6, u'count': 78839},\n", " {u'_id': 7, u'count': 41109}]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "volusia_flagler.aggregate([{\"$project\" : {\"dayOfWeek\" : {\"$dayOfWeek\" : \"$created.timestamp\"}}}, \\\n", " {\"$group\" : {\"_id\" : \"$dayOfWeek\", \"count\" : {\"$sum\" : 1}}}, \\\n", " {\"$sort\" : {\"_id\" : 1}}])['result']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Users appear to have been most active on Mondays and Fridays." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Age of Elements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will determine how many days ago elements were created in the XML using the ```created.timestamp``` field. 
I want to visualize this data, so I will push the calculated values into a list" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ageDict = volusia_flagler.aggregate([{\"$project\" : {\"ageInMilliseconds\" : {\"$subtract\" : [datetime.now(), \"$created.timestamp\"]}}}, \\\n", " {\"$project\" : {\"_id\" : 0, \"ageInDays\" : {\"$divide\" : [\"$ageInMilliseconds\", 1000*60*60*24]}}}, \\\n", " {\"$group\" : {\"_id\" : 1, \"ageInDays\" : {\"$push\" : \"$ageInDays\"}}}, \\\n", " {\"$project\" : {\"_id\" : 0, \"ageInDays\" : 1}}])['result'][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now I have a dictionary with one key, ```ageInDays```, and a list of floats as the value. Next, I will create a pandas dataframe from this dictionary" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " ageInDays\n", "0 1115.141735\n", "1 1778.249362\n", "2 1778.249362\n", "3 1778.249362\n", "4 1859.374350\n" ] } ], "source": [ "from pandas import DataFrame\n", "\n", "ageDF = DataFrame.from_dict(ageDict)\n", "print ageDF.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will plot a histogram of this series with ggplot. 
The binwidth is set to 7 (1 week)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": [ "iVBORw0KGgoAAAANSUhEUgAAA7cAAALRCAYAAACAvlTwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n", "AAAPYQAAD2EBqD+naQAAIABJREFUeJzs3d9z1PW9+PFXfrOwARWMxIBT+RFrQsUmw4TBds5ItNhe\n", "MLVOmTmjdTjTnvamF73opV550T/Ai46d8RTtlBk8nanj1FpM6ahVGA/oaKYGKAVrgBCDUCSJgWzI\n", "fi/8ZsuSBBIlbN7J43Ej+9n3fj7v3X0n8ZnP7qYsn8/nAwAAABJWXuoJAAAAwJclbgEAAEieuAUA\n", "ACB54hYAAIDkiVsAAACSJ24BAABInrgFAAAgeeIWAACA5IlbAAAAkiduAQAASF7lVAeePHky3nvv\n", "vfjnP/8Z586di4ULF8aKFSti8+bNsXTp0sK43//+9/H++++Pu/2yZcvipz/96bjt7777buzduzfO\n", "nTsXixcvjra2tmhraxs3bmhoKDo6OuLQoUORy+WioaEhtmzZEvX19ePGdnd3R0dHR/T29kZNTU00\n", "NzdHe3t7VFdXT/XuAgAAkJApx+2bb74ZJ06ciKamprjttttiYGAg/u///i+eeeaZ+NGPfhR1dXX/\n", "3mllZWzdurXo9gsWLBi3zwMHDsQf/vCHaGpqik2bNsVHH30Ur7zySuRyufjGN75RGDc6Oho7d+6M\n", "jz/+OO67777IZDKxf//+2LFjR/z4xz8uiutTp07F888/H7feemts2bIlzp8/H3v37o0zZ87EY489\n", "Nq0HBwAAgDRMOW43bdoUt99+e1RUVBS2NTc3xy9/+ct4880343vf+15he3l5edxzzz1X3V8ul4s9\n", "e/ZEY2NjbNu2LSIiWlpaIp/PxxtvvBGtra2RyWQiIqKrqyuOHz8e27Zti6ampsKxn3766Xjttdfi\n", "kUceKex3z549kclkYvv27VFTUxMRETfddFO89NJLcfTo0Vi9evVU7zIAAACJmPJ7bleuXFkUthER\n", "S5cujVtvvTU++eSTceNHR0fjwoULk+7vww8/jKGhodiwYUPR9g0bNsTw8HAcOXKksK2rqyuy2Wwh\n", "bCMiFi1aFM3NzXHo0KG4dOlSRERcuHAhjh07Fvfcc08hbCMi1q9fH9XV1fHBBx9M9e4CAACQkCmf\n", "uZ1IPp+PgYGBuO2224q253K5+MUvfhG5XC4ymUysW7cuHnzwwaL3vPb29kZExO2331502/r6+igr\n", "K4ve3t7C2d/e3t4J31vb0NAQ77zzTpw5cybq6uqir68vRkdHx+2zoqIili9fHqdOnfoydxcAAIBZ\n", "6kvFbWdnZ/T398fmzZsL22pra+O+++6L+vr6yOfz8Y9//CP2798fH3/8cWzfvj3Kyz8/Wdzf3x/l\n", "5eWxaNGi4glVVsbChQujv7+/sK2/vz++8pWvjDt+NpstXF9XV1e4TW1t7YRju7u7v8zdBQAAYJb6\n", "wnF7+vTp+OMf/xgrV66Me++9t7D9gQceKBq3bt26WLp0aezZsye6urpi3bp1ERExMjIy7mXOYyoq\n", "KiKXyxUuTza2svLz6Y+NHRkZKdx+orFj1wMAADC3fKG47e/vj507d8aCBQti27ZtUVZWdtXxGzdu\n", "jL/85S9x7NixQtxWVlYW3it7pZGRkaiqqipcrqqqmnDsWKyOjR2L3cnGXr7Pp556atL5Pvnkk1e9\n", 
"PwAAAMwu047bCxcuxG9/+9u4ePFi/Nd//deELwG+UlVVVWQymRgaGipsq62tjdHR0RgcHCx6afLI\n", "yEgMDQ0V7TebzRa9THnMwMBAYV+X/3eysVOZa8Tn7yUGAADgi7nWCdCZMK24zeVysXPnzjh79mw8\n", "/vjjceutt07pdhcvXozPPvusKGKXL18eERE9PT2xdu3awvaenp7I5/OF68fGdnd3Rz6fL3qQTpw4\n", "EdXV1YW/c1tXVxfl5eXR09MTzc3NhXEjIyPR29tbOGsccfWzs/l83odPTSKbzRZ+qcC/jX3gmXUz\n", "MetmYtbN1Vk3E7Nurs66mZh1c3XWzcSsm6uzbiY30YcB3whT/lNAo6Oj8bvf/S5OnjwZ3//+92PF\n", "ihXjxoyMjMTFixfHbX/99dcjImLNmjWFbXfeeWdkMpnYv39/0dgDBw5EdXV1UfA2NTXFwMBAHDx4\n", "sLBtcHAwurq6orGxsfAe2wULFsSqVauis7OzaB6dnZ0xPDxc9KeEAAAAmDumfOZ29+7dcfjw4bjr\n", "rrvis88+i/fff7/o+vXr10d/f38888wz8bWvfa1wNvXo0aNx5MiRWLt2bXz1q18tjK+qqorNmzfH\n", "yy+/HC+88EKsXr06uru7o7OzM9rb2yOTyRTGNjU1xYoVK+LFF1+M06dPF6I4n8/H/fffXzSP9vb2\n", "ePbZZ2PHjh3R0tIS58+fj3379sWaNWuK4hoAAIC5Y8px+/HHH0dZWVkcPnw4Dh8+XHRdWVlZrF+/\n", "PjKZTDQ2NsbRo0fjvffei3w+H7fccks88MADsWnTpnH73LBhQ5SXl8e+ffvi8OHDsWTJknjooYdi\n", "48aNRePKy8vj0UcfjY6Ojnj77bcjl8tFQ0NDPPzww4WIHlNfXx+PP/54dHR0xO7du6OmpiZaWlrG\n", "fYozAAAAc0dZ3qcnjeM9t5Pz3oKJeU/K1Vk3E7Nurs66mZh1c3XWzcSsm6uzbiZm3VyddTO5+vr6\n", "knyg1JTfcwsAAACzlbgFAAAgeeIWAACA5IlbAAAAkiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJIn\n", "bgEAAEieuAUAACB54hYAgBti165dpZ4CMIeJWwAAboi+vr5STwGYw8QtAAAAyRO3AAAAJE/cAgAA\n", "kDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA\n", "8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAJG7Xrl2lngJAyYlbAIDE\n", "9fX1lXoKACUnbgEAAEieuAUAACB54hYAAIDkiVsAAACSJ24BAABInrgFAAAgeeIWAACA5IlbAAAA\n", "kiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVsAAACSJ24BAABI\n", "nrgFAAAgeeIWAACA5IlbAAAAkiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB5\n", "laWewGw0MDAQ2Wy21NOYtTw24w0MDESEx+ZqPDbjWTfX5rEZz7q5tvn02Fy6dCmGhoYKlzOZTFRU\n", "VIwbV+p1M9V5ltJ8WjdTVep1kwKPzcQGBgaitrb2hh9X3E4gm83GqVOnSj2NWSmbzRa+0fFv9fX1\n", "ERHWzSSsm4lZN1dn3UzMurm6+b5uLg/Iy822dTPZPEtlvq+bycy2dTPbWDeTG1s7N5qXJQMAAJA8\n", "cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLE\n", 
"LQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3\n", "AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wC\n", "AACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsA\n", "AEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtADDOrl27Sj0FAJgWcQsAjNPX11fqKQDAtIhb\n", "AAAAkiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVsAAACSJ24B\n", "AABIXuVUB548eTLee++9+Oc//xnnzp2LhQsXxooVK2Lz5s2xdOnSorGnT5+OP/3pT3H8+PGoqKiI\n", "tWvXxpYtW2LRokXj9vvuu+/G3r1749y5c7F48eJoa2uLtra2ceOGhoaio6MjDh06FLlcLhoaGmLL\n", "li1RX18/bmx3d3d0dHREb29v1NTURHNzc7S3t0d1dfVU7y4AAAAJmfKZ2zfffDMOHToUq1atim9/\n", "+9vR2toaH330UTzzzDPR19dXGPfpp5/Gr3/96/jXv/4V7e3tsWnTpjhy5Ej85je/iUuXLhXt88CB\n", "A/HSSy9FXV1dfOc734mVK1fGK6+8Em+++WbRuNHR0di5c2f87W9/i7a2tnjwwQdjcHAwduzYEWfO\n", "nCkae+rUqXj++edjZGQktmzZEi0tLfHOO+/ECy+88EUeHwAAABIw5TO3mzZtittvvz0qKioK25qb\n", "m+OXv/xlvPnmm/G9730vIiL++te/Ri6Xi5/85CexZMmSiIhoaGiI559/Pt57771obW2NiIhcLhd7\n", "9uyJxsbG2LZtW0REtLS0RD6fjzfeeCNaW1sjk8lERERXV1ccP348tm3bFk1NTYVjP/300/Haa6/F\n", "I488UpjTnj17IpPJxPbt26OmpiYiIm666aZ46aWX4ujRo7F69eov/GABAAAwO035zO3KlSuLwjYi\n", "YunSpXHrrbfGJ598Uth28ODBaGxsLIRtRMSqVati6dKl8cEHHxS2ffjhhzE0NBQbNmwo2ueGDRti\n", "eHg4jhw5UtjW1dUV2Wy2ELYREYsWLYrm5uY4dOhQ4YzwhQsX4tixY3HPPfcUwjYiYv369VFdXV10\n", "fAAAAOaOL/WBUvl8PgYGBmLhwoUREXH+/PkYHByM22+/fdzYhoaGOHXqVOFyb29vRMS4sfX19VFW\n", "Vla4fmzsRO+tbWhoiFwuV3hpcl9fX4yOjo7bZ0VFRSxfvrzo+AAAAMwdXypuOzs7o7+/P5qbmyMi\n", "or+/PyIiamtrx43NZrMxNDRUOMva398f5eXl4z5kqrKyMhYuXFjY19jYyfZ5+XGvdfzL9wkAAMDc\n", "8YXj9vTp0/HHP/4xVq5cGffee29ERIyMjEREjHv5csTn0Rrx+Xttx8ZONG7s9mPjrjZ2on1e7fhj\n", "1wMAADC3TPkDpS7X398fO3fujAULFsS2bduirKzs8539/9i88lORI/4dnlVVVYWxE40bGzs2buw2\n", "U93n1Y5/+T6feuqpSe/fE088MeHLoPncRGfG+Zx1MznrZnLWzeSsm8nN1LrJ5XJFn6WxbNmyop+f\n", "KZhP62a6z1epvt+ksK7m07qZLj+nJmfdzC7TPnN74cKF+O1vfxsXL16Mxx57rOgJHfv3RC//HXtv\n", "7thZ1dra2hgdHY3BwcGicSMjIzE0NFS038leUjwwMFB03Gsd3+IDAACYm6Z15jaXy8XOnTvj7Nmz\n", 
"8fjjj8ett95adP3ixYtj0aJF0dPTM+62J0+ejOXLlxcuj/27p6cn1q5dW9je09MT+Xx+3Nju7u7I\n", "5/OFs8QRESdOnIjq6upYunRpRETU1dVFeXl59PT0FN4HHPF5MPf29sa6desK25588slJ72c+n/fh\n", "U5PIZrOFXyrwb2O/0bRuJmbdTMy6uTrrZmI3et1cfrYtBfN93Uz2fM227zezbV3N93Uzmdm2bmYb\n", "62ZypTrbP+Uzt6Ojo/G73/0uTp48Gd///vdjxYoVE467++674+9//3t8+umnhW3Hjh2LM2fOFP0p\n", "nzvvvDMymUzs37+/6PYHDhyI6urqouBtamqKgYGBOHjwYGHb4OBgdHV1RWNjY+Fs8IIFC2LVqlXR\n", "2dkZFy9eLIzt7OyM4eHhouMDAAAwd0z5zO3u3bvj8OHDcdddd8Vnn30W77//ftH169evj4iIb37z\n", "m9HV1RXPPfdctLW1xfDwcLz11ltx2223xde//vXC+Kqqqti8eXO8/PLL8cILL8Tq1auju7s7Ojs7\n", "o729PTKZTGFsU1NTrFixIl588cU4ffp0IYrz+Xzcf//9RfNob2+PZ599Nnbs2BEtLS1x/vz52Ldv\n", "X6xZsybWrFnzhR4kAAAAZrcpx+3HH38cZWVlcfjw4Th8+HDRdWVlZYW4XbJkSWzfvj12794df/7z\n", "n6OysjIaGxtjy5Yt4z7FeMOGDVFeXh779u2Lw4cPx5IlS+Khhx6KjRs3Fo0rLy+PRx99NDo6OuLt\n", "t9+OXC4XDQ0N8fDDDxdekjymvr4+Hn/88ejo6Ijdu3dHTU1NtLS0xAMPPDCtBwYAAIB0TDlut2/f\n", "PuWd1tXVxQ9+8IMpjW1tbY3W1tZrjstkMrF169bYunXrNcfecccd8cMf/nBKxwcAACB9X/jv3AIA\n", "AMBsIW4BAABInrgFAAAgeeIWAACA5IlbAAAAkiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEA\n", "AEieuAUAACB54hYAAIDkiVsAAACSJ24BAABInrgFAAAgeeIWAACA5IlbAAAAkiduAQAASJ64BQAA\n", "IHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVsAAACSJ24BAABInrgFAAAgeeIWAACA\n", "5IlbAAAAkiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVsAAACS\n", "J24BAABInrgFAAAgeeIWAACA5IlbAAAAkiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEAAEie\n", "uAUAACB54hYAAIDkiVsAAACSJ24BAABInrgFAAAgeeIWAACA5IlbAAAAkiduAQAASJ64BQAAIHni\n", "FgAAgOSJWwAAAJInbgEAAEheZaknMBsNDAxENpst9TRmLY/NeAMDAxHhsbkaj8141s21eWzGm+l1\n", "c+nSpRgaGipczmQyUVFRMSPHminzad1M9fkq9febFNbVfFo3U1XqdZMCj83EBgYGora29oYfV9xO\n", "IJvNxqlTp0o9jVkpm80WvtHxb/X19RER1s0krJuJWTdXZ91M7Eavm8uDJAXzfd1M9nzNtu83s21d\n", "zfd1M5nZtm5mG+tmcmNr50bzsmQAAACSJ24BAABInrgFAAAgeeIWAACA5IlbAAAAkiduAQAASJ64\n", "BQAAIHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVsAAACSJ24BAABInrgFAAAgeeIW\n", "AACA5IlbAAAAkiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVsA\n", 
"AACSJ24BAABInrgFAAAgeeIWAACA5IlbAAAAkiduAQAASJ64BQAAIHmVpZ4AAAAzK5/Px8jISKmn\n", "ATCjnLkFAJjjRkZG4pNPPin1NABmlLgFAJgndu3aVeopAMwYcQsAME/09fWVegoAM0bcAgAAkDxx\n", "CwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQt\n", "AAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJq5zO\n", "4OHh4XjrrbfixIkTcfLkybhw4UJ897vfjXvvvbdo3O9///t4//33x91+2bJl8dOf/nTc9nfffTf2\n", "7t0b586di8WLF0dbW1u0tbWNGzc0NBQdHR1x6NChyOVy0dDQEFu2bIn6+vpxY7u7u6OjoyN6e3uj\n", "pqYmmpubo729Paqrq6dzlwEAAEjAtOJ2cHAwXn/99bjpppti+fLl8c9//nPyHVdWxtatW4u2LViw\n", "YNy4AwcOxB/+8IdoamqKTZs2xUcffRSvvPJK5HK5+MY3vlEYNzo6Gjt37oyPP/447rvvvshkMrF/\n", "//7YsWNH/PjHP46lS5cWxp46dSqef/75uPXWW2PLli1x/vz52Lt3b5w5cyYee+yx6dxlAAAAEjCt\n", "uK2trY2f//znkc1mo6enJ371q19NOra8vDzuueeeq+4vl8vFnj17orGxMbZt2xYRES0tLZHP5+ON\n", "N96I1tbWyGQyERHR1dUVx48fj23btkVTU1NERDQ3N8fTTz8dr732WjzyyCOF/e7ZsycymUxs3749\n", "ampqIiLipptuipdeeimOHj0aq1evns7dBgAAYJab1ntuKysrI5vNRkREPp+/5vjR0dG4cOHCpNd/\n", "+OGHMTQ0FBs2bCjavmHDhhgeHo4jR44UtnV1dUU2my2EbUTEokWLorm5OQ4dOhSXLl2KiIgLFy7E\n", "sWPH4p577imEbUTE+vXro7q6Oj744IOp3VkAAACSMa0zt9ORy+XiF7/4ReRyuchkMrFu3bp48MEH\n", "i97z2tvbGxERt99+e9Ft6+vro6ysLHp7ewtnf3t7eyd8b21DQ0O88847cebMmairq4u+vr4YHR0d\n", "t8+KiopYvnx5nDp16nrfVQAAAEpsRuK2trY27rvvvqivr498Ph//+Mc/Yv/+/fHxxx/H9u3bo7z8\n", "8xPG/f39UV5eHosWLSqeVGVlLFy4MPr7+wvb+vv74ytf+cq4Y42dSe7v74+6urrCbWpraycc293d\n", "fb3uJgAAALPEjMTtAw88UHR53bp1sXTp0tizZ090dXXFunXrIiJiZGQkKioqJtxHRUVF5HK5wuXJ\n", "xlZWfn4XxsaOjIwUbj/R2LHrAQAAmDtm7GXJV9q4cWP85S9/iWPHjhXitrKysvBe2SuNjIxEVVVV\n", "4XJVVdWEY8didWzsWOxONnZs3FNPPTXpXJ944okJXwLN5yY6K87nrJvJWTeTs24mZ91MbqbWTS6X\n", "i08++aRwedmyZUU/j1Mwn9bNVJ+vy08YXG3cTElhXc2ndTNdfk5NzrqZXab1gVJfRlVVVWQymRga\n", "Gipsq62tjdHR0RgcHCwaOzIyEkNDQ0WLJZvNFr1MeczAwEBhX5f/d7KxFiAAAMDcc8PO3F68eDE+\n", "++yzovfXLl++PCIienp6Yu3atYXtPT09kc/nC9ePje3u7o58Ph9lZWWF7SdOnIjq6urC37mtq6uL\n", "8vLy6Onpiebm5sK4kZGR6O3tLZw1fvLJJyedaz6f98FTk8hms4VfKPBvY7/RtG4mZt1MzLq5Outm\n", 
"Yjd63Vx+ti0F833dTPZ8LVu2bErjbpRSH/9K833dTMbPqauzbiZXqrP91/3M7cjISFy8eHHc9tdf\n", "fz0iItasWVPYduedd0Ymk4n9+/cXjT1w4EBUV1cXBW9TU1MMDAzEwYMHC9sGBwejq6srGhsbC++x\n", "XbBgQaxatSo6OzuL5tHZ2RnDw8NFf0oIAACAuWHaZ27ffvvtuHDhQuFlv4cPH45PP/00IiLa2tpi\n", "aGgonnnmmfja175WOJt69OjROHLkSKxduza++tWvFvZVVVUVmzdvjpdffjleeOGFWL16dXR3d0dn\n", "Z2e0t7dHJpMpjG1qaooVK1bEiy++GKdPny5EcT6fj/vvv79oju3t7fHss8/Gjh07oqWlJc6fPx/7\n", "9u2LNWvWFMU1AAAAc8O043bfvn1x7ty5iIgoKyuLQ4cOxcGDB6OsrCzWr18fmUwmGhsb4+jRo/He\n", "e+9FPp+PW265JR544IHYtGnTuP1t2LAhysvLY9++fXH48OFYsmRJPPTQQ7Fx48aiceXl5fHoo49G\n", "R0dHvP3225HL5aKhoSEefvjhQkSPqa+vj8cffzw6Ojpi9+7dUVNTEy0tLeM+xRkAAIC5Ydpx+7Of\n", "/eyaY773ve9Na5+tra3R2tp6zXGZTCa2bt0aW7duvebYO+64I374wx9Oax4AAACk6YZ9WjIAAADM\n", "FHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDy\n", "xC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkT\n", "twAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/c\n", "AgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHEL\n", "AABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0A\n", "AADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAA\n", "ACRP3AIAAJA8cQsAAEDyKks9gdloYGAgstlsqacxa3lsxhsYGIgIj83VeGzGs26uzWMz3kyvm0uX\n", "LsXQ0FDhciaTiYqKihk51kyZT+tmqs/XZ599VnT5Rj+vKayr+bRupsrPqWvz2ExsYGAgamtrb/hx\n", "xe0EstlsnDp1qtTTmJWy2WzhGx3/Vl9fHxFh3UzCupmYdXN11s3EbvS6uTxIUjDf181kz9eyZcuK\n", "ArfUz2upj3+l+b5uJuPn1NVZN5MbWzs3mpclAwAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8\n", "cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLE\n", "LQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3\n", "AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wC\n", "AACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsA\n", "AEDyxC0AAADJE7cAAAAkT9wCAACQPHELfCm7du0q9RQAAEDcAl9OX19fqacAAADiFgAAgPSJWwAA\n", 
"AJInbgEAAEieuAUAACB54hYAAIDkiVsAAACSJ24BAABIXuV0Bg8PD8dbb70VJ06ciJMnT8aFCxfi\n", "u9/9btx7773jxp4+fTr+9Kc/xfHjx6OioiLWrl0bW7ZsiUWLFo0b++6778bevXvj3LlzsXjx4mhr\n", "a4u2trZx44aGhqKjoyMOHToUuVwuGhoaYsuWLVFfXz9ubHd3d3R0dERvb2/U1NREc3NztLe3R3V1\n", "9XTuMgC4agKGAAAgAElEQVQAAAmY1pnbwcHBeP311+PMmTOxfPnyScd9+umn8etf/zr+9a9/RXt7\n", "e2zatCmOHDkSv/nNb+LSpUtFYw8cOBAvvfRS1NXVxXe+851YuXJlvPLKK/Hmm28WjRsdHY2dO3fG\n", "3/72t2hra4sHH3wwBgcHY8eOHXHmzJmisadOnYrnn38+RkZGYsuWLdHS0hLvvPNOvPDCC9O5uwAA\n", "ACRiWmdua2tr4+c//3lks9no6emJX/3qVxOO++tf/xq5XC5+8pOfxJIlSyIioqGhIZ5//vl47733\n", "orW1NSIicrlc7NmzJxobG2Pbtm0REdHS0hL5fD7eeOONaG1tjUwmExERXV1dcfz48di2bVs0NTVF\n", "RERzc3M8/fTT8dprr8UjjzxSOP6ePXsik8nE9u3bo6amJiIibrrppnjppZfi6NGjsXr16uncbQAA\n", "AGa5aZ25raysjGw2GxER+Xx+0nEHDx6MxsbGQthGRKxatSqWLl0aH3zwQWHbhx9+GENDQ7Fhw4ai\n", "22/YsCGGh4fjyJEjhW1dXV2RzWYLYRsRsWjRomhubo5Dhw4VzghfuHAhjh07Fvfcc08hbCMi1q9f\n", "H9XV1UXHBwAAYG647h8odf78+RgcHIzbb7993HUNDQ1x6tSpwuXe3t6IiHFj6+vro6ysrHD92NiJ\n", "3lvb0NAQuVyu8NLkvr6+GB0dHbfPioqKWL58edHxAQAAmBuue9z29/dHxOcvYb5SNpuNoaGhwlnW\n", "/v7+KC8vH/chU5WVlbFw4cLCvsbGTrbPy497reNfvk8AAADmhusetyMjIxHx+ZnSK1VWfv4W31wu\n", "Vxg70bix24+Nu9rYifZ5teOPXQ8AAMDcMa0PlJrSDv9/bF75qcgR/w7PqqqqwtiJxo2NHRs3dpup\n", "7vNqxx8b99RTT016H5544okJXwLN5yY6K87n5tO6yeVy8cknnxQuL1u2rOhr9krWzeTm07qZLutm\n", "cjO1bqb7tT0bzad1M9Xn6/ITBlcbN1NSWFfzad1Ml59Tk7NuZpfrfuZ27Ame6OW/AwMDsXDhwsJZ\n", "1dra2hgdHY3BwcGicSMjIzE0NFS0WCZ7SfHAwEDRca91fAsQAABg7rnuZ24XL14cixYtip6ennHX\n", "nTx5sujv4479u6enJ9auXVvY3tPTE/l8ftzY7u7uyOfzUVZWVth+4sSJqK6ujqVLl0ZERF1dXZSX\n", "l0dPT080NzcXxo2MjERvb2+sW7cuIiKefPLJSe9DPp/3wVOTyGazhV8o8G9jv9Gcz+vm8t/IX8m6\n", "mZh1c3XWzcRu9Lq52tf2bDTf181kz9eyZcumNO5GKfXxrzTf181k/Jy6OutmcqU623/dz9xGRNx9\n", "993x97//PT799NPCtmPHjsWZM2eK/pTPnXfeGZlMJvbv3190+wMHDkR1dXVR8DY1NcXAwEAcPHiw\n", "sG1wcDC6urqisbGxcDZ4wYIFsWrVqujs7IyLFy8WxnZ2dsbw8HDR8QEAAJgbpn3m9u23344LFy4U\n", "XvZ7+PDhQsS2tbXFggUL4pvf/GZ0dXXFc889F21tbTE8PBxvvfVW3HbbbfH1r3+9sK+qqqrYvHlz\n", 
"vPzyy/HCCy/E6tWro7u7Ozo7O6O9vT0ymUxhbFNTU6xYsSJefPHFOH36dCGK8/l83H///UVzbG9v\n", "j2effTZ27NgRLS0tcf78+di3b1+sWbMm1qxZ84UeKAAAAGavacftvn374ty5cxERUVZWFocOHYqD\n", "Bw9GWVlZrF+/PhYsWBBLliyJ7du3x+7du+PPf/5zVFZWRmNjY2zZsmXcpxhv2LAhysvLY9++fXH4\n", "8OFYsmRJPPTQQ7Fx48aiceXl5fHoo49GR0dHvP3225HL5aKhoSEefvjhwkuSx9TX18fjjz8eHR0d\n", "sXv37qipqYmWlpZ44IEHpnt3AQAASMC04/ZnP/vZlMbV1dXFD37wgymNbW1tjdbW1muOy2QysXXr\n", "1ti6des1x95xxx3xwx/+cErHBwAAIG0z8p5bAAAAuJHELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDy\n", "xC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkT\n", "t8wru3btKvUUAACAGSBumVf6+vpKPQUAAGAGiFsAAACSJ24BAABInrgFAAAgeeIWAACA5IlbAAAA\n", "kiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVsAAACSJ24BAABI\n", "nrgFAAAgeeIWAACA5IlbAAAAkidugSTs2rWr1FMAAGAWE7dAEvr6+ko9BQAAZjFxCwAAQPLELQAA\n", "AMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtc9quXbtKPQUA\n", "AOAGELfMaX19faWeAgAAcAOIWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVsAgET4KwAAkxO3AACJ\n", "8FcAACYnbgEAAEieuAUAACB54pY5rbKystRTAAAAbgBxy5x28803l3oKAADADSBumRdeffXVUk8B\n", "AACYQeKWeeHs2bOlngIAADCDvCFxAgMDA5HNZks9jVkrhcfm0qVLMTQ0NOn1mUwmKioqrtvxBgYG\n", "IiKNx+Z6ufIxvtZj+kUfm+keJyXzcd1Ml8dmvJleN3Pha24urpvJnpepPl+fffZZ0eUb/bymsK7m\n", "4rr5svycujaPzcQGBgaitrb2hh9X3E4gm83GqVOnSj2NWSmbzRa+0aXsauH7RdTX10dEzOt1c7XH\n", "9Hqum+v93JWSdXN1c+X7zfV2o9dNal9z82XdTPa8TLZ92bJlRYFb6ue11Me/0nxZN9Pl59TVWTeT\n", "G1s7N5qXJQMAAJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/c\n", "AgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cQsAAEDyxC0AwBw3MjJS6ikAzDhxCwAwx507\n", "d67UUwCYceIWAGCOe/XVV0s9BYAZJ24BAOa4s2fPlnoKADNO3AIAAJA8cQsAAEDyxC0AAADJE7cA\n", "AAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIA\n", "AJA8cQsAAEDyxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3wKz26quvlnoKAAAkQNwC\n", "s9rZs2dLPQUAABIgbgEAAEieuAXgutu1a1eppwAAzDPiFoDrrq+vr9RTAADmGXELAABA8sQtAAAA\n", "yRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwAAACRP3AIAAJA8cZu4Xbt2lXoKAAAAJSduE9fX\n", 
"11fqKQBck1/EAQAzTdwCMOP8Ig4AmGniFgAAgORVzsROP/zww3juuecmvO5HP/pRrFixonD59OnT\n", "8ac//SmOHz8eFRUVsXbt2tiyZUssWrRo3G3ffffd2Lt3b5w7dy4WL14cbW1t0dbWNm7c0NBQdHR0\n", "xKFDhyKXy0VDQ0Ns2bIl6uvrr9+dBAAAYNaYkbgds3Hjxrj99tuLtt1yyy2Ff3/66afx61//OhYs\n", "WBDt7e0xPDwce/fujb6+vvjv//7vqKioKIw9cOBA/OEPf4impqbYtGlTfPTRR/HKK69ELpeLb3zj\n", "G4Vxo6OjsXPnzvj444/jvvvui0wmE/v3748dO3bEj3/841i6dOlM3mVmiVdffTW+9a1vlXoaAADA\n", "DTKjcXvHHXdEU1PTpNf/9a9/jVwuFz/5yU9iyZIlERHR0NAQzz//fLz33nvR2toaERG5XC727NkT\n", "jY2NsW3btoiIaGlpiXw+H2+88Ua0trZGJpOJiIiurq44fvx4bNu2rXDs5ubmePrpp+O1116LRx55\n", "ZCbvMrPE2bNnSz0FAADgBprx99xevHgxLl26NOF1Bw8ejMbGxkLYRkSsWrUqli5dGh988EFh24cf\n", "fhhDQ0OxYcOGottv2LAhhoeH48iRI4VtXV1dkc1mi6J60aJF0dzcHIcOHZp0LgAAAKRrRs/cvvji\n", "izE8PBzl5eVxxx13xLe+9a3Cy5TPnz8fg4OD4162HPH52dvLg7W3tzciYtzY+vr6KCsri97e3rjn\n", "nnsKYyd6b21DQ0O88847cebMmairq7tu9xEAAIDSm5Ezt5WVldHU1BTf/va34z//8z9j8+bN0dfX\n", "F//zP/8Tp06dioiI/v7+iIiora0dd/tsNhtDQ0OFs6z9/f1RXl4+7kOmKisrY+HChYV9jY2dbJ+X\n", "HxcAAIC5Y0bO3K5cuTJWrlxZuHzXXXdFU1NT/PKXv4w9e/bEY489FiMjIxERRR8aVZhU5efTyuVy\n", "UVFRESMjIxOOG7t9LpcrXJ5s7OX7BAAAYG6Z0ZclX+6WW26Ju+66Kw4ePBj5fL4QmxO9B3YsfKuq\n", "qj6fZGXlpO+VHRkZKYwbu81U9vnUU09NOtcnnnhi1v/ZoFwuF5988knh8rJly4oeh5k00Znx2eLK\n", "x2UyM/V4zfZ1cz1Ndw1Od91M9lzeyLV+o8yldXPl83bTTTfFuXPnCpen+/zN5u83pTZT66aUP1+u\n", "l7m4biZ7Xq71fM2W76UprKu5uG6ul7n0c+p6s25mlxn/QKnLLV68OC5duhTDw8OFhTDRy4QHBgZi\n", "4cKFhTOwtbW1MTo6GoODg0XjRkZGYmhoqGhRZbPZSfc5ti8AAADmlht25jYi4l//+ldUVVVFTU1N\n", "1NTUxKJFi6Knp2fcuJMnT8by5csLl8f+3dPTE2vXri1s7+npiXw+P25sd3d35PP5KCsrK2w/ceJE\n", "VFdXF/7O7ZNPPjnpPPP5fOG9wamYytnK6yGbzRZ+UZCy6/14jf1GM7V1cz1d7TG9nuvmRq31G2E+\n", "rJvLz9pGTO/5myvfb663G71uUvuamy/rZrLnZarPV6mf11If/0rzZd1M13z4OfVlWDeTK9XZ/hk5\n", "c3vlGdaIzz/F+PDhw7F69erCtrvvvjv+/ve/x6efflrYduzYsThz5kzRn/K58847I5PJxP79+4v2\n", "eeDAgaiuri4K3qamphgYGIiDBw8WzaerqysaGxsnfe8uAAAA6ZqRM7f/+7//G1VVVbFy5cpYtGhR\n", "nD59Ot55552orq6OBx54oDDum9/8ZnR1dcVzzz0XbW1tMTw8HG+99Vbcdttt8fWvf70wrqqqKjZv\n", 
"3hwvv/xyvPDCC7F69ero7u6Ozs7OaG9vj0wmUxjb1NQUK1asiBdffDFOnz5diOJ8Ph/333//TNxd\n", "AAAASmxG4vbuu++Ozs7O2LdvX1y8eDEWLVoUTU1N8R//8R9xyy23FMYtWbIktm/fHrt3744///nP\n", "UVlZGY2NjbFly5ZxZ1g3bNgQ5eXlsW/fvjh8+HAsWbIkHnroodi4cWPRuPLy8nj00Uejo6Mj3n77\n", "7cjlctHQ0BAPP/xw4SXJAAAAzC0zErdtbW3R1tY2pbF1dXXxgx/8YEpjW1tbo7W19ZrjMplMbN26\n", "NbZu3Tql/QIAAJC2G/ppyQAAADATxC0AAADJE7cAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3ADNo\n", "165dpZ4CAMC8IG4BZlBfX1+ppwAAMC+IWwAAAJInbgEAAEieuAXmJO91BQCYX8QtMCd5rysAwPwi\n", "bgEAAEieuAW+lMrKylJPAQAAxC3w5dx8882lngIAAIhb4PrwAU4AAJSSuAWuCx/gBABAKYlbAAAA\n", "kiduAQAASJ64BQAAIHniFgAAgOSJWwAAAJInbgEAAEieuAUAACB54hYAAIDkiVuAL2HXrl2lngIA\n", "ACFuAb6Uvr6+Uk8BAIAQtwAAAMwB4hYAYJby1geAqRO3AACzlLc+AEyduAXguqusrCz1FGBWc0YW\n", "4PoTtwBcdzfffHOppwCzmjOyANefuAVgxrz66qulngIAME+IWwBmzNmzZ0s9BQBgnhC3AMCUea8o\n", "ALOVuAUApsx7RQGYrcQtAAAAyRO3JMXL4QAAgImIW5Li5XAAAMBExC0AX5hXUwAAs4W4BeAL82oK\n", "AGC2ELcAAAAkT9wCAACQPHELAABA8sQtAAAAyRO3AAAAJE/cAgAAkDxxCwAAQPLELQAAAMkTtwDM\n", "SpcuXSr1FACAhIhbAGaloaGhUk8BkrVr165STwHghhO3AMw6/sccvpy+vr5STwHghhO3AMw6/scc\n", "AJgucQsAcINVVlaWegoAc464BUrCy06B+ezmm28u9RQA5hxxC5SEl50C+EUfwPUkboEb6tVXXy31\n", "FABmDb/oA7h+xC1zinCa/c6ePVvqKTCDfA2mz3MIQKp8msEEBgYGIpvNlnoaV3Xp0qWivwGZyWSi\n", "oqLihhy7FI/Nte7v2PVTDafr/XgNDAxERGkem1K58jkZ+3CUmpqaKC8vj/Ly8igrKytcn8lkJvy7\n", "pZM9F1fu/1rjJ7vdTH1tTHVNXm0eKa+bK+9fTU1NXLx48Zpfg7Pl+UvZTK2byb6Pjj0HKT03s/lr\n", "aqrf2yb7Grty/LXGXXl5suPNtBTWz2xeN6WS8s+pG8VjM7GBgYGora294ccVtxPIZv9fe3ceHEWZ\n", "/3H8M5PEQAgJJiC5QEmAQDjkkFtFIgTYXVFAKc8VF69lraXc0qot1/UHHiussruWWyrIKmDprsqi\n", "e4AgQgAFFAiQSDjkEhLClYSEADkmyfz+SM0sM5lhZpKZzPV+VVmY7p6ep7u//fTz7efpnlidOnXK\n", "38XwiKMLpS/ExsZaKzp/au32ent/JScnS1LQxY03WV6O4qgRFRsb63Sfe3osfL18S7n6HkfzQylu\n", "HB13RwL1+AWTto4bb527bSVQrlOecrU/7c8xZ8vbL+fs3PT38fP399sL1rjxtVC6TvkCceOcJXba\n", "GsOSAXgVL0cBAACAP5DcAvAqXo4CAAAAfyC5BQD4nOWZbAAAAF8huQUA+JzlmWwAAABfIbltIzyH\n", "CADUhQAAwHdIbtsIzyECAHUhAADwHZJb4Ar0KgEAAADBieQWuAK9Sr7R0NDg7yIAgF9x8xQAfI/k\n", 
"FoDPVVdX+7sIAOBXvrp5+uWXX/pkvQAQjEhuAbQJGmAA4H3l5eX+LgIABAySWwBtggZYcGIoJXB1\n", "nCMAEDhIbgEATvEcOnB1nCNAy3BjCL5AcgvALyIjI/1dhLDgrPFAowIA4E/cGIIvkNwC8Clnz9pe\n", "e+21bVyS8OSs8UCjAgAAhBqSWwA+5epZW3oQAQAA4A0ktwA84u1klB5EAAAAeAPJLQCPkIwCAAAg\n", "EJHcAgAAAACCHskt4Cc8awoAAAB4D8kt4CeuhveS/CIU8RNQAADAV0hugQAV7s+2OvsJIbSOv/cr\n", "PwEFAAB8heQWQEBy9RNCaJlA2a+MTAAAAN5GcgsAaHPhPjIBAAB4H8ktAI/wzCSAlqC3HgDgayS3\n", "ADzCM5MAWoLeegCAr5HcOmAymby2Ln+/vAWBJ1RiIlS2A0ATzmkAQLAjuXWgtLTUa+sKlJe3BKtQ\n", "HMYWKjERKtuBJqF4rsEznNOBiUdBAMB9JLdO0NALDAxj8x1iHFfiXAMCE4+CAID7SG6doKGHUEeM\n", "A0DwYNg4ALhGcguPmM1mfxcBAICww7BxAHCN5BYeaWxsbNPv41kjAAAAAO4guYVb/PV8Js8aIVC1\n", "9RBBnpEGAAC4OpJbuMXfz2e6SiR4Fgltra2HCPrrHGT0BAAACBYktwgKrhIJnkXyPnoKITF6AgAQ\n", "HGi3QCK5BSS1vufXZDLJZDJ5qTSBwd+99f7GRdIW+wMAEMjCvd2CJiS3gFrf81taWqrS0lIvlQaB\n", "gIukLfZH63GDAAAA3yK5BbygJY1WnmUEwgs3CAAA8C2SW8ALWtJo5VlGAAAAwHtIbgE/Y6gi8D+8\n", "+RwAALQUyS3CWiA0pBmqCPwPbz4HAAAtRXILvwiU3srWNqQDITkG/IHYR7jjHACAwENyC78Ild5K\n", "epm8jwZj22jtC82IfYQ7zgEACDwkt/CI2Wxu0ecCpac2EPHWZFs0GNsGLzQDAAChhuQWHmlpchsq\n", "PbW+QJIBf6KnHAAAhAqSWz8L1h7NYC13IGOfwh8sPeWMIECgoU50jHMVAJwjufWzYOnRtFxMGxsb\n", "JQVPuYMJ+xT+xAgCBJpwrxOdjargXAUc48YPJJJbuMlyMTWZTH4uiX/Rk4BQR4wDgcHV+wd4pACw\n", "xY0fSCS38JCvLqbB0qAO954EhD5iHK542jtiuW7Qq+JdvHwPcGzhwoVauHChv4sBPyG59ZJgSc5a\n", "y1cXUxrUNPwABAdPe0cs1w16VQC0haKiIhUVFfm7GPATklsv8VdyRkIUOmj4IRBQp8BdLb2pGy43\n", "g4Fww7mNQEByG+RIiEIPz1HBn8KhTqEB5h0tvanLSB0gNHFuIxCQ3IYIGmuhw9fPUZE8B5aW/na0\n", "r4VynNg3wKg/gcDEuQnAUyS3ISLY7pYx9NF/eAlJYKmvr/d3ERwKpzgJtvoz2IVa/R9q2xNIODcB\n", "eIrkNkiE2t1LV0MfQ7nXCIHNX7EXrOc4DXt4yt2h78ESW+EwlB8AgkVIJrf19fVat26dXn/9db38\n", "8st69913deTIEZ98V1s1hEP17qWz/RdOvUbOOGvYBWsSFOgsseiv2AvWc5yGPVrKVV3W1rHV2rqV\n", "m7IA4H8hmdx+/vnn2rZtm2688UZNnjxZRqNRH374oU6cOOG17/C0IRwsd6DbWrAksf5otDhr2AVr\n", "EuRtrW2I2n/e37HorI5wFXuB0qAOlHIEIvaNY+7WZZb95+sbe62tWy11CMcbAPwn5JLb4uJi7d27\n", "V+PHj9eECRM0dOhQPfzww+rUqZPWrVvn9npcJaOeNoQDpXeDXr+rc3bc/Zn4cMwca21DNBBuEphM\n", 
"Juszt87qCFexZ5nv7zjx982BQMa+aR3L/guW/Rgo5yQAhKOQS2737dsno9GooUOHWqdFRkZq8ODB\n", "Kioq0oULF9xaj6fJqLs9s/6+2AVCg96fXB2nQLkJcaW2PmaB2usQiqMfSktLVVFRYTOtpXWEfZwE\n", "6nFsLX/XofCfQKmf3T23wv16CwD+EHLJ7enTp5WYmKjo6Gib6ampqdb5nnC3IeXuRddXF7tQbPj7\n", "grvHKZwb0K3tHfE0Ft1d3tWxC9ZjZt9Q9lYdESy9XJ4iYWg7gXpdae2NG2efd3d7Q/XcCncNDQ3+\n", "LgIALwi55LaqqkodO3ZsNj02NtY63xOeNqRcNbB91Vho67dPens72roR5apxRAO65TztXfF0ecux\n", "s4+Ztjpm3u4RtW8oB+q5FaiJjoUnNzcCfVtay9W+8PZN27bmativqyTF2ed9/Rb/UI87b/DnTcrq\n", "6mq/fTcA7wm55La+vl4RERHNplsuKiaTyaff7+qOrrcaC84usm319klvN3rauhHV0jvvLWmcWI5V\n", "qA4TdcbT7XV3ecux81fD29e9NoF6brX1/vb0XHN1XK6sGwM1aXOX2Wy+6rXM1b7w9EbQihUrPFq+\n", "rTjbDmdJirujJHz1Fn9ncRcuN6Dc4e8by8E6AgjA/wR/TWgnMjLS4V1by0tboqKiJEkvvfSS03XM\n", "mjVLCQkJkqTrrrtOktSpUydFRESooaFBFRUV1vn2/2ZkZNisy9ly27dv1/Dhw52u3yI2NlYXL160\n", "Lmf529n67T9n/7f999vbsGGDsrOz3d5e+/XYf6+Fffldrc/ZfE+PR0v/dVZeZ8fX0fFxto9bWxZX\n", "MWOZbjAYZDabrf/a7ytXMdXafe3p9rZ0/9ifS67OGWfzXe13Z7Fpv5/s97ezcrjaXvvvc3U8XG2P\n", "q3PL0/3trXPM2fbYn2uu6kz75Z2dH462xdkxcnUuubucq222cBar9t9TU1Pj8Pvd3RfuXk8s/8bF\n", "xTncf672m6s6w9W509IYcnYdc1V++7rIEh8tjXln67dfn/3xcnWddLf94ez7W3oNcfdfZ+uzj78r\n", "p5eXl8toNLq83rW0TK7K6O657+1/A6UcV/u3rq5OBoNBiYmJLrfD3ePl6fLu7kdLjHfr1s26Dzt3\n", "7mxt+/uKoxGj8B+D2Ww2+7sQ3rR8+XJVVVXpV7/6lc30o0ePavny5br//vvVu3dvp8ltQ0ODIiIi\n", "9Pvf/74tiosQYYkn4gaeIG7QEsQNWoK4QUsQN2gpf8VOyPXcJiUl6ccff1Rtba3NS6WKi4ut8yXn\n", "O/pqPboAAAAAgMAUcs/cZmVlqbGxUXl5edZp9fX12rNnj9LS0poNrwIAAAAABL+Q67lNS0tTv379\n", "9NVXX+nSpUu69tprlZ+fr8rKSt15553+Lh4AAAAAwAdCLrmVpKlTpyo+Pl75+fmqqalR165ddf/9\n", "9+v666/3d9EAAAAAAD4QksltZGSkcnJylJOT4++iAAAAAADaQMg9cwsAAAAACD8h91NAAAAAAIDw\n", "Q88tAAAAACDoheQzty1RX1+v3Nxcm5dQZWdnKyMjw99FQxs6duyYli1b5nDeo48+qrS0NOvf586d\n", "05o1a1RUVKSIiAj16tVLEydOVIcOHZp9dteuXdq6dasqKioUFxenESNGaMSIET7bDvhOXV2dtmzZ\n", "ouLiYp08eVI1NTW66667NGjQoGbL+iJGqqurtW7dOh04cEAmk0mpqamaOHGikpOTfbK98B53Y+ez\n", "zz5Tfn5+s8937txZTz31VLPpxE7oOnnypPbs2aMff/xRFRUViomJUVpamrKzs5WYmGizLPUNLNyN\n", 
"G+oa2Dt79qw2btyoU6dO6eLFi4qMjFRiYqKGDx+ugQMH2iwbqHVOxNy5c+d6vOUhaOXKldq9e7du\n", "uukmDRw4UGfOnNGWLVvUo0cPxcfH+7t4aCMVFRXKz8/XyJEjNXz4cPXt29f6X2pqqqKioiRJlZWV\n", "WrJkierr63XLLbcoJSVF+fn5OnjwoAYPHiyj8X+DInbu3Kn//Oc/uuGGGzRixAiZzWZ98803ioyM\n", "VKOlGKIAABQ0SURBVPfu3f21qWihCxcu6OOPP5bZbFbnzp1VUVGhPn36KCkpyWY5X8RIY2OjPvjg\n", "Ax07dkyjRo1SZmamfvzxR3333XfKyspSTExMm+0HeM7d2Dlw4IDKysp011132dRBmZmZzRIaYie0\n", "ffHFFzp69Kh69+6tQYMGqXPnztq3b5+2b9+uzMxMayOS+gZXcjduqGtgr6SkRCdOnFBmZqb69eun\n", "7t27q6ysTN9++62MRqP1l2cCuc6h51ZScXGx9u7dq5ycHI0ePVqSdOONN+qtt97SunXrNGvWLD+X\n", "EG2te/fuysrKcjr/66+/lslk0hNPPGG9+ZGamqrly5drz549Gjp0qCTJZDJp/fr16t27t2bMmCFJ\n", "GjJkiMxmszZv3qyhQ4eqffv2vt8geE3Hjh31zDPPKDY2ViUlJVq8eLHD5XwRI/v27VNRUZFmzJhh\n", "jc9+/frpzTff1MaNGzV9+nRfbz5awd3YkSSj0djsLrk9Yif0jR49WikpKYqIiLBO69evn95++219\n", "8803mjZtmiTqG9hyN24k6hrY6tWrl3r16mUzbdiwYVq8eLHy8vJ06623SgrsOodnbtW0M41Go/VA\n", "SE0/JzR48GAVFRXpwoULfiwd/KW2tlYNDQ0O5+3fv1+9e/e26dVPT09XYmKiCgsLrdOOHTum6upq\n", "DRs2zObzw4YNU11dnQ4dOuSbwsNnIiMjFRsbK0m62vv4fBEj+/btU2xsrM2Nlw4dOqhfv346cOCA\n", "03hFYHA3diwaGxtVU1PjdD6xE/q6detmk6BIUmJiorp06aLS0lLrNOobXMnduLGgrsHVGI1GxcXF\n", "2fTGBnKdQ3Ir6fTp00pMTFR0dLTN9NTUVOt8hJfPP/9cr776ql555RUtXbpUJSUl1nkXLlzQpUuX\n", "lJKS0uxzqampOnXqlPVvS+zYL5ucnCyDwUBshShfxcjp06cdPneSmpoqk8mksrIyb20C/MxkMunV\n", "V1/V/PnztWDBAq1atUp1dXU2yxA74clsNuvixYvWIXrUN3CHfdxYUNfAkbq6Ol26dEnl5eXatm2b\n", "Dh8+rDFjxkgK/DqHYcmSqqqq1LFjx2bTLXfYq6qq2rpI8JPIyEhlZWWpV69eiomJ0blz57R161a9\n", "9957mjVrlpKTk63x4Cxmqqur1dDQoIiICFVVVcloNDZ7uD4yMlIxMTHEVojyVYxUVVXphhtucLhO\n", "y/zrrrvOi1sCf+jYsaPGjBmj5ORkmc1mHT58WDt27NCZM2c0c+ZM691zYic8FRQUqKqqStnZ2ZKo\n", "b+Ae+7iRqGvg3Nq1a5WXlyepqed28uTJuummmyQFfp1DcqumNyXbD9+Qmna81HRXC+GhW7du6tat\n", "m/XvzMxMZWVl6e2339b69ev14IMPqr6+XpJcxkxERITT2LJ8ntgKTb6KEeqq8DB+/Hibv/v376/E\n", "xEStX79e+/btU//+/SU5jweJ2AlV586d0+rVq9WtWzfrW7apb+CKo7iRqGvg3KhRo9SvXz9VVVXp\n", "+++/1+rVqxUVFaVBgwYFfJ3DsGQ17TRH47gtB8/yhlyEp4SEBGVmZurYsWMym83Wk8ydmHEWW5Zl\n", 
"ia3Q5KsYiYqKoq4KUyNHjpTBYNDRo0et04id8FJVVaWPPvpI7dq104wZM2QwGCRR3+DqnMWNM9Q1\n", "kJp+Dio9PV033nijHnzwQaWnp2vNmjUymUwBX+eQ3KqpW93R8NCLFy9a5yO8xcXFqaGhQXV1ddZ4\n", "cBYzMTEx1jtPHTt2VGNjoy5dumSzXH19vaqrq4mtEOWrGImNjaWuClNRUVFq3769qqurrdOInfBR\n", "U1OjDz/8ULW1tXrwwQdtjhf1DZy5Wtw4Q10DR/r27auamhqVlpYGfJ1DcispKSlJZWVlqq2ttZle\n", "XFxsnY/wdv78eUVFRSk6OlpxcXHq0KGDzUumLE6ePGkTL5b/t1+2pKREZrOZ2ApRvoqRpKQknTp1\n", "qtmbdouLi3XNNdc0+11ChI7a2lpdvnzZ5rklYic8mEwmffTRRyovL9f999+vLl262MynvoEjruLG\n", "GeoaOGLpPTUYDAFf55DcSsrKylJjY6P1wWmp6SDu2bNHaWlpiouL82Pp0Jbs7yxJTW9vO3jwoDIy\n", "MqzT+vbtqx9++EGVlZXWaUePHlVZWZnNK8x79Oih9u3ba8eOHTbr3Llzp6655ppmvyWG0OGLGMnK\n", "ytLFixe1f/9+67RLly5p37596t27t9PnWhA86uvrm91olaRNmzZJknr27GmdRuyEvsbGRq1YsUIn\n", "T57UPffco7S0NIfLUd/gSu7EDXUNHHHUDm5oaFB+fr5iYmKsL3QK5DonYu7cuXPdWjKExcXF6dy5\n", "c9q+fbvq6up0/vx5rV27VqWlpZo2bZo6derk7yKijXz00UcqLCxURUWFSktLVVBQoFWrVikqKkp3\n", "33239RX6Xbt21e7du62/5XXs2DGtWbNGiYmJ+tnPfmZ9w2BERISio6P17bff6uzZs6qpqdF3332n\n", "goIC3XbbbTYJM4LHd999pyNHjuj48eMqKSmRwWBQWVmZjh8/rqSkJEVGRvokRjp37qwjR44oLy9P\n", "jY2NOnv2rFavXq3a2lqb+ETgchU7ly5d0l//+ldVVlaqvLxcxcXF2rRpk/Lz89WrVy+bN50SO6Fv\n", "zZo1ys/PV+/evdWpUyedOXPG5j9Lrwf1Da7kTtxUVlZS16CZlStXKi8vTxUVFSorK9OhQ4e0atUq\n", "nTt3Tj/96U+tP9UTyHWOwezOL8mHgfr6em3YsEEFBQWqqalR165dlZ2dTfIRZiwnXHl5uWpra9Wh\n", "Qwelp6dr7NixSkhIsFn27NmzWrt2rU6cOKHIyEj16tVLEydObPa6c0nKy8vTtm3bdP78ecXHx2v4\n", "8OEaOXJkW20WvOwvf/mLKioqJMn6cg6z2SyDwaA5c+ZYb4j5Ikaqq6u1bt06HThwQCaTSampqcrJ\n", "yXH4e3MIPK5ip127dlq9erWKi4tVVVUls9mshIQEDRw4UKNHj7Y2GK5E7ISupUuX6vjx482G6UlN\n", "8fN///d/1r+pb2DhTtzU1NRQ16CZvXv3ateuXTp79qwuX76s6OhopaWladSoUUpPT7dZNlDrHJJb\n", "AAAAAEDQ45lbAAAAAEDQI7kFAAAAAAQ9klsAAAAAQNAjuQUAAAAABD2SWwAAAABA0CO5BQAAAAAE\n", "PZJbAAAAAEDQI7kFAAAAAAQ9klsAAAAAQNAjuQUAAAAABD2SWwAAAABA0CO5BQDAD+bOnSuj0ah5\n", "8+a1el233XabjEaj9b+oqCglJCSob9++mjFjhpYuXapLly55odQAAASuSH8XAACAcGYwGLy2rpkz\n", "Z+qGG26Q2WxWVVWVjhw5ovXr12vFihV67rnn9Le//U2TJ0/22vcBABBISG4BAAgRM2fO1K233moz\n", 
"rba2VgsXLtQLL7ygqVOnat26dbrlllv8VEIAAHyHYckAgLCwdOlSTZ8+Xenp6YqJiVF8fLxuvvlm\n", "ffjhhw6X37Fjh3JyctSxY0fFx8drwoQJ+vbbb63DiTdv3tzsMwcOHNDMmTPVrVs3RUdHKykpSQ88\n", "8IB++OEHt8s5c+ZMGY1GHT9+XIsWLdKAAQPUvn17JSUl6YknntCFCxc82u7o6Gg999xzev7551VX\n", "V6c5c+bYzC8pKdGLL76oMWPGKCkpSdHR0UpNTdUDDzyg/fv3N9s+o9Go7Oxsp983YMAAXXPNNTpz\n", "5ox12rJlyzR69Gh16dJF7du3V/fu3TVp0iR98sknHm0LAABXQ88tACAszJ49W/3799dtt92m5ORk\n", "lZaWavXq1XrooYd08OBBvfjii9ZlN2/erJycHJnNZk2bNk0ZGRkqKCjQuHHjnCZ2a9as0bRp09TQ\n", "0KA77rhDPXv2VFFRkVauXKlVq1YpNzdXgwcPdru8zz77rL788ktNmTJFkyZN0oYNG/Tuu+/q8OHD\n", "Wr9+vcfb/8wzz+iPf/yj8vPztW/fPmVlZVm3dcGCBcrOztaQIUMUGxurH374QStWrNC///1vbdmy\n", "RQMHDpQk9enTR+PGjVNubq4OHTqkXr162XzH1q1bVVhYqLvvvltdu3aVJD333HOaP3++0tPTde+9\n", "9yo+Pl4lJSXasWOHVqxYoRkzZni8LQAAOEJyCwAIC4WFherRo4fNNJPJpMmTJ2v+/Pl68sknlZKS\n", "osbGRs2aNUsmk0mrV6/WxIkTrcsvWrRIv/zlL5s9J3v+/Hndd999io2N1ebNm9WnTx+b7x05cqQe\n", "ffRR5eXluV3e7du3a+/evUpLS5MkNTQ0KDs7W7m5udqxY4eGDRvm0fbHxsZq6NCh2rJli7Zv325N\n", "bm+//XadPXtWHTp0sFm+oKBAY8aM0W9/+1utXr3aOn327NnKzc3V4sWL9dprr9l8ZvHixZKkJ554\n", "wjpt0aJFSktL0969e9WuXTub5cvKyjzaBgAAroZhyQCAsGCf2EpSVFSUZs+erfr6em3YsEFSU+/j\n", "kSNHNG7cOJvEVpIef/xx9e7dW2az2Wb68uXLVVlZqXnz5tkktpLUr18/Pfroo9q9e3ezYb5X88IL\n", "L1gTW0mKiIjQI488IqlpyHRLpKamSpJKS0ut07p06dIssZWkgQMHWntpGxoarNPvuusupaSkaOnS\n", "paqrq7NOr6io0CeffKKePXvq9ttvt043GAyKioqS0di8yZGYmNii7QAAwBF6bgEAYeHEiRNasGCB\n", "1q9fr6KiIlVXV9vMP3nypCRp9+7dkqSbb7652ToMBoNGjRrV7Bnabdu2SZL27NmjuXPnNvucZfn9\n", "+/erb9++bpX3pptuajbNkuyeP3/erXXYsyTl9j3Pq1at0jvvvKOdO3eqrKxM9fX11nkGg0GlpaXW\n", "YcYRERF67LHHNG/ePP3zn//UfffdJ0n64IMPVFNTo8cff9xm3Q888IDefPNNZWVlacaMGRo7dqxG\n", "jhyp+Pj4Fm0DAADOkNwCAELe0aNHNXz4cFVUVOjWW2/VpEmTFB8fr4iICB07dkzLli1TbW2tJKmy\n", "slKSrMmcPUfTLcNr3333XadlMBgMHv3WbKdOnZpNi4xsumxf2ZPqiZKSEklNvbUWb7zxhp5++mkl\n", "JCRowoQJ6t69u2JiYmQwGPTZZ58pPz/fum8sHnvsMb3yyitatGiRNbldvHixoqOjrb3LFn/+85+V\n", "np6u999/X/Pnz9f8+fMVGRmpn/zkJ1q4cKEyMjJatC0AANgjuQUAhLw//elPKi8v19KlS/Xzn//c\n", 
"Zt7f//53LVu2zPp3XFycJNm87fdKjqZbeiELCgrUv39/bxXbq6qqqpSXlyeDwaARI0ZIkurr6zV3\n", "7lwlJydr165dzRL3LVu2OFxXSkqKpkyZopUrV+rgwYMqKytTYWGh7r333mZDjY1Go+bMmaM5c+bo\n", "3Llz+uabb/SPf/xDn376qQoLC1VYWKhrrrnGNxsNAAgrPHMLAAh5hw8flsFg0PTp05vN27Rpk83f\n", "Q4YMkSR9/fXXzZZtbGzU1q1bm00fNWqUJDn8eaBA8dprr6mmpkZDhgxRZmampKZnbysrKzV69Ohm\n", "ie3Fixe1a9euZkOYLWbPni2p6YVRjl4k5UiXLl00depUffzxxxo3bpyOHDmiwsLC1m4aAACSSG4B\n", "AGGgR48eMpvNys3NtZm+du1aLVmyxGbamDFjlJGRodzcXK1Zs8Zm3uLFi3Xo0KFmCd8jjzyiTp06\n", "ad68eQ5f9tTY2KiNGzd6Z2M8VFNToz/84Q965ZVXFB0drTfeeMM677rrrlNMTIx27txpM2TaZDJp\n", "zpw5V32bcXZ2tjIzM7Vs2TJ9+umn6tOnj8aOHWuzTF1dncPeX5PJpPLychkMBsXExHhhKwEAYFgy\n", "ACAMzJ49W++//77uuece3X333UpOTtbevXu1du1azZgxQx9//LF1WYPBoCVLlmjSpEmaMmWKpk+f\n", "rvT0dBUUFOirr77S5MmT9cUXX9i8/TchIUErVqzQ1KlTNXLkSN1+++3KysqSwWBQUVGRtm3bpvPn\n", "z+vy5cs+3c7333/f+tbnqqoqHT16VJs3b9b58+eVkpKi9957T6NHj7YubzQa9etf/1rz58/XgAED\n", "NGXKFNXV1Sk3N1cVFRXWtyU78+STT+rpp5+WpGYvkpKky5cv65ZbblHPnj01ZMgQXX/99aqpqdG6\n", "det04MAB3XnnndZeZAAAWovkFgAQ8gYMGKDc3Fw9//zzWrVqlerr6zVo0CB99tlnio+Pt0luJWns\n", "2LHatGmTdXlJGjlypDZu3KgPPvhA0v+ezbXIzs5WQUGBXn/9da1du1Zff/21oqOjlZKSovHjxzcb\n", "Em0wGBwO+XU2/Wosn7E8OxwREaHY2FglJycrJydHkydP1j333KP27ds3++xLL72kLl26aMmSJVq8\n", "eLE6deqkCRMm6OWXX9YLL7xw1bI8/PDD+s1vfqN27drp4YcfbjY/NjZWCxYsUG5urrZt26Z//etf\n", "iouLU0ZGht555x394he/8Gg7AQC4GoPZ/sf6AACAU2PGjNGOHTtUWVnpMFkMJxs2bND48eP10EMP\n", "2byUCwAAf+CZWwAA7FRXV6uioqLZ9KVLl2rbtm3KyckJ+8RWanpJlSQ99dRTfi4JAAAMSwYAoJnj\n", "x49r8ODBysnJUUZGhurr67V7925t2bJF1157rRYuXOjvIvrN999/r//+97/Ky8vT2rVrdccdd2jY\n", "sGH+LhYAAAxLBgDAXkVFhZ599llt2rRJp0+fVm1trZKTkzV+/Hj97ne/U48ePfxdRL9ZtmyZHnnk\n", "EcXHx2vixIl66623lJCQ4O9iAQBAcgsAAAAACH48cwsAAAAACHoktwAAAACAoEdyCwAAAAAIeiS3\n", "AAAAAICgR3ILAAAAAAh6JLcAAAAAgKBHcgsAAAAACHoktwAAAACAoEdyCwAAAAAIev8PGt3yLzRt\n", "BEMAAAAASUVORK5CYII=\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "%matplotlib 
inline\n", "from ggplot import *\n", "import warnings\n", "\n", "# ggplot's use of pandas raises a FutureWarning; suppress warnings to keep the output clean\n", "warnings.filterwarnings('ignore')\n", "\n", "print ggplot(aes(x='ageInDays'), data=ageDF) + \\\n", " geom_histogram(binwidth=7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The histogram shows a rough pattern, with activity rising and falling about every 400 days. There are also several very large spikes. I hypothesize that these are due to individual users making many edits in this concentrated map area within a short period of time." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false }, "outputs": [], "source": [ "os.killpg(pro.pid, signal.SIGTERM) # Send SIGTERM to every process in the process group, shutting down the MongoDB instance" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }