forked from DS-100/textbook
eda_temp.html
<div id="ipython-notebook">
<div class="buttons">
<button class="interact-button js-nbinteract-widget">
Show Widgets
</button>
<a class="interact-button" href="http://data100.datahub.berkeley.edu/user-redirect/git-pull?repo=https://github.com/DS-100/textbook&subPath=notebooks/ch05/eda_temp.ipynb">Open on DataHub</a></div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell"
style="display:none;"
>
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># HIDDEN</span>
<span class="c1"># Clear previously defined variables</span>
<span class="o">%</span><span class="k">reset</span> -f
<span class="c1"># Set directory for data loading to work properly</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="n">os</span><span class="o">.</span><span class="n">chdir</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">expanduser</span><span class="p">(</span><span class="s1">'~/notebooks/ch05'</span><span class="p">))</span>
</pre></div></div></div></div></div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p><h1>Table of Contents<span class="tocSkip"></span></h1></p>
<div class="toc"><ul class="toc-item"><li><span><a href="#Temporality" data-toc-modified-id="Temporality-1">Temporality</a></span></li></ul></div></div></div></div>
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell"
style="display:none;"
>
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># HIDDEN</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="nn">sns</span>
<span class="o">%</span><span class="k">matplotlib</span> inline
<span class="kn">import</span> <span class="nn">ipywidgets</span> <span class="k">as</span> <span class="nn">widgets</span>
<span class="kn">from</span> <span class="nn">ipywidgets</span> <span class="k">import</span> <span class="n">interact</span><span class="p">,</span> <span class="n">interactive</span><span class="p">,</span> <span class="n">fixed</span><span class="p">,</span> <span class="n">interact_manual</span>
<span class="kn">import</span> <span class="nn">nbinteract</span> <span class="k">as</span> <span class="nn">nbi</span>
<span class="n">sns</span><span class="o">.</span><span class="n">set</span><span class="p">()</span>
<span class="n">sns</span><span class="o">.</span><span class="n">set_context</span><span class="p">(</span><span class="s1">'talk'</span><span class="p">)</span>
<span class="n">pd</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">display</span><span class="o">.</span><span class="n">max_rows</span> <span class="o">=</span> <span class="mi">7</span>
<span class="n">pd</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">display</span><span class="o">.</span><span class="n">max_columns</span> <span class="o">=</span> <span class="mi">8</span>
</pre></div></div></div></div></div>
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h2 id="Temporality">Temporality<a class="anchor-link" href="#Temporality">¶</a></h2><p>Temporality refers to how the data are situated in time and specifically to the date and time fields in the dataset. We seek to understand the following traits about these fields:</p>
<p><strong>What is the meaning of the date and time fields in the dataset?</strong></p>
<p>In the Calls and Stops datasets, the datetime fields represent when the call or stop was made by the police. The Stops dataset originally also had a datetime field recording when the case was entered into the database; we removed it during data cleaning because we didn't think it would be useful for analysis.</p>
<p>In addition, we should note the time zone of each datetime field and whether it observes daylight saving time, especially when the data come from multiple locations.</p>
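<p>As a minimal sketch of why the time zone matters, the snippet below takes the same wall-clock reading from two locations and converts both to UTC. The two fixed offsets and the example date are assumptions made for illustration; they are not taken from the Calls or Stops data, and fixed offsets do not model daylight saving time, so a real analysis would likely rely on a full time zone database instead.</p>

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed UTC offsets, used only for illustration.
# Fixed offsets ignore daylight saving time.
PST = timezone(timedelta(hours=-8), "PST")  # Pacific Standard Time
EST = timezone(timedelta(hours=-5), "EST")  # Eastern Standard Time

# The same wall-clock reading (9:00 am) recorded in two locations.
west = datetime(2017, 1, 15, 9, 0, tzinfo=PST)
east = datetime(2017, 1, 15, 9, 0, tzinfo=EST)

# Converting both to UTC puts them on a common timeline.
west_utc = west.astimezone(timezone.utc)
east_utc = east.astimezone(timezone.utc)

# The two readings look identical on the clock but are three hours apart.
gap = west_utc - east_utc
print(west_utc.isoformat(), east_utc.isoformat(), gap)
```

<p>Comparing or sorting the raw wall-clock values without their offsets would treat these two events as simultaneous.</p>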
<p><strong>What representation do the date and time fields have in the data?</strong></p>
<p>Although the US uses the MM/DD/YYYY format, many other countries use the DD/MM/YYYY format. Still more formats are in use around the world, and it's important to recognize these differences when analyzing data.</p>
<p>In the Calls and Stops datasets, the dates came in the MM/DD/YYYY format.</p>
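<p>The ambiguity between the two formats can be sketched with Python's standard <code>datetime.strptime</code>. The date strings below are made-up examples, not values from the Calls or Stops data. Stating the format explicitly avoids relying on a parser's guess.</p>

```python
from datetime import datetime

# Hypothetical raw value: valid under both conventions.
raw = "03/04/2017"

as_us = datetime.strptime(raw, "%m/%d/%Y")    # read as month/day/year
as_intl = datetime.strptime(raw, "%d/%m/%Y")  # read as day/month/year

print(as_us.date(), as_intl.date())

# A value whose first field exceeds 12 cannot be a month,
# so parsing it as MM/DD/YYYY fails with a ValueError.
try:
    datetime.strptime("25/12/2017", "%m/%d/%Y")
    rejected = False
except ValueError:
    rejected = True

print(rejected)
```

<p>Values that fail under one format but parse under the other are a useful hint about which convention a data source actually uses.</p>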
<p><strong>Are there strange timestamps that might represent null values?</strong></p>
<p>Some programs use placeholder datetimes instead of null values. For example, Excel's default date is Jan 1st, 1900, and on Excel for Mac, it's Jan 1st, 1904. Many applications will generate a default datetime of 12:00am Jan 1st, 1970 or 11:59pm Dec 31st, 1969 since this is the <a href="https://www.wikiwand.com/en/Unix_time#/Encoding_time_as_a_number">Unix Epoch for timestamps</a>. If you notice multiple instances of these timestamps in your data, you should be cautious and double-check your data sources. Neither the Calls nor the Stops dataset contains any of these suspicious values.</p></div></div></div></div>
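<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>One way to check for placeholder datetimes of the kind discussed above is sketched below. The set of placeholders (the start of Excel's 1900 and 1904 date systems and the two values around the Unix epoch), the helper name <code>suspicious</code>, and the sample timestamps are all assumptions chosen for illustration; the right list depends on the software that produced the data.</p>

```python
from datetime import datetime

# Assumed placeholder values, stored as naive datetimes to the minute.
PLACEHOLDERS = {
    datetime(1900, 1, 1, 0, 0),      # start of Excel's 1900 date system
    datetime(1904, 1, 1, 0, 0),      # start of Excel's 1904 date system
    datetime(1970, 1, 1, 0, 0),      # Unix epoch
    datetime(1969, 12, 31, 23, 59),  # just before the Unix epoch
}


def suspicious(timestamps):
    """Return the timestamps that match a placeholder, ignoring seconds."""
    return [
        t for t in timestamps
        if t.replace(second=0, microsecond=0) in PLACEHOLDERS
    ]


# Hypothetical sample: one ordinary timestamp and two placeholders.
sample = [
    datetime(2017, 3, 4, 14, 30),
    datetime(1970, 1, 1, 0, 0),
    datetime(1969, 12, 31, 23, 59, 59),
]

flagged = suspicious(sample)
print(len(flagged), "suspicious timestamps found")
```

<p>A single match may be a genuine record, so the count of matches matters more than any one hit.</p>
</div></div></div>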