-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathnotes_module.txt
More file actions
217 lines (206 loc) · 7.45 KB
/
notes_module.txt
File metadata and controls
217 lines (206 loc) · 7.45 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
urllib
overview
standard module, internet access
access block
servers may block connections for protection and efficiency
can bypass by using fake User-Agent but not recommended
use provided API
urllib.request.urlretrieve(url,savename) : get file(picture only?) from internet
urllib.request, urllib.parse
url = '...', site address
values = {..}, post parameters
myheaders = {}
myheaders['User-Agent'] = 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data, headers=myheaders)
resp = urllib.request.urlopen(req)
respData = resp.read()
matplotlib
overview
gui data plotting
matplotlib.pyplot
plot() scatter() bar() hist()
show() figure() subplot() clf()
title() xlabel() ylabel() legend() grid()
matplotlib.style
use() available
sqlite3
overview
standard module, sql database
local SQL database, database per file, no server & users & permissions
unlike MySQL is sensitive to casing
connect to database and get cursor
conn = sqlite3.connect('...')
c = conn.cursor()
execute SQL query : c.execute()
create table
c.execute('CREATE TABLE IF NOT EXISTS ...')
read data
c.execute("SELECT ... FROM ...")
c.fetchall()
modify data : permanent so take care! (rollback)
write data : c.execute('''INSERT INTO ...(...) VALUES (?...)''', (...))
update data : c.execute("UPDATE ...SET ... WHERE ...")
delete data : c.execute("DELETE FROM ... WHERE ...")
update db : insert, update, delete
conn.commit()
close connection
c.close()
conn.close()
PIL : image manipulation
Image
new() : create new image
open() save() : open image from, save image to file
merge() : merge channels
img = ....open(...)
show() : show image using OS default viewer
size format mode
crop() : return cropped image, area = 4 coordinate tuple leftop rightbottom
paste() : return combined image, image to add and target area of original image
split() : split image into channels, r g b tuple
resize() : change image size
transpose() : transform image
convert() : convert image property, "L" "RGB" "CMYK"
filter() : PIL.ImageFilter
ImageFilter : apply effects to image
BLUR SHARPEN ...
ImageTk : Tkinter related
numpy : number crunching, data science module
numpy
array() : convert list to numpy array
transpose() : transpose narray
numpy.random : random generation
seed() : same seed gives same sequence of generated numbers per execution
rand() randint() : generate random numbers
heapq
smaller version of pandas?
nlargest() nsmallest() : find n largest/smallest item in complex list
top_three = heapq.nlargest(3, [32, 44, 76, 89, 67, 26, 92, 54])
print(top_three) -> [92, 89, 76]
stocks = [
{'ticker': 'AAPL', 'price': 291},
{'ticker': 'GOOG', 'price': 800},
{'ticker': 'F', 'price': 54},
{'ticker': 'MSFT', 'price': 313}
]
bottom_two = heapq.nsmallest(2, stocks, key=lambda x: x['price'])
print(bottom_two) -> [{'ticker': 'F', 'price': 54}, {'ticker': 'AAPL', 'price': 291}]
pandas : number crunching, data science
import pandas as pd
Create dataframe
df = pd.DataFrame(data structure)
Set index
df.set_index("Date", inplace=True)
Get data, column
df["Visitors"]
df[["Visitors", "Bounce_Rate"]]
File load/save
CSV
df.to_csv("3.csv", header=False)
pd.read_csv("3.csv", names=["Data", "House_Price"], index_col=0)
HTML : need lxml, html5lib, BeautifulSoup4 module
df.to_html("3.html")
pd.read_html()
Pickle : pandas version
HPI_data.to_pickle("state_list2.pickle")
HPI_data2 = pd.read_pickle("state_list2.pickle")
Header
df.rename(columns={"House_Price": "Price"}, inplace=True)
Combine dataframes
Simple
pd.concat([df1, df2, df3])
df1.append(df2)
Join/merge
df8 = df5.join(df6, how="inner") : combine on index, used at combining tables with same columns
pd.merge(df5, df6, on="Year", how="outer") : combine on columns, used at combining tables with same index
left, right, inner(and), outer(or)
Info
df.head() tail()
df.corr() : get correlation data
df.describe() : get overview data
df.resample(...) : get resampled data, http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
df.resample("A").mean() : average, end of year yearly
df.resamoke("AS").ohlc() : open-high-low-close, start of year yearly
Missing data, NaN
df.dropna() : drop rows with NaN
df.fillna() : fill NaN with other row values
Rolling stats
df.rolling(...).mean()
df.rolling(...).std()
df.rolling(...).corr()
Comparision
df = df[df["left"] > df["right"]]
Function mapping, apply
from statistics import mean
def direction(now, future):
if future > now:
return 1
else:
return 0
def average(values):
return mean(values)
hpi["Label"] = list(map(direction, hpi["USA"], hpi["USA_F"]))
hpi["MORG_M"] = hpi["MORG"].rolling(10).apply(average)
web module : web analytics
requests
source = requests.get() : get web source
text = source.text : convert web source to text
bs4 : BeautifulSoup
soup = BeautifulSoup() : convert web text to object
soup.findAll() : search through web object
standard module
os
file manipulation
path:getcwd()
listdir() mkdir() makedirs() rename() rmdir() system()
sys
operating system command related?
can used to communicate between programs of different language
stderr.write() stdout.write() argv
re
regular expression for parsing string, need practice
* vs *? : need to think over greedy vs non-greedy counts
findall()
tkinter
gui creation
tkinter is wrapper around tk which is wrapper around tcl
Window Button Menu Event Image Text
subprocess
access shell commands
can used to communicate between programs of different language
complement of sys module?
call() check_output()
socket
network related, ftplib ssl
encode() decode() bytestrings
gethostbyname()
socket()
connect() accept() close() bind() listen()
send() recv() sendall()
pickle
save python object with compression
no security
dump() load()
struct
convert data to/from bytes format, used in network
pack() unpack() calculate()
collections
get statistics about data structure
Counter()
sklean : machine learning
need to install Numpy+MKL, scipy (http://www.lfd.uci.edu/~gohlke/pythonlibs/)
argparse : argument parser, often used in CLI
import argpasre
def parse():
parser = argparse.ArgumentParser()
parser.add_argument("--x", type=float, default=1.0, help="first number")
parser.add_argument("--y", type=float, default=1.0, help="second number")
parser.add_argument("--operation", type=str, default="add", help="operation (add, sub, mul, div)")
args = parser.parse_args()
print(args)
timit : measures code execution speed, test module (all code must be self contained)
import timit
print(timeit.timeit('''
...code...
''', number=1000))