examples / Data Wrangling and Analysis with Python

Commit 38de40c9, authored Jun 21, 2017 by O'Reilly Media, Inc.

Initial commit

Showing 20 changed files with 1476 additions and 0 deletions (+1476, -0).
Data Wrangling and Analysis with Python - Working Files/Chapter 1/py3_requirements.txt (+82, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 1/requirements.txt (+86, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/GHE_DALY_Global_2000_2012.xls (+0, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/WEF_GlobalCompetitivenessReport_2014-15.pdf (+0, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_basic_files.py (+14, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_excel_files.py (+13, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_pdf_files.py (+17, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_tweepy_api.py (+40, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_weather_api.py (+28, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_web_scraping.py (+22, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/climate_change_download_0.xls (+0, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/climate_change_download_0.xlsx (+0, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/eu_revolving_loans.csv (+0, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/example_conf.cfg (+11, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 3/Concatenating Datasets.ipynb (+213, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 3/Data Types.ipynb (+221, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 3/Exploring Data Structures.ipynb (+0, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 3/Exports & Imports Transformation.ipynb (+0, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 3/Filtering Datasets.ipynb (+0, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 3/Joining Datasets.ipynb (+729, -0)
Data Wrangling and Analysis with Python - Working Files/Chapter 1/py3_requirements.txt (new file, mode 100755)
amqp==1.4.9
anyjson==0.3.3
appnope==0.1.0
backports-abc==0.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
billiard==3.3.0.23
bokeh==0.11.1
celery==3.1.23
certifi==2016.2.28
cffi==1.5.2
cryptography==1.3.1
cssselect==0.9.1
cycler==0.10.0
decorator==4.0.9
entrypoints==0.2
enum34==1.1.3
et-xmlfile==1.0.1
futures==3.0.5
fuzzywuzzy==0.10.0
gnureadline==6.3.3
idna==2.1
ipaddress==1.0.16
ipykernel==4.3.1
ipython==4.2.0
ipython-genutils==0.1.0
ipywidgets==5.0.0
jdcal==1.2
Jinja2==2.8
jsonschema==2.5.1
jupyter==1.0.0
jupyter-client==4.2.2
jupyter-console==4.1.1
jupyter-core==4.1.0
kombu==3.0.35
lxml==3.6.0
MarkupSafe==0.23
matplotlib==1.5.1
memory-profiler==0.41
mistune==0.7.2
nbconvert==4.2.0
nbformat==4.0.1
ndg-httpsclient==0.4.0
nltk==3.2.1
notebook==4.2.0
numpy==1.11.0
oauthlib==1.0.3
openpyxl==2.3.5
pandas==0.18.0
pandas-datareader==0.2.1
pathlib2==2.1.0
pexpect==4.0.1
pickleshare==0.7.2
Pillow==3.2.0
psutil==4.2.0
ptyprocess==0.5.1
pyasn1==0.1.9
pycparser==2.14
Pygments==2.1.3
pyOpenSSL==16.0.0
pyparsing==2.1.1
python-dateutil==2.5.3
python-Levenshtein==0.12.0
pytz==2016.3
PyYAML==3.11
pyzmq==15.2.0
qtconsole==4.2.1
redis==2.10.5
requests==2.10.0
requests-file==1.4
requests-oauthlib==0.6.1
scipy==0.17.1
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
SQLAlchemy==1.0.12
terminado==0.6
tornado==4.3
traitlets==4.2.1
tweepy==3.5.0
widgetsnbextension==1.0.0
xlrd==0.9.4
Data Wrangling and Analysis with Python - Working Files/Chapter 1/requirements.txt (new file, mode 100755)
amqp==1.4.9
anyjson==0.3.3
appnope==0.1.0
backports-abc==0.4
backports.shutil-get-terminal-size==1.0.0
backports.ssl-match-hostname==3.5.0.1
billiard==3.3.0.23
bokeh==0.11.1
celery==3.1.23
certifi==2016.2.28
cffi==1.5.2
configparser==3.3.0.post2
cryptography==1.3.1
cssselect==0.9.1
cycler==0.10.0
decorator==4.0.9
entrypoints==0.2
enum34==1.1.3
et-xmlfile==1.0.1
functools32==3.2.3.post2
futures==3.0.5
fuzzywuzzy==0.10.0
gnureadline==6.3.3
idna==2.1
ipaddress==1.0.16
ipykernel==4.3.1
ipython==4.2.0
ipython-genutils==0.1.0
ipywidgets==5.0.0
jdcal==1.2
Jinja2==2.8
jsonschema==2.5.1
jupyter==1.0.0
jupyter-client==4.2.2
jupyter-console==4.1.1
jupyter-core==4.1.0
kombu==3.0.35
lxml==3.6.0
MarkupSafe==0.23
matplotlib==1.5.1
memory-profiler==0.41
mistune==0.7.2
nbconvert==4.2.0
nbformat==4.0.1
ndg-httpsclient==0.4.0
nltk==3.2.1
notebook==4.2.0
numpy==1.11.0
oauthlib==1.0.3
openpyxl==2.3.5
pandas==0.18.0
pandas-datareader==0.2.1
pathlib2==2.1.0
pdfminer==20110515
pdftables==0.0.4
pexpect==4.0.1
pickleshare==0.7.2
Pillow==3.2.0
psutil==4.2.0
ptyprocess==0.5.1
pyasn1==0.1.9
pycparser==2.14
Pygments==2.1.3
pyOpenSSL==16.0.0
pyparsing==2.1.1
python-dateutil==2.5.3
python-Levenshtein==0.12.0
pytz==2016.3
PyYAML==3.11
pyzmq==15.2.0
qtconsole==4.2.1
redis==2.10.5
requests==2.10.0
requests-file==1.4
requests-oauthlib==0.6.1
scipy==0.17.1
simplegeneric==0.8.1
singledispatch==3.4.0.3
six==1.10.0
SQLAlchemy==1.0.12
terminado==0.6
tornado==4.3
traitlets==4.2.1
tweepy==3.5.0
widgetsnbextension==1.0.0
xlrd==0.9.4
Data Wrangling and Analysis with Python - Working Files/Chapter 2/GHE_DALY_Global_2000_2012.xls (new file, mode 100755; binary file added)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/WEF_GlobalCompetitivenessReport_2014-15.pdf (new file, mode 100755; binary file added)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_basic_files.py (new file, mode 100755)

from __future__ import print_function
import csv

import pandas as pd

my_reader = csv.DictReader(open('data/eu_revolving_loans.csv', 'r'))

for line in my_reader:
    print(line)

df = pd.read_csv('data/eu_revolving_loans.csv', header=1)

print(df)
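csv.DictReader yields one mapping per data row, keyed by the header row. A minimal self-contained sketch of that behavior (the in-memory sample below is made up to mirror the shape of eu_revolving_loans.csv, not the real file):

```python
import csv
import io

# Hypothetical sample with the same shape as the loan CSV
sample = io.StringIO("geo,2014,2015\nAT,61.9,63.2\nBE,53.8,55.0\n")

rows = list(csv.DictReader(sample))
print(rows[0]['geo'])    # keys come from the header row -> AT
print(rows[1]['2015'])   # values stay as strings -> 55.0
```

Note that DictReader leaves every value as a string; the pandas read_csv call in the script infers numeric dtypes for you.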
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_excel_files.py (new file, mode 100755)

from __future__ import print_function

import pandas as pd
from openpyxl import load_workbook

wb = load_workbook(filename='data/climate_change_download_0.xlsx')
# get_sheet_by_name() works in the pinned openpyxl 2.3.5; later releases
# deprecate it in favor of wb['Data']
ws = wb.get_sheet_by_name('Data')

for row in ws.rows:
    for cell in row:
        print(cell.value)

df = pd.read_excel('data/climate_change_download_0.xlsx')
print(df)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_pdf_files.py (new file, mode 100755)

import pdftables

my_pdf = open('data/WEF_GlobalCompetitivenessReport_2014-15.pdf', 'rb')
chart_page = pdftables.get_pdf_page(my_pdf, 29)

table = pdftables.page_to_tables(chart_page)
# Python 2 only: zip() returns a list there, so it can be sliced directly;
# on Python 3 you would wrap it in list() first
titles = zip(table[0][0], table[0][1])[:5]
titles = [''.join([title[0], title[1]]) for title in titles]
print(titles)

all_rows = []
for row_data in table[0][2:]:
    all_rows.extend([row_data[:5], row_data[5:]])

print(all_rows)
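The loop in chp2_pdf_files.py assumes each extracted PDF table row packs two records side by side and splits it into 5-cell halves. The same step on a hypothetical row, with no pdftables dependency:

```python
# Hypothetical 10-cell row as page_to_tables might return it
row_data = ['GDP', '1', '2', '3', '4', 'Inflation', '5', '6', '7', '8']

all_rows = []
all_rows.extend([row_data[:5], row_data[5:]])  # two 5-cell records
print(all_rows)
```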
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_tweepy_api.py (new file, mode 100755)

""" Simple tweepy stream listener for twitter API. """
from __future__ import print_function
# sleep() is called in on_timeout() but was not imported in the original
from time import sleep

import tweepy

try:
    from configparser import ConfigParser
except ImportError:
    from ConfigParser import ConfigParser


class PythonListener(tweepy.StreamListener):
    """ Very simple tweepy stream listener. """

    def on_status(self, tweet):
        print(tweet.text)

    def on_error(self, msg):
        # the original passed msg as a second print() argument;
        # %-formatting needs the % operator
        print('Error: %s' % msg)

    def on_timeout(self):
        print('tweepy timeout. waiting before next poll')
        sleep(30)


def get_config():
    """ Return my config object. """
    conf = ConfigParser()
    conf.read('config/prod.cfg')
    return conf


config = get_config()

auth = tweepy.OAuthHandler(config.get('twitter', 'consumer_key'),
                           config.get('twitter', 'consumer_secret'))
auth.set_access_token(config.get('twitter', 'access_token'),
                      config.get('twitter', 'access_token_secret'))

my_listener = PythonListener()
my_stream = tweepy.Stream(auth=auth, listener=my_listener)
my_stream.filter(track=['#python', 'python'])
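Running the stream requires live Twitter credentials, but the callback pattern the listener relies on can be sketched offline. FakeStream and Collector below are hypothetical stand-ins, not tweepy classes; the point is that the stream object drives the listener's on_status() once per incoming item:

```python
class FakeStream:
    """Hypothetical stand-in for tweepy.Stream: pushes items to a listener."""
    def __init__(self, listener):
        self.listener = listener

    def run(self, tweets):
        for tweet in tweets:
            self.listener.on_status(tweet)


class Collector:
    """Minimal listener that records what it receives."""
    def __init__(self):
        self.seen = []

    def on_status(self, tweet):
        self.seen.append(tweet)


collector = Collector()
FakeStream(collector).run(['hello #python', 'more python'])
print(collector.seen)
```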
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_weather_api.py (new file, mode 100755)

""" Simple weather data from weathermap API. """
from __future__ import print_function
from pprint import pprint

import requests

try:
    from configparser import ConfigParser
except ImportError:
    from ConfigParser import ConfigParser


def upcoming_forecast(api_key, lat, lon):
    """ Pulls upcoming forecast based on latitude and longitude. """
    resp = requests.get('http://api.openweathermap.org/data/2.5/forecast',
                        params={'lat': lat,
                                'lon': lon,
                                'appid': api_key,
                                'units': 'metric'})
    return resp.json()


def get_config():
    """ Return my config object. """
    conf = ConfigParser()
    conf.read('config/prod.cfg')
    return conf


config = get_config()

pprint(upcoming_forecast(config.get('openweather', 'api_key'),
                         52.520645, 13.409779))
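The params dict handed to requests.get() is URL-encoded into the query string for you. The stdlib equivalent shows what the request URL ends up looking like ('MY_KEY' is a placeholder, not a real API key):

```python
from urllib.parse import urlencode

# Same parameters upcoming_forecast() sends, encoded by hand
params = {'lat': 52.520645, 'lon': 13.409779,
          'appid': 'MY_KEY', 'units': 'metric'}
url = 'http://api.openweathermap.org/data/2.5/forecast?' + urlencode(params)
print(url)
```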
Data Wrangling and Analysis with Python - Working Files/Chapter 2/chp2_web_scraping.py (new file, mode 100755)

from __future__ import print_function

import requests
from lxml import html

response = requests.get('http://kjamistan.com')
page = html.fromstring(response.content)
page.make_links_absolute(base_url='http://kjamistan.com')

posts = page.xpath('//article[@class="post"]')
#posts = page.cssselect('article.post')

all_posts = []

for post in posts:
    link = post.xpath('header/h2/a')[0].get('href')
    title = post.xpath('header/h2/a/text()')[0]
    all_posts.append((title, link))

print(all_posts)
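The per-post extraction above needs no network to demonstrate: the relative path 'header/h2/a' also works with the stdlib ElementTree on a tiny sample article (the markup below is a made-up fragment mimicking the page's structure; lxml offers the fuller XPath used in the script):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like one <article class="post"> on the page
sample = """<article class="post">
  <header><h2><a href="http://kjamistan.com/post-1">First post</a></h2></header>
</article>"""

post = ET.fromstring(sample)
anchor = post.find('header/h2/a')      # same relative path as the script
print(anchor.get('href'), anchor.text)
```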
Data Wrangling and Analysis with Python - Working Files/Chapter 2/climate_change_download_0.xls (new file, mode 100755; binary file added)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/climate_change_download_0.xlsx (new file, mode 100755; binary file added)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/eu_revolving_loans.csv (new file, mode 100755; diff collapsed)
Data Wrangling and Analysis with Python - Working Files/Chapter 2/example_conf.cfg (new file, mode 100755)
[openweather]
api_key=425b9b9e2416cjfr47329434jk2lX4u32
[twitter]
consumer_key = CIuYfkdFw8392kdfHuioj
consumer_secret = 4QiJw1wkd902eklfjs920skcSwikFpkl3289
access_token = 15632343-qaMfjk1ri8eklclfiFisoTwjneio48930
access_token_secret = FAifw894jk3l24h543ljfs89hC9fhjFhkjrel3784
[google]
api_key=AI16cjfr47329434jk2lX4u32
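This is the template for the config/prod.cfg file the Chapter 2 scripts load via get_config(). A minimal sketch of how ConfigParser reads such a file (read_string stands in for conf.read('config/prod.cfg'), and the key value is a dummy):

```python
from configparser import ConfigParser

conf = ConfigParser()
# In the scripts this is conf.read('config/prod.cfg'); here we feed
# a dummy section inline instead of touching the filesystem
conf.read_string("[openweather]\napi_key = abc123\n")

print(conf.get('openweather', 'api_key'))
```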
Data Wrangling and Analysis with Python - Working Files/Chapter 3/Concatenating Datasets.ipynb (new file, mode 100755)
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = pd.DataFrame()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: []\n",
"Index: []"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"for chunk in pd.read_csv('data/ext_lt_invcur.tsv', sep='\\t', chunksize=100):\n",
" data_rows = [row for row in chunk.ix[:,0].str.split(',')]\n",
" data_cols = [col.split('\\\\')[0] for col in chunk.columns[0].split(',')]\n",
" clean_df = pd.DataFrame(data_rows, columns=data_cols)\n",
" new_df = pd.concat([clean_df, chunk.drop(chunk.columns[0], \n",
" axis=1)], axis=1)\n",
" df = pd.concat([df, new_df])"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>partner</th>\n",
" <th>currency</th>\n",
" <th>stk_flow</th>\n",
" <th>sitc06</th>\n",
" <th>geo</th>\n",
" <th>2014</th>\n",
" <th>2012</th>\n",
" <th>2010</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>EXT_EU</td>\n",
" <td>EUR</td>\n",
" <td>EXP</td>\n",
" <td>SITC0-4A</td>\n",
" <td>AT</td>\n",
" <td>61.9</td>\n",
" <td>65.6</td>\n",
" <td>67</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>EXT_EU</td>\n",
" <td>EUR</td>\n",
" <td>EXP</td>\n",
" <td>SITC0-4A</td>\n",
" <td>BE</td>\n",
" <td>53.8</td>\n",
" <td>85.8</td>\n",
" <td>92.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>EXT_EU</td>\n",
" <td>EUR</td>\n",
" <td>EXP</td>\n",
" <td>SITC0-4A</td>\n",
" <td>BG</td>\n",
" <td>57.0</td>\n",
" <td>46.2</td>\n",
" <td>54.1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>EXT_EU</td>\n",
" <td>EUR</td>\n",
" <td>EXP</td>\n",
" <td>SITC0-4A</td>\n",
" <td>CY</td>\n",
" <td>79.1</td>\n",
" <td>60.7</td>\n",
" <td>61.4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>EXT_EU</td>\n",
" <td>EUR</td>\n",
" <td>EXP</td>\n",
" <td>SITC0-4A</td>\n",
" <td>CZ</td>\n",
" <td>58.3</td>\n",
" <td>66.7</td>\n",
" <td>59.1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" partner currency stk_flow sitc06 geo 2014 2012 2010 \n",
"0 EXT_EU EUR EXP SITC0-4A AT 61.9 65.6 67 \n",
"1 EXT_EU EUR EXP SITC0-4A BE 53.8 85.8 92.4 \n",
"2 EXT_EU EUR EXP SITC0-4A BG 57.0 46.2 54.1 \n",
"3 EXT_EU EUR EXP SITC0-4A CY 79.1 60.7 61.4 \n",
"4 EXT_EU EUR EXP SITC0-4A CZ 58.3 66.7 59.1 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.5"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
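The chunked loop in the notebook exists because the Eurostat TSV packs several comma-separated fields (and a trailing "\time" marker) into its first tab column. The per-row split it performs with pandas can be sketched in pure Python (the header and data line below are abbreviated samples, not the full file):

```python
# Abbreviated sample of the Eurostat TSV layout: first tab column packs
# several comma-separated fields, and the last packed header carries "\time"
header = "partner,currency,stk_flow,sitc06,geo\\time\t2014\t2012"
line = "EXT_EU,EUR,EXP,SITC0-4A,AT\t61.9\t65.6"

packed_cols = header.split('\t')[0]
cols = [c.split('\\')[0] for c in packed_cols.split(',')]  # drop "\time"

packed_vals = line.split('\t')[0]
row = dict(zip(cols, packed_vals.split(',')))

print(cols)        # clean column names
print(row['geo'])  # -> AT
```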
Data Wrangling and Analysis with Python - Working Files/Chapter 3/Data Types.ipynb (new file, mode 100755)
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"my_series = pd.Series([23, 54, 62, 25])"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"0 23\n",
"1 54\n",
"2 62\n",
"3 25\n",
"dtype: int64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [