Spark IPython notebook

AWS EMR

1. Spot instances
Pricing
It took me 20 minutes to start the cluster.

2. Hue
Reach the web UI on the master node through dynamic port forwarding:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-proxy.html
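A minimal sketch of the tunnel command for dynamic port forwarding. The key file path, the master DNS name and the local port are placeholders I made up for illustration, not values from my cluster; `hadoop` is the usual login user on EMR. The script only prints the command so it can be reviewed before running it.

```shell
#!/bin/bash
# Hypothetical values: replace with your own key pair and master public DNS name.
KEY_FILE="$HOME/mykeypair.pem"
MASTER_DNS="ec2-xx-xx-xx-xx.compute-1.amazonaws.com"   # placeholder, not a real host
LOCAL_PORT=8157                                        # any unused local port works

# Print the ssh command that opens a SOCKS proxy on localhost:$LOCAL_PORT.
#   -N : do not run a remote command, just forward
#   -D : dynamic (SOCKS) port forwarding
tunnel_cmd() {
  printf 'ssh -i %s -N -D %s hadoop@%s\n' "$KEY_FILE" "$LOCAL_PORT" "$MASTER_DNS"
}

tunnel_cmd
```

Run the printed command in a separate terminal and leave it open, then point the browser's SOCKS proxy at localhost and the chosen port. Hue is typically served on port 8888 of the master node, but check the cluster's own settings.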

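3. Instance groups: master, core, task
The master node coordinates the cluster, core nodes run tasks and store HDFS data, task nodes only run tasks.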

# Install Anaconda Python
mkdir -p ~/bwang
cd ~/bwang
wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda-2.2.0-Linux-x86_64.sh
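The download above only fetches the installer. A sketch of how the install itself could go, assuming the installer sits in ~/bwang and using a prefix directory that is my own hypothetical choice. The `-b` (batch, no prompts) and `-p` (install prefix) flags are the standard options of the Anaconda shell installer.

```shell
#!/bin/bash
# Assumed locations: installer downloaded into ~/bwang; PREFIX is a hypothetical choice.
INSTALLER="$HOME/bwang/Anaconda-2.2.0-Linux-x86_64.sh"
PREFIX="$HOME/bwang/anaconda"

install_anaconda() {
  if [ ! -f "$INSTALLER" ]; then
    echo "installer not found: $INSTALLER"
    return 1
  fi
  # -b: batch mode (accepts the license without prompting), -p: install prefix
  bash "$INSTALLER" -b -p "$PREFIX" || return 1
  # Put the Anaconda Python first on PATH for this shell session
  export PATH="$PREFIX/bin:$PATH"
  python --version
}

install_anaconda || echo "Anaconda install skipped"
```

To make the PATH change permanent, add the same export line to ~/.bashrc.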

#!/bin/bash
# Download the airline on-time data (Data Expo 2009), one file per year
for i in $(seq 1987 2008)
do
  wget "http://stat-computing.org/dataexpo/2009/${i}.csv.bz2"
done

# Decompress only the downloaded archives, not every file in the directory
bzip2 -d *.csv.bz2
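With 22 files to fetch, it is easy to miss a failed download. A small sketch of a check that lists any year with neither a .csv nor a .csv.bz2 file in a directory. The function name `missing_years` is my own, for illustration only.

```shell
#!/bin/bash
# Print each year in [first, last] that has neither YEAR.csv nor YEAR.csv.bz2 in dir.
missing_years() {
  local dir="$1" first="$2" last="$3" y
  for y in $(seq "$first" "$last"); do
    if [ ! -e "$dir/$y.csv" ] && [ ! -e "$dir/$y.csv.bz2" ]; then
      echo "$y"
    fi
  done
}

# Check the current directory for the 1987-2008 range.
missing="$(missing_years . 1987 2008)"
if [ -z "$missing" ]; then
  echo "all files present"
else
  # $missing is left unquoted on purpose so the years print on one line
  echo "missing years:" $missing
fi
```

Any year it reports can be fetched again with the same wget line as in the loop.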
