AWS EMR
1. Spot instances
Pricing
It took me 20 minutes to start the cluster.
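A minimal sketch for checking recent spot prices before choosing a bid, using the AWS CLI command `aws ec2 describe-spot-price-history`. The instance type (m3.xlarge) and region (us-east-1) are placeholder assumptions. The DRY_RUN switch is a convention made up for this note: by default the command is only printed, and setting DRY_RUN=0 runs it (this requires configured AWS credentials).

```bash
#!/bin/bash
# Sketch: list recent spot prices for one instance type.
# Instance type and region defaults are placeholders, not recommendations.
# DRY_RUN (a convention made up for this note) defaults to 1 = print only.
spot_price_history() {
  local instance_type=${1:-m3.xlarge}
  local region=${2:-us-east-1}
  local cmd=(aws ec2 describe-spot-price-history
    --region "$region"
    --instance-types "$instance_type"
    --product-descriptions "Linux/UNIX"
    --max-items 10
    --query 'SpotPriceHistory[*].[Timestamp,AvailabilityZone,SpotPrice]'
    --output table)
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

spot_price_history m3.xlarge us-east-1
```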
2. Hue
Dynamic port forwarding (SSH tunnel to the master node):
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-proxy.html
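A minimal sketch of the tunnel described in the linked guide: `ssh -N -D` opens a local SOCKS proxy through the master node, logging in as the `hadoop` user. The key file and master DNS name are placeholders, 8157 is only an example local port, and DRY_RUN is a convention made up for this note (the command is printed unless DRY_RUN=0).

```bash
#!/bin/bash
# Sketch: SSH dynamic port forwarding (local SOCKS proxy) to the EMR master node.
# Key file and master DNS are placeholders; 8157 is an example local port.
# DRY_RUN (made up for this note) defaults to 1 = print only.
open_tunnel() {
  local key_file=$1 master_dns=$2 local_port=${3:-8157}
  # -N: run no remote command, only forward; -D: dynamic (SOCKS) forwarding
  local cmd=(ssh -i "$key_file" -N -D "$local_port" "hadoop@${master_dns}")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

open_tunnel ~/mykeypair.pem ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```

With the browser set to use the SOCKS proxy on localhost (the linked guide configures this with FoxyProxy), Hue is reached through the master node's DNS name; on EMR it typically listens on port 8888.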
3. Cluster instance groups: master, core, task
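A sketch of launching a cluster with all three instance groups through `aws emr create-cluster --instance-groups`. Putting spot pricing (`BidPrice`) only on the task group is a design choice assumed here, not something from the notes: core nodes store HDFS data, so losing them when the spot price spikes costs more than losing task nodes, which only run compute. The release label, applications, instance type, counts, bid, and cluster name are placeholders. DRY_RUN is a convention made up for this note: the command is only printed unless DRY_RUN=0.

```bash
#!/bin/bash
# Sketch: master + core on demand, task group on spot via BidPrice.
# Release label, instance type, counts, bid and name are placeholders.
# DRY_RUN (made up for this note) defaults to 1 = print only.
create_cluster() {
  local key_name=$1 bid=${2:-0.10}
  local itype=m5.xlarge
  local cmd=(aws emr create-cluster
    --name "emr-notes-cluster"
    --release-label emr-5.36.0
    --applications Name=Hadoop Name=Hive Name=Hue
    --use-default-roles
    --ec2-attributes "KeyName=${key_name}"
    --instance-groups
      "InstanceGroupType=MASTER,InstanceCount=1,InstanceType=${itype}"
      "InstanceGroupType=CORE,InstanceCount=2,InstanceType=${itype}"
      "InstanceGroupType=TASK,InstanceCount=2,InstanceType=${itype},BidPrice=${bid}")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

create_cluster my-key-pair
```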
# Install Anaconda Python
mkdir ~/bwang
cd ~/bwang
wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda-2.2.0-Linux-x86_64.sh
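The commands above only download the installer. A minimal sketch of running it without prompts: `-b` is the installer's batch mode (accepts the license) and `-p` sets the install prefix; `bash Anaconda-2.2.0-Linux-x86_64.sh -h` should confirm both options for this version. The prefix under ~/bwang is an assumption, and DRY_RUN (print only unless DRY_RUN=0) is a convention made up for this note.

```bash
#!/bin/bash
# Sketch: non-interactive Anaconda install.
# -b: batch mode (accepts the license), -p: install prefix.
# Default prefix is an assumption; DRY_RUN (made up here) defaults to 1 = print only.
install_anaconda() {
  local installer=${1:-Anaconda-2.2.0-Linux-x86_64.sh}
  local prefix=${2:-$HOME/bwang/anaconda}
  local cmd=(bash "$installer" -b -p "$prefix")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    # Install, then put Anaconda's bin directory first on PATH for this shell
    "${cmd[@]}" && export PATH="$prefix/bin:$PATH"
  fi
}

install_anaconda
```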
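#!/bin/bash
# Download the ASA Data Expo 2009 airline on-time data (one file per year, 1987-2008)
for i in $(seq 1987 2008)
do
  wget "http://stat-computing.org/dataexpo/2009/${i}.csv.bz2"
done
# Decompress only the .bz2 archives; a bare * would also match other files, such as the installer
bzip2 -d *.csv.bz2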
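A quick sanity check after the download loop, as a sketch: it prints each year's line count (any header line is counted too) and flags any year whose .csv is missing or empty, for example after a failed wget. The directory argument defaults to the current one.

```bash
#!/bin/bash
# Sketch: verify that every year 1987-2008 was downloaded and decompressed.
# Prints "<year> <line count>" or "<year> MISSING", then a summary line.
# Returns non-zero if any year is missing or empty.
check_years() {
  local dir=${1:-.} missing=0 i f
  for i in $(seq 1987 2008); do
    f="$dir/${i}.csv"
    if [ -s "$f" ]; then
      printf '%s %s\n' "$i" "$(wc -l < "$f")"
    else
      printf '%s MISSING\n' "$i"
      missing=$((missing + 1))
    fi
  done
  echo "missing: $missing"
  [ "$missing" -eq 0 ]
}

check_years . || echo "some years are missing; rerun the download for them"
```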