Cassandra:
http://cassandra.apache.org/download/
CQL:
https://pypi.python.org/pypi/cql/1.0.4
IPython Notebook (Jon):
https://github.com/rustyrazorblade/python-presentation
Sqoop:
http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/ana/anaSqpDemo.html
Spark:
Shark: dramatically reduces I/O.
Spark+Cassandra:
Cassandra has no JOIN, so when you need one, the join logic has to live in your application code: look up a few records in one table, then use the results to look up rows in another table. This is workable because individual lookups are very fast.
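A rough sketch of that application-side join, in Python. Everything here is an assumption for illustration: the table names (orders_by_user, products), the sample rows, and the helper functions are hypothetical, and plain dicts stand in for Cassandra tables so the example runs without a cluster. In a real application each fetch_* helper would issue a CQL SELECT by primary key through whatever driver you use.

```python
# Hypothetical stand-ins for two Cassandra tables, keyed by partition key.
# These dicts are NOT a Cassandra API; they only mimic key-based lookups.
ORDERS_BY_USER = {
    "alice": [
        {"order_id": 1, "product_id": "p1"},
        {"order_id": 2, "product_id": "p2"},
    ],
}
PRODUCTS = {
    "p1": {"name": "keyboard"},
    "p2": {"name": "mouse"},
}


def fetch_orders(user_id):
    # Real app (assumed schema): SELECT ... FROM orders_by_user WHERE user_id = ?
    return ORDERS_BY_USER.get(user_id, [])


def fetch_product(product_id):
    # Real app (assumed schema): SELECT ... FROM products WHERE product_id = ?
    return PRODUCTS.get(product_id)


def orders_with_products(user_id):
    """Join orders to products in application code: one lookup per order."""
    joined = []
    for order in fetch_orders(user_id):
        product = fetch_product(order["product_id"])
        if product is None:
            continue  # skip orders with no matching product (inner-join behavior)
        # Merge the two rows into one combined record.
        joined.append({**order, **product})
    return joined


if __name__ == "__main__":
    for row in orders_with_products("alice"):
        print(row)
```

The first lookup returns a handful of rows, and each of those drives one keyed lookup in the second table. That keeps every query a cheap primary-key read, which is the case the note above relies on; it would not scale to joining large result sets.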