I just wrote a tutorial on how to set up Cassandra locally and run some basic CRUD operations, to gain confidence that Cassandra can do the basic stuff we can do in MySQL. But why do we need Cassandra?
Here are a few pros and cons I have heard that are worth keeping in mind when comparing it with other big data technologies.
Pros:
1. High scalability (performance and capacity increase roughly linearly as you add nodes to the cluster)
2. High availability (you can replicate data across data centers in different regions, and there is no master node)
3. “Stupidly fast” writes and “NoSQL-ly fast” reads. (The hash ring method assigns records to nodes using a hash function.)
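To make the hash ring idea a bit more concrete, here is a minimal sketch in plain Python. It is not how Cassandra is implemented: the node names, the `token` helper, and the `HashRing` class are all made up for illustration, and I use MD5 only because it ships with the standard library (Cassandra's default partitioner uses Murmur3). The point is just that a key's hash decides which node owns it, so no master has to be consulted.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to an integer position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy ring: one token per node, no replication, no virtual nodes."""

    def __init__(self, nodes):
        # Each node owns the arc of the ring that ends at its own token.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, partition_key: str) -> str:
        # Find the first node token >= the key's token, wrapping past the end.
        i = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])  # hypothetical node names
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", ring.node_for(key))
```

Because the lookup is a pure function of the key, any node (or client) can compute where a record lives, which is part of why writes do not queue up behind a single coordinator.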
Cons:
1. Cannot do JOIN (technically you can do two rounds of SELECT instead of a join, but there is no command called JOIN in C*)
2. Not designed for analytics (you will be surprised by the lack of analytic functions offered by C*; you can use Spark and Tableau on top of Cassandra, and this has proven helpful)
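The "two rounds of SELECT" workaround from the first con can be sketched like this. To keep it runnable without a cluster, the two tables are plain Python dicts; the table names (`users`, `orders_by_user`), the columns, and the `orders_with_user_name` helper are hypothetical, and the CQL in the comments is only what each lookup would roughly correspond to.

```python
# Hypothetical tables, modelled as dicts keyed by user_id (the partition key).
users = {
    1: {"name": "alice"},
    2: {"name": "bob"},
}
orders_by_user = {
    1: [{"order_id": 100, "total": 25.0}, {"order_id": 101, "total": 9.5}],
    2: [{"order_id": 102, "total": 40.0}],
}


def orders_with_user_name(user_id):
    # Round 1: roughly `SELECT name FROM users WHERE user_id = ?`
    user = users.get(user_id)
    if user is None:
        return []
    # Round 2: roughly `SELECT order_id, total FROM orders_by_user WHERE user_id = ?`
    rows = orders_by_user.get(user_id, [])
    # The "join" happens here, in application code, not in the database.
    return [{"name": user["name"], **row} for row in rows]


print(orders_with_user_name(1))
```

The cost is an extra round trip and some glue code, which is why people often denormalize in Cassandra and store the joined shape directly in one table.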
Now, let’s take a look at how Cassandra stores the data.