My colleague shared a book with me that he bought – “Machine Learning with Spark“. I read part of the first chapter and feel pretty good about it. I think it is definitely a hard book for anyone who doesn’t have much programming experience; the author seems to assume that the reader already knows at least one of three languages (Java, Python, Scala) just to read the book.
I have been doing some Spark programming in Python, and today I read a few examples written in Scala; the syntax is remarkably simple and similar to Python’s. I have also heard from some people that Scala code tends to run much faster than PySpark in most cases. Here are a few things that were new to me (a short sketch putting them together follows the list):
- use “val” whenever creating a new variable (it declares an immutable value; “var” is the mutable counterpart)
- => is the anonymous-function syntax: it separates the function’s parameters from its body
- map { case (a, b, c) => (b, c) }, where “case” pattern-matches the tuple and the expression after => is the result
- .reduceByKey(_ + _), where each underscore stands for one argument of the anonymous function, in order
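To see how these pieces fit together, below is a minimal sketch (my own toy example, not from the book; the data and names are made up) that uses all four ideas to total up purchase quantities per product:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ScalaSyntaxDemo {
  def main(args: Array[String]): Unit = {
    // “val” declares an immutable reference; the types are inferred
    val conf = new SparkConf().setAppName("syntax-demo").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Toy data: (user, product, quantity) tuples
    val purchases = sc.parallelize(Seq(
      ("alice", "book", 2),
      ("bob",   "book", 1),
      ("alice", "pen",  5)
    ))

    // “case” pattern-matches each tuple; “=>” separates the
    // anonymous function’s parameters from its body
    val pairs = purchases.map { case (user, product, quantity) => (product, quantity) }

    // “_ + _” is shorthand for (a, b) => a + b: each underscore
    // stands for one argument, in order
    val totals = pairs.reduceByKey(_ + _)

    totals.collect().foreach(println)  // prints (book,3) and (pen,5)

    sc.stop()
  }
}
```

One nice property of reduceByKey is that it combines the values for each key locally before shuffling data across the cluster, which is part of why it is usually preferred over groupByKey for aggregations like this.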