Welcome to the Spark Summit 2014 Training hands-on exercises! Much of this material extends and enhances the exercises given at previous AMP Camp Big Data Bootcamps.
These hands-on exercises walk you through examples of how to use the higher-level libraries in the Spark project: Spark SQL, Spark Streaming, MLlib, and GraphX.
To get the most out of this course, you should meet the following prerequisites:
- You have experience using the core Spark APIs
- You have a laptop
- Your laptop has Java 6 or 7 installed
If you would like a quick primer on Scala, check out the following doc in the appendix:
In several of the training modules below, you can choose which language to use as you follow along and gain experience with the tools. The following table shows which languages this mini course supports for each section. You are welcome to mix and match languages depending on your preferences and interests.
|Section||Scala||Java||Python|
|Spark SQL Interactive||yes||no||yes|
|MLlib - Machine Learning||yes||no||yes|
|GraphX - Graph Analytics||yes||no||no|
The modules we will cover at the advanced Spark training are listed below. These can be done in any order according to your interests.
|Module||Description||Length||Further reference|
|Spark SQL||Use the Spark shell to write interactive SQL queries||Short||Programming Guide|
|Spark Streaming||Process a sample of Twitter tweet streams||Medium||Programming Guide|
|MLlib||Build a movie recommender with Spark||Medium||Programming Guide|
|GraphX||Explore graph-structured data and graph algorithms||Long||Programming Guide|
Once you complete the course, we would appreciate hearing your feedback. Please fill out the following survey: