Hands-on Exercises


Welcome to the Spark Summit 2014 Training hands-on exercises! Much of this material is extended and enhanced from the exercises given at previous AMP Camp Big Data Bootcamps.

These hands-on exercises walk you through examples of how to use the higher-level libraries in the Spark project: Spark SQL, Spark Streaming, MLlib, and GraphX.



In order to get the most out of this course, we assume:

If you would like a quick primer on Scala, check out the following doc in the appendix:

Exercises Overview

Languages Used

In several of the following training modules, you can choose which language to use as you follow along and gain experience with the tools. The table below shows which languages this mini course supports for each section. You are welcome to mix and match languages depending on your preferences and interests.

Section                     Scala   Java   Python
Spark SQL Interactive       yes     no     yes
Spark Streaming             yes     yes    no
MLlib - Machine Learning    yes     no     yes
GraphX - Graph Analytics    yes     no     no

Exercise Content

The modules we will cover at the advanced Spark training are listed below. These can be done in any order according to your interests.

Exercise          Description                                           Length   More Documentation
Spark SQL         Use the Spark shell to write interactive SQL queries  Short    Programming Guide
Spark Streaming   Process a sample of Twitter tweet streams             Medium   Programming Guide
MLlib             Build a movie recommender with Spark                  Medium   Programming Guide
GraphX            Explore graph-structured data and graph algorithms    Long     Programming Guide

Providing feedback

Once you complete the course, we would appreciate hearing your feedback. Please fill out the following survey:
