Differences

sesiuni:data_warehouse [2015/09/19 21:46]
dserban [Introduction]
sesiuni:data_warehouse [2015/09/19 21:54]
dserban [Topics]
Line 14:
 == Topics ==
  
-* Understanding the Spark programming model and API
-* Detecting the 12-01-2001 anomaly in the CrossFilter data set
-* Twitter stream / Sentiment analysis for hashtags
 +Introduction to Data Analysis with Spark
 +
 +
  
 +* What is Apache Spark?
 +* Introduction to Core Spark Concepts
 +* Working in the PySpark shell
 +* Working with PySpark in an iPython notebook
 +* Standalone Applications
 +
 +Programming with RDDs
 +
 +* RDD Basics
 +* Creating RDDs
 +* RDD Operations
 +* Passing Functions to Spark
 +* Common Transformations and Actions
 +* Caching RDDs
 +
 +Working with Key-Value Pairs
 +
 +* Motivation
 +* Creating Pairwise RDDs
 +* Transformations on Pairwise RDDs
 +* Actions Available on Pairwise RDDs
 +* Data Partitioning. Key Performance Considerations
 +
 +Running on a Cluster
 +
 +* Configuring a Spark Cluster
 +* Deploying Applications with spark-submit
 +
 +Structured Data with Spark SQL
 +
 +* The DataFrame API
 +* Inner Joins and Left Outer Joins in the RDD API versus in Spark SQL
 +
 +Building Interactive Data Analytics Apps With Flask
 +
 +* A Simple Example - Parameterized CrossFilter Histograms
 +
 +Spark Streaming
 +
 +* A Simple Example - Stream of Integers / Moving Average
 +
 +Advanced Spark Programming
 +
 +* Working on a Per-Partition Basis
 +
 +Machine Learning with MLlib
 +
 +* Overview and Terminology
 +* Machine Learning Basics. What is a Feature
 +* The LabeledPoint Data Type
 +* TF-IDF
 +* Preparing The Data For Analysis / Stemming, Stopword Elimination
 +* LogisticRegressionWithSGD / Filtering Spam
 +
 +Exercises
 +
 +* The Complete Works of Shakespeare. Computing Word Counts
 +* Detecting the 12-01-2001 Anomaly in the CrossFilter Data Set
 +* Geographical Data - Analysis of City Initials per Country
 +* Applying PageRank on a Subset of Wikipedia
 +* Twitter Stream / Sentiment Analysis for Hashtags
 +* The Brown Corpus (NLTK). Stylistic Classification with Cosine Similarity
 +* Sensor Data. Detecting Tachycardia and Bradycardia in an ECG Stream
 == Registration is now closed ==
  
sesiuni/data_warehouse.txt · Last modified: 2015/09/19 21:59 by dserban