Differences

This shows you the differences between two versions of the page.

sesiuni:data_science [2016/05/30 22:19]
dserban [Topics]
sesiuni:data_science [2016/08/17 11:10] (current)
fbratiloveanu [When and Where?]
Line 8: Line 8:
 == When and Where? ==

-**Between**: 25 June 2016 - 31 July 2016 (every Saturday and Sunday)
+**Between**: 3-25 September 2016 (every Saturday and Sunday; the workshop dates are 3, 4, 10, 11, 17, 18, 24, 25). Each session lasts 4 hours and will be held within the 12:00-18:00 interval. The schedule will be decided at the end of August.

 Private communications will be sent to the selected participants to announce further details (after registration is complete and the list of participants is finalized).
Line 20: Line 20:
 * Working in the PySpark shell
 * Working with PySpark in an IPython notebook
+* Building Spark/Scala Applications with sbt
 * Standalone Applications
  
Line 48: Line 49:
 * The DataFrame API
 * Inner Joins and Left Outer Joins in the RDD API versus in Spark SQL
+* Datasets (compile-time type-safe DataFrames)

 Building Interactive Data Analytics Apps With Flask and Spark
Line 59: Line 61:
 * Streaming Data via Kafka topic (Apache Kafka)
 * Storing Streaming Analytics Results in a NoSQL Datastore (Apache Cassandra)
+* Structured Streaming / Infinite DataFrames

 Advanced Spark Programming
Line 67: Line 70:
  
 * Overview and Terminology
-* Machine Learning Basics. What is a Feature
+* Machine Learning Basics
-* The LabeledPoint Data Type
 * TF-IDF
 * Preparing The Data For Analysis / Stemming, Stopword Elimination
Line 97: Line 99:
  
 You can register for the workshop using the [[https://docs.google.com/forms/d/1ocS-KDKF99HWILR5LaEV8KE-38xa0kltZoFu8sKWjI8/viewform|online registration form]].
+
+Deadline: September 2nd, 23:59.

 If you have any questions, please ask them [[https://github.com/dserban/datascience2016summer/issues/1|here]].
sesiuni/data_science.1464635949.txt.gz · Last modified: 2016/05/30 22:19 by dserban