Differences

This shows you the differences between two versions of the page.

sesiuni:data_science [2016/05/30 18:24]
dserban [Topics]
sesiuni:data_science [2016/07/18 17:40]
dserban [Topics]
Line 8: Line 8:
 == When and Where? ==
 
-**Between**: 25 June 2016 - 31 August 2016 (every Saturday and Sunday)
+**Between**: 3-25 September 2016 (every Saturday and Sunday, and the actual workshop dates are 3, 4, 10, 11, 17, 18, 24, 25)
 
-Private communications will be sent to announce further details after registration is complete and the list of participants is finalized.
+Private communications will be sent to the selected participants to announce further details (after registration is complete and the list of participants is finalized).
 
 == Topics ==
Line 20: Line 20:
 * Working in the PySpark shell
 * Working with PySpark in an iPython notebook
+* Building Spark/Scala Applications with sbt
 * Standalone Applications
 
Line 48: Line 49:
 * The DataFrame API
 * Inner Joins and Left Outer Joins in the RDD API versus in Spark SQL
+* Datasets (compile-time type-safe DataFrames)
 
 Building Interactive Data Analytics Apps With Flask and Spark
Line 56: Line 58:
  
 * A Simple Example - Stream of Integers / Rolling Sum
-* Streaming data via TCP socket (netcat)
-* Streaming data via Kafka topic (Apache Kafka)
-* Aggregating streams and storing the results in a NoSQL datastore (Apache Cassandra)
+* Streaming Data via TCP socket (netcat)
+* Streaming Data via Kafka topic (Apache Kafka)
+* Storing Streaming Analytics Results in a NoSQL Datastore (Apache Cassandra)
+* Structured Streaming / Infinite DataFrames
 
 Advanced Spark Programming
Line 67: Line 70:
  
 * Overview and Terminology
-* Machine Learning Basics. What is a Feature
-* The LabeledPoint Data Type
+* Machine Learning Basics
 * TF-IDF
 * Preparing The Data For Analysis / Stemming, Stopword Elimination
Line 97: Line 99:
  
 You can register for the workshop using the [[https://docs.google.com/forms/d/1ocS-KDKF99HWILR5LaEV8KE-38xa0kltZoFu8sKWjI8/viewform|online registration form]].
+
+Deadline: September 2nd, 23:59.
 
 If you have any questions, please ask them [[https://github.com/dserban/datascience2016summer/issues/1|here]].
sesiuni/data_science.txt · Last modified: 2016/08/17 11:10 by fbratiloveanu