This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
sesiuni:data_science [2016/05/30 16:08] dserban [Introduction] |
sesiuni:data_science [2016/08/17 11:10] (current) fbratiloveanu [When and Where?] |
||
---|---|---|---|
Line 8: | Line 8: | ||
== When and Where? == | == When and Where? == | ||
- | * **Days**: 25 June 2016 - 31 August 2016 (every Saturday and Sunday) | + | **Between**: 3-25 September 2016 (every Saturday and Sunday; the workshop dates are 3, 4, 10, 11, 17, 18, 24, and 25). Each session lasts 4 hours and will be held within the 12:00-18:00 interval. The exact schedule will be decided at the end of August. |
- | An email will be sent to announce the room and the hours after the participant will be accepted. | + | Private communications will be sent to the selected participants to announce further details (after registration is complete and the list of participants is finalized). |
== Topics == | == Topics == | ||
Line 20: | Line 20: | ||
* Working in the PySpark shell | * Working in the PySpark shell | ||
* Working with PySpark in an IPython notebook | * Working with PySpark in an IPython notebook | ||
+ | * Building Spark/Scala Applications with sbt | ||
* Standalone Applications | * Standalone Applications | ||
Line 48: | Line 49: | ||
* The DataFrame API | * The DataFrame API | ||
* Inner Joins and Left Outer Joins in the RDD API versus in Spark SQL | * Inner Joins and Left Outer Joins in the RDD API versus in Spark SQL | ||
+ | * Datasets (compile-time type-safe DataFrames) | ||
Building Interactive Data Analytics Apps With Flask and Spark | Building Interactive Data Analytics Apps With Flask and Spark | ||
Line 55: | Line 57: | ||
Spark Streaming | Spark Streaming | ||
- | * A Simple Example - Stream of Integers / Moving Average | + | * A Simple Example - Stream of Integers / Rolling Sum |
+ | * Streaming Data via TCP socket (netcat) | ||
+ | * Streaming Data via Kafka topic (Apache Kafka) | ||
+ | * Storing Streaming Analytics Results in a NoSQL Datastore (Apache Cassandra) | ||
+ | * Structured Streaming / Infinite DataFrames | ||
Advanced Spark Programming | Advanced Spark Programming | ||
Line 64: | Line 70: | ||
* Overview and Terminology | * Overview and Terminology | ||
- | * Machine Learning Basics. What is a Feature | + | * Machine Learning Basics |
- | * The LabeledPoint Data Type | + | |
* TF-IDF | * TF-IDF | ||
* Preparing The Data For Analysis / Stemming, Stopword Elimination | * Preparing The Data For Analysis / Stemming, Stopword Elimination | ||
- | * LogisticRegressionWithSGD / Filtering Spam | + | * Linear Regression / The Longley Dataset |
+ | * Logistic Regression / Filtering Spam | ||
+ | * Decision Trees | ||
+ | * Random Forests | ||
+ | |||
+ | Parallel graph processing with GraphX | ||
+ | |||
+ | * A Simple Example - PageRank | ||
Exercises | Exercises | ||
Line 79: | Line 91: | ||
* The Brown Corpus (NLTK). Stylistic Classification with Cosine Similarity | * The Brown Corpus (NLTK). Stylistic Classification with Cosine Similarity | ||
* Sensor Data. Detecting Tachycardia and Bradycardia in an ECG Stream | * Sensor Data. Detecting Tachycardia and Bradycardia in an ECG Stream | ||
- | == Registration is now closed == | ||
- | |||
- | If you have any questions, please ask them [[https://github.com/dserban/datascience2016summer/issues/1|here]]. | ||
== Prerequisites == | == Prerequisites == | ||
Line 90: | Line 99: | ||
You can register for the workshop using the [[https://docs.google.com/forms/d/1ocS-KDKF99HWILR5LaEV8KE-38xa0kltZoFu8sKWjI8/viewform|online registration form]]. | You can register for the workshop using the [[https://docs.google.com/forms/d/1ocS-KDKF99HWILR5LaEV8KE-38xa0kltZoFu8sKWjI8/viewform|online registration form]]. | ||
+ | |||
+ | Deadline: September 2nd, 23:59. | ||
+ | |||
+ | If you have any questions, please ask them [[https://github.com/dserban/datascience2016summer/issues/1|here]]. | ||
== Instructor == | == Instructor == | ||
Line 98: | Line 111: | ||
==== _____________ ==== | ==== _____________ ==== | ||
- | |||
- | == Participants == | ||
- | |||