Machine Learning for Natural Language Processing

When and where

Time:

  • Week 1 (lectures and hands-on sessions): 15 June 2015 - 19 June 2015, daily between 5:00pm and 8:00pm
  • Week 2 (project): 23 June 2015 - 26 June 2015, daily between 5:00pm and 7:00pm

Place:

  • Room EG306, Faculty of Automatic Control and Computer Science at the “Politehnica” University of Bucharest

Introduction

The purpose of this workshop is to provide a crash-course introduction to the state-of-the-art Machine Learning techniques currently used in solving software engineering problems involving Natural Language.

While touching on theoretical principles, the course will be geared specifically towards practical applications commonly encountered in Natural Language Processing. The main techniques discussed will be: Language Modelling, Distributional Semantics, Text Classification, and Deep Learning as applied to Natural Language Processing.

The course will specifically emphasize the newer Deep Learning techniques that are becoming popular at companies like Google, Facebook, and Twitter, and will provide an introduction to a few of the Open Source libraries that these companies sponsor.

Applications discussed include Spam Filtering, Topic Modelling, Text Regeneration, and Text-based Recommender Systems.
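
To make the Spam Filtering application concrete, here is a minimal sketch of a supervised, generative text classifier (multinomial Naive Bayes with add-one smoothing) written with the Python standard library only. It is not taken from the workshop material: the class name NaiveBayesTextClassifier, the whitespace tokenizer, and the four toy training messages are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up training data purely for illustration.
train_docs = [
    "win money now",
    "cheap money offer",
    "meeting agenda for monday",
    "project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]

classifier = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(classifier.predict("cheap money"))
print(classifier.predict("monday meeting"))
```

A realistic filter would need far more training data and a better tokenizer; the point here is only how labelled examples turn into per-class word statistics that score a new message.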

Goals

At the end of this workshop, you will be familiar with:

  • The distinction between Supervised and Unsupervised Machine Learning, and between Generative and Discriminative approaches.
  • Elements of Linguistics typically included in the engineering design of Natural Language Processing solutions.
  • Decomposing a Natural Language Processing problem into smaller, known sub-problems such as Language Modelling, as well as the tools to solve these sub-problems and their pitfalls.
  • Elements of Deep Learning, and the wider use of Neural Networks in Natural Language Processing.

Format and Curriculum

The workshop will take place over the course of two weeks, and will involve 10 hours of demonstrative lectures together with 5 hours of hands-on practicals. It is aimed at an audience of 12-15 attendees.

Week 1

During the first week, there will be a maximum of 2 hours of demonstrative lectures every day, followed by one hour of hands-on practicals in which the principles taught will be put to use via self-contained applications that involve popular Open Source Machine Learning toolkits and solutions.

An outline of the curriculum covered in the demonstrative lectures and hands-on sessions of the first week is given below:

  • Introduction to Natural Language Processing and Machine Learning. Supervised and Unsupervised Machine Learning. Tools used in practicals: scikit-learn (Machine Learning in Python).
  • The engineering importance of grammar. POS tags, Named Entities, and Dependency Parsing. Tools used in practicals: the Porter Stemmer, the Natural Language Toolkit, the RASP Parser.
  • Shallow Machine Learning. Language Modelling and Classifiers. Tools used in practicals: SVM Light, the Stanford Parser.
  • Neural Networks. The Backpropagation Algorithm. Tools used in practicals: The Fast Artificial Neural Network Library.
  • Auto-Encoders. Deep Learning. Tools used in practicals: The Torch Deep Learning Library.

Nota Bene: The curriculum is subject to minor modifications without prior notice, depending on external planning factors.

Presentations

Week 2

During the second week, there will be four consecutive Hackathon sessions, aimed at producing prototype solutions to an engineering problem. Only four places are available for the project; participants from the first week who wish to stay on and do the project will be selected based on availability and project affinity, and announced on June 20th, 2015.

The title of the project will be:

  • Text Regeneration from OCRed PDF Documents

Prerequisites

The course will be taught entirely in English. It is aimed at Computer Science and Computer Engineering students in their 2nd, 3rd, or 4th year of studies with a good background in software engineering, as well as at newcomers to the industry who wish to further their knowledge of Machine Learning as it is applied to problems in Natural Language Processing. The following are the minimum prerequisites for attending:

  • Strong command of the Linux programming environment (quickly installing libraries, and getting prototypes up and running).
  • Good background in mathematics, especially probability, linear algebra, and basic calculus.
  • Prior exposure to either Machine Learning or Natural Language Processing is a plus.
  • Having completed the Stanford Machine Learning MOOC given by Andrew Ng on Coursera is a plus.

Registration

Registration for the workshop was possible using the online Registration Form. Registration closed on June 7th, at 23:59 Bucharest Time.

Registrants were notified of the outcome of their application by email on June 9th, 2015. The selected course participants were:

| Participant Name (First Last) | Faculty           | University                          | Position                 | Week 2 Hackathon |
| Andrei Stefan Tuicu           | ACS               | Politehnica University of Bucharest | Undergraduate, Year 3    |                  |
| Andrei-Vlad Fulgeanu          | ACS               | Politehnica University of Bucharest | Undergraduate, Year 4    |                  |
| Teodor Szente                 | N/A (high school) | Cantemir Voda National College      | Student                  |                  |
| Gabriel Rotaru                | ACS               | Politehnica University of Bucharest | Undergraduate, Year 2    |                  |
| Bogdan Merlusca               | FILS              | Politehnica University of Bucharest | Graduated, Class of 2007 |                  |
| Vlad-Ovidiu Lupu              | FMI               | University of Bucharest             | Undergraduate, Year 2    |                  |
| Eduard Lache                  | ACS               | Politehnica University of Bucharest | Undergraduate, Year 2    |                  |
| Daniel Dogaru                 | ACS               | Politehnica University of Bucharest | Undergraduate, Year 4    |                  |
| George-Cristian Muraru        | ACS               | Politehnica University of Bucharest | Undergraduate, Year 2    |                  |
| Ioana-Alina Bănică            | ETTI              | Politehnica University of Bucharest | Undergraduate, Year 4    |                  |
| Eduard George Ionescu         | FMI               | University of Bucharest             | Undergraduate, Year 1    |                  |
| Vladu Ana Maria               | ETTI              | Politehnica University of Bucharest | Undergraduate, Year 4    |                  |
| Matei Popovici                | ACS               | Politehnica University of Bucharest | Lecturer                 |                  |

ACS - Faculty of Automatic Control and Computer Science
FMI - Faculty of Mathematics and Informatics
FILS - Faculty of Engineering in Foreign Languages
ETTI - Faculty of Electronics, Telecommunications and Information Technology

Instructor and Organizer

Adrian Scoică

sesiuni/ml4nlp.txt · Last modified: 2015/06/23 20:30 by adrian.sc