Introduction to Text Mining

Course, bachelor, winter 2019, L.079.05501

General

Lecture

  • Instructor. Henning Wachsmuth
  • Location. F1.110
  • Time. Thursday, 11am – 2pm, c.t.
  • First date. October 10, 2019, c.t.
  • Last date. January 30, 2020

Tutorial

  • Instructor. Milad Alshomary
  • Location. F1.110
  • Time. Wednesday, 4pm – 6pm s.t.
  • First date. October 16, 2019, c.t.
  • Last date. January 29, 2020

Announcements

  • The exam dates have been entered into PAUL. Please double-check that everything is correct (date, 45-minute slot, room FU.327). If not, let us know immediately.
  • All general information on exams is provided below.

Description

This course teaches students all major skills needed to approach typical tasks in the analysis of natural language text. Starting from fundamentals of linguistics and statistics, the lecture gives an overview of several text analyses and covers a selection of them in detail. Both rule-based and statistical techniques are discussed, the latter including standard approaches from machine learning.

The students learn both theoretically and practically to design, implement, and evaluate text analysis algorithms for given tasks. Besides the topical content, the lecture aims to educate students in how to conduct scientific experiments and how to employ large datasets in experiments.

Lectures

The lectures cover the following topics. The slides from each lecture will be put here, usually soon after the respective lecture has taken place. Slides marked with an asterisk (*) are not relevant for the exam. Note that slides may be updated after publication (updates are marked as such), so check back from time to time.

  • I. Overview (slides)
  • II. Basics of Linguistics (slides)
  • III. Text Mining using Rules (slides)
  • IV. Basics of Empirical Methods (slides)
  • V. Text Mining using Grammars (slides)
  • VI. Basics of Machine Learning (slides)
  • VII. Text Mining using Similarities and Clustering (slides)
  • VIII. Text Mining using Classification and Regression (slides)
  • IX. Text Mining using Sequence Labeling (dropped)
  • X. Practical Issues (slides)

Slides with meta-information:

  • Organizational course information (slides)
  • Information on exam dates (slides)
  • Teaching evaluation results (slides)
  • Information on exam application (slides)

Tutorials

The tutorials will cover the following topics. The slides from each tutorial will be put here, usually soon after the respective tutorial has taken place:

  • Oct 16. Introduction to Python (download)

Assignments

The course includes six assignment sheets in total, published here every two weeks. Each sheet consists of written tasks as well as Python programming tasks.

  • Assignment sheet 1 – submission until Oct 27
  • Assignment sheet 2 – submission until Nov 10
  • Assignment sheet 3 – submission until Nov 24
  • Assignment sheet 4 – submission until Dec 8
  • Assignment sheet 5 – submission until Jan 5
  • Assignment sheet 6 – submission until Jan 19

For all programming tasks, we provide a Python notebook containing template code (to help you start the task) that you fill in with your solutions. More information on how to use Python notebooks will be presented in the first tutorial.

Submission

Group submissions of up to three people are allowed and encouraged. The submission deadline is always 23:59 (UTC+1) on the respective day.

Please submit your assignments via email to Milad Alshomary as a .zip archive containing a .pdf file for the written part and an .ipynb file (Python notebook) for the programming part.

Please provide your last names and student numbers in the archive name as follows, repeating the <last name>-<student number> pair for each group member: <last name>-<student number>-tm-assignment<assignment number>.zip, for example, "meier-1234567-schulz-2345678-tm-assignment1.zip".
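The naming scheme and archive layout described above can be sketched in Python as follows. This is only an illustrative helper, not an official course tool: the function names archive_name and build_submission, the file names solutions.pdf and solutions.ipynb, and the lowercasing of last names (taken from the example file name) are all assumptions.

```python
import zipfile
from pathlib import Path


def archive_name(members, assignment_number):
    """Build the archive name from (last name, student number) pairs.

    Assumption: last names are lowercased, as in the example file name.
    """
    parts = [f"{last_name.lower()}-{student_number}"
             for last_name, student_number in members]
    return "-".join(parts) + f"-tm-assignment{assignment_number}.zip"


def build_submission(members, assignment_number, pdf_path, notebook_path,
                     out_dir="."):
    """Pack the written part (.pdf) and the notebook (.ipynb) into one .zip."""
    target = Path(out_dir) / archive_name(members, assignment_number)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in (Path(pdf_path), Path(notebook_path)):
            # Store files flat, without their directory prefix.
            archive.write(path, arcname=path.name)
    return target


# Hypothetical group of two students, matching the example above.
group = [("Meier", "1234567"), ("Schulz", "2345678")]
print(archive_name(group, 1))
# meier-1234567-schulz-2345678-tm-assignment1.zip
```

Calling build_submission(group, 1, "solutions.pdf", "solutions.ipynb") would then write the archive to the current directory, provided both input files exist.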

Assignment grades

Student grades will be updated in a file linked here (anonymized, listed by matriculation number). In addition, detailed assessments of student solutions are given in the tutorials.

Exam

An oral exam has to be taken in order to pass the course. The first round of exams will take place in February 2020. You need to register for the exam in PAUL as usual (see orga slides from first lecture).

Students who get only 4 ECTS points for their course will not be examined on lecture part X. In addition, each of these students can freely choose one of the other lecture parts to be excluded from the exam.

Important: Each student needs to obtain at least 50% of all assignment points in order to be allowed to take the exam.