Award for Prof. Dr. Axel Ngonga 
Photo: Next Einstein Forum

Reserved Open Thesis

Controlled Natural Language for Data Science (Master)
Topic: Controlled Natural Language for Data Science
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

 

Current data analytics solutions (SAP HANA, Tableau, Datameer, RapidMiner) provide either a form-based or a visual interface for building analytic pipelines. In this thesis, the student will develop a controlled natural language and an interface for talking to smart components from the DICE research group and other repositories, so that analytical pipelines can be built in natural language. Related Work: http://pages.cs.wisc.edu/~jignesh/publ/Ava.pdf

 

 

An RNN/seq2seq-based approach to Question Answering using limited training data (Master)
Topic: An RNN/seq2seq-based approach to Question Answering using limited training data (Master)
Supervisor: Dr. Ricardo Usbeck, Tommaso Soru
Contact: Dr. Ricardo Usbeck, Tommaso Soru

Deep learning for open question answering (QA) has been shown to work with training sets of 100,000+ question-answer pairs or in very narrow domains. Current Linked Data-based QA benchmarks such as QALD (https://github.com/ag-sc/QALD) have attracted a variety of researchers and pose complex yet interesting questions.

In this thesis, the student will develop a deep learning-based approach for Linked Data QA using several semantic QA techniques to guide the transformation of a natural language question into a SPARQL query. This approach will then be evaluated using the QALD benchmark and the GERBIL QA platform.

Related Work:

 

 

Named Entity Extraction for Archival Data
Topic: Named Entity Extraction for Archival Data
Supervisor: Michael Röder, Dr. Ricardo Usbeck, Adrian (Zazuko)
Contact: Dr. Ricardo Usbeck

Task

Given a list of names such as "1) Edouard Lock 3) Carolyn Carlson", "1. Frey Faust 2. Frey Faust; Isabelle Fuchs 3. Isabelle Fuchs", "A. Bournonville; M. Petipa; R. Petit; J. Cranko; J. Neumeier; M. Béjart; N. Christe; Ed. Cook; J. Rusillo", "Abdelaziz Sarrokh", "Abdelaziz Sarrokh in Zusammenarbeit mit den Tänzerinnen und Tänzer", "Aco Renz, Su Wen-Chi", "Adina Secretan", "Adriana Locijan", "Adrienne Dellas", "Agens & Gergne Kasizstan", find the corresponding Linked Data entities in the largest Swiss archive knowledge base. The solution will combine natural language processing heuristics with machine learning. We expect solid knowledge of Java or Python as well as experience with GitHub. We offer close supervision and first steps towards professional scientific training.

Improving Open Question Answering using Semantic Enhancement and Entailment (Master)
Topic: Improving Open Question Answering using Semantic Enhancement and Entailment
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

Current leading approaches for the Stanford SQuAD dataset use pure neural networks in many variants (RNN, LSTM, etc.) or an ensemble of models to reach high F-measures. In this thesis, the student will analyze current error cases and implement or extend semantic approaches and tools to enhance the capabilities of existing models. The final approach as well as the baseline will be evaluated against the SQuAD leaderboard.

Related Work:

rajpurkar.github.io/SQuAD-explorer/

 

 

Open Thesis

A full-text search based on HDT (Bachelor)


Topic: A full-text search based on HDT (Bachelor)
Supervisor: Dr. Ricardo Usbeck, Javier D. Fernández PhD
Contact: Dr. Ricardo Usbeck

Extend HDT (http://www.rdfhdt.org) with a Lucene, Elasticsearch or Solr index that sits next to the query capabilities of HDT, similar to Apache Fuseki. This should allow for even faster hybrid query processing.
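
To illustrate the idea of a text index sitting next to a triple store, here is a minimal sketch in plain Python. It does not use the HDT, Lucene, Elasticsearch or Solr APIs; the triples, the function names and the tokenizer are assumptions made purely for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_literal_index(triples):
    """Map each token found in a literal object to the subjects that use it.

    `triples` is an iterable of (subject, predicate, literal) strings.
    This is a hypothetical stand-in for a real full-text index.
    """
    index = defaultdict(set)
    for subject, _predicate, literal in triples:
        for token in tokenize(literal):
            index[token].add(subject)
    return index


def search(index, query):
    """Return the subjects whose literals together contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Toy data, invented for this example.
triples = [
    ("ex:1", "rdfs:label", "Paderborn University"),
    ("ex:2", "rdfs:label", "University of Leipzig"),
]
index = build_literal_index(triples)
print(search(index, "paderborn university"))
```

In a hybrid setup, the subjects returned by the text lookup would then be passed as bindings to the structured query over the HDT file.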

Data: http://www.rdfhdt.org/team/

Source Code: https://github.com/rdfhdt

BQL (google graph store) to SPARQL bridge (Master)
Topic: BQL (google graph store) to SPARQL bridge (Master)
Supervisor: Prof. Axel Ngonga
Contact: Prof. Axel Ngonga

Some text

Data: http://someurl.de

Source Code:  http://someurl.de


Benchmarking BadWolf using IGUANA 2.0 (Bachelor)
Topic: Benchmarking BadWolf using IGUANA 2.0 (Bachelor)
Supervisor: Prof. Axel Ngonga
Contact: Prof. Axel Ngonga

Some text

Data: http://someurl.de

Source Code:  http://someurl.de

Controlling RDF data in Virtual Reality (Master)
Topic: Controlling RDF data in Virtual Reality
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

In this thesis, the student will explore different ways to interact with RDF datasets (graphs or RDF cubes) based on Virtual Reality (VR) interactions. This will be a pre-study for later interactions based on a VR- and speech-coupled environment. A thorough investigation of existing VR dataset interactions needs to be performed as a start. Later we will investigate how RDF supports the process of interacting.

Creation of a sameAs service based on HDT (Bachelor)
Topic: Creation of a sameAs service based on HDT
Supervisor: Michael Röder
Contact: Michael Röder

In this thesis, the student will use sameAs links stored in an HDT file to offer a simple service that takes a single URI and retrieves the set of URIs that are connected to this URI. Note that this task includes a) checking the data for validity and b) developing an approach to curate the data. (Based on https://github.com/dice-group/gerbil/issues/224)
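
The core lookup of such a service can be sketched as a graph traversal over the sameAs links. The sketch below assumes the links have already been read from the HDT file into plain (URI, URI) pairs. Reading HDT is not shown, and all names are hypothetical.

```python
from collections import defaultdict, deque


def build_index(same_as_pairs):
    """Build an undirected adjacency index from (uri_a, uri_b) sameAs pairs."""
    neighbours = defaultdict(set)
    for a, b in same_as_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def same_as_closure(neighbours, uri):
    """Return every URI reachable from `uri` via sameAs links, excluding `uri`."""
    seen = {uri}
    queue = deque([uri])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(uri)
    return seen


# Toy links, invented for this example.
pairs = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:d", "ex:e")]
index = build_index(pairs)
print(same_as_closure(index, "ex:a"))
```

Treating sameAs as symmetric and transitive is exactly why the validity check in a) matters: a single wrong link merges two otherwise unrelated sets of URIs.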

Extraction of Linked Data from HTML Webpages (Bachelor)
Topic: Extraction of Linked Data from HTML Webpages
Supervisor: Michael Röder
Contact: Michael Röder

In this thesis, the student will focus on the extraction of structured data (RDFa, Microformat, Microdata, PageMap, …) that is embedded in HTML pages. The extracted data has to be transformed into RDF using common vocabularies. The work will extend the Linked Data Crawler Squirrel. (Based on https://github.com/dice-group/Squirrel/issues/36)
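
As a rough illustration of what extracting embedded data involves, the following sketch collects Microdata `itemprop` values using only Python's standard `html.parser`. It is a deliberately simplified assumption: it handles flat markup only (no nested items or nested tags of the same name), is not part of Squirrel, and the class name is hypothetical.

```python
from html.parser import HTMLParser


class MicrodataPropertyParser(HTMLParser):
    """Collect (itemprop, value) pairs from flat Microdata markup."""

    def __init__(self):
        super().__init__()
        self.properties = []  # extracted (itemprop, value) pairs
        self._open = None     # (tag, itemprop, text parts) of the element being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:
            # e.g. <meta itemprop="..." content="...">
            self.properties.append((prop, attrs["content"]))
        elif "href" in attrs:
            # e.g. <a itemprop="..." href="...">
            self.properties.append((prop, attrs["href"]))
        else:
            # The value is the element's text; read it until the end tag.
            self._open = (tag, prop, [])

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            _, prop, parts = self._open
            self.properties.append((prop, "".join(parts).strip()))
            self._open = None


# Toy page, invented for this example.
html = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Ada Lovelace</span>'
    '<a itemprop="url" href="http://example.org/ada">home</a>'
    '<meta itemprop="birthDate" content="1815-12-10">'
    "</div>"
)
parser = MicrodataPropertyParser()
parser.feed(html)
print(parser.properties)
```

Turning these pairs into RDF would then mean resolving each property name against the vocabulary given in `itemtype` and emitting one triple per pair.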

Implementation of a scalable web table extraction benchmark (Master)
Topic: Implementation of a scalable web table extraction benchmark
Supervisor: Michael Röder, Ivan Ermilov
Contact: Michael Röder

In this thesis, the student should create a benchmark based on the HOBBIT platform (http://project-hobbit.eu) for benchmarking web table extraction systems. The benchmark has to be implemented in a scalable way, i.e., the student will have to develop an algorithm that can generate web tables based on a given knowledge base. These web tables have to mimic the real-world tables that can be found on the web to make sure that the benchmarking is realistic. To mimic real-world tables, the student will need to perform a statistical analysis of an existing web table corpus, for example the WDC Web Table Corpus. Statistical criteria, such as the average number of rows per table, need to be defined. Based on a statistical model inferred from a real-world table corpus, the table generator should be able to create a table corpus with characteristics similar to those of a real-world web table corpus.
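
The generation step described above could look roughly like the following sketch. It assumes the "statistical model" is nothing more than a list of row counts observed in a real corpus and that the knowledge base is a plain dictionary. Both are simplifications, and the function name and data are hypothetical.

```python
import random


def generate_table(entities, columns, row_count_samples, rng):
    """Generate one table whose row count is drawn from observed row counts.

    entities:          dict mapping an entity URI to a {property: value} dict
    columns:           list of properties to use as table columns
    row_count_samples: row counts observed in a real table corpus
    rng:               a random.Random instance, so that runs are reproducible
    """
    n_rows = min(rng.choice(row_count_samples), len(entities))
    chosen = rng.sample(sorted(entities), n_rows)
    header = ["entity"] + list(columns)
    rows = [[e] + [entities[e].get(c, "") for c in columns] for e in chosen]
    return [header] + rows


# Toy knowledge base and row-count statistics, invented for this example.
kb = {
    "ex:A": {"ex:p": "1"},
    "ex:B": {"ex:p": "2"},
    "ex:C": {},
}
table = generate_table(kb, ["ex:p"], [1, 2], random.Random(0))
for row in table:
    print(row)
```

Because the generator knows which entity and property each cell came from, the gold standard for the extraction task falls out of the generation process for free.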

 

 

Usage of X2vec for dataset search (Master)
Topic: Usage of X2vec for dataset search
Supervisor: Michael Röder
Contact: Michael Röder

For this thesis, the student should get an overview of the different X2vec methods that exist to transform a given dataset into a vector representation. This representation should be used for calculating similarity values to other, indexed datasets, as done with Tapioca (http://aksw.org/projects/Tapioca.html).
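
Once every dataset is represented as a vector, the similarity search itself can be sketched as below. Cosine similarity is used here as one common choice; it is an assumption, not necessarily the measure Tapioca uses, and the vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_datasets(query_vector, index):
    """Return (dataset name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vector, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy index of dataset vectors, invented for this example.
index = {"dataset-a": [1.0, 0.0], "dataset-b": [0.0, 1.0], "dataset-c": [1.0, 1.0]}
for name, score in rank_datasets([1.0, 0.0], index):
    print(name, round(score, 3))
```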

 

 

Usage of Graph Automatons for searching similar datasets (Master) (under construction)
Topic: Usage of Graph Automatons for searching similar datasets
Supervisor: Michael Röder
Contact: Michael Röder

Graphs can be represented as automatons. The thesis will answer the question whether the representation of knowledge graphs as automatons and the comparison of these automatons can be used for the search of similar datasets. For the implementation of the use case, Tapioca (http://aksw.org/projects/Tapioca.html) can be reused.

 

 

Searching similar datasets on compressed data (Master)
Topic: Searching similar datasets on compressed data
Supervisor: Michael Röder
Contact: Michael Röder

Since the Linked Open Data Cloud is growing, new ways of representing this linked data are being developed. This includes approaches like HDT (http://www.rdfhdt.org/), which compresses linked data datasets. In this thesis, the student has to analyze the available compression formats and figure out how best to access them in order to obtain the data needed to search for similar datasets (see Tapioca, http://aksw.org/projects/Tapioca.html).

 

 

Benchmarking of dataset similarity approaches (Master)
Topic: Benchmarking of dataset similarity approaches
Supervisor: Michael Röder
Contact: Michael Röder

The thesis will develop a benchmark for dataset similarity approaches like Tapioca (http://aksw.org/projects/Tapioca.html). The benchmark will have to be implemented in a scalable way but should mimic real world linked data.

 

 

Optimization techniques for federated SPARQL query processing
Topic: Optimization techniques for federated SPARQL query processing
Supervisor: Muhammad Saleem
Contact: Muhammad Saleem

This thesis will explore the different optimization techniques used in distributed SPARQL query processing, in particular source selection, the index, join ordering and query planning, and the different join implementations.

 

 

Optimization techniques in triple stores for SPARQL query processing
Topic: Optimization techniques in triple stores for SPARQL query processing
Supervisor: Muhammad Saleem
Contact: Muhammad Saleem

This thesis will explore the different optimization techniques used in state-of-the-art triple stores, including data representation and storage, indexing, join ordering and query planning, and the different join implementations.

 

 

Analysis of the relative errors in cardinality-based SPARQL federation engines
Topic: Analysis of the relative errors in cardinality-based SPARQL federation engines
Supervisor: Muhammad Saleem
Contact: Muhammad Saleem

This thesis will investigate how good the query plans generated by a cost-based distributed SPARQL engine are in terms of relative error. The relative error is a performance measure that tells how accurate the estimated result size of a triple pattern, or of a join between triple patterns, is. More accurate estimates lead to better query execution times.
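
The text does not fix a formula for the relative error. The sketch below assumes the common definition |estimated − actual| / actual; the thesis may well settle on a different variant. The handling of an actual size of zero is likewise an assumption.

```python
def relative_error(estimated, actual):
    """Relative error of an estimated result size against the actual one.

    Assumed definition: |estimated - actual| / actual.
    If the actual size is 0, the error is 0.0 for a correct estimate of 0
    and infinite otherwise.
    """
    if actual == 0:
        return 0.0 if estimated == 0 else float("inf")
    return abs(estimated - actual) / actual


# Example: the engine estimated 150 results for a triple pattern that returns 100.
print(relative_error(150, 100))
```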

 

 

Blocked Thesis

Integration and Lifting of Question Answering Datasets (Bachelor)
Topic: Integration and Lifting of Question Answering Datasets
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

Currently, there are more than 30 datasets from over 20 years of research. These datasets come in different formats and forms, and their question-answer pairs can only be answered on specific underlying datasets.

In this thesis, the student will analyse the features of all these datasets and propose a solution to lift these benchmarks to 5-star data (http://5stardata.info/en/) and make them accessible. Answers will be grounded in knowledge bases via machine learning methods. Finally, the lifted datasets will be integrated into the renowned framework GERBIL QA (http://gerbil-qa.aksw.org/gerbil/).

Source Code: https://github.com/dice-group/NLIWOD/tree/master/qa.datasets

 

 

Procedural Question Answering (Master)
Topic: Procedural Question Answering
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

“How can I change a light bulb?” is just a simple question. However, this question is hard to answer for a computer. Humans at least can go to http://www.wikihow.com/Main-Page to find an answer. In this thesis, the student will develop and evaluate a semantic-based procedural QA system. The student can reuse various DICE algorithms to analyze the question and the underlying knowledge and data base.

Source Code: https://github.com/dice-group/NLIWOD/tree/master/qa.watodo

 

 

Systematic Survey on Dialogue Systems with a focus on Data Science and Structured Data (Bachelor/Master)
Topic: Systematic Survey on Dialogue Systems with a focus on Data Science and Structured Data
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

Chatbots are a hot and trending topic in research as well as in industry. Most chatbots, however, either behave like long if-then-else structures or are unpredictable and grammatically incorrect since they are based on machine learning.

In this thesis, the student will do a thorough and systematic literature review of the latest techniques and approaches (2012 to today). The survey should especially focus on the use of structured knowledge such as RDF, OWL, NoSQL or SQL databases to power dialogue systems and should find a precise definition for the terms chatbot and dialogue system.

 

 

German, Domain-Specific Relation Extraction based on Shallow Parsing (Master)
Topic: German, Domain-Specific Relation Extraction based on Shallow Parsing
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

For one of our latest projects, SOLIDE, we need to extract domain-specific data from small corpora of firefighter training material as well as firefighter questions. This domain is very small, precise, full of abbreviations and has not yet been modelled semantically.

Thus, the student will analyse existing material and develop ground truth data as well as an approach for shallow parsing of large quantities of unstructured texts and semi-structured tables. The final approach will be evaluated against the aforementioned ground truth data. A deep knowledge of the German language is not required but is a plus.

 

 

Machine-Learning based verbalization of RDF and SPARQL (Master)
Topic: Machine-Learning based verbalization of RDF and SPARQL
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

Humans are not proficient in understanding formal languages such as RDF, SPARQL or SPARQL result sets. Thus, recent frameworks have tried to verbalize this formal data using NLP and rules. The idea behind this work is to translate this data using the rich RDF descriptions as well as training data from the mapping between Wikipedia and DBpedia, using advanced machine learning techniques such as RNNs or other neural networks. The approach needs to be evaluated by human quality raters at the end of the thesis.

Source Code/Related Work: github.com/AKSW/SemWeb2NL

 

 

A scalable Keyword Search Engine over LOD-a-lot (Master)
Topic: A scalable Keyword Search Engine over LOD-a-lot
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

The Web of Data is full of knowledge but only a handful of approaches are able to search this web space with complex semantic queries such as “birthplace bill gates wife”. While the semantic search problem has attracted much research lately, most algorithms are not scalable enough to work with several billion facts. Thus, this thesis will develop and test a scalable semantic search engine with focus on efficient information retrieval for the smart semantic colour-spreading algorithm on top.

Data: http://lod-a-lot.lod.labs.vu.nl/about/

Source Code: https://github.com/dice-group/SESSA

 

 

A semantic Job Search Engine (full-cycle) (Bachelor/Master)
Topic: A semantic Job Search Engine (full-cycle)
Supervisor: Dr. Ricardo Usbeck
Contact: Dr. Ricardo Usbeck

The goal of this thesis is to apply crawling, extraction and search components from the DICE Group to build a search engine that is able to find all available job offerings by universities in Germany. However, the landing pages are very heterogeneous, and thus possible extraction approaches such as FOX, AGDISTIS, REX or TAIPAN will need adaptation in order to store high-quality data. The final system is subject to manual extraction evaluations.

Source Code (earlier version): github.com/falkmueller/unijobs_web

 

 

Further information:

Disclaimer

Source: http://bit.ly/2xdIdhs

For most theses, the required skills include good knowledge of Java or Python and the willingness to delve into exciting research. The development will be carried out using Git in a Scrum-like setting. Students will be provided with the opportunity to impact the whole of the Semantic Web and Data Science community. Furthermore, we will offer close supervision during the writing of your thesis. You can also have a look at our GitHub repositories (https://github.com/dice-group) or send us your ideas!
