The module builds on the Bachelor's module Databases (INF-BSc-P09), which introduced operational (relational) databases for storing and analyzing regular (small) data sets. Analyzing big data poses further challenges regarding the volume of data, its dimensionality, the heterogeneity of models and schemas, and the distribution of data across several databases. This module presents various database technologies that address these challenges.
The topics in detail are:
- Data warehouses
  - Multidimensional data model
  - Design of data warehouses with the multidimensional data model
  - Relational storage (star schema, snowflake schema, full-fact, galaxies)
  - Multidimensional storage
  - OLAP queries, SQL extensions for warehouses
  - Multidimensional queries
  - ETL (Extraction, Transformation, Load)
- Column stores and main memory databases
  - Technical features and implementation
  - Application fields and scenarios
- Data lakes and data lakehouses
  - Data sources
  - Properties of data
  - NoSQL data models such as document stores, key-value stores and graph databases
  - Features of lakehouses
- Data integration
  - Schema and data integration
  - Mapping tools
- Selected data analytics methods, their implementation on a database or data warehouse, and their use for database tasks
  - Association rules
  - Clustering methods
  - Classification methods
  - Parallel data mining algorithms with MapReduce
- Current research topics in data engineering, data integration, and the storage and utilization of multidimensional data

Course language | Offered | Hours per week | ECTS | Examination |
---|---|---|---|---|
German | Winter semester | 2V+2? | 6 | 90-minute written exam |