Overview
Contents:
- Data types and their characteristics
- Common functions of data science infrastructures
- Storage, compute, and cloud infrastructures for data science
- Concept of a data lake
- Data pre-processing methods and selected tools
- Time series and graph data, the respective DBMS, and query languages
- Data analytics platforms
- Data presentation and visualisation
- Data science workflows and selected infrastructure components
Learning Outcome
Upon completion of the course, students
- understand the basic functions of data science infrastructures and their significance.
- understand basic data types and their specifics.
- understand the most important technical infrastructures for storing and processing data locally and in the cloud as
well as their advantages and disadvantages in relation to data science applications.
- can apply the concept of the data lake to basic data science problems.
- are able to apply the different steps of data pre-processing to selected data sets.
- can identify the characteristics of time series and graph data and are able to recall the functions of DBMSs designed
for their processing.
- can present the basic tasks of data analysis platforms and can describe them using examples.
- can apply methods and tools for the presentation and visualisation of data.
- can model basic data science workflows and are able to transfer their knowledge to basic data science projects.
Recommended Prerequisites
Python and basic database knowledge
Examination
In-class written exam (90 min) or oral exam (approx. 30 min)
Examination prerequisites:
Students must complete at least 50% of the homework exercises.
Examination requirements:
Through the examination students demonstrate that they are able to describe basic functions of (cloud-based) data
science infrastructures as well as to specify and identify basic data types. Students can also prove their understanding
of data lakes and can apply their knowledge of MapReduce and Hadoop in that particular context. They can analyse basic
data pre-processing problems and sketch common solutions. Students can show that they understand time series and graph
data as well as the corresponding DBMSs and that they can present common tasks of data analysis platforms. Through the
examination, students also demonstrate their ability to select appropriate methods for visualising data and show that
they are able to create basic data science workflows.