Differences

This shows you the differences between two versions of the page.

teaching:autumn_term_2021:hpda [2021-10-03 20:46]
Julian Kunkel [Agenda]
teaching:autumn_term_2021:hpda [2021-12-05 19:58] (current)
Julian Kunkel [Agenda]
Line 8: Line 8:
 High-Performance Computing and Big Data Analytics.

+**Note that the lecture will be given online. I will make a survey regarding the exercise and presumably offer hybrid attendance for the exercise.**
 ===== Key information =====

 || Contact || [[about:people:julian_kunkel|Julian Kunkel]] ||
 || Location || Virtual, [[https://meet.gwdg.de/b/jul-gpr-4ao-ndv|meeting room]] ||
-|| Time || Monday 16-18 (lecture), Monday 12-14 (lunch exercise!) ||
+|| Time || Monday 16:15-17:45 (lecture), Monday 12:15-13:45 (lunch exercise!) ||
 || Language || English ||
 || Module || Modul B.Inf.1712: Vertiefung Hochleistungsrechnen, Module M.Inf.1236: High-Performance Data Analytics  ||
Line 45: Line 46:
  
 ===== Learning Objectives =====
-  * Assign big data challenges to a given use-case
-  * Outline use-cases for high-performance data analytics
-  * Estimate performance and runtime for a given workload and system
-  * Create a suitable hardware configuration to execute a given workload within a deadline
-  * Construct suitable data models for a given use-case and discuss their pro/cons
-  * Discuss the rationales behind the design decisions behind our learned tools
-  * Describe the concept of visual analytics and its potential in scientific workflows
-  * Compare the features and architectures of NoSQL solutions to the abstract concept of a parallel file system
-  * Appraise the requirements for designing system architectures for systems storing and processing data
-  * Apply distributed algorithms and data structures to a given problem instance and illustrate their processing steps
-  * Explain the importance of hardware characteristics when executing a given workload
+
+  * Assign big data challenges to a given use-case
+  * Outline use-case examples for high-performance data analytics
+  * Estimate performance and runtime for a given workload and system
+  * Create a suitable hardware configuration to execute a given workload within a deadline
+  * Construct suitable data models for a given use-case and discuss their pro/cons
+  * Discuss the rationales behind the design decisions for the tools
+  * Describe the concept of visual analytics and its potential in scientific workflows
+  * Compare the features and architectures of NoSQL solutions to the abstract concept of a parallel file system
+  * Appraise the requirements for designing system architectures for systems storing and processing data
+  * Apply distributed algorithms and data structures to a given problem instance and illustrate their processing steps in pseudocode
+  * Explain the importance of hardware characteristics when executing a given workload
  
 ===== Examination =====

-Written (90 Min.) or oral (ca. 30 Min.)
+Written (90 Min.) or oral (ca. 30 Min.) -> depends on the number of attendees.

 See the learning objectives.
Line 66: Line 68:
 ===== Agenda =====

-  * 25.10.21 - **Lecture Overview. Use Cases.**
-    * Exercise: There is **no exercise** today!
-  * 01.11.21 - **System Architectures and Distributed Algorithms**
-    * Exercise: Discussion of use cases covering business/industry and science. Sketching the analytics pipeline for a use case.
-  * 08.11.21 - **Data Models and Data Processing Strategies**
-    * Exercise: Sketching system architectures and the execution of distributed algorithms.
-  * 15.11.21 - **Databases and Data Warehouses**
-    * Exercise: Developing data models for selected use cases. Sketching the processing pipeline
-  * 22.11.21 - **Distributed Processing (with Hadoop)**
+  * 25.10.21 - **Lecture Overview. Use Cases.** -- {{ :teaching:autumn_term_2021:hpda01-lecture-01.pdf |Slides}} -- {{ :teaching:autumn_term_2021:hpda01-01.pdf |Exercise}}
+    * Exercise: There is **no meeting** today!
+    * Exercise sheet 1 is due next week!
+    * Exercise topics: Discussion of use cases covering business/industry and science. Sketching the analytics pipeline for a use case.
+  * 01.11.21 - **Data Models and Data Processing Strategies** -- {{ :teaching:autumn_term_2021:hpda01-lecture-02.pdf |Slides}} -- {{ :teaching:autumn_term_2021:hpda01-02.pdf |Exercise}}
+    * Exercise: Developing data models for selected use cases. Researching performance for HPDA. Python Word-Count.
+  * 08.11.21 - **Databases and Data Warehouses** -- {{ :teaching:autumn_term_2021:hpda21-lecture-03.pdf |Slides}} -- {{ :teaching:autumn_term_2021:hpda21-03.pdf |Exercise}}
     * Exercise: Developing a database schema and SQL queries.
-  * 29.11.21 - **Designing Distributed Systems and Performance Modelling**
-    * Exercise: Data processing with Hadoop.
-  * 06.12.21 - **Dataflow Computation**
-    * Exercise: Performance analysis of scenarios. Analysing mappings of use cases to systems
-  * 13.12.21 - **Columnar Access and Document Storage**
-    * Exercise: Developing a dataflow system.
-  * 20.12.21 - **In-Memory Computation**
-    * Exercise: Processing data using HBASE and MongoDB
-  * 10.01.22 - **Stream Processing **
+  * 15.11.21 - **Distributed Storage and Processing with Hadoop** -- {{ :teaching:autumn_term_2021:hpda21-lecture-04.pdf |Slides}} -- {{ :teaching:autumn_term_2021:hpda21-04.pdf |Exercise}}
+    * Exercise: MapReduce processing with Python. Sketching the difference between SQL running via Hadoop (and Hive) vs. a traditional relational database vs. a data warehouse
+  * 22.11.21 - **Dataflow Computation and Big Data SQL using Hive** -- {{ :teaching:autumn_term_2021:hpda21-lecture-05-hive.pdf |Slides Hive}} -- {{ :teaching:autumn_term_2021:hpda21-lecture-05-dataflow.pdf |Slides Dataflow}} -- {{ :teaching:autumn_term_2021:hpda21-05.pdf |Exercise}}
+    * Exercise: MapReduce via Streaming in Hadoop. Developing a dataflow system in Python
+  * 29.11.21 - **Columnar Access and Document Storage** -- {{ :teaching:autumn_term_2021:hpda21-lecture-06.pdf |Slides}} -- {{ :teaching:autumn_term_2021:hpda21-07.pdf |Exercise}}
+    * Exercise: Managing data using MongoDB
+  * 06.12.21 - **In-Memory Computation** -- {{ :teaching:autumn_term_2021:hpda21-lecture-07.pdf |Slides}} -- {{ :teaching:autumn_term_2021:hpda21-07a.pdf |Exercise}}
     * Exercise: Data processing using Spark
-  * 17.01.22 - **Visual Analytics and Large-Scale Data Analysis**
+  * 13.12.21 - **Stream Processing ** -- Slides -- Exercise
+    * Exercise:
+  * 20.12.21 - **The Apache Ecosystem and Beyond** -- Slides -- Exercise -- //This slide deck is optional and not subject to examination//
+    * Exercise: None
+  * 10.01.22 - **Designing Distributed Systems and Performance Modelling** -- Slides -- Exercise
+    * During the exercise, we discuss any questions you may have.
+    * Exercise: Sketching system architectures and the execution of distributed algorithms. Performance analysis of scenarios. Analyzing mappings of use cases to systems.
+  * 17.01.22 - **Visual Analytics and Large-Scale Data Analysis** -- Slides -- Exercise
     * Exercise: Sketching stream workflows for use cases
-  * 24.01.22 - **Storage Systems in Cloud and HPC**
+  * 24.01.22 - **Storage Systems in Cloud and HPC** -- Slides -- Exercise
     * Exercise: Developing a visualization using GoJS
-  * 31.01.22 - **INVITED TALK** -- TBA
+  * 31.01.22 - **Use Cases in AeroSpace** (tentative title) -- Cornelia Grabe (DLR) -- Slides
     * Exercise: Performance analysis of storage solutions
   * 07.02.22 - **Summary**
     * Exercise: Q&A Session
  
+===== Links =====
+  * Example Scripts: https://github.com/JulianKunkel/hpda-samples
  
  • teaching/autumn_term_2021/hpda.1633286777.txt.gz
  • Last modified: 2021-10-03 20:46
  • by Julian Kunkel