Data Analysis and Visualization in R
Module IN2339
This module handbook describes the contents, learning outcomes, methods, and examination type of the module, and links to current dates for courses and the module examination in the respective sections.
Basic Information
IN2339 is a semester module taught in English at Bachelor’s and Master’s level. It is offered in the winter semester.
This module is included in the following catalogues within the physics study programs:
- Focus Area Bio-Sensors in M.Sc. Biomedical Engineering and Medical Physics
Total workload | Contact hours | Credits (ECTS) |
---|---|---|
180 h | 90 h | 6 CP |
Content, Learning Outcome and Preconditions
Content
R programming basics 1
R programming basics 2 (including report generation with R markdown)
Data importing
Cleaning and organizing data: Tidy data 1
Cleaning and organizing data: Tidy data 2
Base plot
Grammar of graphics 1
Grammar of graphics 2
Unsupervised learning (hierarchical clustering, k-means, PCA)
Case study I
Drawing robust interpretations 1: empirical testing by sampling
Drawing robust interpretations 2: classical statistical tests
Supervised learning 1: regression, cross-validation
Supervised learning 2: classification, ROC curve, precision, recall
Case study II
Learning Outcome
At the end of the module students are able to:
- 1. produce scripts that automatically generate a data analysis report
- 2. import data from various sources into R
- 3. apply the concepts of tidy data to clean and organize a dataset
- 4. decide which plot is appropriate for a given question about the data
- 5. generate such plots
- 6. know the methods of hierarchical clustering, k-means, PCA
- 7. apply the above methods and interpret their outcome on real-life datasets
- 8. know the concept of statistical testing
- 9. devise and implement resampling procedures to assess statistical significance
- 10. know the conditions of application of the following statistical tests and how to perform them in R: Fisher test, Wilcoxon test, t-test
- 11. know the concepts of regression and classification
- 12. apply regression and classification algorithms in R
- 13. know the concepts of generalization error and cross-validation
- 14. implement a cross-validation scheme in R
- 15. know the concepts of sensitivity, specificity, ROC curves
- 16. assess these in R
Preconditions
No information provided.
Courses, Learning and Teaching Methods and Literature
Courses and Schedule
Type | SWS | Title | Lecturer(s) | Dates | Links |
---|---|---|---|---|---|
VO | 2 | Data Analysis and Visualization in R (IN2339) | Gagneur, J. | Tue, 14:00–16:00, virtual | eLearning |
UE | 4 | Exercise Data Analysis and Visualization in R (IN2339) | Gagneur, J. | Tue, 14:00–16:00, Interims II 004, and dates in groups | |
Learning and Teaching Methods
The lectures introduce the concepts, and the programming exercises apply these concepts to data. The goal of each exercise is to generate a report document.
Media
Exercises posted online weekly, slides, live demos
Literature
An Introduction to Statistical Learning with Applications in R, http://www-bcf.usc.edu/~gareth/ISL/
R for Data Science, by Garrett Grolemund and Hadley Wickham
Module Exam
Description of exams and course work
Written exam and project work:
The achievements listed under Learning Outcome are assessed in one written exam of 90 minutes. In addition, there are two case studies, in which students must submit the source code that generates the report of an analysis of a given dataset. Together, these analyses cover all topics stated under Learning Outcome: the first case study covers topics 1-7, and the second covers topics 8-16. The final mark is the exam mark plus bonus points for the two case studies.
Exam Repetition
The exam may be repeated at the end of the semester.