This course was last conducted in 2018.
Contents
This course is an extension of the course Biostatistics II and will cover four major topics in modern medical statistics which go beyond classical regression models: modelling multivariate observations, analysing high-dimensional data, treating missing values, and fitting and validating predictive models.
The course is structured such that first an introduction to the topic is given, followed by an application session on the topic. The participants are invited to work on their own data. The following table gives an overview of the covered topics.
Structural equation models (SEM) (Jacob Hjelmborg)
Problem area: Observational studies: describing the interrelation of multiple measurements, estimation of causal relationships
Statistical methods: Structural equations, causal diagrams and path diagrams reflecting prior causal knowledge
Example: Imagine studying the relation between an exposure or treatment and an outcome. The immediate analysis is to invoke a multiple regression model relating the single outcome to the exposure, possibly adjusting for multiple potential confounders. This may be done for outcomes of different types (dichotomous, survival time, etc.), and you may even include random effects or latent variables to capture certain dependence structures. But there may be several outcomes of interest, and they may be related, that is, dependent on each other. The same may hold for the exposures. Then the setting is multivariate.
Structural equation modelling deals with this situation of having several related variables, e.g. several outcomes that are dependent on each other. This is done by specifying two structures: 1) the mean value structure, which governs the expected value of the outcomes conditional on exposures (fixed effects), and 2) the covariance structure of the variables, which governs the mutual relations between the outcomes. We may use structural equation models (SEMs) to
- Investigate mediation and moderation of several variables.
- Estimate and compare models across multiple groups of individuals.
- Investigate the properties of multiple-item scales.
- Model longitudinal data or repeated measurements.
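As a minimal sketch of the mediation use case listed above, the following Python example simulates a simple path model (exposure X, mediator M, outcome Y) and splits the total effect of X on Y into a direct and an indirect part. Everything in it is an assumption made for illustration: the data are simulated, the path coefficients (0.5, 0.3, 0.4) and variable names are hypothetical, and the paths are fitted by ordinary least squares on the sample covariances rather than by dedicated SEM software.

```python
import random

def cov(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

# Simulated data (hypothetical coefficients): X -> M -> Y and X -> Y
random.seed(1)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
m = [0.5 * xi + random.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.4 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]

sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
sxy, smy = cov(x, y), cov(m, y)

# Path a: regression of M on X
a = sxm / sxx

# Paths b (M -> Y) and direct (X -> Y): regression of Y on X and M,
# solved from the 2x2 normal equations
det = sxx * smm - sxm ** 2
direct = (sxy * smm - smy * sxm) / det
b = (smy * sxx - sxy * sxm) / det

indirect = a * b      # effect of X on Y transmitted through M
total = sxy / sxx     # regression of Y on X alone

print(f"a={a:.3f} b={b:.3f} direct={direct:.3f} "
      f"indirect={indirect:.3f} total={total:.3f}")
```

In this linear setting the total effect equals the sum of the direct and indirect effects. A full SEM analysis would in addition provide standard errors, model fit measures, and support for latent variables.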
Statistical learning with high dimensional data (Ulrich Halekoh)
Problem area: Automated data collection over time now makes it possible to collect massive amounts of data from patients. The same is true for images. In these ‘big data’ situations, one often has no mathematical model for the data but wants either to reveal interpretable structures or to obtain a diagnostic or prognostic classification of patients.
Statistical methods: Neural networks, support vector machines, biased (regularized) regression modelling
Example: For sixty lung cancer patients, images of the lung provide different aspects of the lung tissue density, which are supposed to characterize the present state and the future prognosis of the patients. Special interest lies in the development of a procedure to recommend patients for further treatment.
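To illustrate what a biased, regularized regression does when there are more variables than patients, here is a minimal sketch of ridge regression in plain Python. It is an assumption that ridge regression is a representative example of the methods meant above. The data are simulated (20 observations, 50 predictors), and the penalty values 0.1 and 10 are arbitrary choices for illustration.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge(X, y, lam):
    """Ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    Xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    return solve(XtX, Xty)

def norm(v):
    """Euclidean length of a coefficient vector."""
    return sum(b * b for b in v) ** 0.5

# Simulated high-dimensional data: more predictors (p) than observations (n)
random.seed(0)
n, p = 20, 50
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.0 * row[1] + random.gauss(0, 0.5) for row in X]

beta_weak = ridge(X, y, 0.1)     # weak penalty
beta_strong = ridge(X, y, 10.0)  # strong penalty shrinks the coefficients

print(f"coefficient norm, lam=0.1: {norm(beta_weak):.3f}")
print(f"coefficient norm, lam=10:  {norm(beta_strong):.3f}")
```

Ordinary least squares has no unique solution here because there are more predictors than observations. The penalty makes the problem solvable at the price of some bias, and a larger penalty shrinks the coefficients more strongly. In practice the penalty would be chosen by a data-driven procedure such as cross-validation.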
Clinical Prediction Models (Birgit Debrabant)
Problem area: Regression modelling for prediction purposes aims (in contrast to explanatory models) to evaluate the future performance of a treatment.
Statistical methods: Calibration of a model (how closely is the future outcome predicted?) is distinguished from discrimination (is the model good enough to classify patients into future success or failure groups?). We consider the validation of prediction models and methods to evaluate possible overfitting.
- Overall performance measures (R-squared, Brier score)
- Performance measures for discrimination (c statistic, AUC)
- Calibration slope
- Cross-Validation
- Bootstrap validation
- Prediction of disease prognosis (e.g. prediction of 30-day mortality after acute myocardial infarction).
- Prediction to target preventive interventions (e.g. models for hereditary breast cancer).
- Prediction for therapeutic decision-making (e.g. for replacement of risky heart valves).
Example: For sixty lung cancer patients, images of the lung provide different aspects of the lung tissue density, which are supposed to characterize the present state and the future prognosis of the patients. Special interest lies in the development of a procedure to recommend patients for further treatment.
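Two of the performance measures listed above can be sketched in a few lines of Python: the Brier score as an overall measure and the c statistic as a measure of discrimination. The eight outcomes and predicted risks below are made up for illustration, and the function names are hypothetical.

```python
def brier_score(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def c_statistic(y, p):
    """Proportion of (event, non-event) pairs in which the event has the
    higher predicted risk; tied predictions count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))

# Made-up data: observed outcome (1 = event) and predicted risk per patient
y = [0, 0, 1, 0, 1, 1, 0, 1]
p = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

brier = brier_score(y, p)
c = c_statistic(y, p)
print(f"Brier score: {brier:.4f}")  # lower is better
print(f"c statistic: {c:.4f}")      # 0.5 = no discrimination, 1 = perfect
```

For a binary outcome the c statistic equals the area under the ROC curve (AUC). Computed on the data used to fit the model, both measures tend to be too optimistic, which is why cross-validation or bootstrap validation is used to assess overfitting.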
Missing values (Pia Veldt Larsen)
Problem area: Measurements lost due to failure of the measurement process (e.g. failure of patient compliance) or to the inherent features of the measurement process (e.g. disruption of the measurement process because of the death of the patient).
Statistical methods: Multiple imputation and inverse probability weighting
Example: In a large study of female breast cancer patients exploring potential clinical risk factors associated with survival, a considerable number of clinical measurements are missing. That is, a standard complete case analysis including all clinical risk factors would leave out a large proportion of the observations, reducing the statistical power of the analysis materially. A further challenge may occur if the clinical measurements are not missing completely at random, but their missingness depends on the levels of the patient’s other clinical measurements. In such situations complete case analyses can lead to biased results or affect the representativeness of the results.
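The bias of a complete case analysis, and how inverse probability weighting can correct it, can be sketched with simulated data. This is a minimal illustration under stated assumptions, not the analysis of the study described above: a single, fully observed binary covariate x drives both the measurement y and the chance that y is observed, and all numbers (sample size, effect sizes, observation probabilities) are hypothetical.

```python
import random

random.seed(42)
n = 5000
rows = []
for _ in range(n):
    x = 1 if random.random() < 0.5 else 0    # fully observed covariate
    y = 10.0 + 5.0 * x + random.gauss(0, 1)  # measurement of interest
    p_obs = 0.9 if x == 0 else 0.4           # missingness depends on x only
    observed = random.random() < p_obs
    rows.append((x, y, observed))

# Known only because the data are simulated; unknown in a real study
true_mean = sum(y for _, y, _ in rows) / n

# Complete case analysis: ignore everyone with a missing measurement
cc = [y for _, y, obs in rows if obs]
cc_mean = sum(cc) / len(cc)

# Estimate the probability of being observed within each level of x
p_hat = {}
for level in (0, 1):
    group = [obs for x, _, obs in rows if x == level]
    p_hat[level] = sum(group) / len(group)

# Weight each observed patient by the inverse of that probability
num = sum(y / p_hat[x] for x, y, obs in rows if obs)
den = sum(1.0 / p_hat[x] for x, y, obs in rows if obs)
ipw_mean = num / den

print(f"true mean:          {true_mean:.2f}")
print(f"complete case mean: {cc_mean:.2f}")
print(f"IPW mean:           {ipw_mean:.2f}")
```

Patients with x = 1 have higher values of y but are observed less often, so the complete case mean is too low. Weighting each observed patient by the inverse of the estimated probability of being observed restores the balance between the two groups. With several covariates, these probabilities would typically come from a regression model for the missingness.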
Teaching arrangements
Lectures and practical exercises, possibly on a participant's own data
Course fee
The course is free of charge for PhD students enrolled in Universities that have joined the "Open market agreement".
For all other participants the course fee is DKK 3173,-