Explorative Data Analysis of the Lean European Open Survey on SARS-CoV‑2 infected patients (LEOSS) Public Data Set

Welcome! The following statistics provide some visual insights into the LEOSS Public Data Set. The Public Data Set consists of patient data from the LEOSS cohort after data cleaning and includes data from patients documented until December 17, 2020. The LEOSS Public Data Set originates from the LEOSS Initiative. The data preprocessing pipeline is described by Jakob et al. in "Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19".

Copyright: This work is licensed under the Creative Commons Attribution Non-Commercial 4.0 License. With the use of this data you agree to include a proper acknowledgement of the LEOSS study group in any work based on the data set. By working with this notebook you agree to maintain the confidentiality of the data set at all times and to not attempt to compromise or otherwise violate the privacy of the patients described. To view a copy of the license, visit https://creativecommons.org/licenses/by-nc/4.0/.

Acknowledgements: These analyses are based on voluntary work by PROCON IT, which we are truly grateful for.

If you have any comments on the notebook, please drop us a message at analysis@leoss.net.

Data Set Structure

Here we provide information on the basic structure of the LEOSS Public Data Set.

The data set consists of 4802 patients and 16 variables. A row represents anonymized data of a single patient.
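Since each row is one patient and each column one variable, the data set maps naturally onto a list of records. A minimal sketch using only Python's standard library, assuming the data set is available as a CSV file with a header row of variable names (the file format and the helper name `load_leoss` are assumptions, not part of the original notebook):

```python
import csv
import io


def load_leoss(source):
    """Read a CSV file object into one dict per patient row.

    Assumes a comma-separated file whose header row holds the variable
    names; returns the rows and the list of variable names.
    """
    reader = csv.DictReader(source)
    rows = list(reader)
    return rows, reader.fieldnames


# Hypothetical three-column excerpt, used only to illustrate the structure.
sample = io.StringIO(
    "Age.at.diagnosis,Sex,Last.known.patient.status\n"
    "26 - 45 years,Male,Recovered\n"
    "66 - 85 years,Female,Dead from COVID-19\n"
)
rows, variables = load_leoss(sample)
print(len(rows), len(variables))  # 2 3
```

For the full file, `len(rows)` and `len(variables)` would be expected to give 4802 and 16.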

The columns are described by the variables:

  • Age.at.diagnosis (categorical): age group
  • Sex (categorical): sex
  • Month.first.diagnosis (categorical): month of diagnosis
  • Year.first.diagnosis (categorical): year of diagnosis
  • Uncomplicated.phase (Boolean): true if patient was in the Uncomplicated Phase*
  • Complicated.phase (Boolean): true if patient was in the Complicated Phase*
  • Critical.phase (Boolean): true if patient was in the Critical Phase*
  • Recovery.phase (Boolean): true if patient was in the Recovery Phase*
  • Last.known.patient.status (categorical): health status at the end of medical consultation
  • Vasopressors.in.complicated.phase (categorical): true if catecholamines were administered in the Complicated Phase*
  • Vasopressors.in.critical.phase (categorical): true if adrenaline or other catecholamines were administered in the Critical Phase*
  • Invasive.ventilation.in.critical.phase (categorical): true if invasive ventilation was necessary in the Critical Phase*
  • Superinfection.in.uncomplicated.phase (categorical): details on superinfections in the Uncomplicated Phase*
  • Superinfection.in.complicated.phase (categorical): details on superinfections in the Complicated Phase*
  • Superinfection.in.critical.phase (categorical): details on superinfections in the Critical Phase*
  • Symptoms.in.recovery.phase (categorical): true if symptoms occurred in the Recovery Phase*

*The clinical phases are defined according to the LEOSS criteria (https://leoss.net/statistics/):

Uncomplicated Phase:

  • asymptomatic

OR

  • symptoms of upper respiratory tract infection
  • nausea, emesis, diarrhea
  • fever

Complicated Phase:

  • need for new oxygen supplementation
  • clinically meaningful increase of prior oxygen home therapy
  • PaO2 at room air < 70 mmHg
  • SO2 at room air < 90 %
  • AST or ALT > 5x ULN
  • new cardiac arrhythmia
  • new pericardial effusion > 1 cm
  • new heart failure with pulmonary edema, congestive hepatopathy or peripheral edema

Critical Phase:

  • need for catecholamines
  • life-threatening cardiac arrhythmia
  • need for unplanned mechanical ventilation (invasive or non-invasive)
  • prolongation (>24h) of planned mechanical ventilation
  • liver failure with Quick < 50 % or INR > 3.5
  • qSOFA >= 2
  • acute renal failure in need of dialysis

Recovery Phase:

  • improvement by one degree of severity according to this scheme or discharge from hospital

AND

  • defervescence

AND

  • no further progression or re-hospitalization

To get to know the Public Data Set better, the values each variable takes in this data set are listed below. Please be aware that the Public Data Set is only a part of the complete LEOSS data set. Anonymization may lead to variables taking fewer values than in the complete LEOSS data set. For example, the variable 'Sex' can also take the value 'Diverse', but no patient in the Public Data Set has this value.

Age.at.diagnosis:
<= 25 years, 26 - 45 years, 46 - 65 years, 66 - 85 years, > 85 years

Sex:
Female, Male

Month.first.diagnosis:
<= 3, 4, 5, 6, 7, 8, 9, 10, 11

Year.first.diagnosis:
2020

Uncomplicated.phase:
no, yes

Complicated.phase:
no, yes

Critical.phase:
no, yes

Recovery.phase:
no, yes

Vasopressors.in.complicated.phase:
n/a, no, unknown/missing, yes

Vasopressors.in.critical.phase:
n/a, no, unknown/missing, yes

Invasive.ventilation.in.critical.phase:
n/a, no, unknown/missing, yes

Superinfection.in.uncomplicated.phase:
bacterial, bacterial&fungal, fungal, n/a, none, unknown/missing

Superinfection.in.complicated.phase:
bacterial, bacterial&fungal, fungal, n/a, none, unknown/missing

Superinfection.in.critical.phase:
bacterial, bacterial&fungal, fungal, n/a, none, unknown/missing

Symptoms.in.recovery.phase:
n/a, no, unknown/missing, yes

Last.known.patient.status:
Dead from COVID-19, Dead from other causes, Not recovered (means recovery phase not achieved), Recovered, unknown/missing

n/a: If a patient was never in the phase a variable refers to, the variable has the value 'n/a' (not applicable). For example, if a patient has never been in the Critical Phase, 'Vasopressors.in.critical.phase' does not apply to this patient and is set to 'n/a'.
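The 'n/a' rule can be expressed as a small transformation on a patient record. This is a sketch, assuming each phase-specific variable depends on exactly the phase named in its variable name; the helper `apply_not_applicable` is hypothetical and not necessarily how the anonymization pipeline implements the rule.

```python
# Each phase-specific variable mapped to the phase flag it depends on
# (derived from the variable names listed above).
PHASE_OF_VARIABLE = {
    "Vasopressors.in.complicated.phase": "Complicated.phase",
    "Vasopressors.in.critical.phase": "Critical.phase",
    "Invasive.ventilation.in.critical.phase": "Critical.phase",
    "Superinfection.in.uncomplicated.phase": "Uncomplicated.phase",
    "Superinfection.in.complicated.phase": "Complicated.phase",
    "Superinfection.in.critical.phase": "Critical.phase",
    "Symptoms.in.recovery.phase": "Recovery.phase",
}


def apply_not_applicable(patient):
    """Return a copy of the record in which every phase-specific variable
    is set to 'n/a' when the patient was never in the corresponding phase."""
    result = dict(patient)
    for variable, phase in PHASE_OF_VARIABLE.items():
        if result.get(phase) == "no":
            result[variable] = "n/a"
    return result


patient = {
    "Complicated.phase": "yes",
    "Critical.phase": "no",
    "Vasopressors.in.complicated.phase": "no",
    "Vasopressors.in.critical.phase": "unknown/missing",
}
print(apply_not_applicable(patient)["Vasopressors.in.critical.phase"])  # n/a
```

Working on a copy keeps the original record unchanged, so raw and transformed values can be compared.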

These are the first 50 patients in the Public Data Set:

Age.at.diagnosis Sex Month.first.diagnosis Year.first.diagnosis Uncomplicated.phase Complicated.phase Critical.phase Recovery.phase Vasopressors.in.complicated.phase Vasopressors.in.critical.phase Invasive.ventilation.in.critical.phase Superinfection.in.uncomplicated.phase Superinfection.in.complicated.phase Superinfection.in.critical.phase Symptoms.in.recovery.phase Last.known.patient.status
0 26 - 45 years Male <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
1 66 - 85 years Male <= 3 2020 yes yes no yes no n/a n/a none none n/a no Recovered
2 26 - 45 years Male <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a unknown/missing
3 26 - 45 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
4 46 - 65 years Male <= 3 2020 no yes yes yes no yes yes n/a bacterial bacterial yes Recovered
5 46 - 65 years Male <= 3 2020 yes yes no yes no n/a n/a bacterial&fungal bacterial&fungal n/a no Recovered
6 26 - 45 years Male <= 3 2020 yes no no no n/a n/a n/a unknown/missing n/a n/a n/a Recovered
7 46 - 65 years Female <= 3 2020 yes yes no yes no n/a n/a none none n/a yes Recovered
8 <= 25 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
9 46 - 65 years Male <= 3 2020 yes no no yes n/a n/a n/a none n/a n/a no Recovered
10 26 - 45 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
11 46 - 65 years Female <= 3 2020 yes no no no n/a n/a n/a unknown/missing n/a n/a n/a Recovered
12 26 - 45 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
13 46 - 65 years Male <= 3 2020 yes no no no n/a n/a n/a unknown/missing n/a n/a n/a Not recovered (means recovery phase not achieved)
14 46 - 65 years Male <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
15 26 - 45 years Male <= 3 2020 yes yes no yes no n/a n/a bacterial bacterial n/a no Recovered
16 66 - 85 years Female <= 3 2020 yes no no yes n/a n/a n/a none n/a n/a no Recovered
17 46 - 65 years Male <= 3 2020 yes no no no n/a n/a n/a unknown/missing n/a n/a n/a Not recovered (means recovery phase not achieved)
18 46 - 65 years Female <= 3 2020 yes no no no n/a n/a n/a unknown/missing n/a n/a n/a Not recovered (means recovery phase not achieved)
19 46 - 65 years Female <= 3 2020 yes yes no no no n/a n/a none none n/a n/a Recovered
20 46 - 65 years Male <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
21 26 - 45 years Male <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
22 46 - 65 years Male <= 3 2020 yes no yes no n/a yes yes none n/a bacterial n/a Recovered
23 46 - 65 years Male <= 3 2020 yes yes yes no yes yes yes none none none n/a Recovered
24 46 - 65 years Male <= 3 2020 yes no yes no n/a yes yes none n/a none n/a Recovered
25 46 - 65 years Male <= 3 2020 no yes yes no yes yes yes n/a none none n/a Recovered
26 26 - 45 years Male <= 3 2020 yes yes no no no n/a n/a none none n/a n/a Recovered
27 66 - 85 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
28 66 - 85 years Female <= 3 2020 yes yes yes no no yes yes none none bacterial n/a Dead from COVID-19
29 46 - 65 years Male <= 3 2020 yes no no yes n/a n/a n/a none n/a n/a no Recovered
30 > 85 years Female <= 3 2020 no yes no no no n/a n/a n/a none n/a n/a Recovered
31 66 - 85 years Female <= 3 2020 yes yes no yes no n/a n/a none none n/a no Recovered
32 26 - 45 years Female <= 3 2020 yes yes no yes no n/a n/a none none n/a no Recovered
33 46 - 65 years Male <= 3 2020 yes yes no yes no n/a n/a none none n/a yes Recovered
34 46 - 65 years Male <= 3 2020 yes yes no yes unknown/missing n/a n/a none none n/a no Recovered
35 46 - 65 years Male <= 3 2020 yes no no yes n/a n/a n/a none n/a n/a no Recovered
36 > 85 years Male <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Not recovered (means recovery phase not achieved)
37 46 - 65 years Male <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
38 46 - 65 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Not recovered (means recovery phase not achieved)
39 26 - 45 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Not recovered (means recovery phase not achieved)
40 <= 25 years Male <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
41 46 - 65 years Male <= 3 2020 yes yes yes no no yes yes none none unknown/missing n/a Not recovered (means recovery phase not achieved)
42 66 - 85 years Male <= 3 2020 no yes no no unknown/missing n/a n/a n/a unknown/missing n/a n/a Not recovered (means recovery phase not achieved)
43 66 - 85 years Male <= 3 2020 yes no no yes n/a n/a n/a none n/a n/a no Recovered
44 46 - 65 years Male <= 3 2020 yes yes no yes no n/a n/a none none n/a yes Recovered
45 66 - 85 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
46 46 - 65 years Female <= 3 2020 yes no no no n/a n/a n/a none n/a n/a n/a Recovered
47 26 - 45 years Female <= 3 2020 yes yes no yes no n/a n/a bacterial bacterial n/a no Recovered
48 66 - 85 years Male <= 3 2020 yes yes no yes no n/a n/a unknown/missing bacterial n/a no Recovered
49 26 - 45 years Female <= 3 2020 yes yes no yes no n/a n/a unknown/missing none n/a no Recovered

1. Descriptive Analysis

The following descriptive statistics are computed in this section:

  • Age Distribution
  • Sex Distribution
  • Age Distribution by Sex

The total number of patients is 4802.
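The age distribution, the sex distribution and the age distribution by sex are frequency counts over the patient records. A minimal standard-library sketch, assuming records are dicts keyed by variable name (the helper name and the toy records are hypothetical):

```python
from collections import Counter

# Age groups in their natural order, as listed for Age.at.diagnosis.
AGE_ORDER = ["<= 25 years", "26 - 45 years", "46 - 65 years",
             "66 - 85 years", "> 85 years"]


def age_distribution_by_sex(patients):
    """Count patients per age group, separately for each sex."""
    counts = Counter((p["Sex"], p["Age.at.diagnosis"]) for p in patients)
    sexes = sorted({p["Sex"] for p in patients})
    return {sex: [counts[(sex, age)] for age in AGE_ORDER] for sex in sexes}


patients = [
    {"Sex": "Male", "Age.at.diagnosis": "46 - 65 years"},
    {"Sex": "Male", "Age.at.diagnosis": "46 - 65 years"},
    {"Sex": "Female", "Age.at.diagnosis": "66 - 85 years"},
]
print(age_distribution_by_sex(patients))
# {'Female': [0, 0, 0, 1, 0], 'Male': [0, 0, 2, 0, 0]}
```

Summing the lists element-wise over both sexes gives the overall age distribution; summing each list gives the sex distribution.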


2. COVID-19 Mortality and Recovery Rates

The following descriptive statistics on the health status at the end of medical consultation are computed in this section:

  • Frequency of Health Status at the End of Medical Consultation
  • COVID-19 Mortality and Recovery Rates
  • COVID-19 Mortality vs Recovery by Age
  • COVID-19 Mortality vs Recovery by Sex
  • Crosstable Mortality vs Recovery vs Sex vs Age

Note that we will use a filtered data set for computing the rates, which we describe below.

Frequency of Health Status at the End of Medical Consultation

Last.known.patient.status
Recovered 3695
Dead from COVID-19 590
Not recovered (means recovery phase not achieved) 395
Dead from other causes 96
unknown/missing 26

The total number of patients is 4802.


For the remainder of Section 2, we proceed with a filtered data set.

For the COVID-19 mortality and recovery rate computations, we exclude patients whose documented health status at the end of medical consultation is 'unknown/missing', 'not recovered', or 'dead from other causes'. Note that this exclusion affects all of the following computations and plots in this section.

The number of patients in the filtered data set is 4285.

Frequency of Health Status at the End of Medical Consultation in the Filtered Data Set

Last.known.patient.status
Recovered 3695
Dead from COVID-19 590
unknown/missing 0
Dead from other causes 0
Not recovered (means recovery phase not achieved) 0

COVID-19 Overall Mortality and Recovery Rates for the Filtered Data Set:

COVID-19 Overall Mortality Rate: 0.1377 (590 of 4285 patients)
COVID-19 Overall Recovery Rate: 0.8623 (3695 of 4285 patients)
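The exclusion rule and the overall rates can be reproduced from the status counts in the frequency table at the beginning of this section. A minimal sketch (the helper `overall_rates` is hypothetical, not the notebook's own code):

```python
# Counts of Last.known.patient.status in the full data set (frequency table above).
STATUS_COUNTS = {
    "Recovered": 3695,
    "Dead from COVID-19": 590,
    "Not recovered (means recovery phase not achieved)": 395,
    "Dead from other causes": 96,
    "unknown/missing": 26,
}

# Statuses excluded from the rate computations.
EXCLUDED = {
    "unknown/missing",
    "Not recovered (means recovery phase not achieved)",
    "Dead from other causes",
}


def overall_rates(counts):
    """Return (size of filtered set, mortality rate, recovery rate)."""
    kept = {status: n for status, n in counts.items() if status not in EXCLUDED}
    total = sum(kept.values())
    return total, kept["Dead from COVID-19"] / total, kept["Recovered"] / total


total, mortality, recovery = overall_rates(STATUS_COUNTS)
print(total, round(mortality, 4), round(recovery, 4))  # 4285 0.1377 0.8623
```

Because only two statuses remain after filtering, the two rates always sum to one.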

COVID-19 Mortality Rate by Age for the Filtered Data Set:

Dead from COVID-19
Age.at.diagnosis
<= 25 years 0.000000
26 - 45 years 0.010610
46 - 65 years 0.060920
66 - 85 years 0.238768
> 85 years 0.418773

COVID-19 Recovery Rate by Age for the Filtered Data Set:

Recovered
Age.at.diagnosis
<= 25 years 1.000000
26 - 45 years 0.989390
46 - 65 years 0.939080
66 - 85 years 0.761232
> 85 years 0.581227
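Within each age group, a rate is the number of patients with the given status divided by the number of patients in that group. Because the filtered set only contains 'Recovered' and 'Dead from COVID-19', the recovery rate of each age group is one minus its mortality rate, as the two tables above show (for example 0.238768 + 0.761232 = 1). A sketch with a hypothetical helper and toy records:

```python
from collections import Counter


def rates_by_group(records, group_key, outcome):
    """Share of records with `outcome` as last known status, per group."""
    totals = Counter(r[group_key] for r in records)
    hits = Counter(r[group_key] for r in records
                   if r["Last.known.patient.status"] == outcome)
    return {group: hits[group] / n for group, n in totals.items()}


# Toy filtered records; real ones would come from the filtered data set.
records = [
    {"Age.at.diagnosis": "66 - 85 years", "Last.known.patient.status": "Dead from COVID-19"},
    {"Age.at.diagnosis": "66 - 85 years", "Last.known.patient.status": "Recovered"},
    {"Age.at.diagnosis": "66 - 85 years", "Last.known.patient.status": "Recovered"},
    {"Age.at.diagnosis": "66 - 85 years", "Last.known.patient.status": "Recovered"},
    {"Age.at.diagnosis": "26 - 45 years", "Last.known.patient.status": "Recovered"},
]
mortality = rates_by_group(records, "Age.at.diagnosis", "Dead from COVID-19")
recovery = rates_by_group(records, "Age.at.diagnosis", "Recovered")
print(mortality)  # {'66 - 85 years': 0.25, '26 - 45 years': 0.0}
print(recovery)   # {'66 - 85 years': 0.75, '26 - 45 years': 1.0}
```

Passing `"Sex"` as `group_key` gives the rates by sex in the same way.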

The number of patients in the filtered data set is 4285. Patients whose documented health status at the end of medical consultation is 'unknown/missing', 'not recovered', or 'dead from other causes' are excluded from the filtered data set.


Crosstable Mortality vs Recovery vs Sex vs Age (percentage of all 4285 patients in the filtered data set falling into each subgroup)

                           Last.known.patient.status
Sex     Age.at.diagnosis   Recovered   Dead from COVID-19      All
Female  <= 25 years             2.12                 0.00     2.12
Female  26 - 45 years           7.77                 0.05     7.82
Female  46 - 65 years          12.46                 0.35    12.81
Female  66 - 85 years          12.37                 2.45    14.82
Female  > 85 years              2.54                 1.40     3.94
Male    <= 25 years             1.45                 0.00     1.45
Male    26 - 45 years           9.64                 0.14     9.78
Male    46 - 65 years          21.35                 1.84    23.20
Male    66 - 85 years          15.31                 6.23    21.54
Male    > 85 years              1.21                 1.31     2.52
All                            86.23                13.77   100.00
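Each cell of this crosstable is the share of the 4285 filtered patients in one (sex, age group, status) combination, and the 'All' row and column hold the corresponding marginal shares. A sketch of that normalization, assuming records are dicts (the helper `crosstab_percent` and the toy records are hypothetical):

```python
from collections import Counter


def crosstab_percent(records, row_keys, col_key):
    """Percent of all records per (row, column) cell, per row and per column."""
    n = len(records)

    def pct(count):
        return round(100 * count / n, 2)

    def row_of(record):
        return tuple(record[k] for k in row_keys)

    cells = Counter((row_of(r), r[col_key]) for r in records)
    row_totals = Counter(row_of(r) for r in records)
    col_totals = Counter(r[col_key] for r in records)
    return (
        {cell: pct(c) for cell, c in cells.items()},
        {row: pct(c) for row, c in row_totals.items()},  # the 'All' column
        {col: pct(c) for col, c in col_totals.items()},  # the 'All' row
    )


records = [
    {"Sex": "Female", "Age.at.diagnosis": "66 - 85 years", "Last.known.patient.status": "Recovered"},
    {"Sex": "Female", "Age.at.diagnosis": "66 - 85 years", "Last.known.patient.status": "Dead from COVID-19"},
    {"Sex": "Male", "Age.at.diagnosis": "46 - 65 years", "Last.known.patient.status": "Recovered"},
    {"Sex": "Male", "Age.at.diagnosis": "46 - 65 years", "Last.known.patient.status": "Recovered"},
]
cells, all_column, all_row = crosstab_percent(
    records, ["Sex", "Age.at.diagnosis"], "Last.known.patient.status")
print(all_row)  # {'Recovered': 75.0, 'Dead from COVID-19': 25.0}
```

If pandas is available, `pandas.crosstab` with `margins=True` and `normalize="all"` returns fractions of the grand total that, multiplied by 100, correspond to this kind of table.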

3. Clinical Phases

From here on, we abbreviate the four clinical phases as follows:

  • Uncomplicated Phase -> UC
  • Complicated Phase -> CO
  • Critical Phase -> CR
  • Recovery Phase -> RC

In the following, we plot:

  • Frequency of Phases
  • Frequency of Disease Courses for Patients in the Uncomplicated Phase at Baseline
  • Frequency of Disease Courses by Age for Patients in the Uncomplicated Phase at Baseline
  • Frequency of Disease Courses by Sex for Patients in the Uncomplicated Phase at Baseline
  • Frequency of Disease Courses for Male Patients in the Uncomplicated Phase at Baseline
  • Frequency of Disease Courses for Female Patients in the Uncomplicated Phase at Baseline
  • Frequency of Disease Courses for Patients in the Complicated Phase at Baseline
  • Frequency of Disease Courses by Age for Patients in the Complicated Phase at Baseline
  • Frequency of Disease Courses by Sex for Patients in the Complicated Phase at Baseline

Baseline (diagnosis) is defined as the day on which the sample yielding the first positive SARS-CoV-2 result was taken.

Disease courses are denoted by joining the phase abbreviations above with underscores. For example, 'UC_RC' denotes a patient who was in the Uncomplicated Phase at Baseline and then reached the Recovery Phase without severe disease progression (i.e., without entering the Complicated or Critical Phase).
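A disease course label can be derived from the four phase flags. This sketch assumes the label simply joins, in clinical order, the abbreviations of all phases flagged 'yes'; the notebook's exact construction is not shown, and the helper `disease_course` is hypothetical.

```python
# Phase flags in clinical order with their abbreviations.
PHASES = [
    ("Uncomplicated.phase", "UC"),
    ("Complicated.phase", "CO"),
    ("Critical.phase", "CR"),
    ("Recovery.phase", "RC"),
]


def disease_course(patient):
    """Join the abbreviations of all phases flagged 'yes' with underscores.

    Returns an empty string if no phase is documented.
    """
    return "_".join(abbr for column, abbr in PHASES if patient[column] == "yes")


# Patient 4 of the excerpt above: no UC, then CO, CR and RC.
patient = {"Uncomplicated.phase": "no", "Complicated.phase": "yes",
           "Critical.phase": "yes", "Recovery.phase": "yes"}
print(disease_course(patient))  # CO_CR_RC
```

An empty label marks a patient without any documented phase, which is exactly the case that has to be filtered out before counting courses.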

Since some patients may have no phase documented at all, we proceed with a filtered data set from which these patients are dropped.

The number of patients in this filtered data set is 4797, i.e. 5 of the 4802 patients were dropped.

Note that the phase frequencies add up to more than the total number of patients, since each patient can pass through several phases during the course of the disease.
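Dropping patients without any documented phase and counting phase frequencies can be sketched as follows (toy records, hypothetical helper). Because a patient is counted once for every phase flagged 'yes', the frequencies sum to more than the number of patients.

```python
from collections import Counter

PHASE_COLUMNS = ["Uncomplicated.phase", "Complicated.phase",
                 "Critical.phase", "Recovery.phase"]


def phase_frequencies(patients):
    """Drop patients with no phase flagged 'yes', then count each phase."""
    documented = [p for p in patients
                  if any(p[c] == "yes" for c in PHASE_COLUMNS)]
    freq = Counter(c for p in documented for c in PHASE_COLUMNS if p[c] == "yes")
    return len(documented), freq


patients = [
    dict(zip(PHASE_COLUMNS, ["yes", "no", "no", "yes"])),  # UC_RC
    dict(zip(PHASE_COLUMNS, ["no", "yes", "yes", "no"])),  # CO_CR
    dict(zip(PHASE_COLUMNS, ["no", "no", "no", "no"])),    # no phase documented
]
n, freq = phase_frequencies(patients)
print(n, sum(freq.values()))  # 2 4
```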

Disease Courses for Patients in the Uncomplicated Phase at Baseline

These disease courses are indicated by labels starting with 'UC_'.

Of the 4797 patients in the filtered data set, 4102 were in the Uncomplicated Phase at Baseline: 2357 of the 2786 male patients and 1745 of the 2011 female patients.

Disease Courses for Patients in the Complicated Phase at Baseline

These disease courses are indicated by labels starting with 'CO_'.

Of the 4797 patients in the filtered data set, 536 were in the Complicated Phase at Baseline: 326 of the 2786 male patients and 210 of the 2011 female patients.
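Counting patients by their phase at Baseline, overall and per sex, can be sketched as below. It assumes the Baseline phase is the first phase flagged 'yes' in clinical order, which is consistent with course labels starting with 'UC_' or 'CO_' but is not stated explicitly in the source; the helpers `baseline_phase` and `baseline_counts_by_sex` are hypothetical.

```python
from collections import Counter

PHASE_ORDER = ["Uncomplicated.phase", "Complicated.phase",
               "Critical.phase", "Recovery.phase"]


def baseline_phase(patient):
    """First phase flagged 'yes' in clinical order, or None if there is none."""
    return next((c for c in PHASE_ORDER if patient[c] == "yes"), None)


def baseline_counts_by_sex(patients, phase):
    """Patients in `phase` at Baseline per sex, and all patients per sex."""
    at_baseline = Counter(p["Sex"] for p in patients if baseline_phase(p) == phase)
    totals = Counter(p["Sex"] for p in patients)
    return at_baseline, totals


patients = [
    {"Sex": "Male", **dict(zip(PHASE_ORDER, ["yes", "yes", "no", "yes"]))},
    {"Sex": "Male", **dict(zip(PHASE_ORDER, ["no", "yes", "yes", "no"]))},
    {"Sex": "Female", **dict(zip(PHASE_ORDER, ["yes", "no", "no", "no"]))},
]
at_baseline, totals = baseline_counts_by_sex(patients, "Uncomplicated.phase")
print(dict(at_baseline), dict(totals))  # {'Male': 1, 'Female': 1} {'Male': 2, 'Female': 1}
```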

Crosstable for the Phases (percentage of the 4797 patients in the filtered data set falling into each combination of phases)

Critical Phase                              no      no     yes     yes
Recovery Phase                              no     yes      no     yes   Total
Uncomplicated Phase  Complicated Phase
no                   no                   0.00    0.06    2.15    1.10    3.31
no                   yes                  2.08    5.27    2.44    1.38   11.17
yes                  no                   6.00   41.40    1.88    1.88   51.16
yes                  yes                  4.48   20.35    4.02    5.50   34.35
Total                                    12.57   67.08   10.49    9.86  100.00

4. Superinfections

In LEOSS, superinfections are recorded as 'Proven bacterial infection', 'Probable or suspected bacterial infection', 'Proven fungal infection' or 'Probable or suspected fungal infection'.

The number of patients with at least one superinfection in any phase that is at least probable or suspected (proven infections included) is 1718 of 4797 patients.
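A patient counts towards this number if any of the three superinfection variables holds a bacterial and/or fungal value. A sketch, assuming that 'none', 'n/a' and 'unknown/missing' do not count as a superinfection (the helper `has_any_superinfection` is hypothetical):

```python
SUPERINFECTION_COLUMNS = [
    "Superinfection.in.uncomplicated.phase",
    "Superinfection.in.complicated.phase",
    "Superinfection.in.critical.phase",
]
# Values taken to indicate an (at least probable or suspected) superinfection.
INFECTED_VALUES = {"bacterial", "fungal", "bacterial&fungal"}


def has_any_superinfection(patient):
    """True if a superinfection is recorded in at least one phase."""
    return any(patient[c] in INFECTED_VALUES for c in SUPERINFECTION_COLUMNS)


# Patients 0, 5 and 41 of the excerpt above (superinfection columns only).
patients = [
    dict(zip(SUPERINFECTION_COLUMNS, ["none", "n/a", "n/a"])),
    dict(zip(SUPERINFECTION_COLUMNS, ["bacterial&fungal", "bacterial&fungal", "n/a"])),
    dict(zip(SUPERINFECTION_COLUMNS, ["none", "none", "unknown/missing"])),
]
print(sum(has_any_superinfection(p) for p in patients))  # 1
```

Summing the boolean results counts the patients, since `True` adds as 1 in Python.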

More will follow!