<h1>Estimating personal COVID risk from population-level data</h1>
<p><em>Laszlo Treszkai, January 6, 2021</em></p>
<p><strong>In a nutshell:</strong> If <em>N</em> people get infected per day in a country with susceptible population size <em>S</em>, then doing “average” activities has approximately an <em>N/S</em> risk of contracting it on that day.</p>
<h2>Introduction</h2>
<p>The <a href="https://www.microcovid.org">microCOVID tool</a> is great for estimating the chances of contracting <span class="caps">COVID</span>-19 during a given activity.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> It does so by estimating the risk of transmission of various activities from literature data, combining this with an estimate of a met person being <span class="caps">COVID</span>-positive.</p>
<p>In this post, I’m building a model which assumes that every citizen of the country acts exactly the same on a given day, which leads to the current infection rate. This means that if you do the same activities as “everyone else”, you’ll have the same chance of getting infected as “everyone else”.</p>
<p>Caveat: these are all back-of-the-envelope calculations of an <em>extremely</em> simplistic model.</p>
<h2>The numbers</h2>
<p>Let’s run the numbers of this model <a href="https://en.wikipedia.org/w/index.php?title=COVID-19_pandemic_in_Hungary&oldid=998710968">on Hungary, on January 6, 2021</a>.</p>
<ul>
<li>New cases (average of 7 days): 1746/day</li>
<li>Recovered: 180,000</li>
<li>Population (<sup id="fnref:kids"><a class="footnote-ref" href="#fn:kids">2</a></sup>): 9,770,000</li>
<li>Susceptible (<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">3</a></sup>): population - recovered = 9,590,000</li>
</ul>
<p>One might argue<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">4</a></sup> that the actual case count is higher than those tested positive, but I assume cases which do not make the books are not severe enough to warrant a test. However, if your country doesn’t seem to do enough tests (e.g. because the ratio of positive results per test is absurdly high), then the actual case count is surely higher.</p>
<p>That means that for the “average” person, the chance of becoming infected is 1,746 / 9,590,000 / day ≈ 180 microCOVIDs / day.</p>
<p>Note that this “average” person does not refer to the most common person (the <em>mode</em>) or the median; but literally, this Average Joe makes up the entire 10 million population of the country. In this model, everyone is equal. (And nobody more equal than the others.)</p>
<p>Act like this average person for an entire year (assume January 6 conditions all year long), and you have about a 6% chance of contracting the disease. (Because of course, every day of the year is equal.)</p>
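<p>The arithmetic above fits in a few lines; here is a quick Python sketch using the January 6 figures:</p>

```python
new_cases_per_day = 1_746      # 7-day average, Hungary, 2021-01-06
susceptible = 9_590_000        # population minus recovered

daily_risk = new_cases_per_day / susceptible
micro_covids = daily_risk * 1e6            # about 182 microCOVIDs per day

# Chance over a year of identical days: 1 - (1 - p)**365, roughly 365 * p.
yearly_risk = 1 - (1 - daily_risk) ** 365  # about 0.064, i.e. roughly 6%
```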
<h2>Comparing with microCOVID</h2>
<p>I <a href="https://www.microcovid.org/?distance=sixFt&duration=480&interaction=oneTime&personCount=1&riskProfile=average&setting=indoor&theirMask=basic&topLocation=Hungary&voice=normal&yourMask=basic">tried to simulate</a> the working day of the “average” citizen with the microCOVID tool. I did this using a one-time interaction (daily):</p>
<ul>
<li>for 8 hours</li>
<li>with 1 person</li>
<li>at 6+ feet / 2+ meters</li>
<li>indoors</li>
<li>cotton mask or bandana on me<sup id="fnref:5"><a class="footnote-ref" href="#fn:5">5</a></sup> and on them<sup id="fnref:6"><a class="footnote-ref" href="#fn:6">6</a></sup></li>
<li>having normal conversation.</li>
</ul>
<p>This adds up to about 200 microCOVIDs each day. That’s surprisingly close to my figure! I pinky-swear that I first picked these settings based on a gut feeling, and didn’t adjust them to approximate 180 microCOVIDs.</p>
<p>Obviously, changing the settings will move the risk away from 200 μCOVID/day. Grocery shopping and spouses/kids/relatives should also be added to the list. But the fact that these two models are in the same ballpark is good validation.</p>
<h2>Conclusion</h2>
<p>The super easy method for pandemic risk assessment:</p>
<ol>
<li>Calculate the risk for the Average Citizen, by simply dividing the daily case increase<sup id="fnref:avg"><a class="footnote-ref" href="#fn:avg">7</a></sup> by the susceptible population size (whether that’s a city or a country). Adjust this upwards if you think your country has insufficient testing practices.</li>
<li>Adjust this with some factors for how you think your behavior compares with the average. Working from home? Divide by 5. Doing grocery shopping online? Divide by 2. Meeting a dozen people at the office every day? Multiply by five. I’m just making these numbers up, but so can you. You should do this step <em>at the beginning</em>, to minimize fooling yourself.</li>
<li>Multiply this by 365 to get the risk of contracting the virus in a year<sup id="fnref:year"><a class="footnote-ref" href="#fn:year">8</a></sup>. If you want to go fancy, use lower figures for the summer, higher ones while the graphs are skyrocketing. (Again, decide beforehand how to calculate this step.)</li>
</ol>
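<p>The three steps above can be sketched as a small Python function; the adjustment factors are the made-up examples from step 2, not measured values:</p>

```python
def yearly_covid_risk(daily_new_cases, susceptible_population,
                      personal_factor=1.0, testing_correction=1.0):
    """Back-of-the-envelope yearly infection risk for one person.

    personal_factor and testing_correction are guesses decided up front,
    e.g. personal_factor=1/5 for working from home, or
    testing_correction=3 for a country with insufficient testing.
    """
    daily_risk = (daily_new_cases / susceptible_population
                  * testing_correction * personal_factor)
    return 1 - (1 - daily_risk) ** 365

# The Average Hungarian Citizen, under January 6 conditions all year:
average = yearly_covid_risk(1_746, 9_590_000)                   # about 6%
# The same person, but working from home:
wfh = yearly_covid_risk(1_746, 9_590_000, personal_factor=1/5)  # about 1.3%
```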
<p>Finally, smash a generous error bar on the result: say, plus or minus an order of magnitude.</p>
<p>Can you live with a 6% chance of <span class="caps">COVID</span>-19 in the coming year? If not, then maybe you should scale back your activities. If your country’s average risk is too low for you (for example, because you’re young and live in New Zealand and are more likely to die in a car accident), then consider saying hello to the neighbors from a friendly distance.</p>
<p>Stay safe. Wear a mask, wear a helmet, wear a safety belt.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>It also introduces a <span class="caps">COVID</span> budget that you can allocate as you wish: if you target a 1% chance of contracting <span class="caps">COVID</span> in a year, then you have 200 microCOVIDs allocated for each week. (0.01 / 50 = 0.0002.) Spend it wisely. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:kids">
<p>Until recently, kids did not play a significant role in the transmission and hospitalization, therefore minors could be (or could have been) deducted from the susceptible population. <a class="footnote-backref" href="#fnref:kids" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>microCOVID seems to compare the case count against the entire population. I count a recovered person as immune to the virus, so I subtract the recovered from the susceptible population. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>In Hungary, the reported prevalence is 0.12%, but microCOVID uses an adjusted prevalence of 0.39%. My model should apply this roughly 3× correction too. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:5">
<p>You should totally buy an <span class="caps">FFP</span>-2 mask and fit it snugly to your face. If you wear it for long periods then buy a handful and rotate them daily. Dispose after one month. Adjust numbers as budget allows, but <em>buy one good mask</em>. This does not constitute medical advice. <a class="footnote-backref" href="#fnref:5" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:6">
<p>Convince your employer to buy every employee an <span class="caps">FFP</span>-2 mask. The less time you spend on sick leave, the better it is for them. <a class="footnote-backref" href="#fnref:6" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:avg">
<p>Preferably averaged over the last 7 days. <a class="footnote-backref" href="#fnref:avg" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
<li id="fn:year">
<p>Technically, you should calculate 1 − (1−p)<sup>365</sup>, but that’s practically 365 × p. The overall calculation has much bigger errors anyway. <a class="footnote-backref" href="#fnref:year" title="Jump back to footnote 8 in the text">↩</a></p>
</li>
</ol>
</div>
<h1>The effects of daylight savings time adjustment on the incidence rate of acute myocardial infarction: a Bayesian meta-analysis (original research)</h1>
<p><em>Laszlo Treszkai (firstname.lastname@gmail.com)</em></p>
<p>A Bayesian meta-analysis to evaluate whether one is more likely to get a heart attack after losing an hour of sleep. They are, a little.</p>
<p>Version of 11 November, 2019.</p>
<p>This document might be revised in the future; any potential updates will be linked from here.</p>
<h2>Abstract</h2>
<h3>Background</h3>
<p>Multiple observational studies claim that the daylight savings time (<span class="caps">DST</span>) adjustment in spring causes an increase in acute myocardial infarction (<span class="caps">AMI</span>) count during the following days or weeks, attributing this increase to the reduction in sleep or the disturbance in the circadian rhythm. Previous studies used frequentist methods for interval estimation and often showed “statistically significant” differences, although the results were inconsistent and sometimes the effects in the same study were incoherent (such as a significant difference on Tuesday but not on Monday). A recent meta-analysis used frequentist methods and showed an increase in incidence rate after the spring adjustment and could not show a change after the autumn adjustment.</p>
<h3>Methods</h3>
<p>This study reanalyzes the data described in the relevant observational studies. We propose a Bayesian model that should capture the alleged phenomenon truthfully, apply this model consistently to every study, and combine the results using a fixed-effects model. Under our model, the risk ratio on Monday is the highest, it is slightly lower on Tuesday, and it decreases linearly to 1 until Saturday. We do the calculations using both analytic methods and Monte Carlo methods with the Stan software.</p>
<h3>Results</h3>
<p>In total, 7 observational studies were identified and analyzed, of which one was excluded. The remaining 6 studies included 14,024 <span class="caps">AMI</span> incidences on the week following the spring <span class="caps">DST</span> adjustment, and 15,921 incidences on the week following the autumn <span class="caps">DST</span> adjustment.
Together with related trend data obtained from the surrounding weeks, these figures show a risk ratio (<span class="caps">RR</span>) of 107.7% on the Monday following a spring <span class="caps">DST</span> change (95% credible interval: [104.8%, 110.7%]), and a mean <span class="caps">RR</span> of 97.7% (95% CrI: [95.1%, 100.3%]) after the autumn <span class="caps">DST</span> change. The results from analytic and Monte Carlo methods matched precisely. The credible intervals obtained from a non-informative prior yield practically the same results, and so does a slightly more complex model for the time decay of the effect.</p>
<h3>Conclusion</h3>
<p>Overall, the spring <span class="caps">DST</span> adjustment has a small but quasi-certain positive effect on <span class="caps">AMI</span> incidences, and the risk ratio in autumn is approximately 1 or slightly less than 1.
We note that the combined <span class="caps">RR</span> is less than half of what has been suggested by certain smaller but highly cited studies, but our analysis shows larger effects than the recent meta-analysis of the same data by Manfredini et al. (2019).
Our results give strong support to the hypothesis that the <span class="caps">DST</span> transitions – especially the spring transition when sleep is reduced – have a noticeable effect on our circadian rhythm.
Nonetheless, we cannot confidently claim that these results are of direct practical importance: there is no evidence that the additional <span class="caps">AMI</span> counts in the days after <span class="caps">DST</span> transition are not merely shifted earlier from the following weeks.</p>
<hr>
<h2>Introduction</h2>
<p>This study has a two-fold purpose. First, it compiles all the published data about the effects of <span class="caps">DST</span> on the risk of <span class="caps">AMI</span>, and presents a meta-analysis where the data from multiple countries and years is analyzed in a unified model. Second, it demonstrates the use of Bayesian methods in an analysis or meta-analysis, explaining the thinking behind model specification and quantifying our prior beliefs about the parameters. The software required for reproducing this paper is freely available at <a href="https://github.com/treszkai/BayesianScience">https://github.com/treszkai/BayesianScience</a>.</p>
<p>Sipilä et al. (2016) explain the importance of sleep and its effects on the risk of heart disease:</p>
<blockquote>
<p>Sleep is essential for well-being and its disturbances
have been associated with disruption of numerous
physiological processes and changes in cardiovascular
risk factors (1,2). Sleep disordered breathing has been
associated with risk of coronary heart disease (3,4) and
sleep impairment with prognosis of myocardial infarction
(<span class="caps">MI</span>) (5).</p>
<p>Daylight saving time (<span class="caps">DST</span>) is used in many countries
including the United States and the members of
the European Union for prolonging of sun-light
proportion of day. Clock shifts however alter and disrupt
chronobiological rhythms and impair sleep (7,8) providing
a ‘‘natural experiment’’ for studying the effects of
rhythm and sleep disruptions on the incidence of
vascular events. Although chronobiological factors
have been shown to affect the incidence of <span class="caps">MI</span> (9,10),
studies on the association of <span class="caps">DST</span> and the incidence of <span class="caps">MI</span>
have been partly conflicting. With one exception (11), all
studies show changes in the temporal distribution of <span class="caps">MI</span>
in the week following <span class="caps">DST</span> transitions but the patterns of
change differ (12–15) and there is no agreement about
the impact of these changes on the overall incidence of
<span class="caps">MI</span> (11–16).</p>
</blockquote>
<p>We will see that there is a simple reason for the disagreement between studies: most of the studies have been critically underpowered.</p>
<p>Although the majority of medical research uses frequentist methods, this is not the first meta-analysis in medicine that uses Bayesian statistics. The following are some noteworthy examples:</p>
<ul>
<li>Gelman et al. (2013) present an example for estimating mortality ratios after a myocardial infarction between the control group and a group that uses beta-blockers, using data from 22 independent studies.</li>
<li>Devin Incerti (2015) provides a Bayesian re-analysis of the effects of mammography on breast cancer-related mortality rates.</li>
<li>Yang et al. (2017) analyze 25 randomized controlled trials of prokinetics for the treatment of functional dyspepsia in a Bayesian network meta-analysis.</li>
</ul>
<h3>Methodology shared in most papers</h3>
<p>Following the naming of (Čulić 2013), we refer to the week following the <span class="caps">DST</span> adjustment as “posttransitional week”.</p>
<p>Every study that was included compares the observed <span class="caps">AMI</span> counts against a trend prediction. The trend prediction for <span class="caps">AMI</span> counts on given days – sometimes called “control group” – was usually defined as the average of the respective days on the two weeks before and after the posttransitional week. The analysis of Sandhu et al. (2014) was the only exception, as they used a regression model that included AMIs from all year except the two weeks following the spring and autumn <span class="caps">DST</span> adjustments.</p>
<p>Years on which the <span class="caps">DST</span> adjustment coincided with Easter were usually excluded from the studies. If Easter fell on the 2 weeks following (or preceding) the <span class="caps">DST</span> adjustment, the control period was the two out of three weeks that did not include Easter.</p>
<p>Every paper adjusted the <span class="caps">AMI</span> counts for the shorter (resp. longer) Sunday following a spring (resp. autumn) <span class="caps">DST</span> transition by multiplying the real counts with <script type="math/tex">24/23</script> (resp. <script type="math/tex">24/25</script>). This sometimes resulted in fractional <span class="caps">AMI</span> counts, which we rounded to the nearest integer when treated as an observation.</p>
<h2>Materials and methods</h2>
<h3>Study selection</h3>
<p>We analyzed data from every study that was included in the meta-analysis of Manfredini et al. (2019).</p>
<p>Performing a PubMed search instead of using the list of publications from (Manfredini et al. 2019) would be a tedious process with little benefit: said meta-analysis retrieved 2633 papers dated up to 31 December 2018 (of which 7 were relevant).</p>
<h3>Analyzed data</h3>
<p>From each paper, we extracted the trend predictions and the actual <span class="caps">AMI</span> counts on each day of the spring and autumn posttransitional weeks. When the trend prediction was not available, we divided the total number of <span class="caps">AMI</span> cases by the study length in days. We restricted our analysis to the number of incidences, and ignored all variables that describe incidences, such as age and gender of patient, <span class="caps">STEMI</span> (<span class="caps">ST</span> elevation <span class="caps">MI</span>) or non-<span class="caps">STEMI</span>, or various medications taken prior to the incident.</p>
<h3>Problems with standard statistical tests</h3>
<p>The standard statistical practice for deciding whether there is a difference in a particular variable (such as <span class="caps">AMI</span> counts) between two groups is to use a <em>null hypothesis significance test</em> (<span class="caps">NHST</span>).
Using this method, one defines a <em>null hypothesis</em> as the variable of interest having some predetermined value, which in this case would correspond to zero increase in <span class="caps">AMI</span> counts after a <span class="caps">DST</span> change.
The <span class="caps">NHST</span> answers the question: assuming the null hypothesis is true, what is the probability that data generated according to the sampling and testing intentions has a more extreme test statistic than that of the actual observations (Kruschke and Liddell 2018)? If this probability is less than some fixed threshold (typically 0.05), the effect is claimed to exist.
The <span class="caps">NHST</span> suffers from a multitude of problems, and has received its fair share of criticism from statisticians.
It encourages black-and-white thinking without allowing uncertainty (claiming that an effect either exists or not, depending on the p-value), it encourages binary classification of effects without quantifying the relationship (<em>statistically</em> significant differences might be of no <em>practical</em> relevance if they are small), and these tests are conducted <em>against</em> a given null hypothesis without any way to gain evidence <em>for</em> the null hypothesis (an inability to refute the null hypothesis is not equal to accepting it).
Recently, The American Statistician released a special issue titled <em>Moving to a World Beyond “p < 0.05”</em> (Wasserstein 2019), together with commentaries from 94 authors.</p>
<p>We can get a more accurate sense of the value of the parameter if instead of testing a hypothesis, we estimate the value of the parameter. The standard tool for this is stating the 95% confidence interval (<span class="caps">CI</span>) for a parameter, which is the set of parameter values that wouldn’t be rejected at the <script type="math/tex">p<0.05</script> level. This is the approach suggested by Cumming (2014) and Cumming and Calin-Jageman (2016), who call it the <em>New Statistics</em>.</p>
<p>While reporting an interval is better than reporting a single p-value, confidence intervals still suffer from deep-rooted flaws. They still encourage black-and-white thinking: parameter values inside the <span class="caps">CI</span> are compatible with the null hypothesis, those outside it are not. Confidence intervals do not give distributional information, i.e. a value close to the limits of the <span class="caps">CI</span> is not “less compatible” with the hypothesis than a value in the middle, nor is a study of large sample size “more confident” than a smaller study (although usually the <span class="caps">CI</span> of a large study is narrower). This binary nature makes it hard to aggregate the results of multiple studies and to perform a meta-analysis accurately. In addition, confidence intervals are also frequently misinterpreted: specifically, the true parameter value is <em>not</em> 95% likely to be inside the <span class="caps">CI</span>, although it is often thought to be.</p>
<p>Kruschke and Liddell (2018) compare approaches to statistical inference along two axes: whether the method uses a frequentist or Bayesian framework, and whether the method compares hypotheses or estimates parameter values. They make a detailed case that Bayesian parameter estimation is superior in most situations to the frequentist methods or Bayesian hypothesis testing, hence the title of the paper, <em>The Bayesian New Statistics</em>.</p>
<h3>Overview of our model and statistical methods</h3>
<p>In this meta-analysis we define a (Bayesian) statistical model for the parameter of interest and our observations. For every paper, we have the following observations: the <span class="caps">AMI</span> counts on each day of the posttransitional week, and the <span class="caps">AMI</span> counts predicted by the trend. The unobserved parameter is the risk ratio (<span class="caps">RR</span>), i.e. the multiplier by which mean <span class="caps">AMI</span> counts increase in the posttransitional week, compared to the same day of an ordinary week. Our description of this parameter initially also includes some reasonable uncertainty in our beliefs, quantified in the <em>prior distribution</em>. The goal of the analysis is to derive the <em>posterior probability distribution</em> of the <span class="caps">RR</span> (or <em>posterior</em> for short), which is an adjustment of the prior probabilities based on the likelihood of each parameter value, i.e. the probability that a given parameter value would produce the observed data. Although the posterior is influenced by the prior and the statistical model, this influence can be insubstantial in the face of enough data, as will be the case in this analysis. Finally, the posterior is summarized in a 95% credible interval of parameter values, which is either a central credible interval or a highest density posterior interval.</p>
<h3>Notation</h3>
<p>For a particular study <script type="math/tex">s</script>, <script type="math/tex">t_d^{(s)}</script> denotes the <span class="caps">AMI</span> counts as predicted by the trend model on day <script type="math/tex">d</script> of the posttransitional week (with <script type="math/tex">d = 1,\,\ldots,\,5</script> for Monday, …, Friday after the <span class="caps">DST</span> change) and <script type="math/tex">y_d^{(s)}</script> denotes the observed count on day <script type="math/tex">d</script>. The (unobserved) mean of the distribution of <script type="math/tex">y_d^{(s)}</script> is denoted by <script type="math/tex">x_d^{(s)}</script> – the meaning of this variable will become clear in the next section.
The risk ratio for day <script type="math/tex">d</script> is denoted by <script type="math/tex">r_d^{(s)} = x_d^{(s)} / t_d^{(s)}</script>. Finally, <script type="math/tex">\mathcal D^{(s)}</script> denotes the whole dataset, i.e. all of the observations <script type="math/tex">\{y_1^{(s)},\ldots,y_5^{(s)}\}</script>. To avoid cluttered notation, sometimes the superscript is omitted, resulting in e.g. <script type="math/tex">y_1</script>.</p>
<h3>Poisson distribution</h3>
<p>The <a href="https://en.wikipedia.org/wiki/Poisson_distribution">Poisson distribution</a> is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event. In our case, the “event” is an <span class="caps">AMI</span>, and the fixed interval of time is a day. Although AMIs don’t happen at a constant rate throughout the day, the <a href="https://en.wikipedia.org/wiki/Poisson_distribution#Sums_of_Poisson-distributed_random_variables">sum of Poisson-distributed random variables</a> is also Poisson-distributed, so any day’s total will also be Poisson-distributed.</p>
<p>The distribution has a single parameter, which is a positive real number, and is often denoted <script type="math/tex">λ</script>. The mean (expected value) of <script type="math/tex">\text{Poisson}(λ)</script> is <script type="math/tex">λ</script>, and the standard deviation is <script type="math/tex">\sqrt{λ}</script>. Its probability mass function is shown below for <script type="math/tex">λ=100</script>, along with the 95% highest density interval (<span class="caps">HDI</span>) – the shortest interval that covers 95% of the probability mass.</p>
<p><img alt="Distribution of Poisson plot with mean 100" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/poisson-dist.svg"></p>
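<p>The 95% <span class="caps">HDI</span> in the figure can be reproduced with a short greedy computation; this is a sketch using only the standard library, not the code that produced the figure:</p>

```python
import math

def poisson_pmf(k, lam):
    # exp(-lam) * lam**k / k!, computed in log space for stability
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = 100
# Greedy HDI: include outcomes in order of decreasing probability
# until at least 95% of the mass is covered.
outcomes = sorted(range(2 * lam), key=lambda k: -poisson_pmf(k, lam))
covered, mass = [], 0.0
for k in outcomes:
    covered.append(k)
    mass += poisson_pmf(k, lam)
    if mass >= 0.95:
        break
hdi = (min(covered), max(covered))  # roughly (80, 120): mean +- 2*sqrt(lam)
```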
<p>The analyzed studies reported the sum of AMIs on a given day over the period of the study (e.g. all posttransitional Tuesdays during the years 2010–2013), never the <span class="caps">AMI</span> counts for individual years. This sum is denoted with <script type="math/tex">y_d</script>, where <script type="math/tex">d</script> signifies the day. We note again that the individual counts are each Poisson-distributed, so their sum is Poisson-distributed too. (However, their <em>average</em> would not be Poisson-distributed.) This means that <script type="math/tex">y_d</script> is sampled from a Poisson distribution whose parameter <script type="math/tex">x_d</script> is the sum of the trend on day <script type="math/tex">d</script> over the period of the study (<script type="math/tex">t_d</script>), multiplied with the <span class="caps">RR</span> for the given day (<script type="math/tex">r_d</script>).</p>
<p>In order for the Poisson assumption to <em>not hold</em> in this analysis, two individuals experiencing an <span class="caps">AMI</span> on a given day need to be statistically dependent <em>conditional on the day’s average</em>. This is not the case during a heat wave or a news broadcast about a major catastrophe, when the AMIs are dependent but not conditionally dependent. The rare scenarios for conditional dependence are when two people partake in a strenuous activity together (such as hiking), or when the <span class="caps">AMI</span> of a person causes an <span class="caps">AMI</span> in another.</p>
<h3>Model of posttransitional <span class="caps">AMI</span> counts</h3>
<p>We perform the analysis using a fixed-effects model, which assumes that the <span class="caps">DST</span> adjustment effects an identical increase in <span class="caps">AMI</span> counts in every country, every year. The independence of region is a strong assumption because the leading hypothesis attributes the increase in myocardial infarctions to the disruption of the circadian rhythm, and those beyond their working age do not necessarily experience sleep loss on a posttransitional Monday. Therefore, we hypothesize that the effect is likely to be lower in countries where the average age of retirement is lower – a random-effects model could account for these differences. The independence of year is a weak assumption.</p>
<p>The model for the <span class="caps">AMI</span> count on a posttransitional Monday is described by the following graph – such a graph is called a Bayes network or a directed graphical model:</p>
<p><img alt="Bayes network for the Monday counts" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/tikz_bayesnet_Mon.png"></p>
<p>Loosely speaking, the arrows denote causal or logical dependencies, where the exact formula for the dependency is shown next to the nodes (in a canonical Bayes network, the formulas are described only in the text). The model can be translated into the following sentences:</p>
<ul>
<li>The observed posttransitional <span class="caps">AMI</span> count on Monday follows a Poisson distribution.</li>
<li>The mean of the posttransitional <span class="caps">AMI</span> count on Monday is equal to the trend count on Monday, multiplied by the <span class="caps">RR</span> on Monday.</li>
<li>Monday’s <span class="caps">RR</span> is a random variable, meaning it has an associated prior belief distribution (which we define below).</li>
</ul>
<h3>Moving to a multi-day model</h3>
<p>The reviewed literature performed hypothesis tests for every day of the posttransitional week – including weekends –, sometimes noting a significant difference for Tuesday but not Monday (Sandhu 2014). Such day-by-day tests of “statistical significance” need not concern themselves with <em>consistency</em> – in the everyday sense of the word –, i.e. that prior to observations we expect any effect to be highest on Monday and to wear off as time progresses.</p>
<p>When performing a Bayesian analysis, we <em>must</em> have prior expectations on the expected parameter values – these prior beliefs are then changed according to the model and the observed data, resulting in the posterior distribution. In accordance with the literature, we assume that the effect is constrained to the posttransitional week, and that if there is an effect on Monday, there is some effect on Friday too. We expect no increase on Sunday, the day of the adjustment (after adjusting for the shorter day), because relatively few people wake up at the same time on Sundays (and sleep shorter as a consequence). On Tuesday, Wednesday, Thursday, Friday, we expect the relative increase to be 80%, 60%, 40%, 20% of the increase on Monday (see figure below) – this we call the “linear weekday model”. (This linear assumption will be weakened in a later analysis.) We denote the increase in <span class="caps">RR</span> on Monday with <script type="math/tex">\theta</script> (the only parameter of the model), thus <script type="math/tex">r_\text{Mo} = 1 + \theta</script>, <script type="math/tex">r_\text{Tu} = 1 + 0.8 \cdot \theta</script>, …, <script type="math/tex">r_\text{Fr} = 1 + 0.2 \cdot \theta</script>.</p>
<p>The infarction counts on neighboring days are conditionally independent given <script type="math/tex">\theta</script> (apart from exceptional cases, such as a mass catastrophe), which means we can model the days separately and simply multiply their likelihoods. (Prior to observing the data, it feels <em>very</em> unlikely to us that there would be any effect on Friday, but one paper attempted to measure effects on the 2 and 4 weeks following the <span class="caps">DST</span> adjustment, meaning they didn’t consider such a long-lasting effect completely implausible; therefore we consider including Friday as part of the expert opinion.)</p>
<p><img alt="Risk ratio on given days of the posttransitional week under the linear weekday model, for θ=0.5" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/rr_example.svg"></p>
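<p>The linear weekday model is easy to state in code; for the figure’s <script type="math/tex">\theta = 0.5</script>:</p>

```python
theta = 0.5  # increase in RR on Monday

# Tue..Fri carry 80%, 60%, 40%, 20% of Monday's increase.
decay = {"Mon": 1.0, "Tue": 0.8, "Wed": 0.6, "Thu": 0.4, "Fri": 0.2}
rr = {day: 1 + w * theta for day, w in decay.items()}
# {'Mon': 1.5, 'Tue': 1.4, 'Wed': 1.3, 'Thu': 1.2, 'Fri': 1.1}
```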
<p>This model of all weekdays is described by the following graph:</p>
<p><img alt="Bayes network for the counts of all weekdays" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/tikz_bayesnet.png"></p>
<p>Here the rectangle means the nodes inside it should be repeated for <script type="math/tex">d = \text{Mo}, \dots, \text{Fr}</script> — this rectangle is called a “plate”. A common parameter <script type="math/tex">\theta</script> determines <script type="math/tex">r_d</script> for a given day <script type="math/tex">d</script>, which, together with <script type="math/tex">t_d</script>, determines the expected number of AMIs (<script type="math/tex">x_d</script>) and the actual number of AMIs (<script type="math/tex">y_d</script>).</p>
<h3>Prior beliefs about <span class="caps">RR</span> (spring)</h3>
<p>We would like to estimate the value of the continuous parameter <script type="math/tex">\theta</script>; the standard frequentist procedure would be a one-sided t-test with the null hypothesis <script type="math/tex">\theta = 0</script>, but here we estimate its full posterior distribution instead.</p>
<p>Gelman et al. (2013) suggest beginning Bayesian data analysis with a noninformative or <em>weakly informative prior</em> – this avoids biasing the results toward any particular value, and lets the posterior represent the data more closely.</p>
<p>I believe <script type="math/tex">\theta</script> is likely to be approximately <script type="math/tex">0.0</script> (i.e., <script type="math/tex">\text{RR} \approx 1</script>, no effect), but it wouldn’t be very surprising if <script type="math/tex">\theta</script> were positive. (I find it very unlikely, less than <script type="math/tex">\approx 0.1\%</script>, that the <span class="caps">RR</span> decreases.) So I would like to place substantial probability mass close to 0.0, and spread the rest on values between <script type="math/tex">0.0</script> and <script type="math/tex">1.0</script> (<script type="math/tex">P(\theta > 1.0) \lessapprox 0.1\%</script>).</p>
<p>We can formalize this description by placing 50–50% of the prior probability mass on either there being zero effect (a Gaussian distribution with a standard deviation of 0.01) or there being an increase in <span class="caps">AMI</span> counts, with an Exponential(<script type="math/tex">\lambda=0.2^{-1}</script>) prior on the increase in <span class="caps">RR</span>. (An Exponential(<script type="math/tex">\lambda=0.2^{-1}</script>) distribution has a mean of <script type="math/tex">0.2</script>.) This distribution is plotted in the figure below.</p>
<p><img alt="Prior distribution of Monday RR" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/prior_Monday.svg"></p>
<h3>Prior beliefs about the <span class="caps">RR</span> (autumn)</h3>
<p>The <span class="caps">AMI</span> counts on the autumn posttransitional week were analyzed with the same model as the spring counts, but with an (improper) uniform prior on <script type="math/tex">\theta</script>. (This prior is improper because no probability distribution is uniform over the whole real line. In practice we would get the same posterior if we assumed a Uniform(−2, +2) prior.)</p>
<h3>Summary of assumptions</h3>
<p>Every statistical test makes assumptions about the data, but in most reports using null hypothesis significance tests, these assumptions are never mentioned; instead, they are implicit in the tests performed.
Therefore, statistics is often sold as a sort of alchemy that transmutes randomness into certainty, an “uncertainty laundering” that begins with data and concludes with success as measured by statistical significance (Gelman 2016).
I view it as a <em>strength</em> of Bayesian data analysis that these assumptions must be stated explicitly. To summarize this section, we make the following assumptions in this analysis:</p>
<ol>
<li>Every region that uses <span class="caps">DST</span> has the same <span class="caps">RR</span> in every year.</li>
<li>Any effect is limited to the posttransitional weekdays, and the effect is highest on Monday, 20% less on Tuesday, and so on until 0% on Saturday.</li>
<li>Our prior belief about the spring <span class="caps">RR</span> is split half-and-half between <script type="math/tex">1.0</script> and all values greater than <script type="math/tex">1</script>, with the probability density decaying exponentially at a rate of <script type="math/tex">0.2^{-1}</script>. We make no prior assumptions about the autumn <span class="caps">RR</span>.</li>
</ol>
<h3>Posterior calculations analytically</h3>
<p>We performed our calculations for the fixed-effects model in spring analytically, using custom software written in Python. The result of these calculations was a 95% central credible interval, which is an interval of parameter values containing 95% of the posterior probability, with 2.5% on the negative and positive ends. This is not equal to the <span class="caps">HDI</span> when the distribution is skewed, but is usually a good approximation.</p>
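<p>The analytic computation can be sketched with a simple grid approximation. The code below is my own illustration (not the original software): it uses the spring counts of a single study, Sandhu et al. (2014), from the data tables in the Results section, with a uniform prior for simplicity:</p>

```python
import numpy as np
from scipy.stats import poisson

# Spring weekday data of Sandhu et al. (2014): trend predictions and observations
trend = np.array([138.0, 127.0, 125.0, 120.0, 120.0])
obs = np.array([170, 125, 122, 117, 117])
weights = np.array([1.0, 0.8, 0.6, 0.4, 0.2])  # linear weekday model

# Posterior over theta on a grid, with a uniform prior:
# log p(theta | D) = const + sum_d log Poisson(y_d | t_d * (1 + w_d * theta))
theta_grid = np.linspace(-0.4, 1.0, 2801)
log_post = np.zeros_like(theta_grid)
for w, t, y in zip(weights, trend, obs):
    log_post += poisson.logpmf(y, t * (1 + w * theta_grid))
post = np.exp(log_post - log_post.max())
post /= post.sum()

# 95% central credible interval: cut 2.5% off each tail of the grid CDF
cdf = np.cumsum(post)
lo = theta_grid[np.searchsorted(cdf, 0.025)]
hi = theta_grid[np.searchsorted(cdf, 0.975)]
print(f"95% CCrI for theta from this study alone: [{lo:.3f}, {hi:.3f}]")
```

With the full analysis, the prior above and all six studies replace the uniform prior and the single study, but the quadrature is the same.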
<h3>Posterior calculations with Monte Carlo methods</h3>
<p>We also performed our posterior calculations with Monte Carlo methods using the open source statistical modeling software <a href="https://mc-stan.org/">Stan</a>. Models in Stan are written using its own description language (which comes with <a href="https://mc-stan.org/users/documentation/">extensive documentation</a> and a <a href="https://discourse.mc-stan.org/">supportive community</a>), and they first need to be compiled into binary form using an interface in R, Python, or other languages. Then, after providing the observable data to the model, Stan draws samples from the posterior distribution of the parameters, and calculates the 95% highest posterior density interval (<span class="caps">HDI</span>, a.k.a. <span class="caps">HPD</span>) – the interval that covers the most plausible parameter values. For most practical purposes, 1000 independent samples would be enough, but we drew 50,000 samples to accurately assess agreement with the analytic solution.</p>
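<p>For reference, a 95% <span class="caps">HDI</span> can be computed from posterior samples with a short sliding-window function (my own sketch, not Stan’s internal implementation):</p>

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples
    (appropriate for unimodal posteriors)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(mass * n))           # number of samples inside the interval
    widths = x[k - 1:] - x[:n - k + 1]   # widths of all candidate intervals
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

# For a symmetric posterior the HDI matches the central credible interval;
# e.g. for standard normal samples both approach [-1.96, 1.96]
samples = np.random.default_rng(0).normal(0.0, 1.0, 100_000)
print(hdi(samples))
```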
<p>The code for the fixed-effects linear weekday Stan model is as follows:</p>
<div class="highlight"><pre><span></span><code><span class="kn">data</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">DAYS</span><span class="p">;</span> <span class="c1">// Number of days</span>
<span class="kt">int</span> <span class="n">STUDIES</span><span class="p">;</span> <span class="c1">// Number of studies</span>
<span class="kt">real</span> <span class="n">NORMAL_SIGMA</span><span class="p">;</span> <span class="c1">// The standard deviation of the normal component of the prior</span>
<span class="kt">real</span> <span class="n">EXPON_BETA</span><span class="p">;</span> <span class="c1">// The beta parameter of the exponential component of the prior</span>
<span class="c1">// The observed AMI counts and the trend predictions, for each day of each study</span>
<span class="kt">int</span><span class="o"><</span><span class="k">lower</span><span class="p">=</span><span class="mf">0</span><span class="o">></span> <span class="n">ami_obs</span><span class="p">[</span><span class="n">STUDIES</span><span class="p">,</span> <span class="n">DAYS</span><span class="p">];</span>
<span class="kt">real</span><span class="o"><</span><span class="k">lower</span><span class="p">=</span><span class="mf">0</span><span class="o">></span> <span class="n">ami_trend</span><span class="p">[</span><span class="n">STUDIES</span><span class="p">,</span> <span class="n">DAYS</span><span class="p">];</span>
<span class="p">}</span>
<span class="kn">parameters</span> <span class="p">{</span>
<span class="c1">// Monday RR - 1.</span>
<span class="c1">// (We cannot model RR_Mon directly because we cannot assign</span>
<span class="c1">// a common distribution to it.)</span>
<span class="c1">// Its probabilistic value is assigned in the model block below.</span>
<span class="kt">real</span> <span class="n">rr_Mon_minus_1</span><span class="p">;</span>
<span class="p">}</span>
<span class="kn">transformed parameters</span> <span class="p">{</span>
<span class="c1">// The RR for every day</span>
<span class="kt">real</span> <span class="n">rr_day</span><span class="p">[</span><span class="n">DAYS</span><span class="p">];</span>
<span class="c1">// The posttransitional AMI counts for every day of every study.</span>
<span class="kt">real</span> <span class="n">ami_dst_mean</span><span class="p">[</span><span class="n">STUDIES</span><span class="p">,</span> <span class="n">DAYS</span><span class="p">];</span>
<span class="c1">// Specifying the RR for every day, using the linear weekday model.</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">DAYS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rr_day</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">rr_Mon_minus_1</span> <span class="o">*</span> <span class="p">(</span><span class="n">DAYS</span> <span class="o">+</span> <span class="mf">1</span> <span class="o">-</span> <span class="n">i</span><span class="p">)</span> <span class="o">/</span> <span class="n">DAYS</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="n">s</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">STUDIES</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">DAYS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ami_dst_mean</span><span class="p">[</span><span class="n">s</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">ami_trend</span><span class="p">[</span><span class="n">s</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">rr_day</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kn">model</span> <span class="p">{</span>
<span class="c1">// Mixture models are specified using the construct below:</span>
<span class="c1">// target += log_sum_exp(log(c1) + XXX_lpdf(x | p1), log(c2) + YYY_lpdf(x | p2));</span>
<span class="c1">// (With equal weights c1 = c2 = 0.5, the log-weights only add a constant</span>
<span class="c1">// to the target, so they can be dropped.)</span>
<span class="k">target +=</span> <span class="nb">log_sum_exp</span><span class="p">(</span><span class="nb">normal_lpdf</span><span class="p">(</span><span class="n">rr_Mon_minus_1</span> <span class="p">|</span> <span class="mf">0</span><span class="p">,</span> <span class="n">NORMAL_SIGMA</span><span class="p">),</span>
<span class="nb">exponential_lpdf</span><span class="p">(</span><span class="n">rr_Mon_minus_1</span> <span class="p">|</span> <span class="n">EXPON_BETA</span><span class="p">));</span>
<span class="c1">// Finally, the observations are drawn from a Poisson distribution.</span>
<span class="k">for</span> <span class="p">(</span><span class="n">s</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">STUDIES</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">DAYS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ami_obs</span><span class="p">[</span><span class="n">s</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">~</span><span class="w"> </span><span class="nb">poisson</span><span class="p">(</span><span class="n">ami_dst_mean</span><span class="p">[</span><span class="n">s</span><span class="p">][</span><span class="n">i</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The <code>data</code> and <code>parameters</code> blocks declare the observed quantities and the unobserved parameters, without specifying their distribution.</p>
<p>The <code>transformed parameters</code> block contains all quantities that can be deterministically derived from the parameters.</p>
<p>The <code>model</code> block describes both the prior distributions for the parameters and the likelihood functions.</p>
<h4>Sampling using the Python interface</h4>
<p>We can compile the Stan model and sample from it in Python using <a href="https://pystan.readthedocs.io/">PyStan</a>. Once the software and its dependencies are installed, we can use the following code to draw 50,000 samples from the posterior and plot the results. On my computer, the model compilation takes about a minute, the sampling a few seconds.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pystan</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="c1"># 6-long list of 5-long lists of integers (weekday observations)</span>
<span class="n">all_obs</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">1735</span><span class="p">,</span> <span class="mi">1644</span><span class="p">,</span> <span class="mi">1555</span><span class="p">,</span> <span class="mi">1522</span><span class="p">,</span> <span class="mi">1467</span><span class="p">],</span> <span class="c1"># Janszky and Ljung 2008</span>
<span class="p">[</span><span class="mi">28</span><span class="p">,</span> <span class="mi">28</span><span class="p">,</span> <span class="mi">26</span><span class="p">,</span> <span class="mi">23</span><span class="p">,</span> <span class="mi">24</span><span class="p">],</span> <span class="c1"># Jiddou et al. 2013</span>
<span class="o">...</span>
<span class="p">]</span>
<span class="c1"># 6-long list of 5-long lists of floats (trend predictions)</span>
<span class="n">all_trend</span> <span class="o">=</span> <span class="o">...</span>
<span class="n">stan_data</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'STUDIES'</span><span class="p">:</span> <span class="mi">6</span><span class="p">,</span>
<span class="s1">'DAYS'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
<span class="s1">'NORMAL_SIGMA'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span>
<span class="s1">'EXPON_BETA'</span><span class="p">:</span> <span class="mi">1</span><span class="o">/</span><span class="mf">0.2</span><span class="p">,</span>
<span class="s1">'ami_obs'</span><span class="p">:</span> <span class="n">all_obs</span><span class="p">,</span>
<span class="s1">'ami_trend'</span><span class="p">:</span> <span class="n">all_trend</span>
<span class="p">}</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">pystan</span><span class="o">.</span><span class="n">StanModel</span><span class="p">(</span><span class="n">file</span><span class="o">=</span><span class="s1">'dst_model.stan'</span><span class="p">)</span>
<span class="n">fit</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">stan_data</span><span class="p">,</span> <span class="nb">iter</span><span class="o">=</span><span class="mi">50000</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">pystan</span><span class="o">.</span><span class="n">stansummary</span><span class="p">(</span><span class="n">fit</span><span class="p">))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">fit</span><span class="o">.</span><span class="n">extract</span><span class="p">()[</span><span class="s1">'rr_day'</span><span class="p">][:,</span> <span class="mi">0</span><span class="p">]);</span> <span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<h4>Sampling using the R interface</h4>
<p>The R interface of Stan is called <a href="https://github.com/stan-dev/rstan/">RStan</a>, and can be used as follows:</p>
<div class="highlight"><pre><span></span><code><span class="nf">library</span><span class="p">(</span><span class="s">"rstan"</span><span class="p">)</span><span class="w"> </span><span class="c1"># observe startup messages</span>
<span class="n">stan_data</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">STUDIES</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6</span><span class="p">,</span>
<span class="w"> </span><span class="n">DAYS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span>
<span class="w"> </span><span class="n">NORMAL_SIGMA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.01</span><span class="p">,</span>
<span class="w"> </span><span class="n">EXPON_BETA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="o">/</span><span class="m">0.2</span><span class="p">,</span>
<span class="w"> </span><span class="n">ami_obs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">all_obs</span><span class="p">,</span>
<span class="w"> </span><span class="n">ami_trend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">all_trend</span><span class="p">)</span>
<span class="n">fit</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nf">stan</span><span class="p">(</span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">'dst_model.stan'</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stan_data</span><span class="p">,</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">50000</span><span class="p">)</span>
<span class="nf">hist</span><span class="p">(</span><span class="nf">extract</span><span class="p">(</span><span class="n">fit</span><span class="p">)</span><span class="o">$</span><span class="n">rr_day</span><span class="p">[,</span><span class="w"> </span><span class="m">1</span><span class="p">])</span>
</code></pre></div>
<h3>Effect of study size in a Bayesian framework</h3>
<p>For a small study – i.e., one where the trend and observed <span class="caps">AMI</span> counts are low – we want to see only a slight change from the prior; for a large study, we want to see a bigger change.</p>
<p>Two factors should play into this. First, if the trend predicts low counts, then we are likely to observe relatively big fluctuations: observing 12 heart attacks on a day when the long-term average is 10 represents a +20% increase, yet it occurs once every 3 days on average. Second, if the study was small and the trend is estimated from only a few weeks’ data, our <em>estimate</em> of the trend itself has greater variance. This second factor is not yet modeled in our work, but in small studies like that of Čulić (2013), this too could play a role.</p>
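<p>The “once every 3 days” figure can be checked directly, assuming daily counts are Poisson-distributed with mean 10 (a minimal check with SciPy):</p>

```python
from scipy.stats import poisson

# Probability of observing at least 12 AMIs on a day when the long-term mean is 10
p = 1 - poisson.cdf(11, mu=10)
print(round(p, 3))  # ~0.3, i.e. roughly once every three days
```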
<p>To see the difference between a small and a large study, we visualize the prior and the posterior for the following scenarios:</p>
<ul>
<li>Observation higher than trend, small sample size (top left);</li>
<li>Observation equals trend, large sample size (top right);</li>
<li>Observation lower than trend, large sample size (bottom left);</li>
<li>Observation higher than trend, large sample size (bottom right).</li>
</ul>
<p><img alt="Posteriors of four example sample counts for trend and observation" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/test-posteriors.svg"></p>
<p>When the sample size is small, there is only a slight change from prior to posterior. With a large sample size, the prior beliefs barely have an effect on the posterior. (In the lower right plot, the posterior peaks at more than 1.1 because with 1100 <span class="caps">AMI</span> every day, the linear weekday model fits better with a larger <span class="caps">RR</span>.)</p>
<h2>Results</h2>
<h3>Relevant studies</h3>
<p>The studies analyzed are identical to those analyzed in (Manfredini et al., 2019):</p>
<ul>
<li>Janszky and Ljung (2008)</li>
<li>Janszky et al. (2012)</li>
<li>Čulić (2013)</li>
<li>Jiddou et al. (2013)</li>
<li>Sandhu et al. (2014)</li>
<li>Kirchberger et al. (2015)</li>
<li>Sipilä et al. (2016)</li>
</ul>
<p>We excluded the study of Janszky et al. (2012), as its population is a strict subset of that of (Janszky and Ljung, 2008), with no additional information relevant for our analysis. The meta-analysis of Manfredini et al. (2019) did not exclude this study, which biased their results significantly, as this study has the second-largest population size of all.</p>
<p>Key characteristics of the above studies can be found in the table below, with more details in the appendix.</p>
<table>
<thead>
<tr>
<th>Paper</th>
<th>Sun</th>
<th>Mon</th>
<th>Tue</th>
<th>Wed</th>
<th>Thu</th>
<th>Fri</th>
<th>Sat</th>
</tr>
</thead>
<tbody>
<tr>
<td>(Janszky and Ljung, 2008)</td>
<td>(1374)</td>
<td>(1636)</td>
<td>(1494)</td>
<td>(1471)</td>
<td>(1484)</td>
<td>(1422)</td>
<td>(1370)</td>
</tr>
<tr>
<td></td>
<td>1439</td>
<td>1735</td>
<td>1644</td>
<td>1555</td>
<td>1522</td>
<td>1467</td>
<td>1414</td>
</tr>
<tr>
<td>(Jiddou et al., 2013)</td>
<td>(13)</td>
<td>(29)</td>
<td>(20)</td>
<td>(23)</td>
<td>(17)</td>
<td>(25)</td>
<td>(16)</td>
</tr>
<tr>
<td></td>
<td>23</td>
<td>28</td>
<td>28</td>
<td>26</td>
<td>23</td>
<td>24</td>
<td>18</td>
</tr>
<tr>
<td>(Čulić, 2013)</td>
<td>(6)</td>
<td>(7)</td>
<td>(6)</td>
<td>(7)</td>
<td>(6)</td>
<td>(6)</td>
<td>(5)</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>14</td>
<td>6</td>
<td>9</td>
<td>6</td>
<td>5</td>
<td>8</td>
</tr>
<tr>
<td>(Kirchberger et al., 2015)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
</tr>
<tr>
<td></td>
<td>66</td>
<td>85</td>
<td>83</td>
<td>76</td>
<td>77</td>
<td>85</td>
<td>60</td>
</tr>
<tr>
<td>(Sandhu et al., 2014)</td>
<td>(111)</td>
<td>(138)</td>
<td>(127)</td>
<td>(125)</td>
<td>(120)</td>
<td>(120)</td>
<td>(110)</td>
</tr>
<tr>
<td></td>
<td>108</td>
<td>170</td>
<td>125</td>
<td>122</td>
<td>117</td>
<td>117</td>
<td>114</td>
</tr>
<tr>
<td>(Sipilä et al., 2016)</td>
<td>(208)</td>
<td>(269)</td>
<td>(243)</td>
<td>(259)</td>
<td>(227)</td>
<td>(227)</td>
<td>(198)</td>
</tr>
<tr>
<td></td>
<td>201</td>
<td>229</td>
<td>253</td>
<td>254</td>
<td>262</td>
<td>242</td>
<td>179</td>
</tr>
</tbody>
</table>
<p><em>(Spring <span class="caps">AMI</span> counts. Trend predictions in parentheses, under them the number of incidences on the posttransitional week. Total count on the posttransitional week: 14,024.)</em></p>
<table>
<thead>
<tr>
<th>Paper</th>
<th>Sun</th>
<th>Mon</th>
<th>Tue</th>
<th>Wed</th>
<th>Thu</th>
<th>Fri</th>
<th>Sat</th>
</tr>
</thead>
<tbody>
<tr>
<td>(Janszky and Ljung, 2008)</td>
<td>(1780)</td>
<td>(2140)</td>
<td>(1991)</td>
<td>(1910)</td>
<td>(1941)</td>
<td>(1949)</td>
<td>(1781)</td>
</tr>
<tr>
<td></td>
<td>1777</td>
<td>2038</td>
<td>1958</td>
<td>1895</td>
<td>1916</td>
<td>1977</td>
<td>1732</td>
</tr>
<tr>
<td>(Jiddou et al., 2013)</td>
<td>(18)</td>
<td>(24)</td>
<td>(21)</td>
<td>(27)</td>
<td>(22)</td>
<td>(24)</td>
<td>(20)</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>34</td>
<td>25</td>
<td>19</td>
<td>20</td>
<td>18</td>
<td>30</td>
</tr>
<tr>
<td>(Kirchberger et al., 2015)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
</tr>
<tr>
<td></td>
<td>60</td>
<td>57</td>
<td>77</td>
<td>73</td>
<td>77</td>
<td>84</td>
<td>60</td>
</tr>
<tr>
<td>(Sandhu et al., 2014)</td>
<td>(86)</td>
<td>(107)</td>
<td>(99)</td>
<td>(97)</td>
<td>(93)</td>
<td>(93)</td>
<td>(85)</td>
</tr>
<tr>
<td></td>
<td>89</td>
<td>102</td>
<td>79</td>
<td>93</td>
<td>104</td>
<td>86</td>
<td>99</td>
</tr>
<tr>
<td>(Sipilä et al., 2016)</td>
<td>(159)</td>
<td>(197)</td>
<td>(193)</td>
<td>(170)</td>
<td>(201)</td>
<td>(178)</td>
<td>(157)</td>
</tr>
<tr>
<td></td>
<td>160</td>
<td>214</td>
<td>180</td>
<td>198</td>
<td>199</td>
<td>172</td>
<td>153</td>
</tr>
<tr>
<td>(Čulić, 2013)</td>
<td>(6)</td>
<td>(7)</td>
<td>(6)</td>
<td>(7)</td>
<td>(6)</td>
<td>(6)</td>
<td>(5)</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>9</td>
<td>12</td>
<td>6</td>
<td>12</td>
<td>5</td>
<td>4</td>
</tr>
</tbody>
</table>
<p><em>(Autumn <span class="caps">AMI</span> counts. Trend predictions in parentheses, under them the number of incidences on the posttransitional week. Total count on the posttransitional week: 15,921.)</em></p>
<h3><span class="caps">AMI</span> risk after spring transition</h3>
<p>The posteriors after the individual papers are shown below, along with their 95% central credible interval (CCrI).</p>
<p><img alt="Forest plot that shows the posterior after the individual papers" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/forest_plot.svg"></p>
<p>The width of the 95% CCrI is a measure of the precision of the estimate. The 95% CCrIs after (Janszky and Ljung, 2008) and (Sipilä et al., 2016) are comparably narrow, but they are centered around 1.085 and 1.001, respectively. In fact, as we can see from the likelihood functions (not shown here), the study of Sipilä et al. (2016) presents a case for a slight <em>decrease</em> in <span class="caps">AMI</span> risk under this model.</p>
<p>In the fixed effects model the posterior is weighted heavily towards the study with the largest sample size (Janszky and Ljung 2008), and the other studies barely play a role.
Specifically, the posterior mean of the <span class="caps">RR</span> is 107.7% (95% central credible interval: <script type="math/tex">[104.7\%, 110.7\%]</script>) – the posterior is shown below. We emphasize again that the relative weights of the studies are not arbitrary, but are fully determined by the model and the data through the rules of probability theory.</p>
<p><img alt="Posterior probability after every paper included in this analysis" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/combined_posterior.svg"></p>
<p>We arrive at the same posterior when drawing samples from it through a Monte Carlo method with Stan. Furthermore, as the tails of the posterior are symmetric, the 95% highest density interval of [104.8%, 110.7%] closely aligns with the 95% central credible interval obtained earlier ([104.7%, 110.7%]). (This fact merely verifies that the two methods compute the model correctly; it does not provide additional evidence about the quality of the data.)</p>
<p><img alt="Posterior after spring data – with Stan" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_mc.svg"></p>
<p>The studies together provide so many data points that the choice of prior does not play an important role. Assuming a uniform prior on the risk ratio, i.e. assuming that we have no more <em>prior</em> evidence for +2% than for +20% or −30% change in risk, we arrive at practically the same posterior, and a 95% <span class="caps">HDI</span> of [104.7%, 110.7%].</p>
<p><img alt="Posterior after spring data, uniform prior" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_mc_uniform.svg"></p>
<h4>Exponential weekday model</h4>
<p>The exponential weekday model relaxes the assumption of a linear decrease in <span class="caps">RR</span> throughout the week, and instead models the daily increases in <span class="caps">RR</span> as exponentially decreasing. That is, for a parameter <script type="math/tex">\alpha \in [0,1]</script>, the risk ratios are determined as:</p>
<ul>
<li>
<script type="math/tex">r_\text{Mon} = 1 + \theta</script>,</li>
<li>
<script type="math/tex">r_\text{Tue} = 1 + \alpha \cdot \theta</script>,</li>
<li>
<script type="math/tex">r_\text{Wed} = 1 + \alpha^2 \cdot \theta</script>,</li>
<li>etc.</li>
</ul>
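<p>In code, the exponential weekday model is a one-line change from the linear one (again my own illustrative sketch):</p>

```python
def exp_weekday_rr(theta, alpha, days=5):
    """Risk ratios Mon..Fri under the exponential weekday model:
    the Monday increase theta decays geometrically by a factor alpha per day."""
    return [1 + theta * alpha ** d for d in range(days)]

# With alpha = 0.7, the Friday increase is 0.7**4 ~ 0.24 times the Monday
# increase -- close to the 0.2 ratio assumed by the linear weekday model
print(exp_weekday_rr(0.1, 0.7))
```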
<p>Assuming a uniform prior on both <script type="math/tex">\alpha</script> and <script type="math/tex">\theta</script>, the posterior for this model looks as follows:</p>
<p><img alt="Posterior of alpha and theta visualized together" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_mc_exp.svg"></p>
<p>I expected the posterior on <script type="math/tex">\alpha</script> to be centered much closer to zero (meaning a rapid decrease in risk after Monday), but the posterior shows the opposite: most of the plausible values of <script type="math/tex">\alpha</script> correspond to a ratio of increases <script type="math/tex">(r_\text{Fr} - 1)/(r_\text{Mo} - 1) = \alpha^4</script> greater than the 0.2 assumed previously (e.g., <script type="math/tex">\alpha = 0.7</script> already gives <script type="math/tex">0.7^4 \approx 0.24</script>). The 95% <span class="caps">HDI</span> for the Monday <span class="caps">RR</span> is [104.0%, 110.2%] (mean 107.1%), which is close to the linear weekday model, and the 95% <span class="caps">HDI</span> for <script type="math/tex">\alpha</script> is [0.66, 1.0] (mean 0.83). The figure below shows the risk ratios over the week for 20 of the sampled combinations of <script type="math/tex">(\alpha, \theta)</script>.</p>
<p><img alt="Risk ratios over the week for 20 sampled combinations of alpha-theta" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/risk_ratios_exp.svg"></p>
<p>Over the five weekdays this posterior corresponds to an average risk ratio of 105.0% (95% <span class="caps">HDI</span>: [103.1%, 107.0%]). Assuming an affected population of 1.6 billion globally, with <span class="caps">AMI</span> rates equal to those of the <span class="caps">USA</span> <a href="https://www.cdc.gov/heartdisease/heart_attack.htm"><script type="math/tex">^\textsf{[source]}</script></a>, this means that over the whole posttransitional week an additional 2700 people experience <span class="caps">AMI</span> (95% <span class="caps">HDI</span>: [1600, 3700]), on top of the regular 53,000 per week.</p>
<h3><span class="caps">AMI</span> risk after autumn transition</h3>
<p>The posterior for the autumn data, using the linear weekday model with a uniform prior on <script type="math/tex">\theta</script>, is shown below. The 95% <span class="caps">HDI</span> of [95.1%, 100.3%] suggests a decrease in <span class="caps">AMI</span> risk, but the hypothesis of “no change in risk” (<span class="caps">RR</span> = 100.0%) is also compatible with the data.</p>
<p><img alt="Posterior after autumn data" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_mc_autumn.svg"></p>
<p>Globally, this translates to a change of <span class="caps">AMI</span> counts over the whole week of −700 (95% <span class="caps">HDI</span> [−1600, +100]), from the original 53,000.</p>
<h2>Visualizing the observations and the posterior predictive distribution</h2>
<h3>Posterior predictive distribution</h3>
<p>In the figure below we visualize the posterior predictive distribution (for each day of each paper) on the spring posttransitional week, together with the actual observations.</p>
<p>These predictive distributions on <script type="math/tex">\tilde y</script> can be calculated by integrating the likelihoods <script type="math/tex">P(\tilde y \given \theta)</script> over the parameter space, weighted by the posterior probability of the parameter values <script type="math/tex">p(\theta \given \mathcal D)</script>, using the following formula:</p>
<script type="math/tex; mode=display">% <![CDATA[
P(\tilde y \given \mathcal D) =
\int P(\tilde y \given \theta, \mathcal D) \, p(\theta \given \mathcal D) \,d\theta =
\int P(\tilde y \given \theta) \, p(\theta \given \mathcal D) \,d\theta %]]></script>
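<p>In practice this integral is approximated from the Monte Carlo samples: for each posterior draw of <script type="math/tex">\theta</script>, one Poisson count is drawn. A self-contained sketch with hypothetical stand-in numbers (the location and scale roughly mimic the spring posterior found earlier; these are not the actual Stan draws):</p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins: posterior draws of theta, and one day's trend prediction
theta_samples = rng.normal(0.077, 0.015, size=20_000)
trend, weight = 138.0, 1.0   # a Monday with a trend prediction of 138 AMIs

# One predictive draw per posterior sample: y ~ Poisson(trend * (1 + w*theta))
y_pred = rng.poisson(trend * (1 + weight * theta_samples))

# 95% central predictive interval for the count observed on that day
print(np.percentile(y_pred, [2.5, 97.5]))
```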
<p><img alt="Posterior predictive distribution" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_predictive_95.svg"></p>
<p><em>(Posterior predictive distribution for spring.)</em></p>
<p>Only the Monday observation of (Sipilä et al., 2016) falls outside the 95% central credible interval (CCrI); in addition, the Thursday observation of (Sipilä et al., 2016) and the Monday observation of (Čulić, 2013) fall outside the 90% CCrI (shown <a href="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_predictive_90.svg">here</a>), indicating a good fit of the model.</p>
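The integral above can be approximated numerically on a grid of <em>θ</em> values. The sketch below assumes a Poisson likelihood for daily counts; the observation and trend numbers are hypothetical, not the papers’ data.

```python
import math

def poisson_pmf(k, lam):
    # Computed in log space for numerical stability.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Hypothetical observed daily AMI counts and trend predictions.
obs = [120, 110, 105]
trend = [110.0, 108.0, 104.0]

# Uniform prior over the risk ratio theta on a grid 0.80 .. 1.30.
thetas = [0.80 + 0.001 * i for i in range(501)]
prior = [1.0 / len(thetas)] * len(thetas)

# Posterior p(theta | D) proportional to prior * product of likelihoods.
post = []
for th, pr in zip(thetas, prior):
    lik = 1.0
    for y, t in zip(obs, trend):
        lik *= poisson_pmf(y, th * t)
    post.append(pr * lik)
z = sum(post)
post = [p / z for p in post]

# Posterior predictive: P(y_new | D) = sum_theta P(y_new | theta) p(theta | D).
def posterior_predictive(y_new, trend_new):
    return sum(poisson_pmf(y_new, th * trend_new) * p
               for th, p in zip(thetas, post))

# Sanity check: the predictive pmf should sum to ~1 over its support.
total = sum(posterior_predictive(y, 110.0) for y in range(400))
```

The grid approximation replaces the integral with a weighted sum, which is exact in the limit of a fine grid; for a one-dimensional parameter this is usually accurate enough without MCMC.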
<h2>Further research</h2>
<p>The importance of this issue depends on whether the increase in AMIs on the posttransitional week is merely a shift from the weeks afterwards. In other words, how many of these additional AMIs would have remained asymptomatic, had it not been for the <span class="caps">DST</span> transition? We suspect that this number is quite low, because the transition effectively shifts the sleep schedule by an hour, which happens relatively often (e.g. when traveling), and single-day sleep deprivations are even more common. One way to approach this question is to collect the <span class="caps">AMI</span> counts in the few weeks following a <span class="caps">DST</span> transition, and compare the results obtained from regions with <span class="caps">DST</span> and regions without <span class="caps">DST</span>.</p>
<p>The main deficiency of this meta-analysis is the assumption of equal effects across countries, implicit in the fixed effects model. This assumption could be relaxed in a random effects model, although that would introduce a subjective choice of inter-country variance, making the results harder to interpret correctly and easier to misinterpret.<sup><a href="#fn-misinterpret">[fn-1]</a><a id="fn-src-misinterpret"></a> ↓</sup></p>
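<p>For instance, a random effects variant might draw a per-country risk ratio from a shared distribution (a sketch of one possible model, not part of this analysis):</p>
<script type="math/tex; mode=display">
\theta_c \sim \mathcal{N}(\mu, \tau^2),
\qquad
y_{c,d} \sim \mathrm{Poisson}(\theta_c \cdot \mathrm{trend}_{c,d}),
</script>
<p>where <script type="math/tex">\theta_c</script> is the risk ratio in country <script type="math/tex">c</script>, and the inter-country variance <script type="math/tex">\tau^2</script> is the subjective choice mentioned above.</p>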
<p>As the absolute effect of <span class="caps">DST</span> transitions on <span class="caps">AMI</span> incidences is not substantial (given the low base rate), even on a global scale, I suggest no further research on this specific topic.<sup><a href="#fn-further">[fn-2]</a><a id="fn-src-further"></a> ↓</sup> There are many research areas around either sleep or cardiovascular health that are more important.</p>
<h2>Conclusion</h2>
<p>A standard argument against Bayesian methods is that the subjective choice of prior influences the results arbitrarily. Although this is a philosophical question, we believe meaningful and consistent probabilistic inference cannot be done without describing our initial beliefs and defining how different parameter values would result in different observations. However, in our case the likelihood of the observed data dominated the prior, rendering the choice of prior almost irrelevant.</p>
<p>Our analysis showed an increase in <span class="caps">AMI</span> risk during spring (relative risk increase 5–11% on Monday, less on later days), which translates to an additional 1600–3700 <span class="caps">AMI</span> incidences over the whole affected period. The data from the autumn transition showed either no change or a slight decrease in <span class="caps">AMI</span> risk (at most 5% relative risk decrease), translating to an estimated change in incidence counts somewhere between −1600 and +100.
These figures alone do not provide an argument against the institution of <span class="caps">DST</span>, especially without evidence that these changes are not merely the result of future <span class="caps">AMI</span> incidences advanced (in spring) or postponed (in autumn), which is the default position.
However, the analysis provides strong evidence for the hypothesis that our body can react negatively to a single hour shift in our sleep cycles, which should be a crucial factor in the evaluation of <span class="caps">DST</span>, and shows the importance of a consistent sleep schedule.</p>
<h2>License</h2>
<p><a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br /><i><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">The effects of daylight savings time adjustment on the incidence rate of acute myocardial infarction: a Bayesian meta-analysis</span></i> by <a xmlns:cc="http://creativecommons.org/ns#" href="https://treszkai.github.io/2019/11/11/dst-vs-ami" property="cc:attributionName" rel="cc:attributionURL">Laszlo Treszkai</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a> (<span class="caps">CC</span>-<span class="caps">BY</span>-4.0).</p>
<p>The data presented in the <a href="#Relevant-studies">Relevant studies</a> section belong to the original authors and they do not fall under the above <span class="caps">CC</span>-<span class="caps">BY</span>-4.0 license.</p>
<p>The software used for this analysis is distributed under the <span class="caps">MIT</span> license.</p>
<p>Please cite this work as follows:</p>
<p>Laszlo Treszkai. 2019. <em>The effects of daylight savings time adjustment on the incidence rate of acute myocardial infarction: a Bayesian meta-analysis</em>. <a href="http://treszkai.github.io/2019/11/11/dst-vs-ami">http://treszkai.github.io/2019/11/11/dst-vs-ami</a></p>
<p>BibTeX:</p>
<div class="highlight"><pre><span></span><code>@misc{treszkai2019dst,
  title = {The effects of daylight savings time adjustment on the incidence rate of acute myocardial infarction: a {B}ayesian meta-analysis},
  author = {Laszlo Treszkai},
  howpublished = {\url{http://treszkai.github.io/2019/11/11/dst-vs-ami}},
  % note = {Accessed: yyyy-mm-dd} % Optional. The document at this URL is not going to change.
  year = {2019},
  month = {nov}
}
</code></pre></div>
<h1>References</h1>
<p>Cumming, G. (2014). <em>The New Statistics: Why and How.</em> Psychological Science, 25(1), 7–29.</p>
<p>Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. <em>Bayesian Data Analysis.</em> <a href="http://www.stat.columbia.edu/~gelman/book/">link</a></p>
<p>Ronald L. Wasserstein, Nicole A. Lazar. 2016. <em>The <span class="caps">ASA</span> Statement on p-Values: Context, Process, and Purpose.</em> The American Statistician. Volume 70, Issue 2, pp. 129-133. <a href="https://doi.org/10.1080/00031305.2016.1154108">link (<span class="caps">OA</span>)</a></p>
<p>Ronald L. Wasserstein, Allen L. Schirm <span class="amp">&</span> Nicole A. Lazar. 2019. <em>Moving to a World Beyond “p < 0.05”.</em> The American Statistician. Volume 73, pp. 1–19. <a href="https://doi.org/10.1080/00031305.2019.1583913">link (<span class="caps">OA</span>)</a></p>
<p>John K. Kruschke, Torrin M. Liddell, 2018. <em>The Bayesian New Statistics.</em> Psychonomic Bulletin <span class="amp">&</span> Review. Volume 25, Issue 1, pp 178–206. <a href="https://link.springer.com/article/10.3758/s13423-016-1221-4">link (<span class="caps">OA</span>)</a></p>
<p>Amneet Sandhu, Milan Seth, Hitinder S. Gurm. 2014. <em>Daylight savings time and myocardial infarction.</em> Open Heart. <a href="http://dx.doi.org/10.1136/openhrt-2013-000019">link</a></p>
<p>Roberto Manfredini, Fabio Fabbian, Rosaria Cappadona, Alfredo De Giorgi, Francesca Bravi, Tiziano Carradori, Maria Elena Flacco, Lamberto Manzoli. 2019.
<em>Daylight Saving Time and Acute Myocardial Infarction: A Meta-Analysis</em>. Journal of Clinical Medicine. 2019, <em>8</em>, 404; <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463000/">link</a></p>
<p>Kirchberger et al. 2015. <em>Are daylight saving time transitions associated with changes in myocardial infarction incidence? Results from the German <span class="caps">MONICA</span>/<span class="caps">KORA</span> Myocardial Infarction Registry</em>. <span class="caps">BMC</span> Public Health. 2015; 15: 778. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4535383/">link</a></p>
<p>Janszky and Ljung. 2008. <em>Shifts to and from Daylight Saving Time and Incidence of Myocardial Infarction</em>. The New England Journal of Medicine. 359;18. <a href="https://www.nejm.org/doi/full/10.1056/NEJMc0807104">link</a></p>
<p>Viktor Čulić. 2013. <em>Daylight saving time transitions and acute myocardial infarction</em>. Chronobiology International. 2013; 30(5): 662–668. <a href="https://www.tandfonline.com/doi/abs/10.3109/07420528.2013.775144">link</a></p>
<p>Janszky, Ahnve, Ljung, Mukamal, Gautam, Wallentin, Stenestrand. 2012. <em>Daylight saving time shifts and incidence of acute myocardial infarction – Swedish Register of Information and Knowledge About Swedish Heart Intensive Care Admissions (<span class="caps">RIKS</span>-<span class="caps">HIA</span>)</em>. Sleep Medicine 13 (2012) 237–242. <a href="https://www.sciencedirect.com/science/article/abs/pii/S1389945711003832">link</a></p>
<p>Monica R. Jiddou, <span class="caps">MD</span>, Mark Pica, <span class="caps">BS</span>, Judy Boura, <span class="caps">MS</span>, Lihua Qu, <span class="caps">MS</span>, and Barry A. Franklin, PhD. 2013. <em>Incidence of Myocardial Infarction With Shifts to and From Daylight Savings Time</em>. The American Journal of Cardiology. Volume 111, Issue 5, Pages 631–635. <a href="http://dx.doi.org/10.1016/j.amjcard.2012.11.010">link</a></p>
<p>Jussi <span class="caps">O.T.</span> Sipilä, Päivi Rautava <span class="amp">&</span> Ville Kytö. 2016. <em>Association of daylight saving time transitions with incidence and in-hospital mortality of myocardial infarction in Finland</em>. Annals of Medicine, 48:1-2, 10-16. <a href="http://dx.doi.org/10.3109/07853890.2015.1119302">link</a></p>
<p>Young Joo Yang, Chang Seok Bang, Gwang Ho Baik, Tae Young Park, Suk Pyo Shin, Ki Tae Suk, Dong Joon Kim. 2017.
<em>Prokinetics for the treatment of functional dyspepsia: Bayesian network meta-analysis</em>.
<span class="caps">BMC</span> Gastroenterology 17:83 <span class="caps">DOI</span> 10.1186/s12876-017-0639-0. <a href="https://bmcgastroenterol.biomedcentral.com/track/pdf/10.1186/s12876-017-0639-0">link (<span class="caps">OA</span>)</a></p>
<p>Xiaole Su, Xinfang Xie, Lijun Liu, Jicheng Lv, Fujian Song, Vlado Perkovic, Hong Zhang. 2017.
<em>Comparative Effectiveness of 12 Treatment Strategies for Preventing Contrast-Induced Acute Kidney Injury: A Systematic Review and Bayesian Network Meta-analysis</em>.
American Journal of Kidney Diseases. Volume 69, Issue 1, pp. 69–77.
<span class="caps">DOI</span>: 10.1053/j.ajkd.2016.07.033, <a href="https://www.ajkd.org/article/S0272-6386(16)30421-8/fulltext">link</a></p>
<p>Devin Incerti. 2015. <em>Bayesian Meta-Analysis with R and Stan</em>. Self-published, online. <a href="https://devinincerti.com/2015/10/31/bayesian-meta-analysis.html">link</a>. Retrieved 4 Oct 2019.</p>
<hr>
<h1>Appendix</h1>
<h2>Characteristics of studies</h2>
<h3>Janszky and Ljung (2008)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>source: the Swedish registry of acute myocardial infarction (“which provides high-quality information on all acute myocardial infarctions in the country since 1987”)</li>
<li>years: 1987–2006</li>
<li>observations: the incidence of <span class="caps">AMI</span> during each of the first 7 days after the spring or autumn transition</li>
<li>trend: the mean of the incidences on the corresponding weekdays 2 weeks before and 2 weeks after the day of interest</li>
<li>total <span class="caps">AMI</span> cases on spring posttransitional week: 10,776</li>
</ul>
<p><strong>Quotes</strong>:</p>
<blockquote>
<p>The effects of transitions were consistently more pronounced for people under 65 years of age than for those 65 years of age or older.</p>
</blockquote>
<p>The authors properly controlled for the Easter holiday.</p>
<blockquote>
<p>Analyses of the data for the spring shift are based on the 15 years between 1987 and
2006 in which Easter Sunday was not the transition day.
[…]
For years in which Easter
Sunday was celebrated 2 weeks after the Sunday of the spring shift, we defined the control period for the Sunday of
the shift as the Sunday 3 weeks before and the Sunday 3 weeks after (thus skipping Easter Sunday).</p>
</blockquote>
<p><strong>Overanalysis</strong>:</p>
<p>The following observations do not have any plausible explanation, and are probably just noise. Question: did later studies confirm these findings?</p>
<p>1.</p>
<blockquote>
<p>When we did not exclude Easter if it coincided with the exposure or control days, we observed an even higher effect size associated with the spring transition.</p>
</blockquote>
<p>2.</p>
<blockquote>
<p>For the autumn shift, in contrast to the analyses of all acute myocardial infarctions, analyses restricted to fatal cases showed a smaller decrease in the incidence of acute myocardial infarction on Monday, and the risk of fatal acute myocardial infarction increased during the first week after the shift.</p>
</blockquote>
<p>3.</p>
<blockquote>
<p>The effect of the spring transition to daylight saving time on the incidence of acute myocardial infarction was somewhat more pronounced in women than in men, and the autumn effect was more pronounced in men than in women.</p>
</blockquote>
<p><strong>Additional information</strong>:</p>
<p>The authors were employed by institutions in Stockholm, Sweden, so their use of the Swedish registry is <em>no evidence of selection bias</em>. Furthermore, the 20-year period of their study ends only two years before the date of the publication.</p>
<h3>Janszky et al. (2012)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>those <span class="caps">AMI</span> patients who were admitted to CCUs at participating hospitals</li>
<li>from 1995 to 2007</li>
<li>dataset: Register of Information and Knowledge about Swedish Heart Intensive Care Admissions (<span class="caps">RIKS</span>-<span class="caps">HIA</span>)</li>
<li>total <span class="caps">AMI</span> cases during spring posttransitional week: 3235.9</li>
</ul>
<p>This study didn’t publish per-day <span class="caps">AMI</span> counts, only the total during the whole posttransitional week.</p>
<p>The time period matches exactly that of Janszky and Ljung (2008), and every case included in this study was also included in Janszky and Ljung (2008). As such, this study doesn’t add new information to the previous work with regards to the variables we consider, and it is <strong>excluded from our meta-analysis</strong> in order to avoid double-counting.</p>
<p>As the authors put it:</p>
<blockquote>
<p>The study populations of the present and our previous study
overlapped substantially. Our previous analyses included all AMIs
detected either at a hospital or at an autopsy in Sweden from
1987 to 2006, a clear strength. In the present work, we investigated
only those <span class="caps">AMI</span> patients who were admitted to CCUs at participating
hospitals from 1995 to 2007. Although this limited our power
substantially, it allowed us to examine clinical factors that might
modify the risks related to <span class="caps">DST</span> transitions.</p>
</blockquote>
<h3>Čulić (2013)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>patients hospitalized because of <span class="caps">AMI</span></li>
<li>from 1990 to 1996</li>
<li>40 patients on workdays following <span class="caps">DST</span> change</li>
<li>at University Hospital Centre Split in Split, Croatia</li>
</ul>
<p>It is unclear whether the trend prediction is made from the 2 weeks before and after the posttransitional week, or from all 50 nontransitional weeks:</p>
<blockquote>
<p>The incidence ratios of <span class="caps">AMI</span> for the first week after the
two <span class="caps">DST</span> shifts (posttransitional weeks) and each day of
that week were estimated by dividing the incidence
during those periods with the average incidences during
corresponding days and weeks throughout the year: 2
wks before and 2 wks after the posttransitional week,
and the 50 nontransitional weeks of the year altogether.</p>
</blockquote>
<p>It is unclear why exactly the data from 1990 to 1996 was analyzed, if the study was conducted in 2013. This is <em>suggestive of selection bias</em>.</p>
<p><strong>Overanalysis</strong>:</p>
<p>23 additional variables were analyzed (sex, employment status, use of β-blocker, etc.); some were bound to have low p-values:</p>
<blockquote>
<p>The independent predictors for <span class="caps">AMI</span> during
this period in spring were male sex (p = 0.03) and nonengagement in physical activity (p = 0.02) and there was a trend
for the lower risk of incident among those taking calcium antagonists (p = 0.07). In autumn, the predictors were
female sex (p = 0.04), current employment (p = 0.006), not taking b-blocker (p = 0.03), and nonengagement in
physical activity (p = 0.02).</p>
</blockquote>
<h3>Jiddou et al. (2013)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>a retrospective electronic chart review</li>
<li>all patients presenting to the emergency centers at Beaumont Hospitals in Royal Oak and Troy, Michigan, with the primary diagnosis of <span class="caps">AMI</span></li>
<li>age: patients aged >18 years (mean 70 ± 15 years)</li>
<li>exclusion conditions: minor, pregnant</li>
<li>from October 2006 to April 2012 (7 years)</li>
<li>trend: patients admitted with comparable diagnoses on the corresponding weekdays 2 weeks before and 2 weeks after the shifts to and from <span class="caps">DST</span></li>
<li>additional variables: demographic data, medical history, tobacco use, prescribed medications, whether the patient underwent cardiac catheterization; diagnosis of hypertension, hyperlipidemia, and coronary artery disease.</li>
</ul>
<p><strong>Quotes</strong>:</p>
<blockquote>
<p>2 AMIs occurred on Easter Sunday and were considered potential confounders and excluded.</p>
</blockquote>
<p>It is correct to note the incidences on Easter Sunday, but the incidences on Easter <em>Monday</em> would be even more important. Even then, it is only correct to exclude the patients entirely if the relevant control incidences are also reduced – it is unclear whether this trend correction happened.</p>
<h3>Sandhu et al. (2014)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>Time: 1 January 2010 – 15 September 2013 (3 fall and 4 spring <span class="caps">DST</span> changes; 1354 days)</li>
<li>Procedural data for hospital admissions where <span class="caps">PCI</span> was performed in the setting of <span class="caps">AMI</span></li>
<li>Number of cases: 42,060 hospital admissions for <span class="caps">AMI</span> requiring <span class="caps">PCI</span> occurred during the study period.</li>
<li>The median daily <span class="caps">AMI</span> total was 31, ranging from a minimum of 14 to a maximum of 53 admissions.</li>
</ul>
<p><strong>Results</strong>:</p>
<blockquote>
<p>There was no difference in the total weekly number of PCIs performed for <span class="caps">AMI</span> for either the fall or spring time changes in the time period analysed. After adjustment for trend and seasonal effects, the Monday following spring time changes was associated with a 24% increase in daily <span class="caps">AMI</span> counts (p=0.011), and the Tuesday following fall changes was conversely associated with a 21% reduction (p=0.044). No other weekdays in the weeks following <span class="caps">DST</span> changes demonstrated significant associations.</p>
</blockquote>
<p><strong>Analysis</strong>:</p>
<p>I was unable to obtain the data at <a href="https://bmc2.org">Blue Cross Blue Shield of Michigan</a> and the study did not include the number of <span class="caps">AMI</span> cases numerically, therefore I estimated it from the chart in Figure 3 (which was accurate to 0.4 <span class="caps">AMI</span>).</p>
<h3>Kirchberger et al. (2015)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li><span class="caps">AMI</span> count: 25,499 cases of <span class="caps">AMI</span></li>
<li>data source: <span class="caps">MONICA</span>/<span class="caps">KORA</span> Myocardial Infarction Registry (<a href="https://www.helmholtz-muenchen.de/herzschlag-info/">link</a>; public data should be published yearly according to <a href="http://www.gbe-bund.de/gbe10/abrechnung.prc_abr_test_logon?p_uid=gast&p_aid=0&p_knoten=FID&p_sprache=E&p_suchstring=7014">this website</a>, but I did not find a link to download the dataset)</li>
<li>time period: 1 January 1985 and 31 October 2010 (26 spring and 25 fall <span class="caps">DST</span> changes – 2010 fall adjustment was on 31 October)</li>
<li>ages: 25–74</li>
<li>includes: coronary death and <span class="caps">AMI</span></li>
<li>location: city of Augsburg (Germany) and the two adjacent counties (about 600,000 inhabitants)</li>
<li>additional variables: information on re-infarction, various medication prior to <span class="caps">AMI</span>, current occupation, history of hypertension, hyperlipidemia, diabetes, smoking, and obesity.</li>
<li>confounders accounted for: global time trend, temperature, relative humidity, barometric pressure, and indicators for month of the year, weekday and holiday</li>
</ul>
<p><strong>Quotes</strong>:</p>
<blockquote>
<p>The final model included the following covariates: time trend and previous two day mean relative humidity as regression splines with four and two degrees of freedom, respectively, previous two day mean temperature as a linear term and day of the week as categorical variable.</p>
<p>The optimized spring model [of the data from March and April, excluding the week in question] included time trend and same day mean relative humidity as regression splines with six and three degrees of freedom.</p>
</blockquote>
<p>Six d.o.f. for 2 months is probably overfitting the data, even though it was the sum of 26 years. However, it shouldn’t have a predictable effect, and its overall impact is probably negligible.</p>
<blockquote>
<p>The incidence rate ratio was assessed as observed over expected events per day and the mean per weekday and corresponding 95% confidence intervals were calculated.</p>
</blockquote>
<p>However, it is not stated how the confidence intervals were calculated: most importantly, which statistical test was used?</p>
<p><strong>Analysis</strong>:</p>
<p>The paper stated only the calculated RRs for the spring and autumn prediction models (for all seven days), not the actual <span class="caps">AMI</span> counts.
Assuming the researchers analyzed the data in an honest manner (i.e. not picking model parameters for lower trend prediction and thus more significant observed increase), and that the model didn’t predict large deviations from the 2.7 <span class="caps">AMI</span>/day average, we can calculate a close approximation of the observations as <script type="math/tex">\mathrm{RR}_d \cdot \mathrm{trend}</script>.</p>
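The reconstruction described above amounts to one multiplication per day; a minimal sketch follows. The relative risks below are placeholders, not the values published by Kirchberger et al.; only the 2.7 AMI/day trend and the 26 transitions come from the text.

```python
# Trend from the text: about 2.7 AMI/day in the Augsburg registry,
# summed over the 26 observed spring transitions.
trend_per_day = 2.7 * 26

# Placeholder relative risks for the posttransitional Mon..Sun
# (illustrative values, not the ones published in the paper).
rr = [1.10, 1.05, 1.02, 1.00, 0.99, 1.01, 1.00]

# Approximate observed counts: obs_d ≈ RR_d * trend.
approx_obs = [round(r * trend_per_day, 1) for r in rr]
```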
<h3>Sipilä et al. (2016)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>years: 2001–2009, except 2002 and 2005 (due to Easter). 7 years.</li>
<li>Exclusion criterion: age < 18.</li>
<li>Age: mean age 71.2, <span class="caps">SD</span> 12.6 years</li>
<li>2 weeks prior and 3 weeks after <span class="caps">DST</span> transition</li>
<li>all 22 Finnish hospitals with coronary catheterization laboratory that treat emergency cardiac patients</li>
<li>
<p>database: Finnish Care Register for Health Care (<span class="caps">CRHC</span>), a nationwide, obligatory and automatically collected hospital discharge database.</p>
</li>
<li>
<p>Study group: posttransitional week</p>
</li>
<li>Control group: 2 weeks before/after posttransitional week</li>
<li>Easter in study group: 2002, 2005. “Years with <span class="caps">DST</span> spring transition on Easter Sunday were excluded from the analysis (2002 and 2005) to increase international comparability and avoid confounding”</li>
<li>Easter in control group: “When Easter Sunday was celebrated within 2 weeks after <span class="caps">DST</span> transition, post-<span class="caps">DST</span> control weeks after Easter were selected.”</li>
<li>Spring study+control group size: 1269+5029 = 6298</li>
<li>Standardized incidence of <span class="caps">MI</span> admissions in participating hospitals during spring study period was 259/100,000 person-years.</li>
</ul>
<p><strong>Quotes</strong>:</p>
<blockquote>
<p>Incidence of <span class="caps">MI</span> admissions was similar to control
weeks for Sunday–Tuesday after <span class="caps">DST</span> transition
(Figure 1). However, on fourth day after transition
(Wednesday), there was a significant increase in <span class="caps">MI</span>
incidence compared to control weeks (<span class="caps">IR</span> 1.16; <span class="caps">CI</span> 1.01– 1.34).</p>
</blockquote>
<p>Is there anything special about the <em>Wednesday</em> that follows a <span class="caps">DST</span> transition? One should not be surprised if a value falls outside of a 95% confidence/credible interval – after all, it happens <em>at least</em> 5% of the time even in the absence of any “interesting” effect.</p>
<blockquote>
<p>Patients admitted
during the week after <span class="caps">DST</span> transition were less likely to
have diagnosed diabetes or ventricular arrhythmias
compared to patients admitted during control weeks,
but had diagnosed renal failure more often.</p>
</blockquote>
<p>There is no simple and plausible explanation for this, therefore it is more probable that this is a result of finding patterns in noise.</p>
<blockquote>
<p>Population-based incidence
of <span class="caps">MI</span> admissions to participating hospitals during
spring and autumn periods were calculated using
corresponding population data of mainland Finland
obtained from Statistics Finland and standardized to
European standard population 2013 by using the direct method.</p>
</blockquote>
<p>The meaning of the above statement is unclear.</p>
<h2>Footnotes</h2>
<h4>Footnote 1</h4>
<p>“Sleep researchers show a 20% increase in risk of heart attacks in Michigan but a 10% decrease in Finland, so it is advised to travel to Europe for this week.”</p>
<p><a href="#fn-src-misinterpret">[back to source]</a> ↑</p>
<h4>Footnote 2</h4>
<p>Originally, I wrote the following:</p>
<blockquote>
<p>Further research could analyze the publication bias (if you know how to do that in a Bayesian framework, please mention it in the comments below), or analyze more data, preferably from multiple countries. Maybe the <span class="caps">DST</span> transition has a smaller effect on the Finnish population than on the Swedish population, which could easily be analyzed using Bayesian statistics.</p>
</blockquote>
<p>But then I calculated the absolute global effect, which is quite small, therefore the updated recommendation.</p>
<p><a href="#fn-src-further">[back to source]</a> ↑</p>
<h1>Trust in numbers — notes of a talk given by Sir David Spiegelhalter</h1>
<p><em>Laszlo Treszkai, 2019-10-08</em></p>
<p>Summary of a keynote talk given by Sir David Spiegelhalter about the reporting of medical results.</p>
<p>The Institute of Medical Statistics of the Center for Medical Statistics, Informatics and Intelligent Systems at the Medical University of Vienna <a href="https://cemsiis.meduniwien.ac.at/50years-of-ms/">just turned 50 years old</a>, and they organized a two-day event around it. I was fortunate to attend the keynote talk of Sir David Spiegelhalter (<a href="https://en.wikipedia.org/wiki/David_Spiegelhalter">wiki</a>), a British statistician and <a href="https://en.wikipedia.org/wiki/Winton_Professorship_of_the_Public_Understanding_of_Risk">Winton Professor of the Public Understanding of Risk</a> at the Faculty of Mathematics, University of Cambridge; it was one of the most entertaining <em>and</em> informative talks I have heard. There is no way I can do justice to the talk, and I wouldn’t even attempt to bring through the humor (his <em>humour</em>) – the goal of this post is to increase your vigilance a little bit when it comes to any reports about science, and to shed light on the work of Spiegelhalter.</p>
<p>The professor has authored several academic books on statistics, and was interviewed by <span class="caps">CNN</span> under the title <a href="https://edition.cnn.com/videos/tv/2019/04/01/amanpour-david-spiegelhalter-statistics.cnn"><em>Why statistics should make you suspicious</em></a>. He also does a huge service to science in a number of other ways.</p>
<p>The problem explained in the talk was that <strong>numbers are used to persuade people, not to inform them</strong>. (Actually, that was only the first half – the second half offered a handful of steps we could take when presenting our data.) Take for example politics, and the campaign around Brexit. Even if it were true that it costs £350 million a week for the <span class="caps">UK</span> to be a member of the <span class="caps">EU</span>, it would be much less misleading if it said that it costs 80 pence <em>per person per day</em> to be a member of the <span class="caps">EU</span>. The cost of a bag of potato chips. (The other side committed similar errors too – I’m not trying to win a battle here.)</p>
<p><img alt="We send the EU £350 million a week; let’s fund our NHS instead. Vote Leave." src="https://www.treszkai.com/2019/10/08/trust-in-numbers/nhs.png"></p>
<p>As Eliezer Yudkowsky says, <a href="https://www.lesswrong.com/posts/9weLK2AJ9JEt2Tt8f/politics-is-the-mind-killer">politics is the mind-killer</a>, but of course, using numbers to mislead instead of to show an honest representation of reality is done everywhere where there are numbers. My favorite topic these days: <strong>medical statistics</strong>. I’m picking a topic from the talk as an example (which Spiegelhalter analyzed in more detail in a <a href="https://medium.com/wintoncentre/are-we-individuals-or-members-of-populations-the-deeper-issues-behind-the-sausage-wars-a067aebf2063">Medium post</a>): dietary advice about processed meat consumption. <span class="caps">CNN</span> did a <a href="https://edition.cnn.com/2019/04/17/health/colorectal-cancer-risk-red-processed-meat-study-intl/index.html">great job</a> with picking the title of their article to be as close to the original conclusions as possible: <em>Eating just one slice of bacon a day linked to higher risk of colorectal cancer, says study</em>. But by the time this study reaches The Sun, it gets reported as the following:</p>
<p><img alt="Rasher of bacon a day is deadly" src="https://www.treszkai.com/2019/10/08/trust-in-numbers/bacon.jpg"></p>
<p>Boy, that escalated quickly. And what does “higher risk of colorectal cancer” mean anyhow? In this case, the study showed a 19% increase. As Peter Attia explains in his detailed post series on science, <a href="https://peterattiamd.com/ns001/">Studying Studies</a>, such big numbers generally mean an increase in <em>relative risk</em>, not in <em>absolute risk</em>. Relative risk is meaningless without knowing the base rate of the disease. In this case, 5% of <span class="caps">US</span> men and women born today are expected to be diagnosed with colorectal cancer sometime during their lives. Add 19% to that 5% figure (i.e., multiply it by 1.19), and you get 6%, for the people who eat 1 slice of bacon a day. (The 5% figure is surprisingly high, by the way! Fortunately, it has a five-year survival rate of 65%. I don’t know how much of the 5% is a false positive; I guess it doesn’t include the disconfirmed cases. These figures I just gathered from <a href="https://en.wikipedia.org/wiki/Colorectal_cancer#Epidemiology">Wikipedia</a>, <span class="caps">FWIW</span>.)</p>
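The relative-to-absolute conversion above, spelled out in a few lines (figures as quoted in the post):

```python
base_rate = 0.05          # lifetime colorectal-cancer risk in the US, per the post
relative_increase = 0.19  # relative risk increase reported by the study

# Absolute risk for daily bacon eaters: multiply the base rate by 1.19.
absolute_risk = base_rate * (1 + relative_increase)

# Extra lifetime cases per 100 people: roughly one person.
extra_per_100 = (absolute_risk - base_rate) * 100
```

The point of writing it out: the headline-grabbing "19%" acts on a 5% base rate, so the absolute change is under one percentage point.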
<p>You can take the extra step and visualize these numbers using what <a href="https://en.wikipedia.org/wiki/Gerd_Gigerenzer">Gigerenzer</a> calls natural frequencies. As one Wikipedia author puts it, “the problem is not simply in the human mind, but in the representation of the information”, so let’s deliver it using things we evolved to understand: a small tribe of human-like icons.</p>
<p><em>All</em> of these people below eat a clean diet without processed meat, and those with a distraught face will get colorectal cancer:</p>
<p>😎😎😎😎😎😎😫😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😫😎😎😎😎😎😎<br />
😎😎😫😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😫<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />😎😎😎😎😎😎😎😫😎😎</p>
<p>And <em>all</em> of these people eat a slice of <a href="https://en.wikipedia.org/wiki/Extrawurst">Extrawurst</a> daily:</p>
<p>😎😎😎😎😎😎😫😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😫😎😎😎😎😎😎<br />
😎😎😫😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😫<br />
😎😎😎😎😎😎😎😎😎😎<br />
😫😎😎😎😎😎😎😎😎😎<br />😎😎😎😎😎😎😎😫😎😎</p>
<p>See the difference? It’s that one troubled guy in row 9.</p>
<p>Now, I’m not saying bacon is good for your health, or that this additional risk factor is negligible (admittedly, my mocking tone above suggests otherwise). But if scientists, journalists, and clinicians report the risk honestly, <em>and</em> no-one is trying to influence us into eating more burgers by playing on our primal instincts (including the marketing division of McDonald’s and the social group that calls us chicken if we don’t eat our <a href="https://en.wikipedia.org/wiki/Black_pudding">black pudding</a>), then we puny humans could make more educated decisions about which sacrifices we are willing to make.</p>
<p>This post was just a tiny part of what was said at the talk. In parting, I have two takeaway quotes. First,</p>
<blockquote>
<p>80% of statistics are false.</p>
</blockquote>
<p>(From an anonymous statistician, a comedian, and also <a href="https://www.youtube.com/embed/aHGd6LqAVzw?start=43">Elon Musk</a>.) Unfortunately, this factoid alone doesn’t enable one to navigate reality.</p>
<p>The second quote is of a little more value, but still doesn’t help one sift through statistics:</p>
<blockquote>
<p>There’s no point in being trustworthy if you’re boring.</p>
</blockquote>
<p>(From Spiegelhalter in today’s talk.)</p>
<p>This talk was anything but boring. If you have a chance to see Spiegelhalter in person, do so: he gets my highest grade recommendation. (He also has a book, titled <a href="https://smile.amazon.com/Art-Statistics-How-Learn-Data/dp/1541618513"><em>The Art of Statistics</em></a>, which I haven’t read.)</p>
<p>(Somewhat related: just today on my way home I learned of Edward Tufte’s book, <em>The Visual Display of Quantitative Information,</em> which also <a href="https://www.edwardtufte.com/tufte/books_vdqi">looks amazing</a>.)</p>On the overconfidence of modern neural networks2019-09-26T00:00:00+02:002019-09-26T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-09-26:/2019/09/26/overconfidence/<p>Evaluating various methods to improve the calibration of deep neural networks.</p><p><em>On the overconfidence of modern neural networks</em>. This is the title of the coursework I did with a fellow student at the University of Edinburgh. (<span class="caps">PDF</span>: <a href="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw3.pdf">Part 1</a>, <a href="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw4.pdf">Part 2</a>.)</p>
<p>Our topic was influenced by a previous study, titled <em>On Calibration of Modern Neural Networks</em> <!-- {% cite Guo2017-calibration %} --> <a class="citation" href="#Guo2017-calibration">(Guo, Pleiss, Sun, <span class="amp">&</span> Weinberger, 2017)</a>.</p>
<p>Applications of uncertainty estimation include threshold-based outlier detection, active learning, uncertainty-driven exploration in reinforcement learning, and certain safety-critical applications.</p>
<h2>What is uncertainty?</h2>
<p>No computer vision system is perfect, so an image classification algorithm sometimes identifies people as not-people, or not-people as people.
While we usually care about the class with the highest output (the “most likely” class), we can treat the softmax outputs of a classifier as uncertainty estimates.
(After all, that is how the model was trained: by treating the softmax outputs as a probability distribution, and minimizing the negative log-likelihood of the model given the data.)
For example, out of 1000 classifications made with an output of 0.8, approximately 800 should be correct <em>if the system is well-calibrated</em>.</p>
<p><img alt="Example output of a YOLO object detection network" src="https://www.treszkai.com/2019/09/26/overconfidence/yolo.png"></p>
<p>(Example output of a <span class="caps">YOLO</span> object detection network, with the probability estimates. Image source: <a href="https://www.analyticsvidhya.com/blog/2018/12/practical-guide-object-detection-yolo-framewor-python/">Analytics Vidhya</a>.)</p>
<p>Ideally, we want our system to be 100% correct, but we rarely have access to an all-knowing oracle. In cases where it is hard to distinguish between two categories (like the cat-dog below), we want the uncertainties to be well-calibrated, so that predictions are neither overly confident nor insufficiently confident.</p>
<p><img alt="Image of a cat that could be mistaken for a dog" src="https://www.treszkai.com/2019/09/26/overconfidence/catdog.jpeg"></p>
<p>(Image source: Google Brain)</p>
<h2>Our results</h2>
<h3>Interim report</h3>
<p><a href="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw3.pdf">Link to report (<span class="caps">PDF</span>)</a></p>
<p>Our initial experiments showed that our baseline model is already well-calibrated when trained on the <span class="caps">EMNIST</span> By-Class dataset.
Calibration worsened when we used only a subset of the training set.
We found that increasing regularization improves calibration, but too much regularization degrades both accuracy and calibration. (See figure below.)
This contradicts the findings of <!-- {% cite Guo2017-calibration -L section -l 3 %} --> <a class="citation" href="#Guo2017-calibration">(Guo, Pleiss, Sun, <span class="amp">&</span> Weinberger, 2017, sec. 3)</a>, who found that model calibration can keep improving as the weight decay constant is increased, well after the model achieves its minimum classification error.
One of our main findings is that cross-entropy error is not a good indicator of model calibration.</p>
<p><img alt="Figure 5 of our interim report." src="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw3-fig5.png"></p>
<p>(<span class="caps">ECE</span>: expected calibration error. The lower the better.)</p>
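<p><span class="caps">ECE</span> can be computed by binning predictions by confidence and taking the size-weighted average gap between each bin’s accuracy and its mean confidence. A minimal sketch (the bin count and the toy data are illustrative):</p>

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; average the |accuracy - confidence|
    gap over bins, weighted by the fraction of samples in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
        ece += in_bin.mean() * gap
    return ece

# A perfectly calibrated toy example: of ten predictions made with
# confidence 0.8, exactly eight are correct.
conf = [0.8] * 10
corr = [1] * 8 + [0] * 2
print(expected_calibration_error(conf, corr))  # ≈ 0.0
```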
<h3>Final report</h3>
<p><a href="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw4.pdf">Link to report (<span class="caps">PDF</span>)</a></p>
<p>We replicate the findings of <!-- {% cite Guo2017-calibration %} --> <a class="citation" href="#Guo2017-calibration">(Guo, Pleiss, Sun, <span class="amp">&</span> Weinberger, 2017)</a> that deep neural networks achieve higher accuracy but worse calibration than shallow nets, and compare different approaches for improving the calibration of neural networks (see figure below). As the baseline approach, we consider the calibration of the softmax outputs from a single network; this is compared to <em>deep ensembles</em>, <em><span class="caps">MC</span> dropout</em>, and <em>concrete dropout</em>. Through experiments on the <span class="caps">CIFAR</span>-100 data set, we find that a large neural network can be significantly over-confident about its predictions. We show on a classification problem that an ensemble of deep networks has better classification accuracy and calibration compared to a single network, and that <span class="caps">MC</span> dropout and concrete dropout significantly improve the calibration of a large network.</p>
<p><img alt="Confidence and calibration plots for BigNet. (Figure 2 of our report)" src="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw4-fig2.png"></p>
<p>(<em>Top row:</em> confidence plots for a deep neural net. The more skewed to the right, the better. <em>Bottom row:</em> corresponding calibration plots. The closer to the diagonal, the better.)</p>
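<p>Of the compared methods, <span class="caps">MC</span> dropout is the simplest to sketch: keep dropout active at prediction time and average the softmax outputs of several stochastic forward passes. A toy numpy version (the tiny two-layer “network” and all sizes are made up for illustration):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def stochastic_forward(x, W1, W2, drop_rate=0.5):
    """One forward pass with dropout left on, as at training time."""
    h = np.maximum(0.0, W1 @ x)              # ReLU hidden layer
    mask = rng.random(h.shape) >= drop_rate  # random dropout mask
    h = h * mask / (1.0 - drop_rate)         # inverted-dropout scaling
    return softmax(W2 @ h)

def mc_dropout_predict(x, W1, W2, n_passes=100):
    """Average class probabilities over stochastic forward passes."""
    return np.mean([stochastic_forward(x, W1, W2) for _ in range(n_passes)],
                   axis=0)

x = rng.standard_normal(8)
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((10, 16))
probs = mc_dropout_predict(x, W1, W2)
print(probs.sum())  # the averaged probabilities still sum to 1
```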
<h2>Things I would do differently</h2>
<p>With a bit more experience under my belt now, I would make the following changes to the experiment design and the report:</p>
<ul>
<li><em>Use a validation set.</em> We only used a training set because we trained for minimum error, and we expected <em>calibration</em> to be independent of <em>accuracy</em>, but that is a strong assumption (and likely incorrect, given our results in the interim report).</li>
<li><em>Use better bibliography sources.</em> Instead of Google Scholar, I would search <a href="https://dblp.uni-trier.de/"><span class="caps">DBLP</span></a>, where the information is more correct and consistent.</li>
<li><em>Use pastel colors.</em> I let my collaborator have his way, but ever since this submission I’ve been having nightmares in purple and glowing green :D</li>
</ul>
<p>In future work, I would like to test the calibration of a Bayesian neural network, where the weights of the network have a probability distribution instead of a point estimate.</p>
<h2>References</h2>
<!-- {% bibliography --cited %} -->
<ol class="bibliography"><li><span id="Guo2017-calibration">Guo, C., Pleiss, G., Sun, Y., <span class="amp">&</span> Weinberger, <span class="caps">K. Q.</span>(2017). On Calibration of Modern Neural Networks. In D. Precup <span class="amp">&</span> <span class="caps">Y. W.</span> Teh (Eds.), <i>Proceedings of the 34th International Conference on Machine Learning</i> (Vol. 70, pp. 1321–1330). International Convention Centre, Sydney, Australia: <span class="caps">PMLR</span>. Retrieved from http://proceedings.mlr.press/v70/guo17a.html</span></li></ol>Paper summary: Abbeel, Ng: Inverse Reinforcement Learning (2004)2019-08-19T00:00:00+02:002019-08-19T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-08-19:/2019/08/19/irl-summary/<p>Summary of the seminal paper on inverse reinforcement learning.</p><p>This post is a summary of the seminal paper on inverse reinforcement learning: Pieter Abbeel, Andrew Y. Ng: <em>Apprenticeship Learning via Inverse Reinforcement Learning</em> (2004) [<a href="http://ai.stanford.edu/~pabbeel/irl/">link</a>].</p>
<p>Traditional <a href="http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html">reinforcement learning</a> (<span class="caps">RL</span>) starts with specifying a reward function, and during training we search for policies that maximize this reward function<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. In contrast, inverse reinforcement learning (<span class="caps">IRL</span>) starts with expert demonstrations of the desired behavior, infers a reward function that the expert likely followed, and trains a policy to maximize that.</p>
<!-- more -->
<p><span class="caps">IRL</span> is useful for learning complex tasks where it is hard to manually specify a reward function that makes desirable trade-offs between desiderata; such tasks include driving a car or teaching a robot to do a backflip, where we want the car to reach to the destination promptly but also safely, or the robot to flip with its arms straight and <a href="https://youtu.be/xet3KDUfS_U?t=50">sticking the landing</a>.</p>
<p>In contrast with previous attempts at apprenticeship learning (i.e. learning from an expert), which tried to mimic the expert demonstrations directly, <span class="caps">IRL</span> assumes that the expert follows a reward function that is a linear combination of the feature vectors (<script type="math/tex">R = w^T φ(s)</script>), and finds a reward function that maximizes the received reward under the set of demonstrations. The hand-specified function <script type="math/tex">φ: S→ℝ^k</script> maps a state of the Markov decision process (<span class="caps">MDP</span>) to a feature vector, which vector includes parameters for the different desiderata of the task, such as the distances to objects surrounding the car, the speed of the car, or the current lane.</p>
<p><span class="caps">IRL</span> assumes knowledge of an expert policy <script type="math/tex">π_E</script>, or at least samples from it. Using these, we only care about the estimated “accumulated feature values”, <script type="math/tex">μ(π_E) ∈ ℝ^k</script>, which is the expected discounted sum of the feature vectors if sampled from the policy, because the value of a policy under the reward parametrised by <script type="math/tex">w</script> can then be calculated from it directly: <script type="math/tex">R = w^T μ(π_E)</script>.</p>
<p>The goal is then to find a policy whose performance is close to that of the expert’s on the unknown reward function <script type="math/tex">R_{\star} = w^T_{\star} φ</script>. This is done by finding a policy whose feature vector is close to the expert’s feature vector, which assures that the value of these policies is close too.</p>
<p>The algorithm for <span class="caps">IRL</span> is the following:</p>
<ol>
<li>Pick a random initial policy, and calculate its <script type="math/tex">μ</script>.</li>
<li>Find the vector of weights w that lies within the unit ball and <em>maximizes</em> the difference between the expert feature expectations and the feature expectations of our best policy thus far.</li>
<li>If this maximum is small, then go to step 7.</li>
<li>Otherwise <script type="math/tex">w</script> is our new weights for <script type="math/tex">R</script>.</li>
<li>Calculate optimal policy for this <script type="math/tex">R</script>.</li>
<li>Repeat from step 2.</li>
<li>Let the agent designer pick a policy from any of those found in step 5 in the different iterations; or find the policy in the convex closure of these policies that is closest to the expert policy.</li>
</ol>
<p>The maximization in step 2 allows us to find a policy that is close to the expert’s, regardless of the choice of a reward function. After all, we are interested in the policy, not the reward function, and so the estimated <script type="math/tex">R</script> is not necessarily correct.</p>
<p>This algorithm is proved to terminate within <script type="math/tex">O(k \log(k))</script> iterations, using at least <script type="math/tex">O(k \log(k))</script> samples from the expert policy.</p>
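<p>Under strong simplifying assumptions, the loop in steps 1–6 above can be sketched as follows. Here <code>optimal_policy_for</code> and <code>feature_expectations</code> are hypothetical placeholders for an <span class="caps">MDP</span> solver and a rollout estimator, and step 2’s maximization over the unit ball is taken against the single closest feature-expectation vector found so far, rather than the full margin over all previous policies:</p>

```python
import numpy as np

def irl_loop(mu_expert, optimal_policy_for, feature_expectations,
             epsilon=1e-3, max_iters=100):
    """Simplified apprenticeship-learning loop.

    `optimal_policy_for(w)` solves the MDP for reward R = w @ phi(s);
    `feature_expectations(policy)` estimates mu(policy).  Against a
    single candidate, the unit-ball vector maximizing
    w @ (mu_expert - mu) is just the normalized difference.
    """
    policies, mus = [], []
    policy = optimal_policy_for(np.zeros_like(mu_expert))  # step 1: initial policy
    for _ in range(max_iters):
        mus.append(feature_expectations(policy))
        policies.append(policy)
        # step 2: distance to the closest feature expectations so far
        best_diff = min((mu_expert - mu for mu in mus), key=np.linalg.norm)
        margin = np.linalg.norm(best_diff)
        if margin <= epsilon:           # step 3: expert matched closely enough
            break
        w = best_diff / margin          # step 4: new reward weights
        policy = optimal_policy_for(w)  # step 5: optimal policy for this reward
    return policies                     # step 7: designer picks among these
```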
<p>Experiments are done in a gridworld environment, where <span class="caps">IRL</span> learns the expert policy from approximately 100 times fewer sample trajectories than simply mimicking the expert. Another experiment is a car driving simulator with 3 lanes viewed from the top, where <span class="caps">IRL</span> is capable of learning multiple driving styles, such as “prefer the right lane but avoid collisions”. Video demonstrations of the latter show that the spirit of the expert policy is indeed followed, although sometimes with unnecessary lane switches (most modern <span class="caps">RL</span> algorithms also exhibit this undesired property).</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Or, more accurately, a policy that maximizes the expected utility derived from this reward function and some method of temporal discounting. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Sampling from the posterior with Markov-chain Monte Carlo2019-08-06T00:00:00+02:002019-08-06T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-08-06:/2019/08/06/mcmc/<p>Description of the sampling algorithm of Metropolis et al. in 500 words.</p><p>John K. Kruschke’s book, titled <em>Doing Bayesian Data Analysis: A Tutorial with R, <span class="caps">JAGS</span>, and Stan (2nd ed.)</em> (<a href="https://www.amazon.com/Doing-Bayesian-Data-Analysis-Tutorial/dp/0124058884">Amazon</a>, <a href="https://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/">official site</a>), gives a very quick and practical introduction to Bayesian analysis. Compared to <span class="caps">BDA3</span>, it contains less proofs, but also less jargon; more explanations that are informal, and more introductions to the basics. As such, I would recommend it to someone who hasn’t had much of an exposure to statistics yet, or is not a mathematician nor a programmer.</p>
<p>The book includes thorough and nicely visualized descriptions of multiple Markov-chain Monte Carlo methods for sampling from a posterior distribution, of which I’ll try to summarize the most basic one in this post.</p>
<h2>Goal of sampling</h2>
<p>Given the prior <script type="math/tex">p(θ)</script> and the likelihood <script type="math/tex">p(\D\given θ)</script>, we want samples from the posterior <script type="math/tex">p(θ\given \D)</script>. In the following sections I’ll use the fact that the unnormalized posterior is equal to the prior multiplied with the likelihood: <script type="math/tex">p(θ, \D) = p(θ)\,p(\D \given θ)</script>. Here, I’ll talk only about continuous probability spaces; discrete spaces can be sampled similarly.</p>
<h2>Metropolis algorithm</h2>
<p>Just like the other <span class="caps">MC</span> methods, the Metropolis algorithm starts with a seed value for <script type="math/tex">θ</script> – let’s call it <script type="math/tex">θ_0</script>. (I assume in practice <script type="math/tex">θ_0</script> is sampled from the prior.) Then, once you have the current value <script type="math/tex">θ_i</script>, repeat the following two steps for a prespecified number of iterations, or until the desired effective sample size is reached.</p>
<ol>
<li>Sample <script type="math/tex">θ'_{i+1}</script> from a proposal distribution around <script type="math/tex">\theta_i</script>, which could be a Gaussian: <script type="math/tex">\theta'_{i+1} \sim \N (θ_i, Σ)</script>.</li>
</ol>
<p>2.</p>
<ul>
<li>If <script type="math/tex">p(θ_{i},\D) \le p(θ'_{i+1},\D)</script> – i.e. if <script type="math/tex">p(θ_{i} \given \D) \le p(θ'_{i+1} \given \D)</script> – then <em>accept</em> the proposed parameter value: <script type="math/tex">θ_{i+1} := θ'_{i+1}</script>.</li>
<li>Otherwise, accept the proposed parameter with probability equal to the ratio of the posterior at the proposed value to the posterior at the current value; if it is not accepted, keep the current value:</li>
</ul>
<script type="math/tex; mode=display">% <![CDATA[
\begin{gathered}
p = \frac{p(θ'_{i+1}, \D)}{p(θ_{i}, \D)} = \frac{p(θ'_{i+1} \given \D)}{p(θ_{i} \given \D)}, \\
b \sim Bernoulli(p), \\
θ_{i+1} =
\begin{cases}
θ_{i+1}' & \text{if } b=1,\\
θ_i & \text{if } b=0.
\end{cases}
\end{gathered} %]]></script>
<p>It can be proven that after a so-called “burn-in” period, each <script type="math/tex">θ_{n}</script> is distributed according to the posterior: <script type="math/tex">θ_n \sim p(\theta_n\given \D)</script> if <script type="math/tex">n \gg 1</script>; therefore, if you run the procedure long enough, you end up with many samples from the posterior. Note that the <em>effective sample size</em> will be much lower than <script type="math/tex">N</script>: neighboring samples are strongly correlated, so most of the <script type="math/tex">θ_i</script> values so obtained have to be dropped.</p>
<p>The beauty of this algorithm is that during this whole procedure, we only need to be able to compute the <em>unnormalized posterior</em> – so the algorithm can be easily used for sampling using the prior and the likelihood, even when the model is specified up to a multiplicative constant (as in an undirected graphical model).</p>
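<p>A minimal sketch of the two steps above, working with the <em>log</em> of the unnormalized posterior to avoid numerical underflow (the target density, seed, and proposal scale are just for demonstration):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(log_unnorm_posterior, theta0, n_samples, proposal_scale=1.0):
    """Random-walk Metropolis with a Gaussian proposal.

    Only the *unnormalized* posterior is needed; the normalizing
    constant cancels in the acceptance ratio.
    """
    theta = theta0
    log_p = log_unnorm_posterior(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + proposal_scale * rng.standard_normal()  # step 1
        log_p_new = log_unnorm_posterior(proposal)
        # step 2: accept with probability min(1, p_new / p_current)
        if np.log(rng.uniform()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
        samples.append(theta)
    return np.array(samples)

# Sample a standard normal: its unnormalized log density is -theta**2 / 2.
draws = metropolis(lambda t: -t**2 / 2, theta0=5.0, n_samples=20_000)
kept = draws[2_000:]            # drop the burn-in period
print(kept.mean(), kept.std())  # close to 0 and 1
```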
<p>This algorithm doesn’t easily escape a “probability island” – i.e. a region that is surrounded by a wide region of probability 0. (Although if the proposal distribution is wide enough, the algorithm is theoretically able to make that jump <em>eventually</em> – which in practice may mean “approximately never”.)</p>
<p>One downside of this basic algorithm is that the proposal distribution needs to be fine-tuned for the individual application: differences in effective sample size can span orders of magnitude, even for a simple <script type="math/tex">\text{Beta}(14,20)</script> distribution (i.e. a 1-dimensional unimodal distribution with finite support).</p>
<p>Another downside is that in multiple dimensions this random walk is quite inefficient, and <em>even more</em> dependent on a correct choice of the covariance matrix <script type="math/tex">Σ</script> – but apart from the obvious reason that “high-dimensional spaces are big”, I couldn’t tell why.</p>
<p>The well-known Metropolis–Hastings algorithm, Gibbs sampling and Hamiltonian Monte Carlo are different twists on this core idea, and they are also described in the book.</p>
<p>Allegedly, credit for this method is due more to Marshall and Arianna Rosenbluth – if there is agreement on that, we could rename it to Rosenbluthsian Monte Carlo.</p>
<h2>For more information…</h2>
<p>If you want to learn about sampling, or Bayesian data analysis, consider reading <a href="https://www.amazon.com/Doing-Bayesian-Data-Analysis-Tutorial/dp/0124058884">the book</a>, it’s a great read from what I’ve read so far.</p>
<p>Stay tuned for more of Bayes, or Curry, or Euler, or McCarthy.</p>Bayesian inference: Approaching certainty through sampling2019-07-24T00:00:00+02:002019-07-24T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-07-24:/2019/07/24/approaching-certainty/<p>Analysis of a slightly incorrect statement from <span class="caps">BDA3</span>: if all of the 1000 draws from Bernoulli(p) are 1, what are likely values of p?</p><p><em>Bayesian Data Analysis</em> from Gelman et al. (2013), in section 3.7, presents the statistical analysis of a bioassay experiment. The parameters of the model are <script type="math/tex">(\alpha, \beta)</script>, and we draw samples from the numerically calculated posterior. Then the authors write:</p>
<blockquote>
<p>All of the 1000 simulation draws had positive values of <script type="math/tex">\beta</script>, so the posterior probability that <script type="math/tex">\beta > 0</script> is roughly estimated to exceed 0.999.</p>
</blockquote>
<p>I thought this 0.999 figure is an overestimate; I analyze this question in this post.</p>
<!--more-->
<h2>Analysis</h2>
<p>The event “<script type="math/tex">\beta > 0</script>” is a Bernoulli-distributed random variable; let’s denote it with <script type="math/tex">x \sim \text{Bernoulli}(\theta)</script>. If we draw <script type="math/tex">S</script> samples from <script type="math/tex">x</script> (and denote the results with <script type="math/tex">x_i</script>), the conditional probability distribution of <script type="math/tex">p(\theta \given \{x_i\})</script> is described by the following directed graphical model:</p>
<p><img alt="Bayes net for x_i and theta" src="https://www.treszkai.com/2019/07/24/approaching-certainty/dgm-theta.svg"></p>
<p>The node for <script type="math/tex">x_i</script> is filled because it’s observed, and the plate represents <script type="math/tex">S</script> copies of this node (with <script type="math/tex">i</script> ranging from <script type="math/tex">1</script> to <script type="math/tex">S</script>).</p>
<p>If <script type="math/tex">n_1</script> (resp. <script type="math/tex">n_0</script>) denote the number of samples where <script type="math/tex">x_i</script> is true (resp. false), the likelihood is described by:</p>
<script type="math/tex; mode=display">p(\{x_i\} \given \theta) = \text{Binomial}(n_1 \given n = S, p = \theta).</script>
<p>We can assume a noninformative uniform prior on the probability <script type="math/tex">\theta</script> on the unit interval. A Beta prior is conjugate to the Bernoulli likelihood, and <script type="math/tex">p(\theta) = \text{Beta}(\theta \given \alpha_0 = 1, \beta_0 = 1) = \text{Uniform}(\theta \given a = 0, b = 1)</script>, and this results in the following posterior:</p>
<script type="math/tex; mode=display">p(\theta \given \{x_i\}) = \text{Beta}(\theta \given \alpha_0 + n_1, \beta_0 + n_0).</script>
<p>With <script type="math/tex">n_1 = 1000</script> and <script type="math/tex">n_0 = 0</script>, this amounts to a <script type="math/tex">\text{Beta}(1001, 1)</script> distribution, whose <a href="https://en.wikipedia.org/wiki/Beta_distribution#Probability_density_function">pdf</a> is as such:</p>
<p><img alt="Pdf of Beta(1001,1)" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1000-pdf-big.svg"></p>
<p>As expected, most of the probability mass is close to 1.0. But that graph is not very legible, so let’s zoom in on the right end of the <em>x</em> axis:</p>
<p><img alt="Pdf of Beta(1001,1) in [.99,1.0] interval" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1000-pdf-zoomed.svg"></p>
<p>The red line marks the mean of the distribution, which is approximately <script type="math/tex">0.999</script>, but not nearly all of the probability mass is on the right side of <script type="math/tex">0.999</script>. Using the <a href="https://en.wikipedia.org/wiki/Cumulative_distribution_function">cdf</a> of the posterior, we have that</p>
<script type="math/tex; mode=display">P(\theta > 0.999) = 0.63,</script>
<p>meaning there’s still a 1 in 3 chance that the posterior probability that <script type="math/tex">\beta > 0</script> does <em>not</em> exceed <script type="math/tex">0.999</script>. To be fair, <strong><script type="math/tex">0.999</script> is still good for a “rough estimate”</strong>, unless one has a strong prior for <script type="math/tex">\beta < 0</script>. (Given the nature of the experiment and the meaning of the parameter <script type="math/tex">\beta</script> — the toxicity of a compound —, a flat prior on “<script type="math/tex">\beta > 0</script>” is reasonable.)</p>
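<p>Since the cdf of a <script type="math/tex">\text{Beta}(\alpha, 1)</script> distribution is simply <script type="math/tex">x^\alpha</script>, this figure is easy to verify even without scipy:</p>

```python
alpha = 1001            # posterior Beta(1001, 1): uniform prior + 1000 successes
p = 1 - 0.999 ** alpha  # P(theta > 0.999), since the cdf of Beta(a, 1) is x**a
print(round(p, 3))      # 0.633
```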
<h3>Presidential elections</h3>
<p>A similar statement was made for 1988 pre-election polls, on page 70:</p>
<blockquote>
<p>All of the 1000 simulations had <script type="math/tex">\theta_1 > \theta_2</script>; thus, the estimated posterior probability that Bush had more support than Dukakis in the survey population is over 99.9%.</p>
</blockquote>
<p>When a presidential election is won “by a landslide”, that rarely means more than a 60-40% results; so in this case, I would rather use a prior that puts more mass on results close to 50-50%, for example <script type="math/tex">\text{Beta}(10,10)</script>:</p>
<p><img alt="Pdf of Beta(10,10)" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-10-pdf.svg"></p>
<p>This results in the following posterior:</p>
<p><img alt="Pdf of Beta(1010,10)" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1010-pdf.svg"></p>
<p>So in this case, the crude estimate does not suffice, and we should rather be only 98% certain. (This is a 20-fold difference, <script type="math/tex">(1-.98)/(1-0.999)</script>, and a well-calibrated <a href="https://goodjudgment.com/philip-tetlocks-10-commandments-of-superforecasting/">superforecaster</a> could tell them apart.) If the stakes are high, then refine your model, and draw more samples.</p>
<h2>Conclusion</h2>
<p>The meaning of 1000 true + 0 false simulations depends on your prior beliefs: the posterior mean could be 0.999 (with a uniform prior), or anything less than 0.99 (with a prior weighted more towards the center or zero).</p>
<p>I love <span class="caps">BDA3</span>; I’m nowhere near finished, but even the first chapters have taught me new ideas and proofs (e.g. the Bayesian cookbook in section 3.8, or modeling normal data with unknown mean <em>and</em> variance). The examples and exercises are a great combination of applications and theory. As you can see from this post, all I can do is nitpick some tiny details. A quick intro to practical Bayesian modeling is a <a href="https://www.youtube.com/watch?v=T1gYvX5c2sM">presentation from Andrew Gelman</a>.</p>
<p>Did you like this post, did I make a mistake, or do you know a <span class="caps">BDA3</span> discussion group? Let me know in the comments below!</p>
<h2>Code</h2>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">math</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="nn">st</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">set_matplotlib_formats</span>
<span class="o">%</span><span class="n">matplotlib</span> <span class="n">inline</span>
<span class="n">set_matplotlib_formats</span><span class="p">(</span><span class="s1">'svg'</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="n">posterior</span> <span class="o">=</span> <span class="n">st</span><span class="o">.</span><span class="n">beta</span><span class="p">(</span><span class="mi">1001</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">plot_beta</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">rv</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mi">1001</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="o">**</span><span class="n">plot_kwargs</span><span class="p">):</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">rv</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s1">'pdf'</span><span class="p">,</span> <span class="o">**</span><span class="n">plot_kwargs</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="n">xs</span><span class="p">[[</span><span class="mi">0</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">]])</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"θ"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Probability density function"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Pdf of Beta(θ | α = </span><span class="si">{</span><span class="n">alpha</span><span class="si">}</span><span class="s2">, β = </span><span class="si">{</span><span class="n">beta</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">grid</span><span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="n">plot_beta</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000</span><span class="p">),</span> <span class="n">posterior</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">0.01</span><span class="p">,</span> <span class="mf">1.01</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">savefig</span><span class="p">(</span><span class="s2">"beta-1000-pdf-big.svg"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="svg" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1000-pdf-big.svg"></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">ramanujan</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="w"> </span><span class="sd">"""Series that converges to 1/π at an exponential rate,</span>
<span class="sd"> by Srinivasa Ramanujan"""</span>
<span class="k">return</span> <span class="mi">8</span><span class="o">**</span><span class="mf">.5</span> <span class="o">/</span> <span class="mi">9801</span> <span class="o">*</span> <span class="nb">sum</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="mi">4</span><span class="o">*</span><span class="n">k</span><span class="p">)</span>
<span class="o">/</span> <span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="n">k</span><span class="p">)</span><span class="o">**</span><span class="mi">4</span>
<span class="o">/</span> <span class="mi">396</span><span class="o">**</span><span class="p">(</span><span class="mi">4</span><span class="o">*</span><span class="n">k</span><span class="p">)</span>
<span class="o">*</span> <span class="p">(</span><span class="mi">1103</span> <span class="o">+</span> <span class="mi">26390</span><span class="o">*</span><span class="n">k</span><span class="p">)</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"1/ramanujan(</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s2">) - π ≈ </span><span class="si">{</span><span class="mi">1</span><span class="o">/</span><span class="n">ramanujan</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">math</span><span class="o">.</span><span class="n">pi</span><span class="si">:</span><span class="s2">.2e</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="c1"># Easter egg. Thanks for reading!</span>
</code></pre></div>
<blockquote>
<p>1/ramanujan(1) - π ≈ 7.64e-08</p>
<p>1/ramanujan(2) - π ≈ 4.44e-16</p>
<p>1/ramanujan(3) - π ≈ 0.00e+00</p>
</blockquote>
<div class="highlight"><pre><span></span><code><span class="n">plot_beta</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.990</span><span class="p">,</span><span class="mf">1.0</span><span class="p">,</span><span class="mi">1000</span><span class="p">),</span> <span class="n">posterior</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">vlines</span><span class="p">(</span><span class="n">posterior</span><span class="o">.</span><span class="n">mean</span><span class="p">(),</span> <span class="mi">0</span><span class="p">,</span> <span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">get_ylim</span><span class="p">()[</span><span class="mi">1</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="s1">'r'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s1">'mean'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="s1">'upper left'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">savefig</span><span class="p">(</span><span class="s2">"beta-1000-pdf-zoomed.svg"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="svg" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1000-pdf-zoomed.svg"></p>
<div class="highlight"><pre><span></span><code><span class="n">posterior</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
</code></pre></div>
<blockquote>
<p>0.999001996007984</p>
</blockquote>
<div class="highlight"><pre><span></span><code><span class="nb">print</span><span class="p">(</span><span class="s1">'P(θ > 0.999) = </span><span class="si">{:d}</span><span class="s1">%'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">posterior</span><span class="o">.</span><span class="n">cdf</span><span class="p">(</span><span class="mf">0.999</span><span class="p">)))))</span>
</code></pre></div>
<blockquote>
<p>P(θ > 0.999) = 63%</p>
</blockquote>
<div class="highlight"><pre><span></span><code><span class="nb">print</span><span class="p">(</span><span class="s1">'P(θ > 0.998) = </span><span class="si">{:d}</span><span class="s1">%'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">posterior</span><span class="o">.</span><span class="n">cdf</span><span class="p">(</span><span class="mf">0.998</span><span class="p">)))))</span>
</code></pre></div>
<blockquote>
<p>P(θ > 0.998) = 86%</p>
</blockquote>
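<p>These tail probabilities have a simple closed form. Assuming the posterior is Beta(α&nbsp;=&nbsp;1001, β&nbsp;=&nbsp;1) (consistent with the mean 1001/1002&nbsp;≈&nbsp;0.999002 printed above), the CDF of a Beta(α, 1) distribution is just <em>F(x) = x<sup>α</sup></em>, so the numbers can be checked without SciPy:</p>

```python
# Sanity check of the numbers above, assuming the posterior is
# Beta(alpha=1001, beta=1); for Beta(alpha, 1), the CDF is F(x) = x**alpha.
alpha = 1001

mean = alpha / (alpha + 1)  # Beta(alpha, beta) mean is alpha / (alpha + beta)
print(f"posterior mean = {mean:.15f}")                     # 0.999001996007984
print("P(θ > 0.999) =", int(100 * (1 - 0.999**alpha)), "%")  # 63 %
print("P(θ > 0.998) =", int(100 * (1 - 0.998**alpha)), "%")  # 86 %
```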
<h2>References</h2>
<p>Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin. 2013. <em>Bayesian Data Analysis: Third Edition</em>. <a href="http://www.stat.columbia.edu/~gelman/book/">Official webpage</a></p>Evaluation of function calls in Haskell2019-07-13T00:00:00+02:002019-07-13T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-07-13:/2019/07/13/haskell-eval/<p>Analyzing why point-free definitions in Haskell allow sharing the result of an inner function application, whereas pointful definitions do not.</p><p><em>(This post is discussed in <a href="https://haskellweekly.news/episode/15.html">episode 15</a> of the</em> Haskell Weekly Podcast.<em>)</em></p>
<p>Chapter 27 of <a href="http://haskellbook.com/"><em>Haskell Programming from first principles</em></a> (by Christopher Allen and Julie Moronuki) is about the evaluation system of Haskell, with a focus on non-strictness. In the section <em>Preventing sharing on purpose</em>, they write that you may want to prevent sharing the result of a function call when it would mean storing a large piece of data just to compute a small result. Two examples are provided to demonstrate the alternatives. In the first, the result of <code>g _</code> is not shared but calculated twice:</p>
<div class="highlight"><pre><span></span><code><span class="kt">Prelude</span><span class="o">></span><span class="w"> </span><span class="n">f</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="mi">3</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span>
<span class="kt">Prelude</span><span class="o">></span><span class="w"> </span><span class="n">g'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="nf">\</span><span class="kr">_</span><span class="w"> </span><span class="ow">-></span><span class="w"> </span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g'"</span><span class="w"> </span><span class="mi">2</span>
<span class="kt">Prelude</span><span class="o">></span><span class="w"> </span><span class="n">f</span><span class="w"> </span><span class="n">g'</span>
<span class="nf">hi</span><span class="w"> </span><span class="n">g'</span>
<span class="nf">hi</span><span class="w"> </span><span class="n">g'</span>
<span class="mi">4</span>
</code></pre></div>
<p>In the second, the result of <code>g _</code> <em>is</em> shared, i.e. calculated only once and the result is stored:</p>
<div class="highlight"><pre><span></span><code><span class="kt">Prelude</span><span class="o">></span><span class="w"> </span><span class="n">g</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">const</span><span class="w"> </span><span class="p">(</span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g"</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
<span class="kt">Prelude</span><span class="o">></span><span class="w"> </span><span class="n">f</span><span class="w"> </span><span class="n">g</span>
<span class="nf">hi</span><span class="w"> </span><span class="n">g</span>
<span class="mi">4</span>
</code></pre></div>
<p>(Edited to add:) In practice, sharing is usually achieved with a <code>let</code> expression or a <code>where</code> construct.</p>
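<p>For example (a minimal sketch, not from the book), a <code>where</code> binding is evaluated at most once per call, even if its value is used several times:</p>

```haskell
import Debug.Trace (trace)

-- The 'where' binding is shared within a call: forcing 'shared' twice
-- updates the thunk after the first force, so "computed once" is
-- printed a single time.
doubleTraced :: Int -> Int
doubleTraced n = shared + shared
  where
    shared = trace "computed once" (n * 2)
```

<p>In GHCi, <code>doubleTraced 5</code> prints <code>computed once</code> once and returns <code>20</code>.</p>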
<p>(Note that the second definition, <code>g = const (trace "hi g" 2)</code>, is a <a href="https://wiki.haskell.org/Pointfree">“point-free” definition</a>.)</p>
<p>The authors conclude that</p>
<blockquote>
<p>functions aren’t shared when there are named arguments but are when the arguments are elided, as in pointfree. So, one way to prevent sharing is adding named arguments.</p>
</blockquote>
<p>(Quoted from version 1.<span class="caps">0RC4</span> of the book.)</p>
<p>In this post I analyze the runtime differences between point-free and pointful definitions.</p>
<h2>Behind the scenes</h2>
<p>As <a href="#Further-resources">Tom Ellis describes</a>, the definitions of <code>g</code> and <code>f</code> translate to the following (in a close approximation to the “Core” language used during compilation):</p>
<div class="highlight"><pre><span></span><code><span class="nf">f</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="nf">\</span><span class="n">x</span><span class="w"> </span><span class="ow">-></span><span class="w"> </span><span class="kr">let</span><span class="w"> </span><span class="p">{</span><span class="n">x3</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span><span class="w"> </span><span class="n">x10</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="mi">10</span><span class="p">}</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="p">(</span><span class="o">+</span><span class="p">)</span><span class="w"> </span><span class="n">x3</span><span class="w"> </span><span class="n">x10</span>
<span class="nf">g</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="kr">let</span><span class="w"> </span><span class="p">{</span><span class="n">tg</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g"</span><span class="w"> </span><span class="mi">2</span><span class="p">}</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="nf">\</span><span class="n">y</span><span class="w"> </span><span class="ow">-></span><span class="w"> </span><span class="n">const</span><span class="w"> </span><span class="n">tg</span><span class="w"> </span><span class="n">y</span>
<span class="nf">g'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="nf">\</span><span class="kr">_</span><span class="w"> </span><span class="ow">-></span><span class="w"> </span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g'"</span><span class="w"> </span><span class="mi">2</span>
</code></pre></div>
<p>(Calling <code>f g</code> with these definitions does <em>not</em> result in the same trace in GHCi 8.6.5 as with the original definitions. However, the code has the expected behavior if loaded into GHCi from a source file like <a href="#Sharing">that below</a>.)</p>
<p>Two things to point out here. First, every function definition is a lambda. Second, <code>g</code> was turned into a <em>let</em> expression because we can only apply functions to variables or literals (in Core), not to function calls. <em>Edited to add:</em> It would be reasonable to ask why <code>g = const (trace "hi g" 2)</code> doesn’t translate to <code>\y -> let {tg = trace "hi g" 2} in const tg y</code> (similar to <code>f</code>), to which the pragmatic answer is that <em>apparently</em> the order is the following:</p>
<ol>
<li>not-fully-applied functions are turned into lambdas,</li>
<li>parameters that are function calls are turned into named variables, and</li>
<li>named function arguments from the left-hand side of <code>=</code> are moved to the right as a lambda.</li>
</ol>
<h2>Evaluation with sharing</h2>
<p>This is what happens during the evaluation of <code>f g</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">f</span><span class="w"> </span><span class="n">g</span>
</code></pre></div>
<p><code>ans</code> is a function call, so its evaluation proceeds with substituting <code>g</code> for the argument of <code>f</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="kr">let</span><span class="w"> </span><span class="p">{</span><span class="n">x3</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span><span class="w"> </span><span class="n">x10</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g</span><span class="w"> </span><span class="mi">10</span><span class="p">}</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="p">(</span><span class="o">+</span><span class="p">)</span><span class="w"> </span><span class="n">x3</span><span class="w"> </span><span class="n">x10</span>
</code></pre></div>
<p><code>ans</code> is a <em>let</em> expression, so we put the following <em>thunks</em> for <code>x3</code> and <code>x10</code> on the heap under some unique name:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x3</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g</span><span class="w"> </span><span class="mi">3</span>
<span class="nf">ans_x10</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g</span><span class="w"> </span><span class="mi">10</span>
</code></pre></div>
<p>…and then proceed with evaluating the <em>in</em> part:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="p">(</span><span class="o">+</span><span class="p">)</span><span class="w"> </span><span class="n">ans_x3</span><span class="w"> </span><span class="n">ans_x10</span>
</code></pre></div>
<p>During the evaluation of this function call, <code>ans_x3</code> will be evaluated (or potentially <code>ans_x10</code> first, or both in parallel). <code>ans_x3</code> is a function call, so first we evaluate <code>g</code> to a lambda. As <code>g</code> is a <em>let</em> expression, we create a closure for <code>trace "hi g" 2</code> on the heap, and then continue with the <em>in</em> part of <code>g</code> (<code>\y -> const tg y</code>). This is a lambda now, meaning it’s in weak head normal form, so the heap entry for <code>g</code> is overwritten with it:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">g_tg</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g"</span><span class="w"> </span><span class="mi">2</span>
<span class="nf">g</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="nf">\</span><span class="n">y</span><span class="w"> </span><span class="ow">-></span><span class="w"> </span><span class="n">const</span><span class="w"> </span><span class="n">g_tg</span><span class="w"> </span><span class="n">y</span>
</code></pre></div>
<p>Back to <code>ans_x3</code>, now the argument <code>3</code> is substituted in the definition of <code>g</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans_x3</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">const</span><span class="w"> </span><span class="n">g_tg</span><span class="w"> </span><span class="mi">3</span>
</code></pre></div>
<p>This is a function call, with <code>const</code> already a lambda <code>\x _ -> x</code>, so the arguments can now be substituted in the body, leaving us with</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x3</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g_tg</span><span class="w"> </span><span class="c1">-- (Pointer to the same address as g_tg.)</span>
</code></pre></div>
<p>During the evaluation of <code>g_tg</code>, the magic printout happens (<code>hi g</code> on stdout), and its value is resolved to be <code>2</code>, so the heap is updated as such:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">g_tg</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="mi">2</span>
</code></pre></div>
<p>And <code>ans_x3</code> is a pointer to the same memory content <code>2</code>.</p>
<p>Analogously, the evaluation of <code>ans_x10</code> proceeds as such:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans_x10</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">const</span><span class="w"> </span><span class="n">g_tg</span><span class="w"> </span><span class="mi">10</span>
<span class="nf">ans_x10</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g_tg</span>
<span class="c1">-- let ans_x10 points to the memory location of g_tg:</span>
<span class="nf">ans_x10</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="mi">2</span>
</code></pre></div>
<p>Finally, <code>ans = (+) ans_x3 ans_x10</code>, which evaluates to <code>ans = 4</code>.</p>
<h2>Evaluation without sharing</h2>
<p>In contrast, the evaluation of <code>f g'</code> proceeds as follows:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">f</span><span class="w"> </span><span class="n">g'</span>
<span class="nf">ans'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="kr">let</span><span class="w"> </span><span class="p">{</span><span class="n">x3</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g'</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span><span class="w"> </span><span class="n">x10</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g'</span><span class="w"> </span><span class="mi">10</span><span class="p">}</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="p">(</span><span class="o">+</span><span class="p">)</span><span class="w"> </span><span class="n">x3</span><span class="w"> </span><span class="n">x10</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x3'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g'</span><span class="w"> </span><span class="mi">3</span>
<span class="nf">ans_x10'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">g'</span><span class="w"> </span><span class="mi">10</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="nf">ans'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="p">(</span><span class="o">+</span><span class="p">)</span><span class="w"> </span><span class="n">ans_x3'</span><span class="w"> </span><span class="n">ans_x10'</span>
<span class="nf">ans_x3'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g'"</span><span class="w"> </span><span class="mi">2</span>
</code></pre></div>
<p>Now <code>hi g'</code> is printed, and the heap is updated:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x3'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="mi">2</span>
</code></pre></div>
<p>When evaluating <code>ans_x10'</code>, we <strong>again print</strong> <code>hi g'</code>, and store the result of the trace under a different thunk:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x10'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="mi">2</span>
</code></pre></div>
<p>Now <code>ans'</code> evaluates to <code>(+) 2 2</code>, i.e. <code>4</code>.</p>
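<p>The two behaviors can be reproduced end to end with a small module (a sketch; the trace counts hold in GHCi or when compiled without optimizations, since <code>-O</code> may change sharing):</p>

```haskell
import Debug.Trace (trace)

f :: (Int -> Int) -> Int
f x = x 3 + x 10

g, g' :: Int -> Int
g  = const (trace "hi g" 2)   -- point-free: the trace thunk is shared
g' = \_ -> trace "hi g'" 2    -- named (ignored) argument: a fresh thunk per call
```

<p>Loading this into GHCi, <code>f g</code> prints <code>hi g</code> once and <code>f g'</code> prints <code>hi g'</code> twice; both return <code>4</code>.</p>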
<h2>Attempt at verifying my translated definitions</h2>
<p>I attempted to verify what I said above about the Core definitions of <code>f</code>, <code>g</code>, and <code>g'</code> using the <code>-ddump-simpl</code> compiler flag of GHCi, but the output didn’t quite match my expectations.</p>
<p><a name="Sharing"></a>Sharing.hs:</p>
<div class="highlight"><pre><span></span><code><span class="kr">module</span><span class="w"> </span><span class="nn">Sharing</span><span class="w"> </span><span class="kr">where</span>
<span class="kr">import</span><span class="w"> </span><span class="nn">Debug.Trace</span>
<span class="nf">f</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="p">(</span><span class="mi">3</span><span class="ow">::</span><span class="kt">Int</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="ow">::</span><span class="w"> </span><span class="kt">Int</span>
<span class="nf">g</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">const</span><span class="w"> </span><span class="p">(</span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g"</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="ow">::</span><span class="kt">Int</span><span class="p">))</span><span class="w"> </span><span class="c1">-- share</span>
<span class="nf">g'</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="nf">\</span><span class="kr">_</span><span class="w"> </span><span class="ow">-></span><span class="w"> </span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g'"</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="ow">::</span><span class="kt">Int</span><span class="p">)</span><span class="w"> </span><span class="c1">-- don't share</span>
<span class="nf">g''</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="kr">let</span><span class="w"> </span><span class="p">{</span><span class="n">tg</span><span class="w"> </span><span class="ow">=</span><span class="w"> </span><span class="n">trace</span><span class="w"> </span><span class="s">"hi g"</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="ow">::</span><span class="kt">Int</span><span class="p">)}</span><span class="w"> </span><span class="kr">in</span><span class="w"> </span><span class="nf">\</span><span class="n">y</span><span class="w"> </span><span class="ow">-></span><span class="w"> </span><span class="n">const</span><span class="w"> </span><span class="n">tg</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="c1">-- share (equivalent to g)</span>
</code></pre></div>
<p>In GHCi:</p>
<div class="highlight"><pre><span></span><code><span class="nx">Prelude</span><span class="p">></span><span class="w"> </span><span class="p">:</span><span class="nx">set</span><span class="w"> </span><span class="o">-</span><span class="nx">ddump</span><span class="o">-</span><span class="nx">simpl</span><span class="w"> </span><span class="o">-</span><span class="nx">dsuppress</span><span class="o">-</span><span class="nx">all</span><span class="w"> </span><span class="o">-</span><span class="nx">Wno</span><span class="o">-</span><span class="nx">missing</span><span class="o">-</span><span class="nx">signatures</span>
<span class="nx">Prelude</span><span class="p">></span><span class="w"> </span><span class="p">:</span><span class="nx">l</span><span class="w"> </span><span class="nx">Sharing</span>
<span class="p">[</span><span class="mi">1</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="nx">Compiling</span><span class="w"> </span><span class="nx">Sharing</span><span class="w"> </span><span class="p">(</span><span class="w"> </span><span class="nx">Sharing</span><span class="p">.</span><span class="nx">hs</span><span class="p">,</span><span class="w"> </span><span class="nx">interpreted</span><span class="w"> </span><span class="p">)</span>
<span class="o">====================</span><span class="w"> </span><span class="nx">Tidy</span><span class="w"> </span><span class="nx">Core</span><span class="w"> </span><span class="o">====================</span>
<span class="nx">Result</span><span class="w"> </span><span class="nx">size</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">Tidy</span><span class="w"> </span><span class="nx">Core</span>
<span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">{</span><span class="nx">terms</span><span class="p">:</span><span class="w"> </span><span class="mi">52</span><span class="p">,</span><span class="w"> </span><span class="nx">types</span><span class="p">:</span><span class="w"> </span><span class="mi">39</span><span class="p">,</span><span class="w"> </span><span class="nx">coercions</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">joins</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">0</span><span class="p">}</span>
<span class="nx">f</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="nx">x_a1Fl</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="err">$</span><span class="nx">fNumInt</span><span class="w"> </span><span class="p">(</span><span class="nx">x_a1Fl</span><span class="w"> </span><span class="p">(</span><span class="nx">I</span><span class="err">#</span><span class="w"> </span><span class="mi">3</span><span class="err">#</span><span class="p">))</span><span class="w"> </span><span class="p">(</span><span class="nx">x_a1Fl</span><span class="w"> </span><span class="p">(</span><span class="nx">I</span><span class="err">#</span><span class="w"> </span><span class="mi">10</span><span class="err">#</span><span class="p">))</span>
<span class="nx">g</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="err">@</span><span class="w"> </span><span class="nx">b_a1Gi</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="nx">const</span><span class="w"> </span><span class="p">(</span><span class="nx">trace</span><span class="w"> </span><span class="p">(</span><span class="nx">unpackCString</span><span class="err">#</span><span class="w"> </span><span class="s">"hi g"</span><span class="err">#</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">I</span><span class="err">#</span><span class="w"> </span><span class="mi">2</span><span class="err">#</span><span class="p">))</span>
<span class="nx">g</span><span class="err">'</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="err">@</span><span class="w"> </span><span class="nx">p_a1G6</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="nx">trace</span><span class="w"> </span><span class="p">(</span><span class="nx">unpackCString</span><span class="err">#</span><span class="w"> </span><span class="s">"hi g'"</span><span class="err">#</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">I</span><span class="err">#</span><span class="w"> </span><span class="mi">2</span><span class="err">#</span><span class="p">)</span>
<span class="nx">tg_r1F4</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">trace</span><span class="w"> </span><span class="p">(</span><span class="nx">unpackCString</span><span class="err">#</span><span class="w"> </span><span class="s">"hi g"</span><span class="err">#</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">I</span><span class="err">#</span><span class="w"> </span><span class="mi">2</span><span class="err">#</span><span class="p">)</span>
<span class="nx">g</span><span class="err">''</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="err">@</span><span class="w"> </span><span class="nx">b_a1FJ</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="err">\</span><span class="w"> </span><span class="nx">y_a1Fn</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="nx">const</span><span class="w"> </span><span class="nx">tg_r1F4</span><span class="w"> </span><span class="nx">y_a1Fn</span>
<span class="o">...</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="nx">some</span><span class="w"> </span><span class="nx">more</span><span class="w"> </span><span class="nx">stuff</span>
</code></pre></div>
<p>Nonetheless, as <a href="https://stackoverflow.com/a/6121495/8424390">a <span class="caps">SO</span> answer describes</a>, we can see that a function application in Core is defined as <code>Expr Atom</code>, where <em>Atom</em> is <code>var | Literal</code>. I attempted to install <a href="http://hackage.haskell.org/package/ghc-core">ghc-core</a> but the build failed, so further analysis is put on the shelf.</p>
<h2>Conclusions</h2>
<p>So, what’s the essential difference between <code>g</code> and <code>g'</code>?</p>
<p><code>g = const (trace "hi g" 2)</code> is a function application where the argument is a function application, which is treated as a <em>let</em> expression. When you evaluate <code>g ()</code>, the auxiliary variable introduced by the <em>let</em> – i.e., <code>tg = trace "hi g" 2</code> – is evaluated to a literal and its value is stored on the heap. On subsequent calls, some other argument can be applied to the <code>const tg</code> function, but its first argument <code>tg</code> is already evaluated.</p>
<p>In contrast, <code>g' = \_ -> trace "hi g'" 2</code> is a lambda, so it is already fully evaluated, and nothing in it can be simplified further. If we apply <code>g'</code> first to the argument <code>()</code>, the expression <code>g' ()</code> will evaluate to the body of <code>g'</code> with the unused parameter discarded, i.e. <code>trace "hi g'" 2</code>. If we later evaluate <code>g' []</code>, then it again results in the (same) body after the (dummy) function application. Nowhere during this process did we store the value of <code>trace "hi g'" 2</code>: in particular, we didn’t update the definition of <code>g'</code> to <code>\_ -> 2</code>, simply because that is not the definition of <code>g'</code>. (But could we have updated it? Even though functions are always pure, I think the answer is generally <em>no</em>: sometimes the result of a function is bigger than the definition, and the result is not needed often enough to warrant this speed–memory tradeoff.)</p>
<p>Recall the original wording:</p>
<blockquote>
<p>functions aren’t shared when there are named arguments but are when the arguments are elided, as in pointfree.</p>
</blockquote>
<p>As we saw, <em>functions</em> themselves are never shared. Rather, if <code>g</code> is a partially applied function whose argument is a function application <code>fun arg</code>, then <code>g</code> is equivalent to a <em>let</em> expression, and after its first evaluation <code>g</code> will <em>change</em> to a lambda with <code>fun arg</code> already evaluated.</p>
<p>As a generally-okay heuristic, point-free definitions allow sharing inner function calls, whereas nothing in a lambda (or a function with all arguments on the left-hand side) is shared.</p>
<h2>Further resources</h2>
<p>More details on similar behavior are given by Tom Ellis in his talk <a href="https://skillsmatter.com/skillscasts/8726-haskell-programs-how-do-they-run"><em>Haskell programs: how do they run?</em></a> (free registration required to watch the talk).</p>
<p>The <a href="https://skillsmatter.com/skillscasts/8800-functional-and-low-level-watching-the-stg-execute">talk of David Luposchainsky (a.k.a. <code>quchen</code>)</a> goes into more depth – down to the Core – using his own implementation of the spineless tagless graph reduction machine (<span class="caps">STG</span>) to visualize the evaluation of any given Haskell code (<a href="https://github.com/quchen/stgi">link to repo</a>).</p>The wise men puzzle2018-08-18T00:00:00+02:002018-08-18T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2018-08-18:/2018/08/18/wise-men/<p>Analyzing the wise men puzzle of modal logic.</p><p>Today I understood the wise men puzzle at a conceptual level, well enough that I could explain it and possibly generalize to similar domains. This post is my attempt at explaining it.</p>
<p>The puzzle is described in <a class="citation" href="#Huth2000-Logic-book">(Huth <span class="amp">&</span> Ryan, 2000)</a> as follows:</p>
<blockquote>
<p>There are three wise men. It’s common knowledge—known by everyone and known to be known by everyone, etc.—that there are three red hats and two white hats. The king puts a hat on each of the wise men in such a way that they are not able to see their own hat, and asks each one in turn whether they know the colour of the hat on their head. Suppose the first man says he does not know; then the second says he does not know either.
It follows that the third man must be able to say that he knows the colour of his hat. Why is this? What colour has the third man’s hat?</p>
</blockquote>
<p>Let’s call the people Alpha, Beta, Gamma, in the order they speak.</p>
<p>One solution is to think about the puzzle in terms of possible worlds. A world in this problem is described by an assignment of hat colours to people, which is equivalently an ordered triple of colours <script type="math/tex">⟨c_1, c_2, c_3⟩</script>, with <script type="math/tex">c_i ∈ \{R,W\}</script>. There are only 2 white hats, so in the beginning, the seven possible worlds are</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{cc}
⟨R,R,R⟩ & ⟨R,R,W⟩ & ⟨R,W,R⟩ & ⟨R,W,W⟩ \\
⟨W,R,R⟩ & ⟨W,R,W⟩ & ⟨W,W,R⟩ & \\
\end{array}. %]]></script>
<p>If Beta and Gamma were both wearing white hats, then Alpha would know that his hat is red. Therefore, when Alpha says “no”, Beta and Gamma both learn that they cannot both be white, i.e. at least one of them is red. The remaining possible worlds are</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{cc}
{⟨R,R,R⟩} & ⟨R,R,W⟩ & ⟨R,W,R⟩ & \crossed{⟨R,W,W⟩} \\
{⟨W,R,R⟩} & ⟨W,R,W⟩ & ⟨W,W,R⟩ & \\
\end{array}. %]]></script>
<p>Now, <em>we</em> know that the world is one of the 6 worlds above, but Beta also sees the hats of Alpha and Gamma. What we think as outsiders only matters for whether <em>we</em> can tell who’s wearing what.
But back to the observations of Alpha, Beta, and Gamma. When Beta says “no”, that rules out the worlds where Gamma is white (because in those worlds Beta would have known that he is red).</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{cc}
{⟨R,R,R⟩} & \crossed{⟨R,R,W⟩} & ⟨R,W,R⟩ & \crossed{⟨R,W,W⟩} \\
{⟨W,R,R⟩} & \crossed{⟨W,R,W⟩} & ⟨W,W,R⟩ & \\
\end{array} %]]></script>
<p>This means that Gamma is red, and he also knows this.</p>
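<p>This possible-worlds elimination is mechanical enough to check with a short program. The following Python sketch (my addition, not part of the original argument) enumerates the seven worlds and filters them by the two “no” answers:</p>

```python
from itertools import product

# Seven possible worlds: hat triples <c1, c2, c3> with at most two white hats.
worlds = [w for w in product("RW", repeat=3) if w.count("W") <= 2]

def knows_own_hat(ws, i, w):
    """Person i sees the other two hats; they know their own colour iff all
    worlds in ws consistent with that view agree on hat i."""
    view = [v for v in ws if all(v[j] == w[j] for j in range(3) if j != i)]
    return all(v[i] == w[i] for v in view)

# Alpha (index 0) answers "no": eliminate worlds where he would have known.
worlds = [w for w in worlds if not knows_own_hat(worlds, 0, w)]
# Beta (index 1) answers "no" as well.
worlds = [w for w in worlds if not knows_own_hat(worlds, 1, w)]

# Gamma's hat is red in every remaining world, and Gamma can deduce it himself.
assert all(w[2] == "R" for w in worlds)
assert all(knows_own_hat(worlds, 2, w) for w in worlds)
print(sorted(worlds))
```

<p>The four surviving worlds are exactly the ones in the last table above, and all of them give Gamma a red hat.</p>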
<h1>Another way</h1>
<p>Our solution is more procedural than necessary, and it does not show the essence of logically omniscient agents interacting with one another. As this problem is small enough, we could list for every world every statement any agent could make, which is simply their knowledge base of true statements (i.e. whatever they can deduce from their view and from the common knowledge, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>). (Say, with atoms <script type="math/tex">R_1, R_2, R_3, W_1, W_2, W_3</script>, where <script type="math/tex">R_i</script> and <script type="math/tex">W_i</script> mean “person <script type="math/tex">i</script>’s hat is red (resp. white)”, and with <script type="math/tex">⟨R,W,R⟩</script> abbreviating <script type="math/tex">R_1\wedge W_2 \wedge R_3</script>.) We can only do this because we are not interested in making statements like “X knows that Y knows that Z knows that φ”.
Besides, in every world, we implicitly include what is common knowledge, and what any agent can see, i.e. the whole problem statement in the opening paragraph.
The common knowledge at the beginning in any of these worlds is <script type="math/tex">\lnot⟨W,W,W⟩</script>. That’s not very much, but it is at least symmetric, which allows us to write down only three representative worlds.</p>
<p><strong>World</strong> <script type="math/tex">⟨R,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>.</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨R,W,W⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,W,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,W⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨R,W,W⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨R,W,W⟩</script>
</li>
</ul>
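<p>These knowledge bases can also be generated mechanically. A small Python sketch (my addition; each world is a hat triple) reproduces, for instance, the listing for the world ⟨R,W,W⟩ above:</p>

```python
from itertools import product

# Possible worlds before anyone speaks: hat triples with at most two white hats.
worlds = [w for w in product("RW", repeat=3) if w.count("W") <= 2]

def knowledge_base(w, i):
    """Worlds person i cannot rule out in world w: those agreeing with w
    on the two hats that person i can see."""
    return [v for v in worlds if all(v[j] == w[j] for j in range(3) if j != i)]

# Reproduce the hand-made listing for world <R,W,W>:
w = ("R", "W", "W")
assert knowledge_base(w, 0) == [("R", "W", "W")]                    # Alpha
assert knowledge_base(w, 1) == [("R", "R", "W"), ("R", "W", "W")]   # Beta
assert knowledge_base(w, 2) == [("R", "W", "R"), ("R", "W", "W")]   # Gamma
```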
<p>When Alpha says “no” in the beginning, that means he is not in a world where from his knowledge base he can conclude his own colour. His statement becomes common knowledge (<abbr title="Common knowledge"><span class="caps">CK</span></abbr>), i.e. <abbr title="Common knowledge"><span class="caps">CK</span></abbr> is extended with <script type="math/tex">\lnot(W_2\wedge W_3)</script>.</p>
<p><strong>World</strong> <script type="math/tex">⟨R,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,W,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,W⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,W,R⟩</script>
</li>
</ul>
<p>We were able to cross out some worlds! And in the world <script type="math/tex">⟨R,W,W⟩</script> we were left with zero possible worlds for Alpha, i.e. Alpha’s statement would lead to a contradiction: he would have answered “yes”. In fact, this was how we eliminated possible worlds in the previous solution. Next turn: the king asks Beta, who says “no”. The common knowledge is extended with <script type="math/tex">\lnot(W_1 \wedge W_3)</script>. (Right? At this point I can imagine myself making an incorrect deduction.)</p>
<p><strong>World</strong> <script type="math/tex">⟨R,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
<div id="mistaken1" style="display: block;">
<p><strong>World</strong> <script type="math/tex">⟨R,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
</div>
<div id="fixed1" style="display: none;">
<p><strong>World</strong> <script type="math/tex">⟨R,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">\crossed{⟨R,R,W⟩}</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
<li>Beta: <script type="math/tex">\crossed{⟨R,R,W⟩}</script>
</li>
<li>Gamma: <script type="math/tex">\crossed{⟨R,R,R⟩}</script>, <script type="math/tex">\crossed{⟨R,R,W⟩}</script>
</li>
</ul>
</div>
<p><strong>World</strong> <script type="math/tex">⟨R,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
<li>Beta: <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,W,R⟩</script>
</li>
</ul>
<p>Another world disappeared. But what about <script type="math/tex">⟨R,R,W⟩</script>: why is it still there, when earlier we argued that it’s not possible for Gamma to be white? In fact, it is not possible: in that world Beta would have said “yes”, as he knew what colour he had.
Although never explicitly stated, we assumed that if someone’s hat is not red then it is white, and vice versa. Use <script type="math/tex">\star</script> to denote this fact:</p>
<script type="math/tex; mode=display">% <![CDATA[
\star ≡ \bigwedge_{i=1}^3 (\lnot R_i → W_i) \wedge (\lnot W_i → R_i). %]]></script>
<p>We also know that common knowledge is true: for every formula <script type="math/tex">φ</script>, it’s an axiom that <script type="math/tex">\mathcal C φ → φ</script>.
Then it’s simple to show that if Alpha is red and Gamma is white, then Beta is red:</p>
<script type="math/tex; mode=display">% <![CDATA[
\mathcal C \Big((R_1 \vee R_2 \vee R_3) \wedge (R_2 \vee R_3) \wedge (R_1 \vee R_3) \Big) \wedge \star \vdash
(R_1 \wedge W_3) → R_2. %]]></script>
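<p>Since under <script type="math/tex">\star</script> every world reduces to three booleans, this entailment can be checked by brute force over all truth assignments. A quick Python sketch (my addition):</p>

```python
from itertools import product

# Under the assumption (star), each hat is red iff it is not white, so a world
# is just three booleans r1, r2, r3 ("person i's hat is red").
def common_knowledge(r1, r2, r3):
    # not<W,W,W>, not(W2 and W3), not(W1 and W3), rewritten via (star) into R-atoms
    return (r1 or r2 or r3) and (r2 or r3) and (r1 or r3)

# The entailment above: (R1 and W3) -> R2 holds in every world satisfying the CK.
assert all(r2 or not (r1 and not r3)
           for r1, r2, r3 in product([False, True], repeat=3)
           if common_knowledge(r1, r2, r3))
```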
<script type="text/javascript">
function showById(id, btn, displayStyle) {
document.getElementById(id).style.display = 'block';
btn.style.display = 'none';
}
function showInlineById(id, btn, displayStyle) {
document.getElementById(id).style.display = 'inline';
btn.style.display = 'none';
}
function hideById(id, btn) {
document.getElementById(id).style.display = 'none';
btn.style.display = 'none';
}
</script>
<p>Click this to fix that above: <a href="#" onclick="showById('fixed1', this); hideById('mistaken1', this); return false;">click me!</a> (Needs JavaScript.)</p>
<p>Now we are left with the following worlds:</p>
<p><strong>World</strong> <script type="math/tex">⟨R,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <span id="mistaken2" markdown="1"><script type="math/tex">⟨R,R,W⟩</script></span> <span id="fixed2" style="display: none;"><script type="math/tex">\crossed{⟨R,R,W⟩}</script></span></li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,W,R⟩</script>
</li>
</ul>
<p>At first sight, Gamma’s knowledge base in some worlds (<script type="math/tex">⟨R,R,R⟩</script>) contains a world with <script type="math/tex">\lnot R_3</script>. But each of the four worlds above has <script type="math/tex">R_3</script>, meaning <script type="math/tex">R_3</script> is deducible from <script type="math/tex">\star</script> and the <abbr title="Common knowledge"><span class="caps">CK</span></abbr>, making <script type="math/tex">⟨R,R,W⟩</script> in world <script type="math/tex">⟨R,R,R⟩</script> impossible. <a href="#" onclick="showInlineById('fixed2', this); hideById('mistaken2', this); return false;">Click me to fix that.</a> This means <script type="math/tex">R_3</script> is <abbr title="Common knowledge"><span class="caps">CK</span></abbr>. Yay!</p>
<p>Note: there might be some other true statements that could be deduced, so maybe Alpha knows his colour too in some worlds—I haven’t solved the problem in full. For example, when Gamma answers “yes” in the end, it doesn’t say anything we didn’t already know, and nothing that Alpha and Beta didn’t know already, as <script type="math/tex">R_3</script> can be deduced from the common knowledge. Maybe someone else knows theirs too?</p>
<h1>Another problem</h1>
<p>A slight modification is to ask, for each natural number <script type="math/tex">k</script>, in which worlds X is able to decide their colour after <script type="math/tex">k</script> utterances, provided it wasn’t X who spoke last.</p>
<p>Related: it feels like there is a situation with <script type="math/tex">n>2</script> people, where two agents can keep on discarding possible worlds just by speaking in turns. If you know of such a problem, please let me know.</p>
<h1>Notes</h1>
<p>I hope I didn’t make a mistake in the calculations; I admit I enumerated the possible worlds by hand instead of with Prolog.</p>
<h1>Conclusion</h1>
<p>Listen to people when they say “no”.</p>
<h1>References</h1>
<ol class="bibliography"><li><span id="Huth2000-Logic-book">Huth, M., <span class="amp">&</span> Ryan, <span class="caps">M. D.</span> (2000). <i>Logic in Computer Science: Modelling and Reasoning about Systems</i>. Cambridge University Press.</span></li></ol>Blog post summary: Medical AI safety: where are we and where are we heading2018-07-11T00:00:00+02:002018-07-11T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2018-07-11:/2018/07/11/medical-safety/<p>I summarize a blog post about medical <span class="caps">AI</span> safety, which describes the potential consequences of using advanced medical systems without sufficient evidence to back up their usefulness.</p><p>In this post I summarize a <a href="https://lukeoakdenrayner.wordpress.com/2018/07/11/medical-ai-safety-we-have-a-problem/">blog post about “medical <span class="caps">AI</span> safety”</a>: the potential consequences of using advanced medical systems without sufficient evidence to back up their usefulness.</p>
<p><em>Epistemic status: the author (Luke Oakden-Rayner) is a PhD candidate radiologist, and I’m not an expert in medicine.</em></p>
<blockquote>
<p>For the first time ever, <span class="caps">AI</span> systems could actually be responsible for medical disasters.</p>
</blockquote>
<p>The risk of a medical <span class="caps">AI</span> system increases with its complexity: from the lowest-complexity <em>processing systems</em>, through <em>triage systems</em> that order the priority queue of patients, we are now moving towards autonomous <em>diagnostic systems</em>, and eventually to autonomous <em>prediction systems</em>.</p>
<p>Some systems in the wild are worse than humans on both recall rate (callbacks for further testing) and sensitivity:</p>
<blockquote>
<p>“Not only did <span class="caps">CAD</span> [computer-aided diagnosis] increase the recalls without improving cancer detection, but, in some cases, even decreased sensitivity by missing some cancers.”</p>
</blockquote>
<p>Nonetheless, we are already proceeding to the next level:</p>
<blockquote>
<p>A few months ago the <span class="caps">FDA</span> approved a new <span class="caps">AI</span> system by IDx, and it makes independent medical decisions without the need for a clinician. [In this case, screening for eye disease through a retina scan.]</p>
</blockquote>
<p>But on the upside, these tools improve the ratio of people screened:</p>
<blockquote>
<p>But while there is a big potential upside here (about 50% of people with diabetes are not screened regularly enough), and the decision to “refer or not” is rarely immediately vision-threatening, approving a system like this without <em>clinical testing</em> raises some concerns.</p>
</blockquote>
<p>And systems operate now on a larger scale too:</p>
<blockquote>
<p><span class="caps">NHS</span> is already using an automated smart-phone triage system “powered by” babylonhealth <span class="caps">AI</span>. This one is definitely capable of leading to serious harm, since it recommends when to go (or not to go) to hospital.</p>
</blockquote>
<p>… and this system gave 90% confidence to a non-lethal diagnosis X, while not even offering the lethal diagnosis Y that was suggested by 90% of MDs on Twitter. (And I assume the input wasn’t even an adversarial attack.) It’s fair to say that there is room for improvement. (Compare this with the amount of news coverage received by the monthly crash of an autonomous vehicle.)</p>
<blockquote>
<p>The real point is that none of the <span class="caps">FDA</span>, <span class="caps">NHS</span>, nor the various regulatory agencies in other nations appear to be concerned [to the extent required] about the specific risks of autonomous decision making <span class="caps">AI</span>.</p>
<p>Are we potentially racing towards an <span class="caps">AI</span> event on the scale of elixir sulfanilamide [which prompted the foundation of <span class="caps">FDA</span>] or thalidomide [which the <span class="caps">FDA</span> banned before other countries, preventing 10,000 birth malformations]?</p>
</blockquote>International Winter School on Gravity and Light, Tutorial 3: Multilinear Algebra – Solutions for Exercise 12018-06-09T00:00:00+02:002018-06-09T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2018-06-09:/2018/06/09/multilinear-tutorial/<p>Solutions for exercise 1 of tutorial 3 of the International Winter School on Gravity and Light.</p><p>Solutions for exercise 1 of <a href="https://www.youtube.com/watch?v=5oeWX3NUhMA">tutorial 3</a> of the <a href="https://gravity-and-light.herokuapp.com">International Winter School on Gravity and Light</a>. (<a href="https://www.youtube.com/watch?v=mbv3T15nWq0">Link to video of lecture 3</a>.)</p>
<h2>Notation</h2>
<p>On this solution sheet, I’ll speak of a vector space <script type="math/tex">(V,+,\cdot)</script> over a field <script type="math/tex">K</script>, where <script type="math/tex">+: V\times V \rightarrow V</script> is the addition and <script type="math/tex">\cdot: K \times V \rightarrow V</script> is called (scalar) multiplication or S-multiplication. The field <script type="math/tex">(K, \textcolor{red}{+}, \textcolor{red}{\cdot})</script> has <script type="math/tex">\textcolor{red}{+}:K\times K \rightarrow K</script> as addition and <script type="math/tex">\textcolor{red}{\cdot}:K\times K \rightarrow K</script> as multiplication operations. The dot is often omitted, i.e. <script type="math/tex">a \mathbf v</script> is short for <script type="math/tex">a \cdot \mathbf v</script>, <script type="math/tex">a b</script> is short for <script type="math/tex">a \textcolor{red}{\cdot} b</script>. (Note that the lecture dealt with real vector spaces, i.e. the field <script type="math/tex">K</script> was always the set of reals <script type="math/tex">\mathbb R</script>.)
The scalars, i.e. the elements of <script type="math/tex">K</script>, are denoted with normal letters <script type="math/tex">a,b</script>, and the vectors, i.e. the elements of <script type="math/tex">V</script>, are denoted with boldface letters <script type="math/tex">\mathbf u, \mathbf v, \mathbf w</script>.</p>
<script type="text/javascript">
function showById(id, btn) {
document.getElementById(id).style.display = 'block';
btn.style.display = 'none';
}
function showByClass(cls, btn) {
for (var x of document.getElementsByClassName(cls))
x.style.display = 'block';
btn.style.display = 'none';
}
function hideByClass(cls) {
for (var x of document.getElementsByClassName(cls))
x.style.display = 'none';
}
</script>
<h1>Exercise 1: True or false?</h1>
<p><em>Tick the correct statements, but not the incorrect ones.</em> <a href="#" onclick="showByClass('answer', this); hideByClass('show-answer'); return false;">Show all answers</a></p>
<p><em>a) Which statements on vector spaces are correct?</em></p>
<p><strong>1.</strong> <em>Commutativity of multiplication is a vector space axiom.</em> <a href="#" onclick="showById('answer1', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer1" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>The scalar multiplication <script type="math/tex">\cdot: K \times V \rightarrow V</script> doesn’t even have the same sets in its two arguments, i.e. <script type="math/tex">\mathbf v \cdot a</script> is not even defined.</li>
<li>The vector space has the commutativity of <em>addition</em> as an axiom: for any <script type="math/tex">\mathbf u,\mathbf v \in V</script>, <script type="math/tex">{\mathbf u+\mathbf v} = {\mathbf v + \mathbf u}</script>.</li>
<li>The underlying field <script type="math/tex">K</script>
<em>does</em> have the commutativity of multiplication as a field axiom: for any <script type="math/tex">a,b \in K</script>, <script type="math/tex">a \textcolor{red}{\cdot} b = b \textcolor{red}{\cdot} a</script>.</li>
<li>As a consequence, for any <script type="math/tex">\mathbf v \in V</script> and <script type="math/tex">a, b \in K</script>,</li>
</ul>
<script type="math/tex; mode=display">% <![CDATA[
a (b \mathbf v) = (a\textcolor{red}{\cdot} b)\mathbf v = (b \textcolor{red}{\cdot} a)\mathbf v = b(a \mathbf v). %]]></script>
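<p>This consequence is easy to check numerically. The following snippet (a quick NumPy sketch, not part of the original solution sheet; the sample scalars and vector are arbitrary) verifies <script type="math/tex">a(b\mathbf v) = (a\textcolor{red}{\cdot}b)\mathbf v = b(a\mathbf v)</script>:</p>

```python
import numpy as np

a, b = 2.0, 3.0                 # scalars from the field K = R
v = np.array([1.0, -2.0, 0.5])  # a vector in V = R^3

# a(bv) = (a·b)v = (b·a)v = b(av)
lhs = a * (b * v)
rhs = b * (a * v)
assert np.allclose(lhs, rhs)
assert np.allclose(lhs, (a * b) * v)
```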
</div>
<p><strong>?.</strong> <em>Every vector is a matrix with only one column.</em> <a href="#" onclick="showById('answer2', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer2" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>By definition, a vector is an element of a vector space. If we fix a basis for the vector space, then any vector can be represented by an ordered set of numbers, which could be treated as a column vector, i.e. a matrix with one column. However, this representation depends on the choice of basis.</li>
<li>The <a href="https://youtu.be/5oeWX3NUhMA?t=1m09s">official answer</a> brings up as a counterexample the vector space of polynomials up to some finite degree. However, here again we could represent the vectors as a column vector with any choice of a basis. E.g. using the standard basis, <script type="math/tex">p(x) = 0x^2 + 4x + 5 </script> could be represented as <script type="math/tex">\mathbf p = [0, 4, 5]^T</script>.</li>
</ul>
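<p>As a concrete illustration of this representation (a NumPy sketch; the evaluation point is made up for this example): evaluating <script type="math/tex">p</script> at a point agrees with the dot product of its component vector and the basis functions evaluated at that point.</p>

```python
import numpy as np

# p(x) = 0x^2 + 4x + 5, written in the standard basis {x^2, x, 1}
p = np.array([0.0, 4.0, 5.0])

x = 1.5
direct = 0 * x**2 + 4 * x + 5          # evaluate the polynomial directly
basis_at_x = np.array([x**2, x, 1.0])  # the basis functions evaluated at x
via_components = p @ basis_at_x        # "column vector" dotted with basis values

assert np.isclose(direct, via_components)
```

<p>A different basis of the same polynomial space would give a different component vector for the same <script type="math/tex">p</script>, which is the point of the answer above.</p>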
</div>
<p><strong>?.</strong> <em>Every linear map between vector spaces can be represented by a unique quadratic matrix.</em> <a href="#" onclick="showById('answer3', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer3" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>As above, a linear map <script type="math/tex">f: V \rightarrow W </script> can be represented as a unique matrix only once bases are chosen for its domain <script type="math/tex">V</script> and codomain <script type="math/tex">W</script>.</li>
<li>This matrix is square (i.e. “quadratic”) only if the dimensions of <script type="math/tex">V</script> and <script type="math/tex">W</script> are equal.</li>
</ul>
</div>
<p><strong>?.</strong> <em>Every vector space has a corresponding dual vector space.</em> <a href="#" onclick="showById('answer4', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer4" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>The dual space of a vector space <script type="math/tex">V</script> is defined as the set of linear maps from <script type="math/tex">V</script> to <script type="math/tex">K</script>: <script type="math/tex">V^* \coloneqq Hom(V,K) \coloneqq \{φ\ \vert \ φ: V \linmap K\} </script>.</li>
</ul>
</div>
<p><strong>?.</strong> <em>The set of everywhere positive functions on <script type="math/tex">\mathbb R</script> with pointwise addition and S-multiplication is a vector space.</em> <a href="#" onclick="showById('answer5', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer5" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>This set doesn’t have an identity element for addition: under pointwise addition, the identity could only be the constant zero function, but that’s not an element of the set.</li>
<li>Likewise, no element of this set has an additive inverse in the set: for an everywhere positive <script type="math/tex">f</script>, the candidate <script type="math/tex">-f</script> is everywhere negative.</li>
<li>For the scalar multiplication we’d need to know the underlying field. Usually it would be <script type="math/tex">\mathbb R</script>, but then S-multiplication with a negative number wouldn’t result in an everywhere positive function. (Although one can construct a field from <script type="math/tex">\mathbb R^+</script>, I wonder how well that would combine with the above attempt at a vector space.)</li>
</ul>
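<p>On the parenthetical: <script type="math/tex">\mathbb R^+</script> <em>does</em> become a real vector space if one takes ordinary multiplication as the vector “addition” and <script type="math/tex">a \odot x := x^a</script> as the S-multiplication (the map <script type="math/tex">x \mapsto \ln x</script> is then an isomorphism onto <script type="math/tex">(\mathbb R,+,\cdot)</script>). A quick numeric spot check of a few axioms, with these assumed definitions:</p>

```python
import math

def vadd(x, y):   # vector "addition" on R+: ordinary product
    return x * y

def smul(a, x):   # S-multiplication: a ⊙ x = x**a
    return x ** a

x, y = 2.0, 5.0
a, b = 3.0, -0.5

# commutativity, identity (the identity element is 1), inverse (1/x)
assert math.isclose(vadd(x, y), vadd(y, x))
assert math.isclose(vadd(x, 1.0), x)
assert math.isclose(vadd(x, 1.0 / x), 1.0)
# distributivity: a ⊙ (x ⊕ y) = (a ⊙ x) ⊕ (a ⊙ y), since (xy)^a = x^a y^a
assert math.isclose(smul(a, vadd(x, y)), vadd(smul(a, x), smul(a, y)))
# (a + b) ⊙ x = (a ⊙ x) ⊕ (b ⊙ x), since x^(a+b) = x^a x^b
assert math.isclose(smul(a + b, x), vadd(smul(a, x), smul(b, x)))
# a ⊙ (b ⊙ x) = (ab) ⊙ x, since (x^b)^a = x^(ab)
assert math.isclose(smul(a, smul(b, x)), smul(a * b, x))
```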
</div>
<p><em>b) What is true about tensors and their components?</em></p>
<p><strong>?.</strong> <em>The tensor product of two tensors is a tensor.</em> <a href="#" onclick="showById('answer6', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer6" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>The lecture didn’t mention tensor products, so a definition is in order. The product of an <script type="math/tex"> (l,k) </script>-tensor <script type="math/tex">S</script> and an <script type="math/tex"> (n,m) </script>-tensor <script type="math/tex">T</script> is an <script type="math/tex"> (l+n,k+m) </script>-tensor <script type="math/tex"> S \otimes T </script>, whose <script type="math/tex"> (i_1, \ldots, i_{l+n}, j_1, \ldots, j_{k+m}) </script>-th component is the product of the relevant components of <script type="math/tex">S</script> and <script type="math/tex">T</script>:</li>
</ul>
<script type="math/tex; mode=display">% <![CDATA[
(S \otimes T)^{i_1, \ldots, i_l, i_{l+1}, \ldots, i_{l+n}}_ {j_1, \ldots, j_k, j_{k+1}, \ldots, j_{k+m} } =
S^{i_1, \ldots, i_l}_ {j_1, \ldots, j_k}
T^{i_{l+1}, \ldots, i_{l+n}}_ {j_{k+1}, \ldots, j_{k+m}}. %]]></script>
<p><a href="https://en.wikipedia.org/wiki/Tensor#Tensor_product">Source: Wikipedia</a></p>
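<p>In components, the tensor product is just an outer product of the component arrays. A small NumPy sketch (the shapes and index choices are mine, not from the lecture), with <script type="math/tex">S</script> a <script type="math/tex">(1,1)</script>-tensor and <script type="math/tex">T</script> a <script type="math/tex">(0,1)</script>-tensor over a 3-dimensional space:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3                              # dim V
S = rng.standard_normal((d, d))    # components S^i_j of a (1,1)-tensor
T = rng.standard_normal(d)         # components T_k of a (0,1)-tensor

# (S ⊗ T)^i_{jk} = S^i_j · T_k  -- a plain outer product of the components
ST = np.multiply.outer(S, T)       # shape (d, d, d)

i, j, k = 1, 2, 0
assert np.isclose(ST[i, j, k], S[i, j] * T[k])
```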
<p>This means that if the arguments of <script type="math/tex"> S \otimes T </script> are</p>
<ul>
<li>the <script type="math/tex">l+n</script> linear maps <script type="math/tex">φ^{(p)} = \sum^{\dim V}_{i=1} \varphi^{(p)}_i \epsilon^i</script> for <script type="math/tex">1 \le p \le l+n</script>, and</li>
<li>the <script type="math/tex">k+m</script> vectors <script type="math/tex"> \v_{(q)} = \sum^{\dim V}_{j=1} v_{(q)}^j \e_j </script> for <script type="math/tex">1 \le q \le k+m</script>
</li>
</ul>
<p>(with some particular choice of basis vectors <script type="math/tex">\{\e_i\}_i</script> and basis covectors <script type="math/tex">\{\epsilon^i\}_i</script> ), then</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
(S\otimes T) &(φ^{(1)}, \ldots, φ^{(l+n)}, \v_{(1)}, \ldots, \v_{(k+m)}) = \\
&= S (φ^{(1)}, \ldots, φ^{(l)}, \v_{(1)}, \ldots, \v_{(k)})\,\cdot\,
T (φ^{(l+1)}, \ldots, φ^{(l+n)}, \v_{(k+1)}, \ldots, \v_{(k+m)})\\
&= \Bigg(
\sum_{i_1}^{\dim V} \cdots \sum_{i_l}^{\dim V}
\sum_{j_1}^{\dim V} \cdots \sum_{j_k}^{\dim V}
\varphi^{(1)}_{i_1} \ldots \varphi^{(l)}_{i_l}
v_{(1)}^{j_1} \ldots v_{(k)}^{j_k}
S^{i_1, \ldots, i_l}_{j_1, \ldots, j_k}
\Bigg) \cdot \phantom.\\
&\phantom{=} \Bigg(
\sum_{i_{l+1}}^{\dim V} \cdots \sum_{i_{l+n}}^{\dim V}
\sum_{j_{k+1}}^{\dim V} \cdots \sum_{j_{k+m}}^{\dim V}
\varphi^{(l+1)}_{i_{l+1}} \ldots \varphi^{(l+n)}_{i_{l+n}}
v_{(k+1)}^{j_{k+1}} \ldots v_{(k+m)}^{j_{k+m}}
T^{i_{l+1}, \ldots, i_{l+n}}_{j_{k+1}, \ldots, j_{k+m}}
\Bigg) \\
&= \sum_{i_1}^{\dim V} \cdots \sum_{i_{l+n}}^{\dim V}
\sum_{j_1}^{\dim V} \cdots \sum_{j_{k+m}}^{\dim V}
\varphi^{(1)}_{i_1} \ldots \varphi^{(l+n)}_{i_{l+n}}
v_{(1)}^{j_1} \ldots v_{(k+m)}^{j_{k+m}}
S^{i_1, \ldots, i_l}_{j_1, \ldots, j_k}
T^{i_{l+1}, \ldots, i_{l+n}}_{j_{k+1}, \ldots, j_{k+m}}.
\end{aligned} %]]></script>
<p>These <script type="math/tex"> (l+n+k+m) </script> summations are quite a mess, but the above derivation shows that the <a href="http://mathworld.wolfram.com/EinsteinSummation.html">Einstein summation convention</a> works for tensor products as well:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
(S\otimes T) &(φ^{(1)}, \ldots, φ^{(l+n)}, v_{(1)}, \ldots, v_{(k+m)}) =\\
&= S (φ^{(1)}, \ldots, φ^{(l)}, v_{(1)}, \ldots, v_{(k)})\,\cdot\,
T (φ^{(l+1)}, \ldots, φ^{(l+n)}, v_{(k+1)}, \ldots, v_{(k+m)})\\
&= \Big(
\varphi^{(1)}_{i_1} \ldots \varphi^{(l)}_{i_l}
v_{(1)}^{j_1} \ldots v_{(k)}^{j_k}
S^{i_1, \ldots, i_l}_{j_1, \ldots, j_k}
\Big)
\Big(
\varphi^{(l+1)}_{i_{l+1}} \ldots \varphi^{(l+n)}_{i_{l+n}}
v_{(k+1)}^{j_{k+1}} \ldots v_{(k+m)}^{j_{k+m}}
T^{i_{l+1}, \ldots, i_{l+n}}_{j_{k+1}, \ldots, j_{k+m}}
\Big) \\
&= \varphi^{(1)}_{i_1} \ldots \varphi^{(l+n)}_{i_{l+n}}
v_{(1)}^{j_1} \ldots v_{(k+m)}^{j_{k+m}}
S^{i_1, \ldots, i_l}_{j_1, \ldots, j_k}
T^{i_{l+1}, \ldots, i_{l+n}}_{j_{k+1}, \ldots, j_{k+m}}.
\end{aligned} %]]></script>
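<p>This factorization can be verified numerically with <code>np.einsum</code>, which implements exactly this summation convention (a sketch with a <script type="math/tex">(1,1)</script>-tensor <script type="math/tex">S</script> and a <script type="math/tex">(0,1)</script>-tensor <script type="math/tex">T</script>; the random components are arbitrary):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
S = rng.standard_normal((d, d))    # a (1,1)-tensor: components S^i_j
T = rng.standard_normal(d)         # a (0,1)-tensor: components T_j
ST = np.multiply.outer(S, T)       # (S ⊗ T)^i_{jk} = S^i_j T_k

phi = rng.standard_normal(d)       # components of a covector φ
v1 = rng.standard_normal(d)        # components of the vectors v_(1), v_(2)
v2 = rng.standard_normal(d)

# apply each tensor to its arguments via Einstein summation
S_val = np.einsum('i,j,ij->', phi, v1, S)            # S(φ, v1)
T_val = np.einsum('j,j->', v2, T)                    # T(v2)
ST_val = np.einsum('i,j,k,ijk->', phi, v1, v2, ST)   # (S ⊗ T)(φ, v1, v2)

assert np.isclose(ST_val, S_val * T_val)
```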
</div>
<p><strong>?.</strong> <em>You can always reconstruct a tensor from its components and the corresponding basis.</em> <a href="#" onclick="showById('answer7', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer7" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>If we know the basis vectors for the vector space and the dual vector space, then the components of the vector and covector arguments are uniquely determined, and we can apply the tensor to the arguments using the components of the tensor (or some relevant finite subset in case <script type="math/tex">V</script> is not finite dimensional).</li>
</ul>
</div>
<p><strong>?.</strong> <em>The number of indices of the tensor components depends on dimension.</em> <a href="#" onclick="showById('answer8', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer8" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>A tensor component usually has one index for each argument, e.g. for a <script type="math/tex">(2,1)</script>-tensor <script type="math/tex">T</script>, the components are <script type="math/tex">T^{i_1,i_2}_{j_1}</script>.</li>
<li>The <em>range</em> of these indices does depend on the dimension: each index ranges from <script type="math/tex">1</script> to <script type="math/tex">\dim V</script>. Therefore an <script type="math/tex"> (n,m) </script>-tensor <script type="math/tex">T</script> has <script type="math/tex"> (\dim V)^{n+m} </script> many components.</li>
</ul>
</div>
<p><strong>?.</strong> <em>The Einstein summation convention does not apply to tensor components.</em> <a href="#" onclick="showById('answer9', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer9" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification: see above.</p>
</div>
<p><strong>?.</strong> <em>A change of basis does not change the tensor components.</em> <a href="#" onclick="showById('answer10', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer10" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>The tensor components are defined with respect to a given basis, so in general they change under a change of basis.</li>
</ul>
</div>
<p><em>c) Given a basis for a <script type="math/tex">d</script>-dimensional vector space <script type="math/tex">V</script>, …</em></p>
<p><strong>?.</strong> …<em>one can find exactly <script type="math/tex">d^2</script> different dual bases for the corresponding dual vector space <script type="math/tex"> V^* </script>.</em> <a href="#" onclick="showById('answer11', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer11" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>Given a basis of <script type="math/tex">V</script>, <script type="math/tex">E = \{\mathbf{e}_i\}_{i=1}^d \subset V</script>, there is a <em>unique</em> dual basis of <script type="math/tex">V^* </script>, namely <script type="math/tex">E^* = \{\epsilon^i\}_{i=1}^d</script>, where <script type="math/tex">\epsilon^i(\e_i) = 1</script> and <script type="math/tex">\epsilon^i(\e_j) = 0</script> for <script type="math/tex">i ≠ j</script>.</li>
</ul>
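<p>Numerically, if the basis vectors are the columns of an invertible matrix <script type="math/tex">E</script>, the unique dual basis consists of the rows of <script type="math/tex">E^{-1}</script> (a NumPy sketch with an arbitrary basis of <script type="math/tex">\mathbb R^3</script>):</p>

```python
import numpy as np

# columns of E are the basis vectors e_1, e_2, e_3 of V = R^3
E = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# row i of E^{-1} holds the components of the covector ε^i,
# since (E^{-1} E)_{ij} = δ^i_j, i.e. ε^i(e_j) = δ^i_j
dual = np.linalg.inv(E)

assert np.allclose(dual @ E, np.eye(3))
```

<p>Changing <script type="math/tex">E</script> changes <code>dual</code> accordingly, which is why the dual basis is unique only <em>given</em> a basis of <script type="math/tex">V</script>.</p>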
</div>
<p><strong>?.</strong> …<em>by removing one basis vector of the basis of <script type="math/tex">V</script>, a basis for a <script type="math/tex">(d - 1)</script>-dimensional vector space <script type="math/tex">V_1</script> is obtained.</em> <a href="#" onclick="showById('answer12', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer12" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>The resulting set of <script type="math/tex">(d-1)</script> vectors is still linearly independent, and its span is a <script type="math/tex">(d-1)</script>-dimensional subspace of <script type="math/tex">V</script>.</li>
</ul>
</div>
<p><strong>?.</strong> …<em>the continuity of a map <script type="math/tex">f : V → W</script> depends on the choice of basis for the vector space <script type="math/tex">W</script>.</em> <a href="#" onclick="showById('answer13', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer13" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>The continuity of a map is defined for <em>topological spaces</em>, not for vector spaces.</li>
<li>
<script type="math/tex">f</script> is continuous <em>iff</em> the preimage of every open set in <script type="math/tex">W</script> is open in <script type="math/tex">V</script>. Note that no term in this definition depends on the choice of basis for either <script type="math/tex">V</script> or <script type="math/tex">W</script>.</li>
<li>Assuming that <script type="math/tex">V</script> and <script type="math/tex">W</script> are real vector spaces, it is customary to equip them with the standard topology. A set <script type="math/tex">A</script> is open in <script type="math/tex">V</script>
<em>iff</em> it is a union of open <script type="math/tex">ε</script>-balls, or equivalently a union of Cartesian products of open intervals. While these definitions assume a basis for <script type="math/tex">V</script>, they all result in the exact same topology. (Meaning a set can be written as a union of open balls <em>iff</em> it can be written as a union of open cuboids <em>iff</em> it can be written as a union of open cubes – an interesting but easy-to-prove result.)</li>
<li>It’s easy to see that every <em>linear</em> map between real vector spaces (equipped with the standard topology) is continuous.</li>
</ul>
</div>
<p><strong>?.</strong> …<em>one can extract the components of the elements of the dual vector space <script type="math/tex">V^*</script>.</em> <a href="#" onclick="showById('answer14', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer14" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>A basis for <script type="math/tex">V</script> uniquely determines a dual basis for <script type="math/tex">V^* </script>, which uniquely determines the components of any covector.</li>
</ul>
</div>
<p><strong>?.</strong> …<em>each vector of <script type="math/tex">V</script> can be reconstructed from its components.</em> <a href="#" onclick="showById('answer15', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer15" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>Given the basis vectors <script type="math/tex">\mathbf{e}_i</script> and components <script type="math/tex">v^i</script> for <script type="math/tex">1 \leq i \leq d</script>, <script type="math/tex">\mathbf{v} = \sum_{i=1}^d v^i \mathbf{e}_i</script>.</li>
</ul>
</div>Probabilistically interesting planning problems2018-05-28T00:00:00+02:002018-05-28T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2018-05-28:/2018/05/28/probabilistically-interesting/<p>This post briefly describes the problem of probabilistic planning, and explains what makes a planning problem “probabilistically interesting”.</p><p>This post briefly describes the problem of <em>probabilistic planning</em>, and explains in informal terms what makes a planning problem <em>probabilistically interesting</em>, along with some examples.</p>
<h1>Primer on probabilistic planning</h1>
<p>In a nutshell, planning is about <em>finding a way to win</em>, and as such, the field of research on planners is vast. However, there is no single textbook definition of “planning”, so in this post I’ll try to be as general as possible. One description of a planning problem could be: given a description of an environment, find a sequence of actions that brings the environment from its initial state to a goal state. There are multiple ways to describe the environment: for example in formal logic with the <a href="https://en.wikipedia.org/wiki/Situation_calculus">situation calculus</a>, or more commonly as a <a href="https://en.wikipedia.org/wiki/Markov_decision_process">Markov decision process (<span class="caps">MDP</span>)</a>. In probabilistic planning problems, the functions describing the <span class="caps">MDP</span> are not necessarily deterministic: executing action <script type="math/tex">a</script> in state <script type="math/tex">s</script> will bring the environment to state <script type="math/tex">s'</script> with a probability of <script type="math/tex">T(s,a,s')</script>. In contrast with the <em>control problem</em> of reinforcement learning, where the goal is to find an optimal <em>policy</em> (i.e. a mapping from states to actions), in planning one is interested only in a partial policy that brings the agent closer to a goal state, or frequently only in a single action that brings the agent closer to a goal from the current state. An example planning problem is thus: “Siri, show me a way to the library.” Then Siri responds either with a plan that I can follow from the first step to the last (i.e. a route from start to finish), or only an action that I can take right now (“go forward 100 meters”).</p>
<p>Graphical representation of an example <span class="caps">MDP</span>:</p>
<p><img alt="Graphical representation of an example MDP" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/MDP-env.jpg"></p>
<p>An example policy for the same <span class="caps">MDP</span>:</p>
<p><img alt="An example policy for the same MDP" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/MDP-policy.jpg"></p>
<p>An example plan for the same <span class="caps">MDP</span>:</p>
<p><img alt="An example plan for the same MDP" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/MDP-plan.jpg"></p>
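<p>A transition function <script type="math/tex">T(s,a,s')</script> like the one in the figures can be sketched as a nested mapping, and one environment step is a sample from it (a toy Python illustration; the states and probabilities are made up, not those of the figures):</p>

```python
import random

# T[s][a] maps successor states to probabilities: T(s, a, s')
T = {
    's0': {'left':  {'s1': 0.8, 's0': 0.2},
           'right': {'goal': 0.5, 's0': 0.5}},
    's1': {'left':  {'goal': 1.0}},
}

def step(s, a, rng=random):
    """Sample s' ~ T(s, a, ·)."""
    succs = T[s][a]
    return rng.choices(list(succs), weights=list(succs.values()))[0]

random.seed(0)
s_next = step('s0', 'right')   # nature picks 'goal' or 's0'
assert s_next in ('goal', 's0')
```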
<p>The approach taken by a planner differs based on the discounting factor <script type="math/tex">\gamma</script> and the distribution of rewards. In a <em>shortest path problem</em> the future rewards are discounted (<script type="math/tex">0 < \gamma < 1</script>), and there might be a constant negative reward for every step taken. Together with a positive reward in goal states, an agent with the goal of maximizing return – i.e. the sum of discounted expected future rewards – has an incentive to minimize the length of the path to the goal. However, if there is no discounting (<script type="math/tex">\gamma = 1</script>) and there’s a positive reward only in the goal states, it is sufficient for the agent to find <em>any</em> way to the goal. (Some call these <em>goal-based problems</em> <a href="#Yoon2008-probabilistic-planning">(Yoon, Fern, Givan, <span class="amp">&</span> Kambhampati, 2008)</a>.) In the next section we’ll see that not all plans are created equal, so even in the non-discounted case we want one that ends up in a goal state with the highest probability.</p>
<p>In an <em>offline</em> approach to deterministic planning problems, a planner is given an environment, an initial state, and a goal state, and it needs to return a sequence of actions that brings the environment to the goal state. However, this offline approach does not work for probabilistic problems, where the outcome of an action is not always in our control. Hence a probabilistic planner is usually executed <em>online</em>: it makes an observation (e.g. the current state of the environment, in the fully observable case), does some magic, and outputs a single action that brings the agent closer to a goal state. Nature brings the agent to a new state – not necessarily the desired one – and these steps are repeated until the agent runs out of time or ends up at a goal.</p>
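<p>The online loop can be sketched as follows (all names here are placeholders for illustration – in particular, <code>plan_action</code> stands in for the planner’s “magic”, and <code>env_step</code> for nature’s move):</p>

```python
def plan_action(state, actions):
    # placeholder: a real planner would search or replan here
    return actions[state][0]

def run_episode(s0, actions, env_step, goals, max_steps=100):
    """Observe, plan one action, act, repeat until a goal or timeout."""
    s = s0
    for _ in range(max_steps):
        if s in goals:
            return s
        a = plan_action(s, actions)
        s = env_step(s, a)      # nature picks the successor state
    return s

# a trivial two-step environment to exercise the loop
actions = {'s0': ['go'], 's1': ['go']}
env_step = lambda s, a: {'s0': 's1', 's1': 'goal'}[s]
assert run_episode('s0', actions, env_step, goals={'goal'}) == 'goal'
```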
<p>Since the fourth <a href="http://icaps-conference.org/index.php/Main/Competitions">International Planning Competition</a> in 2004 hosted by the <span class="caps">ICAPS</span> (International Conference on Automated Planning and Scheduling), this event featured a probabilistic track. The winner of <span class="caps">IPPC</span> 2004 was <span class="caps">FF</span>-Replan, a planner that simplifies the probabilistic planning problem into a deterministic one by not considering the multiple potential effects of an action <a href="#Yoon2007-FF-replan">(Yoon, Fern, <span class="amp">&</span> Givan, 2007)</a> – hence the title of the paper, “<span class="caps">FF</span>-Replan: A Baseline for Probabilistic Planning.”</p>
<h1>Probabilistically interesting planning problems</h1>
<p>Iain Little and Sylvie Thiébaux analyzed the common characteristics of planning problems that can and cannot be optimally solved by a planner like <span class="caps">FF</span>-Replan <a href="#Little2007-probabilistic-planning">(Little <span class="amp">&</span> Thiébaux, 2007)</a>. They gave necessary and sufficient conditions for a probabilistic planning problem to be <em>probabilistically interesting</em>: on a problem fulfilling these conditions, a planner that determinizes the problem will lose crucial information, and will do worse than a probabilistic planner. In this section I’ll summarize these conditions using natural language, slightly diverging from the vocabulary of the paper. For formal definitions and more examples, see the <a href="http://users.cecs.anu.edu.au/~iain/icaps07.pdf">original paper</a>; it is an interesting read.</p>
<p><em>Criterion 1:</em> there are multiple paths from the start to the goal. If there is only a single path, then any planner that finds <em>a</em> path will do equally well, as that path is the only one.</p>
<p>Counterexample:</p>
<p><img alt="Graphical description of an MDP with a single goal trajectory" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/counter-1.png"></p>
<p><em>Criterion 2:</em> where the above two paths diverge, there is a choice about which way to go, i.e. a state <script type="math/tex">s_{crossroads}</script> from which action <script type="math/tex">a_1</script> leads to one road with a different probability than action <script type="math/tex">a_2</script> does. (Yes, this is a sufficient condition for the first criterion.) If it’s only luck that separates the two paths, then the agent doesn’t have much of a choice to do better.</p>
<p>Counterexample:</p>
<p><img alt="MDP with skill doesn't help" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/counter-2.png"></p>
<p><em>Criterion 3:</em> there must be a non-trivially avoidable dead end in the environment. A <em>dead end</em> is an absorbing state that is not a goal state, i.e. a state from which there is no path to any goal state. For a dead end to be <em>avoidable</em>, there must be a state <script type="math/tex">s_{crossroads}</script> with at least two possible actions <script type="math/tex">a_{deadly}</script> and <script type="math/tex">a_{winning}</script>, such that executing <script type="math/tex">a_{deadly}</script> brings the agent to the dead end with a higher probability than executing <script type="math/tex">a_{winning}</script>. A dead end is <em>non-trivially avoidable</em> if <script type="math/tex">s_{crossroads}</script> is on a path from the initial state to a goal state, and there is a non-zero chance of reaching a goal state after executing either <script type="math/tex">a_{winning}</script> or <script type="math/tex">a_{deadly}</script>.</p>
<p>Counterexample: the probabilistic version of Blocksworld, where the worst case scenario is that a block is dropped accidentally, does not contain dead ends; the environment is irreducible. (This was an actual problem of <span class="caps">IPPC</span> 2004.)</p>
<p><img alt="Probabilistic Blocks world" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/blocksworld.png"></p>
<p>Counterexample: all dead ends are unavoidable.</p>
<p><img alt="MDP with no avoidable dead end" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/counter-3b.png"></p>
<p>Counterexample: all dead ends are trivially avoidable.</p>
<p><img alt="MDP with only trivially avoidable dead ends" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/counter-3c.png"></p>
<h1>A simple yet “interesting” planning problem</h1>
<p>A very simple problem that is probabilistically interesting is what the authors call <code>climber</code>, described by the following story:</p>
<blockquote>
<p>You are stuck on a roof because the ladder you climbed up on fell down. There are plenty of people around; if you call out for help someone will certainly lift the ladder up again. Or you can try to climb down without it. You aren’t a very good climber though, so there is a 40% chance that you will fall and break your neck if you do it alone. What do you do?</p>
</blockquote>
<p>Graphical representation of the <code>climber</code> problem:
<img alt="Graphical representation of the climber problem" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/climber-orig.jpg"></p>
<p>Despite the simplicity of this problem, most methods to turn it into a deterministic problem fail. Little and Thiébaux described 3 ways to determinize a problem, and they called a resulting deterministic problem a “compilation”.</p>
<p>The <em><span class="caps">REPLAN1</span></em> approach simply drops all but the most likely outcome of every action, and finds the shortest goal trajectory. (This was the approach used by <span class="caps">FF</span>-Replan.) Compilation of the climber problem according to <span class="caps">REPLAN1</span>:</p>
<p><img alt="Compilation of the climber problem according to REPLAN1" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/climber-det1.jpg"></p>
<p><em><span class="caps">REPLAN2</span>(shortest)</em> turns every possible probabilistic outcome of an action into the outcome of a deterministic action, each with a cost of 1. Optimizing for smallest cost thus finds the <em>shortest</em> goal trajectory, but this might not be the one with the highest success probability. Compilation of the climber problem according to <span class="caps">REPLAN2</span>(shortest):</p>
<p><img alt="Compilation of the climber problem according to REPLAN2(shortest)" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/climber-det2.jpg"></p>
<p><em><span class="caps">REPLAN2</span>(most-likely)</em> also turns every outcome into a separate deterministic action, but the new action costs are the negative log probability of the relevant outcome. This is the only compilation of the problem that finds the optimal path for <code>climber</code>, but for many other problems even this one will be suboptimal. The resulting compilation is as follows:</p>
<p><img alt="Compilation of the climber problem according to REPLAN2(most-likely)" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/climber-det3.jpg"></p>
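<p>The reason the negative-log-probability costs work: minimizing a sum of <script type="math/tex">-\log p</script> costs is equivalent to maximizing the product of the outcome probabilities along the trajectory, since the logarithm turns products into sums. A quick check on two <code>climber</code>-like trajectories (the numbers mirror the 60% chance of climbing down safely):</p>

```python
import math

# two candidate trajectories, as lists of outcome probabilities
risky = [1.0, 0.6]        # shorter, but only a 0.6 chance of success
safe  = [1.0, 1.0, 1.0]   # longer, but certain

def cost(traj):           # REPLAN2(most-likely) cost: sum of -log p
    return sum(-math.log(p) for p in traj)

def success_prob(traj):
    return math.prod(traj)

# the lower-cost trajectory is exactly the higher-probability one
assert cost(safe) < cost(risky)
assert success_prob(safe) > success_prob(risky)
```

<p>A shortest-path compilation with unit costs would prefer <code>risky</code> here, which is exactly the failure mode of <span class="caps">REPLAN2</span>(shortest).</p>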
<h1>Summary</h1>
<p>Finding the optimal goal trajectory in a probabilistic planning problem is computationally expensive, so most planners use some heuristics. One way to plan in a stochastic environment is to change the probabilistic planning problem into a deterministic shortest path problem and replan after (almost) every step, which is computationally efficient, but in many cases suboptimal. This article outlined the attributes of probabilistically interesting problems, where the deterministic replanning approach often fails. As such, recent probabilistic planners use more complicated methods (or often a portfolio of probabilistic planners), but replanners remain a good baseline to compare against.</p>
<h1>References</h1>
<ol class="bibliography"><li><span id="Little2007-probabilistic-planning">Little, I., <span class="amp">&</span> Thiébaux, S. (2007). Probabilistic planning vs. replanning. <i>Workshop, <span class="caps">ICAPS</span> 2007</i>. Retrieved from http://users.cecs.anu.edu.au/~iain/icaps07.pdf</span></li>
<li><span id="Yoon2007-FF-replan">Yoon, S. W., Fern, A., <span class="amp">&</span> Givan, R. (2007). <span class="caps">FF</span>-Replan: A Baseline for Probabilistic Planning. In <span class="caps">M. S.</span> Boddy, M. Fox, <span class="amp">&</span> S. Thiébaux (Eds.), <i>Proceedings of the Seventeenth International Conference on Automated
Planning and Scheduling, <span class="caps">ICAPS</span> 2007, Providence, Rhode Island, <span class="caps">USA</span>,
September 22-26, 2007</i> (p. 352). <span class="caps">AAAI</span>. Retrieved from http://www.aaai.org/Library/<span class="caps">ICAPS</span>/2007/icaps07-045.php</span></li>
<li><span id="Yoon2008-probabilistic-planning">Yoon, S. W., Fern, A., Givan, R., <span class="amp">&</span> Kambhampati, S. (2008). Probabilistic Planning via Determinization in Hindsight. In D. Fox <span class="amp">&</span> <span class="caps">C. P.</span> Gomes (Eds.), <i>Proceedings of the Twenty-Third <span class="caps">AAAI</span> Conference on Artificial Intelligence,
<span class="caps">AAAI</span> 2008, Chicago, Illinois, <span class="caps">USA</span>, July 13-17, 2008</i> (pp. 1010–1016). <span class="caps">AAAI</span> Press. Retrieved from http://www.aaai.org/Library/<span class="caps">AAAI</span>/2008/aaai08-160.php</span></li></ol>Change YouTube speed from your favorites bar2018-05-23T00:00:00+02:002018-05-23T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2018-05-23:/2018/05/23/youtube-speed/<p>While the <span class="caps">UI</span> of YouTube shows only a limited set of high-speed options, it is possible to set the speed to any floating point value. Even better, one can do so from their favorites bar with bookmarklets.</p><p><img alt="I feel the need... the need for speed!" src="https://www.treszkai.com/2018/05/23/youtube-speed/need-for-speed.gif"></p>
<h1>Premise</h1>
<p>While the <span class="caps">UI</span> of YouTube shows only a limited set of high-speed options, it is possible to set the speed to any floating point value. Even better, one can do so from their favorites bar with bookmarklets.</p>
<p><img alt="Youtube dialog to set speed" src="https://www.treszkai.com/2018/05/23/youtube-speed/youtube.png"> ⇒ <img alt="Bookmarks to set youtube speed" src="https://www.treszkai.com/2018/05/23/youtube-speed/bookmarklets.png"></p>
<h1>Method</h1>
<p>Simply add any of the following code snippets as bookmarks.</p>
<p>If you have a fixed speed in mind, e.g. 2.5:</p>
<p><code>javascript:document.getElementsByTagName("video")[0].playbackRate=2.5;</code></p>
<p>Or save this line to show a prompt that asks for a floating-point input:</p>
<p><code>javascript:var%20speed=prompt("Speed:","1.");document.getElementsByTagName("video")[0].playbackRate=parseFloat(speed);</code></p>
<p>Which results in the following prompt:</p>
<p><img alt="A prompt that asks for speed" src="https://www.treszkai.com/2018/05/23/youtube-speed/custom.png"></p>
<h1>Caveats</h1>
<p>Works with YouTube and Vimeo.</p>
<p>The speed display in the video player’s menu will keep showing the last value selected through the menu, not the value set by the bookmarklet.</p>
<h1>References</h1>
<ul>
<li><span class="caps">GIF</span>: <a href="https://www.youtube.com/watch?v=fR2hajcuFEM">Top Gun</a></li>
<li>Script: <a href="https://www.quora.com/Is-there-a-way-of-watching-YouTube-videos-at-higher-than-2x-speed/answer/John-Vuong-12">Quora answer of John Vuong</a></li>
</ul>Some versatile tools for bash2018-05-16T00:00:00+02:002018-05-16T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2018-05-16:/2018/05/16/bash-versatile/<p>A 7-line bash script that includes 90% of what an average user needs.</p><p>I rarely use bash besides the basics: I could use a <code>for</code> loop even if woken up at night, but my knowledge of the language doesn’t go much further. Hence instead of trying to memorize all the <code>{}%$</code> magic, having a few versatile commands in my toolbox comes in handy.</p>
<p>Recently I faced the task of renaming a set of files {<code>foo 02.jpg</code>, …, <code>foo 74.jpg</code>} to {<code>foo 06.jpg</code>, …, <code>foo 78.jpg</code>}, while keeping the order. My approach contained nothing extraordinary:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="k">for</span><span class="w"> </span>i<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="sb">`</span>seq<span class="w"> </span><span class="m">74</span><span class="w"> </span>-1<span class="w"> </span><span class="m">2</span><span class="sb">`</span>
<span class="k">do</span>
<span class="w"> </span><span class="nb">printf</span><span class="w"> </span>-v<span class="w"> </span>oldname<span class="w"> </span><span class="s2">"foo %02d.jpg"</span><span class="w"> </span><span class="nv">$i</span>
<span class="w"> </span><span class="nb">printf</span><span class="w"> </span>-v<span class="w"> </span>newname<span class="w"> </span><span class="s2">"foo %02d.jpg"</span><span class="w"> </span><span class="k">$(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">"</span><span class="nv">$i</span><span class="s2">+4"</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>bc<span class="k">)</span>
<span class="w"> </span>mv<span class="w"> </span><span class="s2">"</span><span class="nv">$oldname</span><span class="s2">"</span><span class="w"> </span><span class="s2">"</span><span class="nv">$newname</span><span class="s2">"</span>
<span class="k">done</span>
</code></pre></div>
<p>Yet there were some educational points in it:</p>
<ul>
<li>One loops a variable <code>x</code> over the whitespace-separated words of a string <code>$values</code> by <code>for x in $values; do something; something_else; done</code>.</li>
<li><code>seq a b</code> prints out the integers from <code>a</code> to <code>b</code>, inclusive. Counting downwards works out of the box with <span class="caps">BSD</span> <code>seq</code> (as on macOS), but <span class="caps">GNU</span> <code>seq</code> needs an explicit negative step: <code>seq 74 -1 2</code>.</li>
<li>Variable <code>x</code> is assigned a value by <code>x=foobar</code>, where there must be <em>no spaces around the equals sign</em>. The value of <code>x</code> can then be referred to by <code>$x</code>.</li>
<li>Renaming a set of files to a similar name but later in the alphabet must be done in reverse order, so that <code>mv</code> never overwrites a file that hasn’t been renamed yet.</li>
<li>Bash has a built-in <code>printf</code> that seems to work as in C: first the string to be printed with format specifiers like <code>%02d</code>, followed by the arguments whose values are used according to the format specifiers.</li>
<li>With the <code>-v</code> option of <code>printf</code>, you can save the output into a variable.</li>
<li>One can use <code>$( )</code> to execute a command and have bash substitute its output into the command line. (It’s the same as using backticks, as around the <code>seq</code> call, but allows nesting and is clearer.) Not shown here, but it even works in quotation marks, e.g. <code>"$(echo hey yo)"</code> is like writing <code>"hey yo"</code>. Note that trailing newlines are stripped.</li>
<li><code>bc</code> is a calculator that reads an expression from standard input and prints nothing but the result on a single line.</li>
<li>Don’t forget the quotes around arguments with spaces, like with <code>mv</code> above.</li>
</ul>
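<p>For comparison, here is a sketch of the same renaming done with bash’s built-in <code>$(( ))</code> arithmetic and a C-style <code>for</code> loop, which avoids spawning <code>seq</code> and <code>bc</code> entirely. The file names are dummies created just for the demo, in a throwaway directory:</p>

```shell
#!/bin/bash
# Same renaming as above, but with built-in arithmetic instead of bc,
# demonstrated on dummy files in a temporary directory.
set -e
dir=$(mktemp -d)
cd "$dir"
touch "foo 02.jpg" "foo 03.jpg" "foo 74.jpg"    # dummy files for the demo

# Count down so a file is never moved onto a not-yet-renamed name.
for (( i = 74; i >= 2; i-- )); do
  printf -v oldname "foo %02d.jpg" "$i"
  [ -e "$oldname" ] || continue                 # skip gaps in the sequence
  printf -v newname "foo %02d.jpg" $(( i + 4 ))
  mv "$oldname" "$newname"
done

ls    # foo 06.jpg  foo 07.jpg  foo 78.jpg
```

<p>The <code>$(( ))</code> form does integer arithmetic in the shell itself, so no pipe to <code>bc</code> is needed; the rest (the <code>printf -v</code> trick and the quoting around names with spaces) is unchanged.</p>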
<p>One minute of further bash tips is provided by Julia Evans <a href="https://drawings.jvns.ca/bashtips/">[here]</a>.</p>Some proofs in first-order logic2018-02-27T00:00:00+01:002018-02-27T00:00:00+01:00Laszlo Treszkaitag:www.treszkai.com,2018-02-27:/2018/02/27/logic-courseworks/<p>This page lists some interesting problems in mathematical logic that I solved during my studies.</p><p>I had the fortune to study classical logic under <a href="http://www.renyi.hu/~csirmaz/">László Csirmaz</a> at the Eötvös Loránd University, Budapest. Although I was not officially enrolled in the course, he was kind enough to mark my weekly homework regardless of my lack of student status. These were originally written in Hungarian, and I translated a few of them into English.</p>
<h1>A non-standard model of Robinson arithmetic</h1>
<p><em>Give a model that fulfills every axiom of Robinson arithmetic and contains two elements that are neither greater than or equal to, nor smaller than or equal to, one another; or prove that such a model doesn’t exist.</em></p>
<p><a href="https://www.treszkai.com/2018/02/27/logic-courseworks/2017-03-logic-cw4ex4.pdf">Solution (<span class="caps">PDF</span>)</a>.</p>
<h1>A two-formula version of the diagonal lemma</h1>
<p><em>Let <script type="math/tex">\Gamma</script> be a theory which can represent every recursive function. Prove that for every pair of formulae <script type="math/tex">\Phi(x)</script> and <script type="math/tex">\Psi(x)</script> with one free variable, there exist closed formulae <script type="math/tex">\eta</script> and <script type="math/tex">\theta</script> such that <script type="math/tex">\Gamma \proves \eta \,\leftrightarrow\, \Phi(\Godel{\theta})</script> and <script type="math/tex">\Gamma \proves \theta \,\leftrightarrow\, \Psi(\Godel{\eta})</script>.</em></p>
<p><a href="https://www.treszkai.com/2018/02/27/logic-courseworks/2017-05-logic-cw9ex1.pdf">Solution (<span class="caps">PDF</span>)</a>.</p>
<h1>Final steps of the proof of Gödel’s completeness theorem</h1>
<p>When Gödel’s completeness theorem was proved during the lectures, a crucial step was missing from the proof, so I <a href="https://www.treszkai.com/2018/02/27/logic-courseworks/2017-07-logic-henkin.pdf">proved it myself</a>.</p>