<h1>Estimating personal COVID risk from population-level data</h1>
<p><em>Laszlo Treszkai, 6 January 2021</em></p>
<p><strong>In a nutshell:</strong> If <em>N</em> people get infected per day in a country with susceptible population size <em>S</em>, then doing “average” activities has approximately an <em>N/S</em> risk of contracting it on that day.</p>
<h2>Introduction</h2>
<p>The <a href="https://www.microcovid.org">microCOVID tool</a> is great for estimating the chances of contracting <span class="caps">COVID</span>-19 during a given activity.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup> It does so by estimating the transmission risk of various activities from literature data, and combining this with an estimate of the probability that the person you meet is <span class="caps">COVID</span>-positive.</p>
<p>In this post, I’m building a model which assumes that every citizen of the country acts exactly the same on a given day, which leads to the current infection rate. This means that if you do the same activities as “everyone else”, you’ll have the same chance of getting infected as “everyone else”.</p>
<p>Caveat: these are all back-of-the-envelope calculations of an <em>extremely</em> simplistic model.</p>
<h2>The numbers</h2>
<p>Let’s run the numbers of this model <a href="https://en.wikipedia.org/w/index.php?title=COVID-19_pandemic_in_Hungary&oldid=998710968">on Hungary, on January 6, 2021</a>.</p>
<ul>
<li>New cases (average of 7 days): 1746/day</li>
<li>Recovered: 180,000</li>
<li>Population (<sup id="fnref:kids"><a class="footnote-ref" href="#fn:kids">2</a></sup>): 9,770,000</li>
<li>Susceptible (<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">3</a></sup>): population - recovered = 9,590,000</li>
</ul>
<p>One might argue<sup id="fnref:3"><a class="footnote-ref" href="#fn:3">4</a></sup> that the actual case count is higher than those tested positive, but I assume cases which do not make the books are not severe enough to warrant a test. However, if your country doesn’t seem to do enough tests (e.g. because the ratio of positive results per test is absurdly high), then the actual case count is surely higher.</p>
<p>That means that for the “average” person, the chance of becoming infected is 1,746 / 9,590,000 / day ≈ 180 microCOVIDs / day.</p>
<p>Note that this “average” person does not refer to the most common person (the <em>mode</em>) or the median; but literally, this Average Joe makes up the entire 10 million population of the country. In this model, everyone is equal. (And nobody more equal than the others.)</p>
<p>Act like this average person for an entire year (assume January 6 conditions all year long), and you have about a 6% chance of contracting the disease. (Because of course, every day of the year is equal.)</p>
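<p>As a rough sketch of the arithmetic above (the variable names are illustrative; the figures are the Hungarian ones listed earlier), the daily and yearly risk can be computed as follows. The yearly figure assumes that every day carries the same independent risk, which is exactly the simplification this model makes.</p>

```python
# Figures for Hungary on January 6, 2021, as listed above.
new_cases_per_day = 1746        # 7-day average
population = 9_770_000
recovered = 180_000

susceptible = population - recovered
daily_risk = new_cases_per_day / susceptible
micro_covids_per_day = daily_risk * 1_000_000

# Chance of at least one infection in 365 identical, independent days.
annual_risk = 1 - (1 - daily_risk) ** 365

print(f"{micro_covids_per_day:.0f} microCOVIDs/day, {annual_risk:.1%} per year")
```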
<h2>Comparing with microCOVID</h2>
<p>I <a href="https://www.microcovid.org/?distance=sixFt&duration=480&interaction=oneTime&personCount=1&riskProfile=average&setting=indoor&theirMask=basic&topLocation=Hungary&voice=normal&yourMask=basic">tried to simulate</a> the working day of the “average” citizen with the microCOVID tool. I did this using a one-time interaction (daily):</p>
<ul>
<li>for 8 hours</li>
<li>with 1 person</li>
<li>at 6+ feet / 2+ meters</li>
<li>indoors</li>
<li>cotton mask or bandana on me<sup id="fnref:5"><a class="footnote-ref" href="#fn:5">5</a></sup> and on them<sup id="fnref:6"><a class="footnote-ref" href="#fn:6">6</a></sup></li>
<li>having normal conversation.</li>
</ul>
<p>This adds up to about 200 microCOVIDs each day. That’s surprisingly close to my figure! I pinky-swear that I first picked these settings based on a gut feeling, and didn’t adjust them to approximate 180 microCOVID.</p>
<p>Obviously, changing the settings will move the risk away from 200 μCOVID/day. Grocery shopping and spouses/kids/relatives should also be added to the list. But the fact that these two models are in the same ballpark is good validation.</p>
<h2>Conclusion</h2>
<p>The super easy method for pandemic risk assessment:</p>
<ol>
<li>Calculate the risk for the Average Citizen, by simply dividing the daily case increase<sup id="fnref:avg"><a class="footnote-ref" href="#fn:avg">7</a></sup> by the population size (whether that’s a city or a country). Adjust this upwards if you think your country has insufficient testing practices.</li>
<li>Adjust this with some factors for how you think your behavior compares with the average. Working from home? Divide by 5. Doing grocery shopping online? Divide by 2. Meeting a dozen people at the office every day? Multiply by five. I’m just making these numbers up, but so can you. You should do this step <em>at the beginning</em>, to minimize fooling yourself.</li>
<li>Multiply this by 365 to get the risk of contracting the virus in a year<sup id="fnref:year"><a class="footnote-ref" href="#fn:year">8</a></sup>. If you want to go fancy, use lower figures for the summer, higher ones while the graphs are skyrocketing. (Again, decide beforehand how to calculate this step.)</li>
</ol>
<p>Finally, smash a generous error bar on the result: say, plus or minus an order of magnitude.</p>
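<p>The three steps and the error bar can be sketched as a small function. This is only an illustration under stated assumptions: the function name, the parameter names, and the example factor are hypothetical, and the yearly total uses the exact form from the footnote rather than the plain multiplication by 365.</p>

```python
def yearly_risk(daily_cases, population, testing_factor=1.0,
                behavior_factor=1.0, days=365):
    """Three-step estimate: average daily risk, personal adjustment, yearly total."""
    # Step 1: risk for the Average Citizen, scaled up for under-testing.
    average_daily_risk = daily_cases / population * testing_factor
    # Step 2: adjust for personal behavior (pick this factor beforehand).
    personal_daily_risk = average_daily_risk * behavior_factor
    # Step 3: chance of at least one infection over the period.
    return 1 - (1 - personal_daily_risk) ** days


# Hypothetical example: Hungarian figures, working from home (divide by 5).
estimate = yearly_risk(1746, 9_770_000, behavior_factor=1 / 5)
# Generous error bar: plus or minus an order of magnitude, capped at 100%.
low, high = estimate / 10, min(1.0, estimate * 10)
print(f"{estimate:.1%} (plausible range {low:.2%} to {high:.0%})")
```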
<p>Can you live with a 6% chance of <span class="caps">COVID</span>-19 in the coming year? If not, then maybe you should scale back your activities. If your country’s average risk is too low for you (for example, because you’re young and live in New Zealand and are more likely to die in a car accident), then consider saying hello to the neighbors from a friendly distance.</p>
<p>Stay safe. Wear a mask, wear a helmet, wear a safety belt.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>It also introduces a <span class="caps">COVID</span> budget that you can allocate as you wish: if you target a 1% chance of contracting <span class="caps">COVID</span> in a year, then you have 200 microCOVIDs allocated for each week. (0.01 / 50 = 0.0002.) Spend it wisely. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:kids">
<p>Until recently, kids did not play a significant role in the transmission and hospitalization, therefore minors could be (or could have been) deducted from the susceptible population. <a class="footnote-backref" href="#fnref:kids" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
<li id="fn:2">
<p>microCOVID seems to compare the case count against the entire population. I count a recovered person as immune to the virus, and therefore subtract them from the susceptible population. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 3 in the text">↩</a></p>
</li>
<li id="fn:3">
<p>In Hungary, the reported prevalence is 0.12%, but microCOVID uses an adjusted prevalence of 0.39%. My model should apply this roughly 3x correction too. <a class="footnote-backref" href="#fnref:3" title="Jump back to footnote 4 in the text">↩</a></p>
</li>
<li id="fn:5">
<p>You should totally buy an <span class="caps">FFP</span>-2 mask and fit it snugly to your face. If you wear it for long periods then buy a handful and rotate them daily. Dispose after one month. Adjust numbers as budget allows, but <em>buy one good mask</em>. This does not constitute medical advice. <a class="footnote-backref" href="#fnref:5" title="Jump back to footnote 5 in the text">↩</a></p>
</li>
<li id="fn:6">
<p>Convince your employer to buy every employee an <span class="caps">FFP</span>-2 mask. The less time you spend on sick leave, the better it is for them. <a class="footnote-backref" href="#fnref:6" title="Jump back to footnote 6 in the text">↩</a></p>
</li>
<li id="fn:avg">
<p>Preferably averaged over the last 7 days. <a class="footnote-backref" href="#fnref:avg" title="Jump back to footnote 7 in the text">↩</a></p>
</li>
<li id="fn:year">
<p>Technically, you should calculate 1 − (1−p)<sup>365</sup>, but that’s practically 365 × p. The overall calculation has much bigger errors anyway. <a class="footnote-backref" href="#fnref:year" title="Jump back to footnote 8 in the text">↩</a></p>
</li>
</ol>
</div>
<h1>The effects of daylight savings time adjustment on the incidence rate of acute myocardial infarction: a Bayesian meta-analysis (original research)</h1>
<p>A Bayesian meta-analysis to evaluate whether one is more likely to get a heart attack after losing an hour of sleep. The answer: yes, a little.</p><p><em>Laszlo Treszkai (firstname.lastname@gmail.com)</em></p>
<p>Version of 11 November, 2019.</p>
<p>This document might be revised in the future; any potential updates will be linked from here.</p>
<h2>Abstract</h2>
<h3>Background</h3>
<p>Multiple observational studies claim that the daylight savings time (<span class="caps">DST</span>) adjustment in spring causes an increase in acute myocardial infarction (<span class="caps">AMI</span>) count during the following days or weeks, attributing this increase to the reduction in sleep or the disturbance in the circadian rhythm. Previous studies used frequentist methods for interval estimation and often showed “statistically significant” differences, although the results were inconsistent and sometimes the effects in the same study were incoherent (such as a significant difference on Tuesday but not on Monday). A recent meta-analysis used frequentist methods and showed an increase in incidence rate after the spring adjustment and could not show a change after the autumn adjustment.</p>
<h3>Methods</h3>
<p>This study reanalyzes the data described in the relevant observational studies. We propose a Bayesian model that should capture the alleged phenomenon truthfully, apply this model consistently to every study, and combine the results using a fixed-effects model. Under our model, the risk ratio on Monday is the highest, it is slightly lower on Tuesday, and it decreases linearly to 1 until Saturday. We do the calculations using both analytic methods and Monte Carlo methods with the Stan software.</p>
<h3>Results</h3>
<p>In total, 7 observational studies were identified and analyzed, of which one was excluded. The remaining 6 studies included 14,024 <span class="caps">AMI</span> incidences in the week following spring <span class="caps">DST</span> adjustment, and 15,921 incidences in the week following autumn <span class="caps">DST</span> adjustment.
Together with related trend data obtained from the surrounding weeks, these figures show a risk ratio (<span class="caps">RR</span>) of 107.7% on the Monday following a spring <span class="caps">DST</span> change (95% credible interval: [104.8%, 110.7%]), and a mean <span class="caps">RR</span> of 97.7% (95% CrI: [95.1%, 100.3%]) after the autumn <span class="caps">DST</span> change. The results from analytic and Monte Carlo methods matched precisely. The credible intervals obtained from a non-informative prior yield practically the same results, and so does a slightly more complex model for the time decay of the effect.</p>
<h3>Conclusion</h3>
<p>Overall, the spring <span class="caps">DST</span> adjustment has a small but quasi-certain positive effect on <span class="caps">AMI</span> incidences, and the risk ratio in autumn is approximately 1 or slightly less than 1.
We note that the combined <span class="caps">RR</span> is less than half of what has been suggested by certain smaller but highly cited studies, but our analysis shows larger effects than the recent meta-analysis of the same data by Manfredini et al. (2019).
Our results give strong support to the hypothesis that the <span class="caps">DST</span> transitions – especially the spring transition when sleep is reduced – have a noticeable effect on our circadian rhythm.
Nonetheless, we cannot confidently claim that these results are of direct practical importance: there is no evidence that the additional <span class="caps">AMI</span> counts in the days after <span class="caps">DST</span> transition are not merely shifted earlier from the following weeks.</p>
<hr>
<h2>Introduction</h2>
<p>This study has a two-fold purpose. First, it compiles all the published data about the effects of <span class="caps">DST</span> on the risk of <span class="caps">AMI</span>, and presents a meta-analysis where the data from multiple countries and years is analyzed in a unified model. Second, it demonstrates the use of Bayesian methods in an analysis or meta-analysis, explaining the thinking behind model specification and quantifying our prior beliefs about the parameters. The software required for reproducing this paper is freely available at <a href="https://github.com/treszkai/BayesianScience">https://github.com/treszkai/BayesianScience</a>.</p>
<p>Sipilä et al. (2016) explain the importance of sleep and its effects on the risk of heart disease:</p>
<blockquote>
<p>Sleep is essential for well-being and its disturbances
have been associated with disruption of numerous
physiological processes and changes in cardiovascular
risk factors (1,2). Sleep disordered breathing has been
associated with risk of coronary heart disease (3,4) and
sleep impairment with prognosis of myocardial infarction
(<span class="caps">MI</span>) (5).</p>
<p>Daylight saving time (<span class="caps">DST</span>) is used in many countries
including the United States and the members of
the European Union for prolonging of sun-light
proportion of day. Clock shifts however alter and disrupt
chronobiological rhythms and impair sleep (7,8) providing
a ‘‘natural experiment’’ for studying the effects of
rhythm and sleep disruptions on the incidence of
vascular events. Although chronobiological factors
have been shown to affect the incidence of <span class="caps">MI</span> (9,10),
studies on the association of <span class="caps">DST</span> and the incidence of <span class="caps">MI</span>
have been partly conflicting. With one exception (11), all
studies show changes in the temporal distribution of <span class="caps">MI</span>
in the week following <span class="caps">DST</span> transitions but the patterns of
change differ (12–15) and there is no agreement about
the impact of these changes on the overall incidence of
<span class="caps">MI</span> (11–16).</p>
</blockquote>
<p>We will see that there is a simple reason for the disagreement between studies: most of the studies have been critically underpowered.</p>
<p>Although the majority of medical research uses frequentist methods, this is not the first meta-analysis in medicine that uses Bayesian statistics. The following are some noteworthy examples:</p>
<ul>
<li>Gelman et al. (2013) present an example for estimating mortality ratios after a myocardial infarction between the control group and a group that uses beta-blockers, using data from 22 independent studies.</li>
<li>Devin Incerti (2015) provides a Bayesian re-analysis of the effects of mammography on breast cancer-related mortality rates.</li>
<li>Yang et al. (2017) analyze 25 randomized controlled trials of prokinetics for the treatment of functional dyspepsia in a Bayesian network meta-analysis.</li>
</ul>
<h3>Methodology shared in most papers</h3>
<p>Following the naming of (Čulić 2013), we refer to the week following the <span class="caps">DST</span> adjustment as “posttransitional week”.</p>
<p>Every study that was included compares the observed <span class="caps">AMI</span> counts against a trend prediction. The trend prediction for <span class="caps">AMI</span> counts on given days – sometimes called “control group” – was usually defined as the average of the respective days on the two weeks before and after the posttransitional week. The analysis of Sandhu et al. (2014) was the only exception, as they used a regression model that included AMIs from all year except the two weeks following the spring and autumn <span class="caps">DST</span> adjustments.</p>
<p>Years on which the <span class="caps">DST</span> adjustment coincided with Easter were usually excluded from the studies. If Easter fell on the 2 weeks following (or preceding) the <span class="caps">DST</span> adjustment, the control period was the two out of three weeks that did not include Easter.</p>
<p>Every paper adjusted the <span class="caps">AMI</span> counts for the shorter (resp. longer) Sunday following a spring (resp. autumn) <span class="caps">DST</span> transition by multiplying the real counts by <script type="math/tex">24/23</script> (resp. <script type="math/tex">24/25</script>). This sometimes resulted in fractional <span class="caps">AMI</span> counts, which we rounded to the nearest integer when treating them as observations.</p>
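<p>The Sunday adjustment amounts to a one-line rescaling. The sketch below is illustrative only: the function name and the example counts are hypothetical and not taken from any of the studies. Note that Python's built-in <code>round</code> rounds exact halves to the nearest even integer, which may differ from the rounding rule used in a given paper.</p>

```python
def adjust_sunday_count(count, season):
    """Rescale a transition-Sunday AMI count to a 24-hour day and round it."""
    # Spring Sunday lasts 23 hours, autumn Sunday lasts 25 hours.
    hours_in_day = {"spring": 23, "autumn": 25}[season]
    return round(count * 24 / hours_in_day)


# Hypothetical raw counts, for illustration only.
spring_adjusted = adjust_sunday_count(115, "spring")
autumn_adjusted = adjust_sunday_count(125, "autumn")
print(spring_adjusted, autumn_adjusted)
```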
<h2>Materials and methods</h2>
<h3>Study selection</h3>
<p>We analyzed data from every study that was included in the meta-analysis of Manfredini et al. (2019).</p>
<p>Performing a PubMed search instead of using the list of publications from (Manfredini et al. 2019) would be a tedious process with little benefit: said meta-analysis retrieved 2633 papers dated up to 31 December 2018 (from which 7 were relevant).</p>
<h3>Analyzed data</h3>
<p>From each paper, we extracted the trend predictions and the actual <span class="caps">AMI</span> counts on each day of the spring and autumn posttransitional weeks. When the trend prediction was not available, we divided the total number of <span class="caps">AMI</span> cases by the study length in days. We restricted our analysis to the number of incidences, and ignored all variables that describe incidences, such as age and gender of patient, <span class="caps">STEMI</span> (<span class="caps">ST</span> elevation <span class="caps">MI</span>) or non-<span class="caps">STEMI</span>, or various medications taken prior to the incident.</p>
<h3>Problems with standard statistical tests</h3>
<p>The standard statistical practice for deciding whether there is a difference in a particular variable (such as <span class="caps">AMI</span> counts) between two groups is to use a <em>null hypothesis significance test</em> (<span class="caps">NHST</span>).
Using this method, one defines a <em>null hypothesis</em> as the variable of interest having some predetermined value, which in this case would correspond to zero increase in <span class="caps">AMI</span> counts after a <span class="caps">DST</span> change.
The <span class="caps">NHST</span> answers the question: assuming the null hypothesis is true, what is the probability that data which is generated according to the sampling and testing intentions has a more extreme test statistic than that of the actual observations? (Kruschke and Liddell 2018) If this probability is less than some fixed threshold (typically 0.05), the effect is claimed to exist.
The <span class="caps">NHST</span> suffers from a multitude of problems, and has received its fair share of criticism from statisticians.
It encourages black-and-white thinking without allowing uncertainty (claiming that an effect either exists or not, depending on the p-value), it encourages binary classification of effects without quantifying the relationship (<em>statistically</em> significant differences might be of no <em>practical</em> relevance if they are small), and these tests are conducted <em>against</em> a given null hypothesis without any way to gain evidence <em>for</em> the null hypothesis (an inability to refute the null hypothesis is not equal to accepting it).
Recently, The American Statistician released a special issue titled <em>Moving to a World Beyond “p < 0.05”</em> (Wasserstein 2019), together with commentaries from 94 authors.</p>
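<p>To make the procedure concrete, here is a minimal sketch of a one-sided significance test for a single day's count, assuming the count is Poisson-distributed with the trend value as its mean under the null hypothesis. The function names and the numbers are hypothetical; none of the reviewed studies is claimed to have used exactly this test.</p>

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events under Poisson(lam)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def one_sided_p_value(observed, expected):
    """P(Y >= observed) when Y ~ Poisson(expected), i.e. under the null."""
    return 1.0 - sum(poisson_pmf(k, expected) for k in range(observed))


# Hypothetical day: the trend predicts 100 AMIs, 120 were observed.
p_value = one_sided_p_value(120, 100.0)
verdict = "significant" if p_value < 0.05 else "not significant"
print(f"p = {p_value:.3f} ({verdict} at the 0.05 level)")
```

<p>The verdict is a single yes or no: it says nothing about how large the increase is, which is the limitation discussed above.</p>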
<p>We can get a more accurate sense of the value of the parameter if instead of testing a hypothesis, we estimate the value of the parameter. The standard tool for this is stating the 95% confidence interval (<span class="caps">CI</span>) for a parameter, which is the set of parameter values that wouldn’t be rejected at the <script type="math/tex">p<0.05</script> level. This is the approach suggested by Cumming (2014) and Cumming and Calin-Jageman (2016), who call it the <em>New Statistics</em>.</p>
<p>While reporting an interval is better than reporting a single value derived from it (i.e. the p-value), confidence intervals still suffer from deep-rooted flaws. They still encourage black-and-white thinking: parameter values inside the <span class="caps">CI</span> are compatible with the null hypothesis, those outside it are not. Confidence intervals do not give distributional information, i.e. a value close to the limits of the <span class="caps">CI</span> is not “less compatible” with the hypothesis than a value in the middle, nor is a study of large sample size “more confident” than a smaller study (although usually the <span class="caps">CI</span> of a large study is narrower). This binary nature makes it hard to aggregate the results of multiple studies and to perform a meta-analysis accurately. In addition, confidence intervals are frequently misinterpreted: specifically, the true parameter value is <em>not</em> 95% likely to be inside the <span class="caps">CI</span>, although it is often thought to be.</p>
<p>Kruschke and Liddell (2018) compare approaches to statistical inference along two axes: whether the method uses a frequentist or Bayesian framework, and whether the method compares hypotheses or estimates parameter values. They make a detailed case that Bayesian parameter estimation is superior in most situations to the frequentist methods or Bayesian hypothesis testing, hence the title of the paper, <em>The Bayesian New Statistics</em>.</p>
<h3>Overview of our model and statistical methods</h3>
<p>In this meta-analysis we define a (Bayesian) statistical model for the parameter of interest and our observations. For every paper, we have the following observations: the <span class="caps">AMI</span> counts on each day of the posttransitional week, and the <span class="caps">AMI</span> counts predicted by the trend. The unobserved parameter is the risk ratio (<span class="caps">RR</span>), i.e. the multiplier by which mean <span class="caps">AMI</span> counts increase in the posttransitional week, compared to the same day of an ordinary week. Our initial description of this parameter also includes some reasonable uncertainty in our beliefs, quantified in the <em>prior distribution</em>. The goal of the analysis is to derive the <em>posterior probability distribution</em> of the <span class="caps">RR</span> (or <em>posterior</em> for short), which is an adjustment of the prior probabilities based on the likelihood of each parameter value, i.e. the probability that a given parameter value would produce the observed data. Although the posterior is influenced by the prior and the statistical model, this influence can be insubstantial in the face of enough data, as will be the case in this analysis. Finally, the posterior is summarized in a 95% credible interval of parameter values, which is either a central credible interval or a highest posterior density interval.</p>
<h3>Notation</h3>
<p>For a particular study <script type="math/tex">s</script>, <script type="math/tex">t_d^{(s)}</script> denotes the <span class="caps">AMI</span> counts as predicted by the trend model on day <script type="math/tex">d</script> of the posttransitional week (with <script type="math/tex">d = 1,\,\ldots,\,5</script> for Monday, …, Friday after the <span class="caps">DST</span> change) and <script type="math/tex">y_d^{(s)}</script> denotes the observed count on day <script type="math/tex">d</script>. The (unobserved) mean of the distribution of <script type="math/tex">y_d^{(s)}</script> is denoted by <script type="math/tex">x_d^{(s)}</script> – the meaning of this variable will become clear in the next section.
The risk ratio for day <script type="math/tex">d</script> is denoted by <script type="math/tex">r_d^{(s)} = x_d^{(s)} / t_d^{(s)}</script>. Finally, <script type="math/tex">\mathcal D^{(s)}</script> denotes the whole dataset, i.e. all of the observations <script type="math/tex">\{y_1^{(s)},\ldots,y_5^{(s)}\}</script>. To avoid cluttered notation, sometimes the superscript is omitted, resulting in e.g. <script type="math/tex">y_1</script>.</p>
<h3>Poisson distribution</h3>
<p>The <a href="https://en.wikipedia.org/wiki/Poisson_distribution">Poisson distribution</a> is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event. In our case, the “event” is an <span class="caps">AMI</span>, and the fixed interval of time is a day. Although AMIs don’t happen at a constant rate throughout the day, the <a href="https://en.wikipedia.org/wiki/Poisson_distribution#Sums_of_Poisson-distributed_random_variables">sum of independent Poisson-distributed random variables</a> is also Poisson-distributed, so any day’s total will also be Poisson-distributed.</p>
<p>The distribution has a single parameter, which is a positive real number, and is often denoted <script type="math/tex">λ</script>. The mean (expected value) of <script type="math/tex">\text{Poisson}(λ)</script> is <script type="math/tex">λ</script>, and the standard deviation is <script type="math/tex">\sqrt{λ}</script>. Its probability mass function is shown below for <script type="math/tex">λ=100</script>, along with the 95% highest density interval (<span class="caps">HDI</span>) – the shortest interval that covers 95% of the probability mass.</p>
<p><img alt="Distribution of Poisson plot with mean 100" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/poisson-dist.svg"></p>
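<p>For readers who want to reproduce the interval numerically, the following sketch computes a 95% <span class="caps">HDI</span> of a Poisson distribution by collecting the most probable counts until they cover the requested mass. The function names are illustrative and this is not necessarily how the figure was produced. Because the Poisson distribution is unimodal, the collected counts form one contiguous interval.</p>

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events under Poisson(lam)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_hdi(lam, mass=0.95):
    """Shortest set of counts that covers at least `mass` of Poisson(lam)."""
    # Counts far beyond the mean carry negligible probability.
    upper = int(lam + 10 * math.sqrt(lam)) + 10
    by_probability = sorted(
        ((poisson_pmf(k, lam), k) for k in range(upper + 1)), reverse=True
    )
    covered, chosen = 0.0, []
    for probability, k in by_probability:
        chosen.append(k)
        covered += probability
        if covered >= mass:
            break
    return min(chosen), max(chosen), covered


low, high, covered = poisson_hdi(100)
print(f"95% HDI: [{low}, {high}] covering {covered:.3f} of the mass")
```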
<p>The analyzed studies reported the sum of AMIs on a given day over the period of the study (e.g. all posttransitional Tuesdays during the years 2010–2013), never the <span class="caps">AMI</span> counts for individual years. This sum is denoted with <script type="math/tex">y_d</script>, where <script type="math/tex">d</script> signifies the day. We note again that the individual counts are each Poisson-distributed, so their sum is Poisson-distributed too. (However, their <em>average</em> would not be Poisson-distributed.) This means that <script type="math/tex">y_d</script> is sampled from a Poisson distribution whose parameter <script type="math/tex">x_d</script> is the sum of the trend on day <script type="math/tex">d</script> over the period of the study (<script type="math/tex">t_d</script>), multiplied by the <span class="caps">RR</span> for the given day (<script type="math/tex">r_d</script>).</p>
<p>In order for the Poisson assumption to <em>not hold</em> in this analysis, two individuals experiencing an <span class="caps">AMI</span> on a given day need to be statistically dependent <em>conditional on the day’s average</em>. This is not the case during a heat wave or a news broadcast about a major catastrophe, when the AMIs are dependent but not conditionally dependent. The rare scenarios for conditional dependence are when two people partake in a strenuous activity together (such as hiking), or when the <span class="caps">AMI</span> of a person causes an <span class="caps">AMI</span> in another.</p>
<h3>Model of posttransitional <span class="caps">AMI</span> counts</h3>
<p>We perform the analysis using a fixed-effects model, which assumes that the <span class="caps">DST</span> adjustment effects an identical increase in <span class="caps">AMI</span> counts in every country, every year. The independence of region is a strong assumption because the leading hypothesis attributes the increase in myocardial infarctions to the disruption of the circadian rhythm, and those beyond their working age do not necessarily experience sleep loss on a posttransitional Monday. Therefore, we hypothesize that the effect is likely to be lower in countries where the average age of retirement is lower – a random-effects model could account for these differences. The independence of year is a weak assumption.</p>
<p>The model for the <span class="caps">AMI</span> count on a posttransitional Monday is described by the following graph – such a graph is called a Bayes network or a directed graphical model:</p>
<p><img alt="Bayes network for the Monday counts" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/tikz_bayesnet_Mon.png"></p>
<p>Loosely speaking, the arrows denote causal or logical dependencies, where the exact formula for the dependency is shown next to the nodes (in a canonical Bayes network, the formulas are described only in the text). The model can be translated into the following sentences:</p>
<ul>
<li>The observed posttransitional <span class="caps">AMI</span> count on Monday follows a Poisson distribution.</li>
<li>The mean of the posttransitional <span class="caps">AMI</span> count on Monday is equal to the trend count on Monday, multiplied by the <span class="caps">RR</span> on Monday.</li>
<li>Monday’s <span class="caps">RR</span> is a random variable, meaning it has an associated prior belief distribution (which we define below).</li>
</ul>
<h3>Moving to a multi-day model</h3>
<p>The reviewed literature performed hypothesis tests for every day of the posttransitional week, including weekends, sometimes noting a significant difference for Tuesday but not Monday (Sandhu et al. 2014). Such day-by-day tests of “statistical significance” need not concern themselves with <em>consistency</em> (in the everyday sense of the word), i.e. that prior to observations we expect any effect to be highest on Monday and to wear off as time progresses.</p>
<p>When performing a Bayesian analysis, we <em>must</em> have prior expectations on the expected parameter values – these prior beliefs are then changed according to the model and the observed data, resulting in the posterior distribution. In accordance with the literature, we assume that the effect is constrained to the posttransitional week, and that if there is an effect on Monday, there is some effect on Friday too. We expect no increase on Sunday, the day of the adjustment (after adjusting for the shorter day), because relatively few people wake up at the same time on Sundays (and sleep shorter as a consequence). On Tuesday, Wednesday, Thursday, Friday, we expect the relative increase to be 80%, 60%, 40%, 20% of the increase on Monday (see figure below) – this we call the “linear weekday model”. (This linear assumption will be weakened in a later analysis.) We denote the increase in <span class="caps">RR</span> on Monday with <script type="math/tex">\theta</script> (the only parameter of the model), thus <script type="math/tex">r_\text{Mo} = 1 + \theta</script>, <script type="math/tex">r_\text{Tu} = 1 + 0.8 \cdot \theta</script>, …, <script type="math/tex">r_\text{Fr} = 1 + 0.2 \cdot \theta</script>.</p>
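<p>The linear weekday model can be written down in a few lines. The names below are illustrative; the weights are the 100%, 80%, 60%, 40%, 20% fractions described above, and the example uses <script type="math/tex">\theta = 0.5</script>, the value shown in the figure.</p>

```python
# Fraction of Monday's increase that is assumed to remain on each weekday.
WEEKDAY_WEIGHTS = {"Mo": 1.0, "Tu": 0.8, "We": 0.6, "Th": 0.4, "Fr": 0.2}


def risk_ratios(theta):
    """Risk ratio per weekday, given the increase `theta` on Monday."""
    return {day: 1.0 + weight * theta for day, weight in WEEKDAY_WEIGHTS.items()}


example = risk_ratios(0.5)
for day, rr in example.items():
    print(f"{day}: RR = {rr:.2f}")
```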
<p>The infarction counts on neighboring days are conditionally independent given <script type="math/tex">\theta</script> (apart from exceptional cases, such as a mass catastrophe), which means we can model the days separately and simply multiply their likelihoods. (Prior to observing the data, it feels <em>very</em> unlikely to us that there would be any effect on Friday, but one paper attempted to measure effects on the 2 and 4 weeks following <span class="caps">DST</span> adjustment, meaning they didn’t think such a long-lasting effect is completely implausible, therefore we consider including Friday as part of the expert opinion.)</p>
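<p>Because the days are conditionally independent given <script type="math/tex">\theta</script>, the joint likelihood is a product of Poisson terms, or a sum on the log scale. The sketch below illustrates this with made-up trend and observed counts; the function names and data are hypothetical and are not taken from the accompanying repository.</p>

```python
import math

# Fraction of Monday's increase assumed on Mo, Tu, We, Th, Fr.
WEIGHTS = [1.0, 0.8, 0.6, 0.4, 0.2]


def poisson_log_pmf(k, lam):
    """Log-probability of exactly k events under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_likelihood(theta, trend, observed):
    """Sum of the per-day log-likelihoods under the linear weekday model."""
    total = 0.0
    for weight, t_d, y_d in zip(WEIGHTS, trend, observed):
        x_d = t_d * (1.0 + weight * theta)  # mean of the observed count
        total += poisson_log_pmf(y_d, x_d)
    return total


# Hypothetical data for one study, Monday to Friday.
trend = [100.0, 100.0, 100.0, 100.0, 100.0]
observed = [110, 108, 105, 104, 101]

for theta in (0.0, 0.1, 0.5):
    print(f"theta = {theta}: log-likelihood = {log_likelihood(theta, trend, observed):.2f}")
```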
<p><img alt="Risk ratio on given days of the posttransitional week under the linear weekday model, for θ=0.5" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/rr_example.svg"></p>
<p>This model of all weekdays is described by the following graph:</p>
<p><img alt="Bayes network for the counts of all weekdays" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/tikz_bayesnet.png"></p>
<p>Here the rectangle means the nodes inside it should be repeated for <script type="math/tex">d = \text{Mo}..\text{Fr}</script>; this rectangle is called a “plate”. A common parameter <script type="math/tex">\theta</script> determines <script type="math/tex">r_d</script> for a given day <script type="math/tex">d</script>, which, together with <script type="math/tex">t_d</script>, determines the expected number of AMIs (<script type="math/tex">x_d</script>) and, through it, the distribution of the actual number of AMIs (<script type="math/tex">y_d</script>).</p>
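<p>The graph can be read as a recipe for the likelihood. The sketch below assumes the linear weekday weights and a Poisson observation model (as in the Stan code later in this post); the counts are invented for illustration and do not come from any of the studies.</p>

```python
import math

WEIGHTS = [1.0, 0.8, 0.6, 0.4, 0.2]  # Mo..Fr, linear weekday model


def poisson_logpmf(y, mean):
    """log P(Y = y) for Y ~ Poisson(mean)."""
    return y * math.log(mean) - mean - math.lgamma(y + 1)


def log_likelihood(theta, trend, observed):
    """Sum of per-day log-likelihoods: the days are independent given theta."""
    total = 0.0
    for w, t_d, y_d in zip(WEIGHTS, trend, observed):
        r_d = 1 + w * theta  # risk ratio on day d
        x_d = t_d * r_d      # expected AMI count on day d
        total += poisson_logpmf(y_d, x_d)
    return total


# Made-up counts for illustration only (not from any of the studies)
trend = [100.0, 100.0, 100.0, 100.0, 100.0]
observed = [112, 108, 105, 103, 101]
for theta in (0.0, 0.1, 0.2):
    print(theta, round(log_likelihood(theta, trend, observed), 3))
```

<p>Multiplying the daily likelihoods becomes a sum on the log scale, which is also what Stan accumulates internally.</p>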
<h3>Prior beliefs about <span class="caps">RR</span> (spring)</h3>
<p>We would like to estimate the value of a continuous parameter <script type="math/tex">\theta</script>, where the standard procedure is to conduct a one-sided t-test, with the null hypothesis defined as <script type="math/tex">\theta = 0</script>.</p>
<p>Gelman et al. (2013) suggest beginning Bayesian data analysis with a noninformative or <em>weakly informative prior</em> – this avoids biasing the results towards any particular value, and lets the posterior represent the data more closely.</p>
<p>I believe <script type="math/tex">\theta</script> is likely to be approximately <script type="math/tex">0.0</script> (i.e., <script type="math/tex">\text{RR} \approx 1</script>, no effect), but it wouldn’t be very surprising if <script type="math/tex">\theta</script> were positive. (I find it very unlikely, less than <script type="math/tex">\approx 0.1\%</script>, that the <span class="caps">RR</span> decreases.) So I would like to place substantial probability mass close to 0.0, and spread the rest on values between <script type="math/tex">0.0</script> and <script type="math/tex">1.0</script> (<script type="math/tex">P(\theta > 1.0) \lessapprox 0.1\%</script>).</p>
<p>We can formalize this description by splitting the prior probability mass 50-50 between there being zero effect (a Gaussian distribution centered at 0 with a standard deviation of 0.01) and there being an increase in <span class="caps">AMI</span> counts, where the increase in <span class="caps">RR</span> has an Exponential(<script type="math/tex">\lambda=0.2^{-1}</script>) prior on it. (An Exponential(<script type="math/tex">\lambda=0.2^{-1}</script>) distribution has a mean of <script type="math/tex">0.2</script>.) This distribution is plotted on the figure below.</p>
<p><img alt="Prior distribution of Monday RR" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/prior_Monday.svg"></p>
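<p>For readers who want to check the formal prior against the verbal description, here is a minimal sketch of the mixture density and its upper tail. The helper names are made up for this example, and the tail uses the closed-form tails of the two components.</p>

```python
import math

NORMAL_SIGMA = 0.01   # sd of the "no effect" component
EXPON_RATE = 1 / 0.2  # rate of the "increase" component (mean 0.2)


def prior_pdf(theta):
    """Density of the 50-50 mixture prior on theta."""
    normal = math.exp(-0.5 * (theta / NORMAL_SIGMA) ** 2) / (
        NORMAL_SIGMA * math.sqrt(2 * math.pi))
    expon = EXPON_RATE * math.exp(-EXPON_RATE * theta) if theta >= 0 else 0.0
    return 0.5 * normal + 0.5 * expon


def prior_tail_above(c):
    """P(theta > c) for c >= 0, from the closed-form tails of both components."""
    normal_tail = 0.5 * math.erfc(c / (NORMAL_SIGMA * math.sqrt(2)))
    expon_tail = math.exp(-EXPON_RATE * c)
    return 0.5 * normal_tail + 0.5 * expon_tail


print(f"density at 0:   {prior_pdf(0.0):.2f}")
print(f"density at 0.2: {prior_pdf(0.2):.2f}")
print(f"P(theta > 1.0): {prior_tail_above(1.0):.4%}")
```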
<h3>Prior beliefs about the <span class="caps">RR</span> (autumn)</h3>
<p>For the <span class="caps">AMI</span> counts on the autumn posttransitional week we used the same model as for the spring counts, but with an (improper) uniform prior on <script type="math/tex">\theta</script>. (This prior is improper because no probability distribution is uniform over the whole real number line. In practice we would get the same posterior if we assumed a Uniform(−2,+2) prior.)</p>
<h3>Summary of assumptions</h3>
<p>Every statistical test makes assumptions about the data, but in most reports using null hypothesis significance tests these assumptions are never mentioned; instead, they are implicit in the performed tests.
Therefore, statistics is often sold as a sort of alchemy that transmutes randomness into certainty, an “uncertainty laundering” that begins with data and concludes with success as measured by statistical significance (Gelman 2016).
I view it as a <em>strength</em> of Bayesian data analysis that these assumptions must be stated explicitly. To summarize this section, we make the following assumptions in this analysis:</p>
<ol>
<li>Every region that uses <span class="caps">DST</span> has the same <span class="caps">RR</span> in every year.</li>
<li>Any effect is limited to the posttransitional weekdays, and the effect is highest on Monday, 20% less on Tuesday, and so on until 0% on Saturday.</li>
<li>Our prior belief on the spring <span class="caps">RR</span> is split half-half between <script type="math/tex">1.0</script> and all values greater than <script type="math/tex">1</script>, with the probability decaying exponentially at a rate of <script type="math/tex">0.2^{-1}</script>. We make no prior assumptions about the autumn <span class="caps">RR</span>.</li>
</ol>
<h3>Posterior calculations analytically</h3>
<p>We performed our calculations for the fixed-effects model in spring analytically, using custom software written in Python. The result of these calculations was a 95% central credible interval, which is an interval of parameter values containing 95% of the posterior probability, with 2.5% left out in each tail. This is not equal to the highest density interval (<span class="caps">HDI</span>) when the distribution is skewed, but is usually a good approximation.</p>
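<p>The custom software is not reproduced here. As a rough stand-in, the sketch below approximates the posterior on a grid of <code>theta</code> values for a single study, using the spring Monday-to-Friday counts of Janszky and Ljung (2008) from the results table and the mixture prior described above. The grid bounds and step size are arbitrary choices made for this example.</p>

```python
import math

WEIGHTS = [1.0, 0.8, 0.6, 0.4, 0.2]  # Mo..Fr, linear weekday model
# Spring counts of Janszky and Ljung (2008), Mo..Fr, from the results table
TREND = [1636, 1494, 1471, 1484, 1422]
OBSERVED = [1735, 1644, 1555, 1522, 1467]


def log_prior(theta, sigma=0.01, rate=5.0):
    """Log density of the 50-50 Normal/Exponential mixture prior."""
    normal = math.exp(-0.5 * (theta / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    expon = rate * math.exp(-rate * theta) if theta >= 0 else 0.0
    return math.log(0.5 * normal + 0.5 * expon)


def log_likelihood(theta):
    """Poisson log-likelihood, up to a constant that does not depend on theta."""
    return sum(y * math.log(t * (1 + w * theta)) - t * (1 + w * theta)
               for w, t, y in zip(WEIGHTS, TREND, OBSERVED))


STEP = 0.0005
grid = [-0.1 + STEP * i for i in range(1001)]  # theta from -0.1 to 0.4
log_post = [log_prior(th) + log_likelihood(th) for th in grid]
peak = max(log_post)
weights = [math.exp(lp - peak) for lp in log_post]
total = sum(weights)
post = [w / total for w in weights]  # normalized posterior mass per grid point


def quantile(q):
    """Smallest grid value whose cumulative posterior mass reaches q."""
    cum = 0.0
    for th, p in zip(grid, post):
        cum += p
        if cum >= q:
            return th
    return grid[-1]


mean_rr = 1 + sum(th * p for th, p in zip(grid, post))
ccri = (1 + quantile(0.025), 1 + quantile(0.975))
print(f"posterior mean of Monday RR: {mean_rr:.3f}")
print(f"95% central credible interval: [{ccri[0]:.3f}, {ccri[1]:.3f}]")
```

<p>Subtracting the peak before exponentiating keeps the weights from underflowing; the normalization constant then drops out when dividing by the total.</p>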
<h3>Posterior calculations with Monte Carlo methods</h3>
<p>We also performed our posterior calculations with Monte Carlo methods using the open source statistical modeling software <a href="https://mc-stan.org/">Stan</a>. Models in Stan are written using its own description language (which comes with <a href="https://mc-stan.org/users/documentation/">extensive documentation</a> and a <a href="https://discourse.mc-stan.org/">supportive community</a>), and they need to be first compiled into binary form using an interface in R, Python, or other languages. Then, after providing the observable data to the model, Stan draws samples from the posterior distribution of the parameters, and calculates the 95% highest posterior density interval (<span class="caps">HDI</span>, a.k.a. <span class="caps">HPD</span>) – the interval that covers the most plausible parameter values. For most practical purposes, 1000 independent samples would be enough, but we drew 50,000 samples to accurately assess the equality to the analytic solution.</p>
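<p>To make the difference between the two kinds of interval concrete, here is a minimal sketch of how both can be computed from a list of posterior draws. This is a generic recipe for a unimodal posterior, not necessarily the routine used for the numbers in this post; the "draws" are deterministic quantiles of a skewed distribution so that the two intervals visibly differ.</p>

```python
import math


def central_interval(samples, mass=0.95):
    """Interval that cuts off (1 - mass) / 2 of the samples in each tail."""
    s = sorted(samples)
    n = len(s)
    cut = round(n * (1 - mass) / 2)
    return s[cut], s[n - cut - 1]


def hdi(samples, mass=0.95):
    """Narrowest interval that contains a fraction `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = math.ceil(mass * n)  # number of samples inside the interval
    widths = [s[i + k - 1] - s[i] for i in range(n - k + 1)]
    start = widths.index(min(widths))
    return s[start], s[start + k - 1]


# Deterministic stand-in for posterior draws: quantiles of a skewed
# (exponential) distribution, so the two intervals visibly differ.
samples = [-math.log(1 - (i + 0.5) / 1000) for i in range(1000)]
print("central:", central_interval(samples))
print("HDI:    ", hdi(samples))
```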
<p>The code for the fixed-effects linear weekday Stan model is as follows:</p>
<div class="highlight"><pre><span></span><code><span class="kn">data</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">DAYS</span><span class="p">;</span> <span class="c1">// Number of days</span>
<span class="kt">int</span> <span class="n">STUDIES</span><span class="p">;</span> <span class="c1">// Number of studies</span>
<span class="kt">real</span> <span class="n">NORMAL_SIGMA</span><span class="p">;</span> <span class="c1">// The standard deviation of the normal component of the prior</span>
<span class="kt">real</span> <span class="n">EXPON_BETA</span><span class="p">;</span> <span class="c1">// The beta parameter of the exponential component of the prior</span>
<span class="c1">// The observed AMI counts and the trend predictions, for each day of each study</span>
<span class="kt">int</span><span class="o"><</span><span class="k">lower</span><span class="p">=</span><span class="mf">0</span><span class="o">></span> <span class="n">ami_obs</span><span class="p">[</span><span class="n">STUDIES</span><span class="p">,</span> <span class="n">DAYS</span><span class="p">];</span>
<span class="kt">real</span><span class="o"><</span><span class="k">lower</span><span class="p">=</span><span class="mf">0</span><span class="o">></span> <span class="n">ami_trend</span><span class="p">[</span><span class="n">STUDIES</span><span class="p">,</span> <span class="n">DAYS</span><span class="p">];</span>
<span class="p">}</span>
<span class="kn">parameters</span> <span class="p">{</span>
<span class="c1">// Monday RR - 1.</span>
<span class="c1">// (We cannot model RR_Mon directly because we cannot assign a</span>
<span class="c1">// common distribution for that.)</span>
<span class="c1">// Its probabilistic value is assigned in the model block below.</span>
<span class="kt">real</span> <span class="n">rr_Mon_minus_1</span><span class="p">;</span>
<span class="p">}</span>
<span class="kn">transformed parameters</span> <span class="p">{</span>
<span class="c1">// The RR for every day</span>
<span class="kt">real</span> <span class="n">rr_day</span><span class="p">[</span><span class="n">DAYS</span><span class="p">];</span>
<span class="c1">// The posttransitional AMI counts for every day of every study.</span>
<span class="kt">real</span> <span class="n">ami_dst_mean</span><span class="p">[</span><span class="n">STUDIES</span><span class="p">,</span> <span class="n">DAYS</span><span class="p">];</span>
<span class="c1">// Specifying the RR for every day, using the linear weekday model.</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">DAYS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rr_day</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">rr_Mon_minus_1</span> <span class="o">*</span> <span class="p">(</span><span class="n">DAYS</span> <span class="o">+</span> <span class="mf">1</span> <span class="o">-</span> <span class="n">i</span><span class="p">)</span> <span class="o">/</span> <span class="n">DAYS</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="n">s</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">STUDIES</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">DAYS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ami_dst_mean</span><span class="p">[</span><span class="n">s</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">ami_trend</span><span class="p">[</span><span class="n">s</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">*</span> <span class="n">rr_day</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kn">model</span> <span class="p">{</span>
<span class="c1">// Mixture models are specified using the construct below:</span>
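<span class="c1">// target += log_sum_exp(log(c1) + XXX_lpdf(x | p1), log(c2) + YYY_lpdf(x | p2));</span>
<span class="c1">// (With equal weights c1 = c2, the log weights are a constant and can be dropped.)</span>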
<span class="k">target +=</span> <span class="nb">log_sum_exp</span><span class="p">(</span><span class="nb">normal_lpdf</span><span class="p">(</span><span class="n">rr_Mon_minus_1</span> <span class="p">|</span> <span class="mf">0</span><span class="p">,</span> <span class="n">NORMAL_SIGMA</span><span class="p">),</span>
<span class="nb">exponential_lpdf</span><span class="p">(</span><span class="n">rr_Mon_minus_1</span> <span class="p">|</span> <span class="n">EXPON_BETA</span><span class="p">));</span>
<span class="c1">// Finally, the observations are drawn from a Poisson distribution.</span>
<span class="k">for</span> <span class="p">(</span><span class="n">s</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">STUDIES</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="k">in</span> <span class="mf">1</span><span class="o">:</span><span class="n">DAYS</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ami_obs</span><span class="p">[</span><span class="n">s</span><span class="p">][</span><span class="n">i</span><span class="p">]</span> <span class="o">~</span><span class="w"> </span><span class="nb">poisson</span><span class="p">(</span><span class="n">ami_dst_mean</span><span class="p">[</span><span class="n">s</span><span class="p">][</span><span class="n">i</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>The <code>data</code> and <code>parameters</code> blocks declare the observed quantities and the unobserved parameters, without specifying their distribution.</p>
<p>The <code>transformed parameters</code> block contains all quantities that can be deterministically derived from the parameters.</p>
<p>The <code>model</code> block describes both the prior distributions for the parameters and the likelihood functions.</p>
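<p>The mixture prior in the <code>model</code> block works on the log scale. The Python sketch below mirrors that construct with hand-written stand-ins for <code>log_sum_exp</code>, <code>normal_lpdf</code> and <code>exponential_lpdf</code> (these are my own helpers, not Stan's implementations), and checks that leaving out two equal weights only shifts the log density by a constant.</p>

```python
import math


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def normal_lpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def exponential_lpdf(x, rate):
    # Log density is -inf outside the support (x below zero)
    return math.log(rate) - rate * x if x >= 0 else -math.inf


def log_mixture(x, w=0.5, sigma=0.01, rate=5.0):
    """log of w * Normal(x | 0, sigma) + (1 - w) * Exponential(x | rate)."""
    return log_sum_exp(math.log(w) + normal_lpdf(x, 0.0, sigma),
                       math.log(1 - w) + exponential_lpdf(x, rate))


x = 0.05
weighted = log_mixture(x)
unweighted = log_sum_exp(normal_lpdf(x, 0.0, 0.01), exponential_lpdf(x, 5.0))
# With equal weights the two differ only by the constant log(2),
# which does not change the shape of the posterior.
print(unweighted - weighted, math.log(2))
```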
<h4>Sampling using the Python interface</h4>
<p>We can compile the Stan model and sample from it in Python using <a href="https://pystan.readthedocs.io/">PyStan</a>. Once the software and its dependencies are installed, we can use the following code to draw 50,000 samples from the posterior and plot the results. On my computer, the model compilation takes about a minute, the sampling a few seconds.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">pystan</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="c1"># 6-long list of 5-long lists of integers (weekday observations)</span>
<span class="n">all_obs</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">1735</span><span class="p">,</span> <span class="mi">1644</span><span class="p">,</span> <span class="mi">1555</span><span class="p">,</span> <span class="mi">1522</span><span class="p">,</span> <span class="mi">1467</span><span class="p">],</span> <span class="c1"># Janszky and Ljung 2008</span>
<span class="p">[</span><span class="mi">28</span><span class="p">,</span> <span class="mi">28</span><span class="p">,</span> <span class="mi">26</span><span class="p">,</span> <span class="mi">23</span><span class="p">,</span> <span class="mi">24</span><span class="p">],</span> <span class="c1"># Jiddou et al. 2013</span>
<span class="o">...</span>
<span class="p">]</span>
<span class="c1"># 6-long list of 5-long lists of floats (weekday trend predictions)</span>
<span class="n">all_trend</span> <span class="o">=</span> <span class="o">...</span>
<span class="n">stan_data</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'STUDIES'</span><span class="p">:</span> <span class="mi">6</span><span class="p">,</span>
<span class="s1">'DAYS'</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
<span class="s1">'NORMAL_SIGMA'</span><span class="p">:</span> <span class="mf">0.01</span><span class="p">,</span>
<span class="s1">'EXPON_BETA'</span><span class="p">:</span> <span class="mi">1</span><span class="o">/</span><span class="mf">0.2</span><span class="p">,</span>
<span class="s1">'ami_obs'</span><span class="p">:</span> <span class="n">all_obs</span><span class="p">,</span>
<span class="s1">'ami_trend'</span><span class="p">:</span> <span class="n">all_trend</span>
<span class="p">}</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">pystan</span><span class="o">.</span><span class="n">StanModel</span><span class="p">(</span><span class="n">file</span><span class="o">=</span><span class="s1">'dst_model.stan'</span><span class="p">)</span>
<span class="n">fit</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">sampling</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">stan_data</span><span class="p">,</span> <span class="nb">iter</span><span class="o">=</span><span class="mi">50000</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="n">fit</span><span class="o">.</span><span class="n">stansummary</span><span class="p">())</span>
<span class="n">plt</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">fit</span><span class="p">[</span><span class="s1">'rr_day[1]'</span><span class="p">]);</span> <span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<h4>Sampling using the R interface</h4>
<p>The R interface of Stan is called <a href="https://github.com/stan-dev/rstan/">RStan</a>, and can be used as follows:</p>
<div class="highlight"><pre><span></span><code><span class="nf">library</span><span class="p">(</span><span class="s">"rstan"</span><span class="p">)</span> <span class="c1"># observe startup messages</span>
<span class="n">stan_data</span> <span class="o"><-</span> <span class="nf">list</span><span class="p">(</span><span class="n">STUDIES</span> <span class="o">=</span> <span class="m">6</span><span class="p">,</span>
<span class="n">DAYS</span> <span class="o">=</span> <span class="m">5</span><span class="p">,</span>
<span class="n">NORMAL_SIGMA</span> <span class="o">=</span> <span class="m">0.01</span><span class="p">,</span>
<span class="n">EXPON_BETA</span> <span class="o">=</span> <span class="m">1</span><span class="o">/</span><span class="m">0.2</span><span class="p">,</span>
<span class="n">ami_obs</span> <span class="o">=</span> <span class="n">all_obs</span><span class="p">,</span>
<span class="n">ami_trend</span> <span class="o">=</span> <span class="n">all_trend</span><span class="p">)</span>
<span class="n">fit</span> <span class="o"><-</span> <span class="nf">stan</span><span class="p">(</span><span class="n">file</span> <span class="o">=</span> <span class="s">'dst_model.stan'</span><span class="p">,</span> <span class="n">data</span> <span class="o">=</span> <span class="n">stan_data</span><span class="p">,</span> <span class="n">iter</span> <span class="o">=</span> <span class="m">50000</span><span class="p">)</span>
<span class="nf">hist</span><span class="p">(</span><span class="nf">extract</span><span class="p">(</span><span class="n">fit</span><span class="p">)</span><span class="o">$</span><span class="n">rr_day</span><span class="p">[,</span> <span class="m">1</span><span class="p">])</span>
</code></pre></div>
<h3>Effect of study size in a Bayesian framework</h3>
<p>For a small study, i.e. if the trend and observed <span class="caps">AMI</span> counts are low, we want to see a very slight change in the prior; for a large study, we want to see a bigger change.</p>
<p>Two factors should play into this. First, if the trend predicts low counts, then we are likely to observe relatively big fluctuations: observing 12 or more heart attacks on a day when the long-term average is 10 represents an increase of at least 20%, yet it occurs about once every 3 days on average. Second, if the study was small and the trend is estimated from only a few weeks’ data, our <em>estimate</em> of the trend itself has greater variance. This second factor is not yet modeled in our work, but in small studies like that of Čulić (2013), this too could play a role.</p>
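<p>The first factor is easy to check numerically. The sketch below assumes daily counts are Poisson distributed and compares the probability of a 20% excess for a small and a large expected count; the second scenario (1000 expected, 1200 observed) is invented for contrast.</p>

```python
import math


def poisson_pmf(k, mean):
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_tail(k, mean, terms=5000):
    """P(Y >= k) for Y ~ Poisson(mean), summing `terms` terms of the upper tail."""
    return sum(poisson_pmf(j, mean) for j in range(k, k + terms))


# Same +20% relative excess, for a small and a large expected count
p_small = poisson_tail(12, 10)      # 12 or more when 10 are expected
p_large = poisson_tail(1200, 1000)  # 1200 or more when 1000 are expected
print(f"P(Y >= 12   | mean 10):   {p_small:.3f}")
print(f"P(Y >= 1200 | mean 1000): {p_large:.2e}")
```

<p>The tail is summed directly instead of as one minus the lower part, because the second probability is tiny and would otherwise be lost to rounding.</p>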
<p>To see the difference between a small and a large study, we visualize the prior and the posterior for the following scenarios:</p>
<ul>
<li>Observation higher than trend, small sample size (top left);</li>
<li>Observation equals trend, large sample size (top right);</li>
<li>Observation lower than trend, large sample size (bottom left);</li>
<li>Observation higher than trend, large sample size (bottom right).</li>
</ul>
<p><img alt="Posteriors of four example sample counts for trend and observation" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/test-posteriors.svg"></p>
<p>When the sample size is small, there is only a slight change from prior to posterior. With a large sample size, the prior beliefs barely have an effect on the posterior. (In the lower right plot, the posterior peaks at more than 1.1 because with 1100 AMIs every day, the linear weekday model fits better with a larger <span class="caps">RR</span>.)</p>
<h2>Results</h2>
<h3>Relevant studies</h3>
<p>The studies analyzed are identical to those analyzed in (Manfredini et al., 2019):</p>
<ul>
<li>Janszky and Ljung (2008)</li>
<li>Janszky et al. (2012)</li>
<li>Čulić (2013)</li>
<li>Jiddou et al. (2013)</li>
<li>Sandhu et al. (2014)</li>
<li>Kirchberger et al. (2015)</li>
<li>Sipilä et al. (2016)</li>
</ul>
<p>We excluded the study of Janszky et al. (2012), as the population is a strict subset of (Janszky and Ljung, 2008), with no additional information that is relevant for our analysis. The meta-analysis of Manfredini et al. (2019) did not exclude this study, which biased their results significantly, as the population size of this study is the second largest of all.</p>
<p>Key characteristics of the above studies can be found in the table below, with more details in the appendix.</p>
<table>
<thead>
<tr>
<th>Paper</th>
<th>Sun</th>
<th>Mon</th>
<th>Tue</th>
<th>Wed</th>
<th>Thu</th>
<th>Fri</th>
<th>Sat</th>
</tr>
</thead>
<tbody>
<tr>
<td>(Janszky and Ljung, 2008)</td>
<td>(1374)</td>
<td>(1636)</td>
<td>(1494)</td>
<td>(1471)</td>
<td>(1484)</td>
<td>(1422)</td>
<td>(1370)</td>
</tr>
<tr>
<td></td>
<td>1439</td>
<td>1735</td>
<td>1644</td>
<td>1555</td>
<td>1522</td>
<td>1467</td>
<td>1414</td>
</tr>
<tr>
<td>(Jiddou et al., 2013)</td>
<td>(13)</td>
<td>(29)</td>
<td>(20)</td>
<td>(23)</td>
<td>(17)</td>
<td>(25)</td>
<td>(16)</td>
</tr>
<tr>
<td></td>
<td>23</td>
<td>28</td>
<td>28</td>
<td>26</td>
<td>23</td>
<td>24</td>
<td>18</td>
</tr>
<tr>
<td>(Čulić, 2013)</td>
<td>(6)</td>
<td>(7)</td>
<td>(6)</td>
<td>(7)</td>
<td>(6)</td>
<td>(6)</td>
<td>(5)</td>
</tr>
<tr>
<td></td>
<td>5</td>
<td>14</td>
<td>6</td>
<td>9</td>
<td>6</td>
<td>5</td>
<td>8</td>
</tr>
<tr>
<td>(Kirchberger et al., 2015)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
<td>(70)</td>
</tr>
<tr>
<td></td>
<td>66</td>
<td>85</td>
<td>83</td>
<td>76</td>
<td>77</td>
<td>85</td>
<td>60</td>
</tr>
<tr>
<td>(Sandhu et al., 2014)</td>
<td>(111)</td>
<td>(138)</td>
<td>(127)</td>
<td>(125)</td>
<td>(120)</td>
<td>(120)</td>
<td>(110)</td>
</tr>
<tr>
<td></td>
<td>108</td>
<td>170</td>
<td>125</td>
<td>122</td>
<td>117</td>
<td>117</td>
<td>114</td>
</tr>
<tr>
<td>(Sipilä et al., 2016)</td>
<td>(208)</td>
<td>(269)</td>
<td>(243)</td>
<td>(259)</td>
<td>(227)</td>
<td>(227)</td>
<td>(198)</td>
</tr>
<tr>
<td></td>
<td>201</td>
<td>229</td>
<td>253</td>
<td>254</td>
<td>262</td>
<td>242</td>
<td>179</td>
</tr>
</tbody>
</table>
<p><em>(Spring <span class="caps">AMI</span> counts. Trend predictions in parentheses, under them the number of incidences on the posttransitional week. Total count on the posttransitional week: 14,024.)</em></p>
<table>
<thead>
<tr>
<th>Paper</th>
<th>Sun</th>
<th>Mon</th>
<th>Tue</th>
<th>Wed</th>
<th>Thu</th>
<th>Fri</th>
<th>Sat</th>
</tr>
</thead>
<tbody>
<tr>
<td>(Janszky and Ljung, 2008)</td>
<td>(1780)</td>
<td>(2140)</td>
<td>(1991)</td>
<td>(1910)</td>
<td>(1941)</td>
<td>(1949)</td>
<td>(1781)</td>
</tr>
<tr>
<td></td>
<td>1777</td>
<td>2038</td>
<td>1958</td>
<td>1895</td>
<td>1916</td>
<td>1977</td>
<td>1732</td>
</tr>
<tr>
<td>(Jiddou et al., 2013)</td>
<td>(18)</td>
<td>(24)</td>
<td>(21)</td>
<td>(27)</td>
<td>(22)</td>
<td>(24)</td>
<td>(20)</td>
</tr>
<tr>
<td></td>
<td>11</td>
<td>34</td>
<td>25</td>
<td>19</td>
<td>20</td>
<td>18</td>
<td>30</td>
</tr>
<tr>
<td>(Kirchberger et al., 2015)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
<td>(67)</td>
</tr>
<tr>
<td></td>
<td>60</td>
<td>57</td>
<td>77</td>
<td>73</td>
<td>77</td>
<td>84</td>
<td>60</td>
</tr>
<tr>
<td>(Sandhu et al., 2014)</td>
<td>(86)</td>
<td>(107)</td>
<td>(99)</td>
<td>(97)</td>
<td>(93)</td>
<td>(93)</td>
<td>(85)</td>
</tr>
<tr>
<td></td>
<td>89</td>
<td>102</td>
<td>79</td>
<td>93</td>
<td>104</td>
<td>86</td>
<td>99</td>
</tr>
<tr>
<td>(Sipilä et al., 2016)</td>
<td>(159)</td>
<td>(197)</td>
<td>(193)</td>
<td>(170)</td>
<td>(201)</td>
<td>(178)</td>
<td>(157)</td>
</tr>
<tr>
<td></td>
<td>160</td>
<td>214</td>
<td>180</td>
<td>198</td>
<td>199</td>
<td>172</td>
<td>153</td>
</tr>
<tr>
<td>(Čulić, 2013)</td>
<td>(6)</td>
<td>(7)</td>
<td>(6)</td>
<td>(7)</td>
<td>(6)</td>
<td>(6)</td>
<td>(5)</td>
</tr>
<tr>
<td></td>
<td>7</td>
<td>9</td>
<td>12</td>
<td>6</td>
<td>12</td>
<td>5</td>
<td>4</td>
</tr>
</tbody>
</table>
<p><em>(Autumn <span class="caps">AMI</span> counts. Trend predictions in parentheses, under them the number of incidences on the posttransitional week. Total count on the posttransitional week: 15,921.)</em></p>
<h3><span class="caps">AMI</span> risk after spring transition</h3>
<p>The posteriors after the individual papers are shown below, along with their 95% central credible interval (CCrI).</p>
<p><img alt="Forest plot that shows the posterior after the individual papers" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/forest_plot.svg"></p>
<p>The width of the 95% CCrI is a measure of the precision of the estimate. The 95% CCrI after (Janszky and Ljung, 2008) and (Sipilä et al. 2016) are comparably narrow, but they are centered around 1.085 and 1.001, respectively. In fact, as we can see from the likelihood functions (not shown here), the study of Sipilä et al. (2016) presents a case for a slight <em>decrease</em> in <span class="caps">AMI</span> risk under this model.</p>
<p>In the fixed effects model the posterior is weighted heavily towards the study with the largest sample size (Janszky and Ljung 2008), and the other studies barely play a role.
Specifically, the posterior mean of the <span class="caps">RR</span> is 107.7% (95% central credible interval: <script type="math/tex">[104.7\%, 110.7\%]</script>) – the posterior is shown below. We emphasize again that the relative weights of the studies are not arbitrary, but are fully determined by the model and the data through the rules of probability theory.</p>
<p><img alt="Posterior probability after every paper included in this analysis" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/combined_posterior.svg"></p>
<p>We arrive at the same posterior when drawing samples from it through a Monte Carlo method with Stan. Furthermore, as the tails of the posterior are symmetric, the 95% highest density interval of [104.8%, 110.7%] closely aligns with the 95% central credible interval obtained earlier ([104.7%, 110.7%]). (This fact merely verifies that the two methods compute the model correctly; it does not provide additional evidence about the quality of the data.)</p>
<p><img alt="Posterior after spring data – with Stan" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_mc.svg"></p>
<p>The studies together provide so many data points that the choice of prior does not play an important role. Assuming a uniform prior on the risk ratio, i.e. assuming that we have no more <em>prior</em> evidence for +2% than for +20% or −30% change in risk, we arrive at practically the same posterior, and a 95% <span class="caps">HDI</span> of [104.7%, 110.7%].</p>
<p><img alt="Posterior after spring data, uniform prior" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_mc_uniform.svg"></p>
<h4>Exponential weekday model</h4>
<p>The exponential weekday model relaxes the assumption of linear decrease in <span class="caps">RR</span> throughout the week, and instead models the daily RRs as exponentially decreasing. That is, for a parameter <script type="math/tex">\alpha \in [0,1]</script>, the risk ratios are determined as:</p>
<ul>
<li>
<script type="math/tex">r_\text{Mon} = 1 + \theta</script>,</li>
<li>
<script type="math/tex">r_\text{Tue} = 1 + \alpha \cdot \theta</script>,</li>
<li>
<script type="math/tex">r_\text{Wed} = 1 + \alpha^2 \cdot \theta</script>,</li>
<li>etc.</li>
</ul>
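<p>A small sketch of these formulas follows. The function name is made up, and the plugged-in values are only illustrative (they are close to the posterior means reported in this section); plugging in point values is not the same as averaging over the whole posterior.</p>

```python
def exponential_weekday_rr(theta, alpha, days=5):
    """Risk ratios r_d = 1 + alpha**k * theta for k = 0 (Monday) .. days - 1."""
    return [1 + alpha ** k * theta for k in range(days)]


# Illustrative values only: close to the posterior means reported in this section
rr = exponential_weekday_rr(theta=0.071, alpha=0.83)
mean_rr = sum(rr) / len(rr)
for day, r in zip(["Mon", "Tue", "Wed", "Thu", "Fri"], rr):
    print(f"{day}: {r:.3f}")
print(f"average over the weekdays: {mean_rr:.3f}")
```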
<p>Assuming a uniform prior on both <script type="math/tex">\alpha</script> and <script type="math/tex">\theta</script>, the posterior for this model looks as follows:</p>
<p><img alt="Posterior of alpha and theta visualized together" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_mc_exp.svg"></p>
<p>I expected the posterior on <script type="math/tex">\alpha</script> to be centered much closer to zero (meaning a rapid decrease in risk after Monday), but the posterior shows the opposite: most of the plausible values of <script type="math/tex">\alpha</script> correspond to a Friday-to-Monday ratio of excess risk, <script type="math/tex">(r_\text{Fri} - 1) / (r_\text{Mon} - 1) = \alpha^4</script>, greater than the 0.2 assumed previously (for example, <script type="math/tex">0.7^4 \approx 0.24</script>). The 95% <span class="caps">HDI</span> for the Monday risk ratio <script type="math/tex">1 + \theta</script> is [104.0%, 110.2%] (mean 107.1%), which is close to the result of the linear weekday model, and the 95% <span class="caps">HDI</span> for <script type="math/tex">\alpha</script> is [0.66, 1.0] (mean 0.83). The figure below shows the risk ratios over the week for 20 of the sampled combinations of <script type="math/tex">(\alpha, \theta)</script>.</p>
<p><img alt="Risk ratios over the week for 20 sampled combinations of alpha-theta" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/risk_ratios_exp.svg"></p>
<p>Over the five weekdays this posterior corresponds to an average risk ratio of 105.0% (95% <span class="caps">HDI</span>: [103.1%, 107.0%]). Assuming an affected population of 1.6 billion globally, with <span class="caps">AMI</span> rates standard across the <span class="caps">USA</span> <a href="https://www.cdc.gov/heartdisease/heart_attack.htm"><script type="math/tex">^\textsf{[source]}</script></a>, this means that over the whole posttransitional week an additional 2700 people experience <span class="caps">AMI</span> (95% <span class="caps">HDI</span>: [1600, 3700]), on top of the regular 53,000 per week.</p>
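<p>One way to roughly reproduce these figures, assuming they come from multiplying the weekly baseline by the average excess risk (this is a reconstruction for illustration; the helper name is made up):</p>

```python
BASELINE_PER_WEEK = 53_000  # regular weekly AMI count quoted above


def extra_cases(average_rr, baseline=BASELINE_PER_WEEK):
    """Additional cases implied by an average risk ratio over the baseline."""
    return baseline * (average_rr - 1)


for label, rr in [("lower", 1.031), ("mean", 1.050), ("upper", 1.070)]:
    print(f"{label}: {extra_cases(rr):.0f}")
```

<p>The results land close to the reported numbers; small differences are expected because the risk ratios above are rounded.</p>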
<h3><span class="caps">AMI</span> risk after autumn transition</h3>
<p>The posterior for the autumn data, using the linear weekday model with a uniform prior on <script type="math/tex">\theta</script>, is shown below. The 95% <span class="caps">HDI</span> of [95.1%, 100.3%] for the risk ratio suggests a decrease in <span class="caps">AMI</span> risk, but the hypothesis of “no change in risk” (<script type="math/tex">\theta = 0</script>, i.e. a risk ratio of 100.0%) is also compatible with the data.</p>
<p><img alt="Posterior after autumn data" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_mc_autumn.svg"></p>
<p>Globally, this translates to a change of <span class="caps">AMI</span> counts over the whole week of −700 (95% <span class="caps">HDI</span> [−1600, +100]), from the original 53,000.</p>
<h2>Visualizing the observations and the posterior predictive distribution</h2>
<h3>Posterior predictive distribution</h3>
<p>In the figure below we visualize the posterior predictive distribution (for each day of each paper) on the spring posttransitional week, together with the actual observations.</p>
<p>These predictive distributions on <script type="math/tex">\tilde y</script> can be calculated by integrating the likelihoods <script type="math/tex">P(\tilde y \given \theta)</script> over the parameter space, weighted by the posterior probability of the parameter values <script type="math/tex">p(\theta \given \mathcal D)</script>, using the following formula:</p>
<script type="math/tex; mode=display">% <![CDATA[
P(\tilde y \given \mathcal D) =
\int P(\tilde y, \theta \given \mathcal D) \,d\theta =
\int P(\tilde y \given \theta) p(\theta \given \mathcal D) \,d\theta %]]></script>
<p><img alt="Posterior predictive distribution" src="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_predictive_95.svg"></p>
<p><em>(Posterior predictive distribution for spring.)</em></p>
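<p>On a grid (or with Monte Carlo draws) the integral above turns into a weighted sum of Poisson distributions. The sketch below uses a hypothetical, bell-shaped discretized posterior and a hypothetical trend value; neither is taken from the fitted model.</p>

```python
import math


def poisson_pmf(k, mean):
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


# Hypothetical discretized posterior over theta: a bell curve around 0.08.
# In practice these would be the grid weights or the Monte Carlo draws.
thetas = [0.03 + 0.001 * i for i in range(101)]  # 0.03 .. 0.13
raw = [math.exp(-0.5 * ((th - 0.08) / 0.015) ** 2) for th in thetas]
raw_total = sum(raw)
posterior = [r / raw_total for r in raw]


def predictive_pmf(y, trend, weight=1.0):
    """P(y | data): Poisson likelihoods averaged over the posterior of theta."""
    return sum(p * poisson_pmf(y, trend * (1 + weight * th))
               for th, p in zip(thetas, posterior))


trend_monday = 70  # hypothetical trend prediction for a Monday
pmf = [predictive_pmf(y, trend_monday) for y in range(200)]
mean = sum(y * p for y, p in enumerate(pmf))
var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
print(f"predictive mean {mean:.1f}, variance {var:.1f}")
```

<p>Because the uncertainty in <code>theta</code> is mixed in, the predictive variance is somewhat larger than that of a single Poisson distribution with the same mean.</p>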
<p>Only the Monday observation of (Sipilä et al., 2016) falls outside the 95% central credible interval (CCrI); in addition, the Thursday observation of (Sipilä et al., 2016) and the Monday observation of (Čulić, 2013) fall outside the 90% CCrI (shown <a href="https://www.treszkai.com/2019/11/11/dst-vs-ami/figs/posterior_predictive_90.svg">here</a>), indicating a good fit of the model.</p>
<h2>Further research</h2>
<p>The importance of this issue depends on whether the increase in AMIs on the posttransitional week is merely a shift from the weeks afterwards. In other words, how many of these additional AMIs would have been asymptomatic, had it not been for the <span class="caps">DST</span>? We suspect that this number is quite low, because effectively the transition shifts the sleep schedule by an hour, which happens relatively often (e.g. when traveling), and single-day sleep deprivations are even more common. One way to approach this question is to collect the <span class="caps">AMI</span> counts in the few weeks following a <span class="caps">DST</span> transition, and compare the results obtained from regions with <span class="caps">DST</span> and regions without <span class="caps">DST</span>.</p>
<p>The main deficiency of this meta-analysis is the assumption of equal effects regardless of country, which comes with using the fixed effects model. This assumption could be relaxed in a random effects model, although that would introduce a subjective choice of inter-country variance, making the results harder to interpret correctly and easier to misinterpret.<sup><a href="#fn-misinterpret">[fn-1]</a><a id="fn-src-misinterpret"></a> ↓</sup></p>
<p>As the absolute effect of <span class="caps">DST</span> transitions on <span class="caps">AMI</span> incidences is not substantial (given the low base rate), even on a global scale, I suggest no further research on this specific topic.<sup><a href="#fn-further">[fn-2]</a><a id="fn-src-further"></a> ↓</sup> There are many research areas around either sleep or cardiovascular health that are more important.</p>
<h2>Conclusion</h2>
<p>A standard argument against Bayesian methods is that the subjective choice of prior influences the results arbitrarily. Although this is a philosophical question, we believe meaningful and consistent probabilistic inference cannot be done without describing our initial beliefs and defining how different parameter values would result in different observations. However, in our case the likelihood of the observed data dominated the prior, rendering the choice of prior almost irrelevant.</p>
<p>Our analysis showed an increase in <span class="caps">AMI</span> risk during spring (relative risk increase 5–11% on Monday, less on later days), which translates to an additional 1600–3700 <span class="caps">AMI</span> incidences over the whole affected period. The data from the autumn transition showed either no change or a slight decrease in <span class="caps">AMI</span> risk (at most 5% relative risk decrease), translating to an estimated change in incidence counts somewhere between −1600 and +100.
These figures alone do not provide an argument against the institution of <span class="caps">DST</span>, especially without evidence that these changes are not merely the result of future <span class="caps">AMI</span> incidences advanced (in spring) or postponed (in autumn), which is the default position.
However, the analysis provides strong evidence for the hypothesis that our body can react negatively to a single hour shift in our sleep cycles, which should be a crucial factor in the evaluation of <span class="caps">DST</span>, and shows the importance of a consistent sleep schedule.</p>
<h2>License</h2>
<p><a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br /><i><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">The effects of daylight savings time adjustment on the incidence rate of acute myocardial infarction: a Bayesian meta-analysis</span></i> by <a xmlns:cc="http://creativecommons.org/ns#" href="https://treszkai.github.io/2019/11/11/dst-vs-ami" property="cc:attributionName" rel="cc:attributionURL">Laszlo Treszkai</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a> (<span class="caps">CC</span>-<span class="caps">BY</span>-4.0).</p>
<p>The data presented in the <a href="#Relevant-studies">Relevant studies</a> section belong to the original authors and they do not fall under the above <span class="caps">CC</span>-<span class="caps">BY</span>-4.0 license.</p>
<p>The software used for this analysis is distributed under the <span class="caps">MIT</span> license.</p>
<p>Please cite this work as follows:</p>
<p>Laszlo Treszkai. 2019. <em>The effects of daylight savings time adjustment on the incidence rate of acute myocardial infarction: a Bayesian meta-analysis</em>. <a href="http://treszkai.github.io/2019/11/11/dst-vs-ami">http://treszkai.github.io/2019/11/11/dst-vs-ami</a></p>
<p>BibTeX:</p>
<div class="highlight"><pre><span></span><code><span class="nc">@misc</span><span class="p">{</span><span class="err">,</span>
<span class="nl">title</span> <span class="err">=</span> <span class="err">{The</span> <span class="err">effects</span> <span class="err">of</span> <span class="err">daylight</span> <span class="err">savings</span> <span class="err">time</span> <span class="err">adjustment</span> <span class="err">on</span> <span class="err">the</span> <span class="err">incidence</span> <span class="err">rate</span> <span class="err">of</span> <span class="err">acute</span> <span class="err">myocardial</span> <span class="err">infarction:</span> <span class="err">a</span> <span class="err">{B</span><span class="p">}</span><span class="c">ayesian meta-analysis},</span>
<span class="c">author = {Laszlo Treszkai},</span>
<span class="c">howpublished = {\url{http://treszkai.github.io/2019/11/11/dst-vs-ami}},</span>
<span class="c">% note = {Accessed: yyyy-mm-dd} % Optional. The document at this URL is not going to change.</span>
<span class="c">year = {2019},</span>
<span class="c">month = {oct}</span>
<span class="c">}</span>
</code></pre></div>
<h1>References</h1>
<p>Cumming, G. (2014). <em>The new statistics: Why and how.</em> Psychological Science, 25(1), 7–29.</p>
<p>Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. <em>Bayesian Data Analysis.</em> <a href="http://www.stat.columbia.edu/~gelman/book/">link</a></p>
<p>Ronald L. Wasserstein, Nicole A. Lazar. 2016. <em>The <span class="caps">ASA</span> Statement on p-Values: Context, Process, and Purpose.</em> The American Statistician. Volume 70, Issue 2, pp. 129-133. <a href="https://doi.org/10.1080/00031305.2016.1154108">link (<span class="caps">OA</span>)</a></p>
<p>Ronald L. Wasserstein, Allen L. Schirm <span class="amp">&</span> Nicole A. Lazar. 2019. <em>Moving to a World Beyond “p < 0.05”.</em> The American Statistician. Volume 73, pp. 1–19. <a href="https://doi.org/10.1080/00031305.2019.1583913">link (<span class="caps">OA</span>)</a></p>
<p>John K. Kruschke, Torrin M. Liddell, 2018. <em>The Bayesian New Statistics.</em> Psychonomic Bulletin <span class="amp">&</span> Review. Volume 25, Issue 1, pp 178–206. <a href="https://link.springer.com/article/10.3758/s13423-016-1221-4">link (<span class="caps">OA</span>)</a></p>
<p>Amneet Sandhu, Milan Seth, Hitinder S. Gurm. 2014. <em>Daylight savings time and myocardial infarction.</em> Open Heart. <a href="http://dx.doi.org/10.1136/openhrt-2013-000019">link</a></p>
<p>Roberto Manfredini, Fabio Fabbian, Rosaria Cappadona, Alfredo De Giorgi, Francesca Bravi, Tiziano Carradori, Maria Elena Flacco, Lamberto Manzoli. 2019.
<em>Daylight Saving Time and Acute Myocardial Infarction: A Meta-Analysis</em>. Journal of Clinical Medicine. 2019, <em>8</em>, 404; <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6463000/">link</a></p>
<p>Kirchberger et al. 2015. <em>Are daylight saving time transitions associated with changes in myocardial infarction incidence? Results from the German <span class="caps">MONICA</span>/<span class="caps">KORA</span> Myocardial Infarction Registry</em>. <span class="caps">BMC</span> Public Health. 2015; 15: 778. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4535383/">link</a></p>
<p>Janszky and Ljung. 2008. <em>Shifts to and from Daylight Saving Time and Incidence of Myocardial Infarction</em>. The New England Journal of Medicine. 359; 18. <a href="https://www.nejm.org/doi/full/10.1056/NEJMc0807104">link</a></p>
<p>Viktor Čulić. 2013. <em>Daylight saving time transitions and acute myocardial infarction</em>. Chronobiology International. 2013; 30(5): 662–668. <a href="https://www.tandfonline.com/doi/abs/10.3109/07420528.2013.775144">link</a></p>
<p>Janszky, Ahnve, Ljung, Mukamal, Gautam, Wallentin, Stenestrand. 2012. <em>Daylight saving time shifts and incidence of acute myocardial infarction – Swedish Register of Information and Knowledge About Swedish Heart Intensive Care Admissions (<span class="caps">RIKS</span>-<span class="caps">HIA</span>)</em>. Sleep Medicine 13 (2012) 237–242. <a href="https://www.sciencedirect.com/science/article/abs/pii/S1389945711003832">link</a></p>
<p>Monica R. Jiddou, <span class="caps">MD</span>, Mark Pica, <span class="caps">BS</span>, Judy Boura, <span class="caps">MS</span>, Lihua Qu, <span class="caps">MS</span>, and Barry A. Franklin, PhD. 2013. <em>Incidence of Myocardial Infarction With Shifts to and From Daylight Savings Time</em>. The American Journal of Cardiology. Volume 111, Issue 5, Pages 631–635. <a href="http://dx.doi.org/10.1016/j.amjcard.2012.11.010">link</a></p>
<p>Jussi <span class="caps">O.T.</span> Sipilä, Päivi Rautava <span class="amp">&</span> Ville Kytö. 2016. <em>Association of daylight saving time transitions with incidence and in-hospital mortality of myocardial infarction in Finland</em>. Annals of Medicine, 48:1-2, 10-16. <a href="http://dx.doi.org/10.3109/07853890.2015.1119302">link</a></p>
<p>Young Joo Yang, Chang Seok Bang, Gwang Ho Baik, Tae Young Park, Suk Pyo Shin, Ki Tae Suk, Dong Joon Kim. 2017.
<em>Prokinetics for the treatment of functional dyspepsia: Bayesian network meta-analysis</em>.
<span class="caps">BMC</span> Gastroenterology 17:83 <span class="caps">DOI</span> 10.1186/s12876-017-0639-0. <a href="https://bmcgastroenterol.biomedcentral.com/track/pdf/10.1186/s12876-017-0639-0">link (<span class="caps">OA</span>)</a></p>
<p>Xiaole Su, Xinfang Xie, Lijun Liu, Jicheng Lv, Fujian Song, Vlado Perkovic, Hong Zhang. 2017.
<em>Comparative Effectiveness of 12 Treatment Strategies for Preventing Contrast-Induced Acute Kidney Injury: A Systematic Review and Bayesian Network Meta-analysis</em>.
American Journal of Kidney Diseases. Volume 69, Issue 1, pp. 69–77.
<span class="caps">DOI</span>: 10.1053/j.ajkd.2016.07.033, <a href="https://www.ajkd.org/article/S0272-6386(16)30421-8/fulltext">link</a></p>
<p>Devin Incerti. 2015. <em>Bayesian Meta-Analysis with R and Stan</em>. Self-published, online. https://devinincerti.com/2015/10/31/bayesian-meta-analysis.html. Retrieved 4 Oct 2019.</p>
<hr>
<h1>Appendix</h1>
<h2>Characteristics of studies</h2>
<h3>Janszky and Ljung (2008)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>source: the Swedish registry of acute myocardial infarction (“which provides high-quality information on all acute myocardial infarctions in the country since 1987”)</li>
<li>years: 1987–2006</li>
<li>observations: the incidence of <span class="caps">AMI</span> during each of the first 7 days after the spring or autumn transition</li>
<li>trend: the mean of the incidences on the corresponding weekdays 2 weeks before and 2 weeks after the day of interest</li>
<li>total <span class="caps">AMI</span> cases on spring posttransitional week: 10,776</li>
</ul>
<p><strong>Quotes</strong>:</p>
<blockquote>
<p>The effects of transitions were consistently more pronounced for people under 65 years of age than for those 65 years of age or older.</p>
</blockquote>
<p>The authors properly controlled for the Easter holiday.</p>
<blockquote>
<p>Analyses of the data for the spring shift are based on the 15 years between 1987 and
2006 in which Easter Sunday was not the transition day.
[…]
For years in which Easter
Sunday was celebrated 2 weeks after the Sunday of the spring shift, we defined the control period for the Sunday of
the shift as the Sunday 3 weeks before and the Sunday 3 weeks after (thus skipping Easter Sunday).</p>
</blockquote>
<p><strong>Overanalysis</strong>:</p>
<p>The following observations do not have any plausible explanation, and are probably just noise. Question: did later studies confirm these findings?</p>
<p>1.</p>
<blockquote>
<p>When we did not exclude Easter if it coincided with the exposure or control days, we observed an even higher effect size associated with the spring transition.</p>
</blockquote>
<p>2.</p>
<blockquote>
<p>For the autumn shift, in contrast to the analyses of all acute myocardial infarctions, analyses restricted to fatal cases showed a smaller decrease in the incidence of acute myocardial infarction on Monday, and the risk of fatal acute myocardial infarction increased during the first week after the shift.</p>
</blockquote>
<p>3.</p>
<blockquote>
<p>The effect of the spring transition to daylight saving time on the incidence of acute myocardial infarction was somewhat more pronounced in women than in men, and the autumn effect was more pronounced in men than in women.</p>
</blockquote>
<p><strong>Additional information</strong>:</p>
<p>The authors were employed by institutions in Stockholm, Sweden, meaning the use of the Swedish registry is <em>no evidence for selection bias</em>. Furthermore, the end of the 20-year period of their study is only a year away from the date of the publication.</p>
<h3>Janszky et al. (2012)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>those <span class="caps">AMI</span> patients who were admitted to CCUs at participating hospitals</li>
<li>from 1995 to 2007</li>
<li>dataset: Register of Information and Knowledge about Swedish Heart Intensive Care Admissions (<span class="caps">RIKS</span>-<span class="caps">HIA</span>)</li>
<li>total <span class="caps">AMI</span> cases during spring posttransitional week: 3235.9</li>
</ul>
<p>This study didn’t publish per-day <span class="caps">AMI</span> counts, only the total during the whole posttransitional week.</p>
<p>The time period (1995–2007) overlaps almost entirely with that of Janszky and Ljung (2008) (1987–2006), and nearly every case included in this study was also included in Janszky and Ljung (2008). As such, this study adds hardly any new information to the previous work with regard to the variables we consider, and it is <strong>excluded from our meta-analysis</strong> in order to avoid double-counting.</p>
<p>As the authors put it:</p>
<blockquote>
<p>The study populations of the present and our previous study
overlapped substantially. Our previous analyses included all AMIs
detected either at a hospital or at an autopsy in Sweden from
1987 to 2006, a clear strength. In the present work, we investigated
only those <span class="caps">AMI</span> patients who were admitted to CCUs at participating
hospitals from 1995 to 2007. Although this limited our power
substantially, it allowed us to examine clinical factors that might
modify the risks related to <span class="caps">DST</span> transitions.</p>
</blockquote>
<h3>Čulić (2013)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>patients hospitalized because of <span class="caps">AMI</span></li>
<li>from 1990 to 1996</li>
<li>40 patients on workdays following <span class="caps">DST</span> change</li>
<li>at University Hospital Centre Split in Split, Croatia</li>
</ul>
<p>It is unclear whether the trend prediction is made from the 2 weeks before and after the posttransitional week, or from all 50 nontransitional weeks:</p>
<blockquote>
<p>The incidence ratios of <span class="caps">AMI</span> for the first week after the
two <span class="caps">DST</span> shifts (posttransitional weeks) and each day of
that week were estimated by dividing the incidence
during those periods with the average incidences during
corresponding days and weeks throughout the year: 2
wks before and 2 wks after the posttransitional week,
and the 50 nontransitional weeks of the year altogether.</p>
</blockquote>
<p>It is unclear why exactly the data from 1990 to 1996 was analyzed, if the study was conducted in 2013. This is <em>suggestive of selection bias</em>.</p>
<p><strong>Overanalysis</strong>:</p>
<p>23 additional variables were analyzed (sex, employment status, use of β-blocker, etc.); some were bound to have low p-values:</p>
<blockquote>
<p>The independent predictors for <span class="caps">AMI</span> during
this period in spring were male sex (p = 0.03) and nonengagement in physical activity (p = 0.02) and there was a trend
for the lower risk of incident among those taking calcium antagonists (p = 0.07). In autumn, the predictors were
female sex (p = 0.04), current employment (p = 0.006), not taking β-blocker (p = 0.03), and nonengagement in
physical activity (p = 0.02).</p>
</blockquote>
<h3>Jiddou et al. (2013)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>a retrospective electronic chart review</li>
<li>all patients presenting to the emergency centers at Beaumont Hospitals in Royal Oak and Troy, Michigan, with the primary diagnosis of <span class="caps">AMI</span></li>
<li>age: patients who were aged >18 years, resulting in 70±15 years</li>
<li>exclusion conditions: minor, pregnant</li>
<li>from October 2006 to April 2012 (7 years)</li>
<li>trend: patients admitted with comparable diagnoses on the corresponding weekdays 2 weeks before and 2 weeks after the shifts to and from <span class="caps">DST</span></li>
<li>additional variables: demographic data, medical history, tobacco use, prescribed medications, whether the patient underwent cardiac catheterization; diagnosis of hypertension, hyperlipidemia, and coronary artery disease.</li>
</ul>
<p><strong>Quotes</strong>:</p>
<blockquote>
<p>2 AMIs occurred on Easter Sunday and were considered potential confounders and excluded.</p>
</blockquote>
<p>It is correct to note the incidences on Easter Sunday, but even more important would be the incidences on Easter <em>Monday</em>. Even then, it is only correct to exclude the patients entirely if the relevant control incidences are also reduced – it is unclear whether this trend correction happened.</p>
<h3>Sandhu et al. (2014)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>Time: 1 January 2010 – 15 September 2013 (3 fall and 4 spring <span class="caps">DST</span> changes; 1354 days)</li>
<li>Procedural data for hospital admissions where <span class="caps">PCI</span> was performed in the setting of <span class="caps">AMI</span></li>
<li>Number of cases: 42,060 hospital admissions for <span class="caps">AMI</span> requiring <span class="caps">PCI</span> occurred during the study period.</li>
<li>The median daily <span class="caps">AMI</span> total was 31, ranging from a minimum of 14 to a maximum of 53 admissions.</li>
</ul>
<p><strong>Results</strong>:</p>
<blockquote>
<p>There was no difference in the total weekly number of PCIs performed for <span class="caps">AMI</span> for either the fall or spring time changes in the time period analysed. After adjustment for trend and seasonal effects, the Monday following spring time changes was associated with a 24% increase in daily <span class="caps">AMI</span> counts (p=0.011), and the Tuesday following fall changes was conversely associated with a 21% reduction (p=0.044). No other weekdays in the weeks following <span class="caps">DST</span> changes demonstrated significant associations.</p>
</blockquote>
<p><strong>Analysis</strong>:</p>
<p>I was unable to obtain the data at <a href="https://bmc2.org">Blue Cross Blue Shield of Michigan</a> and the study did not include the number of <span class="caps">AMI</span> cases numerically, therefore I estimated it from the chart in Figure 3 (which was accurate to 0.4 <span class="caps">AMI</span>).</p>
<h3>Kirchberger et al. (2015)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li><span class="caps">AMI</span> count: 25,499 cases of <span class="caps">AMI</span></li>
<li>data source: <span class="caps">MONICA</span>/<span class="caps">KORA</span> Myocardial Infarction Registry (<a href="https://www.helmholtz-muenchen.de/herzschlag-info/">link</a>; public data should be published yearly according to <a href="http://www.gbe-bund.de/gbe10/abrechnung.prc_abr_test_logon?p_uid=gast&p_aid=0&p_knoten=FID&p_sprache=E&p_suchstring=7014">this website</a>, but I did not find a link to download the dataset)</li>
<li>time period: 1 January 1985 to 31 October 2010 (26 spring and 25 fall <span class="caps">DST</span> changes – 2010 fall adjustment was on 31 October)</li>
<li>ages: 25–74</li>
<li>includes: coronary death and <span class="caps">AMI</span></li>
<li>location: city of Augsburg (Germany) and the two adjacent counties (about 600,000 inhabitants)</li>
<li>additional variables: information on re-infarction, various medication prior to <span class="caps">AMI</span>, current occupation, history of hypertension, hyperlipidemia, diabetes, smoking, and obesity.</li>
<li>confounders accounted for: global time trend, temperature, relative humidity, barometric pressure, and indicators for month of the year, weekday and holiday</li>
</ul>
<p><strong>Quotes</strong>:</p>
<blockquote>
<p>The final model included the following covariates: time trend and previous two day mean relative humidity as regression splines with four and two degrees of freedom, respectively, previous two day mean temperature as a linear term and day of the week as categorical variable.</p>
<p>The optimized spring model [of the data from March and April, excluding the week in question] included time trend and same day mean relative humidity as regression splines with six and three degrees of freedom.</p>
</blockquote>
<p>Six d.o.f. for 2 months is probably overfitting the data, even though it was the sum of 26 years. However, it shouldn’t have a predictable effect, and its overall effect is probably negligible.</p>
<blockquote>
<p>The incidence rate ratio was assessed as observed over expected events per day and the mean per weekday and corresponding 95% confidence intervals were calculated.</p>
</blockquote>
<p>However, it is not stated how the confidence intervals were calculated: most importantly, which statistical test was used?</p>
<p><strong>Analysis</strong>:</p>
<p>The paper stated only the calculated RRs for the spring and autumn prediction models (for all seven days), not the actual <span class="caps">AMI</span> counts.
Assuming the researchers analyzed the data in an honest manner (i.e. not picking model parameters for lower trend prediction and thus more significant observed increase), and that the model didn’t predict large deviations from the 2.7 <span class="caps">AMI</span>/day average, we can calculate a close approximation of the observations as <script type="math/tex">\mathrm{RR}_d \cdot \mathrm{trend}</script>.</p>
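<p>As a minimal sketch of this approximation: the average daily rate follows from the case count and study period listed above, and an approximate count per weekday is the relative risk times that rate, summed over the transition years. Treating the trend as flat at the overall daily average is the simplifying assumption stated above, the relative risks in the example are made up for illustration (they are not the values from the paper), and the function name is hypothetical.</p>

```python
from datetime import date

# Figures listed above: 25,499 AMI cases between 1 January 1985 and 31 October 2010.
TOTAL_CASES = 25_499
STUDY_DAYS = (date(2010, 10, 31) - date(1985, 1, 1)).days + 1  # both end dates included
N_SPRING_TRANSITIONS = 26

daily_rate = TOTAL_CASES / STUDY_DAYS  # average AMI count per day, about 2.7


def approximate_count(relative_risk, trend_per_day=daily_rate, n_years=N_SPRING_TRANSITIONS):
    """Approximate the count on one weekday, summed over all years, as RR_d * trend."""
    return relative_risk * trend_per_day * n_years


# Hypothetical relative risks for illustration only; NOT the values reported in the paper.
example_rr = {"Monday": 1.10, "Tuesday": 1.00, "Wednesday": 0.95}
approx_counts = {day: approximate_count(rr) for day, rr in example_rr.items()}
```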
<h3>Sipilä et al. (2016)</h3>
<p><strong>Data</strong>:</p>
<ul>
<li>years: 2001–2009, except 2002 and 2005 (due to Easter). 7 years.</li>
<li>Exclusion criterion: age < 18.</li>
<li>Age: mean age 71.2, <span class="caps">SD</span> 12.6 years</li>
<li>2 weeks prior and 3 weeks after <span class="caps">DST</span> transition</li>
<li>all 22 Finnish hospitals with coronary catheterization laboratory that treat emergency cardiac patients</li>
<li>
<p>database: Finnish Care Register for Health Care (<span class="caps">CRHC</span>), a nationwide, obligatory and automatically collected hospital discharge database.</p>
</li>
<li>
<p>Study group: posttransitional week</p>
</li>
<li>Control group: 2 weeks before/after posttransitional week</li>
<li>Easter in study group: 2002, 2005. “Years with <span class="caps">DST</span> spring transition on Easter Sunday were excluded from the analysis (2002 and 2005) to increase international comparability and avoid confounding”</li>
<li>Easter in control group: “When Easter Sunday was celebrated within 2 weeks after <span class="caps">DST</span> transition, post-<span class="caps">DST</span> control weeks after Easter were selected.”</li>
<li>Spring study+control group size: 1269+5029 = 6298</li>
<li>Standardized incidence of <span class="caps">MI</span> admissions in participating hospitals during spring study period was 259/100,000 person-years.</li>
</ul>
<p><strong>Quotes</strong>:</p>
<blockquote>
<p>Incidence of <span class="caps">MI</span> admissions was similar to control
weeks for Sunday–Tuesday after <span class="caps">DST</span> transition
(Figure 1). However, on fourth day after transition
(Wednesday), there was a significant increase in <span class="caps">MI</span>
incidence compared to control weeks (<span class="caps">IR</span> 1.16; <span class="caps">CI</span> 1.01–1.34).</p>
</blockquote>
<p>Is there anything special about the <em>Wednesday</em> that follows a <span class="caps">DST</span> transition? One should not be surprised if a value falls outside of a 95% confidence/credible interval – after all, it happens <em>at least</em> 5% of the time even in the absence of any “interesting” effect.</p>
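<p>A quick calculation shows why a single "significant" weekday is unsurprising. The sketch below assumes that the seven daily intervals are independent and that each one misses with probability exactly 5% when there is no effect; both are simplifications, and the function name is hypothetical.</p>

```python
def prob_at_least_one_outside(n_intervals, coverage=0.95):
    """Chance that at least one of n independent intervals misses, given no real effect."""
    return 1.0 - coverage ** n_intervals


# Seven posttransitional days, each with its own 95% interval.
p_any_day = prob_at_least_one_outside(7)
```

<p>Under these assumptions the chance of at least one weekday falling outside its interval is roughly 30%, so an isolated Wednesday effect is weak evidence on its own.</p>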
<blockquote>
<p>Patients admitted
during the week after <span class="caps">DST</span> transition were less likely to
have diagnosed diabetes or ventricular arrhythmias
compared to patients admitted during control weeks,
but had diagnosed renal failure more often.</p>
</blockquote>
<p>There is no simple and plausible explanation for this, therefore it is more probable that this is a result of finding patterns in noise.</p>
<blockquote>
<p>Population-based incidence
of <span class="caps">MI</span> admissions to participating hospitals during
spring and autumn periods were calculated using
corresponding population data of mainland Finland
obtained from Statistics Finland and standardized to
European standard population 2013 by using the direct method.</p>
</blockquote>
<p>The meaning of the above statement is unclear.</p>
<h2>Footnotes</h2>
<h4>Footnote 1</h4>
<p>“Sleep researchers show a 20% increase in risk of heart attacks in Michigan but a 10% decrease in Finland, so it is advised to travel to Europe for this week.”</p>
<p><a href="#fn-src-misinterpret">[back to source]</a> ↑</p>
<h4>Footnote 2</h4>
<p>Originally, I wrote the following:</p>
<blockquote>
<p>Further research could analyze the publication bias (if you know how to do that in a Bayesian framework, please mention it in the comments below), or analyze more data, preferably from multiple countries. Maybe the <span class="caps">DST</span> transition has a smaller effect on the Finnish population than on the Swedish population, which could easily be analyzed using Bayesian statistics.</p>
</blockquote>
<p>But then I calculated the absolute global effect, which is quite small, therefore the updated recommendation.</p>
<p><a href="#fn-src-further">[back to source]</a> ↑</p>
Trust in numbers — notes of a talk given by Sir David Spiegelhalter2019-10-08T00:00:00+02:002019-10-08T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-10-08:/2019/10/08/trust-in-numbers/<p>Summary of a keynote talk given by Sir David Spiegelhalter about the reporting of medical results.</p>
<p>The Institute of Medical Statistics of the Center for Medical Statistics, Informatics and Intelligent Systems at the Medical University of Vienna <a href="https://cemsiis.meduniwien.ac.at/50years-of-ms/">just turned 50 years old</a>, and they organized a two-day event around it. I was fortunate to attend the keynote talk of Sir David Spiegelhalter (<a href="https://en.wikipedia.org/wiki/David_Spiegelhalter">wiki</a>), a British statistician and <a href="https://en.wikipedia.org/wiki/Winton_Professorship_of_the_Public_Understanding_of_Risk">Winton Professor of the Public Understanding of Risk</a> at the Faculty of Mathematics, University of Cambridge. It was one of the most entertaining <em>and</em> informative talks I have heard. There is no way I can do justice to the talk, and I wouldn’t even attempt to convey the humor (his <em>humour</em>) – the goal of this post is to increase your vigilance a little bit when it comes to any reports about science, and to shed light on the work of Spiegelhalter.</p>
<p>The professor has authored several academic books on statistics, and was interviewed by <span class="caps">CNN</span> in an interview titled <a href="https://edition.cnn.com/videos/tv/2019/04/01/amanpour-david-spiegelhalter-statistics.cnn"><em>Why statistics should make you suspicious</em></a>. He also keeps doing a huge service to science in a number of other ways.</p>
<p>The problem explained in the talk was that <strong>numbers are used to persuade people, not to inform them</strong>. (Actually, that was only the first half – the second half offered a handful of steps we could take when presenting our data.) Take for example politics, and the campaign around Brexit. Even if it were true that it costs £350 million a week for the <span class="caps">UK</span> to be a member of the <span class="caps">EU</span>, it would be much less misleading if it said that it costs 80 pence <em>per person per day</em> to be a member of the <span class="caps">EU</span>. The cost of a bag of potato chips. (The other side committed similar errors too – I’m not trying to win a battle here.)</p>
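<p>The rescaling is simple arithmetic, sketched below. The population figure is my own rough assumption (it is not stated in the talk or in this post), so the result is only meant to land in the neighborhood of the quoted figure; the function name is hypothetical.</p>

```python
WEEKLY_COST_GBP = 350e6  # the campaign figure quoted above
UK_POPULATION = 66e6     # assumption: rough UK population, not given in the post


def cost_per_person_per_day(weekly_cost, population, days_per_week=7):
    """Rescale a national weekly cost to one person and one day."""
    return weekly_cost / population / days_per_week


pence_per_person_per_day = 100 * cost_per_person_per_day(WEEKLY_COST_GBP, UK_POPULATION)
```

<p>With this population estimate the result is somewhere between 75 and 80 pence; a slightly smaller population figure pushes it closer to 80.</p>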
<p><img alt="We send the EU £350 million a week; let’s fund our NHS instead. Vote Leave." src="https://www.treszkai.com/2019/10/08/trust-in-numbers/nhs.png"></p>
<p>As Eliezer Yudkowsky says, <a href="https://www.lesswrong.com/posts/9weLK2AJ9JEt2Tt8f/politics-is-the-mind-killer">politics is the mind-killer</a>, but of course, using numbers to mislead instead of to show an honest representation of reality is done everywhere where there are numbers. My favorite topic these days: <strong>medical statistics</strong>. I’m picking a topic from the talk as an example (which Spiegelhalter analyzed in more detail in a <a href="https://medium.com/wintoncentre/are-we-individuals-or-members-of-populations-the-deeper-issues-behind-the-sausage-wars-a067aebf2063">Medium post</a>): dietary advice about processed meat consumption. <span class="caps">CNN</span> did a <a href="https://edition.cnn.com/2019/04/17/health/colorectal-cancer-risk-red-processed-meat-study-intl/index.html">great job</a> with picking the title of their article to be as close to the original conclusions as possible: <em>Eating just one slice of bacon a day linked to higher risk of colorectal cancer, says study</em>. But by the time this study reaches The Sun, it gets reported as the following:</p>
<p><img alt="Rasher of bacon a day is deadly" src="https://www.treszkai.com/2019/10/08/trust-in-numbers/bacon.jpg"></p>
<p>Boy, that escalated quickly. And what does “higher risk of colorectal cancer” mean anyhow? In this case, the study showed a 19% increase. As Peter Attia explains in his detailed post series on science, <a href="https://peterattiamd.com/ns001/">Studying Studies</a>, such big numbers generally mean an increase in <em>relative risk</em>, not in <em>absolute risk</em>. Relative risk is meaningless without knowing the base rate of the disease. In this case, 5% of <span class="caps">US</span> men and women born today are expected to be diagnosed with colorectal cancer sometime during their lives. Add 19% to that 5% figure (i.e., multiply it by 1.19), and you get 6%, for the people who eat 1 slice of bacon a day. (The 5% figure is surprisingly high, by the way! Fortunately, it has a five-year survival rate of 65%. I don’t know how much of the 5% is a false positive; I guess it doesn’t include the disconfirmed cases. These figures I just gathered from <a href="https://en.wikipedia.org/wiki/Colorectal_cancer#Epidemiology">Wikipedia</a>, <span class="caps">FWIW</span>.)</p>
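<p>The conversion from relative to absolute risk can be written out as a small sketch, using the 5% base rate and the 19% relative increase quoted above. The function and variable names are hypothetical.</p>

```python
def absolute_risk(base_rate, relative_increase):
    """Absolute risk after applying a relative risk increase to a base rate."""
    return base_rate * (1.0 + relative_increase)


base_rate = 0.05          # lifetime risk quoted above
relative_increase = 0.19  # relative increase reported by the study

exposed_risk = absolute_risk(base_rate, relative_increase)
# Extra cases per 100 people who eat a slice a day, compared with 100 who do not.
extra_cases_per_100 = (exposed_risk - base_rate) * 100
```

<p>The absolute risk moves from 5% to just under 6%, which is about one additional case per 100 people.</p>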
<p>You can take the extra step and visualize these numbers using what <a href="https://en.wikipedia.org/wiki/Gerd_Gigerenzer">Gigerenzer</a> calls natural frequencies. As one Wikipedia author puts it, “the problem is not simply in the human mind, but in the representation of the information”, so let’s deliver using things we evolved to understand: a small tribe of human-like icons.</p>
<p><em>All</em> of these people below eat a clean diet without processed meat, and those with a distraught face will get colorectal cancer:</p>
<p>😎😎😎😎😎😎😫😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😫😎😎😎😎😎😎<br />
😎😎😫😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😫<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />😎😎😎😎😎😎😎😫😎😎</p>
<p>And <em>all</em> of these people eat a slice of <a href="https://en.wikipedia.org/wiki/Extrawurst">Extrawurst</a> daily:</p>
<p>😎😎😎😎😎😎😫😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😫😎😎😎😎😎😎<br />
😎😎😫😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😎<br />
😎😎😎😎😎😎😎😎😎😫<br />
😎😎😎😎😎😎😎😎😎😎<br />
😫😎😎😎😎😎😎😎😎😎<br />😎😎😎😎😎😎😎😫😎😎</p>
<p>See the difference? It’s that one troubled guy in row 9.</p>
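<p>Icon arrays like the two above are easy to generate. The sketch below is one possible way to do it, not how the arrays in this post were made: it places the affected people at random positions, so the layout will differ from the one shown, and the function name and seed are arbitrary choices.</p>

```python
import random


def icon_array(n_affected, total=100, per_row=10, seed=0):
    """Render `total` people as emoji rows, with `n_affected` of them marked as affected."""
    rng = random.Random(seed)
    affected = set(rng.sample(range(total), n_affected))
    icons = ["😫" if i in affected else "😎" for i in range(total)]
    rows = ["".join(icons[i:i + per_row]) for i in range(0, total, per_row)]
    return "\n".join(rows)


clean_diet = icon_array(5)   # 5 in 100 without processed meat
daily_slice = icon_array(6)  # about 6 in 100 with one slice a day
print(clean_diet)
print()
print(daily_slice)
```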
<p>Now, I’m not saying bacon is good for your health, or that the additional risk is negligible (admittedly, my mocking tone above suggests otherwise). But if scientists, journalists, and clinicians report the risk honestly, <em>and</em> no one is trying to nudge us into eating more burgers by playing on our primal instincts (including the marketing division of McDonald’s and the social group that calls you chicken if you don’t eat your <a href="https://en.wikipedia.org/wiki/Black_pudding">black pudding</a>), then we puny humans can make more educated decisions about which sacrifices we are willing to make.</p>
<p>This post was just a tiny part of what was said at the talk. In parting, I have two takeaway quotes. First,</p>
<blockquote>
<p>80% of statistics are false.</p>
</blockquote>
<p>(From an anonymous statistician, a comedian, and also <a href="https://www.youtube.com/embed/aHGd6LqAVzw?start=43">Elon Musk</a>.) Unfortunately, this factoid alone doesn’t enable one to navigate reality.</p>
<p>The second quote is of a little more value, but still doesn’t help one to sieve through statistics:</p>
<blockquote>
<p>There’s no point in being trustworthy if you’re boring.</p>
</blockquote>
<p>(From Spiegelhalter in today’s talk.)</p>
<p>This talk was anything but boring. If you have a chance to see Spiegelhalter in person, do so: he gets my highest grade recommendation. (He also has a book, titled <a href="https://smile.amazon.com/Art-Statistics-How-Learn-Data/dp/1541618513"><em>The Art of Statistics</em></a>, which I haven’t read.)</p>
<p>(Somewhat related: just today on my way home I learned of Edward Tufte’s book, <em>The Visual Display of Quantitative Information,</em> which also <a href="https://www.edwardtufte.com/tufte/books_vdqi">looks amazing</a>.)</p>On the overconfidence of modern neural networks2019-09-26T00:00:00+02:002019-09-26T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-09-26:/2019/09/26/overconfidence/<p>Evaluating various methods to improve the calibration of deep neural networks.</p><p><em>On the overconfidence of modern neural networks</em>. This is the title of the coursework I did with a fellow student at the University of Edinburgh. (<span class="caps">PDF</span>: <a href="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw3.pdf">Part 1</a>, <a href="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw4.pdf">Part 2</a>.)</p>
<p>Our topic was influenced by a previous study, titled <em>On Calibration of Modern Neural Networks</em> <!-- {% cite Guo2017-calibration %} --> <a class="citation" href="#Guo2017-calibration">(Guo, Pleiss, Sun, <span class="amp">&</span> Weinberger, 2017)</a>.</p>
<p>Applications of uncertainty estimation include threshold-based outlier detection, active learning, uncertainty-driven exploration of reinforcement learning, or certain safety-critical applications.</p>
<h2>What is uncertainty?</h2>
<p>No computer vision system is perfect, so an image classification algorithm sometimes identifies people as not-people, or not-people as people.
While we usually care about the class with the highest output (the “most likely” class), we can treat the softmax outputs of a classifier as uncertainty estimates.
(After all, that is how we trained a model when treating the softmax outputs of a classifier as a probability distribution, and minimizing the negative log likelihood of the model given the data.)
For example, out of 1000 classifications made with an output of 0.8, approximately 800 should be correct <em>if the system is well-calibrated</em>.</p>
<p><img alt="Example output of a YOLO object detection network" src="https://www.treszkai.com/2019/09/26/overconfidence/yolo.png"></p>
<p>(Example output of a <span class="caps">YOLO</span> object detection network, with the probability estimates. Image source: <a href="https://www.analyticsvidhya.com/blog/2018/12/practical-guide-object-detection-yolo-framewor-python/">Analytics Vidhya</a>.)</p>
<p>Ideally, we want our system to be 100% correct, but we rarely have access to an all-knowing Oracle. In cases where it is hard to distinguish between two categories (like on the cat-dog below) we want the uncertainties to be well-calibrated, so that predictions are neither overly confident nor insufficiently confident.</p>
<p><img alt="Image of a cat that could be mistaken for a dog" src="https://www.treszkai.com/2019/09/26/overconfidence/catdog.jpeg"></p>
<p>(Image source: Google Brain)</p>
<h2>Our results</h2>
<h3>Interim report</h3>
<p><a href="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw3.pdf">Link to report (<span class="caps">PDF</span>)</a></p>
<p>Our initial experiments showed that our baseline model is already well-calibrated when trained on the <span class="caps">EMNIST</span> By-Class dataset.
Calibration worsened when we used only a subset of the training set.
We found that increasing regularization improves calibration up to a point, but too much regularization leads to a decrease in both accuracy and calibration. (See figure below.)
This contradicts the findings of <!-- {% cite Guo2017-calibration -L section -l 3 %} --> <a class="citation" href="#Guo2017-calibration">(Guo, Pleiss, Sun, <span class="amp">&</span> Weinberger, 2017, sec. 3)</a>, who found that model calibration can keep improving as the weight decay constant increases, well after the model achieves minimum classification error.
One of our main findings is that cross-entropy error is not a good indicator of model calibration.</p>
<p><img alt="Figure 5 of our interim report." src="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw3-fig5.png"></p>
<p>(<span class="caps">ECE</span>: expected calibration error. The lower the better.)</p>
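<p>For readers who have not met <span class="caps">ECE</span> before, here is a minimal NumPy sketch of the usual binned estimate: group predictions by confidence, then average the gap between accuracy and mean confidence, weighted by bin size. The function name, the choice of 10 equal-width bins, and the toy data are assumptions made for this illustration; this is not the code used in the report.</p>

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the fraction of samples in the bin
    return ece


# Toy data: 1000 predictions, all made with confidence 0.8.
confidences = np.full(1000, 0.8)
well_calibrated = np.r_[np.ones(800), np.zeros(200)]  # 800 of 1000 correct
overconfident = np.r_[np.ones(600), np.zeros(400)]    # only 600 of 1000 correct

print(expected_calibration_error(confidences, well_calibrated))
print(expected_calibration_error(confidences, overconfident))
```

<p>The first case matches the “800 out of 1000” example above and gives an <span class="caps">ECE</span> of essentially zero; the second is off by the 0.2 gap between confidence and accuracy.</p>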
<h3>Final report</h3>
<p><a href="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw4.pdf">Link to report (<span class="caps">PDF</span>)</a></p>
<p>We replicate the findings of <!-- {% cite Guo2017-calibration %} --> <a class="citation" href="#Guo2017-calibration">(Guo, Pleiss, Sun, <span class="amp">&</span> Weinberger, 2017)</a> that deep neural networks achieve higher accuracy but worse calibration than shallow nets, and compare different approaches for improving the calibration of neural networks (see figure below). As the baseline approach, we consider the calibration of the softmax outputs from a single network; this is compared to <em>deep ensembles</em>, <em><span class="caps">MC</span> dropout</em>, and <em>concrete dropout</em>. Through experiments on the <span class="caps">CIFAR</span>-100 data set, we find that a large neural network can be significantly over-confident about its predictions. We show on a classification problem that an ensemble of deep networks has better classification accuracy and calibration compared to a single network, and that <span class="caps">MC</span> dropout and concrete dropout significantly improve the calibration of a large network.</p>
<p><img alt="Confidence and calibration plots for BigNet. (Figure 2 of our report)" src="https://www.treszkai.com/2019/09/26/overconfidence/mlp-cw4-fig2.png"></p>
<p>(<em>Top row:</em> confidence plots for a deep neural net. The more skewed to the right, the better. <em>Bottom row:</em> corresponding calibration plots. The closer to the diagonal, the better.)</p>
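<p>To give a feel for why averaging helps, here is a toy sketch of how a deep ensemble forms its prediction: each member produces its own softmax distribution and the ensemble reports their mean. (<span class="caps">MC</span> dropout does the same averaging over several stochastic forward passes of one network.) The logits below are invented for illustration, and <code>ensemble_predict</code> is a hypothetical helper, not the code from the report.</p>

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(member_logits):
    """Average the members' softmax outputs.

    member_logits has shape (n_members, n_examples, n_classes).
    """
    return softmax(np.asarray(member_logits, dtype=float)).mean(axis=0)


# Three hypothetical ensemble members scoring one example over three classes.
member_logits = np.array([
    [[4.0, 0.0, 0.0]],   # member 0: very sure of class 0
    [[0.0, 3.0, 0.0]],   # member 1: disagrees, prefers class 1
    [[4.0, 1.0, 0.0]],   # member 2: fairly sure of class 0
])

probs = ensemble_predict(member_logits)
single_conf = softmax(member_logits[0]).max(axis=-1)
ensemble_conf = probs.max(axis=-1)
print("single network confidence:", single_conf)
print("ensemble confidence:      ", ensemble_conf)
```

<p>When the members disagree, the averaged distribution is flatter than any single over-confident member, which is the mechanism that can pull the confidence back towards the accuracy.</p>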
<h2>Things I would do differently</h2>
<p>With a little more experience under my belt now, I would make the following changes to the experiment design and to the report:</p>
<ul>
<li><em>Use a validation set.</em> We only used a training set because we trained for minimum error, and we expected <em>calibration</em> to be independent from <em>accuracy</em>, but that is a strong assumption (and likely incorrect, seeing our results in the interim report).</li>
<li><em>Use better bibliography sources.</em> Instead of Google Scholar, I would search <a href="https://dblp.uni-trier.de/"><span class="caps">DBLP</span></a>, where the information is more correct and consistent.</li>
<li><em>Use pastel colors.</em> I let my collaborator have it his way, but ever since this submission I’m having nightmares in purple and glowing green :D</li>
</ul>
<p>In future work, I would like to test the calibration of a Bayesian neural network, where the weights of the network have a probability distribution instead of a point estimate.</p>
<h2>References</h2>
<!-- {% bibliography --cited %} -->
<ol class="bibliography"><li><span id="Guo2017-calibration">Guo, C., Pleiss, G., Sun, Y., <span class="amp">&</span> Weinberger, <span class="caps">K. Q.</span> (2017). On Calibration of Modern Neural Networks. In D. Precup <span class="amp">&</span> <span class="caps">Y. W.</span> Teh (Eds.), <i>Proceedings of the 34th International Conference on Machine Learning</i> (Vol. 70, pp. 1321–1330). International Convention Centre, Sydney, Australia: <span class="caps">PMLR</span>. Retrieved from http://proceedings.mlr.press/v70/guo17a.html</span></li></ol>Paper summary: Abbeel, Ng: Inverse Reinforcement Learning (2004)2019-08-19T00:00:00+02:002019-08-19T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-08-19:/2019/08/19/irl-summary/<p>Summary of the seminal paper on inverse reinforcement learning.</p><p>This post is a summary of the seminal paper on inverse reinforcement learning: Pieter Abbeel, Andrew Y. Ng: <em>Apprenticeship Learning via Inverse Reinforcement Learning</em> (2004) [<a href="http://ai.stanford.edu/~pabbeel/irl/">link</a>].</p>
<p>Traditional <a href="http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html">reinforcement learning</a> (<span class="caps">RL</span>) starts with specifying a reward function, and during training we search for policies that maximize this reward function<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. In contrast, inverse reinforcement learning (<span class="caps">IRL</span>) starts with expert demonstrations of the desired behavior, infers a reward function that the expert likely followed, and trains a policy to maximize that.</p>
<!-- more -->
<p><span class="caps">IRL</span> is useful for learning complex tasks where it is hard to manually specify a reward function that makes desirable trade-offs between desiderata; such tasks include driving a car or teaching a robot to do a backflip, where we want the car to reach the destination promptly but also safely, or the robot to flip with its arms straight and <a href="https://youtu.be/xet3KDUfS_U?t=50">stick the landing</a>.</p>
<p>In contrast with previous attempts at apprenticeship learning (i.e. learning from an expert), which tried to mimic the expert demonstrations directly, <span class="caps">IRL</span> assumes that the expert follows a reward function that is a linear combination of state features (<script type="math/tex">R = w^T φ(s)</script>), and finds a reward function that maximizes the received reward under the set of demonstrations. The hand-specified function <script type="math/tex">φ: S→ℝ^k</script> maps a state of the Markov decision process (<span class="caps">MDP</span>) to a feature vector whose components describe the different desiderata of the task, such as the distances to objects surrounding the car, the speed of the car, or the current lane.</p>
<p><span class="caps">IRL</span> assumes knowledge of an expert policy <script type="math/tex">π_E</script>, or at least samples from it. Using these, we only care about the estimated “accumulated feature values”, <script type="math/tex">μ(π_E) ∈ ℝ^k</script>, which is the expected discounted sum of the feature vectors if sampled from the policy, because then the value of a policy (parametrised by <script type="math/tex">w</script>) can be calculated from it directly: <script type="math/tex">R = w^T μ(π_E)</script>.</p>
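<p>As a concrete illustration of these accumulated feature values, the sketch below estimates <script type="math/tex">μ(π_E)</script> from sampled trajectories by averaging the discounted sums of feature vectors, and then evaluates <script type="math/tex">w^T μ(π_E)</script>. Everything specific here is assumed for the example: states are just lane indices, <code>phi</code> is a one-hot lane indicator, and the trajectories, discount and weights are made up.</p>

```python
import numpy as np


def feature_expectations(trajectories, phi, gamma=0.9):
    """Monte Carlo estimate of mu(pi): the mean over trajectories of
    sum_t gamma**t * phi(s_t)."""
    totals = []
    for states in trajectories:
        discounts = gamma ** np.arange(len(states))
        features = np.array([phi(s) for s in states], dtype=float)
        totals.append(discounts @ features)  # (T,) @ (T, k) -> (k,)
    return np.mean(totals, axis=0)


def phi(lane):
    """Hypothetical feature map: one-hot encoding of the current lane (3 lanes)."""
    return np.eye(3)[lane]


# Two made-up expert trajectories; each entry is the lane occupied at that step.
expert_trajectories = [[2, 2, 2, 1], [2, 2, 2, 2]]
mu_expert = feature_expectations(expert_trajectories, phi, gamma=0.5)

w = np.array([0.0, 0.2, 1.0])  # assumed reward weights: "prefer the right lane"
value = w @ mu_expert          # value of the expert policy under R = w^T phi

print(mu_expert, value)
```

<p>Once <script type="math/tex">μ</script> is known, evaluating a candidate reward is a single dot product, which is why the algorithm only ever needs feature expectations rather than full trajectories.</p>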
<p>The goal is then to find a policy whose performance is close to that of the expert’s on the unknown reward function <script type="math/tex">R_{\star} = w^T_{\star} φ</script>. This is done by finding a policy whose feature vector is close to the expert’s feature vector, which assures that the value of these policies is close too.</p>
<p>The algorithm for <span class="caps">IRL</span> is the following:</p>
<ol>
<li>Pick a random initial policy, and calculate its <script type="math/tex">μ</script>.</li>
<li>Find the vector of weights <script type="math/tex">w</script> that lies within the unit ball and <em>maximizes</em> the difference between the expert feature expectations and the feature expectations of our best policy thus far.</li>
<li>If this maximum is small, then go to step 7.</li>
<li>Otherwise, <script type="math/tex">w</script> becomes the new weight vector for <script type="math/tex">R</script>.</li>
<li>Calculate optimal policy for this <script type="math/tex">R</script>.</li>
<li>Repeat from step 2.</li>
<li>Let the agent designer pick a policy from any of those found in step 5 in the different iterations; or find the policy in the convex closure of these policies that is closest to the expert policy.</li>
</ol>
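<p>Step 2 has a closed form in the simplest setting. If we compare the expert against a <em>single</em> policy, the unit-ball vector that maximizes <script type="math/tex">w^T(μ_E - μ)</script> is the normalized difference of the two feature expectations (by the Cauchy–Schwarz inequality), and the maximum equals its length. The sketch below shows only that simplified case; comparing against all policies found so far turns the step into a small optimization problem (the paper uses a quadratic program), which is not shown. The function name is made up for this example.</p>

```python
import numpy as np


def max_margin_weights(mu_expert, mu_policy):
    """Unit-norm w maximizing w @ (mu_expert - mu_policy), and the maximum itself.

    Simplification: only one comparison policy is considered.
    """
    diff = np.asarray(mu_expert, dtype=float) - np.asarray(mu_policy, dtype=float)
    margin = np.linalg.norm(diff)
    if margin == 0.0:
        # Feature expectations already match; any w gives a difference of zero.
        return np.zeros_like(diff), 0.0
    return diff / margin, margin


w, margin = max_margin_weights([3.0, 4.0], [0.0, 0.0])
print(w, margin)  # a small margin would be the stopping signal in step 3
```

<p>The returned margin is the quantity checked in step 3: once it falls below a chosen tolerance, the policy's feature expectations are close to the expert's.</p>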
<p>The maximization in step 2 allows us to find a policy that is close to the expert’s, regardless of the choice of a reward function. After all, we are interested in the policy, not the reward function, and so the estimated <script type="math/tex">R</script> is not necessarily correct.</p>
<p>This algorithm is proved to terminate within <script type="math/tex">O(k \log(k))</script> iterations, provided that it is given on the order of <script type="math/tex">O(k \log(k))</script> samples from the expert policy.</p>
<p>Experiments are done in a gridworld environment, where <span class="caps">IRL</span> learns the expert policy from approximately 100 times fewer sample trajectories than simply mimicking the expert. Another experiment is a car driving simulator with 3 lanes viewed from the top, where <span class="caps">IRL</span> is capable of learning multiple driving styles, such as “prefer the right lane but avoid collisions”. Video demonstrations of the latter show that the sentiment of the expert policy is indeed followed, although sometimes with unnecessary lane switches (most modern <span class="caps">RL</span> algorithms also exhibit this undesired property).</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>Or, more accurately, a policy that maximizes the expected utility derived from this reward function and some method of temporal discounting. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>Sampling from the posterior with Markov-chain Monte Carlo2019-08-06T00:00:00+02:002019-08-06T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-08-06:/2019/08/06/mcmc/<p>Description of the sampling algorithm of Metropolis et al. in 500 words.</p><p>John K. Kruschke’s book, titled <em>Doing Bayesian Data Analysis: A Tutorial with R, <span class="caps">JAGS</span>, and Stan (2nd ed.)</em> (<a href="https://www.amazon.com/Doing-Bayesian-Data-Analysis-Tutorial/dp/0124058884">Amazon</a>, <a href="https://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/">official site</a>), gives a very quick and practical introduction to Bayesian analysis. Compared to <span class="caps">BDA3</span>, it contains fewer proofs, but also less jargon, more informal explanations, and more introductions to the basics. As such, I would recommend it to someone who hasn’t had much exposure to statistics yet, or is neither a mathematician nor a programmer.</p>
<p>The book includes thorough and nicely visualized descriptions of multiple Markov-chain Monte Carlo methods for sampling from a posterior distribution, of which I’ll try to summarize the most basic one in this post.</p>
<h2>Goal of sampling</h2>
<p>Given the prior <script type="math/tex">p(θ)</script> and the likelihood <script type="math/tex">p(\D\given θ)</script>, we want samples from the posterior <script type="math/tex">p(θ\given \D)</script>. In the following sections I’ll use the fact that the unnormalized posterior is equal to the prior multiplied by the likelihood: <script type="math/tex">p(θ, \D) = p(θ)\,p(\D \given θ)</script>. Here, I’ll talk only about continuous probability spaces; discrete spaces can be sampled similarly.</p>
<h2>Metropolis algorithm</h2>
<p>Just like the other <span class="caps">MC</span> methods, the Metropolis algorithm starts with a seed value for <script type="math/tex">θ</script> – let’s call it <script type="math/tex">θ_0</script>. (I assume in practice <script type="math/tex">θ_0</script> is sampled from the prior.) Then, once you have a seed value <script type="math/tex">θ_i</script>, repeat the following two steps for a prespecified number of iterations, or until an effective sample size is achieved.</p>
<ol>
<li>Sample <script type="math/tex">θ'_{i+1}</script> from a proposal distribution around <script type="math/tex">\theta_i</script>, which could be a Gaussian: <script type="math/tex">\theta'_{i+1} \sim \N (θ_i, Σ)</script>.</li>
<li>Decide whether to accept the proposal:
<ul>
<li>If <script type="math/tex">p(θ_{i},\D) \le p(θ'_{i+1},\D)</script> – i.e. if <script type="math/tex">p(θ_{i} \given \D) \le p(θ'_{i+1} \given \D)</script> – then <em>accept</em> the proposed parameter value: <script type="math/tex">θ_{i+1} := θ'_{i+1}</script>.</li>
<li>Otherwise, accept the proposed value only with probability equal to the ratio of the posterior at the proposed value to the posterior at the current value, and reject it (keeping the current value) with the remaining probability:</li>
</ul>
</li>
</ol>
<script type="math/tex; mode=display">% <![CDATA[
\begin{gathered}
p = \frac{p(θ'_{i+1}, \D)}{p(θ_{i}, \D)} = \frac{p(θ'_{i+1} \given \D)}{p(θ_{i} \given \D)}, \\
b \sim Bernoulli(p), \\
θ_{i+1} =
\begin{cases}
θ_{i+1}' & \text{if } b=1,\\
θ_i & \text{if } b=0.
\end{cases}
\end{gathered} %]]></script>
<p>It can be proven that after a so-called “burn-in” period, the distribution of <script type="math/tex">θ_{n}</script> converges to the posterior: <script type="math/tex">θ_n \sim p(\theta_n\given \D)</script> if <script type="math/tex">n \gg 1</script>. Therefore, if you run the procedure long enough, you’ll end up with many samples from the posterior. Note that the <em>effective sample size</em> will be much lower than the number of iterations <script type="math/tex">N</script>, because neighboring samples are strongly correlated, so we have to drop most of the <script type="math/tex">θ_i</script> values so obtained.</p>
<p>The beauty of this algorithm is that during this whole procedure, we only need to be able to compute the <em>unnormalized posterior</em> – so the algorithm can be easily used for sampling using the prior and the likelihood, even when the model is specified up to a multiplicative constant (as in an undirected graphical model).</p>
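<p>Here is a minimal one-dimensional sketch of the procedure, to show how little is needed. It works with the <em>log</em> of the unnormalized density for numerical stability, and it samples the <script type="math/tex">\text{Beta}(14,20)</script> shape without ever computing its normalizing constant. The function names, the seed, the proposal width of 0.1 and the burn-in length are all arbitrary choices for this illustration, not values from the book.</p>

```python
import numpy as np


def metropolis(log_unnorm_post, theta0, n_iter, proposal_sd, rng):
    """Random-walk Metropolis with a Gaussian proposal (1-D)."""
    samples = np.empty(n_iter)
    theta = theta0
    log_p = log_unnorm_post(theta)
    for i in range(n_iter):
        # Step 1: propose a value near the current one.
        proposal = theta + proposal_sd * rng.standard_normal()
        log_p_prop = log_unnorm_post(proposal)
        # Step 2: accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
        samples[i] = theta  # on rejection, the current value is repeated
    return samples


def log_beta_unnorm(theta, a=14.0, b=20.0):
    """Log of theta**(a-1) * (1-theta)**(b-1): Beta(a, b) without its constant."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf  # zero density outside the support
    return (a - 1.0) * np.log(theta) + (b - 1.0) * np.log1p(-theta)


rng = np.random.default_rng(0)
samples = metropolis(log_beta_unnorm, theta0=0.5, n_iter=20_000,
                     proposal_sd=0.1, rng=rng)
kept = samples[2_000:]  # discard the burn-in period

# The sample mean should land near the true mean 14 / (14 + 20), about 0.41.
print(kept.mean())
```

<p>Changing <code>proposal_sd</code> by a factor of ten in either direction is a quick way to see how sensitive the chain is to the proposal distribution.</p>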
<p>This algorithm doesn’t easily escape a “probability island” – i.e. a region that is surrounded with a wide region of probability 0. (Although if the proposal distribution is wide enough, then the algorithm is theoretically able to make that jump <em>eventually</em>, which may be in practice “approximately never”.)</p>
<p>One downside of this basic algorithm is that the proposal distribution needs to be fine-tuned for the individual application: differences in effective sample size can be orders of magnitude, even for a simple <script type="math/tex">\text{Beta}(14,20)</script> distribution (i.e. a 1-dimensional unimodal distribution with finite support).</p>
<p>Another downside is that in multiple dimensions this random walk is quite inefficient, and <em>even more</em> dependent on a correct choice of the covariance matrix <script type="math/tex">Σ</script> – but apart from the obvious reason that “high-dimensional spaces are big”, I couldn’t tell why.</p>
<p>The well-known Metropolis–Hastings algorithm, Gibbs sampling and Hamiltonian Monte Carlo are different twists on this core idea, and they are also described in the book.</p>
<p>Allegedly, credit for this method is due more to Marshall and Arianna Rosenbluth – if there is agreement on that, we could rename it to Rosenbluthsian Monte Carlo.</p>
<h2>For more information…</h2>
<p>If you want to learn about sampling, or Bayesian data analysis, consider reading <a href="https://www.amazon.com/Doing-Bayesian-Data-Analysis-Tutorial/dp/0124058884">the book</a>; it’s a great read from what I’ve read so far.</p>
<p>Stay tuned for more of Bayes, or Curry, or Euler, or McCarthy.</p>Bayesian inference: Approaching certainty through sampling2019-07-24T00:00:00+02:002019-07-24T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-07-24:/2019/07/24/approaching-certainty/<p>Analysis of a slightly incorrect statement from <span class="caps">BDA3</span>: if all of the 1000 draws from Bernoulli(p) are 1, what are likely values of p?</p><p><em>Bayesian Data Analysis</em> from Gelman et al. (2013), in section 3.7, presents the statistical analysis of a bioassay experiment. The parameters of the model are <script type="math/tex">(\alpha, \beta)</script>, and we draw samples from the numerically calculated posterior. Then the authors write:</p>
<blockquote>
<p>All of the 1000 simulation draws had positive values of <script type="math/tex">\beta</script>, so the posterior probability that <script type="math/tex">\beta > 0</script> is roughly estimated to exceed 0.999.</p>
</blockquote>
<p>I thought this 0.999 figure is an overestimate; I analyze this question in this post.</p>
<!--more-->
<h2>Analysis</h2>
<p>The event “<script type="math/tex">\beta > 0</script>” is a Bernoulli-distributed random variable; let’s denote it with <script type="math/tex">x \sim \text{Bernoulli}(\theta)</script>. If we draw <script type="math/tex">S</script> samples from <script type="math/tex">x</script> (and denote the results with <script type="math/tex">x_i</script>), the conditional probability distribution of <script type="math/tex">p(\theta \given \{x_i\})</script> is described by the following directed graphical model:</p>
<p><img alt="Bayes net for x_i and theta" src="https://www.treszkai.com/2019/07/24/approaching-certainty/dgm-theta.svg"></p>
<p>The node for <script type="math/tex">x_i</script> is filled because it’s observed, and the plate represents <script type="math/tex">S</script> copies of this node (with <script type="math/tex">i</script> ranging from <script type="math/tex">1</script> to <script type="math/tex">S</script>).</p>
<p>If <script type="math/tex">n_1</script> (resp. <script type="math/tex">n_0</script>) denotes the number of samples where <script type="math/tex">x_i</script> is true (resp. false), the likelihood is described by:</p>
<p><script type="math/tex; mode=display">p(\{x_i\} \given \theta) = \text{Binomial}(n_1 \given n = S, p = \theta).</script></p>
<p>We can assume a noninformative uniform prior on the probability <script type="math/tex">\theta</script> on the unit interval. A Beta prior is conjugate to the Bernoulli likelihood, and <script type="math/tex">p(\theta) = \text{Beta}(\theta \given \alpha_0 = 1, \beta_0 = 1) = \text{Uniform}(\theta \given a = 0, b = 1)</script>, and this results in the following posterior:</p>
<p><script type="math/tex; mode=display">p(\theta \given \{x_i\}) = \text{Beta}(\theta \given \alpha_0 + n_1, \beta_0 + n_0).</script></p>
<p>With <script type="math/tex">n_1 = 1000</script> and <script type="math/tex">n_0 = 0</script>, this amounts to a <script type="math/tex">\text{Beta}(1001, 1)</script> distribution, whose <a href="https://en.wikipedia.org/wiki/Beta_distribution#Probability_density_function">pdf</a> is as such:</p>
<p><img alt="Pdf of Beta(1001,1)" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1000-pdf-big.svg"></p>
<p>As expected, most of the probability mass is close to 1.0. But that graph is not very legible, so let’s zoom in on the right end of the <em>x</em> axis:</p>
<p><img alt="Pdf of Beta(1001,1) in [.99,1.0] interval" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1000-pdf-zoomed.svg"></p>
<p>The red line marks the mean of the distribution, which is approximately <script type="math/tex">0.999</script>, but not nearly all of the probability mass is on the right side of <script type="math/tex">0.999</script>. Using the <a href="https://en.wikipedia.org/wiki/Cumulative_distribution_function">cdf</a> of the posterior, we have that</p>
<p><script type="math/tex; mode=display">P(\theta > 0.999) = 0.63,</script></p>
<p>meaning there’s still a 1 in 3 chance that the posterior probability that <script type="math/tex">\beta > 0</script> does <em>not</em> exceed <script type="math/tex">0.999</script>. To be fair, <strong><script type="math/tex">0.999</script> is still good for a “rough estimate”</strong>, unless one has a strong prior for <script type="math/tex">\beta < 0</script>. (Given the nature of the experiment and the meaning of the parameter <script type="math/tex">\beta</script> — the toxicity of a compound —, a flat prior on “<script type="math/tex">\beta > 0</script>” is reasonable.)</p>
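<p>The 0.63 figure can be checked without any plotting. For a <script type="math/tex">\text{Beta}(\alpha, 1)</script> distribution the cdf is simply <script type="math/tex">\theta^{\alpha}</script>, so the tail probability has a closed form. The sketch below applies the conjugate update (successes are added to the first parameter, failures to the second) and evaluates that tail; the helper names are made up for this example.</p>

```python
def beta_posterior(alpha0, beta0, n_true, n_false):
    """Conjugate update of a Beta(alpha0, beta0) prior with Bernoulli observations."""
    return alpha0 + n_true, beta0 + n_false


def prob_theta_exceeds(threshold, alpha):
    """P(theta > threshold) under Beta(alpha, 1), whose cdf is theta**alpha."""
    return 1.0 - threshold ** alpha


# Uniform prior Beta(1, 1), then 1000 draws with beta > 0 and none with beta <= 0.
alpha, beta = beta_posterior(1, 1, n_true=1000, n_false=0)
posterior_mean = alpha / (alpha + beta)
tail = prob_theta_exceeds(0.999, alpha)

print(f"posterior: Beta({alpha}, {beta})")
print(f"posterior mean: {posterior_mean:.4f}")
print(f"P(theta > 0.999) = {tail:.2f}")
```

<p>The closed form only applies because the second parameter equals 1; for a general Beta posterior one would use a numerical cdf such as the one in <code>scipy.stats</code>.</p>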
<h3>Presidential elections</h3>
<p>A similar statement was made for 1988 pre-election polls, on page 70:</p>
<blockquote>
<p>All of the 1000 simulations had <script type="math/tex">\theta_1 > \theta_2</script>; thus, the estimated posterior probability that Bush had more support than Dukakis in the survey population is over 99.9%.</p>
</blockquote>
<p>When a presidential election is won “by a landslide”, that rarely means more than a 60-40% result; so in this case, I would rather use a prior that puts more mass on results close to 50-50%, for example <script type="math/tex">\text{Beta}(10,10)</script>:</p>
<p><img alt="Pdf of Beta(10,10)" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-10-pdf.svg"></p>
<p>This results in the following posterior:</p>
<p><img alt="Pdf of Beta(1010,10)" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1010-pdf.svg"></p>
<p>So in this case, the crude estimate does not suffice, and we should rather be only 98% certain. (This is a 20-fold difference, <script type="math/tex">(1-.98)/(1-0.999)</script>, and a well-calibrated <a href="https://goodjudgment.com/philip-tetlocks-10-commandments-of-superforecasting/">superforecaster</a> could tell them apart.) If the stakes are high, then refine your model, and draw more samples.</p>
<h2>Conclusion</h2>
<p>The meaning of 1000 true + 0 false simulations depends on your prior beliefs: the posterior mean could be 0.999 (with a uniform prior), or anything less than 0.99 (with a prior weighted more towards the center or zero).</p>
<p>I love <span class="caps">BDA3</span>; I’m nowhere near finished, but even the first chapters have taught me new ideas and proofs (e.g. the Bayesian cookbook in section 3.8, or modeling normal data with unknown mean <em>and</em> variance). The examples and exercises are a great combination of applications and theory. As you can see from this post, all I can do is nitpick some tiny details. A quick intro to practical Bayesian modeling is a <a href="https://www.youtube.com/watch?v=T1gYvX5c2sM">presentation from Andrew Gelman</a>.</p>
<p>Did you like this post, did I make a mistake, or do you know a <span class="caps">BDA3</span> discussion group? Let me know in the comments below!</p>
<h2>Code</h2>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">math</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="nn">st</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="kn">from</span> <span class="nn">IPython.display</span> <span class="kn">import</span> <span class="n">set_matplotlib_formats</span>
<span class="o">%</span><span class="n">matplotlib</span> <span class="n">inline</span>
<span class="n">set_matplotlib_formats</span><span class="p">(</span><span class="s1">'svg'</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="n">posterior</span> <span class="o">=</span> <span class="n">st</span><span class="o">.</span><span class="n">beta</span><span class="p">(</span><span class="mi">1001</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">plot_beta</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">rv</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mi">1001</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="o">**</span><span class="n">plot_kwargs</span><span class="p">):</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">xs</span><span class="p">,</span> <span class="n">rv</span><span class="o">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">xs</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s1">'pdf'</span><span class="p">,</span> <span class="o">**</span><span class="n">plot_kwargs</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="n">xs</span><span class="p">[[</span><span class="mi">0</span><span class="p">,</span><span class="o">-</span><span class="mi">1</span><span class="p">]])</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"θ"</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Probability density function"</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Pdf of Beta(θ | α = </span><span class="si">{</span><span class="n">alpha</span><span class="si">}</span><span class="s2">, β = </span><span class="si">{</span><span class="n">beta</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span>
    <span class="n">plt</span><span class="o">.</span><span class="n">grid</span><span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="n">plot_beta</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1000</span><span class="p">),</span> <span class="n">posterior</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="o">-</span><span class="mi">10</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="o">-</span><span class="mf">0.01</span><span class="p">,</span> <span class="mf">1.01</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">savefig</span><span class="p">(</span><span class="s2">"beta-1000-pdf-big.svg"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="svg" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1000-pdf-big.svg"></p>
<div class="highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">ramanujan</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
    <span class="sd">"""Series that converges to 1/π at an exponential rate,</span>
<span class="sd">    by Srinivasa Ramanujan"""</span>
    <span class="k">return</span> <span class="mi">8</span><span class="o">**.</span><span class="mi">5</span> <span class="o">/</span> <span class="mi">9801</span> <span class="o">*</span> <span class="nb">sum</span><span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="mi">4</span><span class="o">*</span><span class="n">k</span><span class="p">)</span>
                             <span class="o">/</span> <span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="n">k</span><span class="p">)</span><span class="o">**</span><span class="mi">4</span>
                             <span class="o">/</span> <span class="mi">396</span><span class="o">**</span><span class="p">(</span><span class="mi">4</span><span class="o">*</span><span class="n">k</span><span class="p">)</span>
                             <span class="o">*</span> <span class="p">(</span><span class="mi">1103</span> <span class="o">+</span> <span class="mi">26390</span><span class="o">*</span><span class="n">k</span><span class="p">)</span>
                             <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">4</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"1/ramanujan(</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s2">) - π ≈ </span><span class="si">{</span><span class="mi">1</span><span class="o">/</span><span class="n">ramanujan</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">-</span> <span class="n">math</span><span class="o">.</span><span class="n">pi</span><span class="si">:</span><span class="s2">.2e</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="c1"># Easter egg. Thanks for reading!</span>
</code></pre></div>
<blockquote>
<p>1/ramanujan(1) - π ≈ 7.64e-08</p>
<p>1/ramanujan(2) - π ≈ 4.44e-16</p>
<p>1/ramanujan(3) - π ≈ 0.00e+00</p>
</blockquote>
<div class="highlight"><pre><span></span><code><span class="n">plot_beta</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.990</span><span class="p">,</span><span class="mf">1.0</span><span class="p">,</span><span class="mi">1000</span><span class="p">),</span> <span class="n">posterior</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">vlines</span><span class="p">(</span><span class="n">posterior</span><span class="o">.</span><span class="n">mean</span><span class="p">(),</span> <span class="mi">0</span><span class="p">,</span> <span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">get_ylim</span><span class="p">()[</span><span class="mi">1</span><span class="p">],</span> <span class="n">color</span><span class="o">=</span><span class="s1">'r'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s1">'mean'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">gca</span><span class="p">()</span><span class="o">.</span><span class="n">legend</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="s1">'upper left'</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">savefig</span><span class="p">(</span><span class="s2">"beta-1000-pdf-zoomed.svg"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="svg" src="https://www.treszkai.com/2019/07/24/approaching-certainty/beta-1000-pdf-zoomed.svg"></p>
<div class="highlight"><pre><span></span><code><span class="n">posterior</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
</code></pre></div>
<blockquote>
<p>0.999001996007984</p>
</blockquote>
<div class="highlight"><pre><span></span><code><span class="nb">print</span><span class="p">(</span><span class="s1">'P(θ > 0.999) = </span><span class="si">{:d}</span><span class="s1">%'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">posterior</span><span class="o">.</span><span class="n">cdf</span><span class="p">(</span><span class="mf">0.999</span><span class="p">)))))</span>
</code></pre></div>
<blockquote>
<p>P(θ > 0.999) = 63%</p>
</blockquote>
<div class="highlight"><pre><span></span><code><span class="nb">print</span><span class="p">(</span><span class="s1">'P(θ > 0.998) = </span><span class="si">{:d}</span><span class="s1">%'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="mi">100</span><span class="o">*</span><span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">posterior</span><span class="o">.</span><span class="n">cdf</span><span class="p">(</span><span class="mf">0.998</span><span class="p">)))))</span>
</code></pre></div>
<blockquote>
<p>P(θ > 0.998) = 86%</p>
</blockquote>
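<p>As a sanity check, these numbers can be reproduced without SciPy. The sketch below <em>assumes</em> that the posterior is Beta(α = 1001, β = 1), which is consistent with the mean printed above (1001/1002); for β = 1 the cdf has the closed form θ<sup>α</sup>. The names <code>posterior_mean</code> and <code>tail_probability</code> are illustrative and do not appear in the code above.</p>

```python
# Assumption: the posterior is Beta(alpha=1001, beta=1), inferred from the
# printed mean 1001/1002; it is not taken from the notebook code itself.
alpha = 1001


def posterior_mean(alpha, beta=1):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def tail_probability(threshold, alpha):
    """P(theta > threshold) for Beta(alpha, 1), whose cdf is theta**alpha."""
    return 1 - threshold ** alpha


print(posterior_mean(alpha))
for threshold in (0.999, 0.998):
    percent = int(100 * tail_probability(threshold, alpha))
    print(f"P(θ > {threshold}) = {percent}%")
```

<p>The closed form only holds for β = 1; for a general Beta posterior, the <code>cdf</code> method used above is the way to go.</p>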
<h2>References</h2>
<p>Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin. 2013. <em>Bayesian Data Analysis: Third Edition</em>. <a href="http://www.stat.columbia.edu/~gelman/book/">Official webpage</a></p>Evaluation of function calls in Haskell2019-07-13T00:00:00+02:002019-07-13T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2019-07-13:/2019/07/13/haskell-eval/<p>Analyzing why point-free definitions in Haskell allow sharing the result of an inner function application, whereas pointful definitions do not.</p><p><em>(This post is discussed in <a href="https://haskellweekly.news/episode/15.html">episode 15</a> of the</em> Haskell Weekly Podcast.<em>)</em></p>
<p>Chapter 27 of <a href="http://haskellbook.com/"><em>Haskell Programming from first principles</em></a> (by Christopher Allen and Julie Moronuki) is about the evaluation system of Haskell, with a focus on non-strictness. In the section <em>Preventing sharing on purpose</em>, they write that you want to prevent sharing the result of a function call when it would mean storing some big data just to calculate a small result. Two examples are provided to demonstrate the alternatives. In the first, the result of <code>g _</code> is not shared but calculated twice:</p>
<div class="highlight"><pre><span></span><code><span class="kt">Prelude</span><span class="o">></span> <span class="n">f</span> <span class="n">x</span> <span class="ow">=</span> <span class="p">(</span><span class="n">x</span> <span class="mi">3</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span> <span class="mi">10</span><span class="p">)</span>
<span class="kt">Prelude</span><span class="o">></span> <span class="n">g'</span> <span class="ow">=</span> <span class="nf">\</span><span class="kr">_</span> <span class="ow">-></span> <span class="n">trace</span> <span class="s">"hi g'"</span> <span class="mi">2</span>
<span class="kt">Prelude</span><span class="o">></span> <span class="n">f</span> <span class="n">g'</span>
<span class="nf">hi</span> <span class="n">g'</span>
<span class="nf">hi</span> <span class="n">g'</span>
<span class="mi">4</span>
</code></pre></div>
<p>In the second, the result of <code>g _</code> <em>is</em> shared, i.e. calculated only once and the result is stored:</p>
<div class="highlight"><pre><span></span><code><span class="kt">Prelude</span><span class="o">></span> <span class="n">g</span> <span class="ow">=</span> <span class="n">const</span> <span class="p">(</span><span class="n">trace</span> <span class="s">"hi g"</span> <span class="mi">2</span><span class="p">)</span>
<span class="kt">Prelude</span><span class="o">></span> <span class="n">f</span> <span class="n">g</span>
<span class="nf">hi</span> <span class="n">g</span>
<span class="mi">4</span>
</code></pre></div>
<p>(Edited to add:) In practice, sharing is usually achieved with a <code>let</code> expression or a <code>where</code> construct.</p>
<p>(Note that the second definition, <code>g</code>, is a <a href="https://wiki.haskell.org/Pointfree">“point-free” definition</a>.)</p>
<p>The authors conclude that</p>
<blockquote>
<p>functions aren’t shared when there are named arguments but are when the arguments are elided, as in pointfree. So, one way to prevent sharing is adding named arguments.</p>
</blockquote>
<p>(Quoted from version 1.<span class="caps">0RC4</span> of the book.)</p>
<p>In this post I analyze the runtime differences between point-free and pointful definitions.</p>
<h2>Behind the scenes</h2>
<p>As <a href="#Further-resources">Tom Ellis describes</a>, the definitions of <code>g</code> and <code>f</code> translate to the following (in a close approximation to the “Core” language used during compilation):</p>
<div class="highlight"><pre><span></span><code><span class="nf">f</span> <span class="ow">=</span> <span class="nf">\</span><span class="n">x</span> <span class="ow">-></span> <span class="kr">let</span> <span class="p">{</span><span class="n">x3</span> <span class="ow">=</span> <span class="n">x</span> <span class="mi">3</span><span class="p">;</span> <span class="n">x10</span> <span class="ow">=</span> <span class="n">x</span> <span class="mi">10</span><span class="p">}</span> <span class="kr">in</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="n">x3</span> <span class="n">x10</span>
<span class="nf">g</span> <span class="ow">=</span> <span class="kr">let</span> <span class="p">{</span><span class="n">tg</span> <span class="ow">=</span> <span class="n">trace</span> <span class="s">"hi g"</span> <span class="mi">2</span><span class="p">}</span> <span class="kr">in</span> <span class="nf">\</span><span class="n">y</span> <span class="ow">-></span> <span class="n">const</span> <span class="n">tg</span> <span class="n">y</span>
<span class="nf">g'</span> <span class="ow">=</span> <span class="nf">\</span><span class="kr">_</span> <span class="ow">-></span> <span class="n">trace</span> <span class="s">"hi g'"</span> <span class="mi">2</span>
</code></pre></div>
<p>(Calling <code>f g</code> with these definitions does <em>not</em> result in the same trace in GHCi 8.6.5 as with the original definitions. However, the code has the expected behavior if loaded into GHCi from a source file like <a href="#Sharing">that below</a>.)</p>
<p>Two things to point out here. First, every function definition is a lambda. Second, <code>g</code> was turned into a <em>let</em> expression because we can only apply functions to variables or literals (in Core), not to function calls. <em>Edited to add:</em> It would be reasonable to ask why <code>g = const (trace "hi g" 2)</code> doesn’t translate to <code>\y -> let {tg = trace "hi g" 2} in const tg y</code> (similar to <code>f</code>), to which the pragmatic answer is that <em>apparently</em> the order is the following:</p>
<ol>
<li>not-fully-applied functions are turned into lambdas,</li>
<li>parameters that are function calls are turned into named variables, and</li>
<li>named function arguments from the left-hand side of <code>=</code> are moved to the right as a lambda.</li>
</ol>
<h2>Evaluation with sharing</h2>
<p>This is what happens during the evaluation of <code>f g</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans</span> <span class="ow">=</span> <span class="n">f</span> <span class="n">g</span>
</code></pre></div>
<p><code>ans</code> is a function call, so its evaluation proceeds with substituting <code>g</code> for the argument of <code>f</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans</span> <span class="ow">=</span> <span class="kr">let</span> <span class="p">{</span><span class="n">x3</span> <span class="ow">=</span> <span class="n">g</span> <span class="mi">3</span><span class="p">;</span> <span class="n">x10</span> <span class="ow">=</span> <span class="n">g</span> <span class="mi">10</span><span class="p">}</span> <span class="kr">in</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="n">x3</span> <span class="n">x10</span>
</code></pre></div>
<p><code>ans</code> is a <em>let</em> expression, so we put the following <em>thunks</em> for <code>x3</code> and <code>x10</code> on the heap under some unique name:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x3</span> <span class="ow">=</span> <span class="n">g</span> <span class="mi">3</span>
<span class="nf">ans_x10</span> <span class="ow">=</span> <span class="n">g</span> <span class="mi">10</span>
</code></pre></div>
<p>…and then proceed with evaluating the <em>in</em> part:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans</span> <span class="ow">=</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="n">ans_x3</span> <span class="n">ans_x10</span>
</code></pre></div>
<p>During the evaluation of this function call, <code>ans_x3</code> will be evaluated (or potentially <code>ans_x10</code> first, or both in parallel). <code>ans_x3</code> is a function call, so first we evaluate <code>g</code> to a lambda. As <code>g</code> is a <em>let</em> expression, we create a closure for <code>trace "hi g" 2</code> on the heap, and then continue with the <em>in</em> part of <code>g</code> (<code>\y -> const tg y</code>). This is a lambda now, meaning it’s in weak head normal form, so the heap entry for <code>g</code> is overwritten with that:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">g_tg</span> <span class="ow">=</span> <span class="n">trace</span> <span class="s">"hi g"</span> <span class="mi">2</span>
<span class="nf">g</span> <span class="ow">=</span> <span class="nf">\</span><span class="n">y</span> <span class="ow">-></span> <span class="n">const</span> <span class="n">g_tg</span> <span class="n">y</span>
</code></pre></div>
<p>Back to <code>ans_x3</code>, now the argument <code>3</code> is substituted in the definition of <code>g</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans_x3</span> <span class="ow">=</span> <span class="n">const</span> <span class="n">g_tg</span> <span class="mi">3</span>
</code></pre></div>
<p>This is a function call, with <code>const</code> already a lambda <code>\x _ -> x</code>, so the arguments can now be substituted in the body, leaving us with</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x3</span> <span class="ow">=</span> <span class="n">g_tg</span> <span class="c1">-- (Pointer to the same address as g_tg.)</span>
</code></pre></div>
<p>During the evaluation of <code>g_tg</code>, the magic printout happens (<code>hi g</code> on stdout), and its value is resolved to be <code>2</code>, so the heap is updated as such:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">g_tg</span> <span class="ow">=</span> <span class="mi">2</span>
</code></pre></div>
<p>And <code>ans_x3</code> is a pointer to the same memory content <code>2</code>.</p>
<p>Analogously, the evaluation of <code>ans_x10</code> proceeds as such:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans_x10</span> <span class="ow">=</span> <span class="n">const</span> <span class="n">g_tg</span> <span class="mi">10</span>
<span class="nf">ans_x10</span> <span class="ow">=</span> <span class="n">g_tg</span>
<span class="c1">-- ans_x10 now points to the memory location of g_tg:</span>
<span class="nf">ans_x10</span> <span class="ow">=</span> <span class="mi">2</span>
</code></pre></div>
<p>Finally, <code>ans = (+) ans_x3 ans_x10</code>, which evaluates to <code>ans = 4</code>.</p>
<h2>Evaluation without sharing</h2>
<p>In contrast, the evaluation of <code>f g'</code> proceeds as follows:</p>
<div class="highlight"><pre><span></span><code><span class="nf">ans'</span> <span class="ow">=</span> <span class="n">f</span> <span class="n">g'</span>
<span class="nf">ans'</span> <span class="ow">=</span> <span class="kr">let</span> <span class="p">{</span><span class="n">x3</span> <span class="ow">=</span> <span class="n">g'</span> <span class="mi">3</span><span class="p">;</span> <span class="n">x10</span> <span class="ow">=</span> <span class="n">g'</span> <span class="mi">10</span><span class="p">}</span> <span class="kr">in</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="n">x3</span> <span class="n">x10</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x3'</span> <span class="ow">=</span> <span class="n">g'</span> <span class="mi">3</span>
<span class="nf">ans_x10'</span> <span class="ow">=</span> <span class="n">g'</span> <span class="mi">10</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="nf">ans'</span> <span class="ow">=</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="n">ans_x3'</span> <span class="n">ans_x10'</span>
<span class="nf">ans_x3'</span> <span class="ow">=</span> <span class="n">trace</span> <span class="s">"hi g'"</span> <span class="mi">2</span>
</code></pre></div>
<p>Now <code>hi g'</code> is printed, and the heap is updated:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x3'</span> <span class="ow">=</span> <span class="mi">2</span>
</code></pre></div>
<p>When evaluating <code>ans_x10'</code>, we <strong>again print</strong> <code>hi g'</code>, and store the result of the trace under a different thunk:</p>
<div class="highlight"><pre><span></span><code><span class="c1">-- Heap:</span>
<span class="nf">ans_x10'</span> <span class="ow">=</span> <span class="mi">2</span>
</code></pre></div>
<p>Now <code>ans'</code> evaluates to <code>(+) 2 2</code>, i.e. <code>4</code>.</p>
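<p>The two evaluations above can be mimicked in Python. This is only an analogy for the heap updates described here, not a model of what <span class="caps">GHC</span> actually does: the <code>Thunk</code> class, the list-based <code>trace</code> and the helper <code>make_g</code> are all hypothetical names introduced for illustration.</p>

```python
class Thunk:
    """A delayed computation whose result is stored after the first force."""

    def __init__(self, compute):
        self._compute = compute
        self._forced = False
        self._value = None

    def force(self):
        if not self._forced:
            self._value = self._compute()  # evaluate once...
            self._forced = True            # ...then overwrite the "heap entry"
            self._compute = None
        return self._value


log = []  # stands in for the messages that Debug.Trace would print


def trace(message, value):
    log.append(message)
    return value


def f(x):
    # let {x3 = x 3; x10 = x 10} in (+) x3 x10
    x3 = Thunk(lambda: x(3))
    x10 = Thunk(lambda: x(10))
    return x3.force() + x10.force()


def make_g():
    # g = let {tg = trace "hi g" 2} in \y -> const tg y
    tg = Thunk(lambda: trace("hi g", 2))
    return lambda y: tg.force()


g = make_g()


def g_prime(_):
    # g' = \_ -> trace "hi g'" 2
    return trace("hi g'", 2)


print(f(g), log)        # tg is forced once and then reused
log.clear()
print(f(g_prime), log)  # the body of g' is re-evaluated on every call
```

<p>The closure returned by <code>make_g</code> captures a single thunk, so the trace fires once; <code>g_prime</code> has nothing outside its body to capture, so every call repeats the work.</p>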
<h2>Attempt at verifying my translated definitions</h2>
<p>I attempted to verify what I was saying above about the definitions of <code>f</code>, <code>g</code>, <code>g'</code> in Core, using the <code>-ddump-simpl</code> compiler flag of GHCi, but it didn’t fulfil my expectations.</p>
<p><a name="Sharing"></a>Sharing.hs:</p>
<div class="highlight"><pre><span></span><code><span class="kr">module</span> <span class="nn">Sharing</span> <span class="kr">where</span>
<span class="kr">import</span> <span class="nn">Debug.Trace</span>
<span class="nf">f</span> <span class="n">x</span> <span class="ow">=</span> <span class="p">(</span><span class="n">x</span> <span class="p">(</span><span class="mi">3</span><span class="ow">::</span><span class="kt">Int</span><span class="p">))</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span> <span class="mi">10</span><span class="p">)</span> <span class="ow">::</span> <span class="kt">Int</span>
<span class="nf">g</span> <span class="ow">=</span> <span class="n">const</span> <span class="p">(</span><span class="n">trace</span> <span class="s">"hi g"</span> <span class="p">(</span><span class="mi">2</span><span class="ow">::</span><span class="kt">Int</span><span class="p">))</span> <span class="c1">-- share</span>
<span class="nf">g'</span> <span class="ow">=</span> <span class="nf">\</span><span class="kr">_</span> <span class="ow">-></span> <span class="n">trace</span> <span class="s">"hi g'"</span> <span class="p">(</span><span class="mi">2</span><span class="ow">::</span><span class="kt">Int</span><span class="p">)</span> <span class="c1">-- don't share</span>
<span class="nf">g''</span> <span class="ow">=</span> <span class="kr">let</span> <span class="p">{</span><span class="n">tg</span> <span class="ow">=</span> <span class="n">trace</span> <span class="s">"hi g"</span> <span class="p">(</span><span class="mi">2</span><span class="ow">::</span><span class="kt">Int</span><span class="p">)}</span> <span class="kr">in</span> <span class="nf">\</span><span class="n">y</span> <span class="ow">-></span> <span class="n">const</span> <span class="n">tg</span> <span class="n">y</span> <span class="c1">-- share (equivalent to g)</span>
</code></pre></div>
<p>In GHCi:</p>
<div class="highlight"><pre><span></span><code><span class="n">Prelude</span><span class="o">></span> <span class="p">:</span><span class="k">set</span> <span class="o">-</span><span class="n">ddump</span><span class="o">-</span><span class="n">simpl</span> <span class="o">-</span><span class="n">dsuppress</span><span class="o">-</span><span class="k">all</span> <span class="o">-</span><span class="n">Wno</span><span class="o">-</span><span class="n">missing</span><span class="o">-</span><span class="n">signatures</span>
<span class="n">Prelude</span><span class="o">></span> <span class="p">:</span><span class="n">l</span> <span class="n">Sharing</span>
<span class="p">[</span><span class="mi">1</span> <span class="k">of</span> <span class="mi">1</span><span class="p">]</span> <span class="n">Compiling</span> <span class="n">Sharing</span> <span class="p">(</span> <span class="n">Sharing</span><span class="p">.</span><span class="n">hs</span><span class="p">,</span> <span class="n">interpreted</span> <span class="p">)</span>
<span class="o">====================</span> <span class="n">Tidy</span> <span class="n">Core</span> <span class="o">====================</span>
<span class="k">Result</span> <span class="k">size</span> <span class="k">of</span> <span class="n">Tidy</span> <span class="n">Core</span>
<span class="o">=</span> <span class="err">{</span><span class="n">terms</span><span class="p">:</span> <span class="mi">52</span><span class="p">,</span> <span class="n">types</span><span class="p">:</span> <span class="mi">39</span><span class="p">,</span> <span class="n">coercions</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="n">joins</span><span class="p">:</span> <span class="mi">0</span><span class="o">/</span><span class="mi">0</span><span class="err">}</span>
<span class="n">f</span> <span class="o">=</span> <span class="err">\</span> <span class="n">x_a1Fl</span> <span class="o">-></span> <span class="o">+</span> <span class="err">$</span><span class="n">fNumInt</span> <span class="p">(</span><span class="n">x_a1Fl</span> <span class="p">(</span><span class="n">I</span><span class="o">#</span> <span class="mi">3</span><span class="o">#</span><span class="p">))</span> <span class="p">(</span><span class="n">x_a1Fl</span> <span class="p">(</span><span class="n">I</span><span class="o">#</span> <span class="mi">10</span><span class="o">#</span><span class="p">))</span>
<span class="k">g</span> <span class="o">=</span> <span class="err">\</span> <span class="o">@</span> <span class="n">b_a1Gi</span> <span class="o">-></span> <span class="n">const</span> <span class="p">(</span><span class="n">trace</span> <span class="p">(</span><span class="n">unpackCString</span><span class="o">#</span> <span class="ss">"hi g"</span><span class="o">#</span><span class="p">)</span> <span class="p">(</span><span class="n">I</span><span class="o">#</span> <span class="mi">2</span><span class="o">#</span><span class="p">))</span>
<span class="k">g</span><span class="s1">' = \ @ p_a1G6 -> \ _ -> trace (unpackCString# "hi g'</span><span class="ss">"#) (I# 2#)</span>
<span class="ss">tg_r1F4 = trace (unpackCString# "</span><span class="n">hi</span> <span class="k">g</span><span class="err">"</span><span class="o">#</span><span class="p">)</span> <span class="p">(</span><span class="n">I</span><span class="o">#</span> <span class="mi">2</span><span class="o">#</span><span class="p">)</span>
<span class="k">g</span><span class="s1">''</span> <span class="o">=</span> <span class="err">\</span> <span class="o">@</span> <span class="n">b_a1FJ</span> <span class="o">-></span> <span class="err">\</span> <span class="n">y_a1Fn</span> <span class="o">-></span> <span class="n">const</span> <span class="n">tg_r1F4</span> <span class="n">y_a1Fn</span>
<span class="p">...</span> <span class="k">and</span> <span class="k">some</span> <span class="k">more</span> <span class="n">stuff</span>
</code></pre></div>
<p>Nonetheless, as <a href="https://stackoverflow.com/a/6121495/8424390">an <span class="caps">SO</span> answer describes</a>, we can see that a function application in Core is defined as <code>Expr Atom</code>, where <em>Atom</em> is <code>var | Literal</code>. I attempted to install <a href="http://hackage.haskell.org/package/ghc-core">ghc-core</a> but the build failed, so further analysis is put on the shelf.</p>
<h2>Conclusions</h2>
<p>So, what’s the essential difference between <code>g</code> and <code>g'</code>?</p>
<p><code>g = const (trace "hi g" 2)</code> is a function application where the argument is a function application, which is treated as a <em>let</em> expression. When you evaluate <code>g ()</code>, the auxiliary variable introduced by the <em>let</em> – i.e., <code>tg = trace "hi g" 2</code> – is evaluated to a literal and its value is stored on the heap. On subsequent calls, some other argument can be applied to the <code>const tg</code> function, but its first argument <code>tg</code> is already evaluated.</p>
<p>In contrast, <code>g' = \_ -> trace "hi g'" 2</code> is a lambda, so it is already fully evaluated, and nothing in it can be simplified further. If we apply <code>g'</code> first to the argument <code>()</code>, the expression <code>g' ()</code> will evaluate to the body of <code>g'</code> with the unused parameter discarded, i.e. <code>trace "hi g'" 2</code>. If we later evaluate <code>g' []</code>, then it again results in the (same) body after the (dummy) function application. Nowhere during this process did we store the value of <code>trace "hi g'" 2</code>: in particular, we didn’t update the definition of <code>g'</code> to <code>\_ -> 2</code>, simply because that is not the definition of <code>g'</code>. (But could we have updated it? Even though functions are always pure, I think the answer is generally <em>no</em>: sometimes the result of a function is bigger than the definition, and the result is not needed often enough to warrant this speed–memory tradeoff.)</p>
<p>Recall the original wording:</p>
<blockquote>
<p>functions aren’t shared when there are named arguments but are when the arguments are elided, as in pointfree.</p>
</blockquote>
<p>As we saw, <em>functions</em> themselves are never shared. Rather, if <code>g</code> is a partially applied function whose argument is a function application <code>fun arg</code>, then <code>g</code> is equivalent to a <em>let</em> expression, and after its first evaluation <code>g</code> will <em>change</em> to a lambda with <code>fun arg</code> already evaluated.</p>
<p>As a generally-okay heuristic, point-free definitions allow sharing inner function calls, whereas nothing in a lambda (or a function with all arguments on the left-hand side) is shared.</p>
<h2>Further resources</h2>
<p>More details on similar behavior are given by Tom Ellis in his talk <a href="https://skillsmatter.com/skillscasts/8726-haskell-programs-how-do-they-run"><em>Haskell programs: how do they run?</em></a> (free registration required to watch the talk).</p>
<p>The <a href="https://skillsmatter.com/skillscasts/8800-functional-and-low-level-watching-the-stg-execute">talk of David Luposchainsky (a.k.a. <code>quchen</code>)</a> goes into more depth, down to the Core: he uses his own implementation of the spineless tagless graph reduction machine (<span class="caps">STG</span>) to visualize the evaluation of any given Haskell code (<a href="https://github.com/quchen/stgi">link to repo</a>).</p>The wise men puzzle2018-08-18T00:00:00+02:002018-08-18T00:00:00+02:00Laszlo Treszkaitag:www.treszkai.com,2018-08-18:/2018/08/18/wise-men/<p>Analyzing the wise men puzzle of modal logic.</p><p>Today I understood the wise men puzzle at a conceptual level, well enough that I could explain it and possibly generalize to similar domains. This post is my attempt at explaining it.</p>
<p>The puzzle is described in <!-- {% cite Huth2000-Logic-book %} --> <a class="citation" href="#Huth2000-Logic-book">(Huth <span class="amp">&</span> Ryan, 2000)</a> as follows:</p>
<blockquote>
<p>There are three wise men. It’s common knowledge—known by everyone and known to be known by everyone, etc.—that there are three red hats and two white hats. The king puts a hat on each of the wise men in such a way that they are not able to see their own hat, and asks each one in turn whether they know the color of the hat on their head. Suppose the first man says he does not know; then the second says he does not know either.
It follows that the third man must be able to say that he knows the colour of his hat. Why is this? What colour has the third man’s hat?</p>
</blockquote>
<p>Let’s call the people Alpha, Beta, Gamma, in the order they speak.</p>
<p>One solution is to think about the puzzle in terms of possible worlds. A world in this problem is described by an assignment of hat colors to people, which is equally an ordered triple of colours <script type="math/tex">⟨c_1, c_2, c_3⟩</script>, with <script type="math/tex">c_i ∈ \{R,W\}</script>. There are only 2 white hats, so in the beginning, the seven possible worlds are</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{cccc}
⟨R,R,R⟩ & ⟨R,R,W⟩ & ⟨R,W,R⟩ & ⟨R,W,W⟩ \\
⟨W,R,R⟩ & ⟨W,R,W⟩ & ⟨W,W,R⟩ & \\
\end{array}. %]]></script>
<p>If Beta and Gamma were both wearing white hats, then Alpha would know that his hat is red. Therefore, when Alpha says “no”, Beta and Gamma both learn that both of them cannot be white, i.e. at least one of them is red. The remaining possible worlds are</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{cccc}
{⟨R,R,R⟩} & ⟨R,R,W⟩ & ⟨R,W,R⟩ & \crossed{⟨R,W,W⟩} \\
{⟨W,R,R⟩} & ⟨W,R,W⟩ & ⟨W,W,R⟩ & \\
\end{array}. %]]></script>
<p>Now, <em>we</em> know that the world is one of the 6 worlds above, but Beta also sees the hats of Alpha and Gamma. What we think as outsiders only matters for whether <em>we</em> can tell who’s wearing what.
But back to the observations of Alpha, Beta and Gamma. When Beta says “no”, that rules out the worlds where Gamma is white: Beta already knows that he and Gamma are not both white, so on seeing a white hat on Gamma he would conclude that his own hat is red.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{array}{cccc}
{⟨R,R,R⟩} & \crossed{⟨R,R,W⟩} & ⟨R,W,R⟩ & \crossed{⟨R,W,W⟩} \\
{⟨W,R,R⟩} & \crossed{⟨W,R,W⟩} & ⟨W,W,R⟩ & \\
\end{array} %]]></script>
<p>This means that Gamma is red, and he also knows this.</p>
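<p>The elimination above can also be checked mechanically. The following is a minimal sketch, not the method used in this post: the function names <code>knows_own_colour</code> and <code>announce_no</code> are made up for illustration, and it assumes that each “no” is a truthful public announcement.</p>

```python
from itertools import product

# All hat assignments except "everyone is white" (only two white hats exist).
worlds = [w for w in product("RW", repeat=3) if "".join(w) != "WWW"]

def knows_own_colour(world, agent, candidates):
    """True if every candidate world that matches what `agent` sees
    (the other two hats) gives the agent the same colour."""
    consistent = [
        c for c in candidates
        if all(c[i] == world[i] for i in range(3) if i != agent)
    ]
    return len({c[agent] for c in consistent}) == 1

def announce_no(agent, candidates):
    """Keep only the worlds in which `agent` would truthfully say "no"."""
    return [w for w in candidates if not knows_own_colour(w, agent, candidates)]

after_alpha = announce_no(0, worlds)        # Alpha says "no"
after_beta = announce_no(1, after_alpha)    # Beta says "no"

print(["".join(w) for w in after_beta])     # ['RRR', 'RWR', 'WRR', 'WWR']
print({w[2] for w in after_beta})           # {'R'}: Gamma is red in every remaining world
```

<p>Under these assumptions the seven initial worlds shrink to six and then to four, matching the three arrays above.</p>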
<h1>Another way</h1>
<p>Our solution above is more procedural than necessary, and it does not show the essence of omniscient agents interacting with one another. As this problem is small enough, we could list for every world every statement any agent could make, which is simply their knowledge base of true statements (i.e. whatever they can deduce from their view and from the common knowledge, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>). (Say, with atoms <script type="math/tex">R_1, R_2, R_3, W_1, W_2, W_3</script>, where <script type="math/tex">X_i</script> means “I think person <script type="math/tex">i</script> has colour X”, and with <script type="math/tex">⟨R,W,R⟩</script> abbreviating <script type="math/tex">R_1\wedge W_2 \wedge R_3</script>.) We can only do this because we are not interested in making statements like “X knows that Y knows that Z knows that φ”.
Besides, in every world, we implicitly include what is common knowledge, and what any agent can see, i.e. the whole problem statement in the opening paragraph.
The common knowledge at the beginning in any of these worlds is <script type="math/tex">\lnot⟨W,W,W⟩</script>. That’s not very much, but at least symmetric, which allows us to write down only three worlds.</p>
<p><strong>World</strong> <script type="math/tex">⟨R,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>.</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨R,W,W⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,W,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,W⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨R,W,W⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨R,W,W⟩</script>
</li>
</ul>
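<p>The three lists above can be reproduced with a few lines of code. This is only a sketch under the stated model: an agent considers possible exactly those worlds that agree with the two hats he sees and that are allowed by the common knowledge. The names <code>initial_worlds</code> and <code>considered_possible</code> are hypothetical.</p>

```python
from itertools import product

AGENTS = ("Alpha", "Beta", "Gamma")

def initial_worlds():
    """Worlds allowed by the initial common knowledge: not all three are white."""
    return ["".join(w) for w in product("RW", repeat=3) if "".join(w) != "WWW"]

def considered_possible(world, agent, common_knowledge):
    """Worlds the agent cannot tell apart from `world`: same hats on the
    other two heads, and allowed by the common knowledge."""
    return [
        c for c in common_knowledge
        if all(c[i] == world[i] for i in range(3) if i != agent)
    ]

for world in ("RRR", "RRW", "RWW"):
    print("World", world)
    for index, name in enumerate(AGENTS):
        print("  ", name, considered_possible(world, index, initial_worlds()))
```

<p>For example, in world RWW the list for Alpha contains the single entry RWW, which is why he would answer “yes” there.</p>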
<p>When Alpha says “no” in the beginning, that means he is not in a world where from his knowledge base he can conclude his own colour. His statement becomes common knowledge (<abbr title="Common knowledge"><span class="caps">CK</span></abbr>), i.e. <abbr title="Common knowledge"><span class="caps">CK</span></abbr> is extended with <script type="math/tex">\lnot(W_2\wedge W_3)</script>.</p>
<p><strong>World</strong> <script type="math/tex">⟨R,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,W,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">\crossed{⟨R,W,W⟩}</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,W⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,R,W⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,W,R⟩</script>
</li>
</ul>
<p>We were able to cross out some worlds! And in the world <script type="math/tex">⟨R,W,W⟩</script> we were left with zero possible worlds for Alpha, i.e. Alpha’s statement would lead to a contradiction: he would have answered “yes”. In fact, this was how we eliminated possible-worlds in the previous solution. Next turn: the king asks Beta, who says “no”. The common knowledge is extended with <script type="math/tex">\lnot(W_1 \wedge W_3)</script>. (Right? At this point I can imagine myself making an incorrect deduction.)</p>
<p><strong>World</strong> <script type="math/tex">⟨R,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
<div id="mistaken1" style="display: block;">
<p><strong>World</strong> <script type="math/tex">⟨R,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,W⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,R,W⟩</script>
</li>
</ul>
</div>
<div id="fixed1" style="display: none;">
<p><strong>World</strong> <script type="math/tex">⟨R,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">\crossed{⟨R,R,W⟩}</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
<li>Beta: <script type="math/tex">\crossed{⟨R,R,W⟩}</script>
</li>
<li>Gamma: <script type="math/tex">\crossed{⟨R,R,R⟩}</script>, <script type="math/tex">\crossed{⟨R,R,W⟩}</script>
</li>
</ul>
</div>
<p><strong>World</strong> <script type="math/tex">⟨R,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,W⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,W⟩</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
<li>Beta: <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">\crossed{⟨W,R,W⟩}</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,W,R⟩</script>
</li>
</ul>
<p>Another world disappeared. But what about <script type="math/tex">⟨R,R,W⟩</script>: why is it still there, when last time we argued that it’s not possible for Gamma to be white? In fact, that world is not possible: in it, Beta would have said “yes”, as he would have known what colour he had.
Although never explicitly stated, we assumed that if someone’s not red then he’s white, and vice versa. Use <script type="math/tex">\star</script> to denote this fact:</p>
<script type="math/tex; mode=display">% <![CDATA[
\star ≡ \bigwedge_{i=1}^3 \big( (\lnot R_i → W_i) \wedge (W_i → \lnot R_i) \big). %]]></script>
<p>We also know that common knowledge is true: for every formula <script type="math/tex">φ</script>, it’s an axiom that <script type="math/tex">\mathcal C φ → φ</script>.
Then, it’s simple to show that if Alpha is red and Gamma is white, then Beta is red.</p>
<script type="math/tex; mode=display">% <![CDATA[
\mathcal C \Big((R_1 \vee R_2 \vee R_3) \wedge (R_2 \vee R_3) \wedge (R_1 \vee R_3) \Big) \wedge \star \vdash
(R_1 \wedge W_3) → R_2. %]]></script>
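<p>The entailment above is small enough to check by brute force over all truth assignments. The sketch below drops the common-knowledge operator (using the axiom that common knowledge is true) and assumes that the star axiom is read as “everyone wears exactly one colour”; the function names are made up for illustration. For comparison, it also tries the weaker reading “everyone is red or white, possibly both”, under which the deduction fails.</p>

```python
from itertools import product

def exactly_one(r, w):
    """Assumed reading of the star axiom: everyone is red or white, never both."""
    return all(r[i] != w[i] for i in range(3))

def at_least_one(r, w):
    """Weaker reading: everyone is red or white, possibly both."""
    return all(r[i] or w[i] for i in range(3))

def counterexamples(star):
    """Valuations of (R1, R2, R3), (W1, W2, W3) that satisfy the premises
    but falsify the conclusion (R1 and W3) -> R2."""
    found = []
    for r in product([False, True], repeat=3):
        for w in product([False, True], repeat=3):
            premises = (
                (r[0] or r[1] or r[2])    # not all three are white
                and (r[1] or r[2])        # learned from Alpha's "no"
                and (r[0] or r[2])        # learned from Beta's "no"
                and star(r, w)
            )
            conclusion = (not (r[0] and w[2])) or r[1]
            if premises and not conclusion:
                found.append((r, w))
    return found

print(len(counterexamples(exactly_one)))        # 0: the entailment holds
print(len(counterexamples(at_least_one)) > 0)   # True: exclusivity is needed
```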
<script type="text/javascript">
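// Reveal the element with the given id (as a block unless displayStyle says
// otherwise) and hide the link that was clicked.
function showById(id, btn, displayStyle) {
  document.getElementById(id).style.display = displayStyle || 'block';
  btn.style.display = 'none';
}
// Same as showById, but for inline elements such as a span.
function showInlineById(id, btn) {
  showById(id, btn, 'inline');
}
// Hide the element with the given id, together with the link that was clicked.
function hideById(id, btn) {
  document.getElementById(id).style.display = 'none';
  btn.style.display = 'none';
}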
</script>
<p>To apply this fix to the list above, <a href="#" onclick="showById('fixed1', this); hideById('mistaken1', this); return false;">click me!</a> (Needs JavaScript.)</p>
<p>Now we are left with the following worlds:</p>
<p><strong>World</strong> <script type="math/tex">⟨R,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,R,R⟩</script>, <span id="mistaken2" markdown="1"><script type="math/tex">⟨R,R,W⟩</script></span> <span id="fixed2" style="display: none;"><script type="math/tex">\crossed{⟨R,R,W⟩}</script></span></li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨R,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨R,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨R,W,R⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,R,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,R,R⟩</script>, <script type="math/tex">⟨W,R,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,R,R⟩</script>
</li>
</ul>
<p><strong>World</strong> <script type="math/tex">⟨W,W,R⟩</script>, <abbr title="Common knowledge"><span class="caps">CK</span></abbr>: <script type="math/tex">\star \wedge \lnot⟨W,W,W⟩\ \wedge \lnot(W_2\wedge W_3) \wedge \lnot(W_1\wedge W_3)</script>:</p>
<ul>
<li>Alpha: <script type="math/tex">⟨R,W,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Beta: <script type="math/tex">⟨W,R,R⟩</script>, <script type="math/tex">⟨W,W,R⟩</script>
</li>
<li>Gamma: <script type="math/tex">⟨W,W,R⟩</script>
</li>
</ul>
<p>At first sight, Gamma’s knowledge base in one world (<script type="math/tex">⟨R,R,R⟩</script>) still contains a world with <script type="math/tex">\lnot R_3</script>. But all four of the above worlds have <script type="math/tex">R_3</script>, meaning <script type="math/tex">R_3</script> is deducible from <script type="math/tex">\star</script> and the <abbr title="Common knowledge"><span class="caps">CK</span></abbr>, making <script type="math/tex">⟨R,R,W⟩</script> in world <script type="math/tex">⟨R,R,R⟩</script> impossible. <a href="#" onclick="showInlineById('fixed2', this); hideById('mistaken2', this); return false;">Click me to fix that.</a> This means <script type="math/tex">R_3</script> is <abbr title="Common knowledge"><span class="caps">CK</span></abbr>. Yay!</p>
<p>Note: there might be some other true statements that could be deduced, so maybe Alpha knows his colour too in some worlds—I haven’t solved the problem in full. For example, when Gamma answers “yes” in the end, it doesn’t say anything we didn’t already know, and nothing that Alpha and Beta didn’t know already, as <script type="math/tex">R_3</script> can be deduced from the common knowledge. Maybe someone else knows theirs too?</p>
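<p>The open question in the note can be explored with a short script. This is a hedged sketch rather than a full solution: it treats each “no” as a truthful public announcement that removes worlds, and then asks, for every remaining world, which agents have only one colour left for themselves. The helper names are hypothetical.</p>

```python
from itertools import product

def view(world, agent):
    """What the agent sees: the other two hats."""
    return tuple(world[i] for i in range(3) if i != agent)

def knows(world, agent, candidates):
    """True if all candidate worlds matching the agent's view agree on his colour."""
    colours = {c[agent] for c in candidates if view(c, agent) == view(world, agent)}
    return len(colours) == 1

def after_no(agent, candidates):
    """Worlds that survive the agent's public "no"."""
    return [w for w in candidates if not knows(w, agent, candidates)]

worlds = ["".join(w) for w in product("RW", repeat=3) if "".join(w) != "WWW"]
remaining = after_no(1, after_no(0, worlds))   # Alpha's "no", then Beta's "no"

for world in remaining:
    who = [name for i, name in enumerate(("Alpha", "Beta", "Gamma"))
           if knows(world, i, remaining)]
    print(world, who)
```

<p>Under these assumptions, only Gamma knows his colour in each of the four remaining worlds, so in this model Alpha and Beta learn nothing about their own hats.</p>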
<h1>Another problem</h1>
<p>A slight modification is to map a natural number <script type="math/tex">k</script> to worlds where X is able to decide their colour after <script type="math/tex">k</script> utterances, if it wasn’t X who spoke last.</p>
<p>Related: it feels like there is a situation with <script type="math/tex">n>2</script> people, where two agents can keep on discarding possible worlds just by them speaking in turns. If you know of one such problem, please let me know.</p>
<h1>Notes</h1>
<p>I hope I didn’t make a mistake in the calculations; I admit I enumerated the possible worlds by hand instead of with Prolog.</p>
<h1>Conclusion</h1>
<p>Listen to people when they say “no”.</p>
<h1>References</h1>
<ol class="bibliography"><li><span id="Huth2000-Logic-book">Huth, M., <span class="amp">&</span> Ryan, <span class="caps">M. D.</span> (2000). <i>Logic in Computer Science - modelling and reasoning about systems</i>. Cambridge University Press.</span></li></ol>
<h1>Blog post summary: Medical AI safety: where are we and where are we heading</h1>
<p>Laszlo Treszkai, 2018-07-11</p>
<p>In this post I summarize a <a href="https://lukeoakdenrayner.wordpress.com/2018/07/11/medical-ai-safety-we-have-a-problem/">blog post about “medical <span class="caps">AI</span> safety”</a>: the potential consequences of using advanced medical systems without sufficient evidence to back up their usefulness.</p>
<p><em>Epistemic status: the author (Luke Oakden-Rayner) is a PhD candidate radiologist, and I’m not an expert in medicine.</em></p>
<blockquote>
<p>For the first time ever, <span class="caps">AI</span> systems could actually be responsible for medical disasters.</p>
</blockquote>
<p>The risk of a medical <span class="caps">AI</span> system increases with its complexity: from the lowest complexity <em>processing systems</em>, through <em>triage systems</em> that order the priority queue of patients, we are now moving towards autonomous <em>diagnostic systems</em>, and eventually to autonomous <em>prediction systems</em>.</p>
<p>Some systems in the wild are worse than humans in both recall rate (how many patients are called back for further tests) and sensitivity:</p>
<blockquote>
<p>“Not only did <span class="caps">CAD</span> [computer-aided diagnosis] increase the recalls without improving cancer detection, but, in some cases, even decreased sensitivity by missing some cancers.”</p>
</blockquote>
<p>Nonetheless, we are already proceeding to the next level:</p>
<blockquote>
<p>A few months ago the <span class="caps">FDA</span> approved a new <span class="caps">AI</span> system by IDx, and it makes independent medical decisions without the need for a clinician. [In this case, screening for eye disease through a retina scan.]</p>
</blockquote>
<p>But on the upside, these tools improve the ratio of people screened:</p>
<blockquote>
<p>But while there is a big potential upside here (about 50% of people with diabetes are not screened regularly enough), and the decision to “refer or not” is rarely immediately vision-threatening, approving a system like this without <em>clinical testing</em> raises some concerns.</p>
</blockquote>
<p>And systems now operate on a larger scale too:</p>
<blockquote>
<p><span class="caps">NHS</span> is already using an automated smart-phone triage system “powered by” babylonhealth <span class="caps">AI</span>. This one is definitely capable of leading to serious harm, since it recommends when to go (or not to go) to hospital.</p>
</blockquote>
<p>… and this system gave 90% confidence to non-lethal diagnosis X, without even offering lethal diagnosis Y, which was suggested by 90% of MDs on Twitter. (And I assume it wasn’t even an adversarial attack.) It’s fair to say that there is room for improvement. (Compare this with the amount of news coverage received by the monthly crash of an autonomous vehicle.)</p>
<blockquote>
<p>The real point is that none of the <span class="caps">FDA</span>, <span class="caps">NHS</span>, nor the various regulatory agencies in other nations appear to be concerned [to the extent required] about the specific risks of autonomous decision making <span class="caps">AI</span>.</p>
<p>Are we potentially racing towards an <span class="caps">AI</span> event on the scale of elixir sulfanilamide [which prompted the foundation of <span class="caps">FDA</span>] or thalidomide [which the <span class="caps">FDA</span> banned before other countries, preventing 10,000 birth malformations]?</p>
</blockquote>
<h1>International Winter School on Gravity and Light, Tutorial 3: Multilinear Algebra – Solutions for Exercise 1</h1>
<p>Laszlo Treszkai, 2018-06-09</p>
<p>Solutions for exercise 1 of <a href="https://www.youtube.com/watch?v=5oeWX3NUhMA">tutorial 3</a> of the <a href="https://gravity-and-light.herokuapp.com">International Winter School on Gravity and Light</a>. (<a href="https://www.youtube.com/watch?v=mbv3T15nWq0">Link to video of lecture 3</a>.)</p>
<h2>Notation</h2>
<p>On this solution sheet, I’ll speak of a vector space <script type="math/tex">(V,+,\cdot)</script> over a field <script type="math/tex">K</script>, where <script type="math/tex">+: V\times V \rightarrow V</script> is the addition and <script type="math/tex">\cdot: K \times V \rightarrow V</script> is called (scalar) multiplication or S-multiplication. The field <script type="math/tex">(K, \textcolor{red}{+}, \textcolor{red}{\cdot})</script> has <script type="math/tex">\textcolor{red}{+}:K\times K \rightarrow K</script> as addition and <script type="math/tex">\textcolor{red}{\cdot}:K\times K \rightarrow K</script> as multiplication operations. The dot is often omitted, i.e. <script type="math/tex">a \mathbf v</script> is short for <script type="math/tex">a \cdot \mathbf v</script>, <script type="math/tex">a b</script> is short for <script type="math/tex">a \textcolor{red}{\cdot} b</script>. (Note that the lecture dealt with real vector spaces, i.e. the field <script type="math/tex">K</script> was always the set of reals <script type="math/tex">\mathbb R</script>.)
The scalars, i.e. the elements of <script type="math/tex">K</script>, are denoted with normal letters <script type="math/tex">a,b</script>, and the vectors, i.e. the elements of <script type="math/tex">V</script>, are denoted with boldface letters <script type="math/tex">\mathbf u, \mathbf v, \mathbf w</script>.</p>
<script type="text/javascript">
function showById(id, btn) {
document.getElementById(id).style.display = 'block';
btn.style.display = 'none';
}
function showByClass(cls, btn) {
for (var x of document.getElementsByClassName(cls))
x.style.display = 'block';
btn.style.display = 'none';
}
function hideByClass(cls) {
for (var x of document.getElementsByClassName(cls))
x.style.display = 'none';
}
</script>
<h1>Exercise 1: True or false?</h1>
<p><em>Tick the correct statements, but not the incorrect ones.</em> <a href="#" onclick="showByClass('answer', this); hideByClass('show-answer'); return false;">Show all answers</a></p>
<p><em>a) Which statements on vector spaces are correct?</em></p>
<p><strong>?.</strong> <em>Commutativity of multiplication is a vector space axiom.</em> <a href="#" onclick="showById('answer1', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer1" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>The scalar multiplication <script type="math/tex">\cdot: K \times V \rightarrow V</script> doesn’t even have the same sets in its two arguments, i.e. <script type="math/tex">\mathbf v \cdot a</script> is not even defined.</li>
<li>The vector space has the commutativity of <em>addition</em> as an axiom: for any <script type="math/tex">\mathbf u,\mathbf v \in V</script>, <script type="math/tex">{\mathbf u+\mathbf v} = {\mathbf v + \mathbf u}</script>.</li>
<li>The underlying field <script type="math/tex">K</script>
<em>does</em> have the commutativity of multiplication as a field axiom: for any <script type="math/tex">a,b \in K</script>, <script type="math/tex">a \textcolor{red}{\cdot} b = b \textcolor{red}{\cdot} a</script>.</li>
<li>As a consequence, for any <script type="math/tex">\mathbf v \in V</script> and <script type="math/tex">a, b \in K</script>,</li>
</ul>
<script type="math/tex; mode=display">% <![CDATA[
a (b \mathbf v) = (a\textcolor{red}{\cdot} b)\mathbf v = (b \textcolor{red}{\cdot} a)\mathbf v = b(a \mathbf v). %]]></script>
</div>
<p><strong>?.</strong> <em>Every vector is a matrix with only one column.</em> <a href="#" onclick="showById('answer2', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer2" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>By definition, a vector is an element of a vector space. If we fix a basis for the vector space, then any vector can be represented by an ordered set of numbers, which could be treated as a column vector, i.e. a matrix with one column. However, this representation depends on the choice of basis.</li>
<li>The <a href="https://youtu.be/5oeWX3NUhMA?t=1m09s">official answer</a> brings up as a counterexample the vector space of polynomials up to some finite degree. However, here again we could represent each vector as a column vector once a basis is chosen. E.g. using the monomial basis <script type="math/tex">\{x^2, x, 1\}</script>, <script type="math/tex">p(x) = 0x^2 + 4x + 5 </script> could be represented as <script type="math/tex">\mathbf p = [0, 4, 5]^T</script>.</li>
</ul>
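<p>To illustrate the dependence on the basis, here is a small sketch that represents the same polynomial by two different columns. The second basis <code>{x^2, x + 1, 1}</code> and the helper <code>combine</code> are chosen only for illustration.</p>

```python
def combine(coords, basis):
    """Linear combination of basis polynomials. Each polynomial is a tuple of
    monomial coefficients in the order (x^2, x, 1)."""
    return tuple(
        sum(c * b[k] for c, b in zip(coords, basis)) for k in range(3)
    )

p = (0, 4, 5)                                        # p(x) = 0x^2 + 4x + 5

monomial_basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # x^2, x, 1
shifted_basis = [(1, 0, 0), (0, 1, 1), (0, 0, 1)]    # x^2, x + 1, 1

# The same vector p has different columns in the two bases.
print(combine((0, 4, 5), monomial_basis) == p)       # True
print(combine((0, 4, 1), shifted_basis) == p)        # True: 4(x + 1) + 1 = 4x + 5
```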
</div>
<p><strong>?.</strong> <em>Every linear map between vector spaces can be represented by a unique quadratic matrix.</em> <a href="#" onclick="showById('answer3', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer3" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>As above, a linear map <script type="math/tex">f: V \rightarrow W </script> can be represented as a unique matrix only once bases are chosen for its domain <script type="math/tex">V</script> and codomain <script type="math/tex">W</script>.</li>
<li>This matrix is quadratic (i.e. square) only if the dimensions of <script type="math/tex">V</script> and <script type="math/tex">W</script> are equal.</li>
</ul>
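<p>As a concrete, non-square example (not from the lecture), consider differentiation as a linear map from polynomials of degree at most 2 to polynomials of degree at most 1. With the monomial bases <code>{x^2, x, 1}</code> and <code>{x, 1}</code>, which are an assumption of this sketch, its matrix has 2 rows and 3 columns.</p>

```python
def apply(matrix, column):
    """Multiply a matrix (list of rows) with a column given as a flat list."""
    return [sum(m * x for m, x in zip(row, column)) for row in matrix]

# d/dx (a x^2 + b x + c) = 2a x + b, written in the bases {x^2, x, 1} -> {x, 1}.
D = [[2, 0, 0],
     [0, 1, 0]]

p = [0, 4, 5]          # p(x) = 0x^2 + 4x + 5
print(apply(D, p))     # [0, 4], i.e. p'(x) = 0x + 4
```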
</div>
<p><strong>?.</strong> <em>Every vector space has a corresponding dual vector space.</em> <a href="#" onclick="showById('answer4', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer4" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>The dual space of a vector space <script type="math/tex">V</script> is defined as the set of linear maps from <script type="math/tex">V</script> to <script type="math/tex">K</script>: <script type="math/tex">V^* \coloneqq Hom(V,K) \coloneqq \{φ\ \vert \ φ: V \linmap K\} </script>.</li>
</ul>
</div>
<p><strong>?.</strong> <em>The set of everywhere positive functions on <script type="math/tex">\mathbb R</script> with pointwise addition and S-multiplication is a vector space.</em> <a href="#" onclick="showById('answer5', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer5" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>This set doesn’t have an additive identity (neutral) element: since the addition is pointwise, it could only be the constant zero function, but that’s not an element of the set.</li>
<li>Consequently, no element of this set has an additive inverse: the inverse of <script type="math/tex">f</script> would be <script type="math/tex">-f</script>, which is everywhere negative.</li>
<li>For the scalar multiplication we’d need to know the underlying field. Usually it would be <script type="math/tex">\mathbb R</script>, but then S-multiplication with a negative number wouldn’t result in an everywhere positive function. (Although one can construct a field from <script type="math/tex">\mathbb R^+</script>, I wonder how well that would combine with the above attempt at a vector space.)</li>
</ul>
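<p>A quick numerical illustration of the failure of closure, assuming the underlying field is the reals and taking the exponential function as an arbitrary example of an everywhere positive function:</p>

```python
import math

def f(x):
    """An everywhere positive function, chosen only as an example."""
    return math.exp(x)

def scale(a, func):
    """Pointwise S-multiplication: (a . func)(x) = a * func(x)."""
    return lambda x: a * func(x)

samples = [-2.0, 0.0, 3.0]

negated = scale(-1.0, f)   # would have to be the additive inverse of f
zero = scale(0.0, f)       # would have to be the additive identity

print([negated(x) > 0 for x in samples])   # [False, False, False]
print([zero(x) > 0 for x in samples])      # [False, False, False]
```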
</div>
<p>b) What is true about tensors and their components?</p>
<p><strong>?.</strong> <em>The tensor product of two tensors is a tensor.</em> <a href="#" onclick="showById('answer6', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer6" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>The lecture didn’t mention tensor products, so a definition is in order. The product of an <script type="math/tex"> (l,k) </script>-tensor <script type="math/tex">S</script> and an <script type="math/tex"> (n,m) </script>-tensor <script type="math/tex">T</script> is an <script type="math/tex"> (l+n,k+m) </script>-tensor <script type="math/tex"> S \otimes T </script>, whose <script type="math/tex"> (i_1, \ldots, i_{l+n}, j_1, \ldots, j_{k+m}) </script>-th component is the product of the relevant components of <script type="math/tex">S</script> and <script type="math/tex">T</script>:</li>
</ul>
<script type="math/tex; mode=display">% <![CDATA[
(S \otimes T)^{i_1, \ldots, i_l, i_{l+1}, \ldots, i_{l+n}}_ {j_1, \ldots, j_k, j_{k+1}, \ldots, j_{k+m} } =
S^{i_1, \ldots, i_l}_ {j_1, \ldots, j_k}
T^{i_{l+1}, \ldots, i_{l+n}}_ {j_{k+1}, \ldots, j_{k+m}}. %]]></script>
<p><a href="https://en.wikipedia.org/wiki/Tensor#Tensor_product">Source: Wikipedia</a></p>
<p>This means that if the arguments of <script type="math/tex"> S \otimes T </script> are</p>
<ul>
<li>the <script type="math/tex">l+n</script> linear maps <script type="math/tex">φ^{(p)} = \sum^{\dim V}_{i=1} \varphi^{(p)}_i \epsilon^i</script> for <script type="math/tex">1 \le p \le l+n</script>, and</li>
<li>the <script type="math/tex">k+m</script> vectors <script type="math/tex"> \v_{(q)} = \sum^{\dim V}_{j=1} v_{(q)}^j \e_j </script> for <script type="math/tex">1 \le q \le k+m</script>
</li>
</ul>
<p>(with some particular choice of basis vectors <script type="math/tex">\{\e_i\}_i</script> and basis covectors <script type="math/tex">\{\epsilon^i\}_i</script> ), then</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
(S\otimes T) &(φ^{(1)}, \ldots, φ^{(l+n)}, \v_{(1)}, \ldots, \v_{(k+m)}) = \\
&= S (φ^{(1)}, \ldots, φ^{(l)}, \v_{(1)}, \ldots, \v_{(k)})\,\cdot\,
T (φ^{(l+1)}, \ldots, φ^{(l+n)}, \v_{(k+1)}, \ldots, \v_{(k+m)})\\
&= \Bigg(
\sum_{i_1}^{\dim V} \cdots \sum_{i_l}^{\dim V}
\sum_{j_1}^{\dim V} \cdots \sum_{j_k}^{\dim V}
\varphi^{(1)}_{i_1} \ldots \varphi^{(l)}_{i_l}
v_{(1)}^{j_1} \ldots v_{(k)}^{j_k}
S^{i_1, \ldots, i_l}_{j_1, \ldots, j_k}
\Bigg) \cdot \phantom.\\
&\phantom{=} \Bigg(
\sum_{i_{l+1}}^{\dim V} \cdots \sum_{i_{l+n}}^{\dim V}
\sum_{j_{k+1}}^{\dim V} \cdots \sum_{j_{k+m}}^{\dim V}
\varphi^{(l+1)}_{i_{l+1}} \ldots \varphi^{(l+n)}_{i_{l+n}}
v_{(k+1)}^{j_{k+1}} \ldots v_{(k+m)}^{j_{k+m}}
T^{i_{l+1}, \ldots, i_{l+n}}_{j_{k+1}, \ldots, j_{k+m}}
\Bigg) \\
&= \sum_{i_1}^{\dim V} \cdots \sum_{i_{l+n}}^{\dim V}
\sum_{j_1}^{\dim V} \cdots \sum_{j_{k+m}}^{\dim V}
\varphi^{(1)}_{i_1} \ldots \varphi^{(l+n)}_{i_{l+n}}
v_{(1)}^{j_1} \ldots v_{(k+m)}^{j_{k+m}}
S^{i_1, \ldots, i_l}_{j_1, \ldots, j_k}
T^{i_{l+1}, \ldots, i_{l+n}}_{j_{k+1}, \ldots, j_{k+m}}.
\end{aligned} %]]></script>
<p>These <script type="math/tex"> (l+n+k+m) </script> summations are quite a mess, but the above derivation shows that the <a href="http://mathworld.wolfram.com/EinsteinSummation.html">Einstein summation convention</a> works for tensor products as well:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned}
(S\otimes T) &(φ^{(1)}, \ldots, φ^{(l+n)}, v_{(1)}, \ldots, v_{(k+m)}) =\\
&= S (φ^{(1)}, \ldots, φ^{(l)}, v_{(1)}, \ldots, v_{(k)})\,\cdot\,
T (φ^{(l+1)}, \ldots, φ^{(l+n)}, v_{(k+1)}, \ldots, v_{(k+m)})\\
&= \Big(
\varphi^{(1)}_{i_1} \ldots \varphi^{(l)}_{i_l}
v_{(1)}^{j_1} \ldots v_{(k)}^{j_k}
S^{i_1, \ldots, i_l}_{j_1, \ldots, j_k}
\Big)
\Big(
\varphi^{(l+1)}_{i_{l+1}} \ldots \varphi^{(l+n)}_{i_{l+n}}
v_{(k+1)}^{j_{k+1}} \ldots v_{(k+m)}^{j_{k+m}}
T^{i_{l+1}, \ldots, i_{l+n}}_{j_{k+1}, \ldots, j_{k+m}}
\Big) \\
&= \varphi^{(1)}_{i_1} \ldots \varphi^{(l+n)}_{i_{l+n}}
v_{(1)}^{j_1} \ldots v_{(k+m)}^{j_{k+m}}
S^{i_1, \ldots, i_l}_{j_1, \ldots, j_k}
T^{i_{l+1}, \ldots, i_{l+n}}_{j_{k+1}, \ldots, j_{k+m}}.
\end{aligned} %]]></script>
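<p>The identity can be checked numerically on a small case. The following is only a sketch in JavaScript, assuming a 2-dimensional space, a (1,1)-tensor <code>S</code> and a (0,1)-tensor <code>T</code>; all component values are made up for illustration. It contracts the product components with every argument at once, and compares the result with evaluating <code>S</code> and <code>T</code> separately.</p>

```javascript
// Made-up components in a fixed basis of a 2-dimensional space V (illustration only).
const dim = 2;
const S = [[1, 2], [3, 4]]; // S[i][j] stands for S^i_j, a (1,1)-tensor
const T = [5, 6];           // T[k] stands for T_k, a (0,1)-tensor
const phi = [1, -1];        // components of the covector argument
const v1 = [2, 0];          // components of the first vector argument
const v2 = [1, 3];          // components of the second vector argument

// Contract the product components S^i_j T_k with all three arguments at once.
let combined = 0;
for (let i = 0; i < dim; i++) {
  for (let j = 0; j < dim; j++) {
    for (let k = 0; k < dim; k++) {
      combined += phi[i] * v1[j] * v2[k] * S[i][j] * T[k];
    }
  }
}

// Evaluate S and T separately, then multiply the two numbers.
let sValue = 0;
for (let i = 0; i < dim; i++) {
  for (let j = 0; j < dim; j++) {
    sValue += phi[i] * v1[j] * S[i][j];
  }
}
let tValue = 0;
for (let k = 0; k < dim; k++) {
  tValue += v2[k] * T[k];
}

console.log(combined, sValue * tValue); // the two numbers agree
```

<p>The triple loop is exactly what the repeated indices in the Einstein notation stand for; the factorization into two smaller loops is the tensor product rule.</p>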
</div>
<p><strong>?.</strong> <em>You can always reconstruct a tensor from its components and the corresponding basis.</em> <a href="#" onclick="showById('answer7', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer7" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>If we know the basis vectors for the vector space and the dual vector space, then the components of the vector and covector arguments are uniquely determined, and we can apply the tensor to the arguments using the components of the tensor (or some relevant finite subset in case <script type="math/tex">V</script> is not finite dimensional).</li>
</ul>
</div>
<p><strong>?.</strong> <em>The number of indices of the tensor components depends on dimension.</em> <a href="#" onclick="showById('answer8', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer8" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>A tensor component usually has one index for each argument, e.g. for a <script type="math/tex">(2,1)</script>-tensor <script type="math/tex">T</script>, the components are <script type="math/tex">T^{i_1,i_2}_{j_1}</script>.</li>
<li>The <em>range</em> of these indices does depend on the dimension: each index ranges from <script type="math/tex">1</script> to <script type="math/tex">\dim V</script>. Therefore an <script type="math/tex"> (n,m) </script>-tensor <script type="math/tex">T</script> has <script type="math/tex"> (\dim V)^{n+m} </script> many components.</li>
</ul>
</div>
<p><strong>?.</strong> <em>The Einstein summation convention does not apply to tensor components.</em> <a href="#" onclick="showById('answer9', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer9" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification: see above.</p>
</div>
<p><strong>?.</strong> <em>A change of basis does not change the tensor components.</em> <a href="#" onclick="showById('answer10', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer10" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>The tensor components are defined with respect to a given basis, so they generally change when the basis changes.</li>
</ul>
</div>
<p>c) Given a basis for a <script type="math/tex">d</script>-dimensional vector space <script type="math/tex">V</script>, …</p>
<p><strong>?.</strong> …<em>one can find exactly <script type="math/tex">d^2</script> different dual bases for the corresponding dual vector space <script type="math/tex"> V^* </script>.</em> <a href="#" onclick="showById('answer11', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer11" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>Given a basis of <script type="math/tex">V</script>, <script type="math/tex">E = \{\mathbf{e}_i\}_{i=1}^d \subset V</script>, there is a <em>unique</em> dual basis of <script type="math/tex">V^* </script>, namely <script type="math/tex">E^* = \{\epsilon^i\}_{i=1}^d</script>, where <script type="math/tex">\epsilon^i(\e_i) = 1</script> and <script type="math/tex">\epsilon^i(\e_j) = 0</script> for <script type="math/tex">i ≠ j</script>.</li>
</ul>
</div>
<p><strong>?.</strong> …<em>by removing one basis vector of the basis of <script type="math/tex">V</script>, a basis for a <script type="math/tex">(d - 1)</script>-dimensional vector space <script type="math/tex">V_1</script> is obtained.</em> <a href="#" onclick="showById('answer12', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer12" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>The resulting set of <script type="math/tex">(d-1)</script> vectors is still linearly independent, and its span is a <script type="math/tex">(d-1)</script>-dimensional subspace of <script type="math/tex">V</script>.</li>
</ul>
</div>
<p><strong>?.</strong> …<em>the continuity of a map <script type="math/tex">f : V → W</script> depends on the choice of basis for the vector space <script type="math/tex">W</script>.</em> <a href="#" onclick="showById('answer13', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer13" style="display: none;">
<p><em>Answer:</em> false.</p>
<p>Clarification:</p>
<ul>
<li>The continuity of a map is defined for <em>topological spaces</em>, not for vector spaces.</li>
<li>
<script type="math/tex">f</script> is continuous <em>iff</em> the preimage of every open set in <script type="math/tex">W</script> is open in <script type="math/tex">V</script>. Note that no term in this definition depends on the choice of basis for either <script type="math/tex">V</script> or <script type="math/tex">W</script>.</li>
<li>Assuming that <script type="math/tex">V</script> and <script type="math/tex">W</script> are real vector spaces, it is customary to equip them with the standard topology. A set <script type="math/tex">A</script> is open in <script type="math/tex">V</script>
<em>iff</em> either it is the union of open <script type="math/tex">ε</script>-balls, or of Cartesian products of open intervals. While these definitions assume a basis for <script type="math/tex">V</script>, they all result in the exact same topology. (Meaning a set can be covered with open balls <em>iff</em> it can be covered with open cuboids <em>iff</em> it can be covered with open cubes – an interesting but easy-to-prove result.)</li>
<li>It’s easy to see that every <em>linear</em> map between finite-dimensional real vector spaces (equipped with the standard topology) is continuous.</li>
</ul>
</div>
<p><strong>?.</strong> …<em>one can extract the components of the elements of the dual vector space <script type="math/tex">V^*</script>.</em> <a href="#" onclick="showById('answer14', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer14" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>A basis for <script type="math/tex">V</script> uniquely determines a dual basis for <script type="math/tex">V^* </script>, which uniquely determines the components of any covector.</li>
</ul>
</div>
<p><strong>?.</strong> …<em>each vector of <script type="math/tex">V</script> can be reconstructed from its components.</em> <a href="#" onclick="showById('answer15', this); return false;" class="show-answer">Show answer</a></p>
<div class="answer" id="answer15" style="display: none;">
<p><em>Answer:</em> true.</p>
<p>Clarification:</p>
<ul>
<li>Given the basis vectors <script type="math/tex">\mathbf{e}_i</script> and components <script type="math/tex">v^i</script> for <script type="math/tex">1 \leq i \leq d</script>, <script type="math/tex">\mathbf{v} = \sum_{i=1}^d v^i \mathbf{e}_i</script>.</li>
</ul>
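<p>As a tiny numerical illustration of this sum, here is a sketch in JavaScript; the basis and the components are hypothetical values chosen for the example, with the basis vectors themselves written in some fixed coordinates.</p>

```javascript
// Hypothetical basis of a 2-dimensional space, written in some fixed coordinates.
const basis = [[1, 1], [0, 1]]; // basis[i] plays the role of e_i
const components = [2, 3];      // components[i] plays the role of v^i

// v = sum_i v^i e_i, accumulated coordinate by coordinate.
const v = [0, 0];
for (let i = 0; i < basis.length; i++) {
  for (let c = 0; c < v.length; c++) {
    v[c] += components[i] * basis[i][c];
  }
}

console.log(v); // [ 2, 5 ]
```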
</div>
<p>Probabilistically interesting planning problems. Laszlo Treszkai, 2018-05-28.</p>
<p>This post briefly describes the problem of <em>probabilistic planning</em>, and explains in informal terms what makes a planning problem <em>probabilistically interesting</em>, along with some examples.</p>
<h1>Primer on probabilistic planning</h1>
<p>In a nutshell, planning is about <em>finding a way to win</em>, and as such, the field of research on planners is vast. However, there is no single textbook definition of “planning”, so in this post I’ll try to be as general as possible. One description of a planning problem could be: given a description of an environment, find a sequence of actions that brings the environment from the initial state of the environment to a goal state. There are multiple ways to describe the environment: for example in formal logic with the <a href="https://en.wikipedia.org/wiki/Situation_calculus">situation calculus</a>, or more commonly as a <a href="https://en.wikipedia.org/wiki/Markov_decision_process">Markov decision process (<span class="caps">MDP</span>)</a>. In probabilistic planning problems, the functions describing the <span class="caps">MDP</span> are not necessarily deterministic: executing action <script type="math/tex">a</script> in state <script type="math/tex">s</script> will bring the environment to state <script type="math/tex">s'</script> with a probability of <script type="math/tex">T(s,a,s')</script>. In contrast with the <em>control problem</em> of reinforcement learning, where the goal is to find an optimal <em>policy</em> (i.e. a mapping from states to actions), in planning one is interested only in a partial policy that brings the agent closer to a goal state, or frequently only a single action that brings the agent closer to a goal state from the current state. An example planning problem is thus: “Siri, show me a way to the library.” Then Siri responds either with a plan that I can follow from the first step to the last (i.e. a route from start to finish), or only an action that I can take right now (“go forward 100 meters”).</p>
<p>Graphical representation of an example <span class="caps">MDP</span>:</p>
<p><img alt="Graphical representation of an example MDP" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/MDP-env.jpg"></p>
<p>An example policy for the same <span class="caps">MDP</span>:</p>
<p><img alt="An example policy for the same MDP" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/MDP-policy.jpg"></p>
<p>An example plan for the same <span class="caps">MDP</span>:</p>
<p><img alt="An example plan for the same MDP" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/MDP-plan.jpg"></p>
<p>The approach taken by a planner differs based on the discounting factor <script type="math/tex">\gamma</script> and the distribution of rewards. In a <em>shortest path problem</em> the future rewards are discounted (<script type="math/tex">0 < \gamma < 1</script>), and there might be a constant negative reward for every step taken. Together with a positive reward in goal states, an agent with the goal of maximizing return – i.e. the sum of discounted expected future rewards – has incentives to minimize the length of the path to the goal. However, if there is no discounting (<script type="math/tex">\gamma = 1</script>) and there’s a positive reward only in the goal states, it is sufficient for the agent to find <em>any</em> way to the goal. (Some call these <em>goal-based problems</em> <a href="#Yoon2008-probabilistic-planning">(Yoon, Fern, Givan, <span class="amp">&</span> Kambhampati, 2008)</a>.) In the next section we’ll see that not all plans are created equal, so even in the non-discounted case we want one that ends up in a goal state with the highest probability.</p>
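<p>To see why discounting rewards shorter paths, here is a minimal sketch in JavaScript. It assumes a deterministic trajectory, a reward of 1 on reaching the goal and 0 elsewhere, and the convention that the reward at step <code>t</code> is weighted by γ to the power <code>t</code>; the function name and the two trajectories are made up for the example.</p>

```javascript
// Return of a reward sequence: sum over t of gamma^t * rewards[t], with t starting at 0.
function discountedReturn(rewards, gamma) {
  return rewards.reduce((sum, reward, t) => sum + Math.pow(gamma, t) * reward, 0);
}

// Hypothetical trajectories: reward 1 on reaching the goal, 0 before that.
const shortPath = [0, 1];
const longPath = [0, 0, 0, 1];

console.log(discountedReturn(shortPath, 0.5), discountedReturn(longPath, 0.5)); // 0.5 0.125
console.log(discountedReturn(shortPath, 1), discountedReturn(longPath, 1));     // 1 1
```

<p>With γ = 0.5 the shorter trajectory has the larger return, while with γ = 1 both trajectories are worth the same, so any path to the goal is as good as any other.</p>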
<p>In an <em>offline</em> approach to deterministic planning problems, a planner is given an environment, initial state and goal state, and it needs to return a sequence of actions that brings the environment to the goal state. However, this offline approach does not work for probabilistic problems, where the outcome of an action is not always in our control. Hence a probabilistic planner is usually executed <em>online</em>: it makes an observation (e.g. the current state of the environment, in the fully observable case), does some magic, and outputs a single action that brings the agent closer to a goal state. Nature brings the agent to a new state, not necessarily the one you desired, and these steps are repeated, until you run out of time or end up at a goal.</p>
<p>Since the fourth <a href="http://icaps-conference.org/index.php/Main/Competitions">International Planning Competition</a> in 2004, hosted by <span class="caps">ICAPS</span> (International Conference on Automated Planning and Scheduling), the event has featured a probabilistic track. The winner of <span class="caps">IPPC</span> 2004 was <span class="caps">FF</span>-Replan, a planner that simplifies the probabilistic planning problem into a deterministic one by not considering the multiple potential effects of an action <a href="#Yoon2007-FF-replan">(Yoon, Fern, <span class="amp">&</span> Givan, 2007)</a> – hence the title of the paper, “<span class="caps">FF</span>-Replan: A Baseline for Probabilistic Planning.”</p>
<h1>Probabilistically interesting planning problems</h1>
<p>Iain Little and Sylvie Thiébaux analyzed the common characteristics of planning problems that can and cannot be optimally solved by a planner like <span class="caps">FF</span>-Replan <a href="#Little2007-probabilistic-planning">(Little <span class="amp">&</span> Thiébaux, 2007)</a>. They gave necessary and sufficient conditions for a probabilistic planning problem to be <em>probabilistically interesting</em>: on a problem fulfilling these conditions, a planner that determinizes the problem will lose crucial information, and will do worse than a probabilistic planner. In this section I’ll summarize these conditions using natural language, slightly diverging from the vocabulary of the paper. For formal definitions and more examples, see the <a href="http://users.cecs.anu.edu.au/~iain/icaps07.pdf">original paper</a>; it is an interesting read.</p>
<p><em>Criterion 1:</em> there are multiple paths from the start to the goal. If there is only a single path, then any planner that finds <em>a</em> path will do equally well, as this will be the only one.</p>
<p>Counterexample:</p>
<p><img alt="Graphical description of an MDP with a single goal trajectory" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/counter-1.png"></p>
<p><em>Criterion 2:</em> where the above two paths diverge, there is a choice about which way to go, i.e. a state <script type="math/tex">s_{crossroads}</script> from which action <script type="math/tex">a_1</script> leads to one road with a different probability than action <script type="math/tex">a_2</script> does. (Yes, this is a sufficient condition for the first criterion.) If it’s only luck that separates the two paths, then the agent doesn’t have much of a choice to do better.</p>
<p>Counterexample:</p>
<p><img alt="MDP with skill doesn't help" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/counter-2.png"></p>
<p><em>Criterion 3:</em> there must be a non-trivially avoidable dead end in the environment. A <em>dead end</em> is an absorbing state that is not a goal state, i.e. a state from which there is no path to any goal state. For a dead end to be <em>avoidable</em>, there must be a state <script type="math/tex">s_{crossroads}</script> with at least two possible actions <script type="math/tex">a_{deadly}</script> and <script type="math/tex">a_{winning}</script>, such that executing <script type="math/tex">a_{deadly}</script> brings the agent to the dead end with a higher probability than executing <script type="math/tex">a_{winning}</script>. A dead end is <em>non-trivially avoidable</em> if <script type="math/tex">s_{crossroads}</script> is on a path from the initial state to a goal state, and there is a non-zero chance of reaching a goal state after executing either <script type="math/tex">a_{winning}</script> or <script type="math/tex">a_{deadly}</script>.</p>
<p>Counterexample: the probabilistic version of Blocksworld, where the worst case scenario is that a block is dropped accidentally, does not contain dead ends; the environment is irreducible. (This was an actual problem of <span class="caps">IPPC</span> 2004.)</p>
<p><img alt="Probabilistic Blocks world" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/blocksworld.png"></p>
<p>Counterexample: all dead ends are unavoidable.</p>
<p><img alt="MDP with no avoidable dead end" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/counter-3b.png"></p>
<p>Counterexample: all dead ends are trivially avoidable.</p>
<p><img alt="MDP with only trivially avoidable dead ends" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/counter-3c.png"></p>
<h1>A simple yet “interesting” planning problem</h1>
<p>A very simple problem that is probabilistically interesting is what the authors call <code>climber</code>, described by the following story:</p>
<blockquote>
<p>You are stuck on a roof because the ladder you climbed up on fell down. There are plenty of people around; if you call out for help someone will certainly lift the ladder up again. Or you can try to climb down without it. You aren’t a very good climber though, so there is a 40% chance that you will fall and break your neck if you do it alone. What do you do?</p>
</blockquote>
<p>Graphical representation of the <code>climber</code> problem:
<img alt="Graphical representation of the climber problem" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/climber-orig.jpg"></p>
<p>Despite the simplicity of this problem, most methods to turn it into a deterministic problem fail. Little and Thiébaux described 3 ways to determinize a problem, and they called a resulting deterministic problem a “compilation”.</p>
<p>The <em><span class="caps">REPLAN1</span></em> approach simply drops all but the most likely outcome of every action, and finds the shortest goal trajectory. (This was the approach used by <span class="caps">FF</span>-Replan.) Compilation of the climber problem according to <span class="caps">REPLAN1</span>:</p>
<p><img alt="Compilation of the climber problem according to REPLAN1" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/climber-det1.jpg"></p>
<p><em><span class="caps">REPLAN2</span>(shortest)</em> turns every possible probabilistic outcome of an action into the outcome of a deterministic action, each with a cost of 1. Optimizing for smallest cost thus finds the <em>shortest</em> goal trajectory, but this might not be the one with the highest success probability. Compilation of the climber problem according to <span class="caps">REPLAN2</span>(shortest):</p>
<p><img alt="Compilation of the climber problem according to REPLAN2(shortest)" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/climber-det2.jpg"></p>
<p><em><span class="caps">REPLAN2</span>(most-likely)</em> also turns every outcome into a separate deterministic action, but the new action costs are the negative log probability of the relevant outcome. This is the only compilation of the problem that finds the optimal path for <code>climber</code>, but for many other problems even this one will be suboptimal. The resulting compilation is as follows:</p>
<p><img alt="Compilation of the climber problem according to REPLAN2(most-likely)" src="https://www.treszkai.com/2018/05/28/probabilistically-interesting/climber-det3.jpg"></p>
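<p>The difference between the two <span class="caps">REPLAN2</span> cost models can be sketched in a few lines of JavaScript. This is only an illustration and not the compilation procedure from the paper: the route names are made up, the 0.6 is the survival chance of the solo climb from the story, and the probability of 1.0 for climbing down the restored ladder is an assumption. Because the logarithm turns products into sums, the route with the smallest total negative log probability is the one with the highest success probability.</p>

```javascript
// Each route is a list of outcome probabilities for the deterministic actions along it.
// 0.6 is the chance of surviving the solo climb from the story; the 1.0 for
// climbing down the restored ladder is an assumption.
const routes = {
  "climb-alone": [0.6],
  "call-help-then-use-ladder": [1.0, 1.0],
};

// REPLAN2(shortest)-style cost: every action costs 1.
const stepCost = (probs) => probs.length;
// REPLAN2(most-likely)-style cost: negative log probability of each outcome.
const negLogCost = (probs) => probs.reduce((sum, p) => sum - Math.log(p), 0);

// Name of the route with the smallest cost under the given cost function.
function cheapest(cost) {
  return Object.keys(routes).reduce((best, name) =>
    cost(routes[name]) < cost(routes[best]) ? name : best);
}

console.log(cheapest(stepCost));   // climb-alone
console.log(cheapest(negLogCost)); // call-help-then-use-ladder
```

<p>Under unit costs the one-step solo climb wins despite its 40% chance of death, whereas the negative log costs prefer the longer but safer route.</p>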
<h1>Summary</h1>
<p>Finding the optimal goal trajectory in a probabilistic planning problem is computationally expensive, so most planners use some heuristics. One way to plan in a stochastic environment is to change the probabilistic planning problem into a deterministic shortest path problem and replan after (almost) every step, which is computationally efficient, but in many cases suboptimal. This article outlined the attributes of probabilistically interesting problems, where the deterministic replanning approach often fails. As such, recent probabilistic planners use more complicated methods (or often a portfolio of probabilistic planners), but replanners remain a good baseline to compare against.</p>
<h1>References</h1>
<ol class="bibliography"><li><span id="Little2007-probabilistic-planning">Little, I., <span class="amp">&</span> Thiébaux, S. (2007). Probabilistic planning vs. replanning. <i>Workshop, <span class="caps">ICAPS</span> 2007</i>. Retrieved from http://users.cecs.anu.edu.au/~iain/icaps07.pdf</span></li>
<li><span id="Yoon2007-FF-replan">Yoon, S. W., Fern, A., <span class="amp">&</span> Givan, R. (2007). <span class="caps">FF</span>-Replan: A Baseline for Probabilistic Planning. In <span class="caps">M. S.</span> Boddy, M. Fox, <span class="amp">&</span> S. Thiébaux (Eds.), <i>Proceedings of the Seventeenth International Conference on Automated
Planning and Scheduling, <span class="caps">ICAPS</span> 2007, Providence, Rhode Island, <span class="caps">USA</span>,
September 22-26, 2007</i> (p. 352). <span class="caps">AAAI</span>. Retrieved from http://www.aaai.org/Library/<span class="caps">ICAPS</span>/2007/icaps07-045.php</span></li>
<li><span id="Yoon2008-probabilistic-planning">Yoon, S. W., Fern, A., Givan, R., <span class="amp">&</span> Kambhampati, S. (2008). Probabilistic Planning via Determinization in Hindsight. In D. Fox <span class="amp">&</span> <span class="caps">C. P.</span> Gomes (Eds.), <i>Proceedings of the Twenty-Third <span class="caps">AAAI</span> Conference on Artificial Intelligence,
<span class="caps">AAAI</span> 2008, Chicago, Illinois, <span class="caps">USA</span>, July 13-17, 2008</i> (pp. 1010–1016). <span class="caps">AAAI</span> Press. Retrieved from http://www.aaai.org/Library/<span class="caps">AAAI</span>/2008/aaai08-160.php</span></li></ol>
<p>Change YouTube speed from your favorites bar. Laszlo Treszkai, 2018-05-23.</p>
<p><img alt="I feel the need... the need for speed!" src="https://www.treszkai.com/2018/05/23/youtube-speed/need-for-speed.gif"></p>
<h1>Premise</h1>
<p>While the <span class="caps">UI</span> of YouTube shows only a limited set of high-speed options, it is possible to set the speed to any floating point value. Even better, one can do so from their favorites bar with bookmarklets.</p>
<p><img alt="Youtube dialog to set speed" src="https://www.treszkai.com/2018/05/23/youtube-speed/youtube.png"> ⇒ <img alt="Bookmarks to set youtube speed" src="https://www.treszkai.com/2018/05/23/youtube-speed/bookmarklets.png"></p>
<h1>Method</h1>
<p>Simply add any of the following code snippets as bookmarks.</p>
<p>If you have a fixed speed in mind, e.g. 2.5:</p>
<p><code>javascript:document.getElementsByTagName("video")[0].playbackRate=2.5;</code></p>
<p>Or save this line to show a prompt that asks for a floating-point input:</p>
<p><code>javascript:var%20speed=prompt("Speed:","1.");document.getElementsByTagName("video")[0].playbackRate=parseFloat(speed);</code></p>
<p>Which results in the following prompt:</p>
<p><img alt="A prompt that asks for speed" src="https://www.treszkai.com/2018/05/23/youtube-speed/custom.png"></p>
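<p>The prompt version passes whatever was typed straight to <code>parseFloat</code>, so a cancelled prompt or a typo yields <code>NaN</code>. A more defensive variant could look like the sketch below; the helper name <code>parseSpeed</code> is made up for this example, and to use it as a bookmarklet the code would still have to be joined into a single line prefixed with <code>javascript:</code>.</p>

```javascript
// Parse a user-supplied speed; returns null unless the input is a positive, finite number.
function parseSpeed(input) {
  const speed = parseFloat(input);
  return Number.isFinite(speed) && speed > 0 ? speed : null;
}

// In a browser, apply the speed to the first video element, if there is one.
if (typeof document !== "undefined") {
  const video = document.querySelector("video");
  const speed = parseSpeed(prompt("Speed:", "1."));
  if (video && speed !== null) {
    video.playbackRate = speed;
  }
}
```

<p>Keeping the parsing in a separate function means invalid input simply leaves the current speed untouched.</p>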
<h1>Caveats</h1>
<p>Works with YouTube and Vimeo.</p>
<p>The speed display in the video player will keep showing the last value selected there, not the one set by the bookmarklet.</p>
<h1>References</h1>
<ul>
<li><span class="caps">GIF</span>: <a href="https://www.youtube.com/watch?v=fR2hajcuFEM">Top Gun</a></li>
<li>Script: <a href="https://www.quora.com/Is-there-a-way-of-watching-YouTube-videos-at-higher-than-2x-speed/answer/John-Vuong-12">Quora answer of John Vuong</a></li>
</ul>
<p>Some versatile tools for bash. Laszlo Treszkai, 2018-05-16.</p>
<p>A 7-line bash script that includes 90% of what an average user needs.</p>
<p>I rarely use bash besides the basics: I could use a <code>for</code> loop even if woken up at night, but my knowledge of the language doesn’t go much further. Hence instead of trying to memorize all the <code>{}%$</code> magic, having a few versatile commands in my toolbox comes in handy.</p>
<p>Recently I faced the task of renaming a set of files {<code>foo 02.jpg</code>, …, <code>foo 74.jpg</code>} to {<code>foo 06.jpg</code>, …, <code>foo 78.jpg</code>}, while keeping the order. My approach contained nothing extraordinary:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="k">for</span> i in <span class="sb">`</span>seq <span class="m">74</span> <span class="m">-1</span> <span class="m">2</span><span class="sb">`</span>
<span class="k">do</span>
<span class="nb">printf</span> -v oldname <span class="s2">"foo %02d.jpg"</span> <span class="nv">$i</span>
<span class="nb">printf</span> -v newname <span class="s2">"foo %02d.jpg"</span> <span class="k">$(</span><span class="nb">echo</span> <span class="s2">"</span><span class="nv">$i</span><span class="s2">+4"</span> <span class="p">|</span> bc<span class="k">)</span>
mv <span class="s2">"</span><span class="nv">$oldname</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$newname</span><span class="s2">"</span>
<span class="k">done</span>
</code></pre></div>
<p>Yet there were some educational points in it:</p>
<ul>
<li>One loops a variable <code>x</code> over the whitespace-separated words of a list <code>values</code> by <code>for x in values; do something; something_else; done</code>.</li>
<li><code>seq a b</code> prints out the integers from <code>a</code> to <code>b</code>, inclusive. The <span class="caps">BSD</span>/macOS version counts down when <code>a</code> is the larger one, but <span class="caps">GNU</span> <code>seq</code> prints nothing in that case unless it is given a negative increment, as in <code>seq a -1 b</code>.</li>
<li>Variable <code>x</code> is assigned a value by <code>x=foobar</code>, where there must be <em>no spaces around the equals sign</em>. The value of <code>x</code> can then be referred to by <code>$x</code>.</li>
<li>Renaming a set of files to a similar name but later in the alphabet must be done in reverse order.</li>
<li>Bash has a built-in <code>printf</code> that seems to work as in C: first the string to be printed with format specifiers like <code>%02d</code>, followed by the arguments whose values are used according to the format specifiers.</li>
<li>With the <code>-v</code> option of <code>printf</code>, you can save the output into a variable.</li>
<li>One can use <code>$( )</code> to execute a command and have bash substitute its output into the command line (this is called command substitution). It’s the same as using backticks, as around the <code>seq</code> call, but allows nesting and is clearer. Unlike <code>eval</code> in other languages, like JavaScript, the output is not parsed as code again; when unquoted, it is only split into words. It even works in quotation marks, e.g. <code>"$(echo hey yo)"</code> is like writing <code>"hey yo"</code>. Note that trailing newlines are deleted.</li>
<li><code>bc</code> is a calculator that reads from the input and outputs nothing but the result on a single line.</li>
<li>Don’t forget the quotes around arguments with spaces, like with <code>mv</code> above.</li>
</ul>
<p>One minute of further bash tips is provided by Julia Evans <a href="https://drawings.jvns.ca/bashtips/">[here]</a>.</p>
<p>Some proofs in first-order logic. Laszlo Treszkai, 2018-02-27.</p>
<p>This page lists some interesting problems in mathematical logic that I solved during my studies.</p>
<p>I had the fortune to study classical logic from <a href="http://www.renyi.hu/~csirmaz/">László Csirmaz</a> at the Eötvös Loránd University, Budapest. Although I was not officially enrolled in the course, he was kind enough to mark my weekly homework regardless of my lack of student status. These were originally written in Hungarian, and I translated a few of them into English.</p>
<h1>A non-standard model of Robinson arithmetic</h1>
<p><em>Give a model that fulfills every axiom of Robinson arithmetic, and which contains two elements that are neither greater than or equal to, nor smaller than or equal to one another; or prove that such a model doesn’t exist.</em></p>
<p><a href="https://www.treszkai.com/2018/02/27/logic-courseworks/2017-03-logic-cw4ex4.pdf">Solution (<span class="caps">PDF</span>)</a>.</p>
<h1>A two-formula version of the diagonal lemma</h1>
<p><em>Let <script type="math/tex">\Gamma</script> be a theory which can represent every recursive function. Prove that for every pair of formulae <script type="math/tex">\Phi(x)</script> and <script type="math/tex">\Psi(x)</script> with one free variable, there exist closed formulae <script type="math/tex">\eta</script> and <script type="math/tex">\theta</script> such that <script type="math/tex">\Gamma \proves \eta \,\leftrightarrow\, \Phi(\Godel{\theta})</script> and <script type="math/tex">\Gamma \proves \theta \,\leftrightarrow\, \Psi(\Godel{\eta})</script>.</em></p>
<p><a href="https://www.treszkai.com/2018/02/27/logic-courseworks/2017-05-logic-cw9ex1.pdf">Solution (<span class="caps">PDF</span>)</a>.</p>
<h1>Final steps of the proof of Gödel’s completeness theorem</h1>
<p>When Gödel’s completeness theorem was proved during the lectures, I found a crucial step missing from the proof, so I <a href="https://www.treszkai.com/2018/02/27/logic-courseworks/2017-07-logic-henkin.pdf">proved it myself</a>.</p>