Forecasting Newsletter: June 2020.

Highlights

Facebook launches Forecast, a community for crowdsourced predictions.
Foretell, a forecasting tournament by the Center for Security and Emerging Technology, is now open.
A Preliminary Look at Metaculus and Expert Forecasts: Metaculus forecasters do better.

Index

Highlights.
In the News.
Prediction Markets & Forecasting Platforms.
Negative Examples.
Hard to Categorize.
Long Content.

In the News.

Facebook releases a forecasting app (link to the app, press release, TechCrunch take, hot-takes). The release comes before Augur v2 launches, and it is easy to speculate that it might end up being combined with Facebook’s stablecoin, Libra.
The Economist has a new electoral model out (article, model) which gives Trump an 11% chance of winning reelection. Given that Andrew Gelman was involved, I’m hesitant to criticize it, but it seems a tad overconfident. See here for Gelman addressing objections similar to my own.
COVID-19 vaccine before US election. Analysts see White House pushing through vaccine approval to bolster Trump’s chances of reelection before voters head to polls. “All the datapoints we’ve collected make me think we’re going to get a vaccine prior to the election,” Jared Holz, a health-care strategist with Jefferies, said in a phone interview. The current administration is “incredibly incentivized to approve at least one of these vaccines before Nov. 3.”
“Israeli Central Bank Forecasting Gets Real During Pandemic”. Israeli Central Bank is using data to which it has real-time access, like credit-card spending, instead of lagging indicators.
Google produces wind schedules for wind farms. “The result has been a 20 percent increase in revenue for wind farms”. See here for essentially the same thing on solar forecasting.
Survey of macroeconomic researchers predicts economic recovery will take years, reports 538.

Prediction Markets & Forecasting Platforms.

In subjective order of importance:

Foretell, a forecasting tournament by the Center for Security and Emerging Technology, is now open. I find heartening the thought that this might end up influencing bona-fide politicians.
Metaculus
- posted A Preliminary Look at Metaculus and Expert Forecasts: Metaculus forecasters do better, and the piece is a nice reference point.
- was featured in Forbes.
- announced their Metaculus Summer Academy: “an introduction to forecasting for those who are relatively new to the activity and are looking for a fresh intellectual pursuit this summer”
Replication Markets might add a new round with social and behavioral science claims related to COVID-19, and a preprint market, which would ask participants to forecast items like publication or citation. Replication Markets is also asking for more participants, with the catchline “If they are knowledgeable and opinionated, Replication Markets is the place to be to make your opinions really count.”
Good Judgement family
- Good Judgement Open: Superforecasters were able to detect that Russia and the USA would in fact undertake some (albeit limited) form of negotiation, and do so much earlier than the general public, even while posting their reasons in full view.
- Good Judgement Analytics continues to provide its COVID-19 dashboard.
PredictIt & Election Betting Odds. I stumbled upon an old 538 piece on fake polls: Fake Polls are a Real Problem. Some polls may have been conducted by PredictIt traders in order to mislead or troll other PredictIt traders; all in all, an amusing example of how prediction markets could encourage worse information.
An online prediction market with reputation points, implementing an idea by Paul Christiano. As of yet slow to load.
Augur:
- An overview of the platform and of v2 modifications.
- Augur also happens to have a blog with some interesting tidbits, such as the extremely clickbaity How One Trader Turned $400 into $400k with Political Futures (“I find high volume markets…like the Democratic Nominee market or the 2020 Presidential Winner market… and what I’m doing is I’m just getting in line at the ‘buy’ price and waiting my turn until my orders get filled. Then when those orders get filled I just sell them for 1c more.”)
Coronavirus Information Markets is down to ca. $12000 in trading volume; it seems like they didn’t take off.

Negative Examples.

World powers to converge on strategies for presenting COVID-19 information to make forecasters' jobs more interesting:
- Brazil stops releasing COVID-19 death toll and wipes data from official site.
- Meanwhile, in Russia, St Petersburg issues 1,552 more death certificates in May than last year, but Covid-19 toll was 171.
- In the US, CDC wants states to count ‘probable’ coronavirus cases and deaths, but most aren’t doing it
- India has the fourth-highest number of COVID-19 cases, but the Government denies community transmission
  - One suspects that this denial is political, because India is otherwise being extremely competent in weather forecasting.
Youyang Gu’s model, widely acclaimed as one of the best coronavirus models for the US, produces 95% confidence intervals which seem too narrow when extended to Pakistan.
Some discussion on Twitter: “Only a fool would put a probability on whether the EU and the UK will agree a trade deal”, says Financial Times correspondent, and other examples.

Hard to Categorize.

A Personal COVID-19 Postmortem, by FHI researcher David Manheim.

I think it’s important to clearly and publicly admit when we were wrong. It’s even better to diagnose why, and take steps to prevent doing so again. COVID-19 is far from over, but given my early stance on a number of questions regarding COVID-19, this is my attempt at a public personal review to see where I was wrong.
FantasyScotus beat GoodJudgementOpen on legal decisions. I’m still waiting to see whether Hollywood Stock Exchange will also beat GJOpen on film predictions.
How does pandemic forecasting resemble the early days of weather forecasting, and what lessons can the USA learn from the latter about the former? An example would be to create an organization akin to the National Weather Center, but for forecasting.
Linch Zhang, a COVID-19 forecaster with an excellent track-record, is doing an Ask Me Anything, starting on Sunday the 7th; questions are welcome!
The Rules To Being A Sellside Economist. A fun read.
1. How to get attention: If you want to get famous for making big non-consensus calls, without the danger of looking like a muppet, you should adopt ‘the 40% rule’. Basically you can forecast whatever you want with a probability of 40%. Greece to quit the euro? Maybe! Trump to fire Powell and hire his daughter as the new Fed chair? Never say never! 40% means the odds will be greater than anyone else is saying, which is why your clients need to listen to your warning, but also that they shouldn’t be too surprised if, you know, the extreme event doesn’t actually happen.
How to improve space weather forecasting (see here for the original paper):

For instance, the National Oceanic and Atmospheric Administration’s Deep Space Climate Observatory (DSCOVR) satellite sits at the location in space called L1, where the gravitational pulls of Earth and the Sun cancel out. At this point, which is roughly 1.5 million kilometers from Earth, or barely 1% of the way to the Sun, detectors can provide warnings with only short lead times: about 30 minutes before a storm hits Earth in most cases or as little as 17 minutes in advance of extremely fast solar storms.
Coup cast: A site that estimates the yearly probability of a coup. The color coding is misleading; click on the countries instead.
Prediction = Compression. “Whenever you have a prediction algorithm, you can also get a correspondingly good compression algorithm for data you already have, and vice versa.”
- Other LessWrong posts which caught my attention were Betting with Mandatory Post-Mortem and Radical Probabilism
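To make the prediction/compression link concrete, here is a minimal sketch. It assumes an ideal entropy coder, which spends about -log2(p) bits on a symbol the model predicted with probability p. The toy text, the two models and the function name are hypothetical illustrations, not taken from the post.

```python
import math
from collections import Counter

def ideal_code_length_bits(text, predict):
    """Total bits an ideal entropy coder would spend on `text`.

    `predict(symbol)` returns the probability the model assigns to `symbol`.
    """
    return sum(-math.log2(predict(symbol)) for symbol in text)

text = "aaaaaaab"  # toy data: seven 'a's and one 'b'

# Model 1: knows nothing, treats 'a' and 'b' as equally likely.
def uniform(symbol):
    return 0.5

# Model 2: uses the empirical symbol frequencies of data we already have.
counts = Counter(text)
def frequency(symbol):
    return counts[symbol] / len(text)

uniform_bits = ideal_code_length_bits(text, uniform)
frequency_bits = ideal_code_length_bits(text, frequency)
print(uniform_bits, frequency_bits)
```

The better predictor needs fewer bits for the same text, which is the sense in which a good prediction algorithm is also a good compression algorithm.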
Box Office Pro looks at some factors around box-office forecasting.

Long Content.

When the crowds aren’t wise; a sober overview, with judicious use of Condorcet’s jury theorem

Suppose that each individual in a group is more likely to be wrong than right because relatively few people in the group have access to accurate information. In that case, the likelihood that the group’s majority will decide correctly falls toward zero as the size of the group increases.

Some prediction markets fail for just this reason. They have done really badly in predicting President Bush’s appointments to the Supreme Court, for example. Until roughly two hours before the official announcement, the markets were essentially ignorant of the existence of John Roberts, now the chief justice of the United States. At the close of a prominent market just one day before his nomination, “shares” in Judge Roberts were trading at $0.19—representing an estimate that Roberts had a 1.9% chance of being nominated.

Why was the crowd so unwise? Because it had little accurate information to go on; these investors, even en masse, knew almost nothing about the internal deliberations in the Bush administration. For similar reasons, prediction markets were quite wrong in forecasting that weapons of mass destruction would be found in Iraq and that special prosecutor Patrick Fitzgerald would indict Deputy Chief of Staff Karl Rove in late 2005.
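The jury-theorem claim in the first quoted paragraph can be checked numerically. This is a minimal sketch assuming independent voters who are each right with the same probability; the group sizes and the 45%/55% accuracies are illustrative choices, not figures from the article.

```python
from math import comb

def majority_correct_probability(n_voters, p_correct):
    """Probability that a strict majority of n independent voters is right."""
    smallest_majority = n_voters // 2 + 1
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(smallest_majority, n_voters + 1)
    )

# Odd group sizes avoid ties.
for n in (1, 11, 101, 501):
    slightly_wrong = majority_correct_probability(n, 0.45)
    slightly_right = majority_correct_probability(n, 0.55)
    print(n, round(slightly_wrong, 4), round(slightly_right, 4))
```

With individuals who are right 45% of the time, the majority's accuracy shrinks as the group grows; with 55%, it climbs towards certainty.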
A review of Tetlock’s ‘Superforecasting’ (2015), by Dominic Cummings. Cummings then went on to hire one such superforecaster, who then resigned over a culture war scandal, characterized by adversarial selection of quotes which indeed are outside the British Overton Window. Notably, Dominic Cummings then told reporters to “Read Philip Tetlock’s Superforecasters, instead of political pundits who don’t know what they’re talking about.”
Assessing the Performance of Real-Time Epidemic Forecasts: A Case Study of Ebola in the Western Area Region of Sierra Leone, 2014-15. The one caveat is that their data is much better than coronavirus data, because Ebola symptoms are more evident; otherwise, pretty interesting:

Real-time forecasts based on mathematical models can inform critical decision-making during infectious disease outbreaks. Yet, epidemic forecasts are rarely evaluated during or after the event, and there is little guidance on the best metrics for assessment.

…good probabilistic calibration was achievable at short time horizons of one or two weeks ahead but model predictions were increasingly unreliable at longer forecasting horizons.

This suggests that forecasts may have been of good enough quality to inform decision making based on predictions a few weeks ahead of time but not longer, reflecting the high level of uncertainty in the processes driving the trajectory of the epidemic.

Comparing different versions of our model to simpler models, we further found that it would have been possible to determine the model that was most reliable at making forecasts from early on in the epidemic. This suggests that there is value in assessing forecasts, and that it should be possible to improve forecasts by checking how good they are during an ongoing epidemic.

One forecast that gained particular attention during the epidemic was published in the summer of 2014, projecting that by early 2015 there might be 1.4 million cases. This number was based on unmitigated growth in the absence of further intervention and proved a gross overestimate, yet it was later highlighted as a “call to arms” that served to trigger the international response that helped avoid the worst-case scenario.

Methods to assess probabilistic forecasts are now being used in other fields, but are not commonly applied in infectious disease epidemiology

The deterministic SEIR model we used as a null model performed poorly on all forecasting scores, and failed to capture the downturn of the epidemic in Western Area.

On the other hand, a well-calibrated mechanistic model that accounts for all relevant dynamic factors and external influences could, in principle, have been used to predict the behaviour of the epidemic reliably and precisely. Yet, lack of detailed data on transmission routes and risk factors precluded the parameterisation of such a model and are likely to do so again in future epidemics in resource-poor settings.
In the selection of quotes above, we gave an example of a forecast which ended up overestimating the incidence, yet might have “served as a call to arms”. It’s maybe a real-life example of a forecast changing the true result, leading to a fixed point problem, like the ones hypothesized in the parable of the Predict-O-Matic.
- It would be a fixed point problem if [forecast above the alarm threshold] → epidemic being contained, but [forecast below the alarm threshold] → epidemic not being contained.
- Maybe the fixed-point solution, i.e., the most self-fulfilling (and thus, accurate) forecast, would have been a forecast on the edge of the alarm threshold, which would have ended up leading to mediocre containment.
- The troll polls created by PredictIt traders are perhaps a more clear-cut example of Predict-O-Matic problems.
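A toy version of the fixed point problem above, as a sketch only. It assumes a made-up smooth response curve in which the realized risk of an uncontained epidemic drops sharply once the published forecast passes an alarm threshold. The curve, the threshold of 0.3 and the steepness are hypothetical and come from neither the paper nor the Predict-O-Matic parable.

```python
import math

def realized_risk(forecast, alarm_threshold=0.3, steepness=20.0):
    """Hypothetical response curve: the higher the published forecast,
    the stronger the containment effort and the lower the realized risk."""
    return 1.0 / (1.0 + math.exp(steepness * (forecast - alarm_threshold)))

def self_consistent_forecast(response, tolerance=1e-9):
    """Bisect for the forecast f with response(f) == f.

    Works here because response(f) - f is strictly decreasing on [0, 1].
    """
    low, high = 0.0, 1.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if response(mid) > mid:
            low = mid   # forecast too low: reality would exceed it
        else:
            high = mid  # forecast too high: reality would undershoot it
    return (low + high) / 2

fixed_point = self_consistent_forecast(realized_risk)
print(fixed_point, realized_risk(fixed_point))
```

Under these assumptions the only self-consistent forecast sits just above the alarm threshold, matching the intuition that the most accurate forecast is one on the edge of triggering a response.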
Calibration Scoring Rules for Practical Prediction Training. I found it most interesting when considering how Brier and log rules didn’t have all the pedagogic desiderata.
- I also found the following derivation of the logarithmic scoring rule interesting. Consider: if you assign probabilities p1, p2, …, pn to n independent events which then happen, the combined probability of these events is p1 x p2 x p3 x … x pn. Taking logarithms, this is log(p1 x p2 x p3 x … x pn) = Σ log(pi), i.e., the logarithmic scoring rule.
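A quick numerical check of that derivation, as a minimal sketch: the four probabilities are made-up values standing for what a forecaster assigned to outcomes which then happened.

```python
import math

# Hypothetical probabilities assigned to the outcomes that actually occurred.
probabilities_assigned_to_outcomes = [0.9, 0.7, 0.6, 0.2]

# Probability of the whole sequence, treating the events as independent.
joint_probability = math.prod(probabilities_assigned_to_outcomes)

# Logarithmic score: the sum of the logs of the individual probabilities.
log_score = sum(math.log(p) for p in probabilities_assigned_to_outcomes)

print(math.log(joint_probability), log_score)
```

Both numbers agree up to floating-point error, so maximizing the summed log score is the same as maximizing the probability one assigned to everything that happened.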
Binary Scoring Rules that Incentivize Precision. The results (the closed-form of scoring rules which minimize a given forecasting error) are interesting, but the journey to get there is kind of a drag, and ultimately the logarithmic scoring rule ends up being pretty decent according to their measure of error.
- Opinion: I’m not sure whether their results are going to be useful for things I’m interested in (like human forecasting tournaments, rather than Kaggle data analysis competitions). In practice, what I might do if I wanted to incentivize precision is to ask myself if this is a question where the answer is going to be closer to 50%, or closer to either 0% or 100%, and then use either the Brier or the logarithmic scoring rule. That is, I don’t want to minimize an l-norm of the error over [0,1], I want to minimize an l-norm over the region I think the answer is going to be in, and the paper falls short of addressing that.
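To illustrate why the region matters when picking between the two rules, here is a sketch which compares the expected penalty for the same four-percentage-point miss near 50% and near 100%. It is an illustration under simple assumptions (binary questions, a known true probability, lower scores are better), not the paper's error measure; the function names are hypothetical.

```python
import math

def expected_brier(true_p, reported_q):
    """Expected Brier penalty (reported - outcome)^2 for a binary event."""
    return true_p * (1 - reported_q) ** 2 + (1 - true_p) * reported_q**2

def expected_log_loss(true_p, reported_q):
    """Expected negative log score for a binary event."""
    return -(true_p * math.log(reported_q)
             + (1 - true_p) * math.log(1 - reported_q))

def regret(expected_loss, true_p, reported_q):
    """Extra expected penalty from reporting q instead of the true p."""
    return expected_loss(true_p, reported_q) - expected_loss(true_p, true_p)

for true_p, reported_q in [(0.50, 0.54), (0.95, 0.99)]:
    print(true_p, reported_q,
          regret(expected_brier, true_p, reported_q),
          regret(expected_log_loss, true_p, reported_q))
```

Under the Brier rule the same-sized miss costs the same in both regions, whereas the logarithmic rule punishes it far more heavily near the extremes.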
How Innovation Works—A Review. The following quote stood out for me:

Ridley points out that there have always been opponents of innovation. Such people often have an interest in maintaining the status quo but justify their objections with reference to the precautionary principle.
A list of prediction markets, and their fates, maintained by Jacob Lagerros. Like most startups, most prediction markets fail.

Note to the future: All links are added automatically to the Internet Archive. In case of link rot, go here.

“I beseech you, in the bowels of Christ, think it possible that you may be mistaken.” Oliver Cromwell