Measure is unceasing

RSS Feed, all content

Auftragstaktik (2023/11/30)

Auftragstaktik is a method of command and delegation where the commander gives subordinates a clearly defined objective, the high-level details, and the tools needed to accomplish that objective. The subordinates then have wide operational freedom, which frees command to focus on strategic decisions.

It has an interesting historical and semi-mythical background. After Napoleon outclassed the rest of Europe, the Prussians realized that they needed to up their game, and developed this methodology. With it, Germany became a military superpower. These days, though, armies have given up on having independent and semi-insubordinate regular troops, and instead the Auftragstaktik stance seems to be reserved for special forces, e.g., the Navy SEALs.

But beyond the semi-historical overview from the last paragraph, the idea of Auftragstaktik is useful to me as an ideal to aspire to implement. It is my preferred method of command, and my preferred method of being commanded. It stands in contrast to micromanaging. It avoids alienation as characterized by Marx, where the worker doesn’t have control of their own actions, which kills the soul. Corny as it sounds, if you have competent subordinates, why not give them wide latitude to act as special forces rather than as corporate drones?

A while ago, I cleaned up the Wikipedia page on this concept a bit, and now I am writing this post so that the idea becomes more widely known across my circles. We need more people who can carry A Message to Garcia, but independence benefits from a system of incentivization and control that enables it.

Ꙭ ...

Hurdles of using forecasting as a tool for making sense of AI progress (2023/11/07)

Introduction

In recent years, there have been various attempts at using forecasting to discern the shape of the future development of artificial intelligence, like the AI progress Metaculus tournament, the Forecasting Research Institute’s existential risk forecasting tournament/experiment, Samotsvety’s forecasts on the topic of AI progress and dangers, or various questions on INFER about short-term technological progress.

Here is a list of reasons, written with early input from Misha Yagudin, why using forecasting to make sense of AI developments can be tricky, as well as some casual suggestions of ways forward.

Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions

Ꙭ ...

Brief thoughts on CEA’s stewardship of the EA Forum (2023/10/15)

Epistemic status: This post is blunt. Please see the extended disclaimer about negative feedback here. Consider not reading it if you work on the EA forum and don’t have thick skin.

tl;dr: Once, the EA forum was a lean, mean machine. But it has become more bloated over time, and I don’t like it. Separately, I don’t think it’s worth the roughly $2M/year1 it costs, although I haven’t modelled this in depth.

The EA forum frontpage through time.

In 2018-2019, the EA forum was a lean and mean machine:

Ꙭ ...

Count words in <50 lines of C (2023/09/15)

The Unix utility wc counts words. You can write a simple, non-POSIX-compatible version of it that solely counts words in 159 words and 42 lines of C. Or you can be like GNU and take 3615 words and 1034 lines to do something more complex.
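Here is a minimal sketch of the core idea (not the post’s exact code, but the same approach in miniature): read characters one at a time and count transitions from whitespace into a word.

```
#include <ctype.h>
#include <stdio.h>

// Count words on stdin, where a word is a maximal run of non-whitespace.
int main(void)
{
    int c, in_word = 0, words = 0;
    while ((c = getchar()) != EOF) {
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1; // a new word begins here
            words++;
        }
    }
    printf("%d\n", words);
    return 0;
}
```

Compile with, e.g., cc wc.c -o wc, and feed it text on stdin: ./wc < file.txt.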

Desiderata

Ꙭ ...

Quick thoughts on Manifund’s application to Open Philanthropy (2023/09/05)

Manifund is a new effort to improve, speed up and decentralize funding mechanisms in the broader Effective Altruism community, by some of the same people previously responsible for Manifold. Due to Manifold’s policy of making a bunch of their internal documents public, you can see their application to Open Philanthropy here (also a markdown backup here).

Here is my perspective on this:

Ꙭ ...

Incorporate keeping track of accuracy into X (previously Twitter) (2023/08/19)

tl;dr: Incorporate keeping track of accuracy into X1. This contributes to the goal of making X the chief source of information, and strengthens humanity by providing better epistemic incentives and better mechanisms to separate the wheat from the chaff in terms of getting at the truth together.

Why do this?

St Michael Killing the Dragon - public domain, via Wikimedia Commons

Ꙭ ...

Webpages I am making available to my corner of the internet (2023/08/14)

Here is a list of internet services that I make freely available to friends and allies, broadly defined—if you are reading this, you qualify. They are listed roughly in order of usefulness.

search.nunosempere.com

search.nunosempere.com is an instance of Whoogle. It presents Google results as they were and as they should have been: without clutter and without advertisements.

Readers are welcome to make this their default search engine. The process to do this is a bit involved and depends on the browser, but can be found with a Whoogle search. In past years, I’ve had technical difficulties around once every six months, but tend to fix them quickly.

Ꙭ ...

squiggle.c (2023/08/01)

squiggle.c is a self-contained C99 library that provides functions for simple Monte Carlo estimation, based on Squiggle. Below is a copy of the project’s README; the original, always-up-to-date version can be found here.
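To give a flavor of this kind of estimation, here is a hedged sketch of a tiny Monte Carlo model in self-contained C99. It does not use squiggle.c’s actual API (see the README below for that); the sample_to name merely mirrors Squiggle’s “low to high” syntax, and the constants are illustrative.

```
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N_SAMPLES 1000000

// Uniform sample in [0, 1). rand() is a poor generator, but fine for a sketch.
static double sample_unit(void)
{
    return (double)rand() / ((double)RAND_MAX + 1);
}

// Lognormal whose 90% confidence interval is approximately [low, high],
// in the spirit of Squiggle's `low to high`. Uses the Box-Muller transform.
static double sample_to(double low, double high)
{
    double mu = (log(low) + log(high)) / 2;
    double sigma = (log(high) - log(low)) / (2 * 1.6449); // 1.6449 ~ z for a 90% CI
    double u1 = sample_unit(), u2 = sample_unit();
    double z = sqrt(-2 * log(1 - u1)) * cos(2 * M_PI * u2);
    return exp(mu + sigma * z);
}

int main(void)
{
    double sum = 0;
    for (int i = 0; i < N_SAMPLES; i++)
        sum += sample_to(1, 10); // e.g., some quantity between 1 and 10
    printf("mean: %f\n", sum / N_SAMPLES);
    return 0;
}
```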

Why C?

Ꙭ ...

Why are we not harder, better, faster, stronger? (2023/07/19)

Cover for Harder, Better, Faster, Stronger, by Daft Punk

In The American Empire has Alzheimer’s, we saw how the US had repeatedly been rebuffing forecasting-style feedback loops that could have prevented its military and policy failures. In A Critical Review of Open Philanthropy’s Bet On Criminal Justice Reform, we saw how Open Philanthropy, a large foundation, spent an additional $100M on a cause they no longer thought was optimal. In A Modest Proposal For Animal Charity Evaluators (ACE) (unpublished), we saw how ACE had moved away from quantitative evaluations, reducing their ability to find out which animal charities were best. In External Evaluation of the Effective Altruism Wiki, we saw someone spending his time less than maximally ambitiously. In My experience with a Potemkin Effective Altruism group (unpublished), we saw how an otherwise well-intentioned group of decent people mostly just kept chugging along, producing a negligible impact on the world. As for my own personal failures, I have just come out of spending the last couple of years making a bet on ambitious value estimation that flopped in comparison to what it could have been. I could go on.

Those and all other failures could have been avoided if only those involved had just been harder, better, faster, stronger. I like the word “formidable” as a shorthand here.

Ꙭ ...

Some melancholy about the value of my work depending on decisions by others beyond my control (2023/07/13)

For the last few years, while I was employed at the Quantified Uncertainty Research Institute, a focus of my work has been on estimating impact, and on doing so in a more hardcore way, and for more speculative domains, than the Effective Altruism community was previously doing. Alas, the FTX Future Fund, which was using some of our tools, no longer exists. Open Philanthropy was another foundation which might have found value in our work, but they don’t seem to have much excitement and appetite for the “estimate everything” line of work that I was doing. So, in plain words, my work seems much less valuable than it could have been [1].

Melancholy, a painting by Munch

Part of my mistake [2] here was to do work whose value depended on decisions by others beyond my control. And then, given that I was doing that, not making sure that those decisions came out positive.

I have made this mistake before, which is why it stands out to me. When I dropped out of university, it was to design a randomized controlled trial for ESPR, a rationality camp which I hoped was doing some good, but where having some measure of how much good would have been useful for deciding whether to greatly scale it. I designed the randomized trial, but it wasn’t my call whether to implement it, and it wasn’t implemented. Pathetically, some students were indeed randomized, but without gathering any pre-post data. Interestingly, ESPR and similar programs, like ATLAS, did scale up, so having tracked some data could have been decision-relevant.

Ꙭ ...

Betting and consent (2023/06/26)

There is an interesting thing around consent and betting:

Ꙭ ...

People’s choices determine a partial ordering over people’s desirability (2023/06/17)

Consider the following relationship:

Ꙭ ...

Relative values for animal suffering and ACE Top Charities (2023/05/29)

tl;dr: I present relative estimates for animal suffering and 2022 top Animal Charity Evaluators (ACE) charities. I am doing this to showcase a new tool from the Quantified Uncertainty Research Institute (QURI) and to present an alternative to ACE’s current rubric-based approach.

Introduction and goals

At QURI, we’re experimenting with using relative values to estimate the worth of various items and interventions. Instead of basing value on a specific unit, we ask how valuable each item in a list is, compared to each other item. You can see an overview of this approach here.

In this context, I thought it would be meaningful to estimate some items in animal welfare and suffering. I estimated the value of a few animal quality-adjusted life-years—fish, chicken, pigs and cows—relative to each other. Then, using those, I estimated the value of the top and standout charities as chosen by ACE (Animal Charity Evaluators) in 2022.

Ꙭ ...

Updating in the face of anthropic effects is possible (2023/05/11)

Status: Simple point worth writing up clearly.

Motivating example

You are a dinosaur astronomer about to encounter a sequence of big and small meteorites. If you see a big meteorite, you and your whole kin die. So far you have seen n small meteorites. What is your best guess as to the probability that you will next see a big meteorite?

Artist’s rendition of a giant meteorite hitting the Earth

In this example, there is an anthropic effect going on. Your attempt to estimate the frequency of big meteorites is made difficult by the fact that when you see a big meteorite, you immediately die. Or, in other words, no matter what the frequency of big meteorites is, conditional on you still being alive, you’d expect to only have seen small meteorites so far. For instance, if you had reason to believe that around 90% of meteorites are big, you’d still expect to only have seen small meteorites so far.
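To formalize that last point (the notation here is mine, added for clarity): let \(p\) be the frequency of big meteorites, and suppose meteorites are independent. Then

\[ P(\text{you have seen only small meteorites} \mid \text{you are alive}, p) = 1 \]

for every value of \(p\), while the probability of having survived at all is

\[ P(\text{you are alive after } n \text{ meteorites} \mid p) = (1 - p)^n \]

So, conditional on survival, the string of small meteorites you have observed doesn’t, by itself, distinguish between different values of \(p\).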

Ꙭ ...

Review of Epoch’s Scaling transformative autoregressive models (2023/04/28)

We want to forecast the arrival of human-level AI systems. This is a complicated task, and previous attempts have been kind of mediocre. So this paper proposes a new approach.

The approach has some key assumptions, and then it needs some auxiliary hypotheses and concrete estimates to flesh out those key assumptions. Its key assumptions are:

Ꙭ ...

A flaw in a simple version of worldview diversification (2023/04/25)

Summary

I consider a simple version of “worldview diversification”: allocating a set amount of money per cause area per year. I explain in probably too much detail how that setup leads to inconsistent relative values from year to year and from cause area to cause area. This implies that there might be Pareto improvements, i.e., moves that you could make that will result in strictly better outcomes. However, identifying those Pareto improvements wouldn’t be trivial, and would probably require more investment into estimation and cross-area comparison capabilities.1

More elaborate versions of worldview diversification can probably fix this flaw, for example by instituting trading between the different worldviews—though that trading does ultimately have to happen. However, I view those solutions as hacks, and I suspect that the problem I outline in this post is indicative of deeper problems with the overall approach of worldview diversification.

This post could have been part of a larger review of EA (Effective Altruism) in general and Open Philanthropy in particular. I sent a grant request to the EA Infrastructure Fund on that topic, but alas, it doesn’t seem to be materializing, so that review is probably not happening.

Ꙭ ...

A Soothing Frontend for the Effective Altruism Forum (2023/04/18)

About

forum.nunosempere.com is a frontend for the Effective Altruism Forum. It aims to present EA Forum posts in a way which I personally find soothing. It achieves that goal at the cost of pretty restricted functionality—like not having a frontpage, or not being able to make or upvote comments and posts.

Usage

Instead of having a frontpage, this frontend merely has an endpoint:

Ꙭ ...

General discussion thread (2023/04/08)

Do you want to bring up something to me or to the kinds of people who are likely to read this post? Or do you want to just say hi? This is the post to do it.

Why am I doing this?

Well, the EA Forum was my preferred forum for discussion for a long time. But in recent times it has become more censorious. Specifically, it has a moderation policy that I don’t like: moderators have banned people I like, like sapphire or Sabs, who sometimes say interesting things. Recently, they banned someone for making an April Fools’ post on the EA Forum that the moderators found distasteful—whereas I would have made the call that poking fun at sacred cows on April Fools’ is fair game.

So overall, it feels like the EA Forum has become bigger, and like it cares less about my values. Specifically, moderators are much more willing than I am to trade off the pursuit of truth in exchange for having fewer rough edges. A shame, though perhaps necessary to turtle down against actors seeking to harm one.

Ꙭ ...

Things you should buy, quantified (2023/04/06)

I’ve written a notebook using reusable Squiggle components to estimate the value of a few consumer products. You can find it here.

Ꙭ ...

What is forecasting? (2023/04/03)

Saul Munn asks:

I haven’t been able to find many really good, accessible essays/posts/pages that explain clearly & concisely what forecasting is for ppl who’ve never heard of it before. Does anyone know of any good, basic, accessible intro to forecasting pages? Thank you!

(something i can link to when someone asks me “what’s forecasting???”)

In general, forecasting refers to the act of making predictions about future events. Generally these predictions are numerical—“a 25% chance that Trump will be president in 2025”—and they are generally made with the objective of improving one’s models of the world. It’s easy to pretend to have models, or to have models that don’t really help you navigate the world. And at its best, forecasting helps you to acquire and create better models of the world, by discarding the hypotheses that don’t end up predicting the future and polishing those that do. Other threads that also point to this are “rationality”, “good judgment”, “good epistemics”, or “Bayesian statistics”.

Ꙭ ...

Soothing software (2023/03/27)

I have this concept in my mind of “soothing software”: a cluster of software which is just right, which is competently made, which contains no surprises, and which is a joy to use. Here are a few examples:

pass: “the standard unix password manager”

pass is a simple password manager based on the Unix philosophy. It saves passwords in a git repository, encrypted with gpg. To slightly tweak the functionality of its native commands (pass show and pass insert), I usually use two extensions, pass reveal and pass append.

lf

Ꙭ ...

Some estimation work on the horizon (2023/03/20)

This post outlines some work in altruistic estimation that seems currently doable. Some of it might be pursued by, for example, my team at the Quantified Uncertainty Research Institute. But together this work adds up to more than what our small team can achieve.

Two downsides of this post are that a) it looks at things that are more salient to me, and doesn’t comprehensively review all estimation work being done, and b) it could use more examples.

Saruman in Isengard looking at an army of orcs

Ꙭ ...

Find a beta distribution that fits your desired confidence interval (2023/03/15)

Here is a tool for finding a beta distribution that fits your desired confidence interval. E.g., to find a beta distribution whose 95% confidence interval is 0.2 to 0.8, input 0.2, 0.8, and 0.95 in their respective fields below:
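For readers who prefer something offline, here is a rough sketch in C of the same idea (my own toy implementation, not the tool’s actual code): numerically compute the quantiles of Beta(a, b), and grid-search the two shape parameters until the desired interval matches.

```
#include <math.h>
#include <stdio.h>

#define GRID 2000 // resolution of the numerical CDF

// Compute the [(1-ci)/2, (1+ci)/2] quantiles of Beta(a, b) by integrating
// the unnormalized density x^(a-1) * (1-x)^(b-1) with the midpoint rule.
static void beta_ci(double a, double b, double ci, double *lo, double *hi)
{
    static double cdf[GRID + 1];
    cdf[0] = 0;
    for (int i = 1; i <= GRID; i++) {
        double x = (i - 0.5) / GRID;
        cdf[i] = cdf[i - 1] + pow(x, a - 1) * pow(1 - x, b - 1);
    }
    double p_lo = (1 - ci) / 2, p_hi = (1 + ci) / 2;
    int found_lo = 0;
    *lo = 0; *hi = 1;
    for (int i = 1; i <= GRID; i++) {
        double p = cdf[i] / cdf[GRID];
        if (!found_lo && p >= p_lo) { *lo = (double)i / GRID; found_lo = 1; }
        if (p >= p_hi) { *hi = (double)i / GRID; break; }
    }
}

int main(void)
{
    double target_lo = 0.2, target_hi = 0.8, ci = 0.95;
    double best_a = 1, best_b = 1, best_err = 1e18;
    // Brute-force multiplicative grid search over the two shape parameters;
    // slow and coarse, but simple.
    for (double a = 0.1; a <= 300; a *= 1.1) {
        for (double b = 0.1; b <= 300; b *= 1.1) {
            double lo, hi;
            beta_ci(a, b, ci, &lo, &hi);
            double err = pow(lo - target_lo, 2) + pow(hi - target_hi, 2);
            if (err < best_err) { best_err = err; best_a = a; best_b = b; }
        }
    }
    double lo, hi;
    beta_ci(best_a, best_b, ci, &lo, &hi);
    printf("beta(%.2f, %.2f): %.0f%% CI ~ [%.3f, %.3f] (target [%.2f, %.2f])\n",
           best_a, best_b, 100 * ci, lo, hi, target_lo, target_hi);
    return 0;
}
```

For the symmetric [0.2, 0.8] example above, this lands near Beta(5, 5), as one would expect from the distribution’s mean of 0.5 and standard deviation of about 0.15.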

Ꙭ ...

Estimation for sanity checks (2023/03/10)

I feel very warmly about using relatively quick estimates to carry out sanity checks, i.e., to quickly check whether something is clearly off, whether some decision is clearly overdetermined, or whether someone is just bullshitting. This is in contrast to Fermi estimates, which aim to arrive at an estimate for a quantity of interest, and which I also feel warmly about but which aren’t the subject of this post. In this post, I explain why I like quantitative sanity checks so much, and I give some examples.

Why I like this so much

I like this so much because:

  • It is very defensible. There are some cached arguments against more quantified estimation, but sanity checking cuts through most—if not all—of them. “Oh, well, I just think that estimation has some really nice benefits in terms of sanity checking and catching bullshit, and in particular in terms of defending against scope insensitivity. And I think we are not even at the point where we are deploying enough estimation to catch all the mistakes that would be obvious in hindsight after we did some estimation” is both something I believe and also just a really nice motte to retreat to when I am tired, don’t feel like defending a more ambitious estimation agenda, or don’t want to alienate someone socially by having an argument.

Ꙭ ...

What happens in Aaron Sorkin’s The Newsroom (2023/03/10)

WILL MACAVOY is an aging news anchor who, together with his capable but amoral executive producer, DON KEEFER, is creating a news show that optimizes for viewership, sacrificing newsworthiness and journalistic honour in the process. Unsatisfied with this, his boss, CHARLIE SKINNER, hires MACAVOY’s idealistic yet supremely capable ex-girlfriend, MACKENZIE MCHALE, to be the new executive producer. She was recently wounded in Afghanistan and is physically and mentally exhausted, but SKINNER is able to see past that, trust his own judgment, and make a bet on her.

Over the course of three seasons, MACKENZIE MCHALE imprints her idealistic and principled journalistic style on an inexperienced news team, whom she mentors and cultivates. She also infects MACAVOY and DON KEEFER, who, given the chance, also choose to report newsworthy events over populist gossip. All the while, CHARLIE SKINNER insulates that budding team from pressure from the head honchos to optimize for views and to not antagonize powerful political figures, like the Koch brothers. His power isn’t infinite, but it is ENOUGH to let the new team, despite trials and tribulations, flourish.

Towards the end of the series, the work of the underlings ends up convincing the head honchos, LEONA and REESE LANSING, that having news reporting that is not crap is something that they, too, desire, and that they are willing to sacrifice some profits to nourish. This becomes relevant when the parent company faces a hostile takeover, and the LANSINGS have to make a conscious choice to exert their efforts to preserve their news division, which they have come to cherish as a valuable public good.

Ꙭ ...

Winners of the Squiggle Experimentation and 80,000 Hours Quantification Challenges (2023/03/08)

In the second half of 2022, we at QURI announced the Squiggle Experimentation Challenge and a $5k challenge to quantify the impact of 80,000 Hours’ top career paths. For the first contest, we got three long entries. For the second, we got five, but most were fairly short. This post presents the winners.

Squiggle Experimentation Challenge Objectives

From the announcement post:

Ꙭ ...

Use of “I’d bet” on the EA Forum is mostly metaphorical (2023/03/02)

Epistemic status: much ado about nothing.

tl;dr: I look at people saying “I’d bet” on the EA Forum. I find that they mostly mean this metaphorically. I suggest reserving the word “bet” for actual bets, offer to act as a judge for the next 10 bets that people ask me to judge, and mention that I’ll be keeping an eye on people who offer bets on the EA Forum to consider taking them. Usage of the construction “I’d bet” is a strong signal of belief only if it is occasionally tested, and I suggest we make it so.

Inspired by this Manifold market created by Alex Lawsen—which hypothesized that I have an automated system to detect when people offer bets—and by this exchange—where someone said “I would bet all the money I have (literally not figuratively) that X” and then didn’t accept one such bet—I wrote a small script[^1] to search for instances of the word “bet” on the EA Forum:


Ꙭ ...

A computable version of Solomonoff induction (2023/03/01)

Thinking about Just-in-time Bayesianism a bit more, here is a computable approximation to Solomonoff Induction, which converges to the Turing machine generating your trail of bits in finite time.

The key idea: arrive at the correct hypothesis in finite time
  1. Start with a finite set of Turing machines, \(\{T_0, ..., T_n\}\)
  2. If none of the \(T_i\) predict your trail of bits, \((B_0, ..., B_m)\), compute the first \(m\) steps of Turing machine \(T_{n+1}\). If \(T_{n+1}\) doesn’t predict them either, go to \(T_{n+2}\), and so on[^1]
  3. Observe the next bit, purge the machines from your set which don’t predict it. If none predict it, go to 2.
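Here is a toy sketch of the purge step in C, with a few hard-coded predictor functions standing in for an actual enumeration of Turing machines (the names and the bit trail are mine, purely for illustration):

```
#include <stdio.h>

#define N_MACHINES 3

// Toy stand-ins for Turing machines: each "machine" predicts the bit at
// position i. A real implementation would enumerate actual machines.
static int predict_zeros(int i) { (void)i; return 0; }
static int predict_ones(int i) { (void)i; return 1; }
static int predict_alternating(int i) { return i % 2; }

typedef int (*machine)(int);

int main(void)
{
    machine machines[N_MACHINES] = { predict_zeros, predict_ones, predict_alternating };
    int alive[N_MACHINES] = { 1, 1, 1 };
    int trail[] = { 0, 1, 0, 1, 0, 1 }; // the observed bits
    int n_bits = sizeof(trail) / sizeof(trail[0]);

    // Step 3: purge the machines which don't predict the observed bits.
    // In the full scheme, if none survive, one would keep computing
    // machines T_{n+1}, T_{n+2}, ... until one fits (step 2).
    for (int i = 0; i < n_bits; i++)
        for (int m = 0; m < N_MACHINES; m++)
            if (alive[m] && machines[m](i) != trail[i]) {
                alive[m] = 0;
                printf("machine %d purged at bit %d\n", m, i);
            }

    for (int m = 0; m < N_MACHINES; m++)
        if (alive[m])
            printf("machine %d remains consistent with the trail\n", m);
    return 0;
}
```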

Ꙭ ...

A Bayesian Adjustment to Rethink Priorities' Welfare Range Estimates (2023/02/19)

I was meditating on Rethink Priorities’ Welfare Range Estimates:

Something didn’t feel right. Suddenly, an apparition of E. T. Jaynes manifested itself, and exclaimed:

The way was clear. I should:

Ꙭ ...

Inflation-proof assets (2023/02/11)

Can you have an asset whose value isn’t subject to inflation? I review a few examples, and ultimately conclude that the answer is probably no. I’ve been thinking about this in the context of prediction markets—where a stable asset would be useful—and in the context of my own financial strategy, which I want to be robust. But these thoughts are fairly unsophisticated, so comments, corrections and expansions are welcome.

| Asset | Resists inflation? | Upsides | Downsides |
|---|---|---|---|
| Government currencies | No | Easy to use in the day-to-day | At 3% inflation, value halves every 25 years |
| Cryptocurrencies | A bit | Not completely correlated with currencies | Depends on continued community interest; more volatile; hard to interface with the mainstream financial system; normally not private |
| Stock market | Mediumly | Easy to interface with the mainstream financial system; somewhat resistant to inflation | Nominal increases in value are taxed (!); not resistant to civilizational catastrophe; past returns don’t guarantee future returns, and American growth may be slowing down |
Ꙭ ...

Straightforwardly eliciting probabilities from GPT-3 (2023/02/09)

I explain two straightforward strategies for eliciting probabilities from language models, in particular from GPT-3, provide code, and give my thoughts on what I would do if I were being more hardcore about this.

Straightforward strategies

Look at the probability of yes/no completion

Given a binary question, like “At the end of 2023, will Vladimir Putin be President of Russia?” you can create something like the following text for the model to complete:

Ꙭ ...

Impact markets as a mechanism for not losing your edge (2023/02/07)

Here is a story I like about how to use impact markets to produce value:

  • You are Open Philanthropy and you think that something is not worth funding because it doesn’t meet your bar
  • You agree that if you later change your mind, and in hindsight, after the project is completed, come to think you should have funded it, you’ll buy its impact shares in n years. That is, if the project needs $X to be completed, you promise that you’ll spend $X plus some buffer buying its impact shares.
  • The market decides whether you are wrong. If the market is confident that you are wrong, it can invest in the project, make it happen, and then be paid later, once you realize you were wrong.

The reverse variant is a normal prediction market:

Ꙭ ...

More content