A Bayesian Nerd-Snipe (2024/02/25)
Consider the number of people you know who share your birthday. Multiplied by 365, this seems an unbiased estimate of the number of people whose birthday you’d know if they had been born on the same day of the year as you. That estimate is, in turn, an estimate of how many people one knows at a somewhat non-superficial level of familiarity.
I asked my Twitter followers that question, and this is what they answered:
How many people do you know that were born in the same day of the year as you?— Nuño Sempere (@NunoSempere) February 21, 2024
Now, and here comes the nerd snipe: after seeing the results of that poll, what should my posterior estimate be for the distribution of how many people my pool of followers knows enough that they’d know their birthdays if they fell on the same day as one’s own?
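As a minimal sketch of the reasoning above, and not an answer to the nerd snipe itself: if a respondent would know the birthday of N people, and birthdays are uniform over 365 days (an assumption that ignores leap years and seasonality), then the number k who share their birthday is Binomial(N, 1/365), so 365·k has expectation N. The function names, the flat prior on N, and the cap `n_max` are all illustrative assumptions, and the code handles a single hypothetical respondent rather than the whole poll.

```python
import math

DAYS = 365  # assumption: birthdays uniform over 365 days, leap years ignored

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials), computed in log space."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def expected_estimate(n_known):
    """Expected value of the estimator 365 * k when the true number is n_known."""
    p = 1 / DAYS
    return sum(DAYS * k * binomial_pmf(k, n_known, p) for k in range(n_known + 1))

def posterior_over_n(k, n_max):
    """Posterior over N for one respondent reporting k, flat prior on 0..n_max."""
    p = 1 / DAYS
    weights = [binomial_pmf(k, n, p) for n in range(n_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical respondent who knows exactly one person sharing their birthday.
posterior = posterior_over_n(k=1, n_max=5000)
mode = max(range(len(posterior)), key=posterior.__getitem__)
mean = sum(n * w for n, w in enumerate(posterior))
print(expected_estimate(500), mode, mean)
```

The posterior mean under a flat prior need not coincide with 365·k: unbiasedness is a statement about averaging over k for a fixed N, while the posterior depends on the prior chosen for N.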
→ Ꙭ ...
Alternative Visions of Effective Altruism (2023/12/27)
Intro
EA was born out of a very specific cultural milieu, and holds a very specific yet historically contingent shape. So what if instead EA had been…
Effective Altruism Distributed
Day-trading the crypto to liberate the animals:
→ Ꙭ ...
AI safety forecasting questions (2023/12/06)
tl;dr: This document contains a list of forecasting questions, commissioned by Open Philanthropy as part of its aim to have more accurate models of future AI progress. Many of these questions are more classic forecasting questions, others have the same shape but are unresolvable, still others look more like research projects or like suggestions of data gathering efforts. Below we give some recommendations of what to do with this list, mainly to feed them into forecasting and research pipelines. In a separate document, we outline reasons why using forecasting for discerning the future of AI may prove particularly difficult.
Table of Contents
- Recurring terms
→ Ꙭ ...
Auftragstaktik is a method of command and delegation where the commander gives subordinates a clearly-defined objective, high-level details, and the tools needed to accomplish their objective. The subordinates have clear operational freedom, which leaves command to focus on architecting strategic decisions.
It has an interesting historical and semi-mythical background. After Napoleon outclassed the rest of Europe, the Prussians realized that they needed to up their game, and developed this methodology. With it, Germany became a military superpower. These days, though, armies have given up on having independent and semi-insubordinate general troops, and instead the Auftragstaktik stance seems to be reserved for special forces, such as the Navy SEALs.
But beyond the semi-historical overview from the last paragraph, the idea of Auftragstaktik is useful to me as an ideal to aspire to implement. It is my preferred method of command, and my preferred method of being commanded. It stands in contrast to micromanaging. It avoids alienation as characterized by Marx, where the worker doesn’t have control of their own actions, which kills the soul. Corny as it sounds, if you have competent subordinates, why not give them a wide berth to act as special forces rather than as corporate drones?
A while ago, I cleaned up the Wikipedia page on this concept a bit, and now I am writing this post so that it becomes more widely known across my circles. We need more people to carry A Message to Garcia, but independence benefits from a system of incentivization and control that enables it.
→ Ꙭ ...
Hurdles of using forecasting as a tool for making sense of AI progress (2023/11/07)
Introduction
In recent years there have been various attempts at using forecasting to discern the shape of the future development of artificial intelligence, like the AI progress Metaculus tournament, the Forecasting Research Institute’s existential risk forecasting tournament/experiment, Samotsvety forecasts on the topic of AI progress and dangers, or various questions on INFER on short-term technological progress.
Here is a list of reasons, written with early input from Misha Yagudin, on why using forecasting to make sense of AI developments can be tricky, as well as some casual suggestions of ways forward.
Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions
→ Ꙭ ...
Brief thoughts on CEA’s stewardship of the EA Forum (2023/10/15)
Epistemic status: This post is blunt. Please see the extended disclaimer about negative feedback here. Consider not reading it if you work on the EA forum and don’t have thick skin.
tl;dr: Once, the EA forum was a lean, mean machine. But it has become more bloated over time, and I don’t like it. Separately, I don’t think it’s worth the roughly $2M/year it costs, although I haven’t modelled this in depth.
The EA forum frontpage through time.
In 2018-2019, the EA forum was a lean and mean machine:
→ Ꙭ ...
Count words in <50 lines of C (2023/09/15)
The Unix utility wc counts words. You can make a simple, non-POSIX-compatible version of it that solely counts words, in 159 words and 42 lines of C. Or you can be like GNU and take 3615 words and 1034 lines to do something more complex.
Desiderata
- Simple: Just count words as delimited by spaces, tabs, newlines.
- Allow: reading files, piping to the utility, and reading from stdin—concluded by pressing Ctrl+D.
- Separate utilities for counting different things, like lines and characters, into their own tools.
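The first desideratum can be sketched as a two-state scanner: read one character at a time and count a word each time the input moves from a delimiter into a non-delimiter. The sketch below is written in Python for brevity, so it is not the post's C implementation; `count_words` is a hypothetical name, and treating only space, tab, and newline as delimiters follows the list above.

```python
import io

DELIMITERS = " \t\n"  # per the desiderata: spaces, tabs, newlines

def count_words(stream):
    """Count words in a text stream, one character at a time."""
    words = 0
    in_word = False
    while True:
        ch = stream.read(1)
        if ch == "":  # end of input (Ctrl+D on an interactive stdin)
            break
        if ch in DELIMITERS:
            in_word = False
        elif not in_word:
            # Transition from delimiter to word: a new word starts here.
            in_word = True
            words += 1
    return words

# Works on any text stream: an opened file, sys.stdin, or an in-memory buffer.
print(count_words(io.StringIO("one two\tthree\nfour")))
```

Because the function only needs an object with `read`, the same code would cover reading a file, a pipe, or interactive input, which is the second desideratum.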
→ Ꙭ ...
Quick thoughts on Manifund’s application to Open Philanthropy (2023/09/05)
Manifund is a new effort to improve, speed up and decentralize funding mechanisms in the broader Effective Altruism community, by some of the same people previously responsible for Manifold. Due to Manifold’s policy of making a bunch of their internal documents public, you can see their application to Open Philanthropy here (also a markdown backup here).
Here is my perspective on this:
- They have given me a $50k regranting budget. It seems plausible that this colors my thinking.
- Manifold is highly technologically competent.
- Effective Altruism Funds, which could be the closest point of comparison to Manifund, is not highly technologically competent. In particular, they have been historically tied to Salesforce, a den of mediocrity that slows speed, makes interacting with their systems annoying, and isn’t that great across any one dimension.
→ Ꙭ ...
Incorporate keeping track of accuracy into X (previously Twitter) (2023/08/19)
tl;dr: Incorporate keeping track of accuracy into X. This contributes to the goal of making X the chief source of information, and strengthens humanity by providing better epistemic incentives and better mechanisms to separate the wheat from the chaff in terms of getting at the truth together.
Why do this?
- Because it can be done
→ Ꙭ ...
Webpages I am making available to my corner of the internet (2023/08/14)
Here is a list of internet services that I make freely available to friends and allies, broadly defined: if you are reading this, you qualify. These are ordered roughly in order of usefulness.
search.nunosempere.com
search.nunosempere.com is an instance of Whoogle. It presents Google results as they were and as they should have been: without clutter and without advertisements.
Readers are welcome to make this their default search engine. The process to do this is a bit involved and depends on the browser, but can be found with a Whoogle search. In past years, I’ve had technical difficulties around once every six months, but tend to fix them quickly.
→ Ꙭ ...
squiggle.c is a self-contained C99 library that provides functions for simple Monte Carlo estimation, based on Squiggle. Below is a copy of the project’s README, the original, always-up-to-date version of which can be found here.
Why C?
- Because it is fast
- Because I enjoy it
- Because C is honest
→ Ꙭ ...
Why are we not harder, better, faster, stronger? (2023/07/19)
In The American Empire has Alzheimer’s, we saw how the US had repeatedly been rebuffing forecasting-style feedback loops that could have prevented their military and policy failures. In A Critical Review of Open Philanthropy’s Bet On Criminal Justice Reform, we saw how Open Philanthropy, a large foundation, spent an additional $100M in a cause they no longer thought was optimal. In A Modest Proposal For Animal Charity Evaluators (ACE) (unpublished), we saw how ACE had moved away from quantitative evaluations, reducing their ability to find out which animal charities were best. In External Evaluation of the Effective Altruism Wiki, we saw someone spending his time less than maximally ambitiously. In My experience with a Potemkin Effective Altruism group (unpublished), we saw how an otherwise well-intentioned group of decent people mostly just kept chugging along producing a negligible impact on the world. As for my own personal failures, I have just come out of spending the last couple of years making a bet on ambitious value estimation that flopped in comparison to what it could have been. I could go on.
Those and all other failures could have been avoided if only those involved had just been harder, better, faster, stronger. I like the word “formidable” as a shorthand here.
→ Ꙭ ...
Some melancholy about the value of my work depending on decisions by others beyond my control (2023/07/13)
For the last few years, while I was employed at the Quantified Uncertainty Research Institute, a focus of my work has been on estimating impact, and on doing so in a more hardcore way, and for more speculative domains, than the Effective Altruism community was previously doing. Alas, the FTX Future Fund, which was using some of our tools, no longer exists. Open Philanthropy was another foundation which might have found value in our work, but they don’t seem to have much excitement and appetite for the “estimate everything” line of work that I was doing. So in plain words, my work seems much less valuable than it could have been.
Part of my mistake here was to do work whose value depended on decisions by others beyond my control. And then, given that I was doing that, not making sure those decisions came back positive.
I have made this mistake before, which is why it stands out to me. When I dropped out of university, it was to design a randomized controlled trial for ESPR, a rationality camp which I hoped was doing some good, but where having some measure of how much would have helped decide whether to greatly scale it. I designed the randomized trial, but it wasn’t my call to decide whether to implement it, and it wasn’t implemented. Pathetically, some students were indeed randomized, but without gathering any pre-post data. Interestingly, ESPR and similar programs, like ATLAS, did scale up, so having tracked some data could have been decision relevant.
→ Ꙭ ...
Betting and consent (2023/06/26)
There is an interesting thing around consent and betting:
- Clueless people can’t give fully informed consent around taking some bets I offer,
- because if they were fully informed, they wouldn’t make the bet, because they would know that I’m a few levels above them in terms of calibration and forecast accuracy.
- But on the other hand, they can only acquire the “fully informed” state about their own bullshitting after losing the bet,
- because once you lose money it is much harder to spin up rationalizations.
→ Ꙭ ...
People’s choices determine a partial ordering over people’s desirability (2023/06/17)
Consider the following relationship:
→ Ꙭ ...
Relative values for animal suffering and ACE Top Charities (2023/05/29)
tl;dr: I present relative estimates for animal suffering and 2022 top Animal Charity Evaluators (ACE) charities. I am doing this to showcase a new tool from the Quantified Uncertainty Research Institute (QURI) and to present an alternative to ACE’s current rubric-based approach.
Introduction and goals
At QURI, we’re experimenting with using relative values to estimate the worth of various items and interventions. Instead of basing value on a specific unit, we ask how valuable each item in a list is, compared to each other item. You can see an overview of this approach here.
In this context, I thought it would be meaningful to estimate some items in animal welfare and suffering. I estimated the value of a few animal quality-adjusted life-years (fish, chicken, pigs and cows) relative to each other. Then, using those, I estimated the value of top and standout charities as chosen by ACE (Animal Charity Evaluators) in 2022.
→ Ꙭ ...
Updating in the face of anthropic effects is possible (2023/05/11)
Status: Simple point worth writing up clearly.
Motivating example
You are a dinosaur astronomer about to encounter a sequence of big and small meteorites. If you see a big meteorite, you and your whole kin die. So far you have seen n small meteorites. What is your best guess as to the probability that you will next see a big meteorite?
In this example, there is an anthropic effect going on. Your attempt to estimate the frequency of big meteorites is made difficult by the fact that when you see a big meteorite, you immediately die. Or, in other words, no matter what the frequency of big meteorites is, conditional on you still being alive, you’d expect to only have seen small meteorites so far. For instance, if you had reason to believe that around 90% of meteorites are big, you’d still expect to only have seen small meteorites so far.
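The selection effect described above can be sketched with a small simulation: whatever the true frequency of big meteorites, every observer who is still alive after n meteorites has a record containing only small ones, and only the number of survivors changes. The function names, the seed, and the choices of n = 3 and 100,000 simulated worlds are illustrative assumptions, not taken from the post.

```python
import random

def simulate_observer(p_big, n, rng):
    """Return the meteorites one observer sees, stopping at the first big one."""
    seen = []
    for _ in range(n):
        size = "big" if rng.random() < p_big else "small"
        seen.append(size)
        if size == "big":
            break  # the observer and their whole kin die here
    return seen

def survivor_records(p_big, n, worlds, seed=0):
    """Simulate many independent worlds and keep the records of living observers."""
    rng = random.Random(seed)
    histories = [simulate_observer(p_big, n, rng) for _ in range(worlds)]
    return [h for h in histories if "big" not in h]

for p_big in (0.1, 0.5, 0.9):
    survivors = survivor_records(p_big, n=3, worlds=100_000)
    bigs_seen = sum(h.count("big") for h in survivors)
    # bigs_seen is zero for every p_big; only len(survivors) differs.
    print(p_big, len(survivors), bigs_seen)
```

The survivors' records look identical across all three frequencies, which is why the record alone, conditioned on being alive, says nothing about the frequency of big meteorites.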
→ Ꙭ ...
Review of Epoch’s Scaling transformative autoregressive models (2023/04/28)
We want to forecast the arrival of human-level AI systems. This is a complicated task, and previous attempts have been kind of mediocre. So this paper proposes a new approach.
The approach has some key assumptions. And then it needs some auxiliary hypotheses and concrete estimates to flesh out those key assumptions. Its key assumptions are:
- That a sufficient condition for reaching human-level performance might be indistinguishability: if you can’t determine whether a git repository was produced by an expert human programmer or by an AI, this should be a sufficient (though not necessary) demonstration for the AI to have acquired the capability of programming.
- That models' performance will continue growing as predicted by current scaling laws.
→ Ꙭ ...
A flaw in a simple version of worldview diversification (2023/04/25)
Summary
I consider a simple version of “worldview diversification”: allocating a set amount of money per cause area per year. I explain in probably too much detail how that setup leads to inconsistent relative values from year to year and from cause area to cause area. This implies that there might be Pareto improvements, i.e., moves that you could make that will result in strictly better outcomes. However, identifying those Pareto improvements wouldn’t be trivial, and would probably require more investment into estimation and cross-area comparison capabilities.
More elaborate versions of worldview diversification are probably able to fix this flaw, for example by instituting trading between the different worldviews, though that trading does ultimately have to happen. However, I view those solutions as hacks, and I suspect that the problem I outline in this post is indicative of deeper problems with the overall approach of worldview diversification.
This post could have been part of a larger review of EA (Effective Altruism) in general and Open Philanthropy in particular. I sent a grant request to the EA Infrastructure Fund on that topic, but alas it doesn’t seem to be materializing, so that’s probably not happening.
→ Ꙭ ...
A Soothing Frontend for the Effective Altruism Forum (2023/04/18)
About
forum.nunosempere.com is a frontend for the Effective Altruism Forum. It aims to present EA Forum posts in a way which I personally find soothing. It achieves that goal at the cost of pretty restricted functionality, like not having a frontpage, or not being able to make or upvote comments and posts.
Usage
Instead of having a frontpage, this frontend merely has an endpoint:
→ Ꙭ ...
General discussion thread (2023/04/08)
Do you want to bring up something to me or to the kinds of people who are likely to read this post? Or do you want to just say hi? This is the post to do it.
Why am I doing this?
Well, the EA Forum was my preferred forum for discussion for a long time. But in recent times it has become more censorious. Specifically, it has a moderation policy that I don’t like: moderators have banned people I like, like sapphire or Sabs, who sometimes say interesting things. Recently, they banned someone for making a post they found distasteful during April Fools in the EA forum—whereas I would have made the call that poking fun at sacred cows during April Fools is fair game.
So overall it feels like the EA Forum has become bigger and like it cares less about my values. Specifically, moderators are much more willing than I am to trade off the pursuit of truth in exchange for having fewer rough edges. Shame, though perhaps necessary to turtle down against actors seeking to harm one.
→ Ꙭ ...
Things you should buy, quantified (2023/04/06)
I’ve written a notebook using reusable Squiggle components to estimate the value of a few consumer products. You can find it here.
→ Ꙭ ...
What is forecasting? (2023/04/03)
Saul Munn asks:
I haven’t been able to find many really good, accessible essays/posts/pages that explain clearly & concisely what forecasting is for ppl who’ve never heard of it before. Does anyone know of any good, basic, accessible intro to forecasting pages? Thank you!
(something i can link to when someone asks me “what’s forecasting???”)
In general, forecasting refers to the act of making predictions about future events. Generally these predictions are numerical (for example, “a 25% chance that Trump will be president in 2025”), and they are generally made with the objective of improving one’s models of the world. It’s easy to pretend to have models, or to have models that don’t really help you navigate the world. And at its best, forecasting helps you to acquire and create better models of the world, by discarding the hypotheses that don’t end up predicting the future and polishing those that do. Other threads that also point to this are “rationality”, “good judgment”, “good epistemics”, or “Bayesian statistics”.
→ Ꙭ ...
Soothing software (2023/03/27)
I have this concept in my mind of “soothing software”, a cluster of software which is just right, which is competently made, which contains no surprises, which is a joy to use. Here are a few examples:
pass: “the standard unix password manager”
pass is a simple password manager based on the Unix philosophy. It saves passwords in a git repository, encrypted with gpg. To slightly tweak the functionality of its native commands (pass show and pass insert), I usually use two extensions, pass reveal, and pass append.
lf
→ Ꙭ ...
Some estimation work in the horizon (2023/03/20)
This post outlines some work in altruistic estimation that seems currently doable. Some of it might be pursued by, for example, my team at the Quantified Uncertainty Research Institute. But together this work adds up to more than what our small team can achieve.
Two downsides of this post are that a) it looks at things that are more salient to me, and doesn’t comprehensively review all estimation work being done, and b) it could use more examples.
Saruman in Isengard looking at an army of orcs
→ Ꙭ ...
Find a beta distribution that fits your desired confidence interval (2023/03/15)
Here is a tool for finding a beta distribution that fits your desired confidence interval. E.g., to find a beta distribution whose 95% confidence interval is 0.2 to 0.8, input 0.2, 0.8, and 0.95 in their respective fields below:
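For readers who want to see the idea in code, here is a minimal sketch of one way such a fit could be done. It is not the tool's actual method. It assumes an equal-tailed interval (half of the leftover probability in each tail), restricts both parameters to the range 1 to 500, approximates the beta CDF with a midpoint rule, and assumes the nested searches are monotone; `beta_cdf`, `find_root`, and `fit_beta` are hypothetical names.

```python
import math

def beta_cdf(x, a, b, steps=1000):
    """Approximate the Beta(a, b) CDF at 0 < x < 1 with the midpoint rule."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(log_norm + (a - 1) * math.log(t)
                          + (b - 1) * math.log(1 - t))
    return total * h

def find_root(f, lo, hi, iterations=30):
    """Bisection for a function assumed to increase from negative to positive."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def fit_beta(lower, upper, confidence, p_min=1.0, p_max=500.0):
    """Search for (a, b) whose equal-tailed interval is [lower, upper]."""
    tail = (1 - confidence) / 2

    def b_for(a):
        # For fixed a, raising b shifts mass left, so CDF(lower) grows with b.
        return find_root(lambda b: beta_cdf(lower, a, b) - tail, p_min, p_max)

    # Outer search over a, with b chosen to pin the lower tail (assumed monotone).
    a = find_root(lambda a: beta_cdf(upper, a, b_for(a)) - (1 - tail),
                  p_min, p_max)
    return a, b_for(a)

# The example from the text: a 95% interval from 0.2 to 0.8.
a, b = fit_beta(0.2, 0.8, 0.95)
print(a, b, beta_cdf(0.8, a, b) - beta_cdf(0.2, a, b))
```

Intervals that need a parameter below 1 or above 500 fall outside this sketch's search range, so its output should be checked by recomputing the interval mass, as the last line does.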