Are flimsy evaluations worth it?
I recently received a bit of grief over a brief evaluation of the impact of the top 10 billionaires. It seems possible that this topic is worth discussing. In what follows I outline a few non-exhaustive considerations, as well as a few questions of interest.
Value of flimsy evaluations
Right now, I see the value of flimsy evaluations or estimations as coming from:
1. Value of experimentation
There are many things we don’t have estimates or evaluations for. Trying different evaluation methods and topics can be informative about which are more valuable. Individual flimsy evaluations can serve as a proof of concept that can be built upon if the preliminary version appears valuable, and as testing grounds for new evaluation methodologies.
2. Flimsy evaluations can be better than no evaluation at all
When estimating a probability or a quantity, sometimes a quick BOTEC (back of the envelope calculation) or a Fermi estimate might be worth having despite its imprecision, because there isn’t time or it isn’t worth the effort to conjure a more complex estimate.
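For concreteness, here is a minimal sketch of what such a BOTEC might look like; every number is invented purely for illustration and doesn't correspond to any real estimate:

```python
# A toy BOTEC: "roughly how much money might a donor community move per year?"
# All inputs below are made-up placeholder numbers.
donors = 10_000              # assumed number of active donors
median_donation = 2_000      # assumed median yearly donation, in dollars

total_moved = donors * median_donation
print(f"Rough estimate: ${total_moved:,} per year")  # -> Rough estimate: $20,000,000 per year
```

The point is not the specific numbers, but that a transparent two-line calculation is often available when a polished estimate is not.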
For evaluations, oftentimes the tradeoff isn’t between a flimsy evaluation and a more accurate in-depth evaluation, but rather between a flimsy evaluation and no evaluation at all.
In particular, I don’t think that the case of a ranking of billionaires was that important, but the case of evaluations of EA organizations is. For example, a longstanding annual evaluation of AI safety organizations by Larks is not happening partly because it would be too expensive to produce. In that case we are getting no evaluation rather than a flimsier evaluation.
3. Less sure: the world is complicated enough that epistemics is, for now, a community effort
I consider myself a reasonably knowledgeable individual, but I still regularly read things in the EA Forum and elsewhere that surprise me. Similarly, when forecasting, one usually gets a better result when combining different individual perspectives.
Adjacently, Cunningham’s law states that:
the best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer
So it doesn’t seem crazy that, for a given number of hours of research, a better answer can be found by posting a flimsy evaluation and relying on commenters to point out flaws that would have been hard for the author to identify on their own.
This feels true, but too adversarial for my taste. If I were relying on this, I would explicitly signpost it.
Disvalue of flimsy evaluations
1. Reduced epistemics
In a previous post, a commenter mentioned:
I think posting this was probably net negative EV but it was really funny … Your methodology looks pretty flimsy but it looks like other EAs are taking it seriously … I think the harm from posting things with flimsy methodology and get a lot of upvotes/uncritical comments is something like “lower epistemic rigour on the forum in general”, rather than this article in particular causing a great deal of harm. I think the impact of this article whether positive or negative is likely to be small.
It’s possible that effects like these, such as lowered epistemic rigour in general, are present for flimsy evaluations more broadly.
2. People and organizations are really touchy about evaluations
People and organizations tend to get a bit angsty when being evaluated. I think this is a real cost. I also think that generally, it’s a cost worth paying for communities to have better models of the world. But for very flimsy evaluations, it’s very possible that the cost is just not worth paying.
3. Evaluations having some chance of error
Evaluations have some rate of error, and it rises the flimsier they are. It’s possible that negative errors are fairly harmful, e.g., by reducing an organization’s ability to fundraise through no fault of its own.
Discussion
Some questions:
- In which contexts are flimsy evaluations worth it?
- How should one signal that an evaluation could be flimsy?
- Is there inflation of words going on? Open Philanthropy uses “shallow evaluations” for documents that can be fairly comprehensive.
- What is the expected error rate beyond which it’s not worth publishing a flimsy evaluation? 1 in 20 seems too low, 1 in 2 too high.
Personal thoughts
One likely conclusion is that flimsy evaluations can be valuable if they clearly signal how much research has gone into them and give an accurate impression of how flimsy they are.
One possible way of doing this would be to attach a prediction about the expected error rate. For instance, one could have a prediction like: “I expect that there is a 5% chance of an egregious error that switches the main conclusion, and 1 to 4 minor errors that flip secondary considerations”.
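As a rough illustration of what that prediction implies, here is a toy sketch using the hypothetical numbers from the example above (not a method proposed in the post, just the arithmetic spelled out):

```python
# Toy calculation based on the hypothetical prediction above.
p_egregious = 0.05            # assumed 5% chance of an error that flips the main conclusion
minor_range = (1, 4)          # assumed range of minor errors affecting secondary conclusions

expected_minor = sum(minor_range) / 2   # rough midpoint: about 2.5 expected minor errors
print(f"P(main conclusion flips) = {p_egregious:.0%}")
print(f"Expected minor errors    ~ {expected_minor}")
```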