Daniel Filan

A theory of how alignment research should work

Epistemic status: Maybe obvious to everyone but me, or totally wrong (this doesn’t really grapple with the challenges of working in a domain where an intelligent being might be working against you), but: