Daniel Filan

An Analytic Perspective on AI Alignment

Cross-posted to the AI Alignment Forum.

This is a perspective I have on how to do useful AI alignment research. Most perspectives I’m aware of are constructive: they have some blueprint for how to build an aligned AI system, and propose making that blueprint more concrete, making the concretisations more capable, and showing that the result does in fact produce an aligned AI system. I do not have a constructive perspective - I’m not sure how to build an aligned AI system, and don’t really have a favourite approach. Instead, I have an analytic perspective. I would like to understand AI systems that are built, and I want other people to understand them too. I hope that this understanding will act as a ‘filter’ that prevents dangerous AI systems from being deployed. The following dot points lay out the perspective.

Since the remainder of this post is written as nested dot points, some readers may prefer to read it in Workflowy.

Background beliefs

Background desiderata

Transparency

Foundations

Relation between transparency and foundations

Criticisms of the perspective