Daniel Filan

About me

I'm currently a research manager at MATS, where I chat with scholars and hopefully turn them into cool researchers - specifically, working on AI alignment, interpretability, and/or governance.

I have a blog where I write about topics of interest to me: as of the time I write this, there are posts about forecasting, math, and puzzles.

I also have a podcast about this field of research. It's called AXRP, which is short for the AI X-risk Research Podcast. You can listen to episodes on YouTube, or by searching "AXRP" in your favourite podcast app. Alternatively, you can read transcripts here.

In addition to AXRP, I have another podcast called The Filan Cabinet, where I talk to people about whatever I want. Episodes are available on YouTube, or wherever you listen to podcasts.

I'm interested in effective altruism: how we can use our limited resources to do the most good in the world. I also sometimes bet on things, for reasons described by Bryan Caplan and Immanuel Kant.

I completed my PhD in AI at UC Berkeley in 2024, where I was supervised by Stuart Russell. You can read my thesis "Structure and Representation in Neural Networks" here.

I did my undergrad at the Australian National University, studying the theory of reinforcement learning, mathematics, and theoretical physics. I did my honours year (similar to a research master's degree lasting one year) under Marcus Hutter; you can read my thesis "Resource-bounded Complexity-based Priors for Agents" here.

Papers

bibtex

Clusterability in Neural Networks. arxiv
With Stephen Casper, Shlomi Hod, Cody Wild, Andrew Critch, and Stuart Russell.
Introduces the task of dividing the neurons of a network into groups such that edges between neurons in the same group have higher weight than edges between neurons in different groups. Implements this using graph clustering, so 'clusterability' refers to the divisibility of networks. Shows that under many conditions, networks trained with pruning and/or dropout are more clusterable than they would be if their weights were randomly permuted. Also introduces a method of regularizing networks for clusterability.
Exploring Hierarchy-Aware Inverse Reinforcement Learning. arxiv
With Chris Cundy (lead author).
Presented at GoalsRL 2018, held jointly at ICML, IJCAI, and AAMAS 2018.
Advocates using hierarchical planning models of humans in inverse reinforcement learning, as more realistic for complex tasks, and shows that in one task they perform comparably to state-of-the-art models.
Modeling Agents with Probabilistic Programs. website
With Owain Evans (lead author), Andreas Stuhlmüller, and John Salvatier.
Web book.
A web book explaining how to write models of agents in the webppl probabilistic programming language. Covers topics such as "planning as inference", (PO)MDPs, inverse reinforcement learning, hyperbolic discounting, myopic planning, and multi-agent planning.
Self-Modification of Policy and Utility Function in Rational Agents. arxiv
With Tom Everitt (lead author), Mayank Daswani, and Marcus Hutter.
Presented at AGI 2016, winner of the Kurzweil prize for best paper.
Discusses agents that can modify their source code and predict the result of these modifications, and how to define them so that they don't make modifications that stop them from optimising what we originally told them to optimise.
Loss Bounds and Time Complexity for Speed Priors. jmlr
With Jan Leike and Marcus Hutter.
Presented at AISTATS 2016.
A discussion of 'speed priors', that is, priors over infinite sequences of bits that penalise complex strings, where complexity is measured by both the length of programs that produce a string and the time those programs take to run. Builds off Jürgen Schmidhuber's original paper defining his Speed Prior.