Research

AI Alignment

Shutdownable Agents through POST-Agency

Abstract:

Many fear that future artificial agents will resist shutdown. I present an idea – the POST-Agents Proposal – for ensuring that doesn’t happen. I propose that we train agents to satisfy Preferences Only Between Same-Length Trajectories (POST). I then prove that POST – together with other conditions – implies Neutrality+: the agent maximizes expected utility, ignoring the probability distribution over trajectory-lengths. I argue that Neutrality+ keeps agents shutdownable and allows them to be useful.

Towards Shutdownable Agents via Stochastic Choice (with Alexander Roman, Christos Ziakas, Leyton Ho, and Louis Thomson)

Technical AI Safety Conference, 2025.

Open-access article

Abstract:

The POST-Agents Proposal (PAP) is an idea for ensuring that advanced artificial agents never resist shutdown. A key part of the PAP is using a novel ‘Discounted Reward for Same-Length Trajectories (DReST)’ reward function to train agents to (1) pursue goals effectively conditional on each trajectory-length (be ‘USEFUL’), and (2) choose stochastically between different trajectory-lengths (be ‘NEUTRAL’ about trajectory-lengths). In this paper, we propose evaluation metrics for USEFULNESS and NEUTRALITY. We use a DReST reward function to train simple agents to navigate gridworlds, and we find that these agents learn to be USEFUL and NEUTRAL. Our results thus provide some initial evidence that DReST reward functions could train advanced agents to be USEFUL and NEUTRAL. Our theoretical work suggests that these agents would be useful and shutdownable.

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists

Philosophical Studies, 2024.

Open-access article / Draft

Abstract:

I explain and motivate the shutdown problem: the problem of designing artificial agents that (1) shut down when a shutdown button is pressed, (2) don’t try to prevent or cause the pressing of the shutdown button, and (3) otherwise pursue goals competently. I prove three theorems that make the difficulty precise. These theorems show that agents satisfying some innocuous-seeming conditions will often try to prevent or cause the pressing of the shutdown button, even in cases where it’s costly to do so. And patience trades off against shutdownability: the more patient an agent, the greater the costs that agent is willing to incur to manipulate the shutdown button. I end by noting that these theorems can guide our search for solutions.

Ethics

A Non-Identity Dilemma for Person-Affecting Views

Draft

Abstract:

Person-affecting views state that (in cases where all else is equal) we’re permitted but not required to create people who would enjoy good lives. In this paper, I present an argument against every possible variety of person-affecting view. The argument is a dilemma over trilemmas. Narrow person-affecting views imply a trilemma in a case that I call ‘Expanded Non-Identity.’ Wide person-affecting views imply a trilemma in a case that I call ‘Two-Shot Non-Identity.’ One plausible practical upshot of my argument is as follows: we individuals and our governments should be doing more to reduce the risk of human extinction this century.

A Fission Problem for Person-Affecting Views

Ergo, forthcoming.

Draft

Abstract:

On person-affecting views in population ethics, the moral import of a person’s welfare depends on that person’s temporal or modal status. These views typically imply that – all else equal – we’re never required to create extra people, or to act in ways that increase the probability of extra people coming into existence.

In this paper, I use Parfit-style fission cases to construct a dilemma for person-affecting views: either they forfeit their seeming-advantages and face fission analogues of the problems faced by their rival impersonal views, or else they turn out to be not so person-affecting after all. In light of this dilemma, the attractions of person-affecting views largely evaporate. What remains are the problems unique to them.

Critical-Set Views, Biographical Identity, and the Long Term

Australasian Journal of Philosophy, 2025.

Open-access article / Draft

Abstract:

Critical-set views avoid the Repugnant Conclusion by subtracting some constant from the welfare score of each life in a population. These views are thus sensitive to facts about biographical identity: identity between lives. In this paper, I argue that questions of biographical identity give us reason to reject critical-set views and embrace the total view. I end with a practical implication. If we shift our credences towards the total view, we should also shift our efforts towards ensuring that humanity survives for the long term.

The Procreation Asymmetry, Improvable-Life Avoidance, and Impairable-Life Acceptance

Analysis, 2023.

Open-access article / Draft

Abstract:

Many philosophers are attracted to a complaints-based theory of the procreation asymmetry, according to which creating a person with a bad life is wrong (all else equal) because that person can complain about your act, whereas declining to create a person who would have a good life is not wrong (all else equal) because that person never exists and so cannot complain about your act. In this paper, I present two problems for such theories: the problem of impairable-life acceptance and an especially acute version of the problem of improvable-life avoidance. I explain how these problems afflict two recent complaints-based theories of the procreation asymmetry, from Joe Horton and Abelard Podgorski.

Critical Levels, Critical Ranges, and Imprecise Exchange Rates in Population Axiology

Journal of Ethics and Social Philosophy, 2022.

Open-access article / Draft

Abstract:

According to critical-level views in population axiology, an extra life improves a population if and only if that life’s welfare level exceeds some fixed “critical level.” An extra life at the critical level leaves the new population equally good as the original. According to critical-range views, an extra life improves a population if and only if that life’s welfare level exceeds some fixed “critical range.” An extra life within the critical range leaves the new population incommensurable with the original.

In this paper, I sharpen some old objections to these views and offer some new ones. Critical-level views cannot avoid certain repugnant and sadistic conclusions. Critical-range views imply that lives featuring no good or bad components whatsoever can nevertheless swallow up and neutralize goodness and badness. Both classes of view imply discontinuities in implausible places.

I then offer a view that retains much of the appeal of critical-level and critical-range views while avoiding the above pitfalls. On the Imprecise Exchange Rates View, various exchange rates—between pairs of goods, between pairs of bads, and between goods and bads—are imprecise. This imprecision is the source of incommensurability between lives and between populations.

Is Global Consequentialism More Expressive Than Act Consequentialism?

Analysis, 2022.

Open-access article / Draft

Abstract:

Act consequentialism states that an act is right if and only if the expected value of its outcome is at least as great as the expected value of any other act’s outcome. Two objections to this view are as follows. The first is that act consequentialism cannot account for our normative ambivalence in cases where agents perform the right act out of bad motives. The second is that act consequentialism is silent on questions of character: questions like ‘What are the right motives to have?’ and ‘What kind of person ought I be?’. These objections have been taken to motivate a move to global consequentialism, on which acts are not the only subjects of normative assessment. Motives and decision-procedures (amongst other things) are also judged right or wrong by direct reference to their consequences. In this paper, I argue that these objections fail to motivate the move from act to global consequentialism.

The Impossibility of a Satisfactory Population Prospect Axiology (Independently of Finite Fine-Grainedness)

Philosophical Studies, 2021.

Open-access article / Draft

Abstract:

Arrhenius’s impossibility theorems purport to demonstrate that no population axiology can satisfy each of a small number of intuitively compelling adequacy conditions. However, it has recently been pointed out that each theorem depends on a dubious assumption: Finite Fine-Grainedness. This assumption states that there exists a finite sequence of slight welfare differences between any two welfare levels. Denying Finite Fine-Grainedness makes room for a lexical population axiology which satisfies all of the compelling adequacy conditions in each theorem. Therefore, Arrhenius’s theorems fail to prove that there is no satisfactory population axiology.

In this paper, I argue that Arrhenius’s theorems can be repurposed. Since all of our population-affecting actions have a non-zero probability of bringing about more than one distinct population, it is population prospect axiologies that are of practical relevance, and amended versions of Arrhenius’s theorems demonstrate that there is no satisfactory population prospect axiology. These impossibility theorems do not depend on Finite Fine-Grainedness, so lexical views do not escape them.

A Dilemma for Lexical and Archimedean Views in Population Axiology

Economics and Philosophy, 2021.

Article / Draft

Abstract:

According to lexical views in population axiology, there are good lives x and y such that some number of lives equally good as x is not worse than any number of lives equally good as y. Such views can avoid the Repugnant Conclusion without violating Transitivity or Separability, but they imply a dilemma: either some good life is better than any number of slightly worse lives, or else the ‘at least as good as’ relation on populations is radically incomplete, in a sense to be explained. One might judge that the Repugnant Conclusion is preferable to each of these horns and hence embrace an Archimedean view. This is, roughly, the claim that quantity can always substitute for quality: each population is worse than a population of enough good lives. However, Archimedean views face an analogous dilemma: either some good life is better than any number of slightly worse lives, or else the ‘at least as good as’ relation on populations is radically and symmetrically incomplete, in a sense to be explained. Therefore, the lexical dilemma gives us little reason to prefer Archimedean views. Even if we give up on lexicality, problems of the same kind remain.

Papers in Population Ethics

DPhil (PhD) Thesis, 2023.

Link

Abstract:

This thesis consists of a series of papers in population ethics: a subfield of normative ethics concerned with the distinctive issues that arise in cases where our actions can affect the identities or number of people who ever exist. Each paper can be read independently of the others. In Chapter 1, I present a dilemma for Archimedean views: roughly, those views on which adding enough good lives to a population can make that population better than any other. In Chapter 2, I extend Gustaf Arrhenius’s famous impossibility theorems in population axiology into the domain of choices under risk. My risky impossibility theorems dispense with the assumption that welfare levels are finitely fine-grained, and so tell against lexical views in population axiology. In Chapter 3, I present objections to critical-level and critical-range views in population axiology. I then sketch out what I call the ‘Imprecise Exchange Rates View’ and argue that it is an attractive alternative. In Chapter 4, I address critical-level and critical-range views again. This time, I note that they are vulnerable to objections from biographical identity: identity between lives. I suggest that these objections give us reason to reject critical-level and critical-range views and embrace the Total View. In Chapter 5, I argue that objections of the same form – objections from personal identity – tell against person-affecting views in population ethics. In Chapter 6, I draw out some counterintuitive implications of two recent complaints-based theories of the procreation asymmetry.

Policy

How Much Should Governments Pay to Prevent Catastrophes? Longtermism's Limited Role (with Carl Shulman)

Essays on Longtermism, Oxford University Press, forthcoming.

Draft

Abstract:

Longtermists have argued that humanity should significantly increase its efforts to prevent catastrophes like nuclear wars, pandemics, and AI disasters. But one prominent longtermist argument overshoots this conclusion: the argument also implies that humanity should reduce the risk of existential catastrophe even at extreme cost to the present generation. This overshoot means that democratic governments cannot use the longtermist argument to guide their catastrophe policy. In this paper, we show that the case for preventing catastrophe does not depend on longtermism. Standard cost-benefit analysis implies that governments should spend much more on reducing catastrophic risk. We argue that a government catastrophe policy guided by cost-benefit analysis should be the goal of longtermists in the political sphere. This policy would be democratically acceptable, and it would reduce existential risk by almost as much as a strong longtermist policy.
