# [Closed] The ALTER AI Alignment Prize

Make a substantial contribution to the learning-theoretic AI alignment research agenda.

ALTER's Vanessa Kosoy is offering a prize for the best substantial contribution to the learning-theoretic AI alignment research agenda.

**Deadline: October 1, 2023.**

Depending on the quality of submissions, the winner(s) may be offered a position as a researcher in ALTER (similar to this one), to continue work on the agenda, if they so desire.

**Topics** The research topics eligible for the prize are:

- Studying the mathematical properties of the algorithmic information-theoretic definition of intelligence.
- Building and analyzing formal models of value learning based on the above.
- Pursuing any of the future research directions listed in the article on infra-Bayesian physicalism.
- Studying infra-Bayesian logic in general, and its applications to infra-Bayesian reinforcement learning in particular.
- Theoretical study of the behavior of RL agents in population games. In particular, understand to what extent infra-Bayesianism helps to avoid the grain-of-truth problem.
- Studying the conjectures relating superrationality to thermodynamic Nash equilibria.
- Studying the theoretical properties of the infra-Bayesian Turing reinforcement learning setting.
- Developing a theory of reinforcement learning with traps, i.e. irreversible state transitions. Possible research directions include studying the computational complexity of Bayes-optimality for finite state policies (in order to avoid the NP-hardness for arbitrary policies) and bootstrapping from a safe baseline policy.

New topics might be added to this list over the year.

**Requirements** The format of the submission can be either a LessWrong post/sequence or an arXiv paper.

A submission may have one or more authors. If there are several, the authors will be considered for the prize as a team, and if they win, the prize money will be split between them either equally or according to their own internal agreement. For the submission to be eligible, its authors must *not* include:

- Anyone employed or supported by ALTER.
- Members of the board of directors of ALTER.
- Members of the panel of the judges.
- First-degree relatives or romantic partners of judges.

In order to win, the submission must be a *substantial* contribution to the mathematical theory of one of the topics above. For this, it must include at least one of:

- A novel theorem, relevant to the topic, which is difficult to prove.
- A novel *unexpected* mathematical definition, relevant to the topic, with an array of natural properties.

Some examples of known results which would be considered substantial at the time:

- Theorems 1 and 2 in "RL with imperceptible rewards".
- Definition 1.1 in "infra-Bayesian physicalism", with the various theorems proved about it.
- Theorem 1 in "Forecasting using incomplete models".
- Definition 7 in "Basic Inframeasure Theory", with the various theorems proved about it.

**Evaluation** The evaluation will consist of two phases. In the first phase, Vanessa will select 3 finalists. In the second phase, each of the finalists will be evaluated by a panel of judges comprising:

Each judge will score the submission on a scale of 0 to 4. These scores will be added to produce a total score between 0 and 16. If no submission achieves a score of 12 or more, the main prize will not be awarded. If at least one submission achieves a score of 12 or more, the submission with the highest score will be the winner. In case of a tie, the money will be split between the front runners.
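The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not an official tool: the function name `pick_winners` and the submission labels are hypothetical, and the panel size of four is an inference from the stated ranges (0 to 4 per judge, 0 to 16 in total).

```python
from typing import Dict, List

THRESHOLD = 12      # minimum total score for the main prize, per the rule above
MAX_PER_JUDGE = 4   # each judge scores on a scale of 0 to 4


def pick_winners(scores: Dict[str, List[int]]) -> List[str]:
    """Return the winning submission(s); an empty list means no prize is awarded.

    `scores` maps a (hypothetical) submission label to its per-judge scores.
    """
    totals: Dict[str, int] = {}
    for name, judge_scores in scores.items():
        if any(s < 0 or s > MAX_PER_JUDGE for s in judge_scores):
            raise ValueError(f"score out of range for {name!r}")
        totals[name] = sum(judge_scores)

    if not totals:
        return []

    best = max(totals.values())
    if best < THRESHOLD:
        return []  # no submission reached 12, so the main prize is not awarded

    # Every submission tied at the top score shares the prize money.
    return sorted(name for name, total in totals.items() if total == best)
```

For example, with finalists scoring `[4, 3, 3, 2]`, `[3, 3, 3, 3]` and `[2, 2, 2, 2]`, the first two tie at 12 and would split the prize, while the third falls below the threshold.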

The final winner will be announced publicly, but the scores received by various submissions will not.

**Fast Track** If the prize is awarded, and at least one author of the winning submission is interested in a researcher position in ALTER, they will be considered for it, although this is not an offer or guarantee of employment. In fact, making such hires to continue to advance the agenda is my foremost reason for organizing this prize.

If multiple winning authors are interested in researcher positions, we will consider hiring *at least* one of them. It is also quite likely we will consider hiring all of them, but this depends on our financial and organizational ability to support that number.

For additional details about the position, see our previous hiring announcement.

**Assistance** We do not promise to provide any guidance or mentorship to contestants. In fact, identifying researchers that can work with minimal guidance is one of the advantages of this process. However, Vanessa expects to be usually available for providing comments on well-written research proposals. Contestants are encouraged to write such proposals in case of any doubt about the eligibility of their research. Moreover, technical questions pertaining to the learning-theoretic agenda can be asked either on the MIRIx Discord server (where either Alex or Vanessa often answer them), or as comments on the relevant posts by Diffractor (Alex) or myself. Invites to the server are available to good-faith contestants upon request.

If you wish to contact us about either a research proposal or an invite, please write to rot13 of [email protected] and attach a CV plus any other relevant background about yourself.

Please indicate your interest in working on this prize below, or comment to find potential collaborators.

Good luck!

Donors can increase the prize pool by adding their contribution commitment below.
