FAR AI – Incubate new technical AI safety research agendas

Adam Gleave
Accepting donations
$7,543,996 raised from 15 donations
Support score: 214 · Evaluator credits: 1,141
Organization · Field-building · AI safety

FAR accelerates neglected but high-potential AI safety research agendas. We support projects that are either too large for academia to lead or overlooked by the commercial sector because they are unprofitable. We solicit proposals from our in-house researchers and external collaborators, selecting projects based on their technical tractability and the strength of their theory of change for reducing x-risk.

Theory of Change

We believe that technical AI safety research represents one of the best interventions to reduce x-risk. FAR's goal is to incubate and scale new AI safety research agendas. We solicit research proposals from both inside and outside FAR, assembling a team of our in-house technical staff to work on the most promising agendas. We evaluate agendas based on their neglectedness and FAR's comparative advantage. Unlike many other labs, we are willing to work on projects across a wide range of domains (e.g. RL, NLP, CV), methodologies (more empirical or more theoretical) and worldviews (e.g. prosaic vs non-prosaic ML).

Since we draw on a common pool of staff, we can rapidly reallocate people to the most promising agendas. This allows us to make bets on high-risk, high-reward agendas relatively cheaply, knowing we can cut an agenda if it does not pan out. Simultaneously, it enables us to scale up the agendas that do show promise.

Moreover, we believe FAR's model is unique and offers significant benefits over existing research environments, enabling us to produce qualitatively different research:

Exploration of New Agendas: There is a remarkable lack of diversity amongst the largest AI safety teams and labs. DeepMind's Scalable Alignment Team, OpenAI's Alignment Team and Anthropic all devote a considerable fraction of their research to RL from human feedback and similar approaches. In close second place, circuit-style mechanistic interpretability is being pursued by DeepMind (Vlad Mikulik), Anthropic (Chris Olah), Redwood and Conjecture. Although we like much of this research, we (and many others) believe that many other insights will be needed to solve alignment. Although these labs do at times pursue other approaches (e.g. adversarial training by Redwood), this represents a relatively small portion of their efforts. We therefore think it is critical that efforts are made to develop new alignment agendas.

Although there are some labs pursuing significantly different approaches (such as ARC or MIRI), they tend to be small and focused on a single agenda led by their founding team. By contrast, FAR's model seeks to both encourage and support researchers to explore a variety of different ideas.

Good Incentives and Mentorship: There are places that offer as much research freedom as FAR, or more. However, they tend to have weak or even perverse incentives (such as academia), or provide little to no mentorship and accountability (such as independent research). Although a small handful of researchers may be able to produce high-quality, x-risk-relevant work even in adverse situations, we believe most researchers will be markedly more productive in an environment optimized for producing good work.

FAR's formal review of research proposals and informal feedback from senior researchers and other staff provide researchers with both the incentives and mentorship needed to develop high-quality proposals. Additionally, we provide engineering mentorship (e.g. code review) and accountability that are ubiquitous in professional engineering environments but absent in academia and independent research. This allows our staff to develop their technical skills and remain motivated over time.

Scale and Infrastructure: Many agendas require significant engineering effort, operational support or other resources to pursue. This results in most academic and independent research focusing on small-scale toy problems with unclear relevance to aligning real-world systems. Although FAR cannot and does not intend to compete with the largest labs in scale, we believe that operating at "medium-scale" is both viable and sufficient to prototype many research ideas.

Additionally, since FAR pursues a portfolio of research agendas, we can provide economies of scale to agendas that might otherwise need to be housed in separate orgs. This yields back-end cost and time savings (shared operations staff, legal entity, etc.) and enables fixed-cost investments in compute infrastructure and the tools needed to use it effectively.

Impact Track Record

You can view our research output on our publications page.

Other Questions

We are happy to share more information upon request to prospective donors: feel free to contact us at [email protected].
