Reasons to be Worried About AI

Co-written with Claude Sonnet 4.6

After watching one of Hank Green’s recent videos on Anthropic’s Mythos announcement, I went back and caught up on his previous two videos (though I mostly skipped the parts with Cal Newport). Going through them, I realized that while I have a lot of sympathy for prosaic AI problems, I don’t have a canonical, visible reference for what these prosaic problems are.

An important part of my own frame for analyzing these things is that I think any existential risk concern from AI is going to have to route through a prosaic risk concern from AI. If a singular AI agent kills off all of humanity, that means a singular AI agent was able to accumulate enough military power to do that! Maybe that means secret bioweapons expertise being accessible to untrustworthy actors. Maybe it means super-persuasion. Maybe it means autonomous weapon systems being deployed and then jailbroken. Part of the reason so many people are worried about AI for seemingly unrelated or incompatible reasons is exactly that the risks are disjunctive; any risk being true could result in disaster, even if the rest of the risks evaporate.

I expect that the problem of fully general agent alignment is unsolvable—not just in practice, but in the technical sense of being undecidable or self-contradictory—so what remains is to understand what subspace of alignment problems we are likely to face. In this case, considering problems that are widely agreed upon or that are already happening grounds the expectations in something empirical, rather than in the small, only lightly constrained Agent Foundations paradigm.

Many of the disagreements in broad political discourse are simply tracking areas of uncertainty that are already well established in AI Safety discourse. The dimensions I’ve collected follow:

Capabilities and Timing

AI timelines have been hotly debated for at least 16 years, and arguably for 60+ years. A number of factors, notably the jagged frontier, have reasonably resulted in large disagreements over current AI capabilities, to say nothing of the timeline of future capabilities. And the different ways people use AI also create significant apparent variation in capability.

While I don’t think the case that AI is only hype and slop holds any more, that shift is recent and empirical, and a clear, recent demonstration of capabilities is helpful for creating common ground with someone who may have last engaged with AI casually months ago.

Targets of Harm

While existential risks harm everyone, there are a wide variety of existing and near-term-unavoidable harms. When engaging with someone who is currently or imminently in danger from present AI practices, I expect that scaling up their worries until they become catastrophic risks will be more persuasive than invalidating their experiences to focus on existential threats.

Some of the commonly discussed dimensions of harm include:

  • Economic harms from labor displacement, wage suppression, IP theft, and centralization of power

  • Policy harms from bias in systems deployed for criminal justice and insurance claims

  • Epistemic harms from overreliance on AI, saturation of media environments, and disconnection from ground truth, including deskilling or loss of important phases of learning

  • Environmental harms such as power consumption of data centers and destructive use of rare minerals

  • Political harms such as regulatory capture, state surveillance capabilities, monopolies, concentration of power, and exclusion of marginalized communities

Any of these dimensions could be a source of existential risk. Economic disruption could result in humans being irrelevant to the global economy. Deploying policies based on biased systems is strong evidence that our institutions are already unable to act on safety and accuracy concerns. Epistemic harms scale to super-persuasion and AI psychosis (skipping right past alternative facts).

Possible Remedies

An important corollary to the multi-dimensionality of harms is the necessity of multi-dimensional solutions. I’m proud that EA managed to prioritize policy work a while back, but government policy and technical work are not the only dimensions of remedy available, nor are they single dimensions in themselves.

  • Collective action would be needed to implement UBI-style solutions to AI-driven economic takeover

  • Antitrust measures may be important for limiting concentration of power in private companies

  • Sector-specific regulations are required for technical solutions to be robustly and reliably implemented

  • International coordination is required for slowing capabilities progress and constraining military applications

  • Privacy laws and technologies are needed for protection from mass surveillance

  • Safety practices need broad buy-in to control deployed technologies

  • Design choices in deployed technologies will shape both the security surfaces of the technology and common usage patterns

  • Technical consistency and reliability results are needed for safety practices to provide meaningful safety

  • Interpretability results are necessary to communicate safety results to affected communities

Coordinated Action Under Uncertainty

When the value of different actions is uncertain and many resources are available, it is best to divide resources among plausible actions along the Pareto frontier according to estimated marginal utility.

When uncertainty looms as large as it does, estimating marginal utility is incredibly difficult, and at the limit of uncertainty there is no way to distinguish between parts of the Pareto frontier.
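
For concreteness, here is a minimal sketch of the proportional-allocation intuition; the categories, numbers, and the allocate helper are purely hypothetical:

```python
# A minimal sketch: split a fixed budget across candidate remedies in
# proportion to their estimated marginal utility (hypothetical numbers).
def allocate(budget, marginal_utility):
    total = sum(marginal_utility.values())
    return {k: budget * v / total for k, v in marginal_utility.items()}

# Confident estimates concentrate resources on the strongest options...
print(allocate(100, {"policy": 3.0, "technical": 2.0, "coalition": 1.0}))
# {'policy': 50.0, 'technical': 33.3, 'coalition': 16.7} (approximately)

# ...but as uncertainty grows, estimates flatten and the allocation
# collapses toward an even split: no part of the frontier stands out.
print(allocate(100, {"policy": 1.1, "technical": 1.0, "coalition": 0.9}))
# {'policy': 36.7, 'technical': 33.3, 'coalition': 30.0} (approximately)
```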

While there may be large disagreements about probability and potential value, in general Neglectedness suggests that all avenues should be pursued when there are resources at hand.

In fact, specialization can produce Pareto gains in theory, and often does in practice. Technical AI safety as a field is often dismissive of other critiques, but creating a larger tent for a broader coalition can help us monitor our own blind spots.
