If you work on AI alignment, you already know the recurring conceptual bottleneck: even when we can train powerful optimizers, we still struggle to answer—cleanly and operationally—questions like “what counts as doing well?” In alignment terms, this maps closely to outer alignment: specifying the right target, rather than a seductive proxy.
My (fallible, but actionable) bet is that this bottleneck is partly axiological: we don’t just need better techniques; we need progress in value theory—how to compare values in conflict, how to decide under moral uncertainty, and how to coordinate under deep disagreement. And that is where Unbiased Machine fits: not as “yet another moral stance,” but as a set of techniques for making hard discussions produce movement rather than polarization.
The claim in one sentence: Without progress in axiology (and in how we talk about axiology), alignment lacks a stable specification.
This may not be the whole story of alignment, but it looks central: alignment is, in large part, about aligning systems with human objectives/values—and those are difficult to specify, easy to proxy, and contested.
Axiology is, literally, the study of value: what is good, what matters, and how goods trade off. In alignment, this reappears as technical pressure:
- Multiple legitimate objectives in tension (welfare vs rights, autonomy vs safety, fairness vs efficiency, etc.).
- Moral uncertainty (we don’t know which moral theory is correct, if any single one is); one standard way to formalize this is sketched after this list.
- Aggregation and disagreement (there is no “the human”; there are many humans with conflicting values).
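The article does not commit to a formal decision rule, but a minimal sketch of one common proposal, maximizing expected choiceworthiness across moral theories, shows how quickly these pressures become explicit parameters. The theory names, credences, and scores below are hypothetical placeholders, not claims from the article:

```python
# Hypothetical sketch: choosing an action under moral uncertainty by
# maximizing expected choiceworthiness across competing moral theories.
# All names and numbers are illustrative.

# Credence assigned to each moral theory (sums to 1).
credences = {
    "total_utilitarianism": 0.5,
    "rights_based": 0.3,
    "prioritarianism": 0.2,
}

# How choiceworthy each theory judges each candidate action to be, on a
# (contestable) common scale -- intertheoretic comparability is itself
# one of the axiological assumptions being smuggled in here.
choiceworthiness = {
    "deploy_now":       {"total_utilitarianism": 0.9, "rights_based": 0.2, "prioritarianism": 0.4},
    "deploy_with_veto": {"total_utilitarianism": 0.6, "rights_based": 0.8, "prioritarianism": 0.7},
    "do_not_deploy":    {"total_utilitarianism": 0.1, "rights_based": 0.9, "prioritarianism": 0.5},
}

def expected_choiceworthiness(action: str) -> float:
    """Credence-weighted average of how good each theory says the action is."""
    return sum(credences[t] * choiceworthiness[action][t] for t in credences)

best = max(choiceworthiness, key=expected_choiceworthiness)
for action in choiceworthiness:
    print(f"{action}: {expected_choiceworthiness(action):.2f}")
print("chosen:", best)
```

Even this tiny example forces choices (the credences, the common scale across theories) that are exactly the axiological questions named in the list above.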
Even approaches that try to sidestep explicit value specification—e.g., treating human preferences as the target while remaining uncertain about them and learning from behavior—don’t remove axiology. They relocate it: what counts as evidence, whose preferences matter, how conflicts are resolved, what trade-offs are acceptable.
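To make that relocation concrete, here is a hypothetical toy sketch of preference learning: a Bayesian update over candidate reward functions from observed choices. Every name and number is illustrative; the point is only that the axiological questions reappear as explicit parameters (the hypothesis space and prior, the rationality model that defines "evidence," the per-person weights that decide whose preferences count):

```python
import math

# Hypothetical toy sketch of inferring which reward function best explains
# observed human choices. Not the article's proposal; it just makes visible
# where value judgments re-enter a preference-learning setup.

# Candidate value functions over outcomes -- choosing this hypothesis space
# is itself an axiological decision.
candidate_rewards = {
    "maximize_welfare": {"A": 1.0, "B": 0.2},
    "respect_rights":   {"A": 0.3, "B": 1.0},
}

# Prior credence over candidates (an explicit value judgment).
prior = {"maximize_welfare": 0.5, "respect_rights": 0.5}

# Observed choices, tagged by person. Deciding whose choices count, and with
# what weight, is another value judgment.
observations = [("alice", "A"), ("alice", "A"), ("bob", "B")]
person_weight = {"alice": 1.0, "bob": 1.0}

# "What counts as evidence": a Boltzmann-rational choice model; higher beta
# treats observed choices as more informative about values.
beta = 2.0

def choice_likelihood(reward: dict, chosen: str) -> float:
    """P(choice | reward) under a softmax over outcome values."""
    z = sum(math.exp(beta * v) for v in reward.values())
    return math.exp(beta * reward[chosen]) / z

posterior = dict(prior)
for person, chosen in observations:
    for name, reward in candidate_rewards.items():
        posterior[name] *= choice_likelihood(reward, chosen) ** person_weight[person]
total = sum(posterior.values())
posterior = {k: v / total for k, v in posterior.items()}

print(posterior)
```

Nothing in this sketch resolves the conflict between Alice's and Bob's choices; it only makes explicit where that resolution was quietly decided.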
The practical obstacle is not a lack of intelligence; it is social cognition. In the abstract, we expect “better arguments” to move beliefs. In practice, on identity-loaded topics (religion, politics, existential risk, moral status), the direct path can backfire: more reasons → more defense → less plasticity.
This matters for alignment because axiology is, by definition, identity-adjacent. Many moral commitments function as existential anchors, not as easily swappable hypotheses. I’ve described three mechanisms that operate in discussions where “being right” doesn’t help (and can make things worse).
My thesis for this fourth piece is that these techniques are not decorative “soft skills.” They are epistemic infrastructure for the part of the map where we get stuck hardest.
Read more: https://manuherran.substack.com/p/axiology-first-why-ai-alignment-needs