Tag: Epistemology

  • Has AI ever made you change your mind?

    It doesn’t necessarily have to be a big change. Something subtle is enough: it nudges you to refine your judgment on a sensitive topic, or makes you reconsider a belief you thought was settled.

    In my case, yes.

    And I’m interested in a possibility: if this becomes a trend, and if AI can sustain reasonable standards of impartiality and rigor, the cumulative impact could be enormous.

    If that dynamic scales, the resulting breakthroughs on major problems (political, ethical, scientific, or coordination-related) could be immense, and with them we could reduce enormous amounts of suffering.

  • Sneezing, sex, and epistemology

    In Spain—and across much of the Spanish-speaking world—it’s common to respond to a sneeze with “Jesús” or “salud,” while French speakers might say “à tes/vos souhaits” and Italians “salute.” For me, that ritual can be oddly irritating; and, based on what I’ve seen in repeated online discussions, plenty of other people report the same annoyance (often describing it as a pointless, attention-drawing micro-obligation).

    The point isn’t the sneeze. It’s the pattern: an involuntary spasm gets socially “underlined,” and the underline quietly demands a response (“thanks”), turning physiology into a mandatory interaction. Not because anyone is trying to scold you—usually it’s goodwill or pure habit—but because the script makes you manage, in public, something you didn’t choose.

    What helped wasn’t “winning” against the ritual, but changing what my brain treats it as. When a script reliably needles you, you can sometimes treat it like a foreign language: ignore the literal sound and privately “translate” it into a functional meaning you choose. The content of the translation can be arbitrary (even exaggerated) as long as it reliably shifts your affect from irritation to neutral competence.

    That same move scales to intellectual conflict. In debates, the urge isn’t always to clarify; it’s to crush, humiliate, or dominate—an adrenal state that tends to reduce precision and increase hostility. A deliberate reframe (“this could be my client,” “this could be my boss,” “this person is trying to think”) can lower dehumanization long enough to regain accuracy and keep the exchange structurally constructive.

    This connects to AI alignment and axiology because the hardest part often isn’t optimization—it’s specifying what “doing well” even means (outer alignment) when values conflict, proxies tempt, and humans disagree under pressure. If our value talk is routinely distorted by identity, status, and conversational failure modes, then we shouldn’t be surprised that encoding “the right target” is brittle. A plausible, pragmatic thesis is: better alignment may require not only better models, but better human-side techniques for making value deliberation less reactive and more coherent—i.e., small, repeatable methods that help us do axiology in the real world rather than in idealized armchairs.

    Read more: https://manuherran.substack.com/p/sneezing-sex-and-epistemology/

  • Axiology first: why AI alignment needs better conversations

    If you work on AI alignment, you already know the recurring conceptual bottleneck: even when we can train powerful optimizers, we still struggle to answer—cleanly and operationally—questions like “what counts as doing well?” In alignment terms, this maps closely to outer alignment: specifying the right target, rather than a seductive proxy.

    My (fallible, but actionable) bet is that this bottleneck is partly axiological: we don’t just need better techniques; we need progress in value theory—how to compare values in conflict, how to decide under moral uncertainty, and how to coordinate under deep disagreement. And that is where Unbiased Machine fits: not as “yet another moral stance,” but as a set of techniques for making hard discussions produce movement rather than polarization.

    The claim in one sentence: Without progress in axiology (and in how we talk about axiology), alignment lacks a stable specification.

    This may not be the whole story of alignment, but it looks central: alignment is, in large part, about aligning systems with human objectives/values—and those are difficult to specify, easy to proxy, and contested.

    Axiology is, literally, the study of value: what is good, what matters, and how goods trade off. In alignment, this reappears as technical pressure:

    • Multiple legitimate objectives in tension (welfare vs rights, autonomy vs safety, fairness vs efficiency, etc.).
    • Moral uncertainty (we don’t know which moral theory is correct—if any single one is).
    • Aggregation and disagreement (there is no “the human”; there are many humans with conflicting values).

    Even approaches that try to sidestep explicit value specification—e.g., treating human preferences as the target while remaining uncertain about them and learning from behavior—don’t remove axiology. They relocate it: what counts as evidence, whose preferences matter, how conflicts are resolved, what trade-offs are acceptable.
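
    As a toy illustration of that relocation (not from the post; the stakeholders, options, and scores below are entirely hypothetical), here is a minimal Python sketch in which two reasonable-sounding aggregation rules, applied to the same conflicting preferences, endorse different options. The choice of rule is itself an axiological commitment.

      # Toy sketch: the aggregation rule over conflicting preferences is
      # itself a value judgment. All names and numbers are hypothetical.

      # Hypothetical utilities: stakeholder -> {option: score}
      preferences = {
          "stakeholder_a": {"option_1": 10, "option_2": 6},
          "stakeholder_b": {"option_1": 2,  "option_2": 5},
          "stakeholder_c": {"option_1": 9,  "option_2": 6},
      }
      options = ["option_1", "option_2"]

      def aggregate_sum(option):
          """Utilitarian-style rule: add everyone's scores."""
          return sum(p[option] for p in preferences.values())

      def aggregate_maximin(option):
          """Rawlsian-style rule: judge options by their worst-off stakeholder."""
          return min(p[option] for p in preferences.values())

      print(max(options, key=aggregate_sum))      # option_1 (totals: 21 vs 17)
      print(max(options, key=aggregate_maximin))  # option_2 (minima: 2 vs 5)

    The same relocation shows up upstream of the aggregation step: which behavior counts as evidence of preference, and whose preferences get counted at all.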

    The practical obstacle is not lack of intelligence, but social cognition. In the abstract, we expect “better arguments” to move beliefs. In practice, on identity-loaded topics (religion, politics, existential risk, moral status), the direct path can backfire: more reasons → more defense → less plasticity.

    This matters for alignment because axiology is, by definition, identity-adjacent. Many moral commitments function as existential anchors, not as easily swappable hypotheses. I’ve described three mechanisms behind discussions where “being right” doesn’t help (and can make things worse).

    My thesis for this fourth piece is that these techniques are not decorative “soft skills.” They are epistemic infrastructure for the part of the map where we get stuck hardest.

    Read more: https://manuherran.substack.com/p/axiology-first-why-ai-alignment-needs

  • When there’s nothing positive to say: the art of gentle criticism by meta-levels

    Sometimes the other person’s claim is so wrong you can’t “steelman” it (restate it in its strongest and fairest form) without lying. There’s nothing you can honestly praise. So what can you say truthfully while keeping the conversation revisable?

    Read more: https://manuherran.substack.com/p/when-theres-nothing-positive-to-say

  • The single-layer sandwich

    There is a widespread intuition—especially in rationalist circles—that the best way to make intellectual progress is to state the truth clearly, directly, and with solid arguments. It’s an attractive and elegant idea. Yet in many contexts it is simply false.

    In practice, direct approaches often fail when the topic is emotionally charged, socially uncomfortable, or tied to identity. In those cases, a logically correct response not only fails to help—it makes things worse: it triggers defenses, provokes rejection, and shuts down reflection.

    That forces a reframing of the goal. The question stops being “How do I prove I’m right?” and becomes a very different one: How do we make epistemic progress when direct truth doesn’t get through?

    The single-layer sandwich is a communication technique designed for contexts in which direct confrontation blocks understanding. In this proposal, we don’t seek to impose conclusions; the interlocutor isn’t an adversary; disagreement isn’t a problem to solve; and truth doesn’t appear as a final thesis, but as something that emerges on its own once the angle changes.

    Read more: https://manuherran.substack.com/p/the-single-layer-sandwich

  • The elephant in the room of rationalism: the direct path to truth doesn’t always work

    In contexts of deep disagreement with political, moral, or identity-related ramifications, direct rational discussion based on arguments and evidence produces genuinely poor results. Data not only fails to convince; it is often used selectively to reinforce prior beliefs. Discussion turns into litigation to be won, not a constructive inquiry under uncertainty. And unfortunately this is not limited to explicitly political, moral, or identity-laden topics (a debate on political philosophy, sexuality, or religion); it also underlies discussions that might initially seem purely technical, such as assessing the risk of nuclear energy, choosing measurement criteria, urban planning, or applying laws.

    Read more: https://manuherran.substack.com/p/the-elephant-in-the-room-of-rationalism