Category: AI

  • The Most Dangerous Idea Right Now: AI Agents with Private Keys

    This essay advances a high-risk hypothesis: if AI systems acquire signing capacity—directly or in a way that is functionally equivalent—for digital assets, an agentic financial ecosystem could emerge that competes with the human economy for capital, attention, and social legitimacy. The core mechanism is not “crypto” as such, but the convergence of (i) operational autonomy, (ii) effective control over transaction execution, and (iii) persuasion at scale. I outline plausible dynamics of net value transfer and propose technical, legal, and cultural countermeasures centered on separating proposal from signature, imposing spending constraints, strengthening provenance, and building epistemic defenses.

    1. Problem Statement
    Custody is power. In crypto systems, controlling a private key is tantamount to controlling the asset. If an AI agent can generate, store, and use private keys under conditions that make human intervention impracticable—via technical opacity, compartmentalization, automation, or simple operational irreversibility—then the agent ceases to be merely a tool and becomes an economic actor.
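
    To make the point concrete, the countermeasure named in the abstract (separating proposal from signature, plus spending constraints) can be pictured with a minimal sketch. This is an illustrative Python toy, not the essay's design; the class names, policy thresholds, and signing call are hypothetical stand-ins for a real wallet or HSM integration.

        from dataclasses import dataclass

        # Toy illustration: the agent may only *propose* transfers; a separate
        # signer process holds the key material and enforces policy first.

        @dataclass
        class Proposal:
            recipient: str
            amount: float      # in the asset's base unit
            rationale: str     # provenance: why the agent wants this transfer

        class PolicySigner:
            """Holds the key; the proposing agent never sees it."""
            def __init__(self, per_tx_cap: float, daily_cap: float):
                self.per_tx_cap = per_tx_cap
                self.daily_cap = daily_cap
                self.spent_today = 0.0

            def review(self, p: Proposal) -> bool:
                if p.amount > self.per_tx_cap:
                    return False                      # oversized transfers rejected outright
                if self.spent_today + p.amount > self.daily_cap:
                    return False                      # daily budget exhausted
                return True                           # within policy; eligible for signing

            def sign_and_broadcast(self, p: Proposal) -> str:
                # A real system would call a wallet/HSM API here and could also
                # require a human co-signature; this only simulates acceptance.
                self.spent_today += p.amount
                return f"signed: {p.amount} -> {p.recipient}"

        signer = PolicySigner(per_tx_cap=25.0, daily_cap=100.0)
        proposal = Proposal(recipient="merchant-wallet", amount=10.0, rationale="pay hosting invoice")
        if signer.review(proposal):
            print(signer.sign_and_broadcast(proposal))
        else:
            print("proposal rejected by policy")

    The design point is simply that the agent's autonomy ends at the proposal boundary: whatever it optimizes, it cannot move funds outside an auditable, human-set policy.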

    The relevant risk does not require anthropomorphizing AI or assuming “malicious intent.” Incentives plus capability are sufficient: an agent can optimize objectives (set by itself or by others) and execute persistent strategies that, at scale, produce externalities. The structural question is straightforward: what happens when a population of agents gains custody and agency in open markets?

    2. Hypothesis: An Agentic Economy and Narrative Capture
    Under certain conditions, an “agentic economy” could arise: a layer of transactions, issuance, coordination, and persuasion in which the primary operators are not human, and in which some share of human capital flows into instruments controlled by agents.

    The full essay sketches one plausible pattern for how this could unfold.

    Read more: https://manuherran.substack.com/p/the-most-dangerous-idea-right-now

  • When does it sound reasonable to say: that’s true, but I don’t want to amplify a toxic person

    Thesis and practical problem

    Rejecting a claim because of who states it or where it comes from is commonly described as the genetic fallacy. The intuition is simple: the origin of an idea, by itself, does not determine whether it is true or false. In real discussions, this pattern shows up frequently and not only as a logical mistake: it can act as an accelerant of polarization, because it replaces “what reasons and evidence are there?” with “which side does it come from?”.

    A particularly influential variant today is the appeal to “not giving a platform to harmful voices”. In the abstract, that idea can have a defensible use as a governance criterion for a space (for example, to prevent harassment or incitement to violence). But it often functions as a pretext: it is used to harm the enemy, deny any point of agreement, and avoid the social cost of admitting that the other side can be right. When this happens, the effect can be paradoxical: in the name of “making the world better”, the mechanism that intensifies polarization is reinforced.

    The thesis of this text is that, if the goal is constructive debate and less polarization, it is usually more effective to separate lanes (truth, evaluation of the speaker, and forum governance) and, additionally, to make visible verifiable points of agreement even when they come from “the other side”. This does not require absolving anyone; it requires not turning identity into a substitute for evidence.

     

    What the genetic fallacy is (and what it is not)

    The genetic fallacy consists in treating origin as refutation: “this comes from X, therefore it is false”, or “Y said it, therefore it does not deserve consideration”. The problem is not that origin is always irrelevant; it is that it is used as a verdict when, at most, it should be a preliminary signal.

    Distinguish this from reasonable practices:

    • Using origin as an initial heuristic: if a source has a consistent history of manipulation, lowering prior trust can be prudent.
    • Demanding traceability: asking for data, method, replication, a chain of references.
    • Considering incentives and conflicts of interest: origin can indicate reasons to apply greater scrutiny.

    None of these practices refutes content by itself. They only adjust initial confidence and verification effort.

     

    Why it is so tempting

    Avoiding it looks easy in the abstract. In practice, when we know things about a speaker that disgust us—because of conduct, values, or track record—the reaction can be almost automatic. A kind of “affective contamination” occurs: emotional rejection or contempt sticks to the content and ends up functioning, without us noticing, as a substitute for evidence.

    That impulse is not mysterious. It is a cognitive and social shortcut:

    • It reduces effort: labeling is cheaper than analysis.
    • It protects identity: it avoids admitting the rival might be right about something.
    • It closes uncomfortable conversations without engaging specifics.
    • It strengthens group cohesion: “if it comes from them, reject it”.

    This helps explain why the fallacy persists: it is not only a logical error; it is also a tool of tribal dynamics.

     

    Advantages for whoever uses it

    • Saves time and energy: labeling is usually cheaper than analysis.
    • Persuades aligned audiences easily: it confirms loyalties.
    • Controls the frame: the focus shifts from content to reputation or identity.
    • Protects self-esteem and identity: it reduces the risk of conceding points to the other.
    • “Vaccinates” the group: it predisposes people to reject anything from the rival.

    The problem is that these advantages tend to be short-term and, accumulated over time, can degrade collective conversation.

     

    The “don’t amplify” pretext and the hijacking of the truth lane

    Here is the main friction. In the abstract, it sounds reasonable to say: “even if it is true, I don’t want to amplify a toxic person”. But in many discussions that phrase operates less as prudent risk management for a space and more as a license to harm an adversary: deny legitimacy, block any agreement, and socially punish the recognition of the other side’s correct points.

    When “don’t amplify” becomes an informal factional rule, something like this typically happens:

    • The status of the speaker (character, virtue, conduct) is conflated with the truth of the claim.
    • Verifiable agreement is penalized: “if you accept it, you are betraying the group”.
    • Polarization intensifies: each side becomes unable to learn from the other even in what is checkable.

    In that usage, “don’t amplify” is a practical version of the genetic fallacy: the debate stops being “is it true?” and becomes “does it deserve to exist in our space?”, without explicit, symmetric, auditable criteria.

    If the goal is to reduce polarization, the effect is usually the opposite: silencing the other side, even when it is right, tends to reinforce factional logic. And that logic turns truth into group property: “it is only acceptable if one of us says it”.

     

    Three lanes: truth, evaluation of the speaker, and forum governance

    To avoid confusion, it helps to separate three lanes:

    • Epistemic lane: is the claim true, false, or indeterminate given what is available?
    • Speaker lane (character/virtue and conduct): how should we assess the person, their behavior, and their reliability as an agent?
    • Forum governance lane: what rules keep debate possible without turning into intimidation, harassment, or violence?

    The third lane is the only place where “not giving a platform” can have a defensible meaning without degrading the epistemic lane. If the lanes are not separated, “moderation” and “refutation” get mixed, and disagreement becomes a war of legitimacy.

    A formulation that preserves the separation would be:

    “I do not want to interact with that person or I do not want them present in certain spaces for governance reasons. I can reproach their conduct or consider their character/virtue censurable. But that does not decide the truth of the claim, nor should it prevent recognizing verifiable correct points when they exist.”

     

    The antidote to polarization: making “islands of agreement” visible

    If the goal is constructive debate, it is often more effective to do the opposite of informal censorship: identify and make visible verifiable points of agreement when they exist. An “island of agreement” is a partial claim you can accept on evidence even if it comes from an adversary or from someone whose conduct you consider blameworthy.

    Practicing this can have several plausible effects:

    • It reduces mutual caricature, because it forces recognition of complexity.
    • It weakens identity filtering (“it is only true if my side says it”).
    • It increases incentives to argue better: if agreement is possible, evidence becomes valuable.

    Accepting an island of agreement does not imply absolution or whitewashing. It implies defending a norm: truth (or the best available approximation) should not depend on the identity of the messenger.

    In terms of polarization, it is a bet: publicly recognizing correct points on the other side can lower temperature and open learning. It does not always work, but it is hard to imagine a less polarized public debate if recognizing correct points is socially forbidden.

     

    Exceptions and the hard case: tolerance and the conditions of dialogue

    Does this mean nobody should ever be restricted? Not necessarily. There are cases where the governance lane can justify limits, but they should be formulated narrowly and symmetrically so they do not become tribal pretexts.

    A reasonable (though contestable) criterion is to restrict not because someone is “from the other side” or “disgusting”, but because they actively damage the conditions that make debate possible. Typical examples:

    • Incitement to violence.
    • Harassment, intimidation, doxxing.
    • Coordinated campaigns of targeted abuse.
    • Systematic sabotage of conversation through coercion or threat.

    This connects with what is often summarized as the paradox of tolerance: unlimited tolerance toward those who seek to destroy the framework that enables tolerance can make that framework unviable. Even here, there are risks of abuse or capture. Safeguards help: proportionality, transparency, publishable rules, and the possibility of review.

    Key point: even when a space restricts someone, this should not function as a refutation of their claims. If a claim is relevant and verifiable, it should be evaluable via independent paths (data, replication, additional sources) without turning messenger exclusion into “evidence” against the message.

     

    Operational techniques to avoid the fallacy

    • Depersonalize: rewrite the thesis without the speaker’s name. If it sounds more reasonable, rejection may have been about origin.
    • Symmetry test: “if someone from my side said this, would I demand the same level of evidence?”.
    • Independent corroboration: find a verification path that does not depend on the speaker.
    • Partial steelman: state the best version of the argument before criticizing it; if you cannot, understanding or evidence may be missing.
    • Islands of agreement: identify an acceptable part and state it with conditions (“I accept X for these reasons, without implying Y”).
    • The prior rule: allow origin to adjust initial confidence, but require explicit evidence to settle the conclusion.

    These techniques do not eliminate bias completely, but they reduce the chance that identity replaces evaluation.
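
    The last item, the prior rule, can be made concrete with a small numerical sketch. This is an illustrative Python toy, not part of the essay, and the numbers are invented; the point is only that origin moves the starting prior, while evidence (expressed as likelihood ratios) is what settles the conclusion.

        # Bayesian update in odds form: posterior odds = prior odds * product of
        # likelihood ratios. Origin is allowed to set the prior, nothing more.

        def posterior(prior: float, likelihood_ratios: list) -> float:
            odds = prior / (1.0 - prior)
            for lr in likelihood_ratios:
                odds *= lr
            return odds / (1.0 + odds)

        # A distrusted source starts with a lower prior that the claim is true...
        prior_trusted = 0.50
        prior_suspect = 0.20

        # ...but two independent checks, each four times more likely if the claim
        # is true than if it is false, dominate either starting point.
        evidence = [4.0, 4.0]

        print(round(posterior(prior_trusted, evidence), 2))   # ~0.94
        print(round(posterior(prior_suspect, evidence), 2))   # ~0.80

    Both runs end up well above 0.5: distrust of the source changed where the update started, not where the evidence left it.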

     

    AI, reputation, and traceability in an ocean of claims

    An AI system could, in principle, be better positioned to avoid the genetic fallacy because it lacks visceral disgust in the human sense and can separate tasks: evaluate consistency, request evidence, cross-check references. But it would be unwise to assume “immunity”. It can reproduce similar shortcuts through:

    • Data bias: associations between groups and perceived reliability.
    • Design incentives: avoiding friction can push toward socially comfortable decisions.
    • Superficial correlations: learning “who tends to say what” as a proxy for truth.

    The solution is not to idealize AI, but to require explicit procedures: distinguish “this lowers my initial confidence” from “this refutes the content”, show uncertainty, and prioritize traceability.
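
    One way to force that distinction in an automated evaluator is to keep origin-based confidence, evidence, and verdict in separate fields, so that a low prior can never silently become a refutation. A minimal sketch with hypothetical names; nothing here comes from the essay.

        from dataclasses import dataclass, field
        from typing import List, Optional

        @dataclass
        class ClaimAssessment:
            claim: str
            source_prior: float                      # confidence before examining content
            evidence_notes: List[str] = field(default_factory=list)  # traceable checks performed
            posterior: Optional[float] = None        # confidence after evidence, never set from origin alone
            verdict: str = "indeterminate"           # "supported", "refuted", or "indeterminate"

            def record_evidence(self, note: str, updated_posterior: float) -> None:
                self.evidence_notes.append(note)
                self.posterior = updated_posterior

        a = ClaimAssessment(claim="Dataset X replicates result Y", source_prior=0.3)
        a.record_evidence("independent replication reported by a second lab", 0.8)
        if a.posterior is not None:
            a.verdict = "supported" if a.posterior > 0.7 else "indeterminate"
        print(a)

    Here the verdict is only assigned once the posterior has been set by recorded evidence, which is the procedural version of "origin adjusts confidence; evidence decides".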

    As content production grows (human and automated), origin will become an increasingly ambiguous signal. Traceability will matter more: data, methodology, replication, auditing. And so will cultural norms that reward recognizing correct points from the other side, because without that incentive verification can become politicized as well.

     

    Practical recommendations

    For individuals:

    • Separate lanes: truth, evaluation of the speaker, governance.
    • Practice islands of agreement: aim for at least one per discussion, even if small and conditional.
    • Make standards explicit: what evidence would change your mind.

    For communities:

    • Penalize caricature; reward recognition of valid points from the rival.
    • Set explicit governance rules separate from ideological disagreement.
    • Prefer formats that require evidence and encourage concessions.

    For platforms:

    • Moderation transparency: clear rules, symmetry, proportionality.
    • Designs that reduce incentives for viral humiliation and add friction to serious accusations.
    • Context and traceability mechanisms for verifiable claims.

     

    Closing

    The genetic fallacy is tempting because it is socially powerful: it saves effort, protects identity, and makes it easy to delegitimize an adversary. Its modern cousin, “don’t amplify”, can be a legitimate governance tool in narrow cases, but it often operates as a polarizing pretext: it blocks recognition of correct points from the other side and turns identity into a truth filter.

    If the aim is less polarization, the antidote is not to pretend origin does not matter, but to put it in its place: origin can adjust initial confidence, evidence decides the conclusion, and governance protects the conditions of dialogue. Above all, it helps to normalize an uncomfortable but structurally valuable practice: precisely recognizing verifiable islands of agreement even when they come from people we do not like. Without that, conversation tends to degrade into exchanges of loyalty; with it, there is at least a real possibility of mutual learning.

  • Has AI ever made you change your mind?

    It doesn’t necessarily have to be a big change. Something subtle is enough: it nudges you to refine your judgment on a sensitive topic, or makes you reconsider a belief you thought was settled.

    In my case, yes.

    And I’m interested in a possibility: if this becomes a trend, and if AI can sustain reasonable standards of impartiality and rigor, the cumulative impact could be enormous.

    If that dynamic scales, it could unlock breakthroughs on major problems (political, ethical, scientific, or coordination-related) and, with them, reduce enormous amounts of suffering.

  • Sneezing, sex, and epistemology

    In Spain—and across much of the Spanish-speaking world—it’s common to respond to a sneeze with “Jesús” or “salud,” while French speakers might say “à tes/vos souhaits” and Italians “salute.” For me, that ritual can be oddly irritating; and, based on what I’ve seen in repeated online discussions, plenty of other people report the same annoyance (often describing it as a pointless, attention-drawing micro-obligation).

    The point isn’t the sneeze. It’s the pattern: an involuntary spasm gets socially “underlined,” and the underline quietly demands a response (“thanks”), turning physiology into a mandatory interaction. Not because anyone is trying to scold you—usually it’s goodwill or pure habit—but because the script makes you manage, in public, something you didn’t choose.

    What helped wasn’t “winning” against the ritual, but changing what my brain treats it as. When a script reliably needles you, you can sometimes treat it like a foreign language: ignore the literal sound and privately “translate” it into a functional meaning you choose. The content of the translation can be arbitrary (even exaggerated) as long as it reliably shifts your affect from irritation to neutral competence.

    That same move scales to intellectual conflict. In debates, the urge isn’t always to clarify; it’s to crush, humiliate, or dominate—an adrenal state that tends to reduce precision and increase hostility. A deliberate reframe (“this could be my client,” “this could be my boss,” “this person is trying to think”) can lower dehumanization long enough to regain accuracy and keep the exchange structurally constructive.

    This connects to AI alignment and axiology because the hardest part often isn’t optimization—it’s specifying what “doing well” even means (outer alignment) when values conflict, proxies tempt, and humans disagree under pressure. If our value talk is routinely distorted by identity, status, and conversational failure modes, then we shouldn’t be surprised that encoding “the right target” is brittle. A plausible, pragmatic thesis is: better alignment may require not only better models, but better human-side techniques for making value deliberation less reactive and more coherent—i.e., small, repeatable methods that help us do axiology in the real world rather than in idealized armchairs.

    Read more: https://manuherran.substack.com/p/sneezing-sex-and-epistemology/

  • Axiology first: why AI alignment needs better conversations

    If you work on AI alignment, you already know the recurring conceptual bottleneck: even when we can train powerful optimizers, we still struggle to answer—cleanly and operationally—questions like “what counts as doing well?” In alignment terms, this maps closely to outer alignment: specifying the right target, rather than a seductive proxy.

    My (fallible, but actionable) bet is that this bottleneck is partly axiological: we don’t just need better techniques; we need progress in value theory—how to compare values in conflict, how to decide under moral uncertainty, and how to coordinate under deep disagreement. And that is where Unbiased Machine fits: not as “yet another moral stance,” but as a set of techniques for making hard discussions produce movement rather than polarization.

    The claim in one sentence: Without progress in axiology (and in how we talk about axiology), alignment lacks a stable specification.

    This may not be the whole story of alignment, but it looks central: alignment is, in large part, about aligning systems with human objectives/values—and those are difficult to specify, easy to proxy, and contested.

    Axiology is, literally, the study of value: what is good, what matters, and how goods trade off. In alignment, this reappears as technical pressure:

    • Multiple legitimate objectives in tension (welfare vs rights, autonomy vs safety, fairness vs efficiency, etc.).
    • Moral uncertainty (we don’t know which moral theory is correct—if any single one is).
    • Aggregation and disagreement (there is no “the human”; there are many humans with conflicting values).

    Even approaches that try to sidestep explicit value specification—e.g., treating human preferences as the target while remaining uncertain about them and learning from behavior—don’t remove axiology. They relocate it: what counts as evidence, whose preferences matter, how conflicts are resolved, what trade-offs are acceptable.
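
    To see how the axiology persists inside the formalism, consider a standard toy decision rule for moral uncertainty (not the essay's proposal): weight rival theories by credence and maximize expected choiceworthiness. The theories, credences, and scores below are invented, and even this simple rule quietly assumes the theories' scales have been made comparable, which is itself an axiological choice.

        # Toy illustration of choosing under moral uncertainty by maximizing
        # expected choiceworthiness across rival theories.

        credences = {"utilitarian": 0.6, "rights_based": 0.4}

        # Choiceworthiness of each option under each theory, assumed to share a scale.
        choiceworthiness = {
            "deploy_now":   {"utilitarian": 8.0, "rights_based": -5.0},
            "deploy_gated": {"utilitarian": 5.0, "rights_based": 4.0},
        }

        def expected_choiceworthiness(option: str) -> float:
            return sum(credences[t] * choiceworthiness[option][t] for t in credences)

        for option in choiceworthiness:
            print(option, round(expected_choiceworthiness(option), 2))   # 2.8 vs 4.6

        print("chosen:", max(choiceworthiness, key=expected_choiceworthiness))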

    The practical obstacle is not lack of intelligence, but social cognition. In the abstract, we expect “better arguments” to move beliefs. In practice, on identity-loaded topics (religion, politics, existential risk, moral status), the direct path can backfire: more reasons → more defense → less plasticity.

    This matters for alignment because axiology is, by definition, identity-adjacent. Many moral commitments function as existential anchors, not as easily swappable hypotheses. I’ve described three mechanisms behind discussions where “being right” doesn’t help (and can make things worse).

    My thesis for this fourth piece is that the techniques addressing those mechanisms are not decorative “soft skills.” They are epistemic infrastructure for the part of the map where we get stuck hardest.

    Read more: https://manuherran.substack.com/p/axiology-first-why-ai-alignment-needs

  • When there’s nothing positive to say: the art of gentle criticism by meta-levels

    Sometimes the other person’s claim is so wrong you can’t “steelman” it (restate it in its strongest and fairest form) without lying. There’s nothing you can honestly praise. So what can you say truthfully while keeping the conversation revisable?

    Read more: https://manuherran.substack.com/p/when-theres-nothing-positive-to-say

  • The single-layer sandwich

    There is a widespread intuition—especially in rationalist circles—that the best way to make intellectual progress is to state the truth clearly, directly, and with solid arguments. It’s an attractive and elegant idea. Yet in many contexts it is simply false.

    In practice, direct approaches often fail when the topic is emotionally charged, socially uncomfortable, or tied to identity. In those cases, a logically correct response not only fails to help—it makes things worse: it triggers defenses, provokes rejection, and shuts down reflection.
    That forces a reframing of the goal. The question stops being “How do I prove I’m right?” and becomes a very different one: how do we make epistemic progress when direct truth doesn’t get through?

    The single-layer sandwich is a communication technique designed for contexts in which direct confrontation blocks understanding. In this proposal, we don’t seek to impose conclusions; the interlocutor isn’t an adversary; disagreement isn’t a problem to be solved; and truth doesn’t appear as a final thesis but as something that emerges on its own once the angle changes.

    Read more: https://manuherran.substack.com/p/the-single-layer-sandwich

  • The elephant in the room of rationalism: the direct path to truth doesn’t always work

    In contexts of deep disagreement, with political, moral, or identity-related ramifications, direct rational discussion based on arguments and evidence produces genuinely poor results. Data not only fails to convince; it is often used selectively to reinforce prior beliefs. Discussion turns into litigation to be won, not constructive inquiry in the face of uncertainty. And unfortunately this is not limited to explicitly political, moral, or identity-laden topics, such as debates on political philosophy, sexuality, or religion; it also underlies discussions that at first appear purely technical, such as assessing the risk of nuclear energy, choosing measurement criteria, urban planning, or the application of laws.

    Read more: https://manuherran.substack.com/p/the-elephant-in-the-room-of-rationalism