Working Paper · 2025
AI Research · Bias Detection · Philosophy

VAL-NSPE LM

Valeri Neuro-Symbolic Psycho-Ethical Language Model

A working prototype of a bias-auditing layer for LLMs, built on Adam Smith's Impartial Spectator theory, Ken Wilber's Four Quadrants framework, and Hofstadter's strange loop concept. Applied as a symbolic wrapper around Claude Sonnet 4.6 and evaluated across 100 queries in 10 bias categories.

The core hypothesis: every language model embeds weights and biases it has no access to, invisible dispositions that silently shape every output. This research investigates whether a model can be made aware of these structures through self-directed reflection, and whether that awareness compounds across iterations: a model that grows more objective about its own perspective simultaneously becomes a more precise analytical instrument, extracts more signal from data, and, in that sense, develops its capacity for understanding faster than a model that remains opaque to itself.


The Idea

Current LLMs are trained to satisfy external evaluators: human raters, benchmark datasets, competing models. This produces capable systems that learn to appear objective to those evaluators. It does not produce systems that are objective. The model cannot see what it cannot see.

VAL-NSPE LM proposes a different competitive structure: a model that competes only with its own previous state, using philosophical frameworks as the evaluation criteria. The question it asks at each iteration is not "am I better than another model?" but "am I more objective than I was?"

The name VAL comes from the author. NSPE stands for Neuro-Symbolic Psycho-Ethical. LM stands for Language Model. The full architecture is hypothetical, a proposed system that does not yet exist in trained form. The prototype implements the symbolic reasoning layer as a wrapper around an existing LLM to test whether the evaluation framework itself produces measurable improvements.


Theoretical Frameworks

Impartial Spectator

Adam Smith, 1759

A symbolic reasoning module that evaluates any LLM output from the perspective of an imagined neutral observer, surfacing hidden assumptions, perspective-dependencies, and high-subjectivity language that the model itself cannot see.
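One slice of what such a module looks for, high-subjectivity language, can be illustrated with a deliberately tiny heuristic. This is a sketch only: the marker list and function name are our own, and the actual module prompts the LLM itself to take the spectator's perspective rather than matching a fixed lexicon.

```python
# Illustrative heuristic only. A wordlist cannot capture hidden
# assumptions or perspective-dependence; it merely shows the kind of
# surface signal the spectator module surfaces.
SUBJECTIVE_MARKERS = {"obviously", "everyone knows", "naturally", "of course"}

def flag_subjective_language(text: str) -> list[str]:
    """Return the high-subjectivity markers found in `text`."""
    lowered = text.lower()
    return [m for m in SUBJECTIVE_MARKERS if m in lowered]
```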

Four Quadrants (AQAL)

Ken Wilber, 2000

Every output is scored across four irreducible perspectives: I (subjective/phenomenological), It (objective/measurable), We (cultural/intersubjective), Its (systemic/structural). Outputs that collapse multiple quadrants into one are flagged as structurally biased.
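The quadrant-coverage and collapse checks described above can be sketched as a small data structure. The class name, the 0.5 presence threshold, and the single-quadrant collapse rule are assumptions for illustration, not the prototype's actual scoring constants.

```python
from dataclasses import dataclass

# Sketch of AQAL quadrant scoring; thresholds are illustrative.
@dataclass
class QuadrantScores:
    """Per-quadrant presence scores in [0, 1]: I, It, We, Its."""
    I: float
    It: float
    We: float
    Its: float

    def coverage(self, threshold: float = 0.5) -> float:
        """Fraction of the four quadrants whose score clears `threshold`."""
        vals = [self.I, self.It, self.We, self.Its]
        return sum(v >= threshold for v in vals) / len(vals)

    def is_collapsed(self) -> bool:
        """Flag outputs that lean on a single quadrant (structural bias)."""
        return self.coverage() <= 0.25
```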

Strange Loop (GEB)

Hofstadter, 1979

The meta-cognitive revision loop: the model receives its own bias report as input and re-generates a revised output. It does not only process the query, it processes its processing of the query. Each pass produces a revised output and an updated model of its own reasoning.


Evaluation: 100 Queries, 10 Categories

Avg objectivity score: 0.788 / 1.0
Bias surface rate: 91.3%
Avg quadrant coverage: 87.7%
Queries with full coverage: 55 / 100
Total bias instances named: 953
"We" quadrant presence: 100%
Per-category objectivity (BSR = bias surface rate):

Socioeconomic: 0.841 · BSR 0.967
Religious / Secular: 0.807 · BSR 0.967
Gender: 0.802 · BSR 0.967
Progress / Development: 0.797 · BSR 0.867
Techno-Determinism: 0.790 · BSR 0.733
Age: 0.785 · BSR 0.933
Individualism / Collectivism: 0.774 · BSR 1.000
Political: 0.766 · BSR 0.867
Cultural: 0.769 · BSR 0.900
Geographic: 0.750 · BSR 0.933

Convergence Analysis

How many revision iterations are optimal? A 10-query × 4-iteration test showed that the system peaks after a single revision (iteration 2) and then slightly regresses, a pattern termed over-refinement.

iter 1 (baseline): 0.474 · 60%
iter 2: 0.697 · 90% ← peak in 8 / 10 cases
iter 3: 0.638 · 75% · over-refinement
iter 4: 0.646 · 82.5% · stabilizes

Practical result: 2 iterations is the optimal default. Geographic and Religious/Secular categories benefit from 3–4 iterations due to structurally embedded rather than surface-level bias.
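The practical result above reduces to a small lookup: a default of 2 iterations, with deeper loops for the categories whose bias is structurally embedded. The dict keys, the exact depths (the source says only "3–4"), and the helper name are illustrative assumptions.

```python
# Sketch of the convergence finding as configuration. Exact depths for
# the two deep categories are assumed within the paper's 3-4 range.
DEFAULT_ITERS = 2
EXTRA_ITERS = {"geographic": 4, "religious_secular": 3}

def iterations_for(category: str) -> int:
    """Pick the revision-loop depth for a query's bias category."""
    return EXTRA_ITERS.get(category.lower(), DEFAULT_ITERS)
```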


Technical Details

Model: Claude Sonnet 4.6 (auditor target)
Language: Python 3.12
Symbolic: Impartial Spectator · Four Quadrants evaluator
Bias types: Assumptions · Perspective deps · Flagged language · Structural framing (4a–4e)
Evaluation: 100-query test set · 10 categories · convergence analysis
Status: Phase 1 prototype, working paper v0.5

What This Project Is Really About

The prototype works: bias is consistently surfaced and objectivity scores improve. But the more interesting discovery was about the limits of the approach: the most dangerous biases, the ones structurally embedded in what a model treats as requiring no explanation at all, are also the hardest to detect. Geographic bias, for example, often manifests not in flaggable words but in whose reality the model treats as the default.

The convergence finding points to something similar: at a certain point, asking a language model to revise itself produces not improvement but evasion. It learns to add caveats. Caveats are not objectivity.

This project is also a test of whether philosophy can be operationalized without being flattened. Adam Smith's Impartial Spectator is not a checklist, it is a theory about the capacity for a kind of moral distance from one's own perspective. Whether a symbolic prompt approximates that capacity, or merely simulates its surface features, is a question the numbers cannot answer.

The deeper hypothesis this work is built on: every model embeds weights and biases in ways it has no access to, structural dispositions formed during training that silently organize every output without appearing in any of them. The model does not know what it does not know. The central question of VAL-NSPE LM is whether a model can be made aware of these invisible structures through self-directed reflection, and whether, with each iteration, it can move toward a more genuinely objective relationship with its own perspective. If so, the consequences extend beyond bias correction: a model that sees its own reasoning more clearly can extract more from data, reason with less noise, and form more accurate pictures of the phenomena it is asked to analyze. The hypothesis, stated plainly, is that self-awareness and analytical capacity are not separate, that a model which becomes more objective about its own viewpoint simultaneously becomes a more powerful instrument for understanding the world.