TargetMind AI
A bias-aware data optimization system where seven self-aware agents analyze a customer dataset, evaluate their own bias contributions, critique each other, correct themselves, and produce two separate final reports, all without a single LLM call.
The Problem
Most customer scoring systems favor high spenders, which sounds logical, until you realize that spending capacity is largely determined by income. A system that scores high-income users higher simply because they spend more is not measuring customer value; it is measuring wealth.
TargetMind AI was built to address this: to separate genuine behavioral signals from demographic proxies, and to make every step of that process auditable. But the deeper question it tries to answer is: can a data pipeline be made aware of the bias it introduces at each step, not just after the fact, but while it is happening?
7-Agent Self-Aware Pipeline
Each agent performs its task and then evaluates its own bias contribution. A shared pipeline log accumulates all seven self-assessments. The Critique agent reads the full log and validates, or challenges, what each agent reported about itself.
Data Cleaning
Self-evaluates bias contribution. Removes duplicates, fixes negative values, applies IQR outlier detection, and fills missing values with the median (numeric) or mode (categorical). After cleaning, it measures how much each decision shifted the demographic distribution and assigns itself a bias contribution score between 0 and 1.
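A minimal sketch of that self-evaluating cleaning step. The `income_band` column name, the use of total variation distance as the 0-to-1 bias score, and the 1.5-IQR fence are illustrative assumptions, not the project's exact implementation:

```python
import pandas as pd

def clean_and_self_evaluate(df: pd.DataFrame, group_col: str = "income_band"):
    """Clean the dataset, then score how much cleaning shifted demographics.

    Returns the cleaned frame and a bias contribution score in [0, 1]:
    the total variation distance between the group distribution before
    and after cleaning (an assumed scoring rule).
    """
    before = df[group_col].value_counts(normalize=True)

    cleaned = df.drop_duplicates().copy()
    num_cols = cleaned.select_dtypes("number").columns
    # Fix negatives by clipping metric values to zero
    cleaned[num_cols] = cleaned[num_cols].clip(lower=0)

    # IQR outlier removal: keep rows inside the 1.5*IQR fence on every
    # numeric column; rows with missing values survive to be filled below
    keep = pd.Series(True, index=cleaned.index)
    for col in num_cols:
        q1, q3 = cleaned[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= (cleaned[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
                 | cleaned[col].isna())
    cleaned = cleaned[keep].copy()

    # Median fill for numeric columns, mode fill for everything else
    for col in cleaned.columns:
        if col in num_cols:
            cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        elif cleaned[col].isna().any():
            cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])

    after = cleaned[group_col].value_counts(normalize=True)
    shift = before.subtract(after, fill_value=0).abs().sum() / 2
    return cleaned, float(min(shift, 1.0))
```

Measuring the distribution shift on the same `group_col` the later agents audit is what lets the critique step re-check this number independently.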
Segmentation
Self-evaluates representation imbalance. Analyzes the distribution of each demographic group and every metric column, calculates how over- or underrepresented each segment is, and flags its own bias contribution if the representation gap exceeds a threshold.
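One way to sketch that representation check. The uniform-share baseline and the 0.15 threshold are illustrative assumptions:

```python
import pandas as pd

def representation_gaps(df: pd.DataFrame, group_col: str, threshold: float = 0.15):
    """Compare each segment's share against a uniform baseline.

    Returns the signed gap per segment (+ = over-, - = underrepresented),
    the segments whose gap exceeds the threshold, and the agent's own
    bias contribution score (largest absolute gap, capped at 1).
    """
    shares = df[group_col].value_counts(normalize=True)
    expected = 1.0 / len(shares)          # assumed baseline: uniform share
    gaps = shares - expected
    flagged = gaps[gaps.abs() > threshold]
    bias_contribution = float(min(gaps.abs().max(), 1.0))
    return gaps, flagged, bias_contribution
```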
Initial Scoring
Self-evaluates score gaps between groups. Assigns each record a potential score (0–100) using normalized metrics and equal weights, then measures the score gap between every demographic group and identifies which metric is most responsible for that gap. Produces its own bias contribution score.
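The scoring plus self-evaluation might look like this sketch; min-max normalization and the column names are assumptions, while the equal weights and 0–100 scale follow the description above:

```python
import pandas as pd

def initial_scores(df: pd.DataFrame, metric_cols: list, group_col: str):
    """Equal-weight 0-100 scoring, plus the self-evaluated group gap.

    Returns per-record scores, the max-min gap between group mean scores,
    and the metric whose own group gap is widest (the likely driver).
    """
    # Min-max normalize each metric to [0, 1] (assumed normalization)
    mins, maxs = df[metric_cols].min(), df[metric_cols].max()
    norm = (df[metric_cols] - mins) / (maxs - mins)

    scores = norm.mean(axis=1) * 100      # equal weights across metrics

    group_means = scores.groupby(df[group_col]).mean()
    gap = float(group_means.max() - group_means.min())

    # Which metric drives the gap? The one with the widest group spread.
    per_metric_gap = {}
    for col in metric_cols:
        gm = norm[col].groupby(df[group_col]).mean()
        per_metric_gap[col] = gm.max() - gm.min()
    driver = max(per_metric_gap, key=per_metric_gap.get)
    return scores, gap, driver
```

Naming the driver metric, not just the gap, is what gives the critique agent something concrete to reweight later.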
Proxy & Bias Detection
Self-evaluates overlooked relationships. Uses Cramér's V to detect which variables correlate with protected attributes (income, gender, age), flagging them as high-risk proxies. Simultaneously measures demographic score gaps within the high-scoring segment. Asks: which relationships did I miss?
Cross-Agent Critique
Validates all agents' self-reports. Reads the full pipeline log, every agent's self-assessment, then validates or challenges each report. Identifies contradictions, under-reported bias, and overlooked patterns. Proposes specific corrections: new weights, alternative fill strategies, flags to carry forward.
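The validate-or-challenge core of the critique can be sketched as comparing each self-reported score against the critic's own re-measurement. The log format (agent name mapped to a reported score) and the 0.1 tolerance are assumptions for illustration:

```python
def critique(pipeline_log: dict, remeasured: dict, tolerance: float = 0.1) -> dict:
    """Validate or challenge each agent's self-reported bias contribution.

    pipeline_log: agent name -> self-reported bias score (assumed format).
    remeasured:   agent name -> the critic's independent measurement.
    """
    verdicts = {}
    for agent, reported in pipeline_log.items():
        measured = remeasured.get(agent)
        if measured is None:
            verdicts[agent] = "no independent measurement"
        elif measured - reported > tolerance:
            # Under-reporting: the critic found more bias than the agent admitted
            verdicts[agent] = f"challenged: under-reported by {measured - reported:.2f}"
        else:
            verdicts[agent] = "validated"
    return verdicts
```

A real critique would also carry forward the proposed corrections (new weights, fill strategies); this sketch shows only the validation verdicts.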
Corrected Scoring
Measures before/after bias reduction. Applies the corrections proposed by the Critique agent (new metric weights, adjusted thresholds), re-scores the full dataset, and computes the demographic score gap before and after. Reports the exact improvement in points for each demographic dimension.
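A sketch of the re-scoring and before/after comparison, assuming the metrics arrive already normalized to [0, 1]; the weight values in the test are placeholders for whatever the critique proposes:

```python
import pandas as pd

def rescore_with_weights(df: pd.DataFrame, norm_metrics: pd.DataFrame,
                         weights: dict, group_col: str):
    """Re-score with critique-proposed weights and report the gap reduction.

    norm_metrics: metric columns already normalized to [0, 1] (assumed).
    """
    def group_gap(scores: pd.Series) -> float:
        means = scores.groupby(df[group_col]).mean()
        return float(means.max() - means.min())

    equal = norm_metrics.mean(axis=1) * 100              # original equal weights
    w = pd.Series(weights)
    corrected = norm_metrics.mul(w, axis=1).sum(axis=1) / w.sum() * 100

    before, after = group_gap(equal), group_gap(corrected)
    return corrected, {"gap_before": before,
                       "gap_after": after,
                       "improvement_points": before - after}
```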
Final Optimization & Reports
Produces two separate outputs. Builds the optimal customer pool from bias-corrected scores, then generates two independent reports: a Process Report showing each agent's self-evaluation, critique, and corrections across the full pipeline; and an Optimal Pool Report showing the final audience with its demographic distribution and score breakdown.
What Makes This Different
Technical Stack
What I Learned
Building the original pipeline taught me that data pipelines are never neutral: every cleaning decision shapes the outcome downstream. Rebuilding it with self-aware agents taught me something more specific: the agents that introduce the most bias are often the ones that feel the most defensible. Mode-filling missing values is a perfectly reasonable decision. It is also the decision that silently amplifies whichever group is already most represented.
The cross-agent critique step was the most conceptually interesting. An agent reading another agent's self-assessment is not just validation; it is a different perspective on the same decisions. The critique sometimes confirmed what the agent reported. More usefully, it sometimes noticed what the agent failed to mention about itself.
Separating the two final reports, one about the pipeline process and one about the target audience, forced me to think about audience. The process report is for someone who wants to understand how the system works and trust it. The pool report is for someone who wants to act on the results. These are different documents for different purposes, and collapsing them into one would have served neither.
I originally built this with CrewAI agents. The agents were unreliable: sometimes calling the right tool, sometimes writing free-form text instead. Replacing the agent layer with direct Python functions taught me the most durable lesson: for deterministic, auditable data work, you do not need an LLM. The self-awareness in this system comes from measurement and logging, not from language generation.