TargetMind AI
A bias-aware data optimization system where seven self-aware agents analyze a customer dataset, evaluate their own bias contributions, critique each other, correct themselves, and produce two separate final reports, all without a single LLM call.
The Problem
Most customer scoring systems favor high spenders, which sounds logical, until you realize that spending capacity is largely determined by income. A system that scores high-income users higher simply because they spend more is not measuring customer value; it is measuring wealth.
TargetMind AI was built to address this: to separate genuine behavioral signals from demographic proxies, and to make every step of that process auditable. But the deeper question it tries to answer is: can a data pipeline be made aware of the bias it introduces at each step, not just after the fact, but while it is happening?
7-Agent Self-Aware Pipeline
Each agent performs its task and then evaluates its own bias contribution. A shared pipeline log accumulates all seven self-assessments. The Critique agent reads the full log and validates, or challenges, what each agent reported about itself.
Data Cleaning
Self-evaluates bias contribution. Removes duplicates, fixes negative values, applies IQR outlier detection, and fills missing values with the median (numeric) or mode (categorical). After cleaning, it measures how much each decision shifted the demographic distribution and assigns itself a bias contribution score between 0 and 1.
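A minimal sketch of that self-evaluating cleaning step. The `income_band` column name, the use of total variation distance as the 0-to-1 bias score, and the 1.5-IQR fence are illustrative assumptions, not the project's exact implementation:

```python
import pandas as pd

def clean_and_self_evaluate(df: pd.DataFrame, group_col: str = "income_band"):
    """Clean the dataset, then score how much cleaning shifted demographics.

    Returns the cleaned frame and a bias contribution score in [0, 1]:
    the total variation distance between the group distribution before
    and after cleaning (an assumed scoring rule).
    """
    before = df[group_col].value_counts(normalize=True)

    cleaned = df.drop_duplicates().copy()
    num_cols = cleaned.select_dtypes("number").columns
    # Fix negatives by clipping metric values to zero
    cleaned[num_cols] = cleaned[num_cols].clip(lower=0)

    # IQR outlier removal: keep rows inside the 1.5*IQR fence on every
    # numeric column; rows with missing values survive to be filled below
    keep = pd.Series(True, index=cleaned.index)
    for col in num_cols:
        q1, q3 = cleaned[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= (cleaned[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
                 | cleaned[col].isna())
    cleaned = cleaned[keep].copy()

    # Median fill for numeric columns, mode fill for everything else
    for col in cleaned.columns:
        if col in num_cols:
            cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        elif cleaned[col].isna().any():
            cleaned[col] = cleaned[col].fillna(cleaned[col].mode().iloc[0])

    after = cleaned[group_col].value_counts(normalize=True)
    shift = before.subtract(after, fill_value=0).abs().sum() / 2
    return cleaned, float(min(shift, 1.0))
```

Measuring the distribution shift on the same `group_col` the later agents audit is what lets the critique step re-check this number independently.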
Segmentation
Self-evaluates representation imbalance. Analyzes the distribution of each demographic group and every metric column, calculates how over- or underrepresented each segment is, and flags its own bias contribution if the representation gap exceeds a threshold.
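One way to sketch that representation check. The uniform-share baseline and the 0.15 threshold are illustrative assumptions:

```python
import pandas as pd

def representation_gaps(df: pd.DataFrame, group_col: str, threshold: float = 0.15):
    """Compare each segment's share against a uniform baseline.

    Returns the signed gap per segment (+ = over-, - = underrepresented),
    the segments whose gap exceeds the threshold, and the agent's own
    bias contribution score (largest absolute gap, capped at 1).
    """
    shares = df[group_col].value_counts(normalize=True)
    expected = 1.0 / len(shares)          # assumed baseline: uniform share
    gaps = shares - expected
    flagged = gaps[gaps.abs() > threshold]
    bias_contribution = float(min(gaps.abs().max(), 1.0))
    return gaps, flagged, bias_contribution
```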
Initial Scoring
Self-evaluates score gaps between groups. Assigns each record a potential score (0–100) using normalized metrics and equal weights, then measures the score gap between every demographic group and identifies which metric is most responsible for that gap. Produces its own bias contribution score.
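The scoring plus self-evaluation might look like this sketch; min-max normalization and the column names are assumptions, while the equal weights and 0–100 scale follow the description above:

```python
import pandas as pd

def initial_scores(df: pd.DataFrame, metric_cols: list, group_col: str):
    """Equal-weight 0-100 scoring, plus the self-evaluated group gap.

    Returns per-record scores, the max-min gap between group mean scores,
    and the metric whose own group gap is widest (the likely driver).
    """
    # Min-max normalize each metric to [0, 1] (assumed normalization)
    mins, maxs = df[metric_cols].min(), df[metric_cols].max()
    norm = (df[metric_cols] - mins) / (maxs - mins)

    scores = norm.mean(axis=1) * 100      # equal weights across metrics

    group_means = scores.groupby(df[group_col]).mean()
    gap = float(group_means.max() - group_means.min())

    # Which metric drives the gap? The one with the widest group spread.
    per_metric_gap = {}
    for col in metric_cols:
        gm = norm[col].groupby(df[group_col]).mean()
        per_metric_gap[col] = gm.max() - gm.min()
    driver = max(per_metric_gap, key=per_metric_gap.get)
    return scores, gap, driver
```

Naming the driver metric, not just the gap, is what gives the critique agent something concrete to reweight later.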
Proxy & Bias Detection
Self-evaluates overlooked relationships. Uses Cramér's V to detect which variables correlate with protected attributes (income, gender, age), flagging them as high-risk proxies. Simultaneously measures demographic score gaps within the high-scoring segment. Asks: which relationships did I miss?
Cross-Agent Critique
Validates all agents' self-reports. Reads the full pipeline log, every agent's self-assessment, then validates or challenges each report. Identifies contradictions, under-reported bias, and overlooked patterns. Proposes specific corrections: new weights, alternative fill strategies, flags to carry forward.
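The validate-or-challenge core of the critique can be sketched as comparing each self-reported score against the critic's own re-measurement. The log format (agent name mapped to a reported score) and the 0.1 tolerance are assumptions for illustration:

```python
def critique(pipeline_log: dict, remeasured: dict, tolerance: float = 0.1) -> dict:
    """Validate or challenge each agent's self-reported bias contribution.

    pipeline_log: agent name -> self-reported bias score (assumed format).
    remeasured:   agent name -> the critic's independent measurement.
    """
    verdicts = {}
    for agent, reported in pipeline_log.items():
        measured = remeasured.get(agent)
        if measured is None:
            verdicts[agent] = "no independent measurement"
        elif measured - reported > tolerance:
            # Under-reporting: the critic found more bias than the agent admitted
            verdicts[agent] = f"challenged: under-reported by {measured - reported:.2f}"
        else:
            verdicts[agent] = "validated"
    return verdicts
```

A real critique would also carry forward the proposed corrections (new weights, fill strategies); this sketch shows only the validation verdicts.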
Corrected Scoring
Measures before/after bias reduction. Applies the corrections proposed by the Critique agent (new metric weights, adjusted thresholds), re-scores the full dataset, and computes the demographic score gap before and after. Reports the exact improvement in points for each demographic dimension.
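A sketch of the re-scoring and before/after comparison, assuming the metrics arrive already normalized to [0, 1]; the weight values in the test are placeholders for whatever the critique proposes:

```python
import pandas as pd

def rescore_with_weights(df: pd.DataFrame, norm_metrics: pd.DataFrame,
                         weights: dict, group_col: str):
    """Re-score with critique-proposed weights and report the gap reduction.

    norm_metrics: metric columns already normalized to [0, 1] (assumed).
    """
    def group_gap(scores: pd.Series) -> float:
        means = scores.groupby(df[group_col]).mean()
        return float(means.max() - means.min())

    equal = norm_metrics.mean(axis=1) * 100              # original equal weights
    w = pd.Series(weights)
    corrected = norm_metrics.mul(w, axis=1).sum(axis=1) / w.sum() * 100

    before, after = group_gap(equal), group_gap(corrected)
    return corrected, {"gap_before": before,
                       "gap_after": after,
                       "improvement_points": before - after}
```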
Final Optimization & Reports
Produces two separate outputs. Builds the optimal customer pool from bias-corrected scores, then generates two independent reports: a Process Report showing each agent's self-evaluation, critique, and corrections across the full pipeline; and an Optimal Pool Report showing the final audience with its demographic distribution and score breakdown.
What Makes This Different
Technical Stack
What I Learned
Building the original pipeline taught me that data pipelines are never neutral: every cleaning decision shapes the outcome downstream. Rebuilding it with self-aware agents taught me something more specific: the agents that introduce the most bias are often the ones that feel the most defensible. Mode-filling missing values is a perfectly reasonable decision. It is also the decision that silently amplifies whichever group is already most represented.
The cross-agent critique step was the most conceptually interesting. An agent reading another agent's self-assessment is not just validation; it is a different perspective on the same decisions. The critique sometimes confirmed what the agent reported. More usefully, it sometimes noticed what the agent failed to mention about itself.
Separating the two final reports, one about the pipeline process and one about the target audience, forced me to think about audience. The process report is for someone who wants to understand how the system works and trust it. The pool report is for someone who wants to act on the results. These are different documents for different purposes, and collapsing them into one would have served neither.
I originally built this with CrewAI agents. The agents were unreliable: sometimes calling the right tool, sometimes writing free-form text instead. Replacing the agent layer with direct Python functions taught me the most durable lesson: for deterministic, auditable data work, you do not need an LLM. The self-awareness in this system comes from measurement and logging, not from language generation.