How I Score Marketing Copy with a Brain Model Before It Ships

A brain model that scores marketing copy before it ships. Here is how TRIBE v2 works and why one headline scored 0.72 while another scored 0.38.

I ran two headlines for the same landing page through a predictive brain model. One scored 0.72 for conversion. The other scored 0.38.

Same product. Same audience. Completely different predicted brain response.

The model is called TRIBE v2, and it is the reason I stopped guessing about what copy will work. It does not write headlines. It does not A/B test. It simulates how five specific brain regions respond to a piece of text before a single human reads it.

Here is how I built a pipeline around it, what it measures, and why the difference between 0.72 and 0.38 is not subjective. It is neuroscience.

Why I Built This

Most marketing teams test copy after it ships. They run A/B tests, wait for traffic, and hope one variant outperforms the other by enough to matter.

That is expensive. You are burning ad spend on copy that might fail. You are waiting days for statistical significance. And you are learning what did not work after the budget is already gone.

I wanted a way to know which headline would land before I paid for a single impression. Not a gut feel. Not a best practices checklist. A prediction grounded in how the human brain actually processes text.

TRIBE v2 is that prediction engine. It is open source, built by Meta's research team, and it runs on my Mac Mini at 5:30 every morning while I am still asleep.

What TRIBE Actually Measures

TRIBE v2 is a cortical surface activation predictor. In plain language: you feed it a headline, and it simulates which parts of a human brain light up.

Meta trained it by putting 720 people in fMRI machines and having them read natural text. They recorded which brain regions activated for which words. Then they distilled that massive dataset into a model small enough to run on a normal computer.

The result predicts activation across five functional brain regions:

Language Comprehension

This is your cognitive load meter. Dense jargon, convoluted phrasing, and unnecessary abstraction all show up here. If this score is high, the reader is burning mental calories before they have even understood your offer.

Think of it as a readability score, but for the brain. Not the Flesch-Kincaid formula. Actual predicted neural effort.

Executive Attention

Does the hook grab and hold? This is not about whether the copy is "good." It is about whether the brain allocates sustained focus to it.

Low attention means the reader bounced before the CTA. They did not disagree with you. They just stopped reading. The brain found something more interesting to do.

Reward Valuation

This is the conversion engine. Is the promised outcome worth the effort?

Pain-first hooks score high here because the brain is already imagining the relief. This is why "Stop losing leads to bad landing pages" hits harder than "Our agency builds high-converting landing pages." One triggers anticipation. The other triggers nothing.

The brain is fundamentally motivated by anticipated pleasure and relief from pain. Reward Valuation measures whether your copy activates that motivation.

Conflict Monitoring

This is where friction lives. High conflict means the reader detected a contradiction, an unclear promise, or a sales pitch.

The brain has a pitch detector. When it fires, the reader stops listening and starts defending. Self-promotional copy and aggressive CTAs spike this score. The reader does not articulate why they feel resistance. They just feel it.

Semantic Integration

Does the whole message cohere as a narrative? You can have a great hook and a clear CTA, but if they feel like they are from two different ads, this score drops.

The brain notices incoherence even when the reader cannot articulate why something feels off. It is the reason some landing pages convert poorly despite having a strong headline. The headline promises one thing. The body delivers another.

From Raw Scores to Actionable Intelligence

Raw scores are noise. A 0.47 tells you nothing. So I composite the five brain-region scores into three indices that actually mean something.

Engagement

Will the reader keep reading? The formula: 0.4 × attention + 0.4 × reward + 0.2 × semantic integration. Attention and reward capture interest. Semantic integration stabilizes it.

Clarity

Will the reader understand what you are saying? The formula: 0.5 × ease of comprehension + 0.3 × semantic integration + 0.2 × low friction. If the reader does not understand the words, nothing else matters.

Conversion

Will they act? The formula: 0.4 × reward + 0.35 × low friction + 0.25 × attention. Reward drives action. Low friction removes resistance. Attention is the gatekeeper.

These weights are not arbitrary. I calibrated them against actual performance data from my content pipeline. The formula that worked best for predicting real-world engagement was the one that weighted attention and reward highest, with semantic integration as the stabilizer.
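For the curious, here is a minimal sketch of that compositing step in Python. The dictionary keys, the helper name, and the inversion of the comprehension and conflict activations into "ease" and "low friction" are my own illustration, not TRIBE's actual output format; only the weights come from the formulas above.

```python
# Minimal sketch of the compositing step. Dict keys and the function name are
# illustrative, not TRIBE's output format. All raw scores are assumed to be on
# a 0-1 scale, where "comprehension" and "conflict" read as effort/friction
# (higher = worse), so they get inverted before weighting.

def composite_indices(raw: dict[str, float]) -> dict[str, float]:
    ease = 1.0 - raw["comprehension"]     # effort inverted into ease of comprehension
    low_friction = 1.0 - raw["conflict"]  # conflict inverted into low friction

    return {
        "engagement": 0.4 * raw["attention"] + 0.4 * raw["reward"] + 0.2 * raw["semantic"],
        "clarity":    0.5 * ease + 0.3 * raw["semantic"] + 0.2 * low_friction,
        "conversion": 0.4 * raw["reward"] + 0.35 * low_friction + 0.25 * raw["attention"],
    }

# Example with made-up raw scores, just to show the shape of the output:
print(composite_indices({
    "comprehension": 0.30,  # low predicted effort
    "attention": 0.70,
    "reward": 0.62,
    "conflict": 0.25,
    "semantic": 0.55,
}))
```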

The Morning Pipeline

At 5:30am CT, a cron job fires. It queries my content database for every headline, CTA, and post scheduled but not yet scored. Then it feeds each piece into TRIBE.

Each variant takes about ninety seconds on an M4 Mac Mini. The model outputs five raw scores on a zero-to-one scale. My system composites them, runs an AI recommender for specific fixes, and queries my vector database for historically similar high-performing copy.

Everything writes back to the database. I wake up to a scored queue.
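If you are wiring up something similar, the job itself is not complicated. Here is a rough sketch of the loop, assuming a SQLite content table and a score_headline() wrapper around TRIBE inference; every name here is a stand-in for my actual setup, not a reference implementation.

```python
# Rough shape of the 5:30am scoring job, triggered by cron. The schema, table
# and column names, and score_headline() are placeholders for my real setup.
import sqlite3

def score_headline(text: str) -> dict[str, float]:
    """Placeholder for TRIBE inference: returns the five raw region scores (0-1)."""
    raise NotImplementedError

def run_morning_pipeline(db_path: str = "content.db") -> None:
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, text FROM copy_queue WHERE scored_at IS NULL AND scheduled = 1"
    ).fetchall()

    for item_id, text in rows:
        raw = score_headline(text)    # ~90 seconds per variant on an M4 Mac Mini
        idx = composite_indices(raw)  # the compositing sketch from the previous section
        # (AI recommender and vector-database lookup for similar high performers omitted)
        conn.execute(
            "UPDATE copy_queue SET engagement = ?, clarity = ?, conversion = ?, "
            "scored_at = datetime('now') WHERE id = ?",
            (idx["engagement"], idx["clarity"], idx["conversion"], item_id),
        )
    conn.commit()
    conn.close()
```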

Scores do not sit in a spreadsheet. They trigger action. Low clarity routes to my Content Director for a rewrite. High conflict routes to my UI/UX Designer for a friction audit. Low conversion but high engagement means the copy is working and the landing page is breaking, so it routes to my Marketing Technologist.

There is a human approval gate. I see every recommendation before anything routes.
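The routing layer is just thresholded rules on top of those scores. A sketch, with cutoff values invented for illustration; my real thresholds are calibrated against historical performance, and nothing executes until I approve it.

```python
# Thresholded routing over composite and raw scores. The cutoffs below are
# invented for illustration; the real ones are calibrated against historical
# performance. Every action is queued as a recommendation pending approval.
def route(idx: dict[str, float], raw: dict[str, float]) -> list[str]:
    actions = []
    if idx["clarity"] < 0.5:
        actions.append("content_director: rewrite for clarity")
    if raw["conflict"] > 0.6:
        actions.append("uiux_designer: friction audit")
    if idx["conversion"] < 0.5 and idx["engagement"] > 0.6:
        actions.append("marketing_technologist: landing page review")
    return actions  # nothing routes until a human signs off
```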

This is not replacing the marketer. It is giving the marketer a preview of what the reader's brain will do.

The Showdown: 0.72 vs 0.38

Here is the concrete example that made me trust this system.

Variant A: "Stop losing leads to bad landing pages." Conversion score: 0.72.

Variant B: "Our agency builds high-converting landing pages." Conversion score: 0.38.

Variant A wins on Reward Valuation. 0.62 versus 0.28. The pain-first hook activates the brain's reward anticipation. The reader is already imagining the fix before they have clicked.

Variant B loses on Conflict Monitoring. Self-promotion triggers the brain's pitch detector. The reader hears "sales pitch" before they hear "solution." The brain resists before the value proposition lands.

You might have picked Variant B because it sounds more professional. The brain disagrees. And now I have numbers to prove it.

What I Would Do Differently

A few honest notes.

TRIBE is released under a CC BY-NC license, so it stays internal research and development only. I do not expose these scores to clients. This is my calibration layer, not a client deliverable.

The model is the easy part. The wiring is what took the time: the vector database, the content database, the routing layer, the approval gate.

My first failure was overfitting. Early iterations scored everything high because the model learned my writing style. Every headline looked brilliant. I had to pull in external copy variants from other industries to calibrate against real diversity.

And if I started over, I would build the composite scoring formulas first. Raw activations are noise until you know what you are listening for.

What Is Next

I am building three things on top of this.

Signal context layer. Reddit sentiment from communities feeding into how the recommender frames its advice. If r/n8n is frustrated with a system, the recommender should know that before suggesting a hook.

Batch scoring for full landing page experiences. Score the hero, the subhead, the CTA, and the form as one integrated experience. A headline can score 0.72 and still fail if the landing page contradicts it.

Integration with Element Pb. Every lead magnet gets a TRIBE score before it ships.

The Real Question

If you are building with AI agents, the question is not what model you use. It is what you do with the output.

A brain model that predicts engagement is interesting. A brain model that routes specific recommendations to the right agent, grounded in historical performance, with a human in the loop. That is useful.

The difference between 0.72 and 0.38 is not a matter of opinion. It is a matter of neuroscience. And now that I can see it before I ship, I do not write copy the same way.