How accurate is retro-identification?

Retro-identification typically recovers about 70–85% of the contacts that live tracking would have tagged. Most of the miss is mobile ChatGPT traffic bucketed as Direct, which has no recoverable referrer. Don't expect 100%, and label your number as a lower bound when presenting to finance.
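
If the 70–85% figure is read as the share of AI-referred contacts the retro pass recovers (an interpretation, not something the figure itself guarantees), the count you found implies a rough range for the true population. A minimal sketch; the function name and the example count are hypothetical:

```python
def implied_true_range(found, recall_low=0.70, recall_high=0.85):
    """Return (lower_bound, est_min, est_max) for the true AI-referred count.

    Assumes the retro pass recovers between recall_low and recall_high
    of all AI-referred contacts. `found` is the retro-tagged count.
    """
    est_min = found / recall_high  # best case: few contacts were missed
    est_max = found / recall_low   # worst case: more contacts were missed
    return found, est_min, est_max


# Illustrative input only: 119 retro-tagged contacts.
lower, est_min, est_max = implied_true_range(119)
print(f"report {lower} as the lower bound; true count is roughly "
      f"{est_min:.0f} to {est_max:.0f}")
```

The number to report is still the lower bound; the range is context for the conversation, not a figure to book.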

What about anonymous LLM traffic?

Anonymous AI-referred sessions (no form fill) don't appear in your contact database at all — they're in your web analytics only. To pair the populations, use HubSpot's Marketing Analytics > Sources report cross-referenced with your contact list count.

How far back can I go?

Practically: 12 months. ChatGPT consumer launch was November 2022 but referral traffic was minimal until late 2023. Going back beyond ~Q2 2024 yields almost no AI-referred contacts and isn't worth the workflow cost.

Will retro-tagging mess up Original Source for these contacts?

No — retro-tagging only writes to your custom ai_source property. HubSpot's built-in Original Source field is locked at first-touch and can't be edited. Your existing reports are unaffected.

How to identify LLM-referred contacts in HubSpot (2026)

Once your AI source properties are set up, identifying LLM-referred contacts going forward is trivial. The harder problem is the contacts already in HubSpot before you set up tracking. Here's how to identify both populations and what to do with them.

Required tools

HubSpot Marketing Hub Professional (lists + workflows)
The seven AI source properties from the source-properties guide
30 minutes for the manual review of edge cases
Optional: Lantern, which automates the retro-tagging workflow during onboarding

The steps

Build the 'going forward' list: ai_source is any known engine, active list

Lists > Create > Active list. Filter: ai_source is any of chatgpt, perplexity, claude, gemini, copilot. Save as 'AI-referred contacts (live)'. Because it is an active list, it updates automatically as new AI-referred visitors submit your forms.

Build the retro list: Original Source = Other Campaigns AND Original Source Drill-Down 1 contains AI hostnames

For contacts created before you set up custom AI properties, the breadcrumb is in HubSpot's default Drill-Downs. Filter: Original Source = Other Campaigns AND Original Source Drill-Down 1 contains any of chatgpt, perplexity, claude, gemini, openai, anthropic. This catches ~70–80% of historical AI-referred contacts.
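
The same filter can be checked offline against a contact export before you build the list. This is a minimal sketch, assuming you have each contact's Original Source and Drill-Down 1 as plain strings; the function name and the keyword-to-engine mapping are illustrative choices, not HubSpot features:

```python
# Keyword found in Drill-Down 1 -> value to write into ai_source.
# Folding openai into chatgpt and anthropic into claude is an assumption.
ENGINE_KEYWORDS = {
    "chatgpt": "chatgpt",
    "openai": "chatgpt",
    "perplexity": "perplexity",
    "claude": "claude",
    "anthropic": "claude",
    "gemini": "gemini",
}


def engine_from_drilldown(original_source, drilldown_1):
    """Return the engine name for a retro-list contact, or None if no match."""
    if (original_source or "").strip().lower() != "other campaigns":
        return None
    value = (drilldown_1 or "").lower()
    for keyword, engine in ENGINE_KEYWORDS.items():
        if keyword in value:  # same "contains" logic as the list filter
            return engine
    return None


print(engine_from_drilldown("Other Campaigns", "chatgpt.com"))
```

Running this over an export first gives you a count to compare against the list size HubSpot reports, which is a cheap sanity check on the filter.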

Cross-reference with Document Referrer if you stored it

If your forms captured Referrer URL into a custom property (or via HubSpot's hs_analytics_first_referrer field), filter contacts where Referrer contains chatgpt.com OR chat.openai.com OR perplexity.ai OR claude.ai OR gemini.google.com. This catches another 10–15% the Drill-Down missed.
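
A stored referrer can also be matched on its hostname rather than with a loose "contains" test, which avoids false hits on lookalike domains. A minimal sketch, assuming the referrer is available as a URL string in an export; the function name and mapping are hypothetical:

```python
from urllib.parse import urlparse

# Hostnames from the filter above, mapped to ai_source values.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}


def engine_from_referrer(referrer):
    """Return the engine for a stored referrer URL, or None if it is not an AI host."""
    if not referrer:
        return None
    # urlparse only fills in hostname when a scheme or leading '//' is present.
    candidate = referrer if "//" in referrer else "//" + referrer
    host = (urlparse(candidate).hostname or "").lower()
    for ai_host, engine in AI_REFERRER_HOSTS.items():
        # Match the host itself or any subdomain (for example www.perplexity.ai).
        if host == ai_host or host.endswith("." + ai_host):
            return engine
    return None


print(engine_from_referrer("https://www.perplexity.ai/search?q=aeo"))
```

The stricter match is a design choice: it will miss a few malformed referrers that a "contains" filter would catch, in exchange for fewer contacts tagged by accident.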

Manually review the 'maybe AI' edge cases: Direct traffic from sessions with AI-search-style URLs

Many AI-referred contacts show as Direct because mobile ChatGPT strips the referrer. The tell: a Direct contact whose first page seen is a long-tail informational URL no one would type directly (for example '/learn/aeo-pipeline-attribution/'). Pull a list of these, review it by hand, and set ai_source = chatgpt manually on the obvious ones only.
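
Pulling the review list can be scripted, as long as the script only flags candidates and a person makes the final call. A minimal sketch, assuming path depth is a usable proxy for "long-tail URL no one would type"; the function name and the min_depth threshold are hypothetical and should be tuned to your site structure:

```python
from urllib.parse import urlparse


def is_maybe_ai(original_source, first_page_url, ai_source=None, min_depth=2):
    """Flag a Direct contact for manual review. Never tags anything itself."""
    if ai_source:  # already tagged by the live or retro pass
        return False
    if (original_source or "").strip().lower() != "direct traffic":
        return False
    path = urlparse(first_page_url or "").path
    segments = [s for s in path.split("/") if s]
    # Deep paths are unlikely to be typed by hand; shallow ones often are.
    return len(segments) >= min_depth


print(is_maybe_ai("Direct Traffic",
                  "https://example.com/learn/aeo-pipeline-attribution/"))
```

Treat the output as a queue for the manual review, not as evidence on its own.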

Backfill the ai_source property on the historical contacts via workflow

Workflow: enroll contacts from the retro list, then set ai_source to the engine name found in Drill-Down 1 (one branch per engine). Once this runs, the historical contacts join the live list and your reports include the full population.
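
If you would rather backfill from a script than a workflow, the update can be prepared in batches. This is only a sketch of the request bodies and sends nothing: the {"inputs": [{"id", "properties"}]} shape and the 100-record batch size are assumptions about HubSpot's CRM batch update endpoint that you should confirm against the current API documentation, and the function name is hypothetical.

```python
def build_backfill_batches(tagged, batch_size=100):
    """Turn [(contact_id, engine), ...] into batch-update request bodies.

    Only builds payloads; sending them is left to your own HTTP client.
    """
    inputs = [
        {"id": str(contact_id), "properties": {"ai_source": engine}}
        for contact_id, engine in tagged
        if engine  # skip contacts where no engine was identified
    ]
    return [
        {"inputs": inputs[i:i + batch_size]}
        for i in range(0, len(inputs), batch_size)
    ]


# Illustrative input: 250 tagged contacts plus one with no engine.
batches = build_backfill_batches([(i, "chatgpt") for i in range(250)] + [(999, None)])
print(len(batches), [len(b["inputs"]) for b in batches])
```

Whichever route you take, only ai_source is written, so the built-in source fields stay untouched.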

Re-run your AEO ROI report including the retroactively tagged contacts

Most teams discover their AEO-attributed ARR jumps 30–50% when they include the retro-tagged population. This is the number you take to the CFO — not the artificially deflated 'since we started tracking' number.
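
The jump is simple arithmetic on two ARR totals, which is worth showing explicitly in the report. A minimal sketch with made-up numbers; the function name and both inputs are illustrative:

```python
def arr_uplift(live_arr, retro_arr):
    """Return (total_arr, uplift_pct) after adding retro-tagged ARR."""
    total = live_arr + retro_arr
    # Uplift is measured against the 'since we started tracking' number.
    uplift_pct = 100.0 * retro_arr / live_arr if live_arr else float("inf")
    return total, uplift_pct


# Illustrative figures only: 400k tracked live, 160k recovered retroactively.
total, uplift = arr_uplift(400_000, 160_000)
print(f"total AEO-attributed ARR: {total:,}, uplift: {uplift:.0f}%")
```

Reporting the live and retro components separately, alongside the total, makes the number easier to defend if finance asks how it was built.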

Common mistakes

Skipping the retro pass and undersizing AEO impact in the first quarterly report — you'll never get those contacts back into the data once you forget about them.
Tagging Direct traffic as AI without evidence — leads to AEO-impact inflation and a credibility hit when finance audits.
Forgetting Anthropic and OpenAI as referrer hostnames (vs claude.ai and chatgpt.com) — Anthropic logged-in traffic shows as anthropic.com, not claude.ai.
Using a static list instead of an active list — static lists don't update, so you'll forget to refresh and your reports go stale.

Where this fits in the AEO pipeline attribution stack

The steps above are one link in a longer chain. In order: you pick prompts to monitor, you track AI-referred sessions, you tag contacts in your CRM, you roll attribution up to the Deal object, you report pipeline dollars to the CFO. If you skip any link, the chain breaks and the number you quote to finance can't be defended in an audit.

If you're still evaluating which tool to run this workflow on, Lantern's AEO tool comparison hub has honest head-to-head pages for Profound, Scrunch, Peec AI, AthenaHQ, and HubSpot's own AEO product — scored on the dimensions that matter for a CMO buyer (CRM integration depth, reporting quality, prompt-scaling economics).

If you're about to walk this work into a budget review, the CFO's Guide to AEO Budget Defense has the memo template, the five-slide deck structure, the attribution-math cheat sheet, and the three most-common CFO objections with counter-arguments. It's the long-form companion to this how-to and was written for the renewal conversation specifically.

The operational rhythm that works: run the steps above once to set up, then review the output monthly in a 15-minute standing meeting with your Head of Growth and RevOps lead. Quarterly, re-audit your prompt list, your content backlog, and your attribution lookback window. Annually, present the full-year AEO ROI trend to the board. That cadence is what separates teams who ship an AEO dashboard once from teams who run AEO as an ongoing budget-defensible channel.
