
Building a Competitor Discovery System: LLMs, Classification, and Guardrails

Building a domain classification system with LLM waterfall prompting, then creating a clever dual-path approach that discovers a website's industry and competitors in seconds, whether it's in our existing dataset or being analyzed for the first time.

When I set out to build automatic competitor discovery for UX Bench, I thought the problem was simple: classify websites by industry, then find 5 similar competitors. I was wrong. What started as a straightforward classification task became a journey through LLM-guided classification, NAICS code limitations, and ultimately a clever shortcut that makes the whole system work in production. Here's how I classified 5,500 domains with 100% 6-digit NAICS granularity (about 4,500 of which made it into production), then layered on custom business verticals to build a system that handles both known and unknown domains intelligently.

The Challenge: Automatically Finding 5 Relevant Competitors

UX Bench helps users benchmark their site's Core Web Vitals against competitors. For that to work, I needed to automatically identify 5 relevant competitors for any website. A user analyzing walmart.com should see target.com and costco.com, not random e-commerce sites. The system needed to work for thousands of known domains AND handle unknown startups users might analyze.

The Requirements

  • Accurate: Walmart's competitors should be Target and Costco, not Etsy and Wayfair
  • Handle known domains: Fast lookup for 4,000+ classified domains
  • Handle unknown domains: Real-time classification when users analyze new sites
  • Nuanced for newer industries: NAICS code 541511 lumps Google, Shopify, and Oracle together even though they aren't competitors; despite updates every 5 years, NAICS still lacks the granularity newer digital business models need
  • Intelligent fallback: Roll up to broader categories if specific ones are too small

This article tells two connected stories: Part 1 covers how I built the classification database (5,500 domains with increasingly granular NAICS codes and custom business verticals). Part 2 shows how I designed an intelligent system that handles both known domains (fast lookup) and unknown domains (brand similarity matching), with automatic rollup strategies when granular matches are sparse.

Final Results

  • ~5,500 domains classified
  • 100% with 6-digit NAICS codes
  • 100% high confidence
  • ~4,500 in production

Why 5,500 became 4,500: After achieving 100% 6-digit NAICS classification with waterfall prompting, we filtered to domains with CrUX data (real user performance metrics), USA-focused English-language sites, and professional-appropriate content. Then we layered on custom business verticals (like "Traditional Banking" vs "Digital Banking") for even more nuanced competitor matching.

Why Domain Classification Is Harder Than It Looks

Before diving into the methods, it's important to understand why using NAICS (North American Industry Classification System) codes for domain classification is tricky.

NAICS Updates Every 5 Years

The system changes regularly (2022, 2017, 2012...). Codes get retired or merged. LLMs trained on older data may use outdated classifications. Example: Code 541513 "Computer Facilities Management Services" was retired in 2017 and merged into 541519.

Regional Variations

While standardized across the US, Canada, and Mexico, some codes are unique to each country. LLMs may conflate different regional versions, leading to classification errors for international companies.

Not Designed for Digital Businesses

NAICS code 541511 "Custom Computer Programming Services" includes Google (search), Shopify (e-commerce), Salesforce (CRM), and Oracle (databases). They're not competitors, yet NAICS sees them as identical.

The solution? Layer semantic "business verticals" on top of NAICS codes. More on that later.

How Text Classification Evolved (Brief History)

Before showing what I actually built, here's a quick look at how classification approaches evolved over the past 30 years. The first three are educational references showing historical context for the waterfall approach I developed.

Historic Alternative: Keyword/Regex Matching (1990s-2000s)

Simple substring matching: if text contains "restaurant" + "menu" → food service. Lightning fast (microseconds) but brittle and requires manual keyword curation for 1,000+ NAICS codes.

⚠️ Initial attempt: Achieved only 4.4% success rate. No semantic understanding, misses synonyms, breaks easily with small text changes.

Historic Alternative: TF-IDF + Similarity (2000s-2010s)

Statistical word importance scoring. Later evolved into supervised ML (Naive Bayes, SVM) but still lacked semantic understanding.

⚠️ Not pursued: No semantic understanding, struggles with context and homonyms.

Tried Initially: Zero-Shot LLM (2020s)

LLMs classify text using pre-trained semantic understanding without examples. Understands that "plumber" relates to "plumbing" and "We help businesses grow online" signals SaaS/marketing.

⚠️ Better but still limited: Can drift without guidance when choosing from 1,000+ categories. Needs structure.

What I Built: Waterfall LLM Classification

Guide the LLM step-by-step: 2-digit sector (20 choices) → 4-digit group (5-10 choices) → 6-digit code (2-5 choices) → business vertical. Prevents drift, improves accuracy.

✅ Used Chat OSS 20B (LM Studio) for 5,500 domains. Slower (25-45s/domain, CPU/RAM-intensive) but free and achieved 100% 6-digit granularity.

Part 1: Building the Classification Database

The first challenge: classify 5,500 domains with high accuracy and granularity. I needed 6-digit NAICS codes (not just broad 2-digit sectors) to enable meaningful competitor matching, then layer on custom business verticals for even more nuance.

I tried keyword/regex matching first, hoping it would be "good enough" to let me focus on building the rest of the competitor discovery tool. It wasn't. With only a 4.4% success rate and no semantic understanding, I needed a better approach. After experimenting with zero-shot LLM classification (which got to 79%), I developed a waterfall approach that achieved 100% 6-digit granularity.

Technology Stack

  • First attempt: Keyword/regex matching - Fast but only 4.4% success rate, no semantic understanding
  • Next attempts: Grok API (xAI) and ChatGPT - Fast, accurate, but costs add up for 5,000+ domains
  • Production method: Chat OSS 20B via LM Studio on local CPU
  • Speed: 25-45 seconds per domain (CPU/RAM-intensive, non-NVIDIA GPU couldn't be utilized)
  • Total time: ~53 hours for 5,500 domains
  • Benefits: Free, full control, easy monitoring

Input Data

  • Crawled: Homepage, about page, products overview
  • Extracted: Title, meta description, H1 tags, body text
  • Average: 918 words per domain
  • Uploaded: Official NAICS 2022 reference docs to help LLM
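For illustration, here is a minimal sketch of that extraction step, assuming the requests and BeautifulSoup packages; the field names, user agent, and truncation limit are illustrative rather than the production crawler.

```python
# A minimal sketch of the extraction described above, not the production crawler.
# Assumes the `requests` and `beautifulsoup4` packages; field names are illustrative.
import requests
from bs4 import BeautifulSoup

def extract_page_text(url: str, timeout: int = 10) -> dict:
    """Fetch one page and pull out the fields used for classification."""
    html = requests.get(url, timeout=timeout, headers={"User-Agent": "classifier-bot"}).text
    soup = BeautifulSoup(html, "html.parser")

    meta = soup.find("meta", attrs={"name": "description"})
    return {
        "title": soup.title.get_text(strip=True) if soup.title else "",
        "meta_description": meta.get("content", "").strip() if meta else "",
        "h1": [h.get_text(strip=True) for h in soup.find_all("h1")],
        # Keep body text short enough for a small local model's context window.
        "body_text": " ".join(soup.get_text(separator=" ").split())[:6000],
    }
```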

Each Method Brought Significant Improvements to Classification Accuracy

1. Keyword/Regex Matching: 4.4% reached 6-digit specific codes (85% stopped at broad 2-digit sectors, 10.6% at 4-digit industry groups)
2. Zero-Shot LLM: 79% reached 6-digit specific codes (16% stopped at broader levels)
3. Waterfall LLM Classification: 100% reached 6-digit specific codes ✅

  • Domains classified: 5,500
  • 6-digit codes: 100% (waterfall approach)
  • High confidence: 100% (all 4,500 in production)

Why the LLM Waterfall Approach

Instead of asking the LLM to pick from 1,030 NAICS codes at once, I guided it step-by-step through narrowing choices. This prevents the model from drifting between unrelated categories.

Waterfall Approach to Avoid Model Drift

1. Assign 2-Digit Sector (20 choices). Example: "51 - Information"
2. Assign 4-Digit Industry Group within that sector (5-10 choices). Example: "5112 - Software Publishers"
3. Assign 6-Digit Industry Code within that group (2-5 choices). Example: "511210 - Software Publishers"
4. Assign Business Vertical (for crowded codes like 541511). Example: "E-commerce Platforms" vs "Enterprise Cloud/SaaS" vs "Developer Tools"

Why Waterfall Works

Narrowing choices at each step prevents the LLM from jumping between unrelated categories. It's easier for the model to pick "Information" vs "Manufacturing" (Step 1), then "Software Publishers" vs "Data Processing" within Information (Step 2), than to choose correctly from 1,030 options at once.
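As a concrete illustration of the waterfall, here is a condensed sketch of the stepwise prompting against a local model. It assumes an OpenAI-compatible endpoint (LM Studio exposes one locally) and a pre-built naics_tree mapping sectors to groups to 6-digit codes; the prompts, helper names, and model name are illustrative, not the exact production prompts.

```python
# A condensed sketch of waterfall prompting (illustrative prompts, not the production ones).
# Assumes an OpenAI-compatible local endpoint such as the one LM Studio exposes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
MODEL = "local-model"  # placeholder for whatever model the local server is running

def ask_choice(site_text: str, question: str, options: list[str]) -> str:
    """Ask the model to pick exactly one option from a short, numbered list."""
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[
            {"role": "system", "content": "You classify websites. Reply with the number of one option only."},
            {"role": "user", "content": f"Website text:\n{site_text[:4000]}\n\n{question}\n{numbered}"},
        ],
    )
    digits = "".join(ch for ch in resp.choices[0].message.content if ch.isdigit())
    idx = int(digits) - 1 if digits else 0
    return options[min(max(idx, 0), len(options) - 1)]

def classify_waterfall(site_text: str, naics_tree: dict) -> dict:
    """Narrow the choices step by step: 2-digit sector -> 4-digit group -> 6-digit code."""
    sector = ask_choice(site_text, "Which 2-digit NAICS sector fits best?", list(naics_tree))
    group = ask_choice(site_text, "Which 4-digit industry group fits best?", list(naics_tree[sector]))
    code = ask_choice(site_text, "Which 6-digit code fits best?", naics_tree[sector][group])
    return {"sector": sector, "group": group, "naics_code": code}
```

Pinning temperature to 0 and constraining the reply to a numbered option keeps the model from drifting outside the handful of choices on offer, which is the whole point of the waterfall; the business-vertical step works the same way, just with vertical names as the options.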

Why Business Verticals Matter

Even with 6-digit NAICS codes, some industries contain hundreds of domains that aren't true competitors. These crowded codes lack the granularity needed for meaningful competitor discovery. For example, NAICS 541511 "Custom Computer Programming Services" included over 500 domains ranging from Google to Shopify to Oracle. By adding a semantic "business vertical" layer on top of NAICS, we can finally distinguish true competitors.

Before: All Lumped Together

NAICS 541511: Custom Computer Programming

  • Google (Search)
  • Shopify (E-commerce)
  • Salesforce (CRM)
  • Oracle (Databases)
  • HubSpot (Marketing)
  • Atlassian (Dev Tools)
  • + 494 more domains

❌ These are NOT competitors

After: Broken Into Verticals

  • E-commerce Platforms: Shopify, WooCommerce, BigCommerce, Wix...
  • Enterprise Cloud/SaaS: Salesforce, Oracle, SAP, ServiceNow...
  • Developer Tools: Atlassian, GitHub, GitLab, JetBrains...
  • Search Engines: Google, Bing, DuckDuckGo, Brave...
  • Marketing Automation: HubSpot, Marketo, ActiveCampaign...
  • + 35 more verticals

✅ Now we can find real competitors

The vertical layer was essential: Without it, competitor discovery would suggest Oracle as a Shopify competitor, which makes no sense despite sharing NAICS code 541511. Business verticals bring semantic meaning to rigid classification codes.
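The practical upshot is that the competitor pool is keyed on the NAICS code and vertical together rather than NAICS alone. A hypothetical sketch of that shape (not the production schema):

```python
# Hypothetical record shape; the production store may differ.
from collections import defaultdict

DOMAINS = [
    {"domain": "shopify.com",     "naics": "541511", "vertical": "E-commerce Platforms"},
    {"domain": "bigcommerce.com", "naics": "541511", "vertical": "E-commerce Platforms"},
    {"domain": "oracle.com",      "naics": "541511", "vertical": "Enterprise Cloud/SaaS"},
]

# Competitor candidates are grouped by (6-digit NAICS, business vertical), not NAICS alone,
# so Oracle never shows up as a Shopify competitor despite sharing code 541511.
pool = defaultdict(list)
for row in DOMAINS:
    pool[(row["naics"], row["vertical"])].append(row["domain"])

print(pool[("541511", "E-commerce Platforms")])  # ['shopify.com', 'bigcommerce.com']
```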

Part 2: Designing Intelligent Competitor Selection

After building the classification database, the next challenge was using it in production. UX Bench needed to handle two scenarios: known domains (fast lookup) and unknown domains (requires classification). Here's where I built a clever shortcut.

UX Bench Competitor Discovery Flow

Known Domain Path: User enters "walmart.com" → Database lookup → NAICS 455110 "Department Stores" → Return 5 competitors (⚡ sub-second)

Unknown Domain Path: User enters "new-startup.com" → Crawl + ask the LLM "Which major brands is this similar to?" → Return matched competitors (⚡ 3-5 seconds)

The two-path system leverages pre-classified data for known domains and uses intelligent LLM shortcuts for unknown domains, avoiding expensive re-classification.
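A minimal sketch of that routing, assuming a dict-backed lookup of already-classified domains and a brand-similarity helper like the one sketched a little further down; all names here are illustrative.

```python
# Illustrative routing between the two paths; the dicts stand in for the real database.
from typing import Callable

def find_competitors(
    domain: str,
    known: dict[str, dict],                   # domain -> {"naics": ..., "vertical": ...}
    pool: dict[tuple[str, str], list[str]],   # (naics, vertical) -> classified domains
    brand_match: Callable[[str], str],        # unknown-domain shortcut (sketched below)
    top_n: int = 5,
) -> list[str]:
    if domain not in known:
        # Unknown domain: ask which already-classified brand it most resembles,
        # then reuse that brand's classification instead of re-running the waterfall.
        domain = brand_match(domain)
    rec = known[domain]
    candidates = [d for d in pool[(rec["naics"], rec["vertical"])] if d != domain]
    return candidates[:top_n]
```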

Why This Shortcut Works

Instead of running the full classification for every unknown domain (25-45 seconds, plus NAICS versioning headaches), asking "which major brand is this similar to?" takes 3-5 seconds, sidesteps NAICS versioning entirely, and stays accurate: LLMs excel at brand similarity matching.

  • Faster: 3-5 seconds vs 25-45 seconds
  • Simpler: No NAICS versioning or documentation issues
  • Leverage existing work: Reuses the 5,500 classified domains
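Here is a hedged sketch of that shortcut, reusing the same local endpoint as the waterfall example; the prompt wording and the matching guardrail are illustrative. The model's free-form answer is only accepted if it maps onto a domain that is already in the classified set.

```python
# Illustrative brand-similarity shortcut for unknown domains (not the production prompt).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

def most_similar_known_brand(site_text: str, known_brands: list[str], model: str = "local-model") -> str | None:
    """Ask which major brand this site resembles, then match the answer to the classified set."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": "Reply with a single well-known company domain, e.g. target.com."},
            {"role": "user", "content": f"Which major brand is this website most similar to?\n\n{site_text[:4000]}"},
        ],
    )
    answer = resp.choices[0].message.content.strip().lower()
    # Guardrail: only accept the answer if it corresponds to a domain we have already classified.
    return next((b for b in known_brands if b.lower() in answer), None)
```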

Sourcing Competitors with an Intelligent Rollup Strategy

If not enough competitors are found at the most specific level, the system automatically broadens the search:

1. Start: 6-digit NAICS + Business Vertical (most specific). Example: 541511 "Custom Computer Programming" + "E-commerce Platforms". Found: 15 competitors ✅ Return top 5
2. If < 5 found: roll up to the 4-digit Industry Group. Example: 5415 "Computer Systems Design Services". Found: 80 competitors ✅ Return top 5
3. If still < 5: roll up to the 2-digit Sector. Example: 54 "Professional, Scientific, Technical Services". Found: 500+ competitors ✅ Return top 5
4. Sort by size similarity using Tranco rankings: when many competitors exist, use Tranco rankings (global site popularity) to find similar-sized companies; logarithmic rank distance matches Walmart (rank 150) with Target (rank 180), not a small local retailer (rank 500,000).
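Condensed into code, the rollup plus the size sort might look roughly like this; it assumes each record carries a Tranco rank and leans on the fact that a 6-digit NAICS code starts with its 4-digit group and 2-digit sector. The field names are assumptions, not the production schema.

```python
# Illustrative rollup with Tranco-based size sorting; record fields are assumed.
import math

def rollup_competitors(target: dict, all_domains: list[dict], top_n: int = 5) -> list[str]:
    """Widen from (6-digit NAICS + vertical) to 4-digit group, then 2-digit sector, until enough matches."""
    levels = [
        lambda d: d["naics"] == target["naics"] and d["vertical"] == target["vertical"],
        lambda d: d["naics"][:4] == target["naics"][:4],   # 4-digit industry group
        lambda d: d["naics"][:2] == target["naics"][:2],   # 2-digit sector
    ]
    candidates: list[dict] = []
    for matches in levels:
        candidates = [d for d in all_domains if d["domain"] != target["domain"] and matches(d)]
        if len(candidates) >= top_n:
            break

    # Logarithmic rank distance treats 150 vs 180 as close and 150 vs 500,000 as far,
    # so similarly sized sites float to the top.
    def log_distance(d: dict) -> float:
        return abs(math.log10(d["tranco_rank"]) - math.log10(target["tranco_rank"]))

    return [d["domain"] for d in sorted(candidates, key=log_distance)[:top_n]]
```

For the article's example, |log10(150) - log10(180)| ≈ 0.08 while |log10(150) - log10(500,000)| ≈ 3.5, so Target wins the sort against the small retailer by a wide margin.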

Reflections & Lessons Learned

The system works: sub-second lookups for known domains, 3-5 second classification via brand similarity for unknowns, and intelligent rollup when needed. But competitor discovery has no "end state." Modern business complicates classification:

  • Cross-vertical operations: Amazon operates in retail, cloud, streaming, and logistics. Where does Shopify end and Square begin?
  • Market fluidity: Today's SaaS competitor might be tomorrow's platform partner.
  • Industry convergence: Financial services build payment processors, retailers launch ad platforms.

If you're building a similar system, don't chase perfect classification. Recognize when improvements plateau (keyword matching achieved 4.4%, zero-shot LLM reached 79%, waterfall pushed to 100%), build practical fallbacks (brand similarity for edge cases), and know when to stop refining. The hardest lesson: deciding when continued iteration delivers diminishing returns versus when to move on to higher-impact work. Sometimes "good enough for 90% of cases" is the right answer.

Key Takeaways

  1. Evaluate if your classification framework has sufficient depth. If your framework lumps competitors together (like NAICS 541511 grouping Google, Shopify, and Salesforce), you may need to augment it with semantic layers that better match your domain.
  2. Guide LLMs, don't let them freewheel. Waterfall classification (20 → 5 → 2 choices per step) beats "pick from 1,030" approaches. Narrowing choices prevents the model from drifting between unrelated categories.
  3. Consider local LLMs for prototyping and batch work. If you have a strong GPU, local models let you validate your approach without API costs. Once proven, evaluate whether you need more powerful API models or if your local setup meets production needs.
  4. Step back when you hit an impasse. I spent days trying to optimize the unknown domain classification, even brainstorming with several LLMs. Nothing worked. Taking a step back, I rethought the challenge entirely and found a completely different approach (brand similarity matching). The new solution worked beautifully and became the production implementation.
  5. Know when to stop refining. The hardest decision isn't improving your system, it's recognizing when continued iteration delivers diminishing returns. Keyword matching achieved 4.4%, zero-shot LLM reached 79%, waterfall pushed to 100%. Each improvement had clear value, but chasing perfection beyond "good enough for 90% of cases" often means sacrificing higher-impact work elsewhere.

See It In Action

The classification system and intelligent competitor discovery described in this article power UX Bench. Enter any domain to see the system in action: known domains get instant results via database lookup, unknown domains use the LLM brand similarity shortcut.

Sources & Further Reading