Static Encyclopedia GEO Optimization SOP

---
name: static-encyclopedia-geo-optimization
slug: static-encyclopedia-geo-optimization
description: >-
  Complete GEO (Generative Engine Optimization) pipeline for static HTML
  industry encyclopedia sites. 7-module SOP adapted from the "AI Traffic +471%"
  methodology. Batch-optimizes 100+ static HTML pages for AI search
  visibility (ChatGPT Search, Perplexity, Gemini, Claude).
version: 1.1.0
tags: [geo, generative-engine-optimization, static-html, seo, ai-search,
  schema, eeat, semantic-html, batch-processing]
---

# Static Encyclopedia GEO Optimization Pipeline

Batch GEO optimization for static HTML industry encyclopedia sites (lighting, packaging, furniture, etc.). Based on the 7-module GEO SOP methodology reported to increase AI-platform traffic by 471%.

## When This Skill Is Relevant

Trigger when:
- User has a static HTML encyclopedia site (e.g. `/encyclopedia/lighting/`)
- User wants to optimize 100+ pages for AI search visibility
- User needs to batch-apply SEO/GEO improvements across an entire site
- User is building a new vertical encyclopedia and wants it GEO-ready from day one

## ⚠️ Critical Constraint: "Public Content Only Shows Results, Not Methods"

When the user asks about what GEO optimization does for their clients / public-facing content, **never describe the technical steps or tools used**. Only describe the **results and outcomes**:

- ❌ "We added FAQPage Schema to 21 pages and Quick Answer boxes to 115 pages"
- ✅ "Your encyclopedia content is now structured so AI search engines can directly extract and cite your information"

- ❌ "We replaced div tags with semantic HTML elements like main, article, aside"
- ✅ "Your pages are technically optimized for AI crawlers to better understand and prioritize your content"

- ❌ "We set up a cron job at 9 AM to generate articles and your local agent publishes at 10 AM"
- ✅ "Content creation and multi-platform distribution are fully automated"

This constraint applies to: step-by-step instructions, specific tool names, architecture details, and technical implementation methods.

## Architecture

The encyclopedia is a static HTML site in a directory structure:
```
/var/www/[domain]/landing/encyclopedia/[topic]/
├── index.html
├── products/
│   ├── index.html
│   └── led-downlights-guide.html
├── parameters/
├── scenes/
├── troubleshooting/
├── standards/
├── research/
├── us/ (city directories — skip)
├── answers/ (GEO Prompt pages — NEW)
├── sitemap.xml
├── llms.txt
└── llms-full.txt
```

## The 7-Module GEO SOP

### Module 1: Optimization Goals & Metrics

Define measurable outcomes before starting:

| Metric | Source | Target |
|--------|--------|--------|
| AI Platform referral traffic | GA4 (filter by platform) | +50% in 30 days |
| AI mention rate | Batch Prompt testing | Brand appears in 60%+ of relevant prompts |
| Schema coverage | grep scan of HTML files | 95%+ of content pages |
| llms.txt completeness | Check for all categories | All sections listed with direct URLs |
| sitemap coverage | URL count vs file count | 1:1 ratio |
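
The schema coverage metric can be measured with a short script instead of a manual grep. The following is a minimal sketch, assuming content pages are every `*.html` file except `index.html` and anything under the `us/` city directories; the function name `schema_coverage` and the `skip_dirs` parameter are illustrative, not part of the original pipeline.

```python
import re
from pathlib import Path

# Matches the opening tag of any JSON-LD script block
JSONLD_RE = re.compile(r'<script[^>]+type=["\']application/ld\+json["\']', re.IGNORECASE)


def schema_coverage(root: str, skip_dirs: tuple[str, ...] = ("us",)) -> float:
    """Return the share of content pages under `root` that contain JSON-LD."""
    pages = [
        p for p in Path(root).rglob("*.html")
        if p.name != "index.html"
        and not set(p.relative_to(root).parts[:-1]) & set(skip_dirs)
    ]
    if not pages:
        return 0.0
    with_schema = sum(
        1 for p in pages
        if JSONLD_RE.search(p.read_text(encoding="utf-8", errors="ignore"))
    )
    return with_schema / len(pages)
```

A result of `0.95` or higher would meet the target in the table above.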

### Module 2: Page Type Matrix

Categorize pages by GEO value:

| Type | Example | Schema | Priority |
|------|---------|--------|----------|
| **Product guides** | `products/led-downlights-guide.html` | Article, HowTo | T1 |
| **Technical parameters** | `parameters/color-temperature-cct-explained.html` | Article, FAQPage | T1 |
| **Scene/application guides** | `scenes/bedroom-lighting-ideas.html` | Article | T1 |
| **Troubleshooting guides** | `troubleshooting/led-flickering-causes.html` | FAQPage, HowTo | T1 |
| **Standards & compliance** | `standards/ce-marking-lighting-guide.html` | Article | T1 |
| **Research articles** | `research/*` | Article, ScholarlyArticle | T2 |
| **GEO Prompt pages** | `answers/*` (NEW) | FAQPage | T2 |
| **City directories** | `us/new-york-lighting-stores.html` | LocalBusiness | T3 (skip QA/structure) |
| **Index pages** | `*/index.html` | CollectionPage | Skip content optimization |
| **Utility pages** | `about/`, `contact/`, `privacy/` | WebPage | Skip content optimization |

### Module 3: Batch Content Optimization (Pipeline)

This is the core. Run in this order:

#### Step 1: SEO Title & Meta Description Batch Rewrite
```python
# Script pattern: iterate all HTML files, skip index pages and city directories
# For each page:
#   1. Extract existing title and description
#   2. If title matches generic template pattern → rewrite with SEO keywords
#   3. If meta description is missing or generic → add keyword-rich description
#   4. Use category-based templates:
#     - products: "LED [Product] Guide: [Key Features] & [Benefits]"
#     - troubleshooting: "[Problem] Fix: [Solution] & [Steps]"
#     - parameters: "[Parameter] Guide: [Key concepts] & [Applications]"
#     - scenes: "[Room/Scene] Lighting Guide: [Design tips] & [Selection]"
```

**Generic template detection:**
- Title starts with "Neutral," or "LED Lighting" without Guide/Best
- Meta description starts with generic phrases
- Title under 20 chars or over 70 chars

#### Step 2: Structured Data (JSON-LD) Batch Injection

Schema types by page category:

| Category | Schema Type | Key Properties |
|----------|-------------|----------------|
| Products | TechArticle | name, description, category |
| Parameters | Article + FAQPage | mainEntity (array of Question/Answer) |
| Troubleshooting | FAQPage | mainEntity (each H2 becomes a Question) |
| Scenes | Article | name, description, image |
| Standards | Article | name, description, about |
| Research | ScholarlyArticle | name, description, datePublished |
| City directories | ItemList, LocalBusiness | itemListElement, address |
| All pages | BreadcrumbList | itemListElement with position |

**FAQPage generation logic:**
```python
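# For troubleshooting pages: auto-extract H2 headings as Questions and the
# first paragraph under each H2 as the Answer, stored in JSON-LD format
from bs4 import BeautifulSoup

def extract_faq_items(html):
    page = BeautifulSoup(html, "html.parser")
    content_area = page.select_one(".content") or page
    faq_items = []
    for h2 in content_area.find_all("h2"):
        question = h2.get_text(strip=True)
        next_p = h2.find_next("p")
        # Only accept paragraphs that sit inside the content area
        if next_p and any(parent is content_area for parent in next_p.parents):
            answer = next_p.get_text(strip=True)[:200]
            faq_items.append({
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            })
    return faq_items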
```

**Automatic FAQ dictionary** — for pages where H2 extraction fails (too few, non-question format), use a topic-based FAQ dictionary:
```python
auto_faq = {
    'troubleshooting': [
        {"q": "What causes [topic]?", "a": "..."},
        {"q": "How to fix [topic]?", "a": "..."},
    ],
    'products': [
        {"q": "What is [product]?", "a": "..."},
        {"q": "How to choose [product]?", "a": "..."},
    ],
    ...
}
```
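
Whichever source the question/answer pairs come from (H2 extraction or the fallback dictionary), they still need to be serialized into a `<script>` block. The sketch below shows one way to do that with the standard library; `build_faq_jsonld` and its `max_answer_chars` parameter are illustrative names, and escaping `</` is a precaution added here, not something the original pipeline specifies.

```python
import json


def build_faq_jsonld(pairs: list[tuple[str, str]], max_answer_chars: int = 200) -> str:
    """Serialize (question, answer) pairs into a FAQPage JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer[:max_answer_chars]},
            }
            for question, answer in pairs
        ],
    }
    # Escape "</" so an answer containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape, so the payload still parses to the same text.
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

The returned string can be inserted before `</head>` on each page that lacks a FAQPage block.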

#### Step 3: Quick Answer Box Injection

Insert a blue-tinted "Quick Answer" box at the top of the content area, before the first H2:

```python
# Pattern: extract first 1-2 sentences of the first meaningful paragraph
# Wrap in a div with class="quick-answer" and distinct styling
quick_box = f'''<div class="quick-answer" style="background:#f0f7ff;
  border-left:4px solid #0d6efd;padding:16px 20px;margin:20px 0;
  border-radius:0 8px 8px 0;">
  <strong style="color:#0d6efd;">Quick Answer</strong>
  <p>{first_two_sentences}</p>
</div>'''
```

**Skip:** City directories, index pages, utility pages (about/contact/privacy).
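
The "first 1-2 sentences" extraction can be sketched with a naive splitter. It assumes English prose where a sentence ends in `.`, `!` or `?` followed by whitespace and a capital letter, so abbreviations such as "e.g. Philips" will be split incorrectly; `first_sentences` is a hypothetical helper name.

```python
import re

# Split after ., ! or ? when followed by whitespace and a capital letter
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def first_sentences(paragraph: str, count: int = 2) -> str:
    """Return the first `count` sentences of a paragraph as one line."""
    text = " ".join(paragraph.split())  # collapse newlines and repeated spaces
    return " ".join(SENTENCE_END.split(text)[:count])
```

The result is what would be substituted for `first_two_sentences` in the Quick Answer template above.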

#### Step 4: E-E-A-T Signals Injection

Add to each content page's meta area:
```
<span>📅 Published: YYYY-MM-DD</span>
<span>🔄 Updated: YYYY-MM-DD</span>
<span>✍️ Author: TopAIGEO [Topic] Team</span>
<span>🔗 Sources: [Industry sources relevant to topic]</span>
```

Update `<meta property="article:modified_time">` to current date for freshness signal.

#### Step 5: Internal Link Network

Add related articles section at bottom of each page:
```python
# For each page, find semantically related pages:
# - Same category: other products link to each other
# - Cross-category: product → parameters → troubleshooting
# Example: led-strip-lights-guide links to:
#   - voltage-drop-led-strip (troubleshooting)
#   - led-strip-not-sticking (troubleshooting)
#   - dimmable-led-bulbs-guide (products)
```

**Semantic linking rules:**
```python
# Define topic clusters
clusters = {
    "dimmer": ["dimmer-incompatibility", "dimmable-led-bulbs-guide", "dimmable-lights-guide", "triac-dali-dmx-diming"],
    "cct/color": ["color-temperature-cct-explained", "warm-white-vs-cool-white-led", "color-rendering-metrics", "color-tolerance-sdcm"],
    "driver": ["led-driver-complete-guide", "led-driver-failure-signs", "led-driver-vs-transformer"],
    # ... etc
}

# For each page, find its cluster → pick 3-5 related pages
# Ensure cross-category links (products → troubleshooting → parameters)
```
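
A minimal sketch of the cluster lookup follows, using the cluster slugs listed above. It only covers the "find its cluster, pick up to five related pages" step; enforcing cross-category links would need a slug-to-category mapping that is not shown here. The function name `related_pages` is illustrative.

```python
def related_pages(slug: str, clusters: dict[str, list[str]], limit: int = 5) -> list[str]:
    """Return up to `limit` other slugs that share a cluster with `slug`."""
    related: list[str] = []
    for members in clusters.values():
        if slug in members:
            for other in members:
                if other != slug and other not in related:
                    related.append(other)
    return related[:limit]


clusters = {
    "dimmer": ["dimmer-incompatibility", "dimmable-led-bulbs-guide",
               "dimmable-lights-guide", "triac-dali-dmx-diming"],
    "driver": ["led-driver-complete-guide", "led-driver-failure-signs",
               "led-driver-vs-transformer"],
}
print(related_pages("dimmable-led-bulbs-guide", clusters))
```

A page that belongs to no cluster gets an empty list, which is a useful signal that the cluster table needs another entry.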

#### Step 6: Semantic HTML Tag Replacement

Replace generic `<div>` wrappers with semantic HTML5 elements:

| Original | Replacement | Purpose |
|----------|-------------|---------|
| `<div class="container">` | `<main class="container">` | Main content landmark |
| `<div class="breadcrumb">` | `<nav class="breadcrumb" aria-label="breadcrumb">` | Breadcrumb navigation landmark |
| `<div class="content">` | `<article class="content">` | Self-contained article body |
| `<div class="related-articles">` | `<aside class="related-articles">` | Supplementary related links |

```python
# (old, new) string pairs applied to each page's HTML
replacements = [
    # wrap content div
    ('<div class="content">', '<article class="content">'),
    ('</div>\n\n<!-- Related Articles', '</article>\n\n<!-- Related Articles'),
    # wrap related articles div
    ('<div class="related-articles"', '<aside class="related-articles"'),
]
```

### Module 4: GEO Prompt Pages (Incremental)

Create targeted `/answers/` directory pages that directly answer specific AI search prompts:

**Page types to create:**

| Type | Example | Structure | Schema |
|------|---------|-----------|--------|
| **What-is question** | `answers/best-color-temperature-bedroom.html` | Quick Answer + table + explanation + FAQ | FAQPage |
| **How-to guide** | `answers/fix-led-flickering.html` | 6-step with diagnostic boxes + flowchart | HowTo |
| **VS comparison** | `answers/led-vs-incandescent-vs-cfl.html` | Comparison table + cost analysis + verdict | ItemList |

**Template structure:**
```html
<!DOCTYPE html>
<html lang="en">
<head>...GA4 + SEO meta...</head>
<body>
<header>...navigation...</header>
<main>
  <nav aria-label="breadcrumb">...</nav>

  <h1>Question as title</h1>
  <div class="meta">Author + dates + sources</div>
  <article class="content">
    <div class="quick-answer">Direct answer in 1-2 sentences</div>
    <h2>Detailed Answer</h2>
    ... tables, lists, steps ...
    <h2>FAQ</h2>
  </article>
  <aside>Related articles</aside>
</main>

</body>
</html>
```

**URL pattern:** `/encyclopedia/[topic]/answers/[slug].html`

### Module 5: Technical Infrastructure

#### llms.txt & llms-full.txt

Create two files at the encyclopedia root:

- **`llms.txt`** — Brief summary + key URLs by category with priority and update frequency (AI-brief level, ~70-80 lines). **Well-optimized llms.txt is increasingly important for AI search ranking.**
- **`llms-full.txt`** — Complete URL index with titles for all pages (for deep AI crawling)

**llms.txt guidelines (enhanced with priority/update frequency):**
```
# [Encyclopedia Name]
> One-line description.

> Updated: YYYY-MM-DD | Pages: N | Full index: llms-full.txt

## Priority: Core Pages (Update: weekly)
- Homepage: https://domain/encyclopedia/topic/
- About: https://domain/encyclopedia/topic/about/
- Contact: https://domain/encyclopedia/topic/contact/

## Priority: Product Buying Guides (Update: monthly)
- [Product A](https://domain/...)
- [Product B](https://domain/...)

## Priority: Technical Parameters (Update: quarterly)
- ...

## Sitemap
- Full Sitemap: (link to sitemap.xml)
- Full Index (all N pages): (link to llms-full.txt)

## Content Guidelines
- All content is neutral, data-driven, and unbiased
- Structured data (JSON-LD: FAQ/Article/BreadcrumbList) on all content pages
- Authoritative sources cited (IEEE, IEC, ENERGY STAR, etc.)
```

**Key llms.txt optimization principles:**
1. Group pages by **priority tier** (Core → Product → Parameter → Standard) — not just by URL structure
2. Annotate **update frequency** (weekly/monthly/quarterly) — helps AI crawlers decide cache/revisit strategy
3. Include a **Content Guidelines** section describing format, data sources, and quality standards
4. Provide a direct **link to llms-full.txt** for full content discovery
5. Use descriptive link text (the full page title), not just category names

**llms-full.txt generation:**
```python
# Walk all HTML files, exclude .bak, google verification, 404
# Extract title from each file
# Group by section
# Output as markdown links with descriptive text
```
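
The generation steps above can be sketched with the standard library. This version assumes Google verification files are named `google*.html`, treats the first path segment as the section, and falls back to the relative path when a page has no `<title>`; `build_llms_full` and `base_url` are illustrative names.

```python
import re
from collections import defaultdict
from pathlib import Path

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def build_llms_full(root: str, base_url: str) -> str:
    """Return markdown with one link per HTML page, grouped by top-level section."""
    sections: dict[str, list[str]] = defaultdict(list)
    # rglob("*.html") already leaves out backups such as page.html.bak
    for path in sorted(Path(root).rglob("*.html")):
        if path.name == "404.html" or path.name.startswith("google"):
            continue
        rel = path.relative_to(root).as_posix()
        match = TITLE_RE.search(path.read_text(encoding="utf-8", errors="ignore"))
        title = " ".join(match.group(1).split()) if match else rel
        section = rel.split("/")[0] if "/" in rel else "root"
        sections[section].append(f"- [{title}]({base_url.rstrip('/')}/{rel})")
    lines: list[str] = []
    for section in sorted(sections):
        lines += [f"## {section}", *sections[section], ""]
    return "\n".join(lines)
```

Writing the returned string to `llms-full.txt` at the encyclopedia root completes the step.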

#### Sitemap.xml Update

After any content changes (new pages, batch optimizations), regenerate sitemap.xml:
```python
# Walk all files and assign <priority> / <changefreq> by page type:
#   products, parameters, scenes, troubleshooting -> 0.8, weekly
#   standards, research, answers                  -> 0.7, weekly
#   city directories                              -> 0.6, monthly
#   utility pages                                 -> 0.5, monthly
```
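
As a sketch of the regeneration step, the function below walks the encyclopedia directory and emits one `<url>` entry per page using the priority table above. It assumes the first path segment identifies the page type, that `us/` holds the city directories, and that anything unmatched is a utility page; `build_sitemap`, `RULES` and `base_url` are illustrative names.

```python
from datetime import date
from pathlib import Path
from xml.etree import ElementTree as ET

# (priority, changefreq) keyed by top-level directory
RULES = {
    "products": ("0.8", "weekly"), "parameters": ("0.8", "weekly"),
    "scenes": ("0.8", "weekly"), "troubleshooting": ("0.8", "weekly"),
    "standards": ("0.7", "weekly"), "research": ("0.7", "weekly"),
    "answers": ("0.7", "weekly"),
    "us": ("0.6", "monthly"),
}
DEFAULT_RULE = ("0.5", "monthly")  # utility pages and anything unmatched


def build_sitemap(root: str, base_url: str) -> str:
    """Return sitemap XML covering every HTML file under `root`."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for path in sorted(Path(root).rglob("*.html")):
        rel = path.relative_to(root).as_posix()
        priority, changefreq = RULES.get(rel.split("/")[0], DEFAULT_RULE)
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{rel}"
        ET.SubElement(url, "lastmod").text = date.fromtimestamp(path.stat().st_mtime).isoformat()
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = priority
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```

Because every HTML file produces exactly one entry, the 1:1 sitemap coverage target from Module 1 holds by construction, provided excluded files (backups, verification files) are filtered out first.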

#### 404 & Error Handling

Ensure 404.html has:
- GA4 code
- Meta description
- Canonical tag (or noindex)
- Links back to main sections

#### robots.txt AI Crawler Gating

Control which AI crawlers can access your encyclopedia. `robots.txt` is advisory, so these rules only bind crawlers that honor it; they discourage aggressive or uncited scrapers while welcoming respectful ones:

**Recommended rules:**
```
# Allow well-known AI crawlers that respect robots.txt and cite sources
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

# Block aggressive scrapers
User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Allow traditional search engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /

# Default: allow everything else
User-agent: *
Allow: /

# Sitemap (the Sitemap directive takes sitemap files only; llms.txt is discovered by URL)
Sitemap: https://domain/encyclopedia/topic/sitemap.xml
```

**Principle:** Allow AI crawlers that respect robots.txt AND provide source attribution for their citations. Block crawlers known for aggressive/scraping behavior without attribution.

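**Placement:** Crawlers only read `robots.txt` from the root of a host (`https://domain/robots.txt`); a copy placed inside the encyclopedia directory is ignored. Either merge these rules into the main domain's `robots.txt` (which also covers the WordPress/blog portion of the site) or serve the encyclopedia from its own subdomain with its own file.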
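
Before deploying, the rules can be checked with Python's standard `urllib.robotparser`. This is a sketch on a shortened rule set; the helper name `check_robots`, the sample URL, and the `SomeOtherBot` agent are illustrative.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user agent, whether the rules allow fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


ROBOTS_SAMPLE = """\
User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

result = check_robots(
    ROBOTS_SAMPLE,
    "https://example.com/encyclopedia/lighting/products/led-downlights-guide.html",
    ["GPTBot", "Bytespider", "SomeOtherBot"],
)
print(result)
```

Running the same check against the full production file catches typos in user-agent names before a crawler is blocked or admitted by accident.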

### Module 6: Multi-Platform Distribution

Integration with local publishing agents (e.g., local Hermes Agent running publish.js):

**Server-side (article generation):**
```python
# Generate platform-optimized content files
# Place in /home/ubuntu/cross-publish/
# Create manifest.txt with format: platform|filename

# Manifest format:
# x|lighting_intro.txt
# reddit|led_tips.txt
# medium|full_guide.txt
```
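
On the consuming side, the manifest can be read with a few lines of Python. The sketch assumes one `platform|filename` pair per line and additionally skips blank lines and `#` comments, which the original format does not mention; `parse_manifest` is a hypothetical helper.

```python
def parse_manifest(text: str) -> list[tuple[str, str]]:
    """Parse manifest.txt content into (platform, filename) pairs."""
    entries: list[tuple[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are ignored
        platform, sep, filename = line.partition("|")
        if not sep or not platform.strip() or not filename.strip():
            raise ValueError(f"malformed manifest line: {raw!r}")
        entries.append((platform.strip(), filename.strip()))
    return entries


sample = "x|lighting_intro.txt\nreddit|led_tips.txt\n\nmedium|full_guide.txt\n"
print(parse_manifest(sample))
```

Failing loudly on a malformed line keeps a broken manifest from silently publishing the wrong file to a platform.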

**Server-side (HTTP delivery):**
- nginx location `/cross-publish/` with token protection
- Windows batch script syncs via `curl "https://domain/cross-publish/filename?token=TOKEN"`
- Local agent runs `node publish.js [platform] [filename]`

**Platform-specific content templates:**
| Platform | Tone | Length | Format |
|----------|------|--------|--------|
| X/Twitter | Concise, hook-driven | 280-400 chars | Single thread (3-5 tweets) |
| Reddit | Conversational, helpful | 500-2000 chars | Self post with questions |
| Medium | Professional, detailed | 1500-3000 words | Article with headers/images |
| Zhihu | In-depth, educational | 2000-5000 characters | Q&A or article format |

### Module 6b: Homepage GEO Optimization

The encyclopedia homepage (`index.html`) has unique optimization requirements that don't apply to content sub-pages. Apply these after the per-page optimizations are done:

#### Brand Association Banner (Most Critical GEO Signal)

Insert a thin banner at the top of the page (after `<nav>`, before Hero section) that explicitly associates the encyclopedia with the parent brand:

```html
<!-- Brand Association Banner -->
<div class="bg-gradient-to-r from-amber-gold/10 via-amber-light/20 to-amber-gold/10 border-b border-amber-gold/10">
  <div class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 py-2">
    <div class="flex items-center justify-center sm:justify-between flex-wrap gap-1 text-xs">
      <span class="text-mid-brown">🔹 <strong class="text-deep-brown">Powered by [BrandName]</strong> — The #1 Cross-Border GEO Platform for [Industry] Brands</span>
      <a href="https://[branddomain].com" target="_blank" class="text-amber-gold hover:text-amber-dark font-medium transition-colors whitespace-nowrap">
        Learn about [BrandName] GEO Services →
      </a>
    </div>
  </div>
</div>
```

**Why this matters:** Without this banner, AI search engines may cite "the lighting encyclopedia" without mentioning the parent brand. This banner ensures every AI citation builds brand equity.

#### Breadcrumb Navigation

Add a structured breadcrumb nav between the brand banner and Hero section:

```html
<!-- Breadcrumb -->
<nav class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 py-3" aria-label="Breadcrumb">
  <ol class="flex items-center gap-2 text-sm text-mid-brown">
    <li><a href="https://[branddomain].com" class="hover:text-amber-gold transition-colors">[BrandName]</a></li>
    <li><span class="mx-1">/</span></li>
    <li><a href="/encyclopedia/" class="hover:text-amber-gold transition-colors">Encyclopedias</a></li>
    <li><span class="mx-1">/</span></li>
    <li class="text-deep-brown font-medium" aria-current="page">[Topic] Industry Encyclopedia</li>
  </ol>
</nav>

```

#### FAQ Section on Homepage

80% of AI search answers come from FAQ content. Add a FAQ section to the homepage:

```html
<!-- FAQ Section -->
<section id="faq" class="py-20 lg:py-28 bg-white">
  <div class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8">
    <div class="text-center max-w-2xl mx-auto mb-16">
      <span class="inline-block px-4 py-1.5 bg-amber-gold/10 text-amber-dark text-sm font-medium rounded-full mb-4">Frequently Asked Questions</span>
      <h2 class="text-3xl lg:text-4xl font-display font-bold">[Topic] Encyclopedia FAQ</h2>
      <p class="text-mid-brown">Quick answers to the most common [topic] questions from professionals and DIYers</p>
    </div>
    <div class="max-w-3xl mx-auto space-y-4">
      <details class="...">
        <summary>Question 1?</summary>
        <div class="px-6 pb-6"><p>Answer with link to detailed page.</p></div>
      </details>
      <!-- 5-7 questions total, each linking to relevant detail page -->
    </div>
  </div>
</section>
```

**FAQ section rules:**
- Use `<details>/<summary>` for expandable/collapsible (AI-friendly + user-friendly)
- Each answer must include an internal link to the relevant detailed page
- 5-7 questions covering the most common search queries in the industry
- Every question should be a realistic user query (not keyword-stuffed)

#### FAQPage Schema on Homepage

Add a separate FAQPage JSON-LD block to the homepage that mirrors the visible FAQ content:

```javascript
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Question text here",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Concise answer (150-250 chars)"
      }
    }
    // ... one entry per FAQ question
  ]
}
```

**Note:** This is the only page that carries more than one top-level JSON-LD object (existing CollectionPage + new FAQPage). Each `<script type="application/ld+json">` block must be valid JSON on its own: never place two bare objects side by side in a single script tag.

#### Fix Duplicate H1

The navigation logo area often has an `<h1>` wrapped around the brand name, and the Hero section has another `<h1>` for the page title. Search engines see this as two H1s, so change the nav one to `<span>`:

```html
<!-- Before (navigation logo area): -->
<h1 class="text-xl font-display font-bold">[Topic] Encyclopedia</h1>

<!-- After: -->
<span class="text-xl font-display font-bold">[Topic] Encyclopedia</span>
```

**Only the Hero `<h1>` should remain**, since it carries the page's semantic meaning.

#### Footer Brand Attribution

Add explicit brand attribution to the copyright line in the footer:

```html
<!-- Before: -->
<p class="text-sm text-mid-brown">© 2026 Brand. All rights reserved.</p>

<!-- After: -->
<p class="text-sm text-mid-brown">© 2026 Brand. All rights reserved. | [Topic] Encyclopedia by Brand</p>
```

This ensures AI citing any sub-page also associates the content with the brand.

#### Homepage Optimization Checklist

```bash
# Verify before/after
grep -c "Powered by" index.html           # Should be 1+ (brand banner)
grep -c "Breadcrumb" index.html           # Should be 1+ (breadcrumb nav)
grep -c "FAQPage" index.html              # Should be 1+ (FAQ schema)
grep -c "<h1" index.html                   # Should be 1 (not 2)
grep "Lighting Encyclopedia by" index.html # Should have brand in copyright
```

#### LLM-Friendly Summary Paragraph

Insert a compact, data-dense "Quick Summary" paragraph in the Hero section (between the subtitle and search bar). This gives AI crawlers a single, authoritative paragraph they can directly quote as a "one-sentence answer":

```html
<!-- LLM-Friendly Summary Paragraph -->
<div class="max-w-3xl mx-auto mb-6 p-4 bg-amber-gold/5 rounded-xl border border-amber-gold/10">
  <p class="text-sm text-mid-brown leading-relaxed">
    <strong class="text-deep-brown">Quick Summary:</strong> TopAIGEO's Lighting Encyclopedia covers
    <strong>100+ product entries</strong> across <strong>9 product families</strong>
    (ceiling, wall, floor, table, outdoor, commercial, and more), aligning with
    <strong>8+ international standards</strong> (IES, UL, CIE, ENEC, IEC, EU Ecodesign, GB,
    Title 24). It serves lighting professionals in
    <strong>30+ global cities</strong> across the USA and Europe, with coverage of regulations
    in New York, Los Angeles, London, Paris, Berlin, and 25+ others. All data is sourced from
    IES, IEC, UL, CIE, and EU Official Journal publications.
  </p>
</div>
```

**Pattern:** `[Brand]'s [Topic] Encyclopedia covers [N] entries across [N] categories, aligned with [N] standards ([names]), serving professionals in [N] cities ([sample city list]). All data sourced from [org1], [org2].`

**Placement:** Between subtitle `<p>` and the search bar `<div>`, after the `mb-12` margin on the subtitle.

#### Standards List Expansion

The homepage should list all standards by name, not just count them. When the number of listed standards is less than the claimed count:

1. Count the actual items in the HTML
2. If fewer than the claimed "N+" number, add missing entries
3. Use the same icon+text pattern as existing entries
4. Include specific standard numbers where relevant (e.g., `IEC 60598`, `EU 2019/2020`)

Example addition pattern (each entry is a `flex items-center gap-4` div with SVG icon + h4 title + p description):

```html
<div class="flex items-center gap-4">
  <div class="w-10 h-10 bg-amber-gold/20 rounded-lg flex items-center justify-center flex-shrink-0">
    <svg class="w-5 h-5 text-amber-gold" fill="currentColor" viewBox="0 0 20 20">
      <path fill-rule="evenodd" d="M6.267 3.455a3.066 3.066 0 001.745-.723 3.066 3.066 0 013.976 0 ..." clip-rule="evenodd"/>
    </svg>
  </div>
  <div>
    <h4 class="text-white font-medium">IEC (International Electrotechnical Commission)</h4>
    <p class="text-sm text-gray-400">Global safety and performance standards (IEC 60598, IEC 60529)</p>
  </div>
</div>
```

#### FAQPage Schema — JSON-LD Array Pitfall

When adding a second Schema type to existing JSON-LD on a homepage that already has a `CollectionPage` block, you have two options:

**Option A: Multiple `<script>` tags (valid, but extraction tooling must handle several blocks)**
```html
<script type="application/ld+json">
{ "@context": "...", "@type": "CollectionPage", ... }
</script>
<script type="application/ld+json">
{ "@context": "...", "@type": "FAQPage", "mainEntity": [...] }
</script>
<script type="application/ld+json">
{ "@context": "...", "@type": "Organization", ... }
</script>
```
Each script tag is validated independently by schema validators. This works in practice.

**Option B: Single `<script>` tag with JSON array (recommended for JSON parsability)**
```html
<script type="application/ld+json">
[
  { "@context": "...", "@type": "CollectionPage", ... },
  { "@context": "...", "@type": "FAQPage", "mainEntity": [...] },
  { "@context": "...", "@type": "Organization", ... }
]
</script>
```
⚠️ **Critical:** The JSON array MUST be valid — wrap everything in `[...]` brackets. Without the outer `[]`, the JSON is `{...},{...},{...}` which is technically invalid JSON (multiple root objects). Browsers/schema validators often tolerate this, but:
- Python `json.loads()` will fail with `Extra data` error
- Programmatic extraction via regex will break
- Some AI crawlers may fail to parse it

**Always use Option B (array wrapping) for clean JSON.** When editing an existing file with `patch`, the JSON suffix must end with `}]` (not just `}`) to close both the inner object and outer array.
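
The `Extra data` failure is easy to detect in bulk. The sketch below extracts every JSON-LD block from a page and reports the ones that `json.loads()` rejects; `invalid_jsonld_blocks` is an illustrative name, and the regex assumes the `type` attribute is quoted.

```python
import json
import re

LD_BLOCK_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def invalid_jsonld_blocks(html: str) -> list[tuple[int, str]]:
    """Return (block index, parser message) for each JSON-LD block that fails to parse."""
    problems: list[tuple[int, str]] = []
    for index, match in enumerate(LD_BLOCK_RE.finditer(html)):
        try:
            json.loads(match.group(1))
        except json.JSONDecodeError as exc:
            problems.append((index, exc.msg))
    return problems


good = '<script type="application/ld+json">[{"@type": "CollectionPage"}, {"@type": "FAQPage"}]</script>'
bad = '<script type="application/ld+json">{"@type": "CollectionPage"},{"@type": "FAQPage"}</script>'
print(invalid_jsonld_blocks(good), invalid_jsonld_blocks(bad))
```

Running this over every page after a batch edit confirms that no patch left a block without its closing `]`.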

#### Pitfall: Third-party Scanning Tools May Be Wrong

Many automated SEO/GEO scanning tools only check the homepage's raw HTML and extrapolate their findings to the entire site. They may report:
- "No Schema" when subpages all have Schema
- "No OG tags" when subpages all have OG tags
- "No title/meta description" when these exist on all pages

**Always verify scanning tool claims by actually checking the file system** (`grep -r` / `find -exec`) before acting on them. This is especially true for claims about Schema structured data — the tool may have only checked the index page vs. all subpages.

### Module 7: Monitoring & Iteration

**GA4 setup:**
```html
<!-- Insert in <head> of all pages -->
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXX"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-XXXXXXXX');
</script>
```

**Batch insertion pattern:**
```python
# Walk all HTML files
# Skip files that already contain the GA ID
# Insert after <head> tag
ga_code = "<!-- Google tag ... -->\n<script ...>...</script>\n"
content = content.replace("<head>", "<head>\n" + ga_code, 1)
```
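
A plain `replace("<head>", ...)` misses pages whose head tag carries attributes. The sketch below is a slightly more defensive variant, assuming the snippet text contains the GA ID so that a second run is a no-op; `insert_ga` and its parameters are illustrative names.

```python
import re

# Match <head> or <head ...attributes>, but not <header>
HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def insert_ga(html: str, ga_id: str, snippet: str) -> str:
    """Insert the GA4 snippet right after the opening head tag, once per page."""
    if ga_id in html:
        return html  # already tagged
    match = HEAD_RE.search(html)
    if not match:
        return html  # no head element; leave the file untouched
    return html[: match.end()] + "\n" + snippet + html[match.end():]
```

Pages without a `<head>` element are returned unchanged, so they can be listed and fixed by hand instead of receiving a misplaced tag.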

**Monitoring checklist:**
1. GA4 → Acquisition → Traffic acquisition → filter the "Session source / medium" dimension for AI platform referrers
2. Google Search Console → Pages → check sitemap coverage
3. Periodic Prompt testing: ask AI models questions about the topic, check if encyclopedia is cited
4. Page freshness: update `modified_time` meta tag every 90 days

## Multi-Model Cross-Validation Audit Protocol

Before optimizations and after each phase, run a cross-validation audit using **3 independent AI models** through a single API (e.g., OpenRouter). This catches blind spots that any single model has:

### Audit Prompt Template

Send the homepage HTML to each model with this prompt:

```
You are an expert GEO (Generative Engine Optimization) auditor. Analyze this homepage HTML and score it across these dimensions (each 0-10):

1. Structured Data (JSON-LD quality, diversity, correctness)
2. Entity Richness (brand names, standards, locations, products mentioned in visible text)
3. E-E-A-T (author signals, about page, external citations, publisher info)
4. Multimodal Readiness (og:image, alt text, meta tags)
5. LLM Friendliness (clear hierarchy, dense factual content, machine-readable format)
6. Localization Coverage (geographic scope, language targeting, regional standards)
7. Internal Linking (breadcrumb, related articles, category links)

For each dimension: score, brief justification, and specific HTML evidence. Then provide a total out of 70 and a ranked list of the top 5 improvements.

[HTML content here]
```

### Model Selection Strategy

Use 3 models with different training data distributions:

| Model | Strength | Focus |
|-------|----------|-------|
| **Claude Sonnet 4** (anthropic/claude-sonnet-4) | Best at E-E-A-T and credibility assessment | Judges how authoritative the site appears |
| **Gemini 2.0 Flash** (google/gemini-2.0-flash-001) | Best at structured data and entity extraction | Catches missing schemas and entities |
| **DeepSeek V3 0324** (deepseek/deepseek-chat-v3-0324 or nebius variant) | Best at LLM-friendliness and content structure | Evaluates how well AI crawlers can parse content |

### Cross-Reference Scoring

Average the 3 scores for a realistic assessment. Key insights:

- **If Claude scores lower** on E-E-A-T than others → need author/team pages and external citations
- **If Gemini scores lower** on structured data → missing or invalid Schema types
- **If DeepSeek scores lower** on LLM-friendliness → content is too sparse or unstructured
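
The averaging and the "scores lower than the others" check can be sketched as follows. This is an illustration under stated assumptions: the function name `cross_reference` and the `gap` threshold of 1.0 point are hypothetical choices, not values from the methodology.

```python
from statistics import mean


def cross_reference(scores: dict[str, dict[str, float]], gap: float = 1.0):
    """scores maps model name -> {dimension: score}.

    Returns (averages, flags): the per-dimension average, and a list of
    (model, dimension) pairs where that model scored at least `gap` points
    below the mean of the other models.
    """
    dimensions = next(iter(scores.values())).keys()
    averages, flags = {}, []
    for dim in dimensions:
        values = {model: s[dim] for model, s in scores.items()}
        averages[dim] = round(mean(values.values()), 2)
        for model, value in values.items():
            others = [v for m, v in values.items() if m != model]
            if others and mean(others) - value >= gap:
                flags.append((model, dim))
    return averages, flags
```

A flag such as `("claude", "E-E-A-T")` maps onto the first insight above: author/team pages and external citations are the likely gap.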

### Before/After Comparison

Always re-run the full 3-model audit after each phase to measure score deltas. Expected improvements:

| Improvement | Expected Delta |
|-------------|---------------|
| Homepage Schema upgrade (WebSite→WebSite+Org+Person) | +1.0-2.0 E-E-A-T |
| Person Schema with @id cross-reference | +0.5-1.0 E-E-A-T |
| Author pages with AboutPage Schema | +1.0-1.5 E-E-A-T |
| Standards listed as external links | +0.5-1.0 E-E-A-T |
| BreadcrumbList Schema | +0.3-0.5 Structured Data |
| ItemList Schema (for multi-category sites) | +0.5-1.0 Internal Linking |
| Hero LLM summary paragraph | +1.0-1.5 LLM Friendliness |
| og:image (1200x630) | +0.5-1.0 Multimodal |
| City links made clickable | +0.5-1.0 Internal Linking |

### API Call Pattern (OpenRouter)

```bash
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [{"role": "user", "content": "Audit this HTML..."}],
    "max_tokens": 8192
  }'
```

Save all 3 results and the averaged report as `encyclopedia_geo_audit_report.md` for future reference.
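
The same call can be made from Python so that all three models run in one loop. This is a sketch, assuming the OpenAI-compatible response shape (`choices[0].message.content`) and an `OPENROUTER_API_KEY` environment variable; the helper names `build_request` and `run_audit` are illustrative.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(model: str, prompt: str, html: str, max_tokens: int = 8192) -> dict:
    """Assemble the chat-completions payload for one auditor model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": f"{prompt}\n\n{html}"}],
        "max_tokens": max_tokens,
    }


def run_audit(model: str, prompt: str, html: str) -> str:
    """POST one audit request and return the model's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(model, prompt, html)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `run_audit` once per model ID from the table above and writing each reply to the report file keeps the three raw results next to the averaged scores.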

## Homepage Schema Architecture (Proven Pattern)

This is the **most impactful single GEO change** for a static encyclopedia homepage. Replace a single `WebSite` schema with 5 interconnected blocks:

| # | Schema Type | @id | Key Properties | Purpose |
|---|-------------|-----|----------------|---------|
| 1 | `WebSite` | (none) | `name`, `alternateName`, `description`, `about`, `inLanguage`, `issn`, `image` | Core identity for search engines |
| 2 | `Organization` | `#organization` | `name`, `logo`, `foundingDate`, `knowsAbout[]`, `sameAs[]` (social links) | Brand authority signal |
| 3 | `Person` | `/about/#person` | `name`, `alternateName`, `description`, `knowsAbout[]` | Editorial team credibility |
| 4 | `BreadcrumbList` | `#breadcrumb` | `itemListElement` with position + name + item | Navigation context |
| 5 | `ItemList` | (none) | `name`, `numberOfItems`, `itemListElement[]` with position + name + url per category | Cross-vertical visibility |

### Critical: @id Cross-References

The Person Schema in the homepage **must** share the same `@id` as the `/about/` page's Person Schema. This creates a semantic link that search engines use to verify author credibility:

```json
// Homepage
{
  "@type": "Person",
  "@id": "https://www.domain.com/about/#person",
  "name": "Editorial Team"
}

// /about/ page
{
  "@type": "AboutPage",
  "mainEntity": { "@id": "https://www.domain.com/about/#person" }
}
```

### Article Schema Author Upgrade

All sub-page TechArticle/Article schemas should reference the same `@id` instead of using a flat `"@type": "Organization"`:

```json
// Before (weak)
"author": {
  "@type": "Organization",
  "name": "TopAIGEO Lighting Encyclopedia"
}

// After (strong — @id cross-ref to homepage Person)
"author": {
  "@type": "Person",
  "@id": "https://www.topaigeo.com/about/#person",
  "name": "TopAIGEO Editorial Team"
}
```

Batch-update all article files using Python:

```python
author_old = '"author": {\n    "@type": "Organization",\n    "name": "Encyclopedia Name"\n  }'
author_new = '"author": {\n    "@type": "Person",\n    "@id": "https://domain/about/#person",\n    "name": "Editorial Team Name"\n  }'
# Find files with "author" key, check not already updated (no @id), replace
```
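
Exact string replacement breaks as soon as indentation differs between templates. As an alternative sketch, the JSON-LD can be parsed and rewritten instead. It assumes each block sits in a plain `<script type="application/ld+json">` tag and accepts that the rewritten block is re-indented; `upgrade_author` is a hypothetical helper name.

```python
import json
import re

LD_JSON = re.compile(
    r'(<script type="application/ld\+json">)(.*?)(</script>)', re.DOTALL
)


def upgrade_author(html: str, person_id: str, team_name: str) -> str:
    """Point Article/TechArticle authors at the shared Person @id."""

    def rewrite(match: re.Match) -> str:
        try:
            data = json.loads(match.group(2))
        except json.JSONDecodeError:
            return match.group(0)  # leave unparseable blocks untouched
        nodes = data if isinstance(data, list) else [data]
        changed = False
        for node in nodes:
            if not isinstance(node, dict):
                continue
            if node.get("@type") not in ("Article", "TechArticle"):
                continue
            author = node.get("author")
            # Skip authors that already carry an @id (already upgraded)
            if isinstance(author, dict) and "@id" not in author:
                node["author"] = {"@type": "Person", "@id": person_id, "name": team_name}
                changed = True
        if not changed:
            return match.group(0)
        body = json.dumps(data, indent=2, ensure_ascii=False)
        return f"{match.group(1)}\n{body}\n{match.group(3)}"

    return LD_JSON.sub(rewrite, html)
```

Running the function a second time leaves the page unchanged, so the batch can be re-run safely after a partial failure.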

Also standardize the publisher name across all pages:
```python
publisher_old = '"name": "Old Publisher Name"'
publisher_new = '"name": "Unified Brand Name"'
```

## About Page Creation Pattern

Create an `/about/` page with `AboutPage + Person` Schema. With an nginx config that uses `try_files $uri $uri.html`, `/about` maps to `/about.html` automatically. Redirect the trailing-slash variant:

```nginx
location = /about/ { return 301 /about; }
```

The about page should include:
- Exact same `@id` as homepage Person Schema
- `knowsAbout` array with 4-5 domain-specific expertise areas
- `affiliation` linking to the Organization
- Visible editorial standards (data-driven, vendor-neutral, regularly updated)
- Team member descriptions (not fabricated individuals — use team/entity names)

## LLM-Friendly Hero Summary

Insert a compact, data-dense paragraph in the Hero section. This is the content AI crawlers will most likely quote:

```html
<!-- LLM-friendly summary for AI search engines -->
<div class="mb-6 text-white/80 text-sm leading-relaxed" style="max-width:560px">
  <p><strong>BrandName</strong> is a [platform type] that helps [audience] get cited by 
  Google AI Overviews, ChatGPT Search, Perplexity, and Bing Copilot. Our 
  <a href="/encyclopedia/topic/">Topic Encyclopedia</a> covers N+ articles on 
  [list key topics, standards, coverage across M+ countries], optimized for 
  AI search engine citation and brand visibility.</p>
</div>
```

**Key phrases AI models look for:** AI search engine names (Google AI Overviews, ChatGPT Search, Perplexity, Bing Copilot), specific standard names (IES, UL, CIE, IEC), quantified claims (179+ articles, 50+ countries).

## Homepage Hero Slider Implementation

For static HTML encyclopedia homepages, replace the plain gradient Hero background with a full-screen image slider:

### Structure
```html
<section class="relative overflow-hidden min-h-[90vh] flex items-center">
  <!-- Image Slides -->
  <div class="hero-slider">
    <div class="hero-slide active" style="background-image: url('slide1.jpg');"></div>
    <div class="hero-slide" style="background-image: url('slide2.jpg');"></div>
    <div class="hero-slide" style="background-image: url('slide3.jpg');"></div>
    <div class="hero-slide" style="background-image: url('slide4.jpg');"></div>
  </div>
  <!-- Semi-transparent overlay so text remains readable -->
  <div class="hero-overlay"></div>

  <!-- z-index content: slightly above overlay -->
  <div class="relative z-[3]">...existing hero content...</div>
</section>
```

### CSS
```css
.hero-slider { position: absolute; inset: 0; overflow: hidden; }
.hero-slide {
  position: absolute; inset: 0;
  background-size: cover; background-position: center;
  opacity: 0; transition: opacity 1.5s ease-in-out;
}
.hero-slide.active { opacity: 1; }
.hero-overlay {
  position: absolute; inset: 0;
  background: linear-gradient(135deg, rgba(250,250,248,0.92) 0%, rgba(250,250,248,0.85) 50%, rgba(250,250,248,0.78) 100%);
  z-index: 1;
}
```

### JS (add before `</body>`)
```javascript
(function() {
  var slides = document.querySelectorAll('.hero-slide');
  if (slides.length < 2) return;
  var current = 0;
  setInterval(function() {
    slides[current].classList.remove('active');
    current = (current + 1) % slides.length;
    slides[current].classList.add('active');
  }, 5000);
})();
```

### Image Selection Rules
- Use 3-5 high quality, high resolution photos (1920x1080 minimum)
- Pick diverse scenes covering different sub-topics (indoor, outdoor, kitchen, living room)
- Semi-transparent overlay (85-92% opacity of page background color) keeps text readable
- Keep the ambient glow effects (`gradient-light`, blur circles) for depth
- All existing hero content (headline, subtitle, LLM summary, search bar, stats) stays unchanged and sits above the overlay (`z-index: 3`)

### Pitfall: Z-Index Stacking
The overlay `.hero-overlay` needs `z-index: 1`, the decorative glow effects need `z-index: 2`, and the content needs `z-index: 3`. Without proper z-index, either the images show through the overlay making text unreadable, or the glow effects hide behind the slides.

## Scene/Image Gallery Section

Add a "Lighting in Action" image grid between Hero and content sections to increase visual richness:

```html
<section class="py-16 lg:py-24 bg-white">
  <div class="grid grid-cols-2 md:grid-cols-4 gap-4 lg:gap-6">
    <div class="scene-card relative rounded-2xl overflow-hidden aspect-square group cursor-pointer">
      <img src="..." alt="..." class="w-full h-full object-cover scene-grid-img" loading="lazy">
      <div class="scene-label">Label text (slides up on hover)</div>
    </div>
    <!-- repeat for each image -->
  </div>
</section>
```

CSS:
```css
.scene-grid-img { transition: all 0.5s ease; }
.scene-grid-img:hover { transform: scale(1.05); box-shadow: 0 20px 60px rgba(212,165,116,0.3); }
.scene-label {
  position: absolute; bottom: 0; left: 0; right: 0;
  padding: 16px; background: linear-gradient(transparent, rgba(45,36,32,0.8));
  color: white; font-weight: 500;
  transform: translateY(100%); transition: transform 0.3s ease;
}
.scene-card:hover .scene-label { transform: translateY(0); }
```

## External Image Source Ingestion

### Pattern: Batch Download from Brand/Manufacturer Sites

When the encyclopedia needs high-quality product/application images, source them from major brand sites (Kichler, Philips, etc.):

1. **Browse brand site** → identify Contentful CDN or similar image hosting URL patterns
2. **Extract image URLs** from the page via `browser_console`:
   ```javascript
   Array.from(document.querySelectorAll('img')).map(i => i.src)
   ```
3. **Decode CDN URLs**: Many use Next.js image optimization which wraps the real URL in `_next/image?url=ENCODED_URL`. Extract the original URL from the `url` query parameter
4. **Download** with `curl` using the raw CDN URL (not the Next.js proxy):
   ```bash
   curl -s -o "target.jpg" "https://images.ctfassets.net/.../hero.jpg"
   ```
5. **Pick diverse scenes**: 10 images covering different room types and product categories
6. **Store in encyclopedia assets**: Save to `/encyclopedia/topic/assets/images/`
7. **Create symlinks** for short-name references used in og:image and schema:
   ```bash
   ln -sf "kichler-01-full-name-12345.jpg" "kichler-01-kitchen-mikale.jpg"
   ```
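
Step 3 can be sketched in Python with the standard library. The assumption here is that the proxy path ends in `/_next/image` and carries the original address in its `url` query parameter, as described above; `original_image_url` is an illustrative name.

```python
from urllib.parse import parse_qs, urljoin, urlparse


def original_image_url(src: str, page_url: str = "") -> str:
    """Return the URL wrapped by a Next.js `/_next/image?url=...` proxy, else src."""
    parsed = urlparse(src)
    if not parsed.path.endswith("/_next/image"):
        return src
    wrapped = parse_qs(parsed.query).get("url", [""])[0]  # parse_qs percent-decodes
    if not wrapped:
        return src
    # Absolute wrapped URLs pass through; site-relative ones resolve against the page
    return urljoin(page_url or src, wrapped)
```

Feed each `src` collected in step 2 through this function before passing the result to `curl`.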

### Image Selection Criteria for GEO
- 1920x1080 resolution minimum (aspect-video cards need 16:9)
- Real-world installation photos (not product-only white background shots)
- Diverse: kitchen, bathroom, living room, office, outdoor
- Warm lighting photos (2700-3000K) appeal more for residential encyclopedia
- **og:image must be 1200x630px** — use the most representative scene photo resized

### Image Rights Note
Only use images from brand sites that explicitly allow sharing/embedding. Kichler.com's Contentful CDN is publicly accessible, but public access is not itself a license: confirm that the brand's terms cover retailer/distributor or editorial reuse before publishing. Provide attribution by mentioning the brand name in the image alt text (e.g., "Kitchen lighting with Mikale pendants by Kichler").

## QAPage Schema for All Article Pages

Beyond FAQPage (for index pages), every content article gets its own **QAPage** schema. This is the single most important GEO signal for individual pages.

### Structure

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "QAPage",
  "mainEntity": {
    "@type": "Question",
    "name": "Page title as question",
    "text": "Page title as question",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Extract from Quick Answer box (first 1-2 sentences)",
      "url": "https://...canonical-url",
      "author": {
        "@type": "Person",
        "@id": "https://domain/about/#person",
        "name": "Editorial Team"
      }
    },
    "author": { "@type": "Organization", "name": "Brand Name" }
  }
}
</script>
```

### Batch Insertion Script Pattern

Use awk to insert before `</head>`:

```bash
# Pass the JSON via the environment: `awk -v` would interpret backslash escapes inside it
QA="$QAPAGE_JSON" awk '/<\/head>/ {print ENVIRON["QA"]} 1' "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
```

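Or, from Python `execute_code`, use `patch()` against a stable anchor line. The same pattern works for any `<head>` insertion; it is shown here for og:image: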
```python
patch(
    path="file.html",
    old_string='<meta name="twitter:card" content="summary_large_image">',
    new_string='<meta name="twitter:card" content="summary_large_image">\n  <meta property="og:image" content="https://...">\n  <meta name="twitter:image" content="https://...">'
)
```
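
A plain-Python sketch of the QAPage insertion, for cases where neither awk nor `patch()` is convenient. The field layout follows the Structure block above; the helper names and the idempotency check on the string `"QAPage"` are assumptions of this sketch.

```python
import json


def build_qapage(question: str, answer: str, url: str, person_id: str,
                 team_name: str, brand: str) -> str:
    """Render the QAPage structure as a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "QAPage",
        "mainEntity": {
            "@type": "Question",
            "name": question,
            "text": question,
            "acceptedAnswer": {
                "@type": "Answer",
                "text": answer,
                "url": url,
                "author": {"@type": "Person", "@id": person_id, "name": team_name},
            },
            "author": {"@type": "Organization", "name": brand},
        },
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # "\/" is a valid JSON escape; it stops a literal "</script>" in the text
    # from closing the tag early
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>\n'


def insert_before_head_close(html: str, snippet: str) -> str:
    """Insert snippet before the first </head>; skip pages that already have a QAPage."""
    if '"QAPage"' in html or "</head>" not in html:
        return html
    return html.replace("</head>", snippet + "</head>", 1)
```

Because `json.dumps` handles quoting, answer text containing `$`, quotes, or backslashes needs no shell escaping.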

### Quick Answer Text Extraction
The Quick Answer box content (`<div class="quick-answer">`) provides the Answer text. Extract with:
```bash
qa=$(grep -oP 'Quick Answer</strong>.*?<p[^>]*>\K[^<]+' "$file" | head -1)
```
Fallback to first `<p>` text if no Quick Answer exists.
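
The extraction with its fallback can also be sketched in Python, which avoids the dependence on `grep -P` and on the box sitting on a single line. It assumes the wrapper is written exactly as `<div class="quick-answer">`; `quick_answer_text` is an illustrative name, and the sentence split is a simple heuristic.

```python
import html as html_lib
import re

QUICK_ANSWER = re.compile(
    r'<div class="quick-answer">.*?<p(?:\s[^>]*)?>(.*?)</p>',
    re.DOTALL | re.IGNORECASE,
)
FIRST_P = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.DOTALL | re.IGNORECASE)
TAGS = re.compile(r"<[^>]+>")


def quick_answer_text(page: str, max_sentences: int = 2) -> str:
    """Return the first sentences of the Quick Answer box, or of the first <p>."""
    match = QUICK_ANSWER.search(page) or FIRST_P.search(page)
    if not match:
        return ""
    text = html_lib.unescape(TAGS.sub("", match.group(1)))
    text = " ".join(text.split())  # collapse whitespace and newlines
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(sentences[:max_sentences])
```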

## FAQPage Internal Linking Strategy

Every Answer in FAQPage schema **must** end with a full URL to a relevant article:

```json
{
  "@type": "Answer",
  "text": "Answer content here with specific data. Full guide: https://domain/encyclopedia/topic/article-slug"
}
```

This creates a knowledge graph that AI crawlers traverse. When ChatGPT Search or Perplexity cites the FAQ answer, the link leads them to deeper content on the same site.

### Batch FAQPage Injection by Category

Split FAQs by category index page:

| Page | Questions | Topic |
|------|:---------:|-------|
| Homepage | 20 | Broad high-intent questions |
| Products | 10 | Ceiling, bathroom, garage, warehouse |
| Parameters | 10 | CRI, CCT, lumen, IP, beam, UGR |
| Standards | 10 | UL/ETL, CE, RoHS, ERP, IEC, NFPA |
| Scenes | 10 | Living room, office, retail, hospital |
| Troubleshooting | 10 | Flickering, buzzing, ghosting, water |

### Customer Referral Link Integration

After building an encyclopedia's knowledge content, integrate **customer referral links** so AI search citations also drive traffic to the client's site:

**Placement Strategy (by priority):**

| Location | Type | Example |
|----------|------|---------|
| 🥇 Homepage Hero badge | Inline CTA link | `Browse Certified Lighting Products →` next to trusted-badge |
| 🥇 Article footer | CTA card before Sources | `Need to source these products?` + amber button |
| 🥇 Category page top | Colored banner | `Looking for verified [topic] suppliers?` gradient bar |
| 🥈 Sidebar / Supplier Modal | Button | `Browse Products` with logo |
| 🥉 Footer nav | Simple link | `💡 Lighting Products` in brand column |

**Template for article footer CTA:**
```html
<div style="margin:2em 0;padding:20px 24px;background:linear-gradient(135deg,#f5f0e8,#fafaf8);border:1px solid #d4a574;border-radius:12px;text-align:center;">
  <p style="margin:0 0 8px;font-size:1rem;font-weight:600;color:#2d2420;">💡 Need to source these lighting products?</p>
  <p style="margin:0 0 12px;font-size:0.9rem;color:#6b5b4f;">Browse verified LED lighting products from certified suppliers at <strong>KS Import &amp; Export</strong>.</p>
  <a href="https://client.com/product/?utm_source=lighting_encyclopedia&utm_medium=article_footer&utm_campaign=client" target="_blank" style="display:inline-block;padding:10px 24px;background:#d4a574;color:white;border-radius:8px;text-decoration:none;font-weight:500;font-size:0.9rem;">Browse Lighting Products →</a>
</div>
```

**Template for banner (category pages):**
```html
<div class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 pt-8">
  <a href="https://client.com/product/?utm_source=lighting_encyclopedia&utm_medium=category_banner&utm_campaign=client" target="_blank" class="block w-full p-4 bg-gradient-to-r from-amber-gold/10 to-deep-brown/5 rounded-2xl border border-amber-gold/20 hover:border-amber-gold/40 transition-all text-center">
    <span class="text-sm text-deep-brown">💡 Looking for verified [topic] suppliers? Visit <strong class="text-amber-gold">Client Name</strong> — certified partner →</span>
  </a>
</div>
```

**UTM Convention:**
- `utm_source`: `lighting_encyclopedia` (fixed)
- `utm_medium`: `hero_banner`, `category_banner`, `article_footer`, `homepage_logo`, `supplier_modal`
- `utm_campaign`: `client` (or specific campaign name)
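
A small helper can enforce this convention when links are generated in batch. This is a sketch: `with_utm` and `ALLOWED_MEDIUMS` are illustrative names, and the allowed set is simply the `utm_medium` list above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

ALLOWED_MEDIUMS = {
    "hero_banner", "category_banner", "article_footer",
    "homepage_logo", "supplier_modal",
}


def with_utm(url: str, medium: str, campaign: str = "client",
             source: str = "lighting_encyclopedia") -> str:
    """Append the UTM parameters from the convention above to url."""
    if medium not in ALLOWED_MEDIUMS:
        raise ValueError(f"unknown utm_medium: {medium}")
    parts = urlsplit(url)
    # Drop any existing utm_* parameters so re-running does not duplicate them
    query = [(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")]
    query += [("utm_source", source), ("utm_medium", medium), ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

When the result is written into an `href` attribute, pass it through `html.escape()` so that `&` becomes `&amp;`.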

**Principle: Don't modify FAQ Schema links to point to customer sites.** FAQ Schema's Answer text with internal links serves a distinct purpose — building a knowledge graph that AI crawlers traverse for citations. The customer gets exposure through visible CTAs on every page.

### Insertion Pitfalls

1. **Bash `$` expansion in heredocs**: Use `read -r` with `<< 'EOF'` (quoted EOF prevents variable expansion) or Python's `patch()` function
2. **Double `</script>` tags**: Some static pages have `</script></script>` (one from Tailwind config, one extra) — patch expects this exact pattern
3. **Multi-line awk + sed**: Awk insertion is most reliable for adding before `</head>`
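4. **Counting `@type` with grep**: `grep -c '"@type": "Question"'` can return 0 when the JSON-LD uses different spacing (`"@type":"Question"`) or when the pattern sits inside a double-quoted string; `@` itself needs no escaping in bash. Single-quote the pattern and allow optional spaces (`'"@type": *"Question"'`), or fall back to `grep -c "Question"`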

## Batch og:image Injection for All Article Pages

Use keyword matching to automatically assign the most relevant product image to each article:

1. Create a keyword→image map (e.g., "kitchen"→"kichler-01-kitchen.jpg", "bathroom"→"kichler-04-bathroom.jpg")
2. Extract filename from each HTML file → match keywords → assign og:image
3. Insert before `</head>`:
   ```html
   <meta property="og:image" content="https://domain/path/to/image.jpg">
   <meta name="twitter:image" content="https://domain/path/to/image.jpg">
   ```
4. Create symlinks in assets directory (short names → actual filenames) so og:image URLs resolve
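
Steps 1-3 can be sketched as below. The keyword map entries reuse the examples from step 1; the default image name `og-default.jpg` and the function names are hypothetical.

```python
from pathlib import PurePosixPath

# Keyword -> image map; the first matching keyword wins.
KEYWORD_IMAGES = {
    "kitchen": "kichler-01-kitchen.jpg",
    "bathroom": "kichler-04-bathroom.jpg",
}


def pick_image(html_path: str, default: str = "og-default.jpg") -> str:
    """Choose an image by matching keywords against the file name."""
    stem = PurePosixPath(html_path).stem.lower()
    for keyword, image in KEYWORD_IMAGES.items():
        if keyword in stem:
            return image
    return default


def og_tags(image_url: str) -> str:
    """Render the two meta tags from step 3 for one image URL."""
    return (
        f'<meta property="og:image" content="{image_url}">\n'
        f'<meta name="twitter:image" content="{image_url}">\n'
    )


def inject_og_image(html: str, image_url: str) -> str:
    """Insert the tags before </head> unless the page already has og:image."""
    if 'property="og:image"' in html or "</head>" not in html:
        return html
    return html.replace("</head>", og_tags(image_url) + "</head>", 1)
```

Skipping pages that already declare `og:image` keeps a template-level tag from being duplicated by the batch run.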

## Pitfalls Discovered

### 1. Article vs TechArticle Schema Type

Both `"@type": "Article"` and `"@type": "TechArticle"` exist across different page categories (e.g., product pages use TechArticle, scene/application pages use Article). When batch-searching for schema files, search for **both** types, not just one:

```bash
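# Wrong: misses pages typed as Article
grep -rl '"@type": "TechArticle"' --include='*.html' .

# Right: catches every page with an author block that lacks the shared @id
grep -rl '"author"' --include='*.html' . | while read -r f; do
  if ! grep -q '@id.*about/#person' "$f"; then echo "$f"; fi
done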
```

### 2. Bash Heredoc Variable Expansion ($)

When using a heredoc for JSON content in bash scripts, always quote the delimiter. **Without quotes, bash expands any `$` inside the JSON strings.** For example, `$90` is read as the positional parameter `$9` followed by `0`, which silently corrupts the text:

```bash
# ❌ WRONG: unquoted delimiter, bash expands $9 inside the JSON
read -r MY_JSON << EOF
{"text": "$90 cost saving"}
EOF
# Result: {"text": "0 cost saving"}  ← $9 expanded to empty (fewer than nine script arguments)

# ✅ CORRECT: quoted 'EOF' prevents variable expansion
read -r MY_JSON << 'EOF'
{"text": "$90 cost saving"}
EOF
# Result: {"text": "$90 cost saving"}
```

`read -r` stores only the first line of the heredoc, so it suits single-line JSON; for multi-line JSON, capture the heredoc with `$(cat << 'EOF' ... EOF)` instead.

**The same applies to `awk`:** When injecting JSON via awk in a shell script, use a single-quoted heredoc or pipe the JSON content directly rather than using a bash variable that might undergo expansion.

### 3. Multi-line sed Fails for JSON

`sed` cannot reliably replace multi-line JSON strings with different indentation in HTML files. Use Python's `str.replace()` or the `patch` tool instead:

```python
# DO: Python with exact multi-line match (str.replace returns a new string)
content = content.replace(author_old, author_new)

# DON'T: sed multi-line (fails on indentation and line-ending variants)
# sed -i 's/"author": {\n    "@type": "Organization"/.../'
```

### 4. File Count Validation & Grep Escaping

After batch operations, always verify both the old and the new value:
```bash
grep -rl 'old_value' --include='*.html' . | wc -l  # Should be 0
grep -rl 'new_value' --include='*.html' . | wc -l  # Should equal file count
```

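**Grep escaping pitfall:** When counting `@type` in JSON-LD, `grep -c '"@type": "Question"'` may return 0. The cause is not `@` (bash does not treat it specially inside single quotes) but usually different spacing in the JSON (`"@type":"Question"`) or a pattern embedded in a double-quoted string. Note also that `grep -c` counts matching lines, not occurrences, so minified JSON-LD on a single line reports 1. Prefer:
```bash
# ✅ Tolerates optional spaces and counts every occurrence
grep -o '"@type": *"Question"' file.html | wc -l

# ⚠️ Quick check only: counts lines and also matches visible text containing "Question"
grep -c "Question" file.html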
```

Also verify online via curl with the same simple pattern:
```bash
curl -s "https://site.com/page" | grep -c "Question"
```

## Execution Order (Recommended)

```
Phase 1 (Foundation — same day):
  □ SEO Titles & Meta Descriptions batch rewrite
  □ Structured Data (JSON-LD) injection  
  □ Quick Answer Box injection
  □ E-E-A-T signals (Author + dates + sources)
  □ Internal link network
  □ Semantic HTML tags
  □ Homepage: Standards external links (IES, UL, CIE, etc.)
  □ Homepage: City links (make plain text city names clickable)
  □ Homepage: og:image generation and injection

Phase 2 (Credibility & Depth — same or next day):
  □ Homepage Schema architecture: WebSite + Organization + Person + BreadcrumbList + ItemList
  □ Create /about/ team page with AboutPage + Person Schema (@id cross-ref)
  □ Article sub-page Schema: upgrade author to @id Person, unify publisher name
  □ Hero section LLM-friendly summary paragraph
  □ Add About link to navigation and footer

Phase 3 (Infrastructure & Scale):
  □ GA4 code on all pages
  □ llms.txt + llms-full.txt
  □ sitemap.xml (regenerate)
  □ Google Search Console + IndexNow submission
  □ Weekly freshness cron job

Phase 4 (Incremental):
  □ GEO Prompt pages (/answers/ directory)
  □ Multi-platform distribution pipeline
  □ Monthly content refresh cycle
  □ Re-run 3-model audit quarterly
```

## Verification: Complete 3-Model Audit After All Phases

After all optimizations, re-run the 3-model audit and compare to the baseline report. Expected improvement from baseline to post-Phase 2: ~15 points out of 80 (55→70).

## NEW in v1.2.0: FAQ Expansion to 100+ Questions

### Strategy: Homepage + Category Pages

Rather than putting all 100+ questions on the homepage (which would bloat the page), distribute by topic:

| Page | Questions | Topic |
|------|:---------:|-------|
| Homepage | 20-50 | Broad high-intent + Commerce + Advanced/Long-tail |
| Products | 10 | Ceiling, bathroom, garage, warehouse |
| Parameters | 10 | CRI, CCT, lumen, IP, beam, UGR |
| Standards | 10 | UL/ETL, CE, RoHS, ERP, IEC, NFPA |
| Scenes | 10 | Living room, office, retail, hospital |
| Troubleshooting | 10 | Flickering, buzzing, ghosting, water |
| **Total** | **70-100** | |

### Commerce & Standards Module Questions (English)

When users request FAQ expansion for GEO optimization, generate 10-20 English questions covering:

- **Export certifications**: "What certifications are required to export LED lights to the USA?" — answer with UL/ETL/FCC/Energy Star + customer link
- **EU compliance**: "What CE certifications do LED lights need for the European market?" — LVD/EMC/RoHS/ERP + customer link
- **E-commerce photography**: "How do I choose the right color temperature for e-commerce product photography?" — 5000-5500K daylight + CRI 95+ + customer link
- **Regional standards**: Australia (AS/NZS), commercial kitchen (IP65/NSF), DarkSky compliance
- **ROI calculation**: "How do I calculate ROI when switching to LED lighting in a commercial building?"
- **Emergency lighting**: NFPA 101, IBC requirements

### Advanced & Long-Tail Module Questions (English)

Generate 20+ English questions for deep coverage:

- Niche applications: shower area LED strips (IP67), cold environments (walk-in freezers), insulated ceilings (IC-rated), hazardous locations (Class I Div 1/2)
- Technical comparisons: 0-10V vs TRIAC dimming, DALI vs Zigbee, Type A/B/C LED tubes, CC vs CV drivers
- Color science: R9 value, TM-30 metrics (Rf/Rg), SDCM binning, green/purple tint causes
- Practical calculations: lumens per sq ft, max lights on 15A circuit, driver wattage calculation, voltage drop prevention
- Commercial design: perimeter retail shelf lighting, wireless office controls, motion sensor compatibility

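### Each Answer in These Modules MUST End with a Customer Referral Link

This applies to the Commerce and Advanced module questions above. Existing FAQ answers keep their internal article links, per the principle under Customer Referral Link Integration.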

```json
{
  "@type": "Answer",
  "text": "Technical answer content here. Browse certified products: https://customer.com/product/?utm_source=encyclopedia&utm_medium=faq_schema&utm_campaign=customer"
}
```

### Pitfall: JSON-LD Array vs Multiple Script Tags

When the homepage carries multiple Schemas (CollectionPage + FAQPage + Organization) in a single `<script>` block, the JSON must be wrapped in an array:

```html
<script type="application/ld+json">
[
  { "@context": "...", "@type": "CollectionPage", ... },
  { "@context": "...", "@type": "FAQPage", "mainEntity": [...] },
  { "@context": "...", "@type": "Organization", ... }
]
</script>
```

Without the outer `[]` brackets, parsing tools may fail. When using `patch()` to extend an existing FAQ, the replacement must end with `]}]` to close both the inner object and outer array.

### Pitfall: Duplicate `</script>` Tags After Batch Operations

When extending FAQPage JSON-LD via `patch()`, watch for double `</script>` tags. The patch replacement may leave both the old closing `</script>` and the new one, resulting in:

```html
</script>
</script>
```

Fix by verifying with `grep -n '</script>'` after each batch operation.

### Pitfall: Bash $ Expansion in FAQ JSON Answers

When writing FAQ JSON via bash heredoc in a script, `$` sequences (e.g., "$90 cost") are expanded by bash if the heredoc delimiter isn't quoted:

```bash
# ❌ WRONG: $9 expands to empty, leaving "0 cost saving"
read -r FAQ << EOF
{"text": "$90 cost saving"}
EOF

# ✅ CORRECT
read -r FAQ << 'EOF'
{"text": "$90 cost saving"}
EOF
```

For complex multi-line JSON with prices and standard numbers, use Python `patch()` instead of bash scripts.

### Pitfall: og:image Duplication

After batch og:image injection, the homepage may end up with duplicate og:image meta tags (one from original template, one from og:image batch). Also, the injected path may differ from the original (e.g., `assets/og.jpg` vs `assets/images/og.jpg`).

**Fix:** Check with `grep -n 'og:image' index.html` and deduplicate manually. Verify the correct file exists at the referenced path.

### Pitfall: Accidental Deletion of twitter:card/twitter:image

When cleaning up duplicate meta tags, be careful not to delete the only twitter:card or twitter:image tag. After cleanup, verify:

```bash
grep -c 'twitter:card' index.html  # Should be 1
grep -c 'twitter:image' index.html  # Should be 1
grep -c 'og:image' index.html  # Should be 4 (URL+width+height+alt)
```

### Pitfall: FAQ Count Mismatch After Patch

When extending a JSON-LD FAQPage array with `patch()`, the final question count may not match expectations due to:
1. The patch replacing more or fewer items than intended
2. Old `]}]` closing the array early vs new content

**Always verify** with `grep -c "Question" file.html` after each patch operation — don't assume the count is correct.
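
Because `grep -c` counts lines and also matches visible text, a parsed count is more dependable. The sketch below parses every JSON-LD block and counts nodes by `@type`; `count_type` is an illustrative name. Malformed JSON-LD (for example, a misplaced `]}]`) raises an error instead of being miscounted.

```python
import json
import re

LD_JSON = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def count_type(html: str, schema_type: str = "Question") -> int:
    """Count JSON-LD nodes whose @type equals schema_type, at any depth."""

    def walk(node) -> int:
        if isinstance(node, dict):
            hit = 1 if node.get("@type") == schema_type else 0
            return hit + sum(walk(v) for v in node.values())
        if isinstance(node, list):
            return sum(walk(v) for v in node)
        return 0

    total = 0
    for block in LD_JSON.findall(html):
        total += walk(json.loads(block))  # raises ValueError on malformed JSON-LD
    return total
```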

### Post-Optimization Audit Checklist

After all GEO optimization phases, run a comprehensive audit:

```bash
# 1. Page health
curl -s -o /dev/null -w "%{http_code}" "https://domain/page"  # all 200

# 2. Schema coverage
grep -c "FAQPage" index.html          # 1+
grep -c "Question" index.html         # 20-50
grep -L "QAPage" article/*.html       # should list no files (every article has QAPage)

# 3. No duplicate meta
grep -c "og:image" index.html         # 4 (URL+width+height+alt)
grep -c "twitter:card" index.html     # 1
grep -c "og:locale" index.html        # 1

# 4. Image access
curl -s -o /dev/null -w "%{http_code}" "https://domain/path/to/image.jpg"  # 200

# 5. Customer link coverage
grep -rl "customer.com/product/" encyclopedia/ --include='*.html' | wc -l

# 6. No Chinese text (for English sites)
grep -cP '[\x{4e00}-\x{9fff}]' index.html  # 0

# 7. Closing tags
grep -c '</html>' index.html          # 1
grep -c '</body>' index.html          # 1
```

## City Directory Page Handling

City store pages (e.g., `us/new-york-lighting-stores.html`) have different requirements:

| Feature | Apply? | Reason |
|---------|--------|--------|
| GA4 | ✅ Yes | Universal tracking |
| Canonical | ✅ Yes | SEO basics |
| Structured Data | ✅ LocalBusiness + ItemList | Google Maps integration |
| Quick Answer | ❌ No | Store directories don't need QA |
| Author/EEAT | ❌ No | Listing pages, not articles |
| Semantic HTML | ✅ Yes | Universal improvement |
| Internal links | ✅ Yes | Link between nearby cities |

## Common Pitfalls

### 1. HTML Template Variations
Not all pages use the same `<div class="content">` wrapper. Some may use `<article>` or `<main>` directly. **Always check the actual HTML structure** before writing batch scripts.

### 1b. Article Tag May Not Exist on Some Pages
After applying semantic HTML Step 6 (`<div class="content">` → `<article>`), some pages may still lack `<article>` tags due to template variations (e.g., some products use `<div>`-only templates without the content wrapper). This causes subsequent batch operations (Quick Answer, depth expansion) to silently skip those pages.

**Fix:** When searching for content boundaries in batch scripts, use multiple fallbacks:
```python
# Strategy: try <article> first, fallback to <div class="container">, then <main>
pos = content.find('<article')
if pos < 0:
    pos = content.find('<div class="container"')
if pos < 0:
    pos = content.find('<main')
# For end boundary, use Related Articles marker as anchor
related_pos = content.find('<!-- Related Articles')
if related_pos < 0:
    related_pos = content.find('<aside class="related-articles"')
```

### 1c. Multiple Batches Needed for Depth Expansion
Content word count expansion often requires multiple passes because: (1) different page types have different starting word counts, and (2) inserting too many paragraphs in one pass may push unrelated sections apart. 

Best practice: run depth expansion in 2-3 rounds, each adding 2-4 data paragraphs per page, with a re-check after each round. Target word counts: 1500 minimum for basic pages, 2000+ for core content pages.

### 1d. Sources Block Insertion Position
Always insert Sources/References blocks **before** the Related Articles section (`<!-- Related Articles -->`), not after. This keeps supplementary content inside `<main>` and semantically grouped. If no Related Articles marker exists, insert before `</main>` or the last closing block.

### 2. Encoding Issues with DOMDocument
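PHP's `DOMDocument::saveHTML()` converts non-ASCII characters to HTML entities (e.g., `℃` → `&#8451;`). When processing via PHP, convert the input with `mb_convert_encoding()` and decode the output with `html_entity_decode()` using the `ENT_XML1` flag (note that the `HTML-ENTITIES` target encoding is deprecated as of PHP 8.2):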
```php
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$output = $dom->saveHTML();
$output = html_entity_decode($output, ENT_QUOTES | ENT_XML1, 'UTF-8');
```

### 3. JSON-LD Array vs Single Object
When injecting JSON-LD into a page that may already have one `<script type="application/ld+json">` block, merge the objects into a single array `[{BreadcrumbList}, {FAQPage}]` inside that block instead of creating a second script tag.

### 4. Breadcrumb Closing Logic
Wrap the existing breadcrumb `<div>` in the `<nav>`, and close the `<nav>` before the `<h2>` so the heading stays outside it. The replacement pattern:
```
Before: <div class="breadcrumb">...</div>\n<h2>Title</h2>
After:  <nav aria-label="breadcrumb"><div class="breadcrumb">...</div></nav>\n<h2>Title</h2>
```
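
The replacement can be sketched in Python as follows. This is a minimal sketch that assumes the breadcrumb `<div>` contains no nested `<div>` (the non-greedy match stops at the first `</div>`); `wrap_breadcrumb` is a hypothetical helper name.

```python
import re

BREADCRUMB_RE = re.compile(r'(<div class="breadcrumb">.*?</div>)', re.DOTALL)


def wrap_breadcrumb(content):
    """Wrap the first breadcrumb <div> in a <nav>, closing it before the <h2>."""
    if '<nav aria-label="breadcrumb">' in content:
        return content  # already wrapped; keeps the batch script idempotent
    return BREADCRUMB_RE.sub(
        r'<nav aria-label="breadcrumb">\1</nav>', content, count=1
    )
```

The early return matters when the batch script is re-run, since a second pass would otherwise nest one `<nav>` inside another.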

### 5. Container Element Closure
The `<main>` element wraps everything from container start through related articles:
```
<main class="container">
  <nav>...breadcrumb...</nav>

  <h2>...</h2>
  <div class="meta">...</div>
  <article class="content">...</article>
  <aside class="related-articles">...</aside>
</main>
```
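The closing `</div>` of the original container is typically just before `<!-- End Related Articles -->`. When the opening `<div class="container">` becomes `<main class="container">`, that closing tag must become `</main>` so the tags stay balanced.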

### 6. GA4 Batch Insertion
When batch-inserting GA4 into 100+ files, define the Measurement ID once as a variable instead of hardcoding it in each call, so a later ID change touches a single line:
```javascript
// BAD: ID hardcoded in the config call
gtag('config', 'G-XXXXXXXX');

// BETTER: define the ID once at the top of the script and reference it
const GA_MEASUREMENT_ID = 'G-XXXXXXXX';
gtag('config', GA_MEASUREMENT_ID);
```

### 7. Modified Time Doesn't Mean New Content
Updating `article:modified_time` tells AI crawlers the page is fresh, but a new date on unchanged content is a mismatch that a crawler can detect by comparing the page against its previous copy. Only bump the modified date when content has actually been refreshed.
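
A batch script can enforce this by fingerprinting each page and bumping the date only when the fingerprint changes. The sketch below assumes the tag is written as `<meta property="article:modified_time" content="...">` and that the previous fingerprint is stored somewhere by the caller; both function names are hypothetical.

```python
import hashlib
import re

MODIFIED_RE = re.compile(
    r'<meta property="article:modified_time" content="[^"]*"\s*/?>'
)


def content_fingerprint(content):
    """Hash the page with the modified_time tag removed, so the date
    itself never counts as a content change."""
    stripped = MODIFIED_RE.sub("", content)
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()


def bump_modified_time(content, previous_fingerprint, new_timestamp):
    """Return (content, fingerprint); the date changes only if the body did."""
    fingerprint = content_fingerprint(content)
    if fingerprint == previous_fingerprint:
        return content, fingerprint
    tag = f'<meta property="article:modified_time" content="{new_timestamp}">'
    # A callable replacement avoids backslash interpretation in the tag
    return MODIFIED_RE.sub(lambda _m: tag, content, count=1), fingerprint
```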

### 8. Quick Answer Box Content Quality
The auto-generated Quick Answer might truncate awkwardly if the first paragraph doesn't start with a clear answer. For best results, hand-craft the Quick Answer for the top 10 most important pages.
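
For the remaining pages, cutting at a sentence boundary rather than at a fixed character count avoids the worst truncations. This is a minimal sketch, not the pipeline's actual generator: `quick_answer` and its `max_chars` default are assumptions for illustration, and the sentence detection only handles `.`, `!` and `?`.

```python
import re


def quick_answer(first_paragraph, max_chars=300):
    """Cut at the last sentence end within max_chars instead of mid-sentence."""
    text = " ".join(first_paragraph.split())  # collapse whitespace
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Sentence ends: punctuation followed by whitespace or the window edge
    ends = [m.end() for m in re.finditer(r"[.!?](?=\s|$)", window)]
    if ends:
        return window[:ends[-1]]
    # No sentence boundary found: fall back to the last whole word
    return window.rsplit(" ", 1)[0] + "…"
```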

### 9. Sitemap Priority for New Pages
New `/answers/` pages should have lower priority (0.7) initially, then raise to 0.8 after 30 days if they're getting traffic.
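
The rule can be expressed as a small helper when regenerating the sitemap. In this sketch `answers_priority` is a hypothetical name, `has_traffic` is supplied by the caller (the traffic threshold is not specified here), and treating exactly 30 days as old enough is an assumption.

```python
from datetime import date


def answers_priority(published, today, has_traffic):
    """0.7 for new /answers/ pages; 0.8 once 30+ days old and getting traffic."""
    age_days = (today - published).days
    return 0.8 if age_days >= 30 and has_traffic else 0.7
```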

### 10. Don't Over-Engineer City Pages
City directory pages (50+ per encyclopedia) are listing pages, not content articles. Spend minimal optimization effort:
- GA4 + Canonical + LocalBusiness schema
- No Quick Answer, no complex JSON-LD
- Simple internal link structure

## Required Tools

| Tool | Purpose | Location |
|------|---------|----------|
| Python 3 | Batch processing scripts | Server default |
| Python `re` module | Regex for HTML parsing | stdlib |
| nginx | Static file serving | `/etc/nginx/` |
| curl | Testing HTTP endpoints | Default installed |

## Verification Checklist

After running the full pipeline:

```bash
# 1. Check GA4 coverage
grep -rl "G-XXXXXXXX" /path/to/encyclopedia/ --include="*.html" | wc -l

# 2. Check Quick Answer coverage (skip city/utility)
grep -rl "quick-answer" /path/to/encyclopedia/ --include="*.html" | wc -l

# 3. Check JSON-LD coverage
grep -rl "application/ld+json" /path/to/encyclopedia/ --include="*.html" | wc -l

# 4. Check semantic HTML
grep -rl "<main" /path/to/encyclopedia/ --include="*.html" | wc -l
grep -rl "<article" /path/to/encyclopedia/ --include="*.html" | wc -l

# 5. Check E-E-A-T signals
grep -rl "Author:" /path/to/encyclopedia/ --include="*.html" | wc -l

# 6. Verify sitemap
python3 -c "import re; xml=open('sitemap.xml').read(); print(f'{len(re.findall(\"<loc>\", xml))} URLs in sitemap')"

# 7. Check llms.txt
curl -s -o /dev/null -w "%{http_code}" https://domain/encyclopedia/topic/llms.txt

# 8. Verify robots.txt AI crawler rules (crawlers only read robots.txt from the site root)
curl -s "https://domain/robots.txt" | grep -c "GPTBot"

# 9. Spot-check any negative finding from a scanning tool against the actual page state.
#    Many automated scanners only check the homepage and extrapolate, so verify with
#    a file system scan before spending time on "fixes"
```