---
name: static-encyclopedia-geo-optimization
slug: static-encyclopedia-geo-optimization
description: >-
  Complete GEO (Generative Engine Optimization) pipeline for static HTML
  industry encyclopedia sites. 7-module SOP adapted from the "AI流量+471%"
  (AI traffic +471%) methodology; batch-optimizes 100+ static HTML pages for
  AI search visibility (ChatGPT Search, Perplexity, Gemini, Claude).
version: 1.1.0
tags: [geo, generative-engine-optimization, static-html, seo, ai-search,
  schema, eeat, semantic-html, batch-processing]
---
# Static Encyclopedia GEO Optimization Pipeline
Batch GEO optimization for static HTML industry encyclopedia sites (lighting, packaging, furniture, etc.), based on the 7-module GEO SOP methodology reported to increase AI platform traffic by 471%.
## When This Skill Is Relevant
Trigger when:
- User has a static HTML encyclopedia site (e.g. `/encyclopedia/lighting/`)
- User wants to optimize 100+ pages for AI search visibility
- Need to batch-apply SEO/GEO improvements across an entire site
- Building a new vertical encyclopedia and want to start GEO-ready from day 1
## ⚠️ Critical Constraint: "Public Content Only Shows Results, Not Methods"
When the user asks about what GEO optimization does for their clients / public-facing content, **never describe the technical steps or tools used**. Only describe the **results and outcomes**:
- ❌ "We added FAQPage Schema to 21 pages and Quick Answer boxes to 115 pages"
- ✅ "Your encyclopedia content is now structured so AI search engines can directly extract and cite your information"
- ❌ "We replaced div tags with semantic HTML elements like main, article, aside"
- ✅ "Your pages are technically optimized for AI crawlers to better understand and prioritize your content"
- ❌ "We set up a cron job at 9 AM to generate articles and your local agent publishes at 10 AM"
- ✅ "Content creation and multi-platform distribution are fully automated"
This constraint applies to: step-by-step instructions, specific tool names, architecture details, and technical implementation methods.
## Architecture
The encyclopedia is a static HTML site in a directory structure:
```
/var/www/[domain]/landing/encyclopedia/[topic]/
├── index.html
├── products/
│ ├── index.html
│ └── led-downlights-guide.html
├── parameters/
├── scenes/
├── troubleshooting/
├── standards/
├── research/
├── us/ (city directories — skip)
├── answers/ (GEO Prompt pages — NEW)
├── sitemap.xml
├── llms.txt
└── llms-full.txt
```
## The 7-Module GEO SOP
### Module 1: Optimization Goals & Metrics
Define measurable outcomes before starting:
| Metric | Source | Target |
|--------|--------|--------|
| AI Platform referral traffic | GA4 (filter by platform) | +50% in 30 days |
| AI mention rate | Batch Prompt testing | Brand appears in 60%+ of relevant prompts |
| Schema coverage | grep scan of HTML files | 95%+ of content pages |
| llms.txt completeness | Check for all categories | All sections listed with direct URLs |
| sitemap coverage | URL count vs file count | 1:1 ratio |
### Module 2: Page Type Matrix
Categorize pages by GEO value:
| Type | Example | Schema | Priority |
|------|---------|--------|----------|
| **Product guides** | `products/led-downlights-guide.html` | Article, HowTo | T1 |
| **Technical parameters** | `parameters/color-temperature-cct-explained.html` | Article, FAQPage | T1 |
| **Scene/application guides** | `scenes/bedroom-lighting-ideas.html` | Article | T1 |
| **Troubleshooting guides** | `troubleshooting/led-flickering-causes.html` | FAQPage, HowTo | T1 |
| **Standards & compliance** | `standards/ce-marking-lighting-guide.html` | Article | T1 |
| **Research articles** | `research/*` | Article, ScholarlyArticle | T2 |
| **GEO Prompt pages** | `answers/*` (NEW) | FAQPage | T2 |
| **City directories** | `us/new-york-lighting-stores.html` | LocalBusiness | T3 (skip QA/structure) |
| **Index pages** | `*/index.html` | CollectionPage | Skip content optimization |
| **Utility pages** | `about/`, `contact/`, `privacy/` | WebPage | Skip content optimization |
### Module 3: Batch Content Optimization (Pipeline)
This is the core. Run in this order:
#### Step 1: SEO Title & Meta Description Batch Rewrite
```python
# Script pattern: iterate all HTML files, skip index pages and city directories
# For each page:
#   1. Extract existing title and description
#   2. If title matches generic template pattern → rewrite with SEO keywords
#   3. If meta description is missing or generic → add keyword-rich description
#   4. Use category-based templates:
#      - products: "LED [Product] Guide: [Key Features] & [Benefits]"
#      - troubleshooting: "[Problem] Fix: [Solution] & [Steps]"
#      - parameters: "[Parameter] Guide: [Key concepts] & [Applications]"
#      - scenes: "[Room/Scene] Lighting Guide: [Design tips] & [Selection]"
```
**Generic template detection:**
- Title starts with "Neutral," or "LED Lighting" without Guide/Best
- Meta description starts with generic phrases
- Title under 20 chars or over 70 chars
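The title rules above can be sketched as a small predicate. This is a minimal illustration, not the pipeline's actual detector: the function name `is_generic_title` is hypothetical, and the meta-description rule is left out because the source does not list the generic phrases it checks for.

```python
import re

# Words that make an "LED Lighting ..." title acceptable (from the rule above)
INTENT_WORDS = re.compile(r"\b(Guide|Best)\b", re.IGNORECASE)


def is_generic_title(title: str) -> bool:
    """Return True when a title matches one of the generic-template rules."""
    title = title.strip()
    # Length rule: under 20 or over 70 characters
    if len(title) < 20 or len(title) > 70:
        return True
    # Prefix rule: the "Neutral," template
    if title.startswith("Neutral,"):
        return True
    # "LED Lighting ..." counts as generic only when Guide/Best is absent
    if title.startswith("LED Lighting") and not INTENT_WORDS.search(title):
        return True
    return False
```

Pages for which the predicate returns `True` would then be routed to the category-based title templates from Step 1.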
#### Step 2: Structured Data (JSON-LD) Batch Injection
Schema types by page category:
| Category | Schema Type | Key Properties |
|----------|-------------|----------------|
| Products | TechArticle | name, description, category |
| Parameters | Article + FAQPage | mainEntity (array of Question/Answer) |
| Troubleshooting | FAQPage | mainEntity (each H2 becomes a Question) |
| Scenes | Article | name, description, image |
| Standards | Article | name, description, about |
| Research | ScholarlyArticle | name, description, datePublished |
| City directories | ItemList, LocalBusiness | itemListElement, address |
| All pages | BreadcrumbList | itemListElement with position |
**FAQPage generation logic:**
```python
# For troubleshooting pages: auto-extract H2 headings as Questions
# Extract first paragraph under each H2 as Answer
# Store in JSON-LD format
# Assumes `page` is a parsed BeautifulSoup document and `content_area`
# is the content element whose direct children are the paragraphs
faq_items = []
for h2 in page.find_all('h2'):
    question = h2.get_text(strip=True)
    next_p = h2.find_next('p')
    if next_p and next_p in content_area:
        answer = next_p.get_text(strip=True)[:200]
        faq_items.append({
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        })
```
**Automatic FAQ dictionary** — for pages where H2 extraction fails (too few, non-question format), use a topic-based FAQ dictionary:
```python
auto_faq = {
    'troubleshooting': [
        {"q": "What causes [topic]?", "a": "..."},
        {"q": "How to fix [topic]?", "a": "..."},
    ],
    'products': [
        {"q": "What is [product]?", "a": "..."},
        {"q": "How to choose [product]?", "a": "..."},
    ],
    # ... other categories follow the same shape
}
```
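Whichever path produces the question/answer pairs (H2 extraction or the fallback dictionary), they still need to be serialized into a `<script>` block. The sketch below shows one way to do that with the standard library only; `build_faq_jsonld` and its `max_answer_chars` parameter are illustrative names, and the 200-character cap mirrors the truncation used in the extraction logic above.

```python
import json


def build_faq_jsonld(pairs, max_answer_chars=200):
    """Build a FAQPage JSON-LD <script> block from (question, answer) pairs."""
    main_entity = [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer[:max_answer_chars]},
        }
        for question, answer in pairs
        if question and answer  # skip incomplete entries
    ]
    schema = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": main_entity,
    }
    # ensure_ascii=False keeps non-ASCII text readable in the page source
    body = json.dumps(schema, ensure_ascii=False, indent=2)
    # Escape "</" so answer text cannot close the script element early;
    # "\/" is a valid JSON escape for "/"
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

Serializing through `json.dumps` rather than string concatenation guarantees that quotes and line breaks inside answers are escaped correctly.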
#### Step 3: Quick Answer Box Injection
Insert a blue-tinted "Quick Answer" box at the top of the content area, before the first H2:
```python
# Pattern: extract first 1-2 sentences of the first meaningful paragraph
# Wrap in a div with class="quick-answer" and distinct styling
quick_box = f'''<div class="quick-answer" style="background:#f0f7ff;
border-left:4px solid #0d6efd;padding:16px 20px;margin:20px 0;
border-radius:0 8px 8px 0;">
<strong style="color:#0d6efd;">Quick Answer</strong>
<p>{first_two_sentences}</p>
</div>'''
```
**Skip:** City directories, index pages, utility pages (about/contact/privacy).
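The "first 1-2 sentences" extraction in this step can be approximated with a regular expression. This is a rough sketch under the assumption that the paragraph has already been reduced to plain text; the helper names are hypothetical, and the naive splitter will break on abbreviations such as "e.g." or decimal-free model numbers ending in a period.

```python
import html
import re

# Split after ., ! or ? when followed by whitespace
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def first_sentences(paragraph: str, count: int = 2) -> str:
    """Return the first `count` sentences of a plain-text paragraph."""
    text = " ".join(paragraph.split())  # collapse newlines and repeated spaces
    sentences = SENTENCE_END.split(text)
    return " ".join(sentences[:count])


def build_quick_answer(paragraph: str) -> str:
    """Wrap the extracted sentences in minimal quick-answer markup."""
    summary = html.escape(first_sentences(paragraph))
    return (
        '<div class="quick-answer">'
        "<strong>Quick Answer</strong>"
        f"<p>{summary}</p></div>"
    )
```

Escaping the summary with `html.escape` matters because the text is re-inserted into HTML after being extracted as plain text.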
#### Step 4: E-E-A-T Signals Injection
Add to each content page's meta area:
```html
<span>📅 Published: YYYY-MM-DD</span>
<span>🔄 Updated: YYYY-MM-DD</span>
<span>✍️ Author: TopAIGEO [Topic] Team</span>
<span>🔗 Sources: [Industry sources relevant to topic]</span>
```
Update `<meta property="article:modified_time">` to the current date whenever the page is edited, so it acts as a freshness signal.
#### Step 5: Internal Link Network
Add related articles section at bottom of each page:
```python
# For each page, find semantically related pages:
# - Same category: other products link to each other
# - Cross-category: product → parameters → troubleshooting
# Example: led-strip-lights-guide links to:
# - voltage-drop-led-strip (troubleshooting)
# - led-strip-not-sticking (troubleshooting)
# - dimmable-led-bulbs-guide (products)
```
**Semantic linking rules:**
```python
# Define topic clusters
clusters = {
    "dimmer": ["dimmer-incompatibility", "dimmable-led-bulbs-guide", "dimmable-lights-guide", "triac-dali-dmx-diming"],
    "cct/color": ["color-temperature-cct-explained", "warm-white-vs-cool-white-led", "color-rendering-metrics", "color-tolerance-sdcm"],
    "driver": ["led-driver-complete-guide", "led-driver-failure-signs", "led-driver-vs-transformer"],
    # ... etc
}
# For each page, find its cluster → pick 3-5 related pages
# Ensure cross-category links (products → troubleshooting → parameters)
```
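A cluster lookup along these lines could pick the related pages. The sketch below is an assumption about how the lookup might work, with a hypothetical `related_pages` helper; it covers only cluster membership and does not enforce the cross-category requirement, which would need each slug's category as extra input.

```python
def related_pages(slug, clusters, limit=5):
    """Return up to `limit` slugs that share a cluster with `slug`."""
    related = []
    for members in clusters.values():
        if slug not in members:
            continue
        for other in members:
            # Skip the page itself and avoid duplicates across clusters
            if other != slug and other not in related:
                related.append(other)
    return related[:limit]


# Sample data taken from the "driver" cluster above
clusters = {
    "driver": [
        "led-driver-complete-guide",
        "led-driver-failure-signs",
        "led-driver-vs-transformer",
    ],
}
```

A page that belongs to no cluster gets an empty list, which is a useful signal that the cluster map needs another entry.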
#### Step 6: Semantic HTML Tag Replacement
Replace generic `<div>` wrappers with semantic HTML5 elements:
| Original | Replacement | Purpose |
|----------|-------------|---------|
| `<div class="container">` | `<main class="container">` | Main content landmark |
| `<div class="breadcrumb">` | `<nav class="breadcrumb" aria-label="breadcrumb">` | Navigation landmark |
| `<div class="content">` | `<article class="content">` | Self-contained article body |
| `<div class="related-articles">` | `<aside class="related-articles">` | Complementary content |
```python
# Ordered (old, new) string pairs applied to each page
replacements = [
    # wrap content div
    ('<div class="content">', '<article class="content">'),
    ('</div>\n\n<!-- Related Articles', '</article>\n\n<!-- Related Articles'),
    # wrap related articles div
    ('<div class="related-articles"', '<aside class="related-articles"'),
]
```
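Applying such (old, new) pairs to a page is a plain string operation. The following sketch assumes the pairs are exact substrings of the page source; `apply_replacements` and `SEMANTIC_PAIRS` are illustrative names. Each opening-tag swap is paired with its closing tag, anchored on the comment that follows it, so unrelated `</div>` tags are left alone.

```python
def apply_replacements(html_text, replacements):
    """Apply (old, new) string pairs in order; return new text and hit count."""
    hits = 0
    for old, new in replacements:
        count = html_text.count(old)
        if count:
            html_text = html_text.replace(old, new)
            hits += count
    return html_text, hits


# Illustrative pairs for the content wrapper only
SEMANTIC_PAIRS = [
    ('<div class="content">', '<article class="content">'),
    ('</div>\n\n<!-- Related Articles', '</article>\n\n<!-- Related Articles'),
]
```

Because the replaced strings no longer match after the first pass, running the function twice on the same page is a no-op, which makes batch reruns safe.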
### Module 4: GEO Prompt Pages (Incremental)
Create targeted `/answers/` directory pages that directly answer specific AI search prompts:
**Page types to create:**
| Type | Example | Structure | Schema |
|------|---------|-----------|--------|
| **What-is question** | `answers/best-color-temperature-bedroom.html` | Quick Answer + table + explanation + FAQ | FAQPage |
| **How-to guide** | `answers/fix-led-flickering.html` | 6-step with diagnostic boxes + flowchart | HowTo |
| **VS comparison** | `answers/led-vs-incandescent-vs-cfl.html` | Comparison table + cost analysis + verdict | ItemList |
**Template structure:**
```html
<!DOCTYPE html>
<html lang="en">
<head>...GA4 + SEO meta...</head>
<body>
<header>...navigation...</header>
<main>
<nav aria-label="breadcrumb">...</nav>
<h1>Question as title</h1>
<div class="meta">Author + dates + sources</div>
<article class="content">
<div class="quick-answer">Direct answer in 1-2 sentences</div>
<h2>Detailed Answer</h2>
... tables, lists, steps ...
<h2>FAQ</h2>
</article>
<aside>Related articles</aside>
</main>
<footer>...</footer>
</body>
</html>
```
**URL pattern:** `/encyclopedia/[topic]/answers/[slug].html`
### Module 5: Technical Infrastructure
#### llms.txt & llms-full.txt
Create two files at the encyclopedia root:
- **`llms.txt`**: Brief summary + key URLs by category with priority and update frequency (AI-brief level, ~70-80 lines). A well-structured llms.txt is a low-cost discovery aid for AI crawlers, although support for the convention still varies by platform.
- **`llms-full.txt`**: Complete URL index with titles for all pages (for deep AI crawling)
**llms.txt guidelines (enhanced with priority/update frequency):**
```
# [Encyclopedia Name]
> One-line description.
> Updated: YYYY-MM-DD | Pages: N | Full index: llms-full.txt
## Priority: Core Pages (Update: weekly)
- Homepage: https://domain/encyclopedia/topic/
- About: https://domain/encyclopedia/topic/about/
- Contact: https://domain/encyclopedia/topic/contact/
## Priority: Product Buying Guides (Update: monthly)
- [Product A](https://domain/...)
- [Product B](https://domain/...)
## Priority: Technical Parameters (Update: quarterly)
- ...
## Sitemap
- Full Sitemap: (link to sitemap.xml)
- Full Index (all N pages): (link to llms-full.txt)
## Content Guidelines
- All content is neutral, data-driven, and unbiased
- Structured data (JSON-LD: FAQ/Article/BreadcrumbList) on all content pages
- Authoritative sources cited (IEEE, IEC, ENERGY STAR, etc.)
```
**Key llms.txt optimization principles:**
1. Group pages by **priority tier** (Core → Product → Parameter → Standard) — not just by URL structure
2. Annotate **update frequency** (weekly/monthly/quarterly) — helps AI crawlers decide cache/revisit strategy
3. Include a **Content Guidelines** section describing format, data sources, and quality standards
4. Provide a direct **link to llms-full.txt** for full content discovery
5. Use descriptive link text (the full page title), not just category names
**llms-full.txt generation:**
```python
# Walk all HTML files, exclude .bak, google verification, 404
# Extract title from each file
# Group by section
# Output as markdown links with descriptive text
```
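The generation steps above could look like the following. This is a sketch under a few assumptions: titles are read from the `<title>` element with a regular expression rather than an HTML parser, Google verification files are recognised by a `google` filename prefix, and the function name `build_llms_full` is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
SKIP_NAMES = {"404.html"}


def build_llms_full(root, base_url):
    """Return llms-full.txt content as markdown links grouped by section."""
    root = Path(root)
    sections = defaultdict(list)
    for path in sorted(root.rglob("*.html")):
        # The *.html pattern already ignores .bak copies; drop the 404 page
        # and Google verification files explicitly
        if path.name in SKIP_NAMES or path.name.startswith("google"):
            continue
        match = TITLE_RE.search(path.read_text(encoding="utf-8", errors="ignore"))
        title = " ".join(match.group(1).split()) if match else path.stem
        rel = path.relative_to(root).as_posix()
        # Top-level directory names the section; root-level files go to "root"
        section = rel.split("/")[0] if "/" in rel else "root"
        sections[section].append(f"- [{title}]({base_url.rstrip('/')}/{rel})")
    lines = []
    for section in sorted(sections):
        lines.append(f"## {section}")
        lines.extend(sections[section])
        lines.append("")
    return "\n".join(lines)
```

Writing the returned string to `llms-full.txt` after every batch run keeps the index in step with the files on disk.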
#### Sitemap.xml Update
After any content changes (new pages, batch optimizations), regenerate sitemap.xml:
```python
# Walk all files, determine priority by type:
#   products/parameters/scenes/troubleshooting → 0.8 priority, weekly
#   standards/research/answers → 0.7 priority, weekly
#   city directories → 0.6 priority, monthly
#   utility pages → 0.5 priority, monthly
```
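The priority rules can be turned into a small sitemap generator. The sketch below assumes page type is determined by the top-level directory name (with `us/` standing for city directories, as in the architecture tree) and that anything unlisted falls into the utility tier; `build_sitemap` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# (priority, changefreq) per top-level directory, from the rules above
RULES = {
    "products": ("0.8", "weekly"),
    "parameters": ("0.8", "weekly"),
    "scenes": ("0.8", "weekly"),
    "troubleshooting": ("0.8", "weekly"),
    "standards": ("0.7", "weekly"),
    "research": ("0.7", "weekly"),
    "answers": ("0.7", "weekly"),
    "us": ("0.6", "monthly"),  # city directories
}
DEFAULT_RULE = ("0.5", "monthly")  # utility pages and anything unlisted


def build_sitemap(rel_paths, base_url):
    """Return sitemap XML for site-relative paths such as 'products/a.html'."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for rel in sorted(rel_paths):
        priority, freq = RULES.get(rel.split("/")[0], DEFAULT_RULE)
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{base_url.rstrip('/')}/{rel}"
        ET.SubElement(url, "changefreq").text = freq
        ET.SubElement(url, "priority").text = priority
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")
```

Feeding the function the same file list used for `llms-full.txt` keeps the two indexes at the 1:1 ratio targeted in Module 1.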
#### 404 & Error Handling
Ensure 404.html has:
- GA4 code
- Meta description
- Canonical tag (or noindex)
- Links back to main sections
#### robots.txt AI Crawler Gating
Control which AI crawlers can access your encyclopedia. This prevents aggressive/uncited scrapers while welcoming respectful ones:
**Recommended rules:**
```
# Allow well-known AI crawlers that respect robots.txt and cite sources
User-agent: GPTBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: CCBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: Claude-Web
Allow: /
# Block aggressive scrapers
User-agent: Bytespider
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Applebot-Extended
Disallow: /
# Allow traditional search engines
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
# Default: allow everything else
User-agent: *
Allow: /
# Sitemaps
Sitemap: https://domain/encyclopedia/topic/sitemap.xml
# llms.txt is not a sitemap format, so it is not declared with a Sitemap directive
```
**Principle:** Allow AI crawlers that respect robots.txt AND provide source attribution for their citations. Block crawlers known for aggressive/scraping behavior without attribution.
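**Separate from main site:** Crawlers only read `robots.txt` from the root of a host (`https://domain/robots.txt`), so a copy placed inside the encyclopedia directory is ignored. Serve the encyclopedia from its own host or subdomain to give it an independent `robots.txt`; otherwise, merge these rules into the main domain's file (which also covers the WordPress/blog portion of the site).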
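Before deploying the rules, it can help to check them offline with Python's standard `urllib.robotparser`. The snippet below is a minimal sketch using a shortened rule set and an `example.com` URL as stand-ins; `check_access` is a hypothetical helper, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Shortened version of the recommended rules, for illustration only
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /
User-agent: Bytespider
Disallow: /
User-agent: *
Allow: /
"""


def check_access(robots_txt, agents, url):
    """Map each user agent to True/False for fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = check_access(
    ROBOTS_TXT,
    ["GPTBot", "Bytespider", "SomeOtherBot"],
    "https://example.com/encyclopedia/lighting/products/",
)
```

Here `result` reports `GPTBot` and the unlisted `SomeOtherBot` as allowed and `Bytespider` as blocked, matching the intent of the rules.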
### Module 6: Multi-Platform Distribution
Integration with local publishing agents (e.g., local Hermes Agent running publish.js):
**Server-side (article generation):**
```python
# Generate platform-optimized content files
# Place in /home/ubuntu/cross-publish/
# Create manifest.txt with format: platform|filename
# Manifest format:
# x|lighting_intro.txt
# reddit|led_tips.txt
# medium|full_guide.txt
```
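On the consuming side, the `platform|filename` manifest is simple to parse. The sketch below is one possible reader, with a hypothetical `parse_manifest` helper; it assumes one entry per line and treats blank lines as ignorable, which the source does not specify.

```python
def parse_manifest(text):
    """Parse 'platform|filename' lines into (platform, filename) tuples."""
    entries = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        platform, sep, filename = line.partition("|")
        if not sep or not platform.strip() or not filename.strip():
            raise ValueError(f"manifest line {line_no} is malformed: {raw!r}")
        entries.append((platform.strip(), filename.strip()))
    return entries
```

Failing loudly on a malformed line is a deliberate choice here: a silently skipped entry would mean a platform never receives its post.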
**Server-side (HTTP delivery):**
- nginx location `/cross-publish/` with token protection
- Windows batch script syncs via `curl "https://domain/cross-publish/filename?token=TOKEN"`
- Local agent runs `node publish.js [platform] [filename]`
**Platform-specific content templates:**
| Platform | Tone | Length | Format |
|----------|------|--------|--------|
| X/Twitter | Concise, hook-driven | 280-400 chars | Single thread (3-5 tweets) |
| Reddit | Conversational, helpful | 500-2000 chars | Self post with questions |
| Medium | Professional, detailed | 1500-3000 words | Article with headers/images |
| Zhihu | In-depth, educational | 2000-5000 characters | Q&A or article format |
### Module 6b: Homepage GEO Optimization
The encyclopedia homepage (`index.html`) has unique optimization requirements that don't apply to content sub-pages. Apply these after the per-page optimizations are done:
#### Brand Association Banner (Most Critical GEO Signal)
Insert a thin banner at the top of the page (after `<nav>`, before Hero section) that explicitly associates the encyclopedia with the parent brand:
```html
<!-- Brand Association Banner -->
<div class="bg-gradient-to-r from-amber-gold/10 via-amber-light/20 to-amber-gold/10 border-b border-amber-gold/10">
<div class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 py-2">
<div class="flex items-center justify-center sm:justify-between flex-wrap gap-1 text-xs">
<span class="text-mid-brown">🔹 <strong class="text-deep-brown">Powered by [BrandName]</strong> — The #1 Cross-Border GEO Platform for [Industry] Brands</span>
<a href="https://[branddomain].com" target="_blank" class="text-amber-gold hover:text-amber-dark font-medium transition-colors whitespace-nowrap">
Learn about [BrandName] GEO Services →
</a>
</div>
</div>
</div>
```
**Why this matters:** Without this banner, AI search engines may cite "the lighting encyclopedia" without mentioning the parent brand. This banner ensures every AI citation builds brand equity.
#### Breadcrumb Navigation
Add a structured breadcrumb nav between the brand banner and Hero section:
```html
<!-- Breadcrumb -->
<nav class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 py-3" aria-label="Breadcrumb">
<ol class="flex items-center gap-2 text-sm text-mid-brown">
<li><a href="https://[branddomain].com" class="hover:text-amber-gold transition-colors">[BrandName]</a></li>
<li><span class="mx-1">/</span></li>
<li><a href="/encyclopedia/" class="hover:text-amber-gold transition-colors">Encyclopedias</a></li>
<li><span class="mx-1">/</span></li>
<li class="text-deep-brown font-medium" aria-current="page">[Topic] Industry Encyclopedia</li>
</ol>
</nav>
```
#### FAQ Section on Homepage
80% of AI search answers come from FAQ content. Add a FAQ section to the homepage:
```html
<!-- FAQ Section -->
<section id="faq" class="py-20 lg:py-28 bg-white">
<div class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8">
<div class="text-center max-w-2xl mx-auto mb-16">
<span class="inline-block px-4 py-1.5 bg-amber-gold/10 text-amber-dark text-sm font-medium rounded-full mb-4">Frequently Asked Questions</span>
<h2 class="text-3xl lg:text-4xl font-display font-bold">[Topic] Encyclopedia FAQ</h2>
<p class="text-mid-brown">Quick answers to the most common [topic] questions from professionals and DIYers</p>
</div>
<div class="max-w-3xl mx-auto space-y-4">
<details class="...">
<summary>Question 1?</summary>
<div class="px-6 pb-6"><p>Answer with link to detailed page.</p></div>
</details>
<!-- 5-7 questions total, each linking to relevant detail page -->
</div>
</div>
</section>
```
**FAQ section rules:**
- Use `<details>/<summary>` for expandable/collapsible (AI-friendly + user-friendly)
- Each answer must include an internal link to the relevant detailed page
- 5-7 questions covering the most common search queries in the industry
- Every question should be a realistic user query (not keyword-stuffed)
#### FAQPage Schema on Homepage
Add a separate FAQPage JSON-LD block to the homepage that mirrors the visible FAQ content:
```javascript
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Question text here",
"acceptedAnswer": {
"@type": "Answer",
"text": "Concise answer (150-250 chars)"
}
}
// ... one entry per FAQ question
]
}
```
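**Note:** The homepage is the only page that carries more than one top-level schema type (the existing CollectionPage plus the new FAQPage). Every `<script type="application/ld+json">` block must be valid JSON on its own; the JSON-LD Array Pitfall section below compares separate script tags with a single JSON array.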
#### Fix Duplicate H1
The navigation logo area often has an `<h1>` wrapped around the brand name, and the Hero section has another `<h1>` for the page title. Search engines then see two H1s on one page, so change the nav one to `<span>`:
```html
<!-- Before (navigation logo area): -->
<h1 class="text-xl font-display font-bold">[Topic] Encyclopedia</h1>
<!-- After: -->
<span class="text-xl font-display font-bold">[Topic] Encyclopedia</span>
```
**Only the Hero `<h1>` should remain**, since it carries the page's semantic meaning.
#### Footer Brand Attribution
Add explicit brand attribution to the copyright line in the footer:
```html
<!-- Before: -->
<p class="text-sm text-mid-brown">© 2026 Brand. All rights reserved.</p>
<!-- After: -->
<p class="text-sm text-mid-brown">© 2026 Brand. All rights reserved. | [Topic] Encyclopedia by Brand</p>
```
This ensures AI citing any sub-page also associates the content with the brand.
#### Homepage Optimization Checklist
```bash
# Verify before/after
grep -c "Powered by" index.html # Should be 1+ (brand banner)
grep -c "Breadcrumb" index.html # Should be 1+ (breadcrumb nav)
grep -c "FAQPage" index.html # Should be 1+ (FAQ schema)
grep -c "<h1 " index.html # Should be 1 (not 2)
grep "Lighting Encyclopedia by" index.html # Should have brand in copyright
```
#### LLM-Friendly Summary Paragraph
Insert a compact, data-dense "Quick Summary" paragraph in the Hero section (between the subtitle and search bar). This gives AI crawlers a single, authoritative paragraph they can directly quote as a "one-sentence answer":
```html
<!-- LLM-Friendly Summary Paragraph -->
<div class="max-w-3xl mx-auto mb-6 p-4 bg-amber-gold/5 rounded-xl border border-amber-gold/10">
<p class="text-sm text-mid-brown leading-relaxed">
<strong class="text-deep-brown">Quick Summary:</strong> TopAIGEO's Lighting Encyclopedia covers
<strong>100+ product entries</strong> across <strong>9 product families</strong>
(ceiling, wall, floor, table, outdoor, commercial, and more), aligning with
<strong>8+ international standards</strong> (IES, UL, CIE, ENEC, IEC, EU Ecodesign, GB,
Title 24). It serves lighting professionals in
<strong>30+ global cities</strong> across the USA and Europe, with coverage of regulations
in New York, Los Angeles, London, Paris, Berlin, and 25+ others. All data is sourced from
IES, IEC, UL, CIE, and EU Official Journal publications.
</p>
</div>
```
**Pattern:** `[Brand]'s [Topic] Encyclopedia covers [N] entries across [N] categories, aligned with [N] standards ([names]), serving professionals in [N] cities ([sample city list]). All data sourced from [org1], [org2].`
**Placement:** Between subtitle `<p>` and the search bar `<div>`, after the `mb-12` margin on the subtitle.
#### Standards List Expansion
The homepage should list all standards by name, not just count them. When the number of listed standards is less than the claimed count:
1. Count the actual items in the HTML
2. If fewer than the claimed "N+" number, add missing entries
3. Use the same icon+text pattern as existing entries
4. Include specific standard numbers where relevant (e.g., `IEC 60598`, `EU 2019/2020`)
Example addition pattern (each entry is a `flex items-center gap-4` div with SVG icon + h4 title + p description):
```html
<div class="flex items-center gap-4">
<div class="w-10 h-10 bg-amber-gold/20 rounded-lg flex items-center justify-center flex-shrink-0">
<svg class="w-5 h-5 text-amber-gold" fill="currentColor" viewBox="0 0 20 20">
<path fill-rule="evenodd" d="M6.267 3.455a3.066 3.066 0 001.745-.723 3.066 3.066 0 013.976 0 ..." clip-rule="evenodd"/>
</svg>
</div>
<div>
<h4 class="text-white font-medium">IEC (International Electrotechnical Commission)</h4>
<p class="text-sm text-gray-400">Global safety and performance standards (IEC 60598, IEC 60529)</p>
</div>
</div>
```
#### FAQPage Schema — JSON-LD Array Pitfall
When adding a second Schema type to existing JSON-LD on a homepage that already has a `CollectionPage` block, you have two options:
**Option A: Multiple `<script>` tags (works, but each block must be parsed separately)**
```html
<script type="application/ld+json">
{ "@context": "...", "@type": "CollectionPage", ... }
</script>
<script type="application/ld+json">
{ "@context": "...", "@type": "FAQPage", "mainEntity": [...] }
</script>
<script type="application/ld+json">
{ "@context": "...", "@type": "Organization", ... }
</script>
```
Each script tag is validated independently by schema validators. This works in practice.
**Option B: Single `<script>` tag with JSON array (recommended for JSON parsability)**
```html
<script type="application/ld+json">
[
{ "@context": "...", "@type": "CollectionPage", ... },
{ "@context": "...", "@type": "FAQPage", "mainEntity": [...] },
{ "@context": "...", "@type": "Organization", ... }
]
</script>
```
⚠️ **Critical:** The JSON array MUST be valid — wrap everything in `[...]` brackets. Without the outer `[]`, the JSON is `{...},{...},{...}` which is technically invalid JSON (multiple root objects). Browsers/schema validators often tolerate this, but:
- Python `json.loads()` will fail with `Extra data` error
- Programmatic extraction via regex will break
- Some AI crawlers may fail to parse it
**Always use Option B (array wrapping) for clean JSON.** When editing an existing file with `patch`, the JSON suffix must end with `}]` (not just `}`) to close both the inner object and outer array.
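The `Extra data` failure is easy to reproduce, and the same code doubles as a validation pass over generated pages. The sketch below extracts JSON-LD blocks with a regular expression and parses each one; `extract_schema_types` is a hypothetical helper, and the regex assumes the `type` attribute is written with double quotes.

```python
import json
import re

LD_JSON_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def extract_schema_types(html_text):
    """Return the @type of every JSON-LD object; raise on invalid JSON."""
    types = []
    for raw in LD_JSON_RE.findall(html_text):
        data = json.loads(raw)  # raises json.JSONDecodeError on bad blocks
        # A block may hold one object or an array of objects
        items = data if isinstance(data, list) else [data]
        types.extend(item.get("@type") for item in items)
    return types
```

A block written as `{...}, {...}` without the outer brackets makes `json.loads` raise `JSONDecodeError` with an "Extra data" message, while both the array form and separate script tags parse cleanly.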
#### Pitfall: Third-party Scanning Tools May Be Wrong
Many automated SEO/GEO scanning tools only check the homepage's raw HTML and extrapolate their findings to the entire site. They may report:
- "No Schema" when subpages all have Schema
- "No OG tags" when subpages all have OG tags
- "No title/meta description" when these exist on all pages
**Always verify scanning tool claims by actually checking the file system** (`grep -r` / `find -exec`) before acting on them. This is especially true for claims about Schema structured data — the tool may have only checked the index page vs. all subpages.
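A file-system check of this kind can also be scripted in Python, which makes it easy to reuse for the Schema coverage metric in Module 1. The sketch below counts pages containing a marker string; `schema_coverage` is an illustrative name, and a substring match only shows that a JSON-LD block is present, not that it is valid.

```python
from pathlib import Path


def schema_coverage(root, marker="application/ld+json"):
    """Return (pages_with_marker, total_pages, missing_paths) under `root`."""
    root = Path(root)
    missing = []
    total = 0
    for path in sorted(root.rglob("*.html")):
        total += 1
        text = path.read_text(encoding="utf-8", errors="ignore")
        if marker not in text:
            missing.append(path.relative_to(root).as_posix())
    return total - len(missing), total, missing
```

Passing a different `marker`, such as `og:title`, checks the other claims a scanning tool might make.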
### Module 7: Monitoring & Iteration
**GA4 setup:**
```html
<!-- Insert in <head> of all pages -->
<!-- Google tag (gtag.js) -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXX"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-XXXXXXXX');
</script>
```
**Batch insertion pattern:**
```python
# Walk all HTML files
# Skip files that already contain the GA ID
# Insert after <head> tag
ga_code = "<!-- Google tag ... -->\n<script ...>...</script>\n"
content = content.replace("<head>", "<head>\n" + ga_code, 1)
```
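A plain `replace("<head>", ...)` misses pages whose `<head>` tag carries attributes. The sketch below is a slightly more defensive variant, with a hypothetical `insert_ga` helper; it matches `<head>` with optional attributes, does not match `<header>`, and skips pages that already contain the GA ID.

```python
import re

# Matches <head> or <head ...attributes...>, but not <header>
HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)


def insert_ga(html_text, ga_code, ga_id):
    """Insert `ga_code` right after the opening <head> tag, once per page."""
    if ga_id in html_text:
        return html_text, False  # already tagged
    match = HEAD_RE.search(html_text)
    if match is None:
        return html_text, False  # no <head> to anchor on
    cut = match.end()
    return html_text[:cut] + "\n" + ga_code + html_text[cut:], True
```

The boolean in the return value lets the batch script report how many files were actually modified.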
**Monitoring checklist:**
1. GA4 → Reports → Acquisition → Traffic acquisition → set the primary dimension to "Session source / medium" and filter for AI platform referrers (e.g. chatgpt.com, perplexity.ai)
2. Google Search Console → Pages → check sitemap coverage
3. Periodic Prompt testing: ask AI models questions about the topic, check if encyclopedia is cited
4. Page freshness: update `modified_time` meta tag every 90 days
## Multi-Model Cross-Validation Audit Protocol
Before optimizations and after each phase, run a cross-validation audit using **3 independent AI models** through a single API (e.g., OpenRouter). This catches blind spots that any single model has:
### Audit Prompt Template
Send the homepage HTML to each model with this prompt:
```
You are an expert GEO (Generative Engine Optimization) auditor. Analyze this homepage HTML and score it across these dimensions (each 0-10):
1. Structured Data (JSON-LD quality, diversity, correctness)
2. Entity Richness (brand names, standards, locations, products mentioned in visible text)
3. E-E-A-T (author signals, about page, external citations, publisher info)
4. Multimodal Readiness (og:image, alt text, meta tags)
5. LLM Friendliness (clear hierarchy, dense factual content, machine-readable format)
6. Localization Coverage (geographic scope, language targeting, regional standards)
7. Internal Linking (breadcrumb, related articles, category links)
For each dimension: score, brief justification, and specific HTML evidence. Then provide a total out of 70 and a ranked list of the top 5 improvements.
[HTML content here]
```
### Model Selection Strategy
Use 3 models with different training data distributions:
| Model | Strength | Focus |
|-------|----------|-------|
| **Claude Sonnet 4** (anthropic/claude-sonnet-4) | Best at E-E-A-T and credibility assessment | Judges how authoritative the site appears |
| **Gemini 2.0 Flash** (google/gemini-2.0-flash-001) | Best at structured data and entity extraction | Catches missing schemas and entities |
| **DeepSeek V3 0324** (deepseek/deepseek-chat-v3-0324 or nebius variant) | Best at LLM-friendliness and content structure | Evaluates how well AI crawlers can parse content |
### Cross-Reference Scoring
Average the 3 scores for a realistic assessment. Key insights:
- **If Claude scores lower** on E-E-A-T than others → need author/team pages and external citations
- **If Gemini scores lower** on structured data → missing or invalid Schema types
- **If DeepSeek scores lower** on LLM-friendliness → content is too sparse or unstructured
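The averaging step can be sketched as below. This is a minimal illustration, assuming each model's reply has already been parsed into a dict of dimension to score. The `DIMENSIONS` keys, the dict layout, and the 1.5-point outlier threshold are assumptions for the example, not part of the audit prompt.

```python
from statistics import mean

# Hypothetical short keys for the seven audit dimensions.
DIMENSIONS = ["structured_data", "entity_richness", "eeat", "multimodal",
              "llm_friendliness", "localization", "internal_linking"]


def cross_reference(scores_by_model, outlier_gap=1.5):
    """Average per-dimension scores and flag models scoring well below the rest."""
    report = {}
    for dim in DIMENSIONS:
        values = {model: scores[dim] for model, scores in scores_by_model.items()}
        avg = mean(values.values())
        # A model is a low outlier if it sits outlier_gap or more below the average.
        low = sorted(m for m, v in values.items() if avg - v >= outlier_gap)
        report[dim] = {"average": round(avg, 2), "low_outliers": low}
    # Total out of 70: the sum of the seven averaged dimensions.
    report["total"] = round(sum(r["average"] for r in report.values()), 2)
    return report
```

A dimension where one model appears in `low_outliers` maps directly to the interpretation rules above (for example, Claude low on `eeat`).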
### Before/After Comparison
Always re-run the full 3-model audit after each phase to measure score deltas. Expected improvements:
| Improvement | Expected Delta |
|-------------|---------------|
| Homepage Schema upgrade (WebSite→WebSite+Org+Person) | +1.0-2.0 E-E-A-T |
| Person Schema with @id cross-reference | +0.5-1.0 E-E-A-T |
| Author pages with AboutPage Schema | +1.0-1.5 E-E-A-T |
| Standards listed as external links | +0.5-1.0 E-E-A-T |
| BreadcrumbList Schema | +0.3-0.5 Structured Data |
| ItemList Schema (for multi-category sites) | +0.5-1.0 Internal Linking |
| Hero LLM summary paragraph | +1.0-1.5 LLM Friendliness |
| og:image (1200x630) | +0.5-1.0 Multimodal |
| City links made clickable | +0.5-1.0 Internal Linking |
### API Call Pattern (OpenRouter)
```bash
curl -s https://openrouter.ai/api/v1/chat/completions \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-sonnet-4-20250402",
"messages": [{"role": "user", "content": "Audit this HTML..."}],
"max_tokens": 8192
}'
```
Save all 3 results and the averaged report as `encyclopedia_geo_audit_report.md` for future reference.
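When the HTML is embedded in the request, hand-quoting it inside `curl -d '...'` is fragile. The sketch below builds one request body per model with `json.dumps`, which handles the escaping. The model slugs are copied from the table above and should be checked against the provider's current model list; the function name is hypothetical.

```python
import json

# Slugs copied from the model table; verify before use.
MODELS = [
    "anthropic/claude-sonnet-4-20250402",
    "google/gemini-2.0-flash-001",
    "deepseek/deepseek-chat-v3-0324",
]


def build_audit_requests(prompt, html, max_tokens=8192):
    """Return one JSON request body (as a string) per model."""
    content = prompt + "\n\n" + html
    return [
        json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": content}],
            "max_tokens": max_tokens,
        })
        for model in MODELS
    ]
```

Each returned string can be written to a file and sent with `curl -d @body.json`, keeping the shell out of the quoting problem.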
## Homepage Schema Architecture (Proven Pattern)
This is the **most impactful single GEO change** for a static encyclopedia homepage. Replace a single `WebSite` schema with 5 interconnected blocks:
| # | Schema Type | @id | Key Properties | Purpose |
|---|-------------|-----|----------------|---------|
| 1 | `WebSite` | (none) | `name`, `alternateName`, `description`, `about`, `inLanguage`, `issn`, `image` | Core identity for search engines |
| 2 | `Organization` | `#organization` | `name`, `logo`, `foundingDate`, `knowsAbout[]`, `sameAs[]` (social links) | Brand authority signal |
| 3 | `Person` | `/about/#person` | `name`, `alternateName`, `description`, `knowsAbout[]` | Editorial team credibility |
| 4 | `BreadcrumbList` | `#breadcrumb` | `itemListElement` with position + name + item | Navigation context |
| 5 | `ItemList` | (none) | `name`, `numberOfItems`, `itemListElement[]` with position + name + url per category | Cross-vertical visibility |
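As a rough sketch of how the five blocks fit together, the function below emits them as one JSON-LD array. All values are placeholders, the function name is hypothetical, and the `url`, `publisher`, and `affiliation` properties are illustrative additions; only the `@id` suffixes follow the table.

```python
import json


def build_homepage_schema(base_url, site_name, categories):
    """Return the five homepage blocks as one JSON-LD array string.

    categories is a list of (name, path) tuples; all values are placeholders.
    """
    ctx = "https://schema.org"
    org_id = base_url + "/#organization"
    person_id = base_url + "/about/#person"  # must match the /about/ page
    blocks = [
        {"@context": ctx, "@type": "WebSite", "name": site_name,
         "url": base_url + "/", "publisher": {"@id": org_id}},
        {"@context": ctx, "@type": "Organization", "@id": org_id,
         "name": site_name},
        {"@context": ctx, "@type": "Person", "@id": person_id,
         "name": "Editorial Team", "affiliation": {"@id": org_id}},
        {"@context": ctx, "@type": "BreadcrumbList",
         "@id": base_url + "/#breadcrumb",
         "itemListElement": [{"@type": "ListItem", "position": 1,
                              "name": "Home", "item": base_url + "/"}]},
        {"@context": ctx, "@type": "ItemList",
         "name": site_name + " categories",
         "numberOfItems": len(categories),
         "itemListElement": [
             {"@type": "ListItem", "position": i, "name": name,
              "url": base_url + path}
             for i, (name, path) in enumerate(categories, start=1)]},
    ]
    return json.dumps(blocks, indent=2, ensure_ascii=False)
```

The output goes inside a single `<script type="application/ld+json">` tag; real pages would add the remaining properties from the table (`knowsAbout`, `sameAs`, and so on).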
### Critical: @id Cross-References
The Person Schema in the homepage **must** share the same `@id` as the `/about/` page's Person Schema. This creates a semantic link that search engines use to verify author credibility:
```json
// Homepage
{
"@type": "Person",
"@id": "https://www.domain.com/about/#person",
"name": "Editorial Team"
}
// /about/ page
{
"@type": "AboutPage",
"mainEntity": { "@id": "https://www.domain.com/about/#person" }
}
```
### Article Schema Author Upgrade
All sub-page TechArticle/Article schemas should reference the same `@id` instead of using a flat `"@type": "Organization"`:
```json
// Before (weak)
"author": {
"@type": "Organization",
"name": "TopAIGEO Lighting Encyclopedia"
}
// After (strong — @id cross-ref to homepage Person)
"author": {
"@type": "Person",
"@id": "https://www.topaigeo.com/about/#person",
"name": "TopAIGEO Editorial Team"
}
```
Batch-update all article files using Python:
```python
author_old = '"author": {\n "@type": "Organization",\n "name": "Encyclopedia Name"\n }'
author_new = '"author": {\n "@type": "Person",\n "@id": "https://domain/about/#person",\n "name": "Editorial Team Name"\n }'
# Find files with "author" key, check not already updated (no @id), replace
```
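The find, check, and replace steps in the comment above could look like the following sketch. It uses a whitespace-tolerant regex rather than an exact string, so indentation differences between templates do not cause silent misses. The function names are hypothetical, and the pattern assumes the flat author object contains only `@type` and `name`.

```python
import json
import re
from pathlib import Path

# Matches a flat Organization author object regardless of indentation.
AUTHOR_RE = re.compile(
    r'"author"\s*:\s*\{\s*"@type"\s*:\s*"Organization"\s*,'
    r'\s*"name"\s*:\s*"[^"]*"\s*\}'
)


def upgrade_author(html, person_id, team_name):
    """Swap flat Organization authors for an @id-linked Person; return (html, count)."""
    new_author = '"author": ' + json.dumps(
        {"@type": "Person", "@id": person_id, "name": team_name},
        ensure_ascii=False)
    # A function replacement avoids backslash processing in the new text.
    return AUTHOR_RE.subn(lambda m: new_author, html)


def upgrade_tree(root, person_id, team_name):
    """Rewrite every *.html under root that still has a flat author."""
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8")
        updated, count = upgrade_author(text, person_id, team_name)
        if count:
            path.write_text(updated, encoding="utf-8")
            changed.append(str(path))
    return changed
```

Because already-upgraded authors no longer match the pattern, the script is safe to re-run. Note that it rewrites every flat Organization author it finds, including ones nested in other schema blocks.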
Also standardize the publisher name across all pages:
```python
publisher_old = '"name": "Old Publisher Name"'
publisher_new = '"name": "Unified Brand Name"'
```
## About Page Creation Pattern
Create an `/about/` page with `AboutPage + Person` Schema. With `try_files $uri $uri.html` in the nginx config, `/about` resolves to `/about.html` automatically. Redirect the trailing-slash variant so both URLs work:
```nginx
location = /about/ { return 301 /about; }
```
The about page should include:
- Exact same `@id` as homepage Person Schema
- `knowsAbout` array with 4-5 domain-specific expertise areas
- `affiliation` linking to the Organization
- Visible editorial standards (data-driven, vendor-neutral, regularly updated)
- Team member descriptions (not fabricated individuals — use team/entity names)
## LLM-Friendly Hero Summary
Insert a compact, data-dense paragraph in the Hero section. This is the content AI crawlers will most likely quote:
```html
<!-- LLM-friendly summary for AI search engines -->
<div class="mb-6 text-white/80 text-sm leading-relaxed" style="max-width:560px">
<p><strong>BrandName</strong> is a [platform type] that helps [audience] get cited by
Google AI Overviews, ChatGPT Search, Perplexity, and Bing Copilot. Our
<a href="/encyclopedia/topic/">Topic Encyclopedia</a> covers N+ articles on
[list key topics, standards, coverage across M+ countries], optimized for
AI search engine citation and brand visibility.</p>
</div>
```
**Key phrases AI models look for:** AI search engine names (Google AI Overviews, ChatGPT Search, Perplexity, Bing Copilot), specific standard names (IES, UL, CIE, IEC), quantified claims (179+ articles, 50+ countries).
## Homepage Hero Slider Implementation
For static HTML encyclopedia homepages, replace the plain gradient Hero background with a full-screen image slider:
### Structure
```html
<section class="relative overflow-hidden min-h-[90vh] flex items-center">
<!-- Image Slides -->
<div class="hero-slider">
<div class="hero-slide active" style="background-image: url('slide1.jpg');"></div>
<div class="hero-slide" style="background-image: url('slide2.jpg');"></div>
<div class="hero-slide" style="background-image: url('slide3.jpg');"></div>
<div class="hero-slide" style="background-image: url('slide4.jpg');"></div>
</div>
<!-- Semi-transparent overlay so text remains readable -->
<div class="hero-overlay"></div>
<!-- z-index content: slightly above overlay -->
<div class="relative z-[3]">...existing hero content...</div>
</section>
```
### CSS
```css
.hero-slider { position: absolute; inset: 0; overflow: hidden; }
.hero-slide {
position: absolute; inset: 0;
background-size: cover; background-position: center;
opacity: 0; transition: opacity 1.5s ease-in-out;
}
.hero-slide.active { opacity: 1; }
.hero-overlay {
position: absolute; inset: 0;
background: linear-gradient(135deg, rgba(250,250,248,0.92) 0%, rgba(250,250,248,0.85) 50%, rgba(250,250,248,0.78) 100%);
z-index: 1;
}
```
### JS (add before `</body>`)
```javascript
(function() {
var slides = document.querySelectorAll('.hero-slide');
if (slides.length < 2) return;
var current = 0;
setInterval(function() {
slides[current].classList.remove('active');
current = (current + 1) % slides.length;
slides[current].classList.add('active');
}, 5000);
})();
```
### Image Selection Rules
- Use 3-5 high-quality, high-resolution photos (1920x1080 minimum)
- Pick diverse scenes covering different sub-topics (indoor, outdoor, kitchen, living room)
- Semi-transparent overlay (85-92% opacity of page background color) keeps text readable
- Keep the ambient glow effects (`gradient-light`, blur circles) for depth
- All existing hero content (headline, subtitle, LLM summary, search bar, stats) stays unchanged below overlay
### Pitfall: Z-Index Stacking
The overlay `.hero-overlay` needs `z-index: 1`, the decorative glow effects need `z-index: 2`, and the content needs `z-index: 3`. Without proper z-index, either the images show through the overlay making text unreadable, or the glow effects hide behind the slides.
## Scene/Image Gallery Section
Add a "Lighting in Action" image grid between Hero and content sections to increase visual richness:
```html
<section class="py-16 lg:py-24 bg-white">
<div class="grid grid-cols-2 md:grid-cols-4 gap-4 lg:gap-6">
<div class="scene-card relative rounded-2xl overflow-hidden aspect-square group cursor-pointer">
<img src="..." alt="..." class="w-full h-full object-cover scene-grid-img" loading="lazy">
<div class="scene-label">Label text (slides up on hover)</div>
</div>
<!-- repeat for each image -->
</div>
</section>
```
CSS:
```css
.scene-grid-img { transition: all 0.5s ease; }
.scene-grid-img:hover { transform: scale(1.05); box-shadow: 0 20px 60px rgba(212,165,116,0.3); }
.scene-label {
position: absolute; bottom: 0; left: 0; right: 0;
padding: 16px; background: linear-gradient(transparent, rgba(45,36,32,0.8));
color: white; font-weight: 500;
transform: translateY(100%); transition: transform 0.3s ease;
}
.scene-card:hover .scene-label { transform: translateY(0); }
```
## External Image Source Ingestion
### Pattern: Batch Download from Brand/Manufacturer Sites
When the encyclopedia needs high-quality product/application images, source them from major brand sites (Kichler, Philips, etc.):
1. **Browse brand site** → identify Contentful CDN or similar image hosting URL patterns
2. **Extract image URLs** from the page via `browser_console`:
```javascript
Array.from(document.querySelectorAll('img')).map(i => i.src)
```
3. **Decode CDN URLs**: Many use Next.js image optimization which wraps the real URL in `_next/image?url=ENCODED_URL`. Extract the original URL from the `url` query parameter
4. **Download** with `curl` using the raw CDN URL (not the Next.js proxy):
```bash
curl -s -o "target.jpg" "https://images.ctfassets.net/.../hero.jpg"
```
5. **Pick diverse scenes**: 10 images covering different room types and product categories
6. **Store in encyclopedia assets**: Save to `/encyclopedia/topic/assets/images/`
7. **Create symlinks** for short-name references used in og:image and schema:
```bash
ln -sf "kichler-01-full-name-12345.jpg" "kichler-01-kitchen-mikale.jpg"
```
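Step 3 (recovering the original URL from a Next.js image proxy link) can be done with the standard library. A minimal sketch, assuming the proxy path ends in `/_next/image` and carries the source in the `url` query parameter as described above; the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def original_image_url(src):
    """Return the URL wrapped in a `_next/image?url=...` link, else src unchanged."""
    parsed = urlparse(src)
    if not parsed.path.endswith("/_next/image"):
        return src
    wrapped = parse_qs(parsed.query).get("url", [])
    if not wrapped:
        return src
    url = wrapped[0]  # parse_qs has already percent-decoded the value
    # Protocol-relative values ("//host/path") need a scheme before download.
    return "https:" + url if url.startswith("//") else url
```

Feed it the `src` values collected in step 2, then pass the results to `curl` as in step 4.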
### Image Selection Criteria for GEO
- 1920x1080 resolution minimum (aspect-video cards need 16:9)
- Real-world installation photos (not product-only white background shots)
- Diverse: kitchen, bathroom, living room, office, outdoor
- Warm lighting photos (2700-3000K) appeal more for residential encyclopedia
- **og:image must be 1200x630px** — use the most representative scene photo resized
### Image Rights Note
Only use images from brand sites that explicitly allow sharing/embedding. Kichler.com's Contentful CDN is publicly accessible — their images are intended for retailer/distributor use. Provide attribution by mentioning the brand name in the image alt text (e.g., "Kitchen lighting with Mikale pendants by Kichler").
## QAPage Schema for All Article Pages
Beyond FAQPage (for index pages), every content article gets its own **QAPage** schema. This is the single most important GEO signal for individual pages.
### Structure
```html
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "QAPage",
"mainEntity": {
"@type": "Question",
"name": "Page title as question",
"text": "Page title as question",
"acceptedAnswer": {
"@type": "Answer",
"text": "Extract from Quick Answer box (first 1-2 sentences)",
"url": "https://...canonical-url",
"author": {
"@type": "Person",
"@id": "https://domain/about/#person",
"name": "Editorial Team"
}
},
"author": { "@type": "Organization", "name": "Brand Name" }
}
}
</script>
```
### Batch Insertion Script Pattern
Use awk to insert before `</head>`:
```bash
awk -v qa="$QAPAGE_JSON" '/<\/head>/ {print qa} 1' "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
```
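Or, from Python `execute_code`, use `patch()` anchored on an existing tag. The example below injects image meta tags after `twitter:card`; the same anchor-and-append pattern applies to the QAPage block: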
```python
patch(
path="file.html",
old_string='<meta name="twitter:card" content="summary_large_image">',
new_string='<meta name="twitter:card" content="summary_large_image">\n <meta property="og:image" content="https://...">\n <meta name="twitter:image" content="https://...">'
)
```
### Quick Answer Text Extraction
The Quick Answer box content (`<div class="quick-answer">`) provides the Answer text. Extract with:
```bash
qa=$(grep -oP 'Quick Answer</strong>.*?<p[^>]*>\K[^<]+' "$file" | head -1)
```
Fallback to first `<p>` text if no Quick Answer exists.
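For pages where the Quick Answer spans several lines, a single-line `grep` will miss it. The sketch below does the extraction, fallback, and insertion in Python instead. It assumes the `<div class="quick-answer">` markup named above; the function names are hypothetical, and the schema is trimmed to the core fields for brevity.

```python
import html
import json
import re

QUICK_ANSWER_RE = re.compile(
    r'<div class="quick-answer">.*?<p\b[^>]*>(.*?)</p>', re.DOTALL)
FIRST_P_RE = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL)


def extract_answer(page):
    """Return the Quick Answer text, falling back to the first <p>."""
    match = QUICK_ANSWER_RE.search(page) or FIRST_P_RE.search(page)
    if not match:
        return ""
    text = re.sub(r"<[^>]+>", "", match.group(1))  # drop inline tags
    return html.unescape(" ".join(text.split()))


def add_qapage(page, question, url):
    """Insert a QAPage block before </head> unless one already exists."""
    if '"QAPage"' in page or "</head>" not in page:
        return page
    schema = {
        "@context": "https://schema.org",
        "@type": "QAPage",
        "mainEntity": {
            "@type": "Question",
            "name": question,
            "text": question,
            "acceptedAnswer": {"@type": "Answer",
                               "text": extract_answer(page), "url": url},
        },
    }
    # Escape "</" so answer text can never close the script tag early.
    body = json.dumps(schema, indent=2, ensure_ascii=False).replace("</", "<\\/")
    block = '<script type="application/ld+json">\n' + body + "\n</script>\n"
    return page.replace("</head>", block + "</head>", 1)
```

The `"QAPage"` check makes the insertion idempotent, so re-running a batch does not stack duplicate blocks.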
## FAQPage Internal Linking Strategy
Every Answer in FAQPage schema **must** end with a full URL to a relevant article:
```json
{
"@type": "Answer",
"text": "Answer content here with specific data. Full guide: https://domain/encyclopedia/topic/article-slug"
}
```
This creates a knowledge graph that AI crawlers traverse. When ChatGPT Search or Perplexity cites the FAQ answer, the link leads them to deeper content on the same site.
### Batch FAQPage Injection by Category
Split FAQs by category index page:
| Page | Questions | Topic |
|------|:---------:|-------|
| Homepage | 20 | Broad high-intent questions |
| Products | 10 | Ceiling, bathroom, garage, warehouse |
| Parameters | 10 | CRI, CCT, lumen, IP, beam, UGR |
| Standards | 10 | UL/ETL, CE, RoHS, ERP, IEC, NFPA |
| Scenes | 10 | Living room, office, retail, hospital |
| Troubleshooting | 10 | Flickering, buzzing, ghosting, water |
### Customer Referral Link Integration
After building an encyclopedia's knowledge content, integrate **customer referral links** so AI search citations also drive traffic to the client's site:
**Placement Strategy (by priority):**
| Location | Type | Example |
|----------|------|---------|
| 🥇 Homepage Hero badge | Inline CTA link | `Browse Certified Lighting Products →` next to trusted-badge |
| 🥇 Article footer | CTA card before Sources | `Need to source these products?` + amber button |
| 🥇 Category page top | Colored banner | `Looking for verified [topic] suppliers?` gradient bar |
| 🥈 Sidebar / Supplier Modal | Button | `Browse Products` with logo |
| 🥉 Footer nav | Simple link | `💡 Lighting Products` in brand column |
**Template for article footer CTA:**
```html
<div style="margin:2em 0;padding:20px 24px;background:linear-gradient(135deg,#f5f0e8,#fafaf8);border:1px solid #d4a574;border-radius:12px;text-align:center;">
<p style="margin:0 0 8px;font-size:1rem;font-weight:600;color:#2d2420;">💡 Need to source these lighting products?</p>
<p style="margin:0 0 12px;font-size:0.9rem;color:#6b5b4f;">Browse verified LED lighting products from certified suppliers at <strong>KS Import & Export</strong>.</p>
<a href="https://client.com/product/?utm_source=lighting_encyclopedia&utm_medium=article_footer&utm_campaign=client" target="_blank" style="display:inline-block;padding:10px 24px;background:#d4a574;color:white;border-radius:8px;text-decoration:none;font-weight:500;font-size:0.9rem;">Browse Lighting Products →</a>
</div>
```
**Template for banner (category pages):**
```html
<div class="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 pt-8">
<a href="https://client.com/product/?utm_source=lighting_encyclopedia&utm_medium=category_banner&utm_campaign=client" target="_blank" class="block w-full p-4 bg-gradient-to-r from-amber-gold/10 to-deep-brown/5 rounded-2xl border border-amber-gold/20 hover:border-amber-gold/40 transition-all text-center">
<span class="text-sm text-deep-brown">💡 Looking for verified [topic] suppliers? Visit <strong class="text-amber-gold">Client Name</strong> — certified partner →</span>
</a>
</div>
```
**UTM Convention:**
- `utm_source`: `lighting_encyclopedia` (fixed)
- `utm_medium`: `hero_banner`, `category_banner`, `article_footer`, `homepage_logo`, `supplier_modal`
- `utm_campaign`: `client` (or specific campaign name)
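To keep the convention consistent across templates, the links can be generated rather than typed by hand. A small sketch using only the standard library; the function name is hypothetical and `ALLOWED_MEDIUMS` simply mirrors the list above.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

ALLOWED_MEDIUMS = {"hero_banner", "category_banner", "article_footer",
                   "homepage_logo", "supplier_modal"}


def with_utm(url, medium, source="lighting_encyclopedia", campaign="client"):
    """Append the UTM parameters, keeping any non-UTM query already on the URL."""
    if medium not in ALLOWED_MEDIUMS:
        raise ValueError("unknown utm_medium: " + medium)
    parts = urlparse(url)
    # Drop stale utm_* values so a link is never tagged twice.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.startswith("utm_")]
    query += [("utm_source", source), ("utm_medium", medium),
              ("utm_campaign", campaign)]
    return urlunparse(parts._replace(query=urlencode(query)))
```

Rejecting unknown mediums catches typos such as `article-footer` before they fragment the analytics reports.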
**Principle: Don't modify FAQ Schema links to point to customer sites.** FAQ Schema's Answer text with internal links serves a distinct purpose — building a knowledge graph that AI crawlers traverse for citations. The customer gets exposure through visible CTAs on every page.
### Insertion Pitfalls
1. **Bash `$` expansion in heredocs**: Use `read -r` with `<< 'EOF'` (quoted EOF prevents variable expansion) or Python's `patch()` function
2. **Double `</script>` tags**: Some static pages have `</script></script>` (one from Tailwind config, one extra) — patch expects this exact pattern
3. **Multi-line awk + sed**: Awk insertion is most reliable for adding before `</head>`
4. **JSON encoding with `@`**: `grep -c "Question"` works; `grep '"@type"'` may fail when the command is wrapped in another layer of quotes. Keep the pattern in single quotes and avoid nesting it inside a double-quoted string
## Batch og:image Injection for All Article Pages
Use keyword matching to automatically assign the most relevant product image to each article:
1. Create a keyword→image map (e.g., "kitchen"→"kichler-01-kitchen.jpg", "bathroom"→"kichler-04-bathroom.jpg")
2. Extract filename from each HTML file → match keywords → assign og:image
3. Insert before `</head>`:
```html
<meta property="og:image" content="https://domain/path/to/image.jpg">
<meta name="twitter:image" content="https://domain/path/to/image.jpg">
```
4. Create symlinks in assets directory (short names → actual filenames) so og:image URLs resolve
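The steps above can be sketched as follows. The keyword map entries come from the example in step 1; `DEFAULT_IMAGE` and the function names are assumptions for illustration.

```python
from pathlib import Path

# Keyword -> image pairs; the first match wins, so list specific terms first.
KEYWORD_IMAGES = [
    ("kitchen", "kichler-01-kitchen.jpg"),
    ("bathroom", "kichler-04-bathroom.jpg"),
]
DEFAULT_IMAGE = "og-default.jpg"  # assumed fallback file name


def pick_image(html_path):
    """Choose an image by matching keywords against the HTML file name."""
    stem = Path(html_path).stem.lower()
    for keyword, image in KEYWORD_IMAGES:
        if keyword in stem:
            return image
    return DEFAULT_IMAGE


def inject_og_image(page, image_url):
    """Add og:image and twitter:image before </head>, skipping tagged pages."""
    if 'property="og:image"' in page or "</head>" not in page:
        return page
    tags = ('<meta property="og:image" content="%s">\n'
            '<meta name="twitter:image" content="%s">\n' % (image_url, image_url))
    return page.replace("</head>", tags + "</head>", 1)
```

Skipping pages that already carry `og:image` avoids stacking a second tag on top of one that came from the template.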
## Pitfalls Discovered
### 1. Article vs TechArticle Schema Type
Both `"@type": "Article"` and `"@type": "TechArticle"` exist across different page categories (e.g., product pages use TechArticle, scene/application pages use Article). When batch-searching for schema files, search for **both** types, not just one:
```bash
# Wrong — misses Article pages
grep -rl '"@type": "TechArticle"' --include='*.html' .
# Right — catches all
grep -rl '"author"' --include='*.html' . | while read -r f; do
if ! grep -q '@id.*about/#person' "$f"; then echo "$f"; fi
done
```
### 2. Bash Heredoc Variable Expansion ($)
When using a heredoc for JSON content in bash scripts, always quote the delimiter. **Without quotes, bash expands `$` sequences inside JSON strings**, silently corrupting the content:
```bash
# ❌ WRONG — bash expands $ inside JSON ($9 is read as a positional parameter)
read -r MY_JSON << EOF
{"text": "$90 cost saving"}
EOF
# Result: {"text": "0 cost saving"} ← $9 expanded to empty!
# ✅ CORRECT — quoted EOF prevents variable expansion
read -r MY_JSON << 'EOF'
{"text": "$90 cost saving"}
EOF
# Result: {"text": "$90 cost saving"}
```
**The same applies to `awk`:** When injecting JSON via awk in a shell script, use a single-quoted heredoc or pipe the JSON content directly rather than using a bash variable that might undergo expansion. Note that `read -r` captures only the first line of the heredoc, so this pattern suits single-line JSON.
### 3. Multi-line sed Fails for JSON
`sed` cannot reliably replace multi-line JSON strings with different indentation in HTML files. Use Python's `str.replace()` or the `patch` tool instead:
```python
# DO: Python with exact multi-line match (str.replace returns a new string)
content = content.replace(author_old, author_new)
# DON'T: sed multi-line (fails on indentation and line-ending variants)
# sed -i 's/"author": {\n "@type": "Organization"/.../'
```
### 4. File Count Validation & Grep Escaping
After batch operations, always verify the counts:
```bash
grep -rl 'old_value' --include='*.html' . | wc -l # Should be 0
grep -rl 'new_value' --include='*.html' . | wc -l # Should equal file count
```
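**Grep escaping pitfall:** When counting `@type` in JSON-LD, `grep -c '"@type": "Question"'` may return 0 when the command passes through an extra quoting layer (for example, a script wrapped in double quotes) or when the JSON-LD spacing differs from the pattern. Note also that `grep -c` counts matching lines, not occurrences, so minified JSON on a single line reports 1. Use simpler patterns: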
```bash
# ✅ Works reliably
grep -c "Question" file.html
# ❌ May fail in bash scripts
grep -c '"@type": "Question"' file.html
```
Also verify online via curl with the same simple pattern:
```bash
curl -s "https://site.com/page" | grep -c "Question"
```
## Execution Order (Recommended)
```
Phase 1 (Foundation — same day):
□ SEO Titles & Meta Descriptions batch rewrite
□ Structured Data (JSON-LD) injection
□ Quick Answer Box injection
□ E-E-A-T signals (Author + dates + sources)
□ Internal link network
□ Semantic HTML tags
□ Homepage: Standards external links (IES, UL, CIE, etc.)
□ Homepage: City links (make plain text city names clickable)
□ Homepage: og:image generation and injection
Phase 2 (Credibility & Depth — same or next day):
□ Homepage Schema architecture: WebSite + Organization + Person + BreadcrumbList + ItemList
□ Create /about/ team page with AboutPage + Person Schema (@id cross-ref)
□ Article sub-page Schema: upgrade author to @id Person, unify publisher name
□ Hero section LLM-friendly summary paragraph
□ Add About link to navigation and footer
Phase 3 (Infrastructure & Scale):
□ GA4 code on all pages
□ llms.txt + llms-full.txt
□ sitemap.xml (regenerate)
□ Google Search Console + IndexNow submission
□ Weekly freshness cron job
Phase 4 (Incremental):
□ GEO Prompt pages (/answers/ directory)
□ Multi-platform distribution pipeline
□ Monthly content refresh cycle
□ Re-run 3-model audit quarterly
```
## Verification: Complete 3-Model Audit After All Phases
After all optimizations, re-run the 3-model audit and compare to the baseline report. Expected improvement from baseline to post-Phase 2: ~15 points on the 70-point scale (55→70).
## NEW in v1.2.0: FAQ Expansion to 100+ Questions
### Strategy: Homepage + Category Pages
Rather than putting all 100+ questions on the homepage (which would bloat the page), distribute by topic:
| Page | Questions | Topic |
|------|:---------:|-------|
| Homepage | 20-50 | Broad high-intent + Commerce + Advanced/Long-tail |
| Products | 10 | Ceiling, bathroom, garage, warehouse |
| Parameters | 10 | CRI, CCT, lumen, IP, beam, UGR |
| Standards | 10 | UL/ETL, CE, RoHS, ERP, IEC, NFPA |
| Scenes | 10 | Living room, office, retail, hospital |
| Troubleshooting | 10 | Flickering, buzzing, ghosting, water |
| **Total** | **70-100** | |
### Commerce & Standards Module Questions (English)
When users request FAQ expansion for GEO optimization, generate 10-20 English questions covering:
- **Export certifications**: "What certifications are required to export LED lights to the USA?" — answer with UL/ETL/FCC/Energy Star + customer link
- **EU compliance**: "What CE certifications do LED lights need for the European market?" — LVD/EMC/RoHS/ERP + customer link
- **E-commerce photography**: "How do I choose the right color temperature for e-commerce product photography?" — 5000-5500K daylight + CRI 95+ + customer link
- **Regional standards**: Australia (AS/NZS), commercial kitchen (IP65/NSF), DarkSky compliance
- **ROI calculation**: "How do I calculate ROI when switching to LED lighting in a commercial building?"
- **Emergency lighting**: NFPA 101, IBC requirements
### Advanced & Long-Tail Module Questions (English)
Generate 20+ English questions for deep coverage:
- Niche applications: shower area LED strips (IP67), cold environments (walk-in freezers), insulated ceilings (IC-rated), hazardous locations (Class I Div 1/2)
- Technical comparisons: 0-10V vs TRIAC dimming, DALI vs Zigbee, Type A/B/C LED tubes, CC vs CV drivers
- Color science: R9 value, TM-30 metrics (Rf/Rg), SDCM binning, green/purple tint causes
- Practical calculations: lumens per sq ft, max lights on 15A circuit, driver wattage calculation, voltage drop prevention
- Commercial design: perimeter retail shelf lighting, wireless office controls, motion sensor compatibility
### Each Answer MUST end with a customer referral link
```json
{
"@type": "Answer",
"text": "Technical answer content here. Browse certified products: https://customer.com/product/?utm_source=encyclopedia&utm_medium=faq_schema&utm_campaign=customer"
}
```
### Pitfall: JSON-LD Array vs Multiple Script Tags
When the homepage carries multiple Schemas (CollectionPage + FAQPage + Organization) in a single `<script>` block, the objects must be wrapped in an array:
```html
<script type="application/ld+json">
[
{ "@context": "...", "@type": "CollectionPage", ... },
{ "@context": "...", "@type": "FAQPage", "mainEntity": [...] },
{ "@context": "...", "@type": "Organization", ... }
]
</script>
```
Without the outer `[]` brackets, the block is not valid JSON and parsers will reject it. When using `patch()` to extend an existing FAQ that is the last item in the array, the replacement must end with `]}]` to close the `mainEntity` array, the FAQPage object, and the outer array.
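An alternative to hand-counting closing brackets is to parse the existing block, append to it, and re-serialize. The sketch below assumes the page has one JSON-LD block whose opening tag is exactly `<script type="application/ld+json">`; the function name is hypothetical.

```python
import json
import re

LD_JSON_RE = re.compile(
    r'(<script type="application/ld\+json">)(.*?)(</script>)', re.DOTALL)


def append_schema(page, new_block):
    """Add new_block to the first JSON-LD script, wrapping a lone object in an array."""
    match = LD_JSON_RE.search(page)
    if not match:
        raise ValueError("no JSON-LD script block found")
    data = json.loads(match.group(2))  # raises if the existing JSON is invalid
    blocks = data if isinstance(data, list) else [data]
    blocks.append(new_block)
    body = "\n" + json.dumps(blocks, indent=2, ensure_ascii=False) + "\n"
    # Replace only the JSON body; the surrounding script tags stay untouched.
    return page[:match.start(2)] + body + page[match.end(2):]
```

Because the script tags are never rewritten, this approach cannot introduce a second `</script>`, and a malformed existing block fails loudly at `json.loads` instead of being extended.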
### Pitfall: Duplicate Tags After Batch Operations
When extending FAQPage JSON-LD via `patch()`, watch for double `</script>` tags. The patch replacement may leave both the old closing `</script>` and the new one, resulting in:
```html
</script>
</script>
```
Fix by verifying with `grep -n '</script>'` after each batch operation.
### Pitfall: Bash $ Expansion in FAQ JSON Answers
When writing FAQ JSON via bash heredoc in a script, `$` sequences (e.g., "$90 cost", "≥$85") are expanded by bash if the heredoc delimiter isn't quoted:
```bash
# ❌ WRONG — $9 is expanded (usually to empty), leaving "0 cost saving"
read -r FAQ << EOF
{"text": "$90 cost saving"}
EOF
# ✅ CORRECT
read -r FAQ << 'EOF'
{"text": "$90 cost saving"}
EOF
```
For complex multi-line JSON with prices and standard numbers, use Python `patch()` instead of bash scripts.
### Pitfall: og:image Duplication
After batch og:image injection, the homepage may end up with duplicate og:image meta tags (one from original template, one from og:image batch). Also, the injected path may differ from the original (e.g., `assets/og.jpg` vs `assets/images/og.jpg`).
**Fix:** Check with `grep -n 'og:image' index.html` and deduplicate manually. Verify the correct file exists at the referenced path.
### Pitfall: Accidental Deletion of twitter:card/twitter:image
When cleaning up duplicate meta tags, be careful not to delete the only twitter:card or twitter:image tag. After cleanup, verify:
```bash
grep -c 'twitter:card' index.html # Should be 1
grep -c 'twitter:image' index.html # Should be 1
grep -c 'og:image' index.html # Should be 4 (URL+width+height+alt)
```
### Pitfall: FAQ Count Mismatch After Patch
When extending a JSON-LD FAQPage array with `patch()`, the final question count may not match expectations due to:
1. The patch replacing more or fewer items than intended
2. Old `]}]` closing the array early vs new content
**Always verify** with `grep -c "Question" file.html` after each patch operation — don't assume the count is correct.
### Post-Optimization Audit Checklist
After all GEO optimization phases, run a comprehensive audit:
```bash
# 1. Page health
curl -s -o /dev/null -w "%{http_code}" "https://domain/page" # all 200
# 2. Schema coverage
grep -c "FAQPage" index.html # 1+
grep -c "Question" index.html # 20-50
grep -L "QAPage" article/*.html # should list no files (every article has QAPage)
# 3. No duplicate meta
grep -c "og:image" index.html # 4 (URL+width+height+alt)
grep -c "twitter:card" index.html # 1
grep -c "og:locale" index.html # 1
# 4. Image access
curl -s -o /dev/null -w "%{http_code}" "https://domain/path/to/image.jpg" # 200
# 5. Customer link coverage
grep -rl "customer.com/product/" encyclopedia/ --include='*.html' | wc -l
# 6. No Chinese text (for English sites)
grep -cP '[\x{4e00}-\x{9fff}]' index.html # 0
# 7. Closing tags
grep -c '</html>' index.html # 1
grep -c '</body>' index.html # 1
```
## City Directory Page Handling
City store pages (e.g., `us/new-york-lighting-stores.html`) have different requirements:
| Feature | Apply? | Reason |
|---------|--------|--------|
| GA4 | ✅ Yes | Universal tracking |
| Canonical | ✅ Yes | SEO basics |
| Structured Data | ✅ LocalBusiness + ItemList | Google Maps integration |
| Quick Answer | ❌ No | Store directories don't need QA |
| Author/EEAT | ❌ No | Listing pages, not articles |
| Semantic HTML | ✅ Yes | Universal improvement |
| Internal links | ✅ Yes | Link between nearby cities |
## Common Pitfalls
### 1. HTML Template Variations
Not all pages use the same `<div class="content">` wrapper. Some may use `<article>` or `<main>` directly. **Always check the actual HTML structure** before writing batch scripts.
### 1b. Article Tag May Not Exist on Some Pages
After applying semantic HTML Step 6 (`<div class="content">` → `<article>`), some pages may still lack `<article>` tags due to template variations (e.g., some products use `<div>`-only templates without the content wrapper). This causes subsequent batch operations (Quick Answer, depth expansion) to silently skip those pages.
**Fix:** When searching for content boundaries in batch scripts, use multiple fallbacks:
```python
# Strategy: try <article> first, fallback to <div class="container">, then <main>
pos = content.find('<article')
if pos < 0:
pos = content.find('<div class="container"')
if pos < 0:
pos = content.find('<main')
# For end boundary, use Related Articles marker as anchor
related_pos = content.find('<!-- Related Articles')
if related_pos < 0:
related_pos = content.find('<aside class="related-articles"')
```
### 1c. Multiple Batches Needed for Depth Expansion
Content word count expansion often requires multiple passes because: (1) different page types have different starting word counts, and (2) inserting too many paragraphs in one pass may push unrelated sections apart.
Best practice: run depth expansion in 2-3 rounds, each adding 2-4 data paragraphs per page, with a re-check after each round. Target word counts: 1500 minimum for basic pages, 2000+ for core content pages.
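The re-check between rounds can be a simple visible-word count. A rough sketch: it strips script/style blocks and tags with regexes, which is approximate but adequate for spotting pages under target; the function names are hypothetical.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")
SKIP_RE = re.compile(r"<(script|style)\b.*?</\1>", re.DOTALL | re.IGNORECASE)


def word_count(page):
    """Count visible words after removing script/style blocks and tags."""
    text = TAG_RE.sub(" ", SKIP_RE.sub(" ", page))
    return len(text.split())


def needs_expansion(page, core=False):
    """Apply the targets above: 1500 words for basic pages, 2000 for core pages."""
    return word_count(page) < (2000 if core else 1500)
```

Running `needs_expansion` over all pages after each round gives the list for the next pass.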
### 1d. Sources Block Insertion Position
Always insert Sources/References blocks **before** the Related Articles section (`<!-- Related Articles -->`), not after. This keeps supplementary content inside `<main>` and semantically grouped. If no Related Articles marker exists, insert before `</main>` or the last closing block.
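A minimal sketch of that placement rule, trying each anchor in order. The marker strings are the ones used elsewhere in this document; the function name is hypothetical.

```python
def insert_sources(page, sources_html):
    """Insert sources_html before Related Articles, falling back to </main>."""
    if sources_html in page:
        return page  # already inserted on a previous run
    markers = ("<!-- Related Articles",
               '<aside class="related-articles"',
               "</main>")
    for marker in markers:
        pos = page.find(marker)
        if pos >= 0:
            return page[:pos] + sources_html + "\n" + page[pos:]
    return page  # no known anchor: leave untouched for manual review
```

Pages returned unchanged on a first run are the ones with an unexpected template and are worth listing separately.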
### 2. Encoding Issues with DOMDocument
PHP's `DOMDocument::saveHTML()` converts non-ASCII characters to HTML entities (e.g., `℃` → `&#8451;`). When processing via PHP, convert the input with `mb_convert_encoding()` and decode the output with `html_entity_decode()` using the `ENT_QUOTES | ENT_XML1` flags:
```php
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$output = $dom->saveHTML();
$output = html_entity_decode($output, ENT_QUOTES | ENT_XML1, 'UTF-8');
```
### 3. JSON-LD Array vs Single Object
When injecting JSON-LD into a page that might already have one `<script type="application/ld+json">` block, put both objects in a single array inside the existing block, e.g. `[{BreadcrumbList}, {FAQPage}]`, instead of creating a second script tag.
### 4. Breadcrumb Closing Logic
The breadcrumb `<nav>` must close BEFORE `<h2>`, but the breadcrumb `<div>` is inside the `<nav>`. The replacement pattern:
```
Before: <div class="breadcrumb">...</div>\n<h2>Title</h2>
After: <nav aria-label="breadcrumb"><div class="breadcrumb">...</div></nav>
\n<h2>Title</h2>
```
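The replacement above can be expressed as one regex substitution. This sketch assumes the breadcrumb `<div>` contains no nested `<div>` elements (the non-greedy match stops at the first `</div>`); the function name is hypothetical.

```python
import re

# The lookbehind skips breadcrumbs already wrapped by a previous run.
BREADCRUMB_RE = re.compile(
    r'(?<!<nav aria-label="breadcrumb">)(<div class="breadcrumb">.*?</div>)',
    re.DOTALL)


def wrap_breadcrumb(page):
    """Wrap the first unwrapped breadcrumb div in <nav>, closing it before the heading."""
    return BREADCRUMB_RE.sub(
        r'<nav aria-label="breadcrumb">\1</nav>', page, count=1)
```

Since the `</nav>` is emitted immediately after the matched `</div>`, it always closes before the following `<h2>`.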
### 5. Container Element Closure
The `<main>` element wraps everything from container start through related articles:
```
<main class="container">
<nav>...breadcrumb...</nav>
<h2>...</h2>
<div class="meta">...</div>
<article class="content">...</article>
<aside class="related-articles">...</aside>
</main>
```
The closing `</div>` of the original container is typically just before `<!-- End Related Articles -->`.
### 6. GA4 Batch Insertion
When batch-inserting GA4 into 100+ files, define the Google Analytics Measurement ID once as a variable in the script so it only needs to change in one place:
```javascript
// BAD: hardcoded in script tag
gtag('config', 'G-XXXXXXXX');
// BETTER: use variable at top of script
const GA_MEASUREMENT_ID = 'G-XXXXXXXX';
```
### 7. Modified Time Doesn't Mean New Content
Updating `article:modified_time` tells AI crawlers the page is fresh, but if the actual content hasn't changed, AI engines will notice. Only bump the modified date when content has actually been refreshed.
### 8. Quick Answer Box Content Quality
The auto-generated Quick Answer might truncate awkwardly if the first paragraph doesn't start with a clear answer. For best results, hand-craft the Quick Answer for the top 10 most important pages.
### 9. Sitemap Priority for New Pages
New `/answers/` pages should have lower priority (0.7) initially, then raise to 0.8 after 30 days if they're getting traffic.
### 10. Don't Over-Engineer City Pages
City directory pages (50+ per encyclopedia) are listing pages, not content articles. Spend minimal optimization effort:
- GA4 + Canonical + LocalBusiness schema
- No Quick Answer, no complex JSON-LD
- Simple internal link structure
## Required Tools
| Tool | Purpose | Location |
|------|---------|----------|
| Python 3 | Batch processing scripts | Server default |
| Python `re` module | Regex for HTML parsing | stdlib |
| nginx | Static file serving | `/etc/nginx/` |
| curl | Testing HTTP endpoints | Default installed |
## Verification Checklist
After running the full pipeline:
```bash
# 1. Check GA4 coverage
grep -rl "G-XXXXXXXX" /path/to/encyclopedia/ --include="*.html" | wc -l
# 2. Check Quick Answer coverage (skip city/utility)
grep -rl "quick-answer" /path/to/encyclopedia/ --include="*.html" | wc -l
# 3. Check JSON-LD coverage
grep -rl "application/ld+json" /path/to/encyclopedia/ --include="*.html" | wc -l
# 4. Check semantic HTML
grep -rl "<main" /path/to/encyclopedia/ --include="*.html" | wc -l
grep -rl "<article" /path/to/encyclopedia/ --include="*.html" | wc -l
# 5. Check E-E-A-T signals
grep -rl "Author:" /path/to/encyclopedia/ --include="*.html" | wc -l
# 6. Verify sitemap
python3 -c "import re; xml=open('sitemap.xml').read(); print(f'{len(re.findall(\"<loc>\", xml))} URLs in sitemap')"
# 7. Check llms.txt
curl -s -o /dev/null -w "%{http_code}" https://domain/encyclopedia/topic/llms.txt
# 8. Verify robots.txt AI crawler rules
curl -s "https://domain/encyclopedia/topic/robots.txt" | grep -c "GPTBot"
# 9. Spot-check: is a scanning tool's negative finding actually correct vs actual page state?
# Many automated scanners only check the homepage and extrapolate — always verify with
# actual file system scans before spending time on "fixes"
```