XML Sitemaps
This guide covers how our XML sitemaps work, common pitfalls, and the constraints that keep Google Search Console (GSC) indexing stable.
Architecture Overview
XML sitemaps live under src/pages/sitemaps/ and serve two layers:
- Sitemap index (sitemap.xml.ts): lists paginated child sitemaps
- Paginated sitemaps (page/[page]/sitemap.xml.ts): contain the actual URLs
Each content type has its own index + page pair:
| Content type | Index | Page cap | Items per page | Data source |
|---|---|---|---|---|
| Articles | Count-driven | 60 | 2,000 | Rakiura |
| Destinations | Count-driven | 10 | 2,000 | CAPI |
| POIs | Count-driven | 38 | 2,000 | CAPI |
| Google News | Single sitemap | — | — | Rakiura |
| Legal | Single sitemap | — | — | Static |
The site-wide sitemap index is assembled by @astrojs/sitemap in astro.config.mjs, which merges auto-discovered routes with customSitemaps pointing at these endpoints.
CAPI Batching
CAPI enforces a hard limit of 100 items per request. To fill a 2,000-item sitemap page, each page endpoint fires 20 parallel requests (CAPI_LIMIT = 100) and merges the results. This batching logic lives in the _graphql/api.ts files for each content type.
If any batch request fails, the endpoint distinguishes two cases. When no items were recovered at all, it returns 503 with a Retry-After header. When only some batches failed, it serves the partial data it has.
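The batching and failure handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in _graphql/api.ts: the names batchOffsets, mergeBatches, fetchSitemapPage and the FetchBatch callback are hypothetical, pages are assumed to be 1-indexed, and the 60-second Retry-After value is a placeholder.

```typescript
// Sketch only: everything except CAPI_LIMIT and PAGE_SIZE is a hypothetical name.
const CAPI_LIMIT = 100; // CAPI's hard per-request limit
const PAGE_SIZE = 2000; // items per sitemap page

type FetchBatch<T> = (offset: number, limit: number) => Promise<T[]>;

type PageResult<T> =
  | { status: 200; items: T[] }
  | { status: 503; retryAfterSeconds: number };

/** Offsets of the 20 batches that make up one sitemap page (page assumed 1-indexed). */
function batchOffsets(page: number): number[] {
  const start = (page - 1) * PAGE_SIZE;
  const batches = PAGE_SIZE / CAPI_LIMIT;
  return Array.from({ length: batches }, (_, i) => start + i * CAPI_LIMIT);
}

/**
 * Merge settled batches. If something failed and nothing was recovered,
 * signal 503 + Retry-After; otherwise serve whatever was recovered.
 * An empty result with no failures is left for the caller to turn into a 404.
 */
function mergeBatches<T>(
  settled: PromiseSettledResult<T[]>[],
  retryAfterSeconds = 60, // placeholder value
): PageResult<T> {
  const anyFailed = settled.some((r) => r.status === "rejected");
  const items = settled.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
  if (anyFailed && items.length === 0) {
    return { status: 503, retryAfterSeconds };
  }
  return { status: 200, items };
}

/** Fire all batches for a page in parallel and merge the results in order. */
async function fetchSitemapPage<T>(
  page: number,
  fetchBatch: FetchBatch<T>,
): Promise<PageResult<T>> {
  const requests = batchOffsets(page).map((offset) => fetchBatch(offset, CAPI_LIMIT));
  return mergeBatches(await Promise.allSettled(requests));
}
```

Promise.allSettled is used here instead of Promise.all so that one rejected batch does not discard the batches that succeeded.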
Count Queries
Articles and destinations use dedicated count queries to determine how many pages to list in their sitemap index. These are separate GraphQL operations from the list queries.
All three content types (articles, destinations, POIs) use the same count-driven pattern with a safety cap. Empty pages beyond actual content return 404 gracefully.
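As a rough illustration of the count-driven pattern, the sketch below turns a count into a capped list of child sitemap URLs. It assumes calculateTotalPages is a ceiling division (the real helper in _utils/index.ts may differ); childSitemapUrls and the example.com origin are hypothetical.

```typescript
const PAGE_SIZE = 2000;

// Assumed behaviour of calculateTotalPages: ceiling division, zero for empty content.
function calculateTotalPages(count: number, pageSize: number): number {
  if (count <= 0) return 0;
  return Math.ceil(count / pageSize);
}

// Hypothetical helper: the child sitemap URLs an index would list for one content type.
function childSitemapUrls(
  origin: string,
  type: string,
  count: number,
  maxPages: number,
): string[] {
  // The safety cap bounds the index even if the count is inflated.
  const totalPages = Math.min(calculateTotalPages(count, PAGE_SIZE), maxPages);
  return Array.from(
    { length: totalPages },
    (_, i) => `${origin}/sitemaps/${type}/page/${i + 1}/sitemap.xml`,
  );
}

// 4,500 destinations at 2,000 per page need 3 pages, well under a cap of 10.
const urls = childSitemapUrls("https://example.com", "destinations", 4500, 10);
console.log(urls.length); // 3
```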
Critical Constraint: Index and Page Caps Must Match
The sitemap index determines how many child sitemaps Google will try to crawl. The page endpoint has a MAX_PAGE_LIMIT that 404s requests beyond it. These two values must agree. If the index advertises more pages than the page endpoint will serve, Google crawls sitemap URLs that return 404, wasting crawl budget and creating noise in GSC.

    Index:  totalPages = Math.min(calculateTotalPages(count, PAGE_SIZE), MAX_PAGES)
                                                                         ^^^^^^^^^
    Page:   if (page > MAX_PAGE_LIMIT) return 404
                       ^^^^^^^^^^^^^^
    These must be the same value.

Current caps:
| Content type | Index cap (MAX_PAGES) | Page cap (MAX_PAGE_LIMIT) |
|---|---|---|
| Articles | 60 | 60 |
| Destinations | 10 | 10 |
| POIs | 38 | 38 |
If you need to raise a cap, update both the index and page endpoint together.
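One possible way to make the two values hard to drift apart is to read both from a single table. The sketch below is a suggestion under that assumption, not a description of the current code: SITEMAP_PAGE_CAPS, advertisedPages, and isServablePage are hypothetical names.

```typescript
// Hypothetical shared module: one source of truth for both MAX_PAGES and MAX_PAGE_LIMIT.
const SITEMAP_PAGE_CAPS = {
  articles: 60,
  destinations: 10,
  "points-of-interest": 38,
} as const;

type CappedSitemap = keyof typeof SITEMAP_PAGE_CAPS;

// Index side: never advertise more pages than the cap.
function advertisedPages(type: CappedSitemap, pagesFromCount: number): number {
  return Math.min(pagesFromCount, SITEMAP_PAGE_CAPS[type]);
}

// Page side: anything outside 1..cap should be answered with a 404.
function isServablePage(type: CappedSitemap, page: number): boolean {
  return Number.isInteger(page) && page >= 1 && page <= SITEMAP_PAGE_CAPS[type];
}
```

With this shape, raising a cap is a one-line change, and every page the index can advertise is by construction a page the endpoint will serve.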
Incident: Feb 2026 GSC Indexing Spike
What happened
After migrating sitemaps to Astro (Feb 9–10, 2026), indexed pages in GSC spiked from ~116K to ~364K over one week, then gradually declined back to ~108K by late March.
Root cause
A Feb 11 follow-up commit (#734) made three changes that combined to inflate the sitemap indexes:
- PAGE_SIZE dropped from 2,000 to 100 on page endpoints to match CAPI’s per-request limit, but the index still calculated pages as count / PAGE_SIZE. Dividing by 100 instead of 2,000 produced 20x more sitemap pages.
- POI hard cap removed from the index entirely. With ~75K POIs and PAGE_SIZE=100, the index listed 750+ pages instead of 38.
- Inaccurate count query: the count was embedded in the list query and returned inflated totals, making the indexes list even more pages than actual content warranted.
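The arithmetic behind the POI inflation can be checked in a few lines. This assumes the index rounds count / PAGE_SIZE up to a whole page and uses 75,000 as a stand-in for the approximate POI count quoted above.

```typescript
// Assumption: the index rounds up to whole pages.
const pagesFor = (count: number, pageSize: number): number =>
  Math.ceil(count / pageSize);

const approxPoiCount = 75000; // "~75K POIs"; the exact figure is not stated

const before = pagesFor(approxPoiCount, 2000); // 38 pages at PAGE_SIZE=2000
const after = pagesFor(approxPoiCount, 100); // 750 pages at PAGE_SIZE=100

console.log({ before, after, factor: 2000 / 100 }); // factor is the 20x blow-up
```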
How it was fixed
- Mar 11 (PR #782): Dedicated count queries introduced, providing accurate totals.
- Mar 19 (PR #814): PAGE_SIZE restored to 2,000 via parallel batched requests. POI index hardcoded back to 38 pages.
Lessons
- PAGE_SIZE affects index math. Changing how many items a page holds also changes how many pages the index advertises. Always consider both sides.
- Hard caps are a safety net. Even with accurate counts, cap the index to prevent runaway page generation. The page endpoint already handles empty results gracefully (404).
- Count queries must be accurate. An inflated count means an inflated index. Use dedicated count operations, not counts embedded in list queries.
- Test sitemap changes by checking the index output. Before deploying, verify the index lists a reasonable number of child sitemaps. A sudden jump from 10 to 200 entries is a red flag.
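The last check can be automated with a small guard like the sketch below, which counts the child entries in a sitemap index document and compares them against an expected ceiling. countChildSitemaps and checkIndexSize are hypothetical helpers, and the XML is assumed to follow the standard sitemaps.org index format.

```typescript
// Count <sitemap> entries; the [\s>] lookahead keeps <sitemapindex> from matching.
function countChildSitemaps(indexXml: string): number {
  return (indexXml.match(/<sitemap[\s>]/g) ?? []).length;
}

// Hypothetical guard for a pre-deploy check or unit test.
function checkIndexSize(
  indexXml: string,
  expectedMax: number,
): { ok: boolean; found: number } {
  const found = countChildSitemaps(indexXml);
  return { ok: found <= expectedMax, found };
}

// Build a sample index with a given number of children (example.com is a placeholder).
function sampleIndex(children: number): string {
  const entries = Array.from(
    { length: children },
    (_, i) =>
      `<sitemap><loc>https://example.com/sitemaps/destinations/page/${i + 1}/sitemap.xml</loc></sitemap>`,
  ).join("");
  return `<?xml version="1.0" encoding="UTF-8"?><sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${entries}</sitemapindex>`;
}

console.log(checkIndexSize(sampleIndex(10), 10)); // within the cap
console.log(checkIndexSize(sampleIndex(200), 10)); // red flag: far more entries than the cap
```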
Adding a New Sitemap
- Create src/pages/sitemaps/{type}/sitemap.xml.ts (index) and src/pages/sitemaps/{type}/page/[page]/sitemap.xml.ts (pages).
- If count-driven, add a dedicated count query in _graphql/.
- Set MAX_PAGES in the index and MAX_PAGE_LIMIT in the page endpoint to the same value.
- If using CAPI, implement batched fetching (see existing _graphql/api.ts files for the pattern).
- Register the index URL in astro.config.mjs under customSitemaps.
- Add the sitemap to robots.txt.
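For the page endpoint in the steps above, the core logic might look like the following sketch. It is framework-agnostic on purpose: renderSitemapPage and SitemapResponse are hypothetical, the real endpoints use generateSitemap from _utils/index.ts (whose signature is not shown here), and the cap of 10 is only an example value.

```typescript
const MAX_PAGE_LIMIT = 10; // example value; must equal MAX_PAGES in the matching index

interface SitemapResponse {
  status: number;
  contentType: string;
  body: string;
}

// Escape the five XML special characters so URLs with query strings stay valid.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical core of a page endpoint: validate the page, then render a urlset.
function renderSitemapPage(
  pageParam: string | undefined,
  urls: string[],
): SitemapResponse {
  const notFound: SitemapResponse = {
    status: 404,
    contentType: "text/plain",
    body: "Not found",
  };
  const page = Number(pageParam);
  // Reject non-numeric, out-of-range, and beyond-cap pages.
  if (!Number.isInteger(page) || page < 1 || page > MAX_PAGE_LIMIT) return notFound;
  // Pages past the end of the actual content are empty and also 404.
  if (urls.length === 0) return notFound;

  const entries = urls
    .map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`)
    .join("\n");
  const body =
    `<?xml version="1.0" encoding="UTF-8"?>\n` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n` +
    `${entries}\n</urlset>`;
  return { status: 200, contentType: "application/xml", body };
}
```

In an Astro endpoint, the returned status, content type, and body would be wrapped in a Response inside the route's GET handler.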
File Reference

    src/pages/sitemaps/
    ├── _utils/index.ts                      # generateSitemap, generateSitemapIndex, calculateTotalPages
    ├── _graphql/
    │   ├── articles/                        # Article list + count queries (CAPI)
    │   ├── destinations/                    # Destination list + count queries (CAPI)
    │   └── editorialAssemblies/             # Editorial assembly list + count + news queries (Rakiura)
    ├── articles/
    │   ├── sitemap.xml.ts                   # Index (count-driven, capped at 60)
    │   └── page/[page]/sitemap.xml.ts       # Pages (MAX_PAGE_LIMIT=60, PAGE_SIZE=2000)
    ├── destinations/
    │   ├── sitemap.xml.ts                   # Index (count-driven, capped at 10)
    │   └── page/[page]/sitemap.xml.ts       # Pages (MAX_PAGE_LIMIT=10, PAGE_SIZE=2000)
    ├── points-of-interest/
    │   ├── sitemap.xml.ts                   # Index (count-driven, capped at 38)
    │   └── page/[page]/sitemap.xml.ts       # Pages (MAX_PAGE_LIMIT=38, PAGE_SIZE=2000)
    ├── google-news/sitemap.xml.ts           # Single sitemap (recent articles)
    └── legal/sitemap.xml.ts                 # Single sitemap (static legal pages)