XML Sitemaps

This guide covers how our XML sitemaps work, common pitfalls, and the constraints that keep Google Search Console (GSC) indexing stable.

XML sitemaps live under src/pages/sitemaps/ and serve two layers:

  • Sitemap index (sitemap.xml.ts) — lists paginated child sitemaps
  • Paginated sitemaps (page/[page]/sitemap.xml.ts) — contain the actual URLs

Each paginated content type has its own index + page pair; Google News and Legal are single sitemaps:

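| Content type | Index          | Page cap | Items per page | Data source |
|--------------|----------------|----------|----------------|-------------|
| Articles     | Count-driven   | 60       | 2,000          | Rakiura     |
| Destinations | Count-driven   | 10       | 2,000          | CAPI        |
| POIs         | Count-driven   | 38       | 2,000          | CAPI        |
| Google News  | Single sitemap | n/a      | n/a            | Rakiura     |
| Legal        | Single sitemap | n/a      | n/a            | Static      |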

The site-wide sitemap index is assembled by @astrojs/sitemap in astro.config.mjs, which merges auto-discovered routes with customSitemaps pointing at these endpoints.

CAPI enforces a hard limit of 100 items per request. To fill a 2,000-item sitemap page, each page endpoint fires 20 parallel requests (CAPI_LIMIT = 100) and merges the results. This batching logic lives in the _graphql/api.ts files for each content type.
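
The batching described above can be sketched roughly as follows. This is a minimal illustration, not the code in _graphql/api.ts: the `fetchBatch` callback, the `Item` shape, offset-based pagination, and 1-indexed page numbers are all assumptions made for the example.

```typescript
const CAPI_LIMIT = 100; // CAPI's hard per-request limit
const PAGE_SIZE = 2000; // items per sitemap page

// Hypothetical item shape and fetcher signature, for illustration only.
type Item = { url: string };
type FetchBatch = (offset: number, limit: number) => Promise<Item[]>;

async function fetchSitemapPage(page: number, fetchBatch: FetchBatch): Promise<Item[]> {
  const batchCount = PAGE_SIZE / CAPI_LIMIT; // 20 requests per sitemap page
  const pageStart = (page - 1) * PAGE_SIZE; // assumes pages are 1-indexed

  // Start every batch request at once, each covering a distinct 100-item window.
  const requests = Array.from({ length: batchCount }, (_, i) =>
    fetchBatch(pageStart + i * CAPI_LIMIT, CAPI_LIMIT),
  );

  // Promise.all preserves request order, so the merged list stays in offset order.
  const batches = await Promise.all(requests);
  return ([] as Item[]).concat(...batches);
}
```

Note that `Promise.all` rejects as soon as any single batch fails, so this sketch covers only the happy path.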

If any batch request fails, the endpoint checks what was recovered. If no items came back at all, it returns 503 with a Retry-After header; if partial data was recovered, it serves what it has.
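
One way to express that decision is sketched below. The `PageOutcome` type, the `settleBatches` name, and the Retry-After value are hypothetical; the 404 branch for a clean but empty result reflects the empty-page behaviour this guide describes for pages beyond actual content.

```typescript
// Hypothetical types for illustration; the real endpoints may differ.
type Item = { url: string };
type PageOutcome =
  | { status: 200; items: Item[] }
  | { status: 404 }
  | { status: 503; retryAfterSeconds: number };

const RETRY_AFTER_SECONDS = 120; // assumed value, not taken from the codebase

async function settleBatches(requests: Promise<Item[]>[]): Promise<PageOutcome> {
  // Turn each rejection into null so one failed batch cannot reject the whole page.
  const results = await Promise.all(
    requests.map((request) => request.catch((): null => null)),
  );

  const anyFailed = results.some((result) => result === null);
  const items: Item[] = [];
  for (const result of results) {
    if (result !== null) items.push(...result);
  }

  // Partial or complete data: serve what was recovered.
  if (items.length > 0) return { status: 200, items };
  // Nothing recovered and at least one batch failed: ask crawlers to retry later.
  if (anyFailed) return { status: 503, retryAfterSeconds: RETRY_AFTER_SECONDS };
  // Every batch succeeded but returned nothing: the page is past the end of the content.
  return { status: 404 };
}
```

An endpoint would then map the outcome to an HTTP response, setting the Retry-After header from `retryAfterSeconds` in the 503 case.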

Articles and destinations use dedicated count queries to determine how many pages to list in their sitemap index. These are separate GraphQL operations from the list queries.

All three content types (articles, destinations, POIs) use the same count-driven pattern with a safety cap. Empty pages beyond actual content return 404 gracefully.

Critical Constraint: Index and Page Caps Must Match

The sitemap index determines how many child sitemaps Google will try to crawl. The page endpoint has a MAX_PAGE_LIMIT that 404s requests beyond it. These two values must agree. If the index advertises more pages than the page endpoint will serve, Google crawls sitemap URLs that return 404 — wasting crawl budget and creating noise in GSC.

Index: totalPages = Math.min(calculateTotalPages(count, PAGE_SIZE), MAX_PAGES)
                                                                    ^^^^^^^^^
Page:  if (page > MAX_PAGE_LIMIT) return 404
                  ^^^^^^^^^^^^^^
These must be the same value.

Current caps:

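| Content type | Index cap (MAX_PAGES) | Page cap (MAX_PAGE_LIMIT) |
|--------------|-----------------------|---------------------------|
| Articles     | 60                    | 60                        |
| Destinations | 10                    | 10                        |
| POIs         | 38                    | 38                        |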

If you need to raise a cap, update both the index and page endpoint together.
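
One way to make the two caps impossible to drift apart is to derive both from a single constant. The sketch below is only an illustration: `SITEMAP_CAPS`, `indexPageCount`, and `isPageServable` are hypothetical names, and `calculateTotalPages` is an assumed ceiling-division implementation rather than the actual helper in _utils/index.ts.

```typescript
const PAGE_SIZE = 2000;

// Hypothetical single source of truth for both MAX_PAGES and MAX_PAGE_LIMIT.
const SITEMAP_CAPS = {
  articles: 60,
  destinations: 10,
  pointsOfInterest: 38,
} as const;
type ContentType = keyof typeof SITEMAP_CAPS;

// Assumed implementation; the real helper lives in _utils/index.ts.
function calculateTotalPages(count: number, pageSize: number): number {
  return Math.ceil(count / pageSize);
}

// Index side: never advertise more child sitemaps than the cap.
function indexPageCount(type: ContentType, count: number): number {
  return Math.min(calculateTotalPages(count, PAGE_SIZE), SITEMAP_CAPS[type]);
}

// Page side: anything outside 1..cap should get a 404.
function isPageServable(type: ContentType, page: number): boolean {
  return Number.isInteger(page) && page >= 1 && page <= SITEMAP_CAPS[type];
}
```

With this shape, raising a cap is a one-line change that both endpoints pick up together.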

After migrating sitemaps to Astro (Feb 9–10, 2026), indexed pages in GSC spiked from ~116K to ~364K over one week, then gradually declined back to ~108K by late March.

A Feb 11 follow-up commit (#734) made three changes that combined to inflate the sitemap indexes:

  1. PAGE_SIZE dropped from 2,000 to 100 on page endpoints to match CAPI’s per-request limit, but the index still calculated pages as count / PAGE_SIZE. Dividing by 100 instead of 2,000 produced 20x more sitemap pages.

  2. POI hard cap removed from the index entirely. With ~75K POIs and PAGE_SIZE=100, the index listed 750+ pages instead of 38.

  3. Inaccurate count query — the count was embedded in the list query and returned inflated totals, making the indexes list even more pages than actual content warranted.
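
The arithmetic behind the first two changes can be checked directly. The POI count below is the approximate ~75K figure quoted above, not an exact total, and `pagesAt` is a throwaway helper for the example.

```typescript
const APPROX_POI_COUNT = 75000; // approximate, from the "~75K POIs" figure above

// Number of sitemap pages needed at a given page size (ceiling division).
const pagesAt = (pageSize: number): number => Math.ceil(APPROX_POI_COUNT / pageSize);

const intendedPages = pagesAt(2000); // 38, matching the POI cap
const inflatedPages = pagesAt(100); // 750, what the uncapped index advertised
const inflationFactor = 2000 / 100; // 20x more pages for the same content

console.log({ intendedPages, inflatedPages, inflationFactor });
```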

Fixes:

  • Mar 11 (PR #782): Dedicated count queries introduced, providing accurate totals.
  • Mar 19 (PR #814): PAGE_SIZE restored to 2,000 via parallel batched requests. POI index hardcoded back to 38 pages.

Lessons learned:

  1. PAGE_SIZE affects index math. Changing how many items a page holds also changes how many pages the index advertises. Always consider both sides.
  2. Hard caps are a safety net. Even with accurate counts, cap the index to prevent runaway page generation. The page endpoint already handles empty results gracefully (404).
  3. Count queries must be accurate. An inflated count means an inflated index. Use dedicated count operations, not counts embedded in list queries.
  4. Test sitemap changes by checking the index output. Before deploying, verify the index lists a reasonable number of child sitemaps. A sudden jump from 10 to 200 entries is a red flag.
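
A pre-deploy check along these lines could count the child sitemaps in the generated index and compare the total against the expected cap. This is a rough sketch: `countChildSitemaps` and `checkIndexSize` are hypothetical helpers, and the regex assumes the standard sitemap index format where each child is wrapped in a plain `<sitemap>` element.

```typescript
// Count <sitemap> entries in a sitemap index document.
// "<sitemapindex ...>" and "</sitemap>" do not match this pattern.
function countChildSitemaps(indexXml: string): number {
  return (indexXml.match(/<sitemap>/g) ?? []).length;
}

// Report whether the index stays within the expected cap for its content type.
function checkIndexSize(indexXml: string, maxPages: number): { count: number; ok: boolean } {
  const count = countChildSitemaps(indexXml);
  return { count, ok: count <= maxPages };
}
```

For example, running `checkIndexSize(xml, 10)` against the destinations index should report `ok: true`; a result of 200 entries would fail the check before it reaches Google.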

To add a new sitemap:

  1. Create src/pages/sitemaps/{type}/sitemap.xml.ts (index) and src/pages/sitemaps/{type}/page/[page]/sitemap.xml.ts (pages).
  2. If count-driven, add a dedicated count query in _graphql/.
  3. Set MAX_PAGES in the index and MAX_PAGE_LIMIT in the page endpoint to the same value.
  4. If using CAPI, implement batched fetching (see existing _graphql/api.ts files for the pattern).
  5. Register the index URL in astro.config.mjs under customSitemaps.
  6. Add the sitemap to robots.txt.

File layout:

src/pages/sitemaps/
├── _utils/index.ts                     # generateSitemap, generateSitemapIndex, calculateTotalPages
├── _graphql/
│   ├── articles/                       # Article list + count queries (CAPI)
│   ├── destinations/                   # Destination list + count queries (CAPI)
│   └── editorialAssemblies/            # Editorial assembly list + count + news queries (Rakiura)
├── articles/
│   ├── sitemap.xml.ts                  # Index (count-driven, capped at 60)
│   └── page/[page]/sitemap.xml.ts      # Pages (MAX_PAGE_LIMIT=60, PAGE_SIZE=2000)
├── destinations/
│   ├── sitemap.xml.ts                  # Index (count-driven, capped at 10)
│   └── page/[page]/sitemap.xml.ts      # Pages (MAX_PAGE_LIMIT=10, PAGE_SIZE=2000)
├── points-of-interest/
│   ├── sitemap.xml.ts                  # Index (count-driven, capped at 38)
│   └── page/[page]/sitemap.xml.ts      # Pages (MAX_PAGE_LIMIT=38, PAGE_SIZE=2000)
├── google-news/sitemap.xml.ts          # Single sitemap (recent articles)
└── legal/sitemap.xml.ts                # Single sitemap (static legal pages)