Emit external-indexability metadata alongside the rendered article
Open, Medium, Public

Description

Emit the external-indexability metadata alongside the rendered article: <link rel="canonical"> pointing at the local mainspace URL (so that crawlers index the local surface rather than the cross-wiki source), a robots meta tag marking the page as indexable, and an hreflang self-declaration for the page's own language.
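A minimal sketch of the three tags, written in Python for illustration only. The helper name `indexability_tags`, its parameters, and the `index, follow` robots value are assumptions made for this example and are not taken from any existing code.

```python
from html import escape


def indexability_tags(local_url: str, lang: str) -> list[str]:
    """Build the three head tags described above (hypothetical helper)."""
    # Escape values so that characters such as '&' in a URL stay valid HTML.
    url = escape(local_url, quote=True)
    code = escape(lang, quote=True)
    return [
        # Canonical points at the local mainspace URL, not the cross-wiki source.
        f'<link rel="canonical" href="{url}">',
        # Assumed robots value; the task only requires "indexable".
        '<meta name="robots" content="index, follow">',
        # hreflang self-declaration for the page's own language.
        f'<link rel="alternate" hreflang="{code}" href="{url}">',
    ]
```

The canonical and the hreflang self-declaration deliberately share one URL, so the page declares itself as the indexable version for its own language.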

Whether to also emit an hreflang="mul" alternate pointing at the corresponding page on abstract.wikipedia.org is worth exploring during implementation. The mul value is the ISO 639-2 code for "multiple languages", and abstract.wikipedia.org's content is genuinely multilingual at source, so the match is semantically accurate if crawlers honour it. Crawler support needs checking, though: Google's hreflang documentation specifies ISO 639-1 language codes (plus x-default), so a three-letter code such as mul may simply be ignored.

Enumerating every other language the cache has populated is explicitly not in scope: that would balloon the metadata surface and make the hreflang set drift silently as the cache changes, for no corresponding reader benefit.

This task implements the M2 preamble bullet "external-search-engine indexability metadata". It lives here because the metadata is emitted as part of the same rendered output the reader sees, not as a separate surface. Sitemap inclusion is a distinct concern and is handled in its own sub-bullet below.
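If the hreflang="mul" alternate is tried out, one option is to gate it behind a switch so it can be dropped without touching the other tags. The sketch below is in Python for illustration only; `mul_alternate_tag`, the `enabled` flag, and the caller-supplied URL are hypothetical, and how the abstract.wikipedia.org URL is derived is left open.

```python
from html import escape
from typing import Optional


def mul_alternate_tag(abstract_url: str, enabled: bool) -> Optional[str]:
    """Return the optional hreflang="mul" alternate, or None when disabled.

    Hypothetical helper: only this single alternate is ever produced, so the
    hreflang set cannot grow or drift as the cache changes.
    """
    if not enabled:
        return None
    url = escape(abstract_url, quote=True)
    return f'<link rel="alternate" hreflang="mul" href="{url}">'
```

Returning `None` instead of an empty string lets the caller skip the tag explicitly rather than emit a blank line in the head.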