
Enrich articles with schema.org metadata
Open, Needs Triage, Public

Description

This was discussed at various places but did not seem to have its own bug so far.

Schema.org is a metadata standard used by search engines. Google, Bing and similar search engines consume Schema.org markup on web pages and use it to weigh the content better, provide rich search results, and presumably to enrich Knowledge Graph and similar semantic databases. By providing Schema.org metadata Wikipedia articles could become easier to find.

Schema.org defines various content types; some of these are highly relevant to Wikipedia, e.g. MedicalEntity and its subtypes, MedicalSignOrSymptom, Country / City, CreativeWork subtypes, Corporation...

Schema.org markup is used to annotate existing text content, not to add new, invisible content, so it's not clear what an attempt to add it to Wikimedia content should look like: a new software feature? A wikiproject? A mix of the two? Nevertheless, until we figure it out, it's nice to have a central place for discussion.
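
To make "annotating existing text" concrete, here is a minimal Python sketch that wraps parts of an already visible sentence in microdata attributes. The helper names (`itemprop_span`, `annotate_sentence`) and the example sentence are made up for illustration; this is not how MediaWiki or any existing extension does it.

```python
from html import escape


def itemprop_span(name: str, text: str) -> str:
    """Wrap visible text in a span carrying a microdata itemprop attribute."""
    return f'<span itemprop="{escape(name, quote=True)}">{escape(text)}</span>'


def annotate_sentence(item_type: str, parts: list) -> str:
    """Build an annotated paragraph.

    `parts` mixes plain strings with (property, text) tuples. All text stays
    visible to the reader; only attributes are added around it.
    """
    body = "".join(
        itemprop_span(*part) if isinstance(part, tuple) else escape(part)
        for part in parts
    )
    type_url = escape(f"https://schema.org/{item_type}", quote=True)
    return f'<p itemscope itemtype="{type_url}">{body}</p>'


# Hypothetical example sentence, annotated as a schema.org City.
html_out = annotate_sentence(
    "City", [("name", "Budapest"), " is the capital of Hungary."]
)
print(html_out)
```

The rendered sentence reads the same as before; only the `itemscope`, `itemtype` and `itemprop` attributes are new, which is why the open question is who or what would add them to wiki content.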


See also:

  • T106651 Schema.org in emails
  • T33338 OpenGraph (another major metadata standard, more focused on creating snippets for sharing on social sites)

Event Timeline

Tgr raised the priority of this task from to Needs Triage.
Tgr updated the task description.
Tgr added projects: SEO, MediaWiki-General.
Tgr subscribed.
In T64811#667863, @greg wrote:

See also: https://www.mediawiki.org/wiki/Extension:HTML_Tags

That extension was written to implement Schema.org metadata. (Full disclosure: I funded Max Klein to lead the effort, and he subcontracted the extension work to Yaron, again through my work at Creative Commons.) It can probably be extended to fit multiple use cases, including Twitter Cards, FB fauxpen graph, etc., if wanted.

Seems to me fixed by T130034: Set $wgAllowMicrodataAttributes = true for all wikis by default - please verify and close if that applies. Thanks.

Enabled, but not fixed. That said, if you think it should be handled entirely by the existing content mechanisms such as templates and no software support is needed, this should probably be closed as invalid. IMO at least schema validation by MediaWiki would be nice.
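
As a rough idea of what such validation could check, here is a small Python sketch that parses HTML and reports `itemprop` names that are not known for the enclosing `itemtype`. The `KNOWN_PROPERTIES` table is a tiny hand-written stand-in, not the real vocabulary, and `MicrodataChecker` / `check` are hypothetical names; nothing here reflects an existing MediaWiki feature.

```python
from html.parser import HTMLParser

# Assumption: a tiny hand-written subset of the vocabulary, for illustration.
# A real checker would load the full Schema.org definitions instead.
KNOWN_PROPERTIES = {
    "https://schema.org/City": {"name", "description", "containedInPlace"},
    "https://schema.org/Corporation": {"name", "description", "founder"},
}
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_elements = []  # (tag, itemtype or None) per open element
        self.errors = []

    def _current_type(self):
        """Return the itemtype of the nearest enclosing itemscope, if any."""
        for _tag, item_type in reversed(self.open_elements):
            if item_type is not None:
                return item_type
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        scope = self._current_type()
        # An itemprop belongs to the enclosing scope, so check it first.
        for prop in (attrs.get("itemprop") or "").split():
            if scope is None:
                self.errors.append(f"itemprop '{prop}' outside any itemscope")
            elif scope in KNOWN_PROPERTIES and prop not in KNOWN_PROPERTIES[scope]:
                self.errors.append(f"'{prop}' is not a known property of {scope}")
        item_type = None
        if "itemscope" in attrs:
            item_type = attrs.get("itemtype") or ""
            if item_type not in KNOWN_PROPERTIES:
                self.errors.append(f"unknown itemtype '{item_type}'")
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.open_elements.append((tag, item_type))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed tags.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i][0] == tag:
                del self.open_elements[i:]
                break


def check(html: str) -> list:
    checker = MicrodataChecker()
    checker.feed(html)
    checker.close()
    return checker.errors
```

For instance, `check('<span itemprop="name">X</span>')` would report an `itemprop` outside any `itemscope`, while a correctly typed `City` paragraph with a `name` property would come back clean. Whether a check like this belongs in the parser, in a linter, or in a gadget is exactly the open question above.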

@Tgr I can't imagine off the top of my head how the editable page content could be automagically enriched with schema.org metadata. On the other hand, I agree that validation should be done. Let's keep it open then. Thanks for the clarification.

Kingsley left a related comment here about which URLs to use as property values: https://twitter.com/kidehen/status/1303031433188048897