The Structured Data development team is building on the Structured Data on Commons project to begin a second phase we're calling Structured Data Across Wikimedia. This second phase will consider how to structure the content on text pages so that it is machine recognizable and relatable, to make reading, editing, and searching easier and more accessible. Specifically, the grant requires us to build infrastructure and tools to allow structured metadata to be added to other content across Wikimedia projects, including Wikipedia itself.
The goal of the project is to design and prototype a new system that aims to be flexible enough to serve all the kinds of metadata we might need to support in the near future.
The first area of action identified is topical metadata that describes what a section of a Wikipedia article is about. This will be supported by data storage infrastructure that can treat each section of wikitext as its own entity and associate topical metadata with that entity.
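To make the idea of a section entity concrete, here is a minimal sketch of what such a record could look like. It is not the project's actual data model: the class name `SectionEntity`, its fields, and the item ID used below are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SectionEntity:
    """Hypothetical record for one section of a wiki page (illustrative only)."""
    page_title: str   # page the section belongs to
    heading: str      # section heading as written in the wikitext
    wikitext: str     # raw wikitext of the section body
    # Topical metadata, assumed here to be a list of Wikidata-style item IDs.
    topics: List[str] = field(default_factory=list)

    def add_topic(self, item_id: str) -> None:
        """Associate a topic with this section, ignoring duplicates."""
        if item_id not in self.topics:
            self.topics.append(item_id)


# Example usage with placeholder values.
section = SectionEntity(
    page_title="Example article",
    heading="Early life",
    wikitext="She was born in [[Cambridge]].",
)
section.add_topic("Q350")  # placeholder item ID, chosen for illustration
section.add_topic("Q350")  # adding the same topic twice has no effect
```

Keeping the topics on the section record, rather than on the page, is what lets the metadata describe one part of an article without claiming anything about the rest.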
The project will investigate link analysis systems and concept relationships as ways to determine the topical metadata of a Wikipedia article's sections, via the blue interwiki links in Wikipedia articles. Relationships between items in the Wikidata ontology are also being considered as a way to infer, and potentially identify, relevant concepts that are not explicitly mentioned in the text.
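As a rough illustration of the link-based approach, the sketch below counts the link targets in each section of a piece of wikitext and treats the most frequently linked pages as candidate topics. This is a simplified stand-in, not the link analysis the team is evaluating: the helper names `split_sections` and `candidate_topics`, the regular expression, and the sample text are all assumptions, and mapping link targets to Wikidata items is left out.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target#Anchor]] and [[Target|label]]; captures the target only.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Matches level-2 headings only, e.g. "== Early life ==".
HEADING = re.compile(r"==\s*([^=].*?)\s*==\s*")


def split_sections(wikitext):
    """Split wikitext into (heading, body) pairs on level-2 headings."""
    sections = []
    heading, body = "(lead)", []
    for line in wikitext.splitlines():
        match = HEADING.fullmatch(line)
        if match:
            sections.append((heading, "\n".join(body)))
            heading, body = match.group(1), []
        else:
            body.append(line)
    sections.append((heading, "\n".join(body)))
    return sections


def candidate_topics(body):
    """Count link targets in a section; skip namespaced links such as File:."""
    targets = (m.group(1).strip() for m in WIKILINK.finditer(body))
    return Counter(t for t in targets if t and ":" not in t)


sample = """Intro with [[Douglas Adams]].
== Early life ==
Adams was born in [[Cambridge]] and attended [[Brentwood School, Essex|Brentwood School]].
He later returned to [[Cambridge]].
== Career ==
He wrote for [[BBC Radio 4]]. [[File:Example.jpg|thumb]]
"""

topics_by_section = {h: candidate_topics(b) for h, b in split_sections(sample)}
```

Here `topics_by_section["Early life"].most_common(1)` ranks `Cambridge` first because it is linked twice in that section. Raw link counts only surface concepts that are explicitly linked, which is why relationships between Wikidata items are of interest for concepts the text never mentions.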