
Investigate features of Phoenix Project
Open, Low, Public

Description

Project Github: https://github.com/wikimedia/phoenix

The project chunks Wikipedia pages into sections, collects wikilinks, and generates QID backlinks for Wikipedia articles.
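For comparison with our own section parser, a minimal sketch of section chunking and wikilink collection over raw wikitext could look like this. It is not taken from the Phoenix code. It assumes raw wikitext input, treats `==`-style headings as flat section boundaries (subsections become separate sections), and ignores templates and nested links. `Section`, `split_sections` and the "Lead" label are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Heading lines such as "== History ==" (levels 2-6); level 1 is reserved for the page title.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
# [[Target]], [[Target|label]] or [[Target#Anchor|label]]; captures only "Target".
# Namespaced targets (File:, Category:) are not filtered out, and templates are ignored.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


@dataclass
class Section:
    title: str
    level: int  # 0 for the lead, otherwise the number of "=" signs
    text: str
    links: list = field(default_factory=list)


def split_sections(wikitext: str) -> list:
    """Split raw wikitext into a lead section plus one Section per heading."""
    matches = list(HEADING_RE.finditer(wikitext))
    lead_end = matches[0].start() if matches else len(wikitext)
    spans = [("Lead", 0, 0, lead_end)]
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        spans.append((m.group(2), len(m.group(1)), m.end(), end))

    sections = []
    for title, level, start, end in spans:
        body = wikitext[start:end].strip()
        links = [target.strip() for target in WIKILINK_RE.findall(body)]
        sections.append(Section(title, level, body, links))
    return sections


if __name__ == "__main__":
    sample = "Intro with [[Physics]].\n== History ==\nSee [[Albert Einstein|Einstein]].\n"
    for s in split_sections(sample):
        print(s.level, s.title, s.links)
```

Keeping sections flat makes a one-to-one comparison with another parser's output easier; nesting can be rebuilt afterwards from the `level` field.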

Tasks

  • We already do section parsing
  • Compare their sections to our sections
  • Compare their link parser to ours
  • Evaluate how they convert words into QIDs
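For the section and link-parser comparisons, one simple approach is to normalize the titles produced by both parsers and diff them as sets. The sketch below is a hypothetical harness (`normalize_title` and `compare_sets` are illustrative names, not part of either codebase). The normalization only approximates MediaWiki title rules (underscores to spaces, first letter uppercased) and assumes a wiki that capitalizes first letters, as English Wikipedia does; it is applied to section titles and link targets alike for simplicity.

```python
def normalize_title(title: str) -> str:
    """Approximate MediaWiki title normalization: underscores to spaces,
    collapsed whitespace, first letter uppercased (as on English Wikipedia)."""
    t = " ".join(title.replace("_", " ").split())
    return t[:1].upper() + t[1:]


def compare_sets(ours, theirs) -> dict:
    """Diff two parsers' outputs (section titles or link targets) as sets."""
    ours_set = {normalize_title(t) for t in ours}
    theirs_set = {normalize_title(t) for t in theirs}
    union = ours_set | theirs_set
    return {
        "only_ours": sorted(ours_set - theirs_set),
        "only_theirs": sorted(theirs_set - ours_set),
        # Jaccard similarity; two empty outputs count as identical.
        "jaccard": len(ours_set & theirs_set) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    print(compare_sets(["Physics", "albert_Einstein"], ["Albert Einstein", "Chemistry"]))
```

A set diff ignores ordering and duplicate links; if the report needs per-section link counts, the same comparison can be run section by section.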

Suggestions:
Stephanie suggests that they use the Rosette APIs (https://www.rosette.com/) for this word-to-QID mapping. The Rosette import code is here: https://github.com/wikimedia/phoenix/tree/master/import

Deliverable:
A short report on the features of the Phoenix Project, covering the pros and cons of their approach and whether we can adopt some or all of their features. Can their approach scale for WME?

Related Objects

Status   Subtype   Assigned   Task
Open     None
Open     None

Event Timeline

The requirement is to find all internal wikilinks on the page, then parse each linked page and extract its Wikidata links (it is probably more appropriate to extract just the Wikidata item link in the right-hand panel). Finally, associate each linked page's QID URL with the corresponding wikilink on the original page.
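Instead of scraping the rendered right-hand panel, the QID it links to should be available as the `wikibase_item` page property through the MediaWiki Action API (`action=query&prop=pageprops&ppprop=wikibase_item`). The sketch below assumes English Wikipedia, the default JSON response shape (`formatversion=1`, pages keyed by page ID), and that the HTTP request itself is made separately; `build_pageprops_url` and `map_titles_to_qids` are hypothetical helpers. It follows the `normalized` and `redirects` entries in the response so that each original wikilink maps to the QID of the page it finally resolves to.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia; any MediaWiki wiki with a Wikibase client works the same way.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_pageprops_url(titles, api_url=API_URL) -> str:
    """URL asking for the Wikidata item (wikibase_item) of each linked title."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": 1,  # resolve redirects so the target page's QID is returned
        "titles": "|".join(titles),
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def map_titles_to_qids(titles, response: dict) -> dict:
    """Associate each original wikilink title with a QID (or None)."""
    query = response.get("query", {})
    # The API reports title normalization and redirects as from -> to pairs.
    renames = {}
    for entry in query.get("normalized", []) + query.get("redirects", []):
        renames[entry["from"]] = entry["to"]

    qid_by_title = {}
    for page in query.get("pages", {}).values():
        qid = page.get("pageprops", {}).get("wikibase_item")
        if qid:
            qid_by_title[page["title"]] = qid

    result = {}
    for title in titles:
        resolved, seen = title, set()
        # Follow normalization, then redirect, guarding against cycles.
        while resolved in renames and resolved not in seen:
            seen.add(resolved)
            resolved = renames[resolved]
        result[title] = qid_by_title.get(resolved)
    return result
```

The API accepts at most 50 titles per request for ordinary clients, so a page with many wikilinks would need to be batched. Missing pages and pages without a connected Wikidata item map to `None`; a QID URL is then `https://www.wikidata.org/wiki/` followed by the QID.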