Project GitHub: https://github.com/wikimedia/phoenix
The project deals with chunking Wikipedia pages into sections, collecting Wikilinks, and generating QID backlinks for Wikipedia articles.
Tasks:
- Compare Phoenix's sections to our sections (we already do section parsing)
- Compare Phoenix's link parser to ours
- Evaluate how Phoenix converts words into QIDs
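One possible way to approach the two comparison tasks is sketched below. It assumes both parsers' outputs for a page have already been exported as plain lists of strings (section titles or link targets). The names `compare`, `normalize`, and `Comparison` are hypothetical and do not come from the Phoenix repo or our codebase. The normalization (case-folding, treating underscores as spaces) is also an assumption and may be too lossy for case-sensitive titles.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Overlap between two collections of normalized strings (hypothetical helper)."""
    shared: set
    only_ours: set
    only_theirs: set

    @property
    def jaccard(self) -> float:
        # Size of the intersection divided by size of the union.
        union = self.shared | self.only_ours | self.only_theirs
        return len(self.shared) / len(union) if union else 1.0


def normalize(item: str) -> str:
    # Assumption: underscores and repeated whitespace are equivalent to a
    # single space, and case differences are ignored.
    return " ".join(item.replace("_", " ").split()).casefold()


def compare(ours, theirs) -> Comparison:
    a = {normalize(x) for x in ours}
    b = {normalize(x) for x in theirs}
    return Comparison(shared=a & b, only_ours=a - b, only_theirs=b - a)


# Made-up example data, not taken from a real page.
result = compare(
    ["History", "Early life", "References"],
    ["history", "Early_life", "See also"],
)
print(result.jaccard)  # 0.5: two shared titles out of four distinct ones
```

Running this per page over a sample of articles would give a rough agreement score plus the concrete items each parser finds that the other misses, which could feed directly into the pros and cons in the report.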
Suggestions:
Stephanie suggests that they use the Rosette APIs (https://www.rosette.com/) for the word-to-QID mapping. The code to import Rosette is here: https://github.com/wikimedia/phoenix/tree/master/import
Deliverable:
A short report on the features of the Phoenix project, covering the pros and cons of its approach, whether we can adopt some or all of its features, and whether the approach can scale for WME.