Change Details

Investigate and document the Phoenix project to understand what it could mean for our Sections, wikilink QID, and Entity Type work.

Phoenix is:
- a WMF experimental service demonstrating the value of a structured content store
- chunking Wikipedia pages into sections, collecting wikilinks, and connecting them to QIDs
- using Rosette (3rd party ML) to predict entity types based on QIDs
- Github: https://github.com/wikimedia/phoenix

Tasks
[X] We already do section parsing
[] Compare their sections to our sections
[] Compare their link parser to ours
[] Evaluate how they convert links into QIDs
[] Evaluate Rosette's entity types: https://www.rosette.com/capability/entity-extractor/#tech-specs. Look into the free API and its results. Evaluate what it would mean to have something similar running in-house.

Resources:
- Stephanie suggests that they use Rosette APIs for this mapping: https://www.rosette.com/
- The code to import Rosette is here: https://github.com/wikimedia/phoenix/tree/master/import

Deliverable: A short report on the features of the Phoenix project and of Rosette, the pros and cons of their approach, and whether we can adopt some or all of their features. Can we scale their approach for WME?
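For the two comparison tasks above (sections and links), one possible starting point is a plain set comparison of what each parser extracts for the same page. The sketch below is only an illustration: `compare`, `Comparison`, and `normalize_title` are hypothetical helper names, the inputs are assumed to be lists of section headings or link targets already extracted by each system, and the normalization (underscores to spaces, first letter uppercased) is a simplifying assumption, not Phoenix's behavior or ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Overlap between two sets of extracted items (hypothetical helper)."""
    shared: frozenset
    only_ours: frozenset
    only_theirs: frozenset

    @property
    def jaccard(self) -> float:
        # Share of all distinct items that both parsers found.
        union = self.shared | self.only_ours | self.only_theirs
        return len(self.shared) / len(union) if union else 1.0


def normalize_title(title: str) -> str:
    # Assumption: treat underscores as spaces and uppercase the first
    # letter, so "Early_life" and "early life" count as the same item.
    cleaned = title.replace("_", " ").strip()
    return cleaned[:1].upper() + cleaned[1:]


def compare(ours, theirs, normalize=normalize_title) -> Comparison:
    """Compare two iterables of headings or link targets for one page."""
    a = frozenset(normalize(item) for item in ours)
    b = frozenset(normalize(item) for item in theirs)
    return Comparison(shared=a & b, only_ours=a - b, only_theirs=b - a)


if __name__ == "__main__":
    # Made-up example data, not real parser output.
    ours = ["History", "Early_life", "references"]
    theirs = ["History", "Early life", "See also"]
    result = compare(ours, theirs)
    print(sorted(result.only_ours), sorted(result.only_theirs), result.jaccard)
```

Running the same comparison over a sample of pages would show whether differences are systematic (for example, one parser dropping reference sections) or incidental, which could feed directly into the pros and cons in the report.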