Both Enterprise product pillars, Content Integrity and Machine Readability, have an interest in parsing references across wiki projects as accurately as possible, starting with Wikipedia. Doing so correctly would give our reusers a lot of room to build on wiki references in their own products.
One idea is to create our own dump of references from wiki projects, for internal (WMF) and external (third-party) use.
The following link contains such a dump, which we need to understand: https://tarb.sawood-dev.us.archive.org/data
Acceptance criteria & To Do
- Speak to Francisco for general info from emails
- Analyze the quality and usability of this dump, answering basic questions:
- how complete is this dump?
- general thoughts on quality of parsing
- Meet with Francisco to clarify and define next steps
- (take notes if needed; no formal documentation is necessary)
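
As a starting point for the "how complete is this dump?" question, here is a minimal sketch of a completeness check. The format of the dump is not yet known, so everything about the input is an assumption: the sketch supposes one JSON object per line, with hypothetical `page_title` and `references` fields, and a hand-picked set of page titles we expect to find. The field names and the function name `summarize_dump` are illustrative only and would need to be adapted once the real format is understood.

```python
import json
from typing import Any, Dict, Iterable, Set


def summarize_dump(lines: Iterable[str], expected_titles: Set[str]) -> Dict[str, Any]:
    """Summarize coverage of a references dump.

    ASSUMPTION: each line is a JSON object with a "page_title" string and
    a "references" list. These field names are hypothetical.
    """
    seen: Set[str] = set()
    malformed = 0
    pages_without_refs = 0

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            malformed += 1
            continue
        # Count records that are not objects, or lack a title, as malformed.
        if not isinstance(record, dict) or not record.get("page_title"):
            malformed += 1
            continue
        seen.add(record["page_title"])
        if not record.get("references"):
            pages_without_refs += 1

    covered = seen & expected_titles
    coverage = len(covered) / len(expected_titles) if expected_titles else 0.0
    return {
        "pages_seen": len(seen),
        "expected_covered": len(covered),
        "coverage": coverage,
        "missing": sorted(expected_titles - seen),
        "malformed": malformed,
        "pages_without_refs": pages_without_refs,
    }
```

Running this over a sample of the dump with a small list of known articles would give rough numbers to bring to the meeting with Francisco: how many expected pages are missing, how many lines fail to parse, and how many pages carry no references at all.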