References are one of the key atomic units of Wikipedia, the base on which all credibility is built. Some language projects have created lists that explain and perpetuate their sourcing standards, which WME appreciates and will not and cannot affect. En.wiki uses the [[ https://en.wikipedia.org/wiki/Wikipedia:Reliable_sources/Perennial_sources | Perennial Sources ]] page.
The first major step would be parsing the en.wiki perennial sources list into a signal, using a minimal set of metadata such as the 'banned' and 'considered reliable' tags plus the regex patterns for those sites' URL strings, refreshed each month.
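A minimal sketch of what that signal could look like, assuming entries have already been extracted from the Perennial Sources page (the sample entries, field names, and URL patterns below are hypothetical placeholders, not the page's real schema):

```python
import re

# Hypothetical sample entries as they might be extracted from the
# Perennial Sources list; real entries would be parsed from the page
# and refreshed monthly.
PERENNIAL_ENTRIES = [
    {"source": "Example Reliable", "status": "generally reliable",
     "url_pattern": r"reliable\.example\.com"},
    {"source": "Example Deprecated", "status": "deprecated",
     "url_pattern": r"deprecated\.example\.com"},
]

def build_signal_index(entries):
    """Compile each URL pattern once so per-reference lookups stay cheap."""
    return [(re.compile(e["url_pattern"]), e["status"], e["source"])
            for e in entries]

def classify_url(index, url):
    """Return the reliability status for a reference URL, if listed."""
    for pattern, status, source in index:
        if pattern.search(url):
            return {"source": source, "status": status}
    return {"source": None, "status": "unlisted"}

index = build_signal_index(PERENNIAL_ENTRIES)
print(classify_url(index, "https://deprecated.example.com/story"))
# → {'source': 'Example Deprecated', 'status': 'deprecated'}
```

A linear scan over compiled patterns is enough for a PoC; the list is small, and the index can simply be rebuilt on each monthly refresh.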
Can Credibility Signals serialize this information and output it to the reuser to help them judge the claims they are ingesting?
**//User Story//**: As an Enterprise Streaming API customer, I want to be able to understand the quality of sources that are used in an entry or a new edit to better make decisions about ingestion.
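One way to serialize that signal for a reuser could look like the following sketch. The payload shape, field names, and the naive URL extraction are assumptions for illustration only; a production version would walk the revision's `<ref>` tags properly and classify against the parsed perennial list:

```python
import json
import re

# Naive URL matcher; a real implementation would parse <ref> tags.
URL_RE = re.compile(r"https?://[^\s\]|<>]+")

def extract_reference_urls(wikitext):
    """Pull candidate reference URLs out of revision wikitext."""
    return URL_RE.findall(wikitext)

def classify(url):
    # Stub standing in for a lookup against the parsed perennial list.
    status = "deprecated" if "deprecated.example.com" in url else "unlisted"
    return {"status": status}

def serialize_signal(revision_id, urls, classify):
    """Aggregate per-reference statuses into one JSON payload that a
    streaming consumer could attach to the revision event."""
    refs = [dict(url=u, **classify(u)) for u in urls]
    counts = {}
    for r in refs:
        counts[r["status"]] = counts.get(r["status"], 0) + 1
    return json.dumps({"revision": revision_id,
                       "references": refs,
                       "status_counts": counts})

wikitext = "<ref>https://deprecated.example.com/story</ref>"
print(serialize_signal(12345, extract_reference_urls(wikitext), classify))
```

Emitting both per-reference statuses and an aggregate count lets a consumer make a quick ingest/skip decision without re-parsing the revision.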
**Acceptance criteria**
[] Investigate the best engineering method for this. {Timebox: 4 days}
- Does this involve parsing all new references?
[] Ideally, a working PoC by end of quarter
Please see [[ https://docs.google.com/document/d/1TFwOvQ1p6ULX3JF5DQcIiJuDW61f7sBoXYUGh9-VfzI/edit | this document ]] for full scope and acceptance criteria.