This task covers the internship project to switch the article quality model's source from wikitext to Parsoid HTML. This will have several benefits:
- More reliable feature extraction -- e.g., only 90% of references can be easily identified in wikitext, whereas the HTML does not have this limitation.
- Extend the model with new features -- e.g., potentially valuable features like infoboxes and article maintenance messages are very difficult to extract consistently from wikitext but far easier to extract from the HTML. Readability is another key facet that we're interested in incorporating.
- Explore alternative architectures for the model.
This is an umbrella task for the individual steps in this project.
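
To illustrate why HTML-based feature extraction is expected to be more reliable, here is a minimal sketch that counts references and infoboxes in an HTML string using only the Python standard library. It is not the project's implementation. The markers it looks for (`typeof="mw:Extension/ref"` on `<sup>` elements for references, and an `infobox` class for infoboxes) are assumptions about how Parsoid output and wiki templates typically mark these elements, and `FeatureCounter`, `count_features`, and `SAMPLE_HTML` are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class FeatureCounter(HTMLParser):
    """Counts elements that look like references or infoboxes.

    Assumption: references appear as <sup typeof="mw:Extension/ref">
    and infoboxes carry the CSS class "infobox".
    """

    def __init__(self):
        super().__init__()
        self.counts = {"references": 0, "infoboxes": 0}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Attribute values may be None for valueless attributes.
        typeof = (attr.get("typeof") or "").split()
        classes = (attr.get("class") or "").split()
        if tag == "sup" and "mw:Extension/ref" in typeof:
            self.counts["references"] += 1
        if "infobox" in classes:
            self.counts["infoboxes"] += 1


def count_features(html):
    """Return a dict of feature counts for one article's HTML."""
    parser = FeatureCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Hand-written sample, not real Parsoid output.
SAMPLE_HTML = """
<table class="infobox vcard"><tr><td>Example</td></tr></table>
<p>First claim.<sup class="mw-ref reference" typeof="mw:Extension/ref">[1]</sup>
Second claim.<sup class="mw-ref reference" typeof="mw:Extension/ref">[2]</sup></p>
"""

print(count_features(SAMPLE_HTML))
```

Because each feature reduces to a check on tag names and attributes, adding further features (for example maintenance messages) would mean adding another condition rather than another wikitext parsing rule, provided a stable marker exists in the HTML.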