Internal documentation and overview
Design
TBD - Sketches are available in the documentation linked above.
Annotool
We may consider using Annotool in some capacity (links: Tool, GitLab).
This already provides functionality for loading a dataset and having users review edits.
Investigation
We want to determine the technical approach we will take for building this tool, answering questions such as:
- Should we build a new tool or extend an existing one (e.g. Annotool)?
- How will we store data such that it is accessible to our data analyst?
- Are there concerns about any of the design features, which we should reconsider to simplify the solution?
- How should we ingest revert risk scores into the interface?
Findings
Should we build a new tool or extend an existing one (e.g. Annotool)?
After spending about a week experimenting with Annotool, I believe that we can either extend it or fork it to meet our needs. I have already added a view for filtering lists of revisions by probability:
https://gitlab.wikimedia.org/jsn/annotool/-/tree/Jsn.sherman/threshold-filter?ref_type=heads
For this, I reused a slider with both a min and a max value from the UI library that Annotool already uses.
How will we store data such that it is accessible to our data analyst?
Annotool supports CSV export. We could use that directly, or potentially integrate it with Google Sheets without too much hassle. I suggest that we don't use the internal database for long-term storage or analysis.
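To illustrate what working from an exported file could look like for the analyst, here is a minimal sketch that tallies review labels from a CSV. The column names (rev_id, label), the sample rows, and the helper summarize_labels are assumptions made for illustration; the actual columns in an Annotool export may differ.

```python
import csv
import io
from collections import Counter


def summarize_labels(csv_file, label_column="label"):
    """Count how many rows carry each label in an exported CSV.

    `label_column` is a hypothetical column name, not a confirmed
    part of the Annotool export format.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Made-up export contents standing in for a real exported file.
sample_export = io.StringIO(
    "rev_id,label\n"
    "1001,revert\n"
    "1002,keep\n"
    "1003,revert\n"
)

counts = summarize_labels(sample_export)
print(dict(counts))
```

The same function would accept a file opened with open(path, newline=""), so no access to the tool's internal database is needed for this kind of analysis.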
Are there concerns about any of the design features, which we should reconsider to simplify the solution?
Not at this time. I do think it's worth having both a min and a max threshold, so that we can test "maybe"/"marginal" buckets for scored revisions.
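One way to picture the min/max idea is as a three-way split of scores. The sketch below is only an illustration: the function name bucket_for_score, the bucket names "below" and "above", the inclusive boundaries, and the example threshold values are all assumptions, not decisions that have been made.

```python
def bucket_for_score(score, min_threshold, max_threshold):
    """Place a score into one of three buckets.

    Scores inside [min_threshold, max_threshold] (inclusive, by
    assumption) land in the "marginal" bucket; everything else is
    "below" or "above".
    """
    if not 0.0 <= min_threshold <= max_threshold <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= min <= max <= 1")
    if score < min_threshold:
        return "below"
    if score > max_threshold:
        return "above"
    return "marginal"


# Hypothetical thresholds and scores, chosen only for the example.
scores = {1001: 0.12, 1002: 0.55, 1003: 0.93}
buckets = {
    rev_id: bucket_for_score(score, min_threshold=0.4, max_threshold=0.7)
    for rev_id, score in scores.items()
}
print(buckets)
```

Keeping both thresholds adjustable would let us try different widths for the marginal band without changing the scoring itself.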
How should we ingest revert risk scores into the interface?
Annotool supports bulk CSV import, in addition to accepting individual scores via its API.
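Before a bulk import, it may be worth checking the score file for malformed rows. The following is a rough sketch of such a pre-check, assuming a CSV with rev_id and score columns and scores in the range 0 to 1; those column names, the range, and the helper parse_scores are assumptions and do not describe Annotool's actual import format.

```python
import csv
import io


def parse_scores(csv_file):
    """Split rows into valid (rev_id, score) pairs and rejected line numbers.

    Assumes hypothetical `rev_id` and `score` columns. A row is rejected
    if a field is missing, fails to parse, or the score is outside [0, 1].
    """
    valid, rejected = [], []
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(csv.DictReader(csv_file), start=2):
        try:
            rev_id = int(row["rev_id"])
            score = float(row["score"])
        except (KeyError, TypeError, ValueError):
            rejected.append(line_number)
            continue
        if 0.0 <= score <= 1.0:
            valid.append((rev_id, score))
        else:
            rejected.append(line_number)
    return valid, rejected


# Made-up input standing in for a real score file.
sample_scores = io.StringIO(
    "rev_id,score\n"
    "1001,0.12\n"
    "1002,not-a-number\n"
    "1003,1.7\n"
    "1004,0.93\n"
)

valid_rows, rejected_lines = parse_scores(sample_scores)
print(valid_rows, rejected_lines)
```

Reporting rejected line numbers rather than silently dropping rows makes it easier to fix the source file before importing.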
