When reviewing pending changes, we need to detect whether text added in a pending revision still exists in the current version of the article. If all additions made in the pending change have been removed or superseded by subsequent edits, the pending change can be automatically accepted (or skipped), as there's no longer any content to review.
Desired bot behavior
- Detect when text added in a pending change no longer exists (or has been substantially modified) in the current article version
- Also detect cases where an addition is only partially intact: other users have inserted new text inside it, part of it has been removed, or the text has been moved
- Automatically accept or skip such pending changes, as the content is no longer present to review
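
As a starting point for the investigation, here is a minimal sketch of the detection using only Python's standard `difflib`. It assumes we already have the wikitext of three revisions: the parent of the pending change, the pending change itself, and the current article. The function names (`tokenize`, `added_indices`, `surviving_fraction`) and the tokenizer regex are hypothetical choices for illustration, not part of any existing bot. Note that `SequenceMatcher` does not track moved text, so a moved addition may be reported as removed.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def added_indices(parent_tokens, pending_tokens):
    """Return the positions in pending_tokens that the pending change added."""
    sm = SequenceMatcher(a=parent_tokens, b=pending_tokens, autojunk=False)
    added = set()
    for tag, _i1, _i2, j1, j2 in sm.get_opcodes():
        if tag in ("insert", "replace"):
            added.update(range(j1, j2))
    return added


def surviving_fraction(parent, pending, current):
    """Fraction of tokens added by the pending change that still align
    with the current text (0.0 = nothing left, 1.0 = fully intact)."""
    parent_t, pending_t, current_t = map(tokenize, (parent, pending, current))
    added = added_indices(parent_t, pending_t)
    if not added:
        return 0.0  # the pending change only removed text
    sm = SequenceMatcher(a=pending_t, b=current_t, autojunk=False)
    kept = set()
    for i, _j, size in sm.get_matching_blocks():
        kept.update(range(i, i + size))
    return len(added & kept) / len(added)
```

A result of 0.0 would correspond to the "nothing left to review" case, and a value strictly between 0 and 1 to a partially intact addition. Where to put the threshold for "substantially modified" is an open question for the investigation.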
Task
Investigate how detection should be done: do we need per-revision, word-level annotation that records which revision each word originated in, or can an (open source) LLM do this when given only the original diff and the wikitext of the latest revision?
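
To make the word-level annotation option concrete, the following is a naive sketch that walks a revision history in order and carries each word's origin revision forward through unchanged blocks. It is an illustration under stated assumptions, not the algorithm of any existing tool: `annotate` and `survives` are hypothetical names, revisions are assumed to be `(rev_id, wikitext)` pairs sorted oldest first, and tokens are whitespace-separated words. Text that is deleted and later restored is attributed to the restoring revision, and moved text counts as removed and re-added.

```python
from difflib import SequenceMatcher


def annotate(revisions):
    """Return [(token, origin_rev_id), ...] for the latest revision.

    revisions: list of (rev_id, text) tuples, oldest first.
    """
    tokens, origins = [], []
    for rev_id, text in revisions:
        new_tokens = text.split()
        # By default every token is attributed to the revision being processed.
        new_origins = [rev_id] * len(new_tokens)
        sm = SequenceMatcher(a=tokens, b=new_tokens, autojunk=False)
        # Tokens in unchanged blocks keep the origin they already had.
        for i, j, size in sm.get_matching_blocks():
            new_origins[j:j + size] = origins[i:i + size]
        tokens, origins = new_tokens, new_origins
    return list(zip(tokens, origins))


def survives(revisions, pending_rev_id):
    """True if any token in the latest revision originated in pending_rev_id."""
    return any(origin == pending_rev_id for _tok, origin in annotate(revisions))
```

With such an annotation, the check for a pending change reduces to asking whether any token in the latest revision still has the pending revision as its origin.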
Also determine whether there are existing (Python) tools or libraries for this kind of work. (WikiTrust, for example, did this, so we know it is technically possible, but it was written in OCaml and has been defunct for over 10 years.)
Post a proposal or the results of the investigation in the comments; we can then write an actual task ticket based on them.