PendingChangesBot: Auto-approve reverts to already-reviewed content via SHA1 matching
Open, Needs Triage, Public

Description

Add a check to determine if a pending edit has already been reviewed by detecting if it's a revert to a previously reviewed version.

GitHub issue #3

Logic:

  1. Check if there is a newer version in the edit history with a change tag indicating the edit is a revert or has been reverted (tags: mw-manual-revert, mw-reverted, mw-rollback, mw-undo)
  2. If such versions exist, check if those versions have already-reviewed identical versions in the version history
  3. Match versions using SHA1 content hashes to identify identical content
  4. If an identical reviewed version is found, the edit can be treated accordingly (e.g., auto-approved)
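The steps above can be sketched in Python as follows. This is a minimal illustration, not the bot's actual implementation: the `Revision` record, its field names, and `find_reviewed_match` are hypothetical, and the history is assumed to be already loaded into memory with tags, SHA1 hashes, and review status.

```python
from dataclasses import dataclass, field

# Change tags named in the task description.
REVERT_TAGS = frozenset({"mw-manual-revert", "mw-reverted", "mw-rollback", "mw-undo"})


@dataclass
class Revision:
    """Hypothetical in-memory revision record; not the bot's actual model."""
    rev_id: int
    sha1: str
    tags: frozenset = field(default_factory=frozenset)
    reviewed: bool = False


def find_reviewed_match(history, stable_rev_id):
    """Return the newest revert-tagged revision after stable_rev_id whose
    SHA1 equals that of a reviewed revision, or None if there is none."""
    # Step 3: identical content is detected by comparing SHA1 hashes.
    reviewed_hashes = {r.sha1 for r in history if r.reviewed}
    candidates = [
        r for r in history
        if r.rev_id > stable_rev_id       # step 1: newer than the stable version
        and r.tags & REVERT_TAGS          # step 1: carries a revert-related tag
        and r.sha1 in reviewed_hashes     # step 2: matches reviewed content
    ]
    return max(candidates, key=lambda r: r.rev_id, default=None)
```

For a history where revision 1 is reviewed, revision 2 is vandalism tagged `mw-reverted`, and revision 3 is an `mw-undo` edit restoring the content of revision 1, the function returns revision 3. What the bot then does with the match (step 4) is left to the caller.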

Performance Considerations:
The SQL query below fetches the maximum revision IDs with reviews for all pending articles at once, each time the data is refreshed. It performs well on small wikis but is too slow on large wikis (ruwiki, plwiki, dewiki). Because of this, add a configuration setting to enable or disable the feature based on wiki size until there is a better alternative.

SQL query
The following SQL query can be used to fetch the latest revision IDs for all pending articles.

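-- For each pending page, find the newest revert-tagged revision (newer than
-- the stable revision) whose content matches a reviewed revision of that page.
SELECT
    fr_page_id,
    fr_rev_id,
    MAX(r1.rev_id) AS max_reviewed_rev_id,
    c1.content_sha1,
    page_title
FROM
    page,
    flaggedpages,
    flaggedrevs,
    slots AS s1,
    content AS c1,
    content AS c2,
    slots AS s2,
    revision AS r1,
    change_tag,
    change_tag_def
WHERE
    -- Pages with pending changes and their reviewed revisions
    fp_pending_since IS NOT NULL
    AND fp_page_id = fr_page_id
    -- c1: content of the reviewed revision
    AND fr_rev_id = s1.slot_revision_id
    AND c1.content_id = s1.slot_content_id
    -- c2: a different content row with the same size and SHA1
    AND c1.content_size = c2.content_size
    AND c1.content_sha1 = c2.content_sha1
    AND c1.content_id != c2.content_id
    -- r1: revision of the same page that uses c2 and is newer than the stable revision
    AND c2.content_id = s2.slot_content_id
    AND s2.slot_revision_id = r1.rev_id
    AND r1.rev_page = fp_page_id
    AND r1.rev_id > fp_stable
    -- r1 must carry one of the revert-related change tags
    AND r1.rev_id = ct_rev_id
    AND ct_tag_id = ctd_id
    AND ctd_name IN ('mw-manual-revert', 'mw-reverted', 'mw-rollback', 'mw-undo')
    -- Main namespace only
    AND page_namespace = 0
    AND fp_page_id = page_id
GROUP BY fp_page_id;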

An example of the SQL query can be found here
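When comparing the `content_sha1` values returned by the query with hashes obtained elsewhere, the encoding may differ. The sketch below assumes that `content_sha1` is stored as a base-36 SHA1 string padded to 31 characters (as MediaWiki does for `rev_sha1`), while other sources may report the same hash in 40-character hexadecimal. This assumption should be verified against real data before relying on it. The helper names are hypothetical.

```python
import hashlib

_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def hex_to_base36(hex_sha1, width=31):
    """Convert a hexadecimal SHA1 digest to zero-padded base 36."""
    n = int(hex_sha1, 16)
    out = ""
    while n:
        n, rem = divmod(n, 36)
        out = _DIGITS[rem] + out
    return out.rjust(width, "0")


def base36_to_hex(b36_sha1, width=40):
    """Convert a base-36 SHA1 string back to zero-padded hexadecimal."""
    return format(int(b36_sha1, 36), "x").rjust(width, "0")


def content_sha1(text):
    """Base-36 SHA1 of UTF-8 text (assumed format of content_sha1)."""
    return hex_to_base36(hashlib.sha1(text.encode("utf-8")).hexdigest())
```

Normalizing both sides to one encoding before comparison avoids false mismatches between identical revisions.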

Configuration:
Add a configuration parameter to enable/disable this check (recommended: disabled by default for large wikis).
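One possible shape for this setting is sketched below. The `WikiConfig` class, the `revert_sha1_check_enabled` field, and the `LARGE_WIKIS` set are illustrative names, not existing bot code. The sketch assumes an explicit per-wiki value overrides a size-based default that disables the check on the large wikis named above.

```python
from dataclasses import dataclass
from typing import Optional

# Wikis the task description names as too slow for the query.
LARGE_WIKIS = frozenset({"ruwiki", "plwiki", "dewiki"})


@dataclass(frozen=True)
class WikiConfig:
    """Hypothetical per-wiki settings; field names are illustrative."""
    wiki: str
    # None means "no explicit setting": fall back to the size-based default.
    revert_sha1_check_enabled: Optional[bool] = None


def is_revert_check_enabled(config):
    """Return True if the revert SHA1 check should run for this wiki."""
    if config.revert_sha1_check_enabled is not None:
        return config.revert_sha1_check_enabled
    return config.wiki not in LARGE_WIKIS
```

With this layout, `WikiConfig("dewiki")` disables the check by default, while `WikiConfig("dewiki", True)` lets an operator opt in once a faster query is available.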

Tests:

  • Test SHA1 matching logic
  • Test that the Superset SQL query works
  • Test that finding latest reviewed version works correctly
  • Test configuration parameter handling

Event Timeline

Note: Pointbr8ker-123 is working on this in the GitHub ticket.

Hello everyone, I'm Pointbr8ker-123. I'd like to work on this issue. If anyone else is already working on it or plans to, please let me know.