Aim:
Explore the feasibility of comparing the HTML of a Wikipedia page before and after it is reparsed following a Wikidata change.
The prototype compares the newly parsed HTML with the HTML from the parser cache (if available) and inserts the recent changes (RC) log entry accordingly: if the diff shows no change, the RC entry is suppressed. If there is no cached HTML to compare against, the RC entry is sent anyway.
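The decision rule described above can be sketched as follows. This is only an illustration in Python, not the PHP code from the patches; the names `RcDecision` and `decide_rc_entry` are hypothetical, and plain string equality stands in for whatever HTML comparison the prototype actually uses.

```python
from enum import Enum
from typing import Optional


class RcDecision(Enum):
    INSERT = "insert"
    SUPPRESS = "suppress"


def decide_rc_entry(cached_html: Optional[str], reparsed_html: str) -> RcDecision:
    """Decide whether the recent-change entry should be written (hypothetical helper)."""
    if cached_html is None:
        # No cached rendering to compare against: keep the current behaviour
        # and send the RC entry anyway.
        return RcDecision.INSERT
    if cached_html == reparsed_html:
        # The rendering did not change, so the RC entry is suppressed.
        return RcDecision.SUPPRESS
    # The rendering changed: the RC entry is written.
    return RcDecision.INSERT
```

Only the cache-hit, no-change case suppresses the entry; every other case falls back to writing it.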
Tradeoffs for the approach:
- Around a third of pages have no cache entry to compare against. If the page is not in the cache, we can't suppress the notification, so for those pages things stay as they are now, with "some" false positives. This approach can therefore remove at most roughly two thirds (about 66%) of the false positives.
- False positives due to race conditions
Notes for reviewers:
Things to consider:
- Does this seem likely to adversely affect e.g. page loading in production?
- There are known code quality issues on the tickets, and the patches would conflict with master. No need to worry about these for now: this is just a proof-of-concept prototype.
The prototype commits:
- Core: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1260687
- Wikibase: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/1261347
Architecture:
- We created a new hook in core (includes/Hook/RefreshLinksJobBeforeInsertRecentChangeHook.php) and handle it in Wikibase (client/includes/Hooks/RefreshLinksJobBeforeInsertRecentChangeHandler.php).
- We reused some of the existing logic for diffing and comparing against the parser cache from includes/JobQueue/Jobs/RefreshLinksJob.php.
- The hook is fired only when there is a ParserCache hit and the HTML changed after the edit on Wikibase.
- The Wikibase handler then inserts the related change into the RC table.
- We removed the previous RC table insertion code from the client/includes/Changes/ChangeHandler.php class and moved it into the hook handler, so it runs after the diff check is done.
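As a rough illustration of the flow in the bullets above, here is a minimal Python sketch. It is not the PHP code from the patches. `HookContainer`, `refresh_links_job`, `wikibase_handler`, and the in-memory `rc_table` are hypothetical stand-ins, and the hook name is assumed from the hook file name. The firing condition follows the bullet as written (cache hit and changed HTML); the cache-miss path is not modelled here.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[str, dict], None]

# Assumed from the hook file name; not confirmed against the patch.
HOOK_NAME = "RefreshLinksJobBeforeInsertRecentChange"


class HookContainer:
    """Tiny stand-in for a hook registry (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def register(self, hook_name: str, handler: Handler) -> None:
        self._handlers.setdefault(hook_name, []).append(handler)

    def run(self, hook_name: str, page_title: str, change: dict) -> None:
        for handler in self._handlers.get(hook_name, []):
            handler(page_title, change)


def refresh_links_job(hooks: HookContainer, page_title: str,
                      cached_html: Optional[str], reparsed_html: str,
                      change: dict) -> bool:
    """Core side: fire the hook only on a cache hit with changed HTML."""
    if cached_html is not None and cached_html != reparsed_html:
        hooks.run(HOOK_NAME, page_title, change)
        return True
    return False


# Wikibase side: the handler, not the change handler, writes the RC row.
rc_table: List[dict] = []


def wikibase_handler(page_title: str, change: dict) -> None:
    rc_table.append({"page": page_title, **change})
```

The point of the split is that core owns the diff check and Wikibase owns the RC insertion, so the insertion can only happen after the diff has been evaluated.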
Local testing:
The details for local testing are in the comments.
Ticket acceptance criteria:
- Address the issue of recent changes not being written to the database (and/or mention it as an issue for the reviewers).
- Check that it works locally: when the diff shows no change, the recent change does not get inserted. Ideally, make a video showing it working.
- A one-line change was added by some other ticket in the updated master. We need to 'undo' it and leave a comment that it would need to be reconciled with our version.
- Provide a list of 'still to do' items for the reviewers (including the ones above).
