
Feature request: garbage checker
Closed, Declined · Public

Description

Garbage checking is a programming technique: just before output is committed, run last-minute checks for known bugs and known problems in the data. Most compilers have garbage checkers.

The final function before saving checks the article for known data problems. For example, a regex searches for the string "http[s]?[:]///"; if it matches, log the article title and abort processing (no save). This way the program can inform you of problems at your leisure, without injecting errors into articles. Errors take a lot of work and time: you have to program the fix, debug the fix, find the affected articles, run the process, and check that it went through OK. It's a big deal to fix these problems once they are in the wikitext.
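A minimal sketch of such a pre-save gate, written in Python purely for illustration (the bot's actual language and function names are not given here, so `garbage_check`, `TRIPLE_SLASH_URL`, and the list-based log are all assumptions):

```python
import re

# Pattern from the description above: a URL scheme followed by three slashes.
TRIPLE_SLASH_URL = re.compile(r"https?:///")


def garbage_check(title: str, new_wikitext: str, log: list) -> bool:
    """Return True if the text may be saved, False if the save should be aborted.

    Hypothetical helper: the caller is expected to skip the save when this
    returns False, and to review the log later.
    """
    if TRIPLE_SLASH_URL.search(new_wikitext):
        log.append(f"{title}: malformed URL (triple slash)")
        return False
    return True
```

The caller would run this as the last step before saving and simply skip the article when it returns False, so the bad text never reaches the wiki.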

Garbage check: count the number of citations before and after processing. If there are fewer afterwards, then some citations have been deleted by mistake, so log and abort processing. It is a simple 5-10 line check that will save huge damage to the system and massive cleanup work. Yes, the problem should be fixed at the source, but you don't always have time to do it right away, whereas a garbage check is a simple technique that contains the problem immediately.
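The citation count check could look roughly like this. It is a sketch in Python that assumes a "citation" means an opening `<ref>` tag (including self-closing named refs); the function names are hypothetical, and a real check might also need to count citation templates:

```python
import re

# Assumption: a citation is any opening <ref ...> tag, including <ref name="x" />.
# "</ref>" and "<references />" do not match this pattern.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)


def count_citations(wikitext: str) -> int:
    """Count opening <ref> tags in the wikitext."""
    return len(REF_TAG.findall(wikitext))


def citations_preserved(before: str, after: str) -> bool:
    """Return False if processing left fewer citations than it started with."""
    return count_citations(after) >= count_citations(before)
```

If `citations_preserved` returns False, the bot would log the title and skip the save.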

Garbage check: look for embedded templates. Medic found over 600 articles with this problem in the recent run. It keeps coming up, so it should be checked for.
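One possible way to check for this, sketched in Python. It assumes "embedded" means a template opened inside another template, and because nested templates can be legitimate in wikitext, it only flags the edit if processing increased the number of nested templates. The function names are hypothetical, and the scan deliberately ignores `{{{...}}}` parameter syntax and `<nowiki>` sections:

```python
def count_nested_templates(wikitext: str) -> int:
    """Count templates that are opened while another template is still open."""
    depth = 0
    nested = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            if depth > 0:
                nested += 1  # this template starts inside another one
            depth += 1
            i += 2
        elif pair == "}}":
            depth = max(depth - 1, 0)  # tolerate stray closing braces
            i += 2
        else:
            i += 1
    return nested


def introduced_embedded_template(before: str, after: str) -> bool:
    """Return True if processing added nesting that was not there before."""
    return count_nested_templates(after) > count_nested_templates(before)
```

As with the other checks, a True result would mean: log the article title and do not save.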

As new problems come up, they can be stopped immediately with the garbage checker. You can take as much time as needed to fix the bug at its source, and in the meantime articles are not disrupted and the bot can continue running.
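To make adding a new check cheap, the checker could be a simple list of functions that all run before the save. This is only a sketch in Python; the registry, the `register` decorator, and the example check `not_blanked` are hypothetical placeholders, not part of the bot:

```python
from typing import Callable, List, Optional

# A check takes (before, after) and returns a problem description,
# or None if the text passes.
Check = Callable[[str, str], Optional[str]]

CHECKS: List[Check] = []


def register(check: Check) -> Check:
    """Add a check to the registry; usable as a decorator."""
    CHECKS.append(check)
    return check


@register
def not_blanked(before: str, after: str) -> Optional[str]:
    # Placeholder check for illustration only.
    if before.strip() and not after.strip():
        return "page was blanked"
    return None


def run_garbage_checks(title: str, before: str, after: str, log: list) -> bool:
    """Run every registered check; log each problem; return True only if all pass."""
    ok = True
    for check in CHECKS:
        problem = check(before, after)
        if problem is not None:
            log.append(f"{title}: {problem}")
            ok = False
    return ok
```

When a new kind of damage shows up, stopping it is then a matter of registering one more small function, while the real fix is worked out separately.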

Event Timeline

Restricted Application added a subscriber: Aklapper.

Considering that the bot's output appears to be flawless now, this doesn't seem necessary anymore.