
Allow content blobs to be marked as broken in the content table
Closed, Resolved, Public

Description

Sometimes, revision data is lost due to data corruption (see e.g. T205936). Such corruption should not be silently ignored, but should be reported and handled gracefully, probably by treating the content as empty.

However, in instances where the corruption has been recognized and handled as well as possible, the broken entries in the content table would continue to cause warnings in the log. To avoid this, the bad entries in the content table should be marked as "known to be bad". When such "known bad" entries are read, nothing is written to the logs in production, and the content is treated as empty.

One obvious way to do this is to change the content_address field to something that represents the problem or the desired outcome. We could introduce a (pseudo-)address scheme called "bad", with possible values like bad:gone missing or bad:T205936. The value after the "bad:" prefix is arbitrary and can be used for later eyeballing, investigation, or processing.
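
As a rough illustration of the proposal above, here is a minimal sketch in Python. It is not the MediaWiki implementation: the names `split_address`, `is_known_bad`, and `load_blob`, the dict-based `backend`, and the `mem:` scheme are all hypothetical and exist only for this example. It assumes an address has the form `scheme:rest`, that a "bad" address yields empty content without a log entry, and that an unexpected missing blob is logged as a warning and also treated as empty.

```python
import logging

logger = logging.getLogger("blobstore_sketch")

# Pseudo-address scheme proposed in the task description.
BAD_SCHEME = "bad"


def split_address(address):
    """Split an address of the form 'scheme:rest' into (scheme, rest)."""
    scheme, sep, rest = address.partition(":")
    if not sep or not scheme:
        raise ValueError(f"malformed blob address: {address!r}")
    return scheme, rest


def is_known_bad(address):
    """Return True if the address uses the 'bad' pseudo-scheme."""
    return split_address(address)[0] == BAD_SCHEME


def load_blob(address, backend):
    """Load a blob from a dict-like backend (hypothetical stand-in for storage).

    Known-bad addresses return empty content silently. Unexpectedly missing
    blobs are reported with a warning and also treated as empty.
    """
    if is_known_bad(address):
        # The part after 'bad:' is arbitrary (e.g. a task ID); it is kept
        # only for later inspection and is not interpreted here.
        return ""
    try:
        return backend[address]
    except KeyError:
        logger.warning("Blob not found for address %s", address)
        return ""
```

With a backend such as `{"mem:1": "hello"}`, `load_blob("mem:1", backend)` returns the stored text, while `load_blob("bad:T205936", backend)` returns an empty string without logging anything.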

Event Timeline

Change 557068 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] BlobStore: support "known bad" addresses.

https://gerrit.wikimedia.org/r/557068

daniel triaged this task as Medium priority.

Change 557068 merged by jenkins-bot:
[mediawiki/core@master] BlobStore: support "known bad" addresses.

https://gerrit.wikimedia.org/r/557068