In the current backfill code, `mediawiki_wikitext_history` does not give us an unambiguous way to tell whether the user, comment, or content of an incoming revision is suppressed, so we blindly mark all three as visible:
```
...
WHEN MATCHED AND to_timestamp('{snapshot}') > t.row_last_update THEN
UPDATE SET
t.page_id = s_page_id,
...
t.user_is_visible = TRUE, -- set to TRUE for now, need to figure source for this
t.revision_id = s_revision_id,
t.revision_parent_id = s_revision_parent_id,
...
t.revision_comment = s_revision_comment,
t.revision_comment_is_visible = TRUE, -- set to TRUE for now, need to figure source for this
t.revision_sha1 = s_revision_sha1, -- from backfill, revision_sha1 == main slot sha1
...
t.revision_content_is_visible = TRUE, -- set to TRUE for now, need to figure source for this
```
@Milimetric [[ https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/4#note_49482 | confirms ]] that the current schema of `mediawiki_wikitext_history` does not contain this information, and suggests a possible solution:
>for this backfill, specifically from `mediawiki_wikitext_history`, deleted is written out in the XML, for example:
>
>https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/refs/heads/master/includes/export/XmlDumpWriter.php#371
>
>I found some examples in mysql and then looked them up in the '2023-07' snapshot:
>
>```
>mysql:research@dbstore1007.eqiad.wmnet [etwiki]> select * from revision where rev_deleted {> 0, > 1, > 3} and rev_timestamp > '2023-05' limit 1;
>```
>
>I found that deleted user meant `user_id = -1`, deleted content meant `revision_text = ''`, and deleted comment meant `revision_comment = ''`. This is useful for the `user_id` but not for the others which could be like that normally (empty comments). Without joining, there's no way to get this data, and joining in general would be too expensive I would think.
>
>However, collecting only the revisions where rev_deleted is <> 0 and broadcasting that to join might work, there might just not be that many of these things.
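A minimal Python sketch of the broadcast-join idea quoted above. It assumes the small set of `(wiki_db, revision_id, rev_deleted)` rows with `rev_deleted <> 0` has already been collected from the MariaDB replicas, and decodes MediaWiki's `rev_deleted` bitfield (`DELETED_TEXT = 1`, `DELETED_COMMENT = 2`, `DELETED_USER = 4`, `DELETED_RESTRICTED = 8`) into the three flags the MERGE currently hardcodes to `TRUE`. The plain dict only stands in for a Spark broadcast; `build_deleted_lookup`, `visibility_for`, and the row shape are hypothetical names for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Tuple

# Bit values of MediaWiki's rev_deleted field (RevisionRecord::DELETED_*).
DELETED_TEXT = 1
DELETED_COMMENT = 2
DELETED_USER = 4
DELETED_RESTRICTED = 8  # marks the other bits as suppressed; hides no field on its own


class Visibility(NamedTuple):
    user_is_visible: bool
    revision_comment_is_visible: bool
    revision_content_is_visible: bool


def decode_rev_deleted(rev_deleted: int) -> Visibility:
    """Turn a rev_deleted bitfield into the three *_is_visible flags."""
    return Visibility(
        user_is_visible=not (rev_deleted & DELETED_USER),
        revision_comment_is_visible=not (rev_deleted & DELETED_COMMENT),
        revision_content_is_visible=not (rev_deleted & DELETED_TEXT),
    )


# Hypothetical join key: (wiki_db, revision_id).
RevKey = Tuple[str, int]


def build_deleted_lookup(rows: Iterable[Tuple[str, int, int]]) -> Dict[RevKey, int]:
    """Keep only rows with rev_deleted <> 0; this small side is what gets broadcast."""
    return {(wiki_db, rev_id): flags for wiki_db, rev_id, flags in rows if flags != 0}


def visibility_for(lookup: Dict[RevKey, int], wiki_db: str, revision_id: int) -> Visibility:
    """Revisions missing from the lookup have rev_deleted = 0, i.e. are fully visible."""
    return decode_rev_deleted(lookup.get((wiki_db, revision_id), 0))


# Illustrative rows, not real data.
lookup = build_deleted_lookup([("etwiki", 101, 0), ("etwiki", 102, 5), ("etwiki", 103, 2)])
print(visibility_for(lookup, "etwiki", 102))
# -> Visibility(user_is_visible=False, revision_comment_is_visible=True, revision_content_is_visible=False)
```

In Spark the same shape would be a `LEFT JOIN` against the filtered rows with a `BROADCAST` hint, using `COALESCE(rev_deleted, 0)` for unmatched revisions so they come out fully visible.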
Another possibility is to modify the job that produces `mediawiki_wikitext_history` so that it keeps these `deleted` flags. The XML parsing code is here: https://gerrit.wikimedia.org/r/plugins/gitiles/analytics/refinery/source/+/refs/heads/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/mediawikidumps/MediawikiXMLParser.scala#43
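If the dataset were extended instead, the parser would need to keep the `deleted="deleted"` attribute that the XML export writes on `<contributor>`, `<comment>`, and `<text>` when that part of a revision is hidden. A minimal Python sketch with the standard library's `xml.etree.ElementTree`, assuming a single `<revision>` fragment with the export namespace already stripped (real dumps declare a default `http://www.mediawiki.org/xml/export-0.10/` namespace, so plain tag lookups would not match). The real parser is Scala; `visibility_from_revision_xml` is a hypothetical name and the sample is illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _is_visible(revision: ET.Element, tag: str) -> bool:
    """A child marked deleted="deleted" means that part of the revision is hidden."""
    child = revision.find(tag)
    # Assumption of this sketch: a missing element (e.g. no comment at all) counts as visible.
    return child is None or child.get("deleted") != "deleted"


def visibility_from_revision_xml(revision_xml: str) -> Dict[str, bool]:
    # Assumes the export namespace has been stripped from the fragment.
    revision = ET.fromstring(revision_xml.strip())
    return {
        "user_is_visible": _is_visible(revision, "contributor"),
        "revision_comment_is_visible": _is_visible(revision, "comment"),
        "revision_content_is_visible": _is_visible(revision, "text"),
    }


sample = """
<revision>
  <id>102</id>
  <contributor deleted="deleted" />
  <comment>fixed typo</comment>
  <text deleted="deleted" />
</revision>
"""
print(visibility_from_revision_xml(sample))
# -> {'user_is_visible': False, 'revision_comment_is_visible': True, 'revision_content_is_visible': False}
```

Unlike the `user_id = -1` / empty-string heuristics in the quote, this attribute distinguishes a deleted comment from an empty one.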
In this task we should:
[] Figure out whether the broadcast-join suggestion above can be built into the current backfill.
[] If not, figure out another source for this data, perhaps by modifying `mediawiki_wikitext_history`.
[] Additionally, [[ https://gitlab.wikimedia.org/repos/data-engineering/dumps/mediawiki-content-dump/-/merge_requests/4#note_49307 | take care of some cosmetic issues ]] discussed in the same review thread.