Page MenuHomePhabricator

Nullify ct_rc_id and ts_rc_id when recent changes entries are being deleted
Closed, DeclinedPublic

Description

When recent changes entries are being deleted (i.e. when purging old entries or deleting pages), the ct_rc_id and ts_rc_id fields for the corresponding rows in the change_tag and tag_summary tables, respectively, should be nullified, because the fields would become pointless. Recent changes entries are not "undeleted" even when undeleting pages or increasing the value of $wgRCMaxAge. Also, if the fields were not nullified, eventual confusion may arise after clearing out the recentchanges table, or immediate confusion if maintenance/rebuildrecentchanges.php is ran after the table is cleared. We should also create a maintenance script to retroactively nullify ct_rc_id and ts_rc_id fields that are no longer the rc_id for any row in the recentchanges table.

Event Timeline

TTO subscribed.

Is there any benefit for doing so? For example, would it improve performance on certain joins? It's hard to see how it would, but I'm open to being corrected.

To clarify, I don't consider "the fields would become pointless" or "eventual confusion may arise after clearing out the recentchanges table" to be strong reasons. MySQL does not reuse AUTO_INCREMENT values even after the rows are deleted, so there is no risk of changetags rows pointing to the wrong RC row.

jcrespo added subscribers: Ladsgroup, jcrespo.

@TTO I have some opinion as a DBA, which will express here, but I will start by pointing out that this is more of a "logical database" mediawiki decision (meaning of fields) than a storage backend concern. All other opinion as a mediawiki user match that of @TTO, but I wanted to give a disclaimer that I am not the expert on mediawiki semantics, and that there could be some cases I don't think about that I could be failing to understand. But unless a strong point is made towards it I will fall on the side of "if ain't broken", with a clearly biased "site reliability" view.

DBA thoughts:

  • There is one edge case in which auto_increments can be resused, and that is in order InnoDB (MySQL/MariaDB) versions, on server shutdown, autoincrement id count was resetted back to max(id) + 1. This is filed as T135851, but it is considered low because it is mostly a MySQL bug, not really mediawiki, has been solved on the latest issues, and it cannot happen on WMF infrastructure anyway because we never shutdown all db servers at once. If somone outside of WMF is running into that issue, I think the best way is follow the documentation (which probably is yet not done): T136045. So my proposal would be to merge into T135851.

However, I am not familiar with maintenance/rebuildrecentchanges.php so I cannot speak if there are other cases that are logically bad. A more concrete example by the original reporter would be useful to understand the extent of the issue (maybe it is a bug on the script there).

I also noticed the MediaWiki-Change-tagging and I can attest there is quite some important ongoing issues regarding that extension and database consistency, most of which are right now being worked on at T185355. I am at this point CC'ing @Ladsgroup not to seek him to fix this, but to have him notice this issue if that helps him do informed decisions on that ticket.

(changing DBA to MediaWiki-libs-Rdbms, as the first one is only about WMF infrastructure, even if mostly the same people are attending both)

I agree with @TTO Performance-wise and storage-wise, It doesn't matter. We have proper indexes in place and it's int (and not string), Normalizing the ct_tag would help a lot which is being worked on.

This is not quite the same as T135851, which was about a MySQL bug that had been fixed in version 8.0. Instead, I'm going to close this as declined per TTO and Ladsgroup.

Per TTO and Ladsgroup (originally closed as a duplicate of T135851).