
When marking for deletion, logs twice in Special:Log (proposed solution: delete Deletion Tag Log)
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Make sure PageTriage is installed
  • Make sure you are a new page patroller or admin
  • Find an unreviewed page
  • The Page Curation toolbar will open
  • Click the trash can (deletion menu)
  • Choose any deletion option (CSD, PROD, AFD)
  • Click "Mark for deletion"

What happens?:

  • The action is logged in two logs: the Deletion tag log and the Page curation log

What should have happened instead?:

  • The action should be logged only in the Page curation log

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Split from T49891

I originally thought this was a wontfix, but I've changed my mind. PageTriage writes exactly the same log entry to two logs: pagetriage-deletion/deletion and pagetriage-curation/deletion. No other PageTriage action writes to two logs; the other actions simply go into pagetriage-curation, which has 5 log_actions to choose from.
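The duplication claim is easy to check mechanically. Below is a minimal sketch, not PageTriage code. It assumes a simplified row whose field names borrow from columns of MediaWiki's `logging` table (`log_type`, `log_action`, `log_timestamp`, `log_actor`, `log_namespace`, `log_title`). It also assumes that both copies carry the action name given in this task ("deletion") and an identical timestamp. Because the two entries are written separately, real rows could differ by a second, so the exact-timestamp match is an assumption. The function returns the Deletion tag log rows that have no twin in the Page curation log.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class LogRow(NamedTuple):
    """Simplified row, loosely modeled on MediaWiki's `logging` table (assumption)."""
    log_type: str
    log_action: str
    log_timestamp: str
    log_actor: int
    log_namespace: int
    log_title: str


def _event_key(row: LogRow) -> tuple:
    # Two rows describe "the same event" if everything except the log type matches.
    return (row.log_timestamp, row.log_actor, row.log_namespace, row.log_title)


def unmatched_deletion_tags(rows: Iterable[LogRow], action: str = "deletion") -> List[LogRow]:
    """Return pagetriage-deletion rows that have no pagetriage-curation twin."""
    rows = list(rows)
    # Count curation-log entries per event so that the match is strictly 1:1.
    twins = Counter(
        _event_key(r) for r in rows
        if r.log_type == "pagetriage-curation" and r.log_action == action
    )
    missing = []
    for r in rows:
        if r.log_type != "pagetriage-deletion":
            continue
        key = _event_key(r)
        if twins[key] > 0:
            twins[key] -= 1  # consume one twin
        else:
            missing.append(r)
    return missing
```

An empty result over the whole log would support the "1:1 duplicate, no data lost" reasoning. Any rows that are returned would need a closer look before deletion.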

I'm going to write a patch that stops writing to Deletion tag log, and also hides the Deletion tag log.

I've advertised this possible removal at NPP and VPT. There were no objections and not much response in general; I think this log is unused.

The goal is to clean up double log entries like this:

[Screenshot: image.png, showing duplicate log entries]

The only other place that "pagetriage-deletion" is found in our codebases is in this file, where I think it is used to allow the replica databases to display rows containing log_type=pagetriage-deletion. I think this is OK to leave, in case anyone wants to run queries on the data.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Hi @Ladsgroup. It's nice to meet you. I hope you don't mind the ping; I'm hoping for some DBA advice. The PageTriage extension currently writes the exact same log entry to two logs in Special:Log, and I am thinking about stopping this behavior. My question is: since one of the logs is just duplicate data, would DBAs be interested in deleting that data later, or should we leave it alone?

The logs are pagetriage-deletion/deletion and pagetriage-curation/deletion. The one I am thinking about stopping writing to is pagetriage-deletion/deletion. The code is here. I see the same duplication code in a commit from 2012, so I think PageTriage has behaved this way its entire life.

Thanks a lot. Looking forward to your feedback.

Please tag DBA for requested input from DBAs (as pinging individuals does not scale) - thanks a lot! :)

jcrespo subscribed.

BTW, there is also a tag for asking the data persistence team for feedback on projects we are not directly involved in. I've added it here (but didn't remove the DBA one, as it is up to the DBAs to manage it).

The one thing I would ask in return is the size of the deletion request: how many wikis and how many log entries (approximately, if you know) would be deleted. Whether it is 5 or 5 million changes the scope dramatically :-D. Since I work on the data recovery side of things, I am interested in monitoring this in case the deletion process goes badly. Also, please be patient: cleanup tasks are not usually the highest priority, and the DBAs are super busy. Thank you! 0:-)

Hey there. It'd be enwiki only, about 102,000 rows in the logging table. Quarry link.

This task has no urgency. I'm just checking as a courtesy. I figure it'd be a small efficiency to delete 100k duplicate log entries.

Of course, don't do it yet; I haven't even written the patch. I just wanted to formulate a game plan.

Change 815702 had a related patch set uploaded (by Novem Linguae; author: NovemLinguae):

[mediawiki/extensions/PageTriage@master] Remove "Deletion tag log" from Special:Log

https://gerrit.wikimedia.org/r/815702

Nice to meet you and thank you for caring about new page patrolling! It's highly appreciated.

I can take care of the deletion once the patch is approved/merged.

If the data is being deleted, it can probably be dropped from maintain-views too.

How is that related in this case?

I don't think deleting the log data is a good idea. What if someone linked to a specific log entry and it gets deleted? I think it is unfortunate that we have been double-logging up until now but we should just stop doing it and move on, leaving the existing duplication in place.

We have had cases of mass removal of log entries. For example, five years ago we removed all autopatrol logs, which meant removing 1.1B rows. I did something similar for FlaggedRevs auto-review logs, which amounted to tens of millions, if not hundreds of millions, of rows.

I'm not saying this means we should delete the rows but I'm saying we have precedent for such cases. We don't guarantee our logs stay there forever.
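Mass removals like these are normally done in small batches so that no single transaction locks many rows at once. The following is a minimal sketch of that pattern, not the procedure the DBAs actually used. SQLite stands in for the production database. The hypothetical table keeps only two real columns of MediaWiki's `logging` table (`log_id`, `log_type`), and the helper `delete_in_batches` is illustrative. On a replicated production setup, one would also typically pause for replica lag between batches; that step is only marked by a comment here.

```python
import sqlite3


def delete_in_batches(conn: sqlite3.Connection, log_type: str, batch_size: int = 500) -> int:
    """Delete every row of one log type, one small batch at a time.

    Returns the total number of rows deleted. batch_size stays below 999,
    the bound-parameter limit of older SQLite builds.
    """
    total = 0
    while True:
        # Select the next batch of primary keys in a stable order.
        ids = [row[0] for row in conn.execute(
            "SELECT log_id FROM logging WHERE log_type = ? ORDER BY log_id LIMIT ?",
            (log_type, batch_size),
        )]
        if not ids:
            return total
        placeholders = ",".join("?" * len(ids))
        conn.execute(f"DELETE FROM logging WHERE log_id IN ({placeholders})", ids)
        # Commit each batch as its own short transaction.
        # (In production, this is where one would wait for replicas to catch up.)
        conn.commit()
        total += len(ids)
```

Deleting by primary key keeps each statement cheap and lets the loop resume safely if it is interrupted.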

Change 815702 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] Remove "Deletion tag log" from Special:Log

https://gerrit.wikimedia.org/r/815702

It is not obvious to editors not involved with page curation (i.e. most of them) that deletion marking is to be found in the page curation log rather than the deletion log. Users should be given an easy way to reach the correct log, by pointing them to it in the relevant places.

Happy to add. What are some examples of relevant places?

In the header of the deletion log special page, for a start, since that's where it used to be logged and where people will go if they've used that search before. Possibly also in the header of Special:Log, at least temporarily, until people get used to the new system.

The deletion tag log is still appearing in the list of logs. Likely need another patch to hide this, then as discussed above we can ask a DBA to delete the log entries.

To recap: we did not move the deletion tag log to the page curation log. Rather, the PageTriage software has been writing duplicate log entries to both logs since the beginning. So the goal is to hide and delete an unneeded, duplicated log (the deletion tag log).

Do first:

  • Ask a DBA to delete the deletion tag log. Based on the Phab discussion above, they sound on board with this.

Do after:

  • Patch PageTriage to hide deletion tag log
  • Patch operations/puppet -> maintain-views.yaml so that it no longer clones the nonexistent log type. This patch is a similar example, except that it adds a log type instead of removing one.

Note that a warning about this deletion was added to https://en.wikipedia.org/wiki/MediaWiki:Alllogstext

Or if there isn't an appetite to delete the Deletion Tag Log, can we just hide it from the picker at Special:Log? Again, all data in the Deletion Tag Log is a 1:1 duplicate of items in the Page Curation Log subtype Delete, so we are not losing any data.

I may release a patch just for this, as it seems more likely to gain consensus and is a good incremental step.

Novem_Linguae renamed this task from When marking for deletion, logs twice in Special:Log to When marking for deletion, logs twice in Special:Log (proposed solution: delete Deletion Tag Log). · Nov 10 2022, 8:04 AM

Change 855483 had a related patch set uploaded (by Novem Linguae; author: Novem Linguae):

[mediawiki/extensions/PageTriage@master] hide Deletion Tag Log from Special:Log

https://gerrit.wikimedia.org/r/855483

@Novem_Linguae Why do you think it can't be deleted? If it's okay with the community, just create a ticket and we will get to it.

Change 855483 merged by jenkins-bot:

[mediawiki/extensions/PageTriage@master] Hide Deletion Tag Log from Special:Log

https://gerrit.wikimedia.org/r/855483

MPGuy2824 assigned this task to Novem_Linguae.
MPGuy2824 subscribed.

Nothing seems to be left to do here. Please reopen if I'm mistaken.