Page MenuHomePhabricator

AbuseLog no longer recording revids of saved edits
Closed, ResolvedPublicBUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

View the the enwiki AbuseLog.

What happens?:

No "diff" links are shown, even for edits that saved. The problem looks to have started at 19:44 on July 1.

Same problem with the API (note the missing revid field after 19:44):

https://en.wikipedia.org/w/api.php?action=query&list=abuselog&aflstart=20210701195900&afllimit=500

What should have happened instead?:

Hits with saved changes have a "diff" link in the log, and a revid field in the API result.

Event Timeline

DannyS712 triaged this task as Unbreak Now! priority.Jul 4 2021, 5:15 AM
DannyS712 added subscribers: Jdforrester-WMF, DannyS712.

This corresponds to the time 1.37.0-wmf.12 reached enwiki

All wikis were rolled to wmf.12 at 2021-07-01 19:42:47.

Adding this as a train blocker (tagging both last week's train, in case it needs a rollback, and next week's for visibility)

Yes, was noticing that at metawiki

2021-7-1 19:35 last saved edit logged

https://meta.wikimedia.org/wiki/Special:AbuseLog?wpSearchUser=&wpSearchPeriodStart=&wpSearchPeriodEnd=&wpSearchTitle=&wpSearchImpact=1&wpSearchAction=any&wpSearchActionTaken=&wpSearchFilter=&wpSearchWiki=

Just left @brennen a message as they were deploying stuff at that time, and they may be able to add some light.

it started after 19:35 1 July at metawiki

It seems the info isn't being added to the database.
The top result for the enwiki api search (https://en.wikipedia.org/w/api.php?action=query&list=abuselog&aflstart=20210701195900&afllimit=500) is

	{
		"id": 30327362,
		"filter_id": "432",
		"user": "2603:8001:2507:5EAA:943E:8647:4A34:AF46",
		"ns": 0,
		"title": "Deanna Bogart",
		"action": "edit",
		"result": "tag",
		"timestamp": "2021-07-01T19:58:51Z"
	},

which corresponds to the edit https://en.wikipedia.org/w/index.php?title=Deanna_Bogart&diff=prev&oldid=1031468796
Looking in the replicas

MariaDB [enwiki_p]> SELECT afl_id, afl_filter_id, afl_rev_id FROM abuse_filter_log WHERE afl_id = 30327362;
+----------+---------------+------------+
| afl_id   | afl_filter_id | afl_rev_id |
+----------+---------------+------------+
| 30327362 |           432 |       NULL |
+----------+---------------+------------+
1 row in set (0.00 sec)

(I confirmed afl_rev_id is included in the replicas by fetching it for an older hit). Thus, it appears the issue is that the data isn't being stored in the first place.

mediawikiwiki stopped logging 19:00, 29 June 2021

19:00, 29 June 2021: 208.66.198.16 (talk | block) triggered filter 76, performing the action "edit" on Translations:Help:VisualEditor/User guide/251/fo. Actions taken: Tag; Filter description: Problematic translation (details | examine | diff)

https://www.mediawiki.org/wiki/Special:AbuseLog?wpSearchUser=&wpSearchPeriodStart=&wpSearchPeriodEnd=&wpSearchTitle=&wpSearchImpact=1&wpSearchAction=any&wpSearchActionTaken=&wpSearchGroup=0&wpSearchFilter=

and that aligns with
https://sal.toolforge.org/log/NX8rWXoBa_6PSCT9CMVF
19:07 <brennen@deploy1002> rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.12

I tested on patchdemo with just core and abusefilter (master on both) and this is also an issue there (https://patchdemo.wmflabs.org/wikis/2877d5c2b3/wiki/Special:AbuseLog) so its likely not due to another extension interfering

Change 702956 had a related patch set uploaded (by DannyS712; author: DannyS712):

[mediawiki/extensions/AbuseFilter@master] Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage"

https://gerrit.wikimedia.org/r/702956

Change 702956 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage"

https://gerrit.wikimedia.org/r/702956

Change 702957 had a related patch set uploaded (by DannyS712; author: DannyS712):

[mediawiki/extensions/AbuseFilter@wmf/1.37.0-wmf.12] Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage"

https://gerrit.wikimedia.org/r/702957

Change 702958 had a related patch set uploaded (by DannyS712; author: DannyS712):

[mediawiki/extensions/AbuseFilter@master] Add tests for afl_rev_id being set

https://gerrit.wikimedia.org/r/702958

WikiPage objects are checked for identity in the post-save hook. I admit this is a poor choice (one of the old leftovers from ~2009 that I still haven't had a chance to clean up), but it's broken now since a new WikiPage object is created. I had already suspected this and wanted to verify the assumption and comment on r701896, but eventually forgot to do that and it fell off the radar.

I think ideally, core would offer a way to link the same ongoing edit in different edit-related hooks. Internally it could compare the relevant fields from a WikiPage etc., but extensions shouldn't have to do that, or even just know the logic.

As an immediate solution, reverting the patch works. For the short term, we'd want to manually check the relevant fields in WikiPage to match; unsure which ones exactly (timestamp? last revision?).

WikiPage objects are checked for identity in the post-save hook. I admit this is a poor choice (one of the old leftovers from ~2009 that I still haven't had a chance to clean up), but it's broken now since a new WikiPage object is created. I had already suspected this and wanted to verify the assumption and comment on r701896, but eventually forgot to do that and it fell off the radar.

I think ideally, core would offer a way to link the same ongoing edit in different edit-related hooks. Internally it could compare the relevant fields from a WikiPage etc., but extensions shouldn't have to do that, or even just know the logic.

As an immediate solution, reverting the patch works. For the short term, we'd want to manually check the relevant fields in WikiPage to match; unsure which ones exactly (timestamp? last revision?).

Yeah, thats what I was realizing as I was trying to add tests for this, because those were failing - the wikipage is checked with === but we should accept different WikiPage objects that refer to the same underlying page

If its enough to make sure that the title is the same (without checking last revision or timestamp) we can use WikiPage::isSamePageAs()

If its enough to make sure that the title is the same (without checking last revision or timestamp) we can use WikiPage::isSamePageAs()

I don't know, maybe it should also check that the editing user is the same. The problem is that WikiPage has a lot of data stored in class members (recursively), so it's hard to make a list of edit "identifiers" that are checked for equality. As I said, this concept (of having a way to uniquely identify an edit) should likely exist in MediaWiki core.

I think we could play super-safe and join + hash the following: prefixedtitle, maybe page ID, the content model, the timesamp, and something that identifies the associated RevisionRecord. The other problem is to figure out which members are already guaranteed to be loaded, since we don't want to trigger unnecessary lazy-loading.

I can deploy; going ahead as I think it will be clear from reproduction steps above if the fix worked.

Change 702957 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@wmf/1.37.0-wmf.12] Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage"

https://gerrit.wikimedia.org/r/702957

Mentioned in SAL (#wikimedia-operations) [2021-07-04T17:43:53Z] <brennen@deploy1002> Synchronized php-1.37.0-wmf.12/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: [[gerrit:702957|Revert "Replace depricating method IContextSource::getWikiPage to WikiPageFactory usage" (T286140)]] (duration: 01m 06s)

brennen lowered the priority of this task from Unbreak Now! to Needs Triage.Jul 4 2021, 5:48 PM

Backport deployed, confirmed that enwiki AbuseLog now shows diff links for new edits.

Removing this as a train blocker.

I can deploy; going ahead as I think it will be clear from reproduction steps above if the fix worked.

Backport deployed, confirmed that enwiki AbuseLog now shows diff links for new edits.

Removing this as a train blocker.

Thanks for deploying, sorry I wasn't around. I'd like to keep this task open until the regression tests are merged

I think ideally, core would offer a way to link the same ongoing edit in different edit-related hooks. Internally it could compare the relevant fields from a WikiPage etc., but extensions shouldn't have to do that, or even just know the logic.

Would T242249: Unclear MCR replacement for WikiPage::prepareContentForEdit be what you are looking for? I have been working on a fix for that, but it's a side project at the moment. Patch is WIP: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/701074

However, even when PreparedUpdate exists, the EditFilterMergedContent hook would need to pass it to the handler. It currently provides access to the ongoing edit only via the state of the WikiPage instance inside the RequestContext. If we want to remove IContextSource::getWikiPage, this becomes inaccessible. We'll need to replace the hook to fix that.

Basically, all hooks that deal with ongoing edits need to have a PreparedUpdate passed in.

I think we could play super-safe and join + hash the following: prefixedtitle, maybe page ID, the content model, the timesamp, and something that identifies the associated RevisionRecord. The other problem is to figure out which members are already guaranteed to be loaded, since we don't want to trigger unnecessary lazy-loading.

The logic you are looking for exists, but it's currently @internal: PreparedPageDataUpdater::isReusableFor

Change 702958 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Add tests for afl_rev_id being set

https://gerrit.wikimedia.org/r/702958

DannyS712 claimed this task.
DannyS712 removed a project: Patch-For-Review.

I think ideally, core would offer a way to link the same ongoing edit in different edit-related hooks. Internally it could compare the relevant fields from a WikiPage etc., but extensions shouldn't have to do that, or even just know the logic.

Would T242249: Unclear MCR replacement for WikiPage::prepareContentForEdit be what you are looking for? I have been working on a fix for that, but it's a side project at the moment. Patch is WIP: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/701074

However, even when PreparedUpdate exists, the EditFilterMergedContent hook would need to pass it to the handler. It currently provides access to the ongoing edit only via the state of the WikiPage instance inside the RequestContext. If we want to remove IContextSource::getWikiPage, this becomes inaccessible. We'll need to replace the hook to fix that.

Basically, all hooks that deal with ongoing edits need to have a PreparedUpdate passed in.

I'm not sure if PreparedUpdate does what I have in mind, but ideally, I'd want a way to answer the following question:

If I subscribe e.g. to the EditFilterMergedContent and the PageSaveComplete hook and they both handle an event, can I know if the two events were fired for the same edit?

What i was thinking of is a plain value object with some info about the edit (user, page, time etc.) that is passed by all these hooks, and caller may store somewhere for later comparison with an equals() method. I think this might pretty much be what you were proposing re PreparedUpdate. The logic would then be similar to the PreparedPageDataUpdater::isReusableFor method that you mentioned. Oversimplifying, it could look something like:

class EditIdentifier {
   private $id;
   function __construct( UserIdentity $user, WikiPage $page, string $timestamp ) { // These are just examples, in particular we want something thinner than WikiPage
      $this->id = hash_relevant_pieces_of_arguments();
   }

   function equals( self $other ) : bool {
      return $this->id === $other->id;
   }
}

class MyExtensionHookHandlers {
   private $editIdent;

   function onEditFilterMergedContent( EditIdentifier $x ) {
      $this->editIdent = $x;
   }

   function onPageSaveComplete( EditIdentifier $y ) {
      if ( $y->equals( $this->editIdent ) ) {
         // Same edit, do stuff
      }
   }
}

While not necessary per se, this EditIdentifier object might as well expose the single components (user, page, timestamp, whatever it has), at which point it would have more purposes but could be used in those hooks (and maybe core code) instead of passing the components separately.

I'm not sure if PreparedUpdate does what I have in mind, but ideally, I'd want a way to answer the following question:

If I subscribe e.g. to the EditFilterMergedContent and the PageSaveComplete hook and they both handle an event, can I know if the two events were fired for the same edit?

For a PreparedUpdate, that can and should be checked with ===. The PreparedUpdate is the object used internally to track the edit state, so having two "equal" instances doesn't really make sense. It would be theoretically possible to instantiate an emulate equivalent, but that doesn't really seem useful. I'd like to avoid going there.

Of course, the question is how the hook handler would get the PreparedUpdate. Ideally, it would be passed in directly. PreparedUpdate provides access to the RevisionRecord and the ParserOutput, which should have all the info the hooks need. But this means changing the signature of all the relevant hooks, which will take time. So for now, the alternative is to call WikiPage::getCurrentUpdate() to get any currently active PreparedUpdate.

However, things get tricky when the hook doesn't get a WikiPage either. EditFilterMergedContent only gets an IContextSource, which is kind of an odd choice, though I guess the logic is that the filter may need access to the user's session and the request (e.g. for captcha responses). And of course, we want to get rid of IContextSource::getWikiPage()....

While not necessary per se, this EditIdentifier object might as well expose the single components (user, page, timestamp, whatever it has), at which point it would have more purposes but could be used in those hooks (and maybe core code) instead of passing the components separately.

That is indeed the idea behind PreparedEdit: it provides access to all relevant bits and peaces, with lazy initialization.

For a PreparedUpdate, that can and should be checked with ===. The PreparedUpdate is the object used internally to track the edit state, so having two "equal" instances doesn't really make sense. It would be theoretically possible to instantiate an emulate equivalent, but that doesn't really seem useful. I'd like to avoid going there.

While this might be true in theory, guaranteeing no clones around might be hard to enforce (we can prevent shallow cloning via __clone, but we cannot prevent constructing another instance with the same arguments, or deep cloning an existing one). It might be OK to use $this === $other as the implementation of equals(), but I wouldn't encourage callers to rely directly on that.

Of course, the question is how the hook handler would get the PreparedUpdate. Ideally, it would be passed in directly. PreparedUpdate provides access to the RevisionRecord and the ParserOutput, which should have all the info the hooks need. But this means changing the signature of all the relevant hooks, which will take time. So for now, the alternative is to call WikiPage::getCurrentUpdate() to get any currently active PreparedUpdate.

Having that in WikiPage seems fine for the time being. I think changing signatures is going to be inevitable, especially if we account for the ongoing/completed refactoring of related classes (Title, User, WikiPage, etc.). I really wish we had a better way to do that, besides adding optional parameters that will later become mandatory, or adding new hooks and deprecating the old ones.

However, things get tricky when the hook doesn't get a WikiPage either. EditFilterMergedContent only gets an IContextSource, which is kind of an odd choice, though I guess the logic is that the filter may need access to the user's session and the request (e.g. for captcha responses). And of course, we want to get rid of IContextSource::getWikiPage()....

I think no hook should receive a full IContextSource. I mean, yeah, it's great and handy if you need to do complicated stuff, but it drags in a bazillion of dependencies that the handler might not even need. Although I guess this is going to improve as more and more classes are refactored (for the session example you mentioned, I believe Authority should help?).

While not necessary per se, this EditIdentifier object might as well expose the single components (user, page, timestamp, whatever it has), at which point it would have more purposes but could be used in those hooks (and maybe core code) instead of passing the components separately.

That is indeed the idea behind PreparedEdit: it provides access to all relevant bits and peaces, with lazy initialization.

OK nice, we're on the same page then. I'll try to take a look at the patch once I have some spare time.