Page MenuHomePhabricator

Find out why AbuseFilter was emitting 1000s of non-object property access errors
Open, LowPublic

Description

Error

Request URL:
Request ID: XSeXlgpAIDsAAIYEx8cAAADD

message
PHP Notice: Cannot access property on non-object
trace
#0 /srv/mediawiki/php-1.34.0-wmf.11/extensions/AbuseFilter/includes/AbuseFilter.php(661): MWExceptionHandler::handleError(integer, string, string, integer, array, array)
#1 /srv/mediawiki/php-1.34.0-wmf.11/extensions/AbuseFilter/includes/AbuseFilter.php(609): AbuseFilter::checkFilter(array, AbuseFilterParser, Title, string, string)
#2 /srv/mediawiki/php-1.34.0-wmf.11/extensions/AbuseFilter/includes/AbuseFilter.php(1178): AbuseFilter::checkAllFilters(AbuseFilterVariableHolder, Title, string, string, AbuseFilterParser)
#3 /srv/mediawiki/php-1.34.0-wmf.11/extensions/AbuseFilter/includes/AbuseFilterHooks.php(138): AbuseFilter::filterAction(AbuseFilterVariableHolder, Title, string, User)
#4 /srv/mediawiki/php-1.34.0-wmf.11/extensions/AbuseFilter/includes/AbuseFilterHooks.php(60): AbuseFilterHooks::filterEdit(DerivativeContext, Wikibase\ItemContent, string, Status, string, string)
#5 /srv/mediawiki/php-1.34.0-wmf.11/includes/Hooks.php(174): AbuseFilterHooks::onEditFilterMergedContent(DerivativeContext, Wikibase\ItemContent, Status, string, User, boolean, string)
#6 /srv/mediawiki/php-1.34.0-wmf.11/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#7 /srv/mediawiki/php-1.34.0-wmf.11/extensions/Wikibase/repo/includes/EditEntity/MediawikiEditFilterHookRunner.php(116): Hooks::run(string, array)
#8 /srv/mediawiki/php-1.34.0-wmf.11/extensions/Wikibase/repo/includes/EditEntity/StatsdTimeRecordingEditFilterHookRunner.php(44): Wikibase\Repo\EditEntity\MediawikiEditFilterHookRunner->run(Wikibase\DataModel\Entity\Item, User, string)
#9 /srv/mediawiki/php-1.34.0-wmf.11/extensions/Wikibase/repo/includes/EditEntity/MediawikiEditEntity.php(702): Wikibase\Repo\EditEntity\StatsdTimeRecordingEditFilterHookRunner->run(Wikibase\DataModel\Entity\Item, User, string)
#10 /srv/mediawiki/php-1.34.0-wmf.11/extensions/Wikibase/repo/includes/EditEntity/StatsdSaveTimeRecordingEditEntity.php(73): Wikibase\Repo\EditEntity\MediawikiEditEntity->attemptSave(Wikibase\DataModel\Entity\Item, string, integer, string, boolean)
#11 /srv/mediawiki/php-1.34.0-wmf.11/extensions/Wikibase/repo/includes/Api/EntitySavingHelper.php(351): Wikibase\Repo\EditEntity\StatsdSaveTimeRecordingEditEntity->attemptSave(Wikibase\DataModel\Entity\Item, string, integer, string)
#12 /srv/mediawiki/php-1.34.0-wmf.11/extensions/Wikibase/repo/includes/Api/ModifyEntity.php(310): Wikibase\Repo\Api\EntitySavingHelper->attemptSaveEntity(Wikibase\DataModel\Entity\Item, string)
#13 /srv/mediawiki/php-1.34.0-wmf.11/includes/api/ApiMain.php(1583): Wikibase\Repo\Api\ModifyEntity->execute()
#14 /srv/mediawiki/php-1.34.0-wmf.11/includes/api/ApiMain.php(531): ApiMain->executeAction()
#15 /srv/mediawiki/php-1.34.0-wmf.11/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#16 /srv/mediawiki/php-1.34.0-wmf.11/api.php(87): ApiMain->execute()
#17 /srv/mediawiki/w/api.php(3): include(string)

Impact

Unknown.

Notes

Seems to be limited to wmf.11, but happened immediately after wmf.13 was deployed.

Two huge pro-longed spikes which are dominating the unfiltered error dashboards.

Event Timeline

Krinkle created this task.Jul 11 2019, 8:13 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 11 2019, 8:13 PM

Moving to "Meta" because this for analysing afterwards, and to determine how to avoid it in the future (e.g. tests or static rules).

Back to non-meta because wikidata was put back on wmf.11, and it is still spewing thousands of errors continuously.

Tagging CPT (per stewardship list) as it seems to be quite immediate and Daimona might not be available right away as volunteer.

I'm writing from mobile so I cannot push anything or investigate. However, I had noticed yesterday's spike. Once again, I believe it has to do with global filters caching. Maybe some conflict between wikis running different MW versions? Although I believed adding the version to the cache key would have been enough.

Krinkle added a comment.EditedJul 11 2019, 9:34 PM

This spike is from today and still on-going. It started an hour ago with the 1.34-wmf.13 deployment.

Right now most wikis are on wmf.13, except Wikidata which is still on wmf.11. The errors are coming from Wikidata/wmf.11.

@Daimona What is the impact of this error? How does the code deal with the unexpected null values implied in all attempted accesses of the returned value?

Should all wikis be on wmf.11 for now, or is AbuseFilter still working fine with this bug in place?

greg added a subscriber: greg.
15:56:58 <Krinkle> greg-g: thcipriani: likely one-off. It looks like the problem happened because the code changed from an unversioned cache key to a versioned cache key, and to them make a breaking change in the value, in the same branch.
15:57:27 <greg-g> Krinkle: ah, well, that's "good"
15:57:38 <Krinkle> So what happened is probably something like wmf.13 populating the versioned key with the new value format and wmf.11 reading it regardless and then breaking. 
15:57:47 <Krinkle> not sure how/why but it wouldn't happen again

Removing from wmf.13 blockers.

15:56:58 <Krinkle> greg-g: thcipriani: likely one-off. It looks like the problem happened because the code changed from an unversioned cache key to a versioned cache key, and to them make a breaking change in the value, in the same branch.
15:57:27 <greg-g> Krinkle: ah, well, that's "good"
15:57:38 <Krinkle> So what happened is probably something like wmf.13 populating the versioned key with the new value format and wmf.11 reading it regardless and then breaking. 
15:57:47 <Krinkle> not sure how/why but it wouldn't happen again

This is exactly what happened, although I don't understand why, given the versioned cache key. And even weirder: the failures happen when trying to access some properties which always used to exist in the row: the fields added when versioning the key are an overhead for that bit of code.

@Daimona What is the impact of this error? How does the code deal with the unexpected null values implied in all attempted accesses of the returned value?

Well, it depends on what properties are missing. For sure, it's af_pattern and af_id, and probably all of the others. So:

  1. line 608 will write in the 'global-' key for all global filters, keeping only the last value.
  2. Then in checkFilter, line 659, the same happens for the filterCache.
  3. Then at line 661 $pattern will be the empty string, and the filter won't match.
  4. And then we'll send some bogus profiling data to stash in recordProfilingResult; the runtimes will all be very low, and all for $filter == ''.
  5. Back to checkAllFilters, we'll send some other bogus profiling data in recordStats.
  6. Then out of checkAllFilters, we get to line 1201 with the array built at line 608, which is filtered, and thus the 'global-' key is removed because the filter(s) didn't match.

To recap, during the outage:

  1. Global filters won't ever match. Given the short duration of the outage, this shouldn't be that bad.
  2. We'll send some bogus profiling data to various places. Well, no harm from this, just some wrong statistics for a couple of days (and probably they won't be too off)

Is there anything actionable here?

@greg Can this be closed?

@WDoranWMF CPT is tagged as fallback for Daimona (as steward for AbuseFilter). The task is not UBN or train-blocking.

Id like to keep it open until we've found the root cause of this incident so as to prevent it in the future. It went away by itself, but that's not particularly comforting by itself.

Krinkle triaged this task as Low priority.Jul 18 2019, 4:10 PM

@Krinkle Excellent, thanks for the context, I'll move it out of our inbox but keep it categorised on our board.

Krinkle renamed this task from AbuseFilter emitting 1000s of non-object property access errors to Find out why AbuseFilter was emitting 1000s of non-object property access errors.Jul 26 2019, 4:28 PM