Page MenuHomePhabricator

Invalid timing value for mediawiki_GrowthExperiments_suggested_edits_server_tti_seconds
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
message
Invalid timing value for mediawiki_GrowthExperiments_suggested_edits_server_tti_seconds
trace
	
at eval  https://en.wikipedia.org/w/load.php?lang=en&modules=ext.wikimediaEvents&skin=vector-2022&version=ip7wa:4:167
at Object.handler  https://en.wikipedia.org/w/load.php?lang=en&modules=mediawiki.base&skin=vector-2022&version=152ia:3:713
at fire  https://en.wikipedia.org/w/load.php?lang=en&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Coojs-ui-core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=4z9dz:45:76
at Object.fireWith  https://en.wikipedia.org/w/load.php?lang=en&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Coojs-ui-core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=4z9dz:46:264
at Object.fire  https://en.wikipedia.org/w/load.php?lang=en&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Coojs-ui-core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=4z9dz:46:308
at mw.track  https://en.wikipedia.org/w/load.php?lang=en&modules=mediawiki.base&skin=vector-2022&version=152ia:3:508
at initSuggestedTasks  https://en.wikipedia.org/w/load.php?lang=en&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Coojs-ui-core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=4z9dz:396:576
at https://en.wikipedia.org/w/load.php?lang=en&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Coojs-ui-core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=4z9dz:397:189
at index.js  https://en.wikipedia.org/w/load.php?lang=en&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Coojs-ui-core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=4z9dz:397:479
at runScript  https://en.wikipedia.org/w/load.php?lang=en&modules=startup&only=scripts&raw=1&skin=vector-2022:11:687
Impact

1500 events per day. This affects Growth's performance grafana dashboard accuracy

Notes

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Sgs triaged this task as High priority.May 14 2025, 10:54 AM
Sgs raised the priority of this task from High to Needs Triage.
Sgs moved this task from Inbox to Current Sprint on the Growth-Team board.
Sgs edited projects, added Growth-Team (Current Sprint); removed Growth-Team.

We should check logs once rEGRE9f3d06d44d3e447d8137631c8999acda310798fb reaches production to hopefully get additional data in the log. So far I haven't seen any log of this in beta, but I'm not a great filterer and unfamiliar with beta logstash.

I’m trying to figure out why we’re not getting the additional error log implemented in 1142597: fix(logging): temporary subscribe to analytics uncaught errors. I can reliably reproduce the error by running mw.track('stats.mediawiki_GrowthExperiments_suggested_edits_tti_seconds', NaN ) in the JS console. The first error is correctly logged (statsd handles it) and I can see our additional log code executed, but for some reason the second mw.errorLogger.logError( seems dismissed. Some of my current hypothesis are:

  • The additional properties we're adding to error_context are not compliant with the schema for some reason. Although my understanding from mediawiki/client/error/2.0.0.yaml#L21 is that we can add arbitrary data. Maybe the camel case vs underscore case?
  • mw.errorLog.log does some kind of optimization and interprets the second error as already logged because of similarity with the prior (?) Or because it is fired very short time after another?

In my debugging checks I found the mw.errorLogger state to be "firing" while trying to process the additional log and not triggering it (although the debug session may be changing the conditions of execution).

Ideas are welcome, cc @Cyndymediawiksim @Michael @Urbanecm_WMF

  • The additional properties we're adding to error_context are not compliant with the schema for some reason. Although my understanding from mediawiki/client/error/2.0.0.yaml#L21 is that we can add arbitrary data. Maybe the camel case vs underscore case?

We can add arbitrary strings. The current implementation logs numbers instead. For this reason, the event is refused by the validation service:

'.error_context['serverDuration']' should be string, '.error_context['GEHomepageStartTime']' should be string

Fortunately, the validation errors include the actual event. But, we should either fix the patch or remove the extra data altogether.

Hope this helps!

Change #1147720 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] stats(SuggestedEdits): avoid tracking negative tti durations

https://gerrit.wikimedia.org/r/1147720

We can add arbitrary strings. The current implementation logs numbers instead. For this reason, the event is refused by the validation service:

Gotcha.

Hope this helps!

It does indeed, ty. I've submitted a patch to avoid sending negative integers to stats, I'm keeping the additional logging for now with stringified values in case the are other sources for invalid timings.

  • The additional properties we're adding to error_context are not compliant with the schema for some reason. Although my understanding from mediawiki/client/error/2.0.0.yaml#L21 is that we can add arbitrary data. Maybe the camel case vs underscore case?

We can add arbitrary strings. The current implementation logs numbers instead. For this reason, the event is refused by the validation service:

'.error_context['serverDuration']' should be string, '.error_context['GEHomepageStartTime']' should be string

Where do you see these errors from the validation service?

Fortunately, the validation errors include the actual event. But, we should either fix the patch or remove the extra data altogether.

Could you point me to where the actual event is included? I'm not seeing it in logstash: https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-default-1-7.0.0-1-2025.05.20?id=I_6x7JYB-jfKnJise96Y

Change #1147720 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] stats(SuggestedEdits): avoid tracking negative tti durations

https://gerrit.wikimedia.org/r/1147720

Change #1149345 had a related patch set uploaded (by Michael Große; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@wmf/1.45.0-wmf.2] stats(SuggestedEdits): avoid tracking negative tti durations

https://gerrit.wikimedia.org/r/1149345

Change #1149345 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.45.0-wmf.2] stats(SuggestedEdits): avoid tracking negative tti durations

https://gerrit.wikimedia.org/r/1149345

Mentioned in SAL (#wikimedia-operations) [2025-05-22T13:58:09Z] <sbisson@deploy1003> Started scap sync-world: Backport for [[gerrit:1149345|stats(SuggestedEdits): avoid tracking negative tti durations (T394289 T394701)]]

Mentioned in SAL (#wikimedia-operations) [2025-05-22T14:00:25Z] <sbisson@deploy1003> sbisson, migr: Backport for [[gerrit:1149345|stats(SuggestedEdits): avoid tracking negative tti durations (T394289 T394701)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-05-22T14:09:13Z] <sbisson@deploy1003> Finished scap sync-world: Backport for [[gerrit:1149345|stats(SuggestedEdits): avoid tracking negative tti durations (T394289 T394701)]] (duration: 11m 04s)

Etonkovidova closed this task as Resolved.EditedMay 27 2025, 12:06 AM
Etonkovidova subscribed.

Errors are not present anymore - the last error timestamp May 25, 2025 @ 14:50:43.955, https://logstash.wikimedia.org/goto/d58bb261436bee059d4949888d0098ea