Page MenuHomePhabricator

initSuggestedTasks: server duration is negative error
Open, In Progress, MediumPublic2 Estimated Story PointsPRODUCTION ERROR

Description

T394289: Invalid timing value for mediawiki_GrowthExperiments_suggested_edits_server_tti_seconds

30-day report on the error - https://logstash.wikimedia.org/goto/db4552a927cc66c919b290b486185892

Error

https://logstash.wikimedia.org/goto/eb91e1daaaa5a8dbce558c503c9fa411

$schema/mediawiki/client/error/2.0.0
@timestampMay 29, 2025 @ 21:51:14.234
error_context.GEHomepageStartTime1748555472388
error_context.GEHomepageStartTimeTypenumber
error_context.componentgrowthexperiments
error_context.serverDuration-14390.39990234375
error_context.serverDurationTypenumber
error_context.special_pageHomepage
error_context.version1.45.0-wmf.3
error_context.wikieswiki
file_urlhttps://es.wikipedia.org/w/load.php?lang=es&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Cmoment%2Coojs-ui-core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=1mqrr
levelERROR
messageinitSuggestedTasks: server duration is negative [Additional data]
meta.domaines.wikipedia.org
meta.dt2025-05-29T21:51:13.365Z
meta.id61d389e2-ebbc-4f79-842b-81937b4c75aa
meta.request_idb237edb5-cfd3-46d5-b225-10ebe60a191e
meta.streammediawiki.client.error
normalized_messageinitSuggestedTasks: server duration is negative [Additional data]
stack_traceat initSuggestedTasks https://es.wikipedia.org/w/load.php lang=es&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Cmoment%2Coojs-ui-core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=1mqrr:498:23 at https://es.wikipedia.org/w/load.php?lang=es&modules=%40wikimedia%2Fcodex%2Ccodex styles%2Cjquery%2Cmoment%2Coojs-ui core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=1mqrr:498:603 at index.js https://es.wikipedia.org/w/load.php?lang=es&modules=%40wikimedia%2Fcodex%2Ccodex-styles%2Cjquery%2Cmoment%2Coojs-ui core%2Cvue%7Cext.growthExperiments.Homepage.Impact%2CSuggestedEdits&skin=vector-2022&version=1mqrr:498:893 at runScript https://es.wikipedia.org/w/load.php?lang=es&modules=startup&only=scripts&raw=1&skin=vector-2022:11:748 at Array.<anonymous> https://es.wikipedia.org/w/load.php?lang=es&modules=startup&only=scripts&raw=1&skin=vector 2022:12:504 at flushCssBuffer https://es.wikipedia.org/w/load.php lang=es&modules=startup&only=scripts&raw=1&skin=vector-2022:4:385
typeclienterror
urlhttps://es.wikipedia.org/wiki/Special:Homepage?title=Especial:P%C3%A1gina_inicial&source=specialwelcomesurvey
Impact

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Production Error". · View Herald TranscriptMay 29 2025, 10:11 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Our current guess is that this is due to out of sync client devices. Should we completely silent this error, or configure and alarm on spikes?

Our current guess is that this is due to out of sync client devices. Should we completely silent this error, or configure and alarm on spikes?

Yes, this is the most likely cause. However, what is puzzling

normalized_message:"initSuggestedTasks: server duration is negative [Additional data]" AND NOT error_context.component:growthexperiments

I wonder if it would make sense to just remove that particular tracking. Maybe adding together a client timestamp and a server timestamp is just a bad idea. While we see the noise in one direction as negative duration errors, the noise in the other direction exists probably just as much.
If we want to include the duration that the request took on the server, then maybe we can make use of some of the early timing marks that the browser provides: https://developer.mozilla.org/en-US/docs/Web/API/PerformanceNavigationTiming

That is, we could replace the servertime with the time the request was started, which would be performance.timeOrigin + performance.getEntriesByType('navigation')[0].requestStart. (see PerformanceResourceTiming/requestStart)

But for now, it seems to me that this composite duration is causing much more problems for us than what we are gaining here, especially given that these things are also already tracked individually.

What do you think?

If we do not want to fix the issue itself, then removing the broken metrics definitely makes sense to me @Michael. Should we repurpose this task to do just that?

Michael triaged this task as Medium priority.EditedJul 14 2025, 9:34 AM
Michael moved this task from Triaged to Up Next (estimated tasks) on the Growth-Team board.

This task itself is about fixing the error, and given that it is our most common error on the client by about 1.5 order of magnitude, it makes sense to me to tackle it sooner rather than later:

image.png (454×646 px, 38 KB)

Above, I outlined both the option of removing it as well as replacing it with a measurement that lives fully in the browser. I'm fine with either. As far as I am concerned, it can be up to the person picking up this task to make that call.

KStoller-WMF set the point value for this task to 2.Jul 21 2025, 4:10 PM
Cyndymediawiksim changed the task status from Open to In Progress.Jul 22 2025, 1:01 PM
Cyndymediawiksim moved this task from Incoming to Doing on the Growth-Team (Current Sprint) board.

Change #1171579 had a related patch set uploaded (by Cyndywikime; author: Cyndywikime):

[mediawiki/extensions/GrowthExperiments@master] Replace server-side duration tracking with Performance API

https://gerrit.wikimedia.org/r/1171579

Change #1171579 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Replace server-side duration tracking with Performance API

https://gerrit.wikimedia.org/r/1171579

Checked - the last timestamp for the error Aug 7, 2025 @ 08:38:47.157 - https://logstash.wikimedia.org/goto/025cee5f2c8d6b245f55809350ba523b. ( only for error_context.version: 1.45.0-wmf.12)

For error_context.version: "1.45.0-wmf.14"shows only one error; https://logstash.wikimedia.org/goto/b6a9971eca372006168141b931791579 - querying only Growth specific reports).

Michael removed Cyndymediawiksim as the assignee of this task.
Michael moved this task from QA to Incoming on the Growth-Team (Current Sprint) board.

For error_context.version: "1.45.0-wmf.14"shows only one error; https://logstash.wikimedia.org/goto/b6a9971eca372006168141b931791579 - querying only Growth specific reports).

There are now a bunch of those errors. So we re-introduced somehow some issue :/

Let's have another look.

Change #1184569 had a related patch set uploaded (by Michael Große; author: Michael Große):

[mediawiki/extensions/WikimediaEvents@master] feat: improve statsd error logs by adding invalid value

https://gerrit.wikimedia.org/r/1184569

Waiting on the above change to be merged and then for it to arrive in production with the train, and then we can better figure out what is going on.

Michael changed the task status from Open to In Progress.Oct 2 2025, 6:21 PM

This is still waiting on code review. It will need some additional follow-up there, but that is currently not a priority for us.

Change #1184569 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] statsd: Improve error logs by adding invalid value

https://gerrit.wikimedia.org/r/1184569