Page MenuHomePhabricator

Convert performanceNow datatype to Integer in QuickSurvey Initiation
Closed, ResolvedPublic2 Estimated Story Points

Description

User Story

As a system, I want the performanceNow property to be consistent with the value set in QuickSurveyLogger.js for mw.now().

Technical information

  1. Wrap performanceNow event property with Math.round() to insure integer datatype end-to-end.

Bug Info:

Hi folks, not sure if it is https://gerrit.wikimedia.org/r/c/mediawiki/extensions/QuickSurveys/+/768194 or something else, but now QuickSurveyInitiation events carry attributes like "performanceNow":1648718876207.3, meanwhile the schema mentions integers, so validation fails. A change in the schema may be required, or a follow up in the JS code :)

Event Timeline

Change 776985 had a related patch set uploaded (by Eigyan; author: Eigyan):

[mediawiki/extensions/QuickSurveys@master] Convert performanceNow datatype to Integer in QuickSurvey Initiation in order to resolve data type mismatch in schema.

https://gerrit.wikimedia.org/r/776985

Change 776985 merged by jenkins-bot:

[mediawiki/extensions/QuickSurveys@master] Convert performanceNow datatype to Integer in QuickSurvey Initiation in order to resolve data type mismatch in schema.

https://gerrit.wikimedia.org/r/776985

Scardenasmolinar subscribed.

Since this has been merged, I'm moving this to the Review/Feedback column.

Jdlrobson subscribed.

Thanks for looking into this! As stated in T305498#7833561 could I recommend this gets backported to 1.39.0-wmf.5 today or tomorrow?

Greetings @Jdlrobson how would I go about merging this patch as you mentioned?

We'd need to schedule a backport on https://wikitech.wikimedia.org/wiki/Deployments in one of the "backport window" table rows. See Thursday, March 31 for an example and feel free to reach out on Slack if something is not clear.

Jdlrobson triaged this task as Unbreak Now! priority.Apr 6 2022, 9:17 PM

This still hasn't been backported and error rate is still unhealthily high. https://logstash.wikimedia.org/app/dashboards#/view/AXN5OoJu3_NNwgAUlbUT?_g=h@40bc43e&_a=h@c32d4b6

(In case it's not obvious there is never a guarantee that the train will be completed in any given week (it might be rolled back or stalled), and it would be dangerous to keep things in this state going into the weekend). Either the surveys should be disabled or the patch applied to 1.39.0-wmf.5.

Greetings @Jdlrobson thank you for checking those logs as I am unable to actually see them I hope to gain access soon. We were unable to make the backport to push our patch to group2 (group0 and group1 are fixed now) however we have undeployed the GDI survey for the EN, ES and FR wikis as of 16:00 UTC - 4 yesterday hoping to help stop some/most of the failures. Did you notice a drop? From what I notice at https://versions.toolforge.org/ the ptwiki is the only active survey that is unpatched, I think.

I'm not seeing a drop. ru.wikipedia.org and es.wikipedia.org seem to be generating the most errors here. There seems to be active surveys on Russian, Spanish, French and Portugeuse (some of those surveys appear to belong to the Performance-Team so tagging accordingly ) all of these in group2 which is still running 1.39.0-wmf.5..

There were 2,774 errors in the last 15 minutes.

@Jdlrobson, looks like the next backport is 03:00 UTC-4. I can get the patch installed at that time, I haven't deployed at that hour yet. @Jdlrobson have you seen deployers usually available at that time?

@Jdlrobson, looking at what surveys are configured in production and the coverage levels set, seems a bit strange to me that we would have such a high error rate for surveys that have 0.0 coverage. That would mean that the QuickSurvey Initiation event is firing at a higher rate than expected per the survey configs. Can we tell that these are all QuickSurvey initiation events?

Change 777775 had a related patch set uploaded (by Jdlrobson; author: Eigyan):

[mediawiki/extensions/QuickSurveys@wmf/1.39.0-wmf.5] Convert performanceNow datatype to Integer in QuickSurvey Initiation in order to resolve data type mismatch in schema.

https://gerrit.wikimedia.org/r/777775

Ideally we'd backport it in the 00:00–01:00 PDT window. If noone in your team is available, I've left this instruction to make sure we don't have this problem over the weekend: T305212#7836795

looking at what surveys are configured in production and the coverage levels set, seems a bit strange to me that we would have such a high error rate for surveys that have 0.0 coverage. That would mean that the QuickSurvey Initiation event is firing at a higher rate than expected per the survey configs. Can we tell that these are all QuickSurvey initiation events?

That's a good question and definitely strange and worth more analysis. They are definitely QuickSurveyInitiation events.

If I go to this page:
https://ru.wikipedia.org/wiki/%D0%9B%D0%B8%D1%87%D0%BD%D1%8B%D0%B9_%D1%87%D0%B5%D0%BC%D0%BF%D0%B8%D0%BE%D0%BD%D0%B0%D1%82_%D0%A1%D0%A1%D0%A1%D0%A0_%D0%BF%D0%BE_%D1%88%D0%B0%D1%85%D0%BC%D0%B0%D1%82%D0%BD%D0%BE%D0%B9_%D0%BA%D0%BE%D0%BC%D0%BF%D0%BE%D0%B7%D0%B8%D1%86%D0%B8%D0%B8_1976

I can see that the QuickSurveys modules are present in the page

15:07:25.009 mw.loader.getState('ext.quicksurveys.lib')
15:07:25.025 "ready"
15:07:45.874 mw.loader.getState('ext.quicksurveys.init')
15:07:45.891 "ready"

It does appear to be logging impressions. Why I'm not sure, but it's possible some older code is loading or there is an error.

Note if a survey is enabled, but at coverage 0.0 it will still load the QuickSurveys code. Any inactive surveys, for this reason, as a best practice should always be disabled using the enabled key.

Note if a survey is enabled, but at coverage 0.0 it will still load the QuickSurveys code. Any inactive surveys, for this reason, as a best practice should always be disabled using the enabled key.

That clears up a lot -> thanks! @Jdlrobson

Jdlrobson lowered the priority of this task from Unbreak Now! to Medium.Apr 7 2022, 2:59 PM

No more validation errors now that wmf5 is everywhere :-)

Change 777775 abandoned by Jdlrobson:

[mediawiki/extensions/QuickSurveys@wmf/1.39.0-wmf.5] Convert performanceNow datatype to Integer in QuickSurvey Initiation in order to resolve data type mismatch in schema.

Reason:

wmf5 seems stable.

https://gerrit.wikimedia.org/r/777775

Thanks for the heads up. Looks like the performanceNow property is part of the base object for all surveys, and our survey is, by quantity, the most affected. To my knowledge, despite our team and survey being labelled as "performance", we do not utilise this property of the event object for this survey.