NewcomerTask EventLogging schema has invalid array items type specification
Closed, Resolved · Public

Description

The schema currently has:

"maintenance_templates": {
    "type": "array",
    "items": [
        {
            "type": "string"
        }
    ],

But it should be:

"maintenance_templates": {
    "type": "array",
    "items": {
        "type": "string"
    }

This is causing the data to fail the Hive ingestion (Refine) step:

Failure(org.wikimedia.analytics.refinery.job.refine.RefineTargetException: Failed refinement of hdfs://analytics-hadoop/wmf/data/raw/eventlogging/eventlogging_NewcomerTask/hourly/2020/06/16/15 -> `event`.`NewcomerTask` (year=2020,month=6,day=16,hour=15). Original exception: java.lang.IllegalArgumentException: `maintenance_templates` array schema must specify the items type field)

Event Timeline

Ottomata created this task. Jun 16 2020, 6:52 PM
Restricted Application added a project: Analytics. Jun 16 2020, 6:52 PM
Restricted Application added a subscriber: Aklapper.
Tgr added a comment. Edited Jun 16 2020, 8:11 PM

This is actually valid syntax, although incorrect for the specific schema as they mean different things.
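
To illustrate the difference: in the JSON Schema drafts where `items` may be an array (the "tuple" form), each listed schema constrains only the element at the same position, while a single-object `items` constrains every element. The following is a minimal sketch, not the EventLogging validator; `type_ok` and `array_ok` are hypothetical helpers that handle only `string` and `integer` and ignore `additionalItems`.

```python
def type_ok(value, type_name):
    """Check a value against a tiny subset of JSON Schema primitive types."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"type not covered by this sketch: {type_name}")


def array_ok(values, schema):
    """Validate a list against the `items` keyword only (hypothetical helper)."""
    items = schema.get("items", {})
    if isinstance(items, dict):
        # Object form: the one schema applies to every element.
        if "type" not in items:
            return True
        return all(type_ok(v, items["type"]) for v in values)
    # Array ("tuple") form: the i-th schema applies only to the i-th element;
    # elements past the end of the list are left unconstrained here.
    return all(type_ok(v, s["type"]) for v, s in zip(values, items))


tuple_form = {"type": "array", "items": [{"type": "string"}]}
object_form = {"type": "array", "items": {"type": "string"}}

sample = ["Template:Cleanup", 42]
print(array_ok(sample, tuple_form))   # True: only the first element is checked
print(array_ok(sample, object_form))  # False: every element must be a string
```

So the tuple form in the original schema only said "the first template is a string", which is why it validated but did not describe the intended data.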

Change 606008 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Fix NewcomerTask schema

https://gerrit.wikimedia.org/r/606008

Change 606121 had a related patch set uploaded (by Kosta Harlan; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.37] Fix NewcomerTask schema

https://gerrit.wikimedia.org/r/606121

Change 606122 had a related patch set uploaded (by Kosta Harlan; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.36] Fix NewcomerTask schema

https://gerrit.wikimedia.org/r/606122

Change 606008 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Fix NewcomerTask schema

https://gerrit.wikimedia.org/r/606008

Change 606122 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.36] Fix NewcomerTask schema

https://gerrit.wikimedia.org/r/606122

Change 606121 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@wmf/1.35.0-wmf.37] Fix NewcomerTask schema

https://gerrit.wikimedia.org/r/606121

Mentioned in SAL (#wikimedia-operations) [2020-06-17T11:18:32Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.36/extensions/GrowthExperiments/extension.json: [[gerrit:606121|Fix NewcomerTask schema (T255597)]] (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2020-06-17T11:23:04Z] <ladsgroup@deploy1001> Synchronized php-1.35.0-wmf.37/extensions/GrowthExperiments/extension.json: [[gerrit:606122|Fix NewcomerTask schema (T255597)]] (duration: 01m 04s)

New events are being refined correctly with this schema; the hour that failed has been scheduled to re-run.

MNeisler moved this task from Triage to Tracking on the Product-Analytics board. Jun 17 2020, 3:21 PM

> This is actually valid syntax, although incorrect for the specific schema as they mean different things.

Ya valid JSONSchema but not valid for what we allow. All array items types must be specific.

Thanks BTW :)

fdans edited projects, added Analytics-Radar; removed Analytics. Jun 18 2020, 4:01 PM
fdans added a subscriber: fdans.

@Tgr can you confirm the correct data is there?

> Ya valid JSONSchema but not valid for what we allow. All array items types must be specific.

The items tuple notation could still achieve that together with additionalItems: false, so that documentation should probably be more specific about what is accepted.
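
As a sketch of that point, and assuming draft-4-style semantics for `items` and `additionalItems`, the helper below reports whether a schema fixes the type of every element an array may contain. `pins_element_types` is a hypothetical name, not part of any Wikimedia tooling, and it only understands a boolean `additionalItems`.

```python
def pins_element_types(schema):
    """Return True if every permitted element has a declared type.

    Hypothetical helper; handles only boolean `additionalItems`.
    """
    items = schema.get("items")
    if isinstance(items, dict):
        # Object form: one schema for all elements.
        return "type" in items
    if isinstance(items, list):
        # Tuple form: positions beyond the list are untyped unless forbidden.
        all_typed = all("type" in s for s in items)
        return all_typed and schema.get("additionalItems", True) is False
    return False  # no `items` at all: elements are unconstrained


open_tuple = {"type": "array", "items": [{"type": "string"}]}
closed_tuple = {"type": "array", "items": [{"type": "string"}],
                "additionalItems": False}
object_form = {"type": "array", "items": {"type": "string"}}

for name, s in [("open tuple", open_tuple),
                ("closed tuple", closed_tuple),
                ("object form", object_form)]:
    print(name, pins_element_types(s))
```

Note that the closed tuple above would admit at most one element, so it pins the types but is not a drop-in replacement for a list of arbitrary length.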

> @Tgr can you confirm the correct data is there?

@nettrom_WMF or @Etonkovidova would you be able to help out with that?

> that documentation should probably be more specific about what is accepted.

Ok, updated https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#arrays, does that sound better?

Etonkovidova added a comment. Edited Jun 22 2020, 4:37 PM

@Tgr There are no validation errors in wmf.37 - I checked it on testwiki, but the error is still present in betalabs.

EventLogging Validation: [NewcomerTask] Value null is the wrong type for property "maintenance_templates" (array expected)
EventLogging Validation: [NewcomerTask] Value null is the wrong type for property "revision_id" (integer expected)
EventLogging Validation: [NewcomerTask] Value null is the wrong type for property "page_id" (integer expected)

Should it be corrected in betalabs?

Tgr added a comment. Jun 22 2020, 5:01 PM

That only happens because the page doesn't exist, which won't happen in production.

Etonkovidova closed this task as Resolved. Jun 22 2020, 6:01 PM

> That only happens because the page doesn't exist, which won't happen in production.

Right! Thanks.

FYI, this was not a JSONSchema validation error, it was a Hive ingestion error. The JSONSchema did not fully specify the strict types of all elements of the array, so Hive could not declare a SQL type for the field. I think Fdans was asking to make sure that the data that Dan backfilled in Hive is in place correctly. If you aren't worried about that data because this instrumentation isn't fully in production yet, then no need to check. :)
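
The failure mode can be pictured with a small sketch of a schema-to-SQL-type mapping. This is not the Refine job's code; `hive_type` and its type table are assumptions made for illustration. The point is only that an array column needs exactly one element type, which the object form of `items` provides and the tuple form does not.

```python
# Assumed mapping from JSON Schema primitive types to Hive column types.
PRIMITIVES = {
    "string": "STRING",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}


def hive_type(field, schema):
    """Derive a Hive type string for a field schema (hypothetical helper)."""
    kind = schema.get("type")
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "array":
        items = schema.get("items")
        # ARRAY<T> needs a single T, so `items` must be one typed schema.
        if not isinstance(items, dict) or "type" not in items:
            raise ValueError(f"`{field}` array schema must specify the items type")
        return f"ARRAY<{hive_type(field, items)}>"
    raise ValueError(f"`{field}`: type {kind!r} is not covered by this sketch")


fixed = {"type": "array", "items": {"type": "string"}}
broken = {"type": "array", "items": [{"type": "string"}]}

print(hive_type("maintenance_templates", fixed))  # ARRAY<STRING>
try:
    hive_type("maintenance_templates", broken)
except ValueError as err:
    print("rejected:", err)
```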