Fix occasional eventgate-analytics-external.error.validation errors from WikiLambda code
Closed, Resolved, Public

Description

We're only getting a handful of these a week, but see e.g. https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-default-1-7.0.0-1-2025.02.01?id=3JsiwZQB60NnMROH5rAc, which complains that `'.zobjecttype' should be string`; presumably the value is being set to null/undefined.

Event Timeline

Jdforrester-WMF changed the task status from Open to In Progress.
Jdforrester-WMF triaged this task as Low priority.

Change #1111668 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/extensions/WikiLambda@master] metrics: Don't fall-back to null for ZIDs, that breaks validation

https://gerrit.wikimedia.org/r/1111668

Thanks, @Jdforrester-WMF! We have code that removes null and undefined values from these events, so I don't think we need the #1111668 patch above. In the Logstash event that you linked, under raw_event, I see this:

"zobjecttype":{"Z1K1":"Z7","Z7K1":""},

which I believe accounts for the validation error (even though it complains about ".zobjecttype", with the extra '.' at the beginning). That raises 2 questions:

  1. Does it make sense for the value of zobjecttype to sometimes be a Z7-based type (in which case this should be an easy fix)?
  2. Does the presence of this (incomplete) value suggest some type of bug in our UI code?

Note: looking further back in Logstash, I can also see similar events where the Z7-based type is completely specified.

I will investigate further.
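
To illustrate the distinction drawn above, here is a minimal TypeScript sketch of why a null or undefined zobjecttype would not reach the validator while the object payload quoted from raw_event would fail it. The helper names (`removeNullish`, `zobjecttypeIsValid`) are hypothetical, and treating zobjecttype as an optional string field is an assumption; this is not the actual WikiLambda or EventGate code.

```typescript
// Hypothetical event shape; only the `zobjecttype` field name comes from this task.
type EventData = Record<string, unknown>;

// Drop keys whose value is null or undefined, so they are never sent for validation.
function removeNullish(data: EventData): EventData {
  const cleaned: EventData = {};
  for (const [key, value] of Object.entries(data)) {
    if (value !== null && value !== undefined) {
      cleaned[key] = value;
    }
  }
  return cleaned;
}

// Stand-in for the schema rule "zobjecttype should be string".
// Assumption: the field is optional, so an absent key passes.
function zobjecttypeIsValid(data: EventData): boolean {
  return !("zobjecttype" in data) || typeof data["zobjecttype"] === "string";
}

// A null value is stripped before validation, so it cannot trigger the error.
const nullCase = removeNullish({ zobjecttype: null });

// An object value (as seen in raw_event) survives the cleanup and fails the string check.
const objectCase = removeNullish({ zobjecttype: { Z1K1: "Z7", Z7K1: "" } });

console.log(zobjecttypeIsValid(nullCase), zobjecttypeIsValid(objectCase));
```

Under these assumptions, only the second case is rejected, which is consistent with the errors coming from object-valued types rather than from null/undefined.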

Change #1121485 had a related patch set uploaded (by David Martin; author: David Martin):

[mediawiki/extensions/WikiLambda@master] Event logging: ensure zobjecttype, if known, is a string

https://gerrit.wikimedia.org/r/1121485

I looked through a number of logs with this error, and they all involve zobjecttype being a ZObject representing a function call. I also checked the table queries on our metrics dashboard for uses of zobjecttype, and I believe it's fine to just stringify these values. I propose using https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikiLambda/+/1121485
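
As a rough illustration of the "just stringify" idea, the TypeScript sketch below normalizes a zobjecttype value before it is logged. The function name `normalizeZObjectType` and the use of `JSON.stringify` are assumptions made for this example; the patch linked above may serialize function-call types differently.

```typescript
// Hypothetical helper; the name and the JSON.stringify choice are assumptions,
// not taken from the linked patch.
function normalizeZObjectType(value: unknown): string | undefined {
  // Type not known: return undefined so the field can be omitted from the event.
  if (value === null || value === undefined) {
    return undefined;
  }
  // Already a plain string such as the ZID "Z6": pass it through unchanged.
  if (typeof value === "string") {
    return value;
  }
  // Anything else (e.g. a ZObject representing a function call) is serialized.
  return JSON.stringify(value);
}

// The incomplete function-call type seen in raw_event becomes a string.
const stringified = normalizeZObjectType({ Z1K1: "Z7", Z7K1: "" });
console.log(stringified);
```

With this approach the event always carries either a string or no zobjecttype at all, which is what the validation error asks for.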

Change #1111668 abandoned by Jforrester:

[mediawiki/extensions/WikiLambda@master] metrics: Don't fall-back to null for ZIDs, that breaks validation

Reason:

Going with I5f41560143587239d7306c7a1a6b0976a82622e3 instead.

https://gerrit.wikimedia.org/r/1111668

Change #1121485 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Event logging: ensure zobjecttype, if known, is a string

https://gerrit.wikimedia.org/r/1121485

Hi David - I decided to wait a week and then check Logstash to confirm that the (incorrect) log messages have stopped. They were not that frequent, so a week seems like a reasonable wait to be confident. I'm planning to check on that today or tomorrow.

Okay, good results. I see none of these errors for the last 7 days. I was able to use a Logstash query that James kindly provided. (I also widened the time frame to confirm that the query was working properly.)