Monitor "no suggestion" rate for Add Link tasks
Closed, Resolved (Public)

Description

In the past we had repeated problems with the Add Link database and search index getting out of sync and users getting "no suggestions for this page" errors upon arrival. While we think we fixed all those problems, we should set up some monitoring so we can notice if there are a significant number of such errors.

Event Timeline

Tgr triaged this task as High priority.
Tgr moved this task from Incoming to In Progress on the Growth-Team (Current Sprint) board.

Change 697680 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Add Link: Send "no suggestions found" events to statsd

https://gerrit.wikimedia.org/r/697680

kostajh added a subscriber: kostajh.

Oops, sorry, this one still needs to be merged. @Tgr the patch needs a rebase.

Change 697680 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Add Link: Send "no suggestions found" events to statsd

https://gerrit.wikimedia.org/r/697680

Thanks for rebasing! I'm putting it back to In Progress for adding to the dashboard.

This still needs an addition to the Grafana dashboard, so moving back to In Progress.

Dashboard: https://grafana.wikimedia.org/d/vGq7hbnMz/special-homepage-and-suggested-edits?viewPanel=35&orgId=1
(Shows the number of errors per hour. I guess it would be more informative if we knew the number of task opens per hour.)
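To illustrate why the number of task opens matters: the raw error count only becomes a rate once it is divided by the number of opens in the same hour. A minimal sketch, assuming both counts are available as plain hourly integers (the function and parameter names are hypothetical, not taken from GrowthExperiments or the dashboard):

```python
from typing import Optional


def no_suggestion_rate(errors_per_hour: int, task_opens_per_hour: int) -> Optional[float]:
    """Fraction of Add Link task opens that hit a "no suggestions" error.

    Both arguments are hypothetical hourly counts. Returns None when there
    were no opens in the hour, because the rate is undefined in that case.
    """
    if task_opens_per_hour <= 0:
        return None
    return errors_per_hour / task_opens_per_hour
```

With a rate, 5 errors out of 200 opens (2.5%) can be told apart from 5 errors out of 10 opens (50%), which the per-hour error count alone cannot show.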

I'm thinking we should also chart the number of tasks which are present in the search index but not in the database (these can be counted with fixLinkRecommendationData.php --verbose; we could give it a --statsd option, as with listTaskCounts.php). Maybe also the opposite (tasks present in the database but not in the search index), although that's less immediately problematic.
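The two counts described here are just set differences between the page IDs known to the search index and those in the database. A rough sketch of the idea in Python, not the actual PHP maintenance script: the function names, the metric prefix and the metric names are all hypothetical, and only the `name:value|g` gauge line format is standard StatsD.

```python
from typing import Iterable, List, Tuple


def count_dangling(search_index_ids: Iterable[int], db_ids: Iterable[int]) -> Tuple[int, int]:
    """Return (dangling search index records, dangling DB records).

    A search index record is dangling if its page ID has no row in the
    database; a DB record is dangling if its page ID is not in the index.
    """
    index_set = set(search_index_ids)
    db_set = set(db_ids)
    return len(index_set - db_set), len(db_set - index_set)


def statsd_gauges(wiki: str, dangling_index: int, dangling_db: int,
                  prefix: str = "hypothetical.addlink") -> List[str]:
    """Format both counts as StatsD gauge lines (metric names are made up)."""
    return [
        f"{prefix}.{wiki}.dangling_search_index:{dangling_index}|g",
        f"{prefix}.{wiki}.dangling_db:{dangling_db}|g",
    ]
```

For example, `count_dangling([1, 2, 3, 4], [3, 4, 5])` gives `(2, 1)`: pages 1 and 2 are only in the search index (the case that produces "no suggestions" errors), and page 5 is only in the database.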

Change 702751 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData.php: add statsd option

https://gerrit.wikimedia.org/r/702751

Change 702751 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData.php: add statsd option

https://gerrit.wikimedia.org/r/702751

Moving back to In Progress for adding the cronjob.

Change 712924 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] AddLink: Use statsd monitoring for errors on server side

https://gerrit.wikimedia.org/r/712924

Change 713019 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData: allow random sampling

https://gerrit.wikimedia.org/r/713019

There are some more patches to review now.

Change 713019 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData: allow random sampling

https://gerrit.wikimedia.org/r/713019

Change 714449 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData: Allow --db-table in dry-run mode

https://gerrit.wikimedia.org/r/714449

Change 714449 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] fixLinkRecommendationData: Allow --db-table in dry-run mode

https://gerrit.wikimedia.org/r/714449

Change 715824 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.20] fixLinkRecommendationData: Allow --db-table in dry-run mode

https://gerrit.wikimedia.org/r/715824

Change 715824 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.37.0-wmf.20] fixLinkRecommendationData: Allow --db-table in dry-run mode

https://gerrit.wikimedia.org/r/715824

Mentioned in SAL (#wikimedia-operations) [2021-09-01T23:24:42Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 3c7d4ecc699b7c68467a372686f5514375d2b74f: fixLinkRecommendationData: Allow --db-table in dry-run mode (T283868) (duration: 01m 06s)

Change 716755 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/puppet@production] Run GrowthExperiments fixLinkRecommendationData --dry-run every day

https://gerrit.wikimedia.org/r/716755

Tested in production; it seems to work well and takes about 2 minutes to run on all wikis. Dashboards: dangling search index records, dangling DB records (they only contain data from the manual runs, so there is not much to see until the puppet patch is merged). We have a noticeable but not large number (hundreds) of both types of dangling records on some wikis.

Change 716755 merged by Jbond:

[operations/puppet@production] Run GrowthExperiments fixLinkRecommendationData --dry-run every day

https://gerrit.wikimedia.org/r/716755

Change 712924 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] AddLink: Use statsd monitoring for errors on server side

https://gerrit.wikimedia.org/r/712924

All patches merged, moving back to Ready for Dev for further dashboard work.

... which, as a note to self / others, we should do after MW-1.37-notes (1.37.0-wmf.23; 2021-09-13) is in production to include the logging from https://gerrit.wikimedia.org/r/712924

kostajh lowered the priority of this task from High to Medium. (Sep 9 2021, 1:42 PM)

It would be nice to figure out how to show a graph instead of a single data point for the daily cronjob based data, but no need to keep this task for that.

Etonkovidova added a subscriber: Etonkovidova.

Checked in wmf.2: both the Dangling search index records and Dangling DB records charts are present on the Growth Team/Special:Homepage and Suggested Edits dashboard. No spikes after the wmf.2 deployment.