In the past, we repeatedly had problems with the Add Link database and the search index getting out of sync, and with users getting "no suggestions for this page" errors upon arrival. While we think we have fixed all of those problems, we should set up some monitoring so we notice if there is a significant number of such errors.
|Open||Tgr||T283868 Monitor "no suggestion" rate for Add Link tasks|
|Resolved||Tgr||T289550 Add Link: Set up cronjob for collecting statsd metrics about dangling search index entries|
(This shows the number of errors per hour. It would probably be more informative if we also knew the number of task opens per hour.)
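If task-open counts were available, the hourly "no suggestion" rate would be errors divided by opens within each hour. The sketch below shows that calculation. The inputs are hypothetical lists of event timestamps, because the real data would come from whatever event source feeds the dashboard. Hours with errors but no recorded opens are skipped to avoid dividing by zero.

```python
from collections import Counter
from datetime import datetime


def hourly_no_suggestion_rate(opens, errors):
    """Return {hour: errors / opens} for every hour with at least one task open.

    `opens` and `errors` are hypothetical lists of datetime timestamps
    (task opens and "no suggestions" errors respectively).
    """
    def bucket(ts: datetime) -> datetime:
        # Truncate each timestamp to the start of its hour.
        return ts.replace(minute=0, second=0, microsecond=0)

    open_counts = Counter(bucket(t) for t in opens)
    error_counts = Counter(bucket(t) for t in errors)
    # Only hours with opens get a rate; a missing hour in error_counts counts as 0.
    return {
        hour: error_counts[hour] / count
        for hour, count in sorted(open_counts.items())
    }


opens = [datetime(2021, 9, 1, 10, m) for m in (5, 20, 40, 50)] + [
    datetime(2021, 9, 1, 11, 0),
    datetime(2021, 9, 1, 11, 30),
]
errors = [datetime(2021, 9, 1, 10, 30)]
print(hourly_no_suggestion_rate(opens, errors))
```

A rate is more useful than a raw error count here because it stays comparable across hours with very different traffic.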
I think we should also chart the number of tasks that are present in the search index but not in the database. These can be counted with fixLinkRecommendationData.php --verbose, and we could give it a --statsd option, as listTaskCounts.php has. Maybe we should also chart the opposite (tasks present in the database but not in the search index), although that is less immediately problematic.
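Both dangling-record counts are set differences between the page IDs in the search index and those in the database. The sketch below computes both directions and formats them as StatsD gauge lines (`<metric>:<value>|g`). The metric prefix and the way the ID lists are obtained are assumptions for illustration. This is not how fixLinkRecommendationData.php actually reports its counts.

```python
def dangling_counts(index_ids, db_ids):
    """Count records present on only one side (inputs are hypothetical ID lists)."""
    index_ids, db_ids = set(index_ids), set(db_ids)
    return {
        # In the search index but not in the DB: users get "no suggestions".
        "search_index_only": len(index_ids - db_ids),
        # In the DB but not in the search index: less immediately problematic.
        "db_only": len(db_ids - index_ids),
    }


def statsd_gauge_lines(wiki, counts, prefix="growthexperiments.dangling"):
    """Format counts as StatsD gauges; `prefix` is a hypothetical metric name."""
    return [f"{prefix}.{wiki}.{kind}:{value}|g" for kind, value in sorted(counts.items())]


counts = dangling_counts(index_ids=[1, 2, 3, 4], db_ids=[3, 4, 5])
for line in statsd_gauge_lines("enwiki", counts):
    print(line)
```

Gauges fit better than counters here because each run reports the current total of dangling records, not an increment.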
Mentioned in SAL (#wikimedia-operations) [2021-09-01T23:24:42Z] <urbanecm@deploy1002> Synchronized php-1.37.0-wmf.20/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 3c7d4ecc699b7c68467a372686f5514375d2b74f: fixLinkRecommendationData: Allow --db-table in dry-run mode (T283868) (duration: 01m 06s)
Tested in production. It seems to work well and takes about 2 minutes to run on all wikis. Dashboards: dangling search index records, dangling DB records (they only contain data from the manual runs, so there is not much to see until the puppet patch is merged). Some wikis have a significant but not large number (hundreds) of both types of dangling records.