
[wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed
Closed, Resolved, Public

Description

Note: The issue seems to be specific to eswiki on wmf.26, with the "Add links between articles (Machine suggestions)" task type selected.

Steps to reproduce:

  • On eswiki, go to Special:Homepage and select the "Add links between articles (Machine suggestions)" task type (it should be the only task type selected). Click the Done button.
  • Refresh the page - "No suggestions found" will be displayed.

Detailed test case steps:

  1. On eswiki (wmf.26), go to Special:Homepage.
  2. Select the "Add links between articles (Machine suggestions)" task type and do not select any topics. The counter shows 3,040 available articles (the count at https://es.wikipedia.org/wiki/Especial:NewcomerTasksInfo is 3,052). Click the Done button.
  3. The counter in the Suggested Edits (SE) module for the selection in the previous step shows 3,035. Select the Mathematics topic; the counter shows 3. Click the Done button.

Refresh the page - "No suggestions found" will be displayed. Refresh the page again - the counter shows "1 of 1".

Event Timeline

Restricted Application added a subscriber: Aklapper.
Etonkovidova renamed this task from [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed for filters that do have suggested articles to [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed for filters have suggested articles. (Thu, Apr 11, 10:36 PM)
Etonkovidova added a project: Regression.
Etonkovidova updated the task description.
KStoller-WMF subscribed.
Etonkovidova renamed this task from [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed for filters have suggested articles to [wmf.26 - eswiki] Homepage: task counter issues - "No suggestions found" incorrectly displayed. (Fri, Apr 12, 12:32 AM)

Mentioned in SAL (#wikimedia-operations) [2024-04-12T09:58:12Z] <urbanecm> mwmaint1002: mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --search-index (T362367)

The more I look at this issue, the more questions it raises. According to the task description, https://es.wikipedia.org/wiki/Especial:NewcomerTasksInfo claims there are 3,052 link-recommendation tasks in the pool. However, on my screen, I see 1,101:

[image.png: screenshot of Especial:NewcomerTasksInfo showing 1,101 link-recommendation tasks]

I don't see the 3,052 number anywhere on the board. The 1,101 figure is also the same number I saw yesterday (see my Slack message). Why the discrepancy?

According to Grafana, there are ~300 dangling records in the search index (and ~145,000 dangling records in the DB). The first number is believable (regardless of how many actual tasks are in the pool), but the second feels very suspicious. That said, the DB does contain 153,676 records, so the number is at least of the right order of magnitude.
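The two kinds of "dangling records" discussed here can be expressed as set differences. This is a minimal sketch, assuming a dangling DB record is a page with a row in growthexperiments_link_recommendations but no hasrecommendation:link flag in the search index, and a dangling search-index record is the reverse. The function name and sample page IDs are hypothetical, and this is not necessarily how the Grafana metric is computed.

```python
from typing import Set, Tuple


def dangling_records(db_pages: Set[int], index_pages: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Split page IDs into (dangling DB records, dangling search-index records).

    db_pages:    page IDs with a stored link recommendation in the DB
    index_pages: page IDs flagged as having a link recommendation in search
    """
    dangling_db = db_pages - index_pages     # stored in the DB, but not flagged in search
    dangling_index = index_pages - db_pages  # flagged in search, but no stored recommendation
    return dangling_db, dangling_index


if __name__ == "__main__":
    # Hypothetical page IDs, for illustration only
    db_only, index_only = dangling_records({1, 2, 3, 4, 5}, {4, 5, 6})
    print(sorted(db_only))     # [1, 2, 3]
    print(sorted(index_only))  # [6]
```

With ~145,000 of ~153,676 DB rows on the "DB only" side, almost nothing stored in the DB would be visible to the search-based task suggestions.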

This is not the case for cswiki (15k tasks total, 60 dangling records in the DB). So maybe eswiki has far too many dangling records for some reason, and Add Link broke because of that? I ran the fixLinkRecommendationData.php script for eswiki to remove the dangling records in search; let's see what happens.

After that script ran, I executed the listTaskCounts.php script to see how many tasks are available:

[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/listTaskCounts.php --wiki=eswiki --tasktype=link-recommendation
Topic                     link-recommendation 
--------------------------------------------------------------------------------
architecture              0          
art                       0          
performing-arts           0          
tv-and-film               0          
comics-and-anime          0          
sports                    0          
entertainment             0          
literature                0          
fashion                   0          
music                     0          
video-games               0          
women                     0          
biography                 0          
food-and-drink            1          
education                 0          
philosophy-and-religion   0          
military-and-warfare      0          
history                   0          
business-and-economics    0          
politics-and-government   0          
society                   0          
transportation            0          
biology                   0          
general-science           0          
computers-and-internet    0          
physics                   0          
engineering               0          
mathematics               0          
medicine-and-health       0          
chemistry                 0          
technology                0          
earth-and-environment     1          
africa                    0          
central-america           1          
north-america             0          
south-america             0          
asia                      1          
europe                    1          
oceania                   0          
[urbanecm@mwmaint1002 ~]$

Assuming this output is correct, eswiki has almost zero Add Link tasks in the pool. However, these per-topic numbers do not match the total count at https://es.wikipedia.org/wiki/Especial:NewcomerTasksInfo. Why is that? Are some articles not within any topic at all? And why are those tasks not suggested on the Homepage?
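One possible reason the per-topic numbers and the total disagree is that a page can belong to zero, one, or several topics. The toy sketch below uses hypothetical pages and a hypothetical topic_counts() helper (not the actual listTaskCounts.php logic) to show that pages without any topic count toward the total but never appear in a per-topic row, while multi-topic pages are counted more than once.

```python
from collections import Counter
from typing import Dict, Set


def topic_counts(tasks: Dict[str, Set[str]]) -> Counter:
    """Count tasks per topic; a page with several topics is counted once per topic."""
    counts: Counter = Counter()
    for topics in tasks.values():
        counts.update(topics)
    return counts


# Hypothetical task pool: page title -> topics assigned to the page
tasks = {
    "Page_A": {"asia", "europe"},  # counted under two topics
    "Page_B": {"food-and-drink"},
    "Page_C": set(),               # in the pool, but in no topic
    "Page_D": set(),
}

counts = topic_counts(tasks)
without_topic = [title for title, topics in tasks.items() if not topics]
print(len(tasks))            # 4 tasks in total
print(sum(counts.values()))  # 3 per-topic hits
print(without_topic)         # ['Page_C', 'Page_D']
```

So the per-topic rows can neither be summed to get the total nor be expected to match it exactly.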

I checked the log of the task pool refilling script, and I see the following:

Apr 12 10:06:28 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Thema_Televisión... link recommendation already stored
Apr 12 10:06:28 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Betty_en_NY... link recommendation already stored
Apr 12 10:06:29 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Calamardo_Tentáculos... number of good links too small (1)
Apr 12 10:06:29 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate The_Juice_Is_Loose... link recommendation already stored
Apr 12 10:06:30 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Sonya_Mitchell... number of good links too small (1)
Apr 12 10:06:31 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Fantastic_Duo_(programa_de_televisión)... All of the links in the recommendation have been pruned
Apr 12 10:06:31 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Jared_Daperis... All of the links in the recommendation have been pruned
Apr 12 10:06:31 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate The_Adversary_(Westworld)... link recommendation already stored
Apr 12 10:06:32 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Nobuta_wo_Produce... All of the links in the recommendation have been pruned
Apr 12 10:06:34 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Vix_(servicio_de_streaming)... success, updating index
Apr 12 10:06:34 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Treehouse_of_Horror_XVI... link recommendation already stored
Apr 12 10:06:34 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Riverdale_(serie_de_televisión)... link recommendation already stored
Apr 12 10:06:35 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Lucas_Cruikshank... All of the links in the recommendation have been pruned
Apr 12 10:06:35 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Sarah_Snook... number of good links too small (1)
Apr 12 10:06:36 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Azteca_Honduras... All of the links in the recommendation have been pruned
Apr 12 10:06:36 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Ted_(Buffy_the_Vampire_Slayer)... link recommendation already stored
Apr 12 10:06:36 mwmaint1002 mediawiki_job_growthexperiments-refreshLinkRecommendations-s7[7063]: eswiki:      checking candidate Gareth_(The_Walking_Dead)... link recommendation already stored

Many of the checked candidates are excluded for not meeting the requirements (too few good links, all links in the recommendation pruned, ...). Of the remaining candidates, most are not saved; instead, the script reports that they were already stored ("link recommendation already stored"), and only one (Vix_(servicio_de_streaming)) is saved successfully. The "already stored" case is supposed to happen only when the script runs twice within a relatively short period: updating the database is immediate, but updating the search index takes time (a couple of hours, not more). In other words, this message means the script found a dangling database record.
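The decision sequence that the log messages suggest can be sketched roughly as follows. This is a simplified, hypothetical model rather than the actual refreshLinkRecommendations.php code: the check order, the MIN_GOOD_LINKS threshold, and the Store class are assumptions. It illustrates why a dangling DB record is sticky: once a row exists, the candidate is skipped as "already stored" even if the search index never received the flag.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical threshold; the real minimum is a GrowthExperiments setting
MIN_GOOD_LINKS = 2


@dataclass
class Store:
    """Toy stand-in for the DB table and the asynchronous search-index queue."""
    db: Set[str] = field(default_factory=set)               # stored recommendations
    pending_index: List[str] = field(default_factory=list)  # index updates not yet applied


def check_candidate(title: str, links: int, pruned: int, store: Store) -> str:
    """Return the log message for one candidate, mimicking the messages in the log.

    links:  number of links proposed for the article
    pruned: how many of those were removed by filtering
    """
    if title in store.db:
        # A DB row already exists. If the index never got the matching flag,
        # this is a dangling DB record and the page is skipped on every run.
        return "link recommendation already stored"
    good = links - pruned
    if links > 0 and good == 0:
        return "All of the links in the recommendation have been pruned"
    if good < MIN_GOOD_LINKS:
        return f"number of good links too small ({good})"
    store.db.add(title)                # DB write takes effect immediately
    store.pending_index.append(title)  # index update is applied later
    return "success, updating index"
```

For example, with `s = Store(db={"Riverdale_(serie_de_televisión)"})`, that page is reported as already stored, while a fresh page with enough surviving links is written to `s.db` and queued in `s.pending_index`.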

As discussed above, Grafana reports a very high number of dangling DB records. Let's check whether this is a recent issue:

[image.png: Grafana chart of dangling DB records for eswiki over time]

The chart indicates that until January 15, the number of dangling DB records was relatively low. From that date, it started to gradually increase, and now the vast majority of link recommendations that are in the DB are dangling. This indicates the search index might not be respecting our requests to update the index at all.

Let's check how the chart looks for all wikis:

[image.png: Grafana chart of dangling DB records for all wikis]

The blue line (second highest) is eswiki; the line at the top is frwiki. I played with the frwiki Homepage for a while and observed the same symptoms as on eswiki, so it is probably the same issue. The next line is itwiki, which seems to work more or less normally, as far as I can see.

Based on the information above, it currently seems that the issue is "way too many dangling DB records" (which means either that we are not sending updates to Search, or that Search is ignoring them). Looking through the logs, I can see an unusually high number of update requests:

[urbanecm@mwmaint1002 /var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations-s7]$ grep 'success, updating index' syslog.log  | wc -l
3020
[urbanecm@mwmaint1002 /var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations-s7]$ grep 'success, updating index' syslog.log.1  | wc -l
23424
[urbanecm@mwmaint1002 /var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations-s7]$ zgrep 'success, updating index' syslog.log.2.gz  | wc -l
29708
[urbanecm@mwmaint1002 /var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations-s7]$ zgrep 'success, updating index' syslog.log.3.gz  | wc -l
27031
[urbanecm@mwmaint1002 /var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations-s7]$ zgrep 'success, updating index' syslog.log.4.gz  | wc -l
10025
[urbanecm@mwmaint1002 /var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations-s7]$ zgrep 'success, updating index' syslog.log.5.gz  | wc -l
1834
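The per-file counts above could also be gathered in one pass. A minimal sketch, assuming the log directory and file naming shown in the transcript (syslog.log, syslog.log.1, syslog.log.2.gz, ...); the helper names are hypothetical. Unlike the mix of grep and zgrep above, it opens gzip-compressed rotations transparently and returns an empty result if the directory does not exist.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterable

NEEDLE = "success, updating index"


def count_matching_lines(lines: Iterable[str], needle: str = NEEDLE) -> int:
    """Count lines containing needle (equivalent to: grep needle | wc -l)."""
    return sum(1 for line in lines if needle in line)


def rotation_index(path: Path) -> int:
    """syslog.log -> 0, syslog.log.1 -> 1, syslog.log.2.gz -> 2."""
    for part in reversed(path.name.split(".")):
        if part.isdigit():
            return int(part)
    return 0


def count_in_file(path: Path, needle: str = NEEDLE) -> int:
    # Open .gz rotations with gzip (like zgrep) and plain files with open (like grep)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return count_matching_lines(fh, needle)


def counts_by_rotation(log_dir: str, needle: str = NEEDLE) -> Dict[str, int]:
    """Map each syslog.log* file, newest first, to its number of matching lines."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return {}
    files = sorted(directory.glob("syslog.log*"), key=rotation_index)
    return {f.name: count_in_file(f, needle) for f in files}


if __name__ == "__main__":
    log_dir = "/var/log/mediawiki/mediawiki_job_growthexperiments-refreshLinkRecommendations-s7"
    for name, count in counts_by_rotation(log_dir).items():
        print(f"{name}\t{count}")
```

Sorting by the numeric rotation index (rather than by name) keeps syslog.log.10 after syslog.log.9 if more rotations are kept.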

Since the logs show that we have been sending plenty of updates, I think it is more likely that Search is not honouring them than that we are not sending them, but I might be totally wrong about this.

Mentioned in SAL (#wikimedia-operations) [2024-04-12T10:55:54Z] <urbanecm> mwmaint1002: mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=frwiki --search-index (T362367)

To test whether updating the index works when there is no dangling record, I did the following:

wikiadmin2023@10.64.32.36(frwiki)> begin;
Query OK, 0 rows affected (0.001 sec)

wikiadmin2023@10.64.32.36(frwiki)> delete from growthexperiments_link_recommendations where gelr_page=11135858;
Query OK, 1 row affected (0.001 sec)

wikiadmin2023@10.64.32.36(frwiki)> commit;
Query OK, 0 rows affected (0.002 sec)

wikiadmin2023@10.64.32.36(frwiki)>

[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=frwiki --page='Helen_Bowater' --verbose
Refreshing link recommendations...
    checking candidate Helen_Bowater... success, updating index
[urbanecm@mwmaint1002 ~]$

https://fr.wikipedia.org/w/index.php?search=pageid%3A+11135858+hasrecommendation%3Alink&title=Sp%C3%A9cial:Recherche&ns0=1 does not list the page yet, but that is likely because updating the index takes time. I will check later.
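The manual check against Special:Search can be made repeatable by generating the query URL. A minimal sketch that only builds the URL, reusing the pageid: and hasrecommendation:link search keywords from the link above; the helper name is hypothetical, and fetching or parsing the result page is out of scope. It uses the canonical Special:Search title, which MediaWiki also accepts on localized wikis.

```python
from urllib.parse import urlencode


def recommendation_search_url(lang: str, page_id: int) -> str:
    """Build a Special:Search URL checking whether page_id carries the link-recommendation flag."""
    params = {
        # Same query form as the manual check: "pageid: <id> hasrecommendation:link"
        "search": f"pageid: {page_id} hasrecommendation:link",
        "title": "Special:Search",  # canonical name of the search special page
        "ns0": 1,                   # restrict to the main (article) namespace
    }
    return f"https://{lang}.wikipedia.org/w/index.php?{urlencode(params)}"


if __name__ == "__main__":
    # Prints a URL with the same search= value as the manual check above
    print(recommendation_search_url("fr", 11135858))
```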

This seems like it might be Search-related, so I'm adding that tag.
@dcausse, @EBernhardson: do you think this issue could be related to Search, or to T359580: CirrusSearch should not send outdated cirrussearch-request events?

Hmm, indeed it looks like the hourly transfers have been stuck for quite some time. Somehow Airflow thinks two hourly runs are still in progress and never failed them; it is still waiting for them to complete even though nothing is running. It looks like we never set an SLA value on this DAG, so its failures probably don't get properly recognized. I've reset the two tasks that were stuck and will see how I can get these all moving again, along with adding an SLA so it alerts properly.

Thank you so much, @EBernhardson, we appreciate the help!

Thanks for the info @EBernhardson, and for resetting the DAG tasks! Do we need to re-send the updates to CirrusSearch on our end, or are they stored somewhere ("just" waiting to be processed)?

They are stored and are being processed now, at a rate of something like one hour of updates per minute. It should catch up soon enough.

I re-checked the charts, and the number of dangling DB records is decreasing. It is still very high (a measurement to update the charts with today's numbers is running right now), but it is going down. The number of tasks in the pool has also returned to normal levels. I'll keep monitoring this for a while to make sure the number of dangling DB records drops from thousands to something lower. I'm unsure what will happen with the task pool once this all finishes: it might skyrocket to thousands and thousands of recommendations, and I'm not sure how our quality assurance features would cope. For example, we prefer underlinked articles, but if the suggestion logic exhausts the underlinked suggestions, it might continue with the others. I'm not exactly sure how that is implemented, so it is something to watch for, I think.

I also noticed that the number of dangling search index records is going up. This is likely because articles were identified as possible Add Link tasks at some point (possibly over a month ago), then someone edited them (which invalidates the suggestion), and only afterwards did Search process the month-old update, creating a dangling record. This can be fixed fairly easily with the fixLinkRecommendationData.php script, so it is not a big problem.
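The sequence described above can be illustrated with a toy simulation. It assumes, as a simplification, that an edit clears the index flag through the regular (fast) page-update path, while the recommendation's own index update waits in the delayed, backlogged pipeline; the simulate() function and its event names are hypothetical and do not reflect how CirrusSearch or GrowthExperiments are implemented.

```python
from collections import deque
from typing import Deque, Iterable, Optional, Set, Tuple


def simulate(events: Iterable[Tuple[str, Optional[str]]]) -> Tuple[Set[str], Set[str]]:
    """Replay (action, page) events; return (pages in DB, pages flagged in index).

    "recommend": recommendation stored in the DB now, index flag queued
    "edit":      edit invalidates the recommendation (DB row removed) and,
                 by assumption, clears the index flag immediately
    "flush":     the delayed pipeline applies every queued flag, stale or not
    """
    db: Set[str] = set()
    index: Set[str] = set()
    delayed: Deque[Optional[str]] = deque()
    for action, page in events:
        if action == "recommend":
            db.add(page)
            delayed.append(page)
        elif action == "edit":
            db.discard(page)
            index.discard(page)
        elif action == "flush":
            while delayed:
                index.add(delayed.popleft())  # no check that the DB row still exists
        else:
            raise ValueError(f"unknown action: {action}")
    return db, index


if __name__ == "__main__":
    # Before the flush, "B" is a dangling DB record (stored, not yet in the index);
    # after it, "A" is a dangling search-index record (flagged, no DB row).
    db, index = simulate([("recommend", "A"), ("recommend", "B"), ("edit", "A"), ("flush", None)])
    print(sorted(index - db))  # ['A']
```

The same model also shows the earlier symptom: while the pipeline is stuck (no "flush"), every stored recommendation stays a dangling DB record.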

According to Growth's charts (Grafana), the number of dangling DB entries stopped decreasing. For eswiki, it is now at 98k rows.

@EBernhardson Is it possible to say whether all pending updates have already been processed? I am wondering whether this high number of dangling records is caused by something on Search's side or on Growth's side.

It was backfilling over the weekend but got stuck around Feb 6th. It's back to processing the hourly data, and I expect the numbers to keep decreasing for at least 12 more hours of processing based on the current rate, as long as it doesn't get stuck again. Basically, there is a daily cleanup job for old data; because the backfill was working on old data, the intermediate results it had calculated were deleted while it was still running, and it stopped. I've paused the cleanup process for now until the backfill completes.

This looks to be all caught up on our side.

Etonkovidova claimed this task.

Checked on eswiki wmf.2 (and checked the Grafana board): all looks back to normal.