
Investigate failed Cirrus index build services on mwmaint2002
Closed, ResolvedPublic

Description

The following units are failed:

● mediawiki_job_cirrus_build_completion_indices_codfw.service loaded failed failed MediaWiki periodic job ci
● mediawiki_job_cirrus_build_completion_indices_eqiad.service loaded failed failed MediaWiki periodic job ci
● mediawiki_job_growthexperiments-fixLinkRecommendationData-dryrun.service loaded failed failed MediaWiki periodic job gr
● mediawiki_job_growthexperiments-userImpactUpdateRecentlyEdited.service loaded failed failed MediaWiki periodic job gr

AC:

  • investigate why the jobs failed
    • OOM issues on the mwmaint machine; these appear to be isolated to October 25
  • clean up any stale indices
    • dewiki was cleaned up

Details

Other Assignee
bking

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2024-10-28T08:42:10Z] <dcausse> T378227: deleting broken cirrus titlesugest index dewiki_titlesuggest_1729824440

Failures appear to be caused by OOM errors like:

Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: zhwiki mmap() failed: [12] Cannot allocate memory
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: zhwiki
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: zhwiki mmap() failed: [12] Cannot allocate memory
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: zhwiki Fatal error: Out of memory (allocated 233308160) (tried to allocate 10502144 bytes) in /srv/mediawiki/php-1.43.0-wmf.28/vendor/ruflin/elastica/src/Transport/Http.php on line 162
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: Out of memory!
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: warwiki
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: zh_min_nanwiki
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: warwiki mmap() failed: [12] Cannot allocate memory
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: zh_min_nanwikiwarwiki mmap() failed: [12] Cannot allocate memory
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]:  
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: warwikizh_min_nanwiki mmap() failed: [12] Cannot allocate memory
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]:  
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: zh_min_nanwiki mmap() failed: [12] Cannot allocate memory
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: zh_min_nanwiki Fatal error: Out of memory (allocated 143130624) (tried to allocate 737280 bytes) in /srv/mediawiki/php-1.43.0-wmf.28/vendor/ruflin/elastica/src/Transport/Http.php on line 162
Oct 25 06:53:11 mwmaint2002 mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: warwiki Fatal error: Out of memory (allocated 122159104) (tried to allocate 573440 bytes) in /srv/mediawiki/php-1.43.0-wmf.28/vendor/ruflin/elastica/src/Transport/Http.php on line 162

All of these happened on October 25; subsequent runs on the 26th, 27th and 28th did not seem to have similar issues.
Some cleanups might still be needed.
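
To see at a glance which wikis hit a fatal OOM and how much memory PHP had allocated at that point, the journal lines can be summarized with a small script. This is only a sketch: the regular expression is derived from the three fatal lines in the excerpt above, the names `FATAL_OOM`, `summarize_oom`, `PREFIX` and `SAMPLE` are made up for illustration, and it assumes each fatal line is not interleaved with output from another wiki (true for the excerpt, where only the `mmap()` lines are interleaved).

```python
import re

# Pattern for PHP's fatal out-of-memory line as it appears in the journal excerpt.
# The wiki name is the token printed right after the "unit[pid]: " prefix.
FATAL_OOM = re.compile(
    r"\]: (?P<wiki>\S+) Fatal error: Out of memory "
    r"\(allocated (?P<allocated>\d+)\) "
    r"\(tried to allocate (?P<requested>\d+) bytes\)"
)


def summarize_oom(lines):
    """Map each wiki to (allocated_bytes, requested_bytes) from its fatal OOM line."""
    summary = {}
    for line in lines:
        match = FATAL_OOM.search(line)
        if match:
            summary[match.group("wiki")] = (
                int(match.group("allocated")),
                int(match.group("requested")),
            )
    return summary


# Sample input copied from the excerpt (journal prefix factored out for brevity).
PREFIX = (
    "Oct 25 06:53:11 mwmaint2002 "
    "mediawiki_job_cirrus_build_completion_indices_eqiad[14668]: "
)
PHP_FILE = (
    "/srv/mediawiki/php-1.43.0-wmf.28/vendor/ruflin/elastica/"
    "src/Transport/Http.php on line 162"
)
SAMPLE = [
    PREFIX + "zhwiki mmap() failed: [12] Cannot allocate memory",
    PREFIX + "zhwiki Fatal error: Out of memory (allocated 233308160) "
    "(tried to allocate 10502144 bytes) in " + PHP_FILE,
    PREFIX + "zh_min_nanwikiwarwiki mmap() failed: [12] Cannot allocate memory",
    PREFIX + "zh_min_nanwiki Fatal error: Out of memory (allocated 143130624) "
    "(tried to allocate 737280 bytes) in " + PHP_FILE,
    PREFIX + "warwiki Fatal error: Out of memory (allocated 122159104) "
    "(tried to allocate 573440 bytes) in " + PHP_FILE,
]

if __name__ == "__main__":
    for wiki, (allocated, requested) in summarize_oom(SAMPLE).items():
        # Report sizes in MiB to make the numbers easier to compare.
        print(f"{wiki}: allocated {allocated / 2**20:.1f} MiB, "
              f"requested {requested / 2**20:.2f} MiB")
```

In practice the input would be the unit's journal output for October 25 instead of `SAMPLE`. The allocations at failure time are modest (all under 256 MiB), which fits the failing `mmap()` calls pointing at memory pressure on the host.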

Mentioned in SAL (#wikimedia-operations) [2024-10-28T10:36:24Z] <dcausse> T378227: rebuilding dewiki_titlesuggest

Gehel triaged this task as Medium priority. Oct 28 2024, 2:54 PM
Gehel edited projects, added Discovery-Search (Current work); removed Discovery-Search.
bking updated Other Assignee, added: bking.
dcausse renamed this task from Investigate failed Cirrus index build services on mwmaint2002 (WIP) to Investigate failed Cirrus index build services on mwmaint2002. Oct 29 2024, 1:24 PM
dcausse updated the task description.