
Make Cirrus Search dump script more resilient to failures (elasticsearch restarts)
Closed, Resolved · Public · 5 Estimated Story Points

Description

Problem: The Cirrus Search dump script is not resilient enough to failures (such as Elasticsearch restarts), so Cirrus dumps for some wikis have been missing for some weeks.

The solution probably involves no longer using the Elasticsearch "scroll" API.

AC:

  • Make the dump script more resilient, so that Cirrus dumps no longer fail when Elasticsearch is restarted

Original log below; additional logs are in the comments.

For this week's run:

<13>Oct  7 08:09:36 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20201005/eswiki-20201005-cirrussearch-general.json.gz
<13>Oct  7 08:09:37 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20201005/eswikibooks-20201005-cirrussearch-content.json.gz
<13>Oct  7 08:09:37 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20201005/eswikibooks-20201005-cirrussearch-general.json.gz

Event Timeline


Today's report:

<13>May  4 00:08:29 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210503/commonswiki-20210503-cirrussearch-file.json.gz
MPhamWMF renamed this task from Cirrus Search dumps failed for some wikis to Make Cirrus Search dump script more resilient to failures (elasticsearch restarts). May 6 2021, 2:30 PM
MPhamWMF updated the task description.

It's been a while since I recorded these, but here is today's report:

<13>Jun 29 02:43:09 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210628/commonswiki-20210628-cirrussearch-file.json.gz

Today's report:

<13>Jul 13 22:18:27 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210712/enwiki-20210712-cirrussearch-general.json.gz

Today's report:

<13>Jul 29 13:55:42 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210726/viwiki-20210726-cirrussearch-general.json.gz

Today's report:

<13>Aug  4 08:24:47 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210802/fiwiki-20210802-cirrussearch-content.json.gz

Today's report:

<13>Sep 13 16:26:42 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arwiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:26:53 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arwiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:27:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arwikisource-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:27:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arwikiversity-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:27:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arwikiversity-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:27:48 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arwiktionary-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:27:49 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arywiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:27:49 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arywiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:16 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arzwiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:16 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/arzwiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:17 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/astwiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:17 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/astwiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:17 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/astwikibooks-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:17 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/astwikibooks-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:18 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/astwikiquote-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:18 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/astwikiquote-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:18 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/astwiktionary-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:19 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/astwiktionary-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:19 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aswiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:19 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aswiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:20 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aswikibooks-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:20 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aswikibooks-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:20 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aswikisource-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:20 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aswikisource-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:21 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aswiktionary-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:21 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aswiktionary-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:22 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/atjwiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:22 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/atjwiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/avkwiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/avkwiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/avwiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/avwiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:24 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/avwiktionary-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:24 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/avwiktionary-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:24 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/awawiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:25 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/awawiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:25 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aywiki-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:25 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aywiki-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:25 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aywikibooks-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:26 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aywikibooks-20210913-cirrussearch-general.json.gz
<13>Sep 13 16:28:26 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aywiktionary-20210913-cirrussearch-content.json.gz
<13>Sep 13 16:28:27 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210913/aywiktionary-20210913-cirrussearch-general.json.gz

Today's report:

<13>Sep 23 21:05:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210920/wikidatawiki-20210920-cirrussearch-general.json.gz
<13>Sep 23 22:06:52 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20210920/zhwiki-20210920-cirrussearch-general.json.gz

Today's report:

<13>Oct 13 14:50:02 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211011/itwiki-20211011-cirrussearch-content.json.gz
<13>Oct 13 14:50:36 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211011/itwiki-20211011-cirrussearch-general.json.gz

From late last week:

<13>Nov  4 08:32:03 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211101/svwiki-20211101-cirrussearch-content.json.gz
<13>Nov  4 17:46:29 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211101/wikidatawiki-20211101-cirrussearch-content.json.gz

From late last week:

<13>Nov 22 16:40:58 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211122/arwiki-20211122-cirrussearch-content.json.gz
<13>Nov 23 03:07:58 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211122/commonswiki-20211122-cirrussearch-file.json.gz
<13>Nov 23 04:10:54 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211122/dewiki-20211122-cirrussearch-general.json.gz
<13>Nov 23 04:42:58 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211122/elwiki-20211122-cirrussearch-content.json.gz

Today's report:

<13>Dec 14 01:57:50 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211213/commonswiki-20211213-cirrussearch-file.json.gz
<13>Dec 14 02:42:39 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211213/dewiki-20211213-cirrussearch-content.json.gz
<13>Dec 14 03:14:27 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211213/dewiki-20211213-cirrussearch-general.json.gz

Change 747879 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/puppet@production] dumps: Move cirrus dumps to friday

https://gerrit.wikimedia.org/r/747879

Change 747879 abandoned by Ebernhardson:

[operations/puppet@production] dumps: Move cirrus dumps to friday

Reason:

moving to friday isn't viable

https://gerrit.wikimedia.org/r/747879

From last week's run:

<13>Dec 29 10:33:08 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20211227/frwiki-20211227-cirrussearch-general.json.gz

Today's report:

<13>Feb  7 23:32:39 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/commonswiki-20220207-cirrussearch-content.json.gz
<13>Feb  7 23:40:38 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/commonswiki-20220207-cirrussearch-general.json.gz
<13>Feb  8 00:02:42 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/commonswiki-20220207-cirrussearch-file.json.gz
<13>Feb  8 00:10:52 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/cswiki-20220207-cirrussearch-content.json.gz
<13>Feb  9 15:57:17 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/ruwiktionary-20220207-cirrussearch-content.json.gz
<13>Feb  9 19:26:37 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/thwiki-20220207-cirrussearch-content.json.gz
<13>Feb  9 19:26:37 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/thwiki-20220207-cirrussearch-general.json.gz
<13>Feb  9 19:26:37 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/thwikibooks-20220207-cirrussearch-content.json.gz
<13>Feb  9 19:26:38 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220207/thwikibooks-20220207-cirrussearch-general.json.gz

Today's report:

<13>Feb 15 19:09:41 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220214/enwiki-20220214-cirrussearch-content.json.gz

Today's report:

<13>Feb 23 14:13:01 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220221/idwiki-20220221-cirrussearch-general.json.gz

Today's report:

<13>Mar  1 09:32:24 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220228/commonswiki-20220228-cirrussearch-file.json.gz
<13>Mar  1 11:21:16 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220228/dewikibooks-20220228-cirrussearch-content.json.gz
<13>Mar  1 22:37:27 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220228/enwikisource-20220228-cirrussearch-content.json.gz

Today's report:

<13>Mar 14 21:33:11 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220314/cebwiki-20220314-cirrussearch-content.json.gz
<13>Mar 14 21:33:13 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220314/cebwiki-20220314-cirrussearch-general.json.gz

Today's report:

<13>Mar 21 23:40:42 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220321/commonswiki-20220321-cirrussearch-content.json.gz
<13>Mar 22 11:41:07 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220321/commonswiki-20220321-cirrussearch-file.json.gz

Today's report:

<13>Mar 31 17:56:39 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220328/wikidatawiki-20220328-cirrussearch-content.json.gz

Today's report:

<13>Apr 19 19:01:06 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220418/fawiki-20220418-cirrussearch-general.json.gz
<13>Apr 19 21:12:26 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220418/frwiki-20220418-cirrussearch-general.json.gz
<13>Apr 19 21:47:35 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220418/frwikisource-20220418-cirrussearch-content.json.gz

It looks like those dumps are very rarely accessed externally (1 hit over the last 30 days according to turnilo - webrequest_sampled_128).

Gehel lowered the priority of this task from Medium to Low. May 9 2022, 7:12 PM

They are more likely downloaded from our mirrors than from us, since we aggressively bandwidth-cap connections.

Today's report:

<13>Jun  7 06:08:15 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220606/commonswiki-20220606-cirrussearch-file.json.gz

Yesterday's report:

<13>Jun 14 06:00:28 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220613/commonswiki-20220613-cirrussearch-file.json.gz

This week's report:

<13>Jun 29 20:28:04 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220627/kkwiki-20220627-cirrussearch-content.json.gz
<13>Jun 29 20:50:12 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220627/kowiki-20220627-cirrussearch-content.json.gz

Yesterday’s report:

<13>Jul  7 21:43:57 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220704/zhwiki-20220704-cirrussearch-general.json.gz

Yesterday's report:

<13>Jul 26 06:05:20 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220725/commonswiki-20220725-cirrussearch-file.json.gz
<13>Jul 26 15:25:27 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220725/enwiki-20220725-cirrussearch-general.json.gz
<13>Jul 28 14:21:06 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220725/wikidatawiki-20220725-cirrussearch-content.json.gz

Yesterday's report:

<13>Aug  2 18:47:16 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220801/dewiki-20220801-cirrussearch-general.json.gz
<13>Aug  2 19:40:27 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220801/enwiki-20220801-cirrussearch-content.json.gz
<13>Aug  2 20:05:28 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220801/enwiki-20220801-cirrussearch-general.json.gz
<13>Aug  2 20:25:41 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220801/enwikisource-20220801-cirrussearch-content.json.gz
<13>Aug  2 20:46:58 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220801/enwiktionary-20220801-cirrussearch-content.json.gz

Yesterday's report:

<13>Aug  9 14:50:46 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220808/commonswiki-20220808-cirrussearch-file.json.gz
<13>Aug  9 17:00:55 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220808/dewiki-20220808-cirrussearch-general.json.gz
<13>Aug 11 16:28:02 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220808/viwiki-20220808-cirrussearch-general.json.gz
<13>Aug 12 02:08:22 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220808/wikidatawiki-20220808-cirrussearch-content.json.gz

Yesterday's report:

<13>Aug 22 19:49:27 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220822/bjnwiktionary-20220822-cirrussearch-content.json.gz
<13>Aug 22 19:49:28 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220822/bjnwiktionary-20220822-cirrussearch-general.json.gz
<13>Aug 24 14:23:12 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220822/guwwiktionary-20220822-cirrussearch-content.json.gz
<13>Aug 24 14:23:12 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220822/guwwiktionary-20220822-cirrussearch-general.json.gz
<13>Aug 25 00:51:00 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220822/pcmwiki-20220822-cirrussearch-content.json.gz
<13>Aug 25 00:51:00 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220822/pcmwiki-20220822-cirrussearch-general.json.gz

Yesterday's report:

<13>Aug 29 19:49:44 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220829/bjnwiktionary-20220829-cirrussearch-content.json.gz
<13>Aug 29 19:49:44 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220829/bjnwiktionary-20220829-cirrussearch-general.json.gz
<13>Aug 30 00:40:19 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220829/commonswiki-20220829-cirrussearch-general.json.gz
<13>Aug 31 10:12:16 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220829/guwwiktionary-20220829-cirrussearch-content.json.gz
<13>Aug 31 10:12:16 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20220829/guwwiktionary-20220829-cirrussearch-general.json.gz

Moving to Needs Triage; we would like to consider bringing this into current work. As part of the cirrus-streaming-updater project, we would like to move incoming_links counting from a realtime, per-update calculation to a batch process run weekly. It would be reasonable to import these dumps into yarn and do that batch counting from there.
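
As a rough illustration of the weekly batch counting mentioned above, the sketch below tallies incoming links from a dump file. It assumes the dump alternates an action line (such as {"index": {"_id": ...}}) with a document line, and that each document may carry an "outgoing_link" list whose entries match page titles exactly; count_incoming_links is a hypothetical helper, not the production yarn job.

```python
import json
from collections import Counter
from typing import Iterable


def count_incoming_links(lines: Iterable[str]) -> Counter:
    """Count, for each target title, how many pages link to it.

    Assumes the dump alternates an action line with a document line,
    and that documents may carry an "outgoing_link" list of titles.
    """
    counts: Counter = Counter()
    it = iter(lines)
    for _action_line in it:
        doc_line = next(it, None)
        if doc_line is None:
            break  # truncated dump: action line with no document
        doc = json.loads(doc_line)
        # Use a set so a page linking to the same target twice counts once.
        for target in set(doc.get("outgoing_link", [])):
            counts[target] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"index": {"_id": "1"}}',
        '{"title": "A", "outgoing_link": ["B", "C", "B"]}',
        '{"index": {"_id": "2"}}',
        '{"title": "B", "outgoing_link": ["C"]}',
    ]
    print(count_incoming_links(sample))
```

For a real .json.gz dump, the lines could come from gzip.open(path, "rt", encoding="utf-8"); json.loads tolerates the trailing newline on each line.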

The main idea for fixing the resiliency is to switch from a scroll request to pagination via search_after, which Elasticsearch now provides and which is intended to avoid keeping state on individual nodes. It offers somewhat weaker guarantees: scroll gave strong guarantees about keeping shard segments open, ensuring we had a complete snapshot of the database, but for our purposes the weaker promises of search_after are likely sufficient.
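
A minimal sketch of search_after pagination, simulated against an in-memory list so it runs without a cluster. The request body keys (size, query, sort, search_after) follow the Elasticsearch search API, but the page_id sort field and the fake_search stand-in are assumptions for illustration, not what DumpIndex.php actually sends. The point it shows: the only cursor is the sort value of the last hit, held by the client, so a failed request can simply be re-sent (or a dump resumed) after a node restart.

```python
from typing import Any, Dict, Iterator, List, Optional

Doc = Dict[str, Any]


def fake_search(index: List[Doc], body: Dict[str, Any]) -> List[Doc]:
    """Stand-in for a search request: sorts on page_id ascending and
    returns up to `size` docs strictly after the search_after value."""
    hits = sorted(index, key=lambda d: d["page_id"])
    after = body.get("search_after")
    if after is not None:
        hits = [d for d in hits if d["page_id"] > after[0]]
    return hits[: body["size"]]


def dump_index(
    index: List[Doc], size: int = 2, after: Optional[List[Any]] = None
) -> Iterator[Doc]:
    """Page through every document using search_after.

    The sort key must be unique (or include a unique tiebreaker),
    otherwise documents sharing a sort value could be skipped.
    """
    last = after
    while True:
        body: Dict[str, Any] = {
            "size": size,
            "query": {"match_all": {}},
            "sort": [{"page_id": "asc"}],
        }
        if last is not None:
            body["search_after"] = last
        hits = fake_search(index, body)
        if not hits:
            return
        yield from hits
        # The next page starts after the sort value of the final hit.
        last = [hits[-1]["page_id"]]


if __name__ == "__main__":
    docs = [{"page_id": i} for i in (5, 1, 3, 2, 4)]
    print([d["page_id"] for d in dump_index(docs)])
    # Resuming from a saved cursor after an interruption:
    print([d["page_id"] for d in dump_index(docs, after=[3])])
```

Unlike scroll, nothing pins a point-in-time view, so pages written or deleted between requests may be seen or missed; that is the weaker guarantee described above.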

Change 835277 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Replace scroll in DumpIndex.php with search_After

https://gerrit.wikimedia.org/r/835277

Change 835277 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Replace scroll in DumpIndex.php with search_After

https://gerrit.wikimedia.org/r/835277

Yesterday's report:

<13>Oct  4 16:01:15 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/crhwiki-20221003-cirrussearch-content.json.gz
<13>Oct  4 19:15:51 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/elwikibooks-20221003-cirrussearch-content.json.gz
<13>Oct  4 19:15:51 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/elwikibooks-20221003-cirrussearch-general.json.gz
<13>Oct  5 06:49:34 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/eowikisource-20221003-cirrussearch-content.json.gz
<13>Oct  5 08:07:55 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/eswikivoyage-20221003-cirrussearch-content.json.gz
<13>Oct  5 12:44:42 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/frwikivoyage-20221003-cirrussearch-content.json.gz
<13>Oct  5 20:38:23 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/maiwiki-20221003-cirrussearch-content.json.gz
<13>Oct  6 07:14:51 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/pmswiki-20221003-cirrussearch-content.json.gz
<13>Oct  6 18:11:37 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221003/sewiki-20221003-cirrussearch-content.json.gz

With these errors dated Oct 4-6, it's not clear that they would have been running the new search_after code. Hopefully this fails less on this week's run. I've checked the current log output and all seems happy (except bclwikiquote, which is a brand-new wiki without an index yet), but it has only reached commonswiki_file. Unfortunately, this week's dumps with the new code seem to be running significantly slower; we might have to investigate that.

We've used a similar dumping strategy to import search indices into yarn. There it can dump all wikis in ~3.5 hours, although it issues 72 shard queries in parallel, whereas the Cirrus version dumps one index at a time, with however many shards that index has (from 1 to 32).
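
The difference described above is mainly fan-out: many shard queries in flight at once versus one index at a time. A minimal sketch of per-shard fan-out with a thread pool, where fetch_shard is a hypothetical callable standing in for dumping one shard (in Elasticsearch, a search can be restricted to specific shards with the preference=_shards:N parameter; whether the yarn import does exactly that is an assumption here).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Doc = Dict[str, object]


def dump_shards_in_parallel(
    num_shards: int,
    fetch_shard: Callable[[int], List[Doc]],
    max_workers: int = 8,
) -> List[Doc]:
    """Fetch every shard concurrently and concatenate results in shard order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order even if shards finish
        # out of order; an exception in any shard is re-raised here.
        per_shard = list(pool.map(fetch_shard, range(num_shards)))
    return [doc for docs in per_shard for doc in docs]


if __name__ == "__main__":
    # Hypothetical in-memory shards standing in for real per-shard queries.
    shards = {0: [{"page_id": 1}], 1: [{"page_id": 2}, {"page_id": 3}], 2: []}
    docs = dump_shards_in_parallel(len(shards), lambda s: shards[s])
    print([d["page_id"] for d in docs])
```

Each fetch_shard call could itself page through its shard with search_after, so a restart only affects the shard whose request failed.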

Yes, you are right. The dumps run is moving very slowly right now; enwiki only started today. In the previous run, which started on the 3rd, enwiki started on the 4th and took a day to complete. By that measure we are currently about 2 days behind, since the current run started on the 11th and enwiki only started today. The last run took more than a week, and this one is moving much more slowly, so we have no idea when it will complete.

Today's report:

<13>Oct 12 00:52:48 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221011/bclwikiquote-20221011-cirrussearch-content.json.gz
<13>Oct 12 00:52:48 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221011/bclwikiquote-20221011-cirrussearch-general.json.gz
<13>Oct 20 23:55:19 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221011/zhwiki-20221011-cirrussearch-general.json.gz

Following up on my previous comment about the runtime: the last run started on Oct 11th and ended on Oct 21st, taking 10 days in total.

I mistakenly didn't attach it to this ticket, but another patch has been merged that uses backoff with retries; hopefully it will get past the previous zhwiki failure and stabilize the dumps.
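
The merged patch is not linked here, so the following is only a generic sketch of retrying with exponential backoff and full jitter, not the CirrusSearch code. The function name, the default attempt count and delays, and treating ConnectionError as the retryable error are all assumptions for illustration; the sleep parameter is injectable so the behavior can be checked without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with growing delays."""
    attempt = 1
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            # Full jitter: wait a random time in
            # [0, min(max_delay, base_delay * 2 ** (attempt - 1))].
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
            attempt += 1


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        # Fails twice, as if the cluster were restarting, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("node restarting")
        return "ok"

    print(retry_with_backoff(flaky, base_delay=0.01))
```

Exceptions outside retry_on propagate immediately, so only errors that plausibly clear up after a restart are retried.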

In terms of the runtime, I suspect the only real option is to parallelize the dumping, but that's a bit tedious to do in the bash scripting. It's not quite as simple as asking xargs to run the scripts: a variety of things happen within the bash script, which would probably require exporting a function from the script and invoking it from xargs. While doable, that seems error-prone. Perhaps we allow the dumps to take whatever time is necessary?

Hey @EBernhardson, so Hannah and I talked a little about the issue of the long run time of the dumps, which we can expect to get longer over time. One issue we have now is that we used to try to schedule any maintenance for the weekends, when all the misc dumps (including CirrusSearch) were complete. There is no longer any such window; in fact, we now would have to choose between breaking two search dumps and breaking various wikibase entity dumps; neither of those is a good choice.

Of course the longer the dumps take to run, the more stale the data gets; I don't know who uses these internally or externally and what impact that may have on their use. Allowing indefinite growth of the runtime makes me uneasy, however.

Lastly, the assumption behind these draft guidelines, https://wikitech.wikimedia.org/wiki/Dumps/New_dumps_and_datasets, although it was apparently never spelled out, is that one set of a given type of dump would complete before the next set of that type started to run. We had expected folks to build in the ability to shard or scale their particular dumps appropriately.

Having said that, I understand that refactoring the existing script may take time and some thought. We can help in the discussion phase, with code review of course, and by doing some limited testing as well. One simpler approach might be to move a few of the larger, longer-running wikis (enwiki, wikidatawiki, and commonswiki, to start with) into a separate list, and see how things go with two lists being processed at once. commonswiki alone takes nearly 4 days to run.

What do you think?

Weekly report for the record, without the latest fix:

<13>Nov  1 20:17:11 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221021/wikidatawiki-20221021-cirrussearch-general.json.gz

Splitting this up to run a few separate groups of wikis seems much more doable than trying to parallelize the bash function; I will look into how that can be done. Perhaps we can run one group per db group, since that's a reasonably convenient existing split of wikis by size.
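
A minimal model of the grouping idea: one worker per db group, dumping that group's wikis sequentially, with failures recorded per wiki so one failure does not stop the rest of its group. The group names and wiki assignments below are illustrative, not the production dblists, and dump_wiki is a hypothetical callable standing in for invoking DumpIndex.php; the real change lives in puppet-managed shell scripts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def dump_group(wikis: List[str], dump_wiki: Callable[[str], None]) -> List[str]:
    """Dump the wikis of one group sequentially; return those that failed."""
    failed: List[str] = []
    for wiki in wikis:
        try:
            dump_wiki(wiki)
        except Exception:
            # Keep going: one failing wiki should not stop its group.
            failed.append(wiki)
    return failed


def dump_by_group(
    groups: Dict[str, List[str]], dump_wiki: Callable[[str], None]
) -> Dict[str, List[str]]:
    """Run all groups concurrently; map each group name to its failed wikis."""
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {
            name: pool.submit(dump_group, wikis, dump_wiki)
            for name, wikis in groups.items()
        }
        return {name: f.result() for name, f in futures.items()}


if __name__ == "__main__":
    # Illustrative grouping only.
    groups = {"s1": ["enwiki"], "s3": ["astwiki", "aywiki"], "s8": ["wikidatawiki"]}
    print(dump_by_group(groups, lambda wiki: None))
```

With this shape, total runtime is bounded by the slowest group rather than the sum of all wikis, which is why separating the largest wikis matters.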

Change 856654 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[operations/puppet@production] snapshot: Parallelize cirrus dumps by db shard

https://gerrit.wikimedia.org/r/856654

Last Wednesday's report:

<13>Nov 18 12:16:54 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221115/cowiki-20221115-cirrussearch-content.json.gz
<13>Nov 23 12:57:44 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221115/wikidatawiki-20221115-cirrussearch-general.json.gz

The error this time around is:

Elastica\Exception\ClientException from line 26 of /srv/mediawiki/php-1.40.0-wmf.10/vendor/ruflin/elastica/src/Connection/Strategy/Simple.php: No enabled connection
#0 /srv/mediawiki/php-1.40.0-wmf.10/vendor/ruflin/elastica/src/Connection/ConnectionPool.php(86): Elastica\Connection\Strategy\Simple->getConnection(Array)
#1 /srv/mediawiki/php-1.40.0-wmf.10/vendor/ruflin/elastica/src/Client.php(394): Elastica\Connection\ConnectionPool->getConnection()
#2 /srv/mediawiki/php-1.40.0-wmf.10/vendor/ruflin/elastica/src/Client.php(508): Elastica\Client->getConnection()
#3 /srv/mediawiki/php-1.40.0-wmf.10/vendor/ruflin/elastica/src/Search.php(278): Elastica\Client->request('cowiki_content/...', 'POST', Array, Array)
#4 /srv/mediawiki/php-1.40.0-wmf.10/extensions/CirrusSearch/includes/Elastica/SearchAfter.php(90): Elastica\Search->search()
#5 /srv/mediawiki/php-1.40.0-wmf.10/extensions/CirrusSearch/includes/Elastica/SearchAfter.php(70): CirrusSearch\Elastica\SearchAfter->runSearch()
#6 /srv/mediawiki/php-1.40.0-wmf.10/extensions/CirrusSearch/maintenance/DumpIndex.php(163): CirrusSearch\Elastica\SearchAfter->next()
#7 /srv/mediawiki/php-1.40.0-wmf.10/maintenance/includes/MaintenanceRunner.php(309): CirrusSearch\Maintenance\DumpIndex->execute()
#8 /srv/mediawiki/php-1.40.0-wmf.10/maintenance/doMaintenance.php(85): MediaWiki\Maintenance\MaintenanceRunner->run()
#9 /srv/mediawiki/php-1.40.0-wmf.10/extensions/CirrusSearch/maintenance/DumpIndex.php(288): require_once('/srv/mediawiki/...')
#10 /srv/mediawiki/multiversion/MWScript.php(120): require_once('/srv/mediawiki/...')
#11 {main}

This appears to be functionality within Elastica that disables connections in the connection pool when it has a problem with a connection. Specifically, these errors were CURLE_PARTIAL_FILE (curl code 18) errors, indicating that the connection dropped before the full expected response could be read. I believe these should be retryable; if the instance is truly gone, the retry will fail with a different error type. In our case LVS will most likely route the retried request to a different instance. I will update our connection handling to retry these without disabling the connection.
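The retry policy described above can be illustrated outside PHP. The actual fix lives in Elastica's connection handling; the sketch below, with a hypothetical `retry_on_partial` helper, simply wraps an arbitrary command and re-runs it only when it exits with curl's status 18 (CURLE_PARTIAL_FILE). Every other status is passed straight through, so a truly dead instance still surfaces as a different error. It deliberately omits any delay between attempts.

```shell
#!/bin/sh
# Hypothetical sketch of the retry policy, applied to a plain command such
# as curl rather than to Elastica's PHP connection pool.
# curl exits with status 18 (CURLE_PARTIAL_FILE) when the transfer ends
# before the expected number of bytes has arrived.
CURLE_PARTIAL_FILE=18

# retry_on_partial MAX_ATTEMPTS CMD [ARGS...]
# Runs CMD and re-runs it only while it exits with status 18, up to
# MAX_ATTEMPTS attempts in total. Any other status (including success) is
# returned immediately, so genuine failures are not masked.
retry_on_partial() {
  local max_attempts="$1" attempt=1 rc
  shift
  while :; do
    "$@" && rc=0 || rc=$?
    if [ "$rc" -ne "$CURLE_PARTIAL_FILE" ] || [ "$attempt" -ge "$max_attempts" ]; then
      return "$rc"
    fi
    echo "partial response (exit $rc), retry $attempt of $((max_attempts - 1))" >&2
    attempt=$((attempt + 1))
  done
}

# Example (URL is a placeholder):
#   retry_on_partial 3 curl -sS -o health.json http://localhost:9200/_cluster/health
```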

Change 862336 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/Elastica@master] Retry CURLE_PARTIAL_FILE responses

https://gerrit.wikimedia.org/r/862336

Change 856654 merged by Bking:

[operations/puppet@production] snapshot: Parallelize cirrus dumps by db shard

https://gerrit.wikimedia.org/r/856654

Change 862336 merged by jenkins-bot:

[mediawiki/extensions/Elastica@master] Don't fail connection after CURLE_PARTIAL_FILE response

https://gerrit.wikimedia.org/r/862336

Change 856655 had a related patch set uploaded (by Ryan Kemper; author: Ebernhardson):

[operations/puppet@production] snapshot: Remove absented cirrus dump job

https://gerrit.wikimedia.org/r/856655

Yesterday's report:

<13>Nov 30 15:39:42 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20221123/frwiki-20221123-cirrussearch-general.json.gz

Same top level error as before:

Elastica\Exception\ClientException from line 26 of /srv/mediawiki/php-1.40.0-wmf.10/vendor/ruflin/elastica/src/Connection/Strategy/Simple.php: No enabled connection

And same underlying error:

Unexpected connection error communicating with Elasticsearch. Curl code: 18

I expect this will be fixed once the patch from Dec 6 rolls out with next week's train.

Today's report:

<13>Jan  9 19:14:29 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20230109/wikidatawiki-20230109-cirrussearch-general.json.gz

<13>Jan  9 19:42:50 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20230109/gorwiktionary-20230109-cirrussearch-content.json.gz
<13>Jan  9 19:43:06 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20230109/gorwiktionary-20230109-cirrussearch-general.json.gz
<13>Jan  9 19:43:25 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20230109/guwwikiquote-20230109-cirrussearch-content.json.gz
<13>Jan  9 19:43:41 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20230109/guwwikiquote-20230109-cirrussearch-general.json.gz
<13>Jan  9 20:08:19 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20230109/shnwikibooks-20230109-cirrussearch-content.json.gz
<13>Jan  9 20:08:35 dumpsgen: extensions/CirrusSearch/maintenance/DumpIndex.php failed for /mnt/dumpsdata/otherdumps/cirrussearch/20230109/shnwikibooks-20230109-cirrussearch-general.json.gz

Note that these are from s8 and s5, respectively.

Change 878175 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] SearchAfter: Increase default retries to 12

https://gerrit.wikimedia.org/r/878175

Hmm, gorwiktionary, guwwikiquote and shnwikibooks are all new wikis as of Jan 4th; it seems that yet again the deployment process didn't create the appropriate indices. Unfortunately this isn't an automated process that generates logs, so it's not clear what is going wrong there. Nothing relevant seems to have been logged, although something should have been if index creation had failed. I've added a note to the next wiki creation asking them to record the output of the maintenance script; hopefully that will let us track something down.

wikidatawiki-general failed with a ResponseException that had no message attached to it. Annoyingly, the client library used here only attaches messages for well-formed errors. Many different dumps had errors at that moment, but most managed to recover through the retry mechanism; only wikidatawiki-general ended up bailing completely. A node was restarted at 19:13, which seems to have kicked off the errors that led to wikidatawiki failing at 19:14, but the system needs to be resilient enough to continue through node restarts. Overall this request was retried 5 times with increasing delays over ~20s. That still seems like too little time, so I've put up a patch to increase the number of retries so that it will keep retrying for ~10m before bailing.
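To see why raising the retry count changes the total wait so much, here is a small calculator for a capped exponential backoff. The schedule (each delay doubling from a base, up to a cap) and the function names `backoff_delays` and `backoff_total` are assumptions for illustration; this is not necessarily how SearchAfter computes its delays, and the base and cap are not its actual settings.

```shell
#!/bin/sh
# Hypothetical sketch: total time spent waiting under a capped exponential
# backoff, where retry i (counting from 0) sleeps min(BASE * 2^i, CAP)
# seconds. BASE and CAP are illustrative values, not CirrusSearch settings.

# backoff_delays RETRIES BASE CAP -> prints each delay, one per line
backoff_delays() {
  local retries="$1" delay="$2" cap="$3" i=0
  while [ "$i" -lt "$retries" ]; do
    if [ "$delay" -gt "$cap" ]; then
      delay="$cap"
    fi
    echo "$delay"
    delay=$((delay * 2))
    i=$((i + 1))
  done
}

# backoff_total RETRIES BASE CAP -> prints the sum of all delays
backoff_total() {
  local total=0 d
  for d in $(backoff_delays "$@"); do
    total=$((total + d))
  done
  echo "$total"
}
```

For example, with a 1 s base and a 60 s cap, `backoff_total 5 1 60` prints 31 while `backoff_total 12 1 60` prints 423 (about 7 minutes). Once the delays reach the cap, every additional retry adds a full cap's worth of waiting, which is why a modest increase in the retry count buys minutes rather than seconds of tolerance for a node restart.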

Change 878175 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] SearchAfter: Increase default retries to 12

https://gerrit.wikimedia.org/r/878175

We haven't seen any failures lately. Let's do one last pass over the logs, and if all looks clean, we can call this done.

MPhamWMF set the point value for this task to 5. Feb 6 2023, 4:32 PM

Took a look over the current logs, and everything seems reasonable. CirrusSearch should only write "Dump done" to the log file if the dump completes successfully. I ran a quick script on snapshot1008 to check all existing log files (covering 20230123 - 20230213), and every one of them ends with this string.

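for i in /var/log/cirrusdump/*.log; do
  [ -e "$i" ] || continue  # glob matched no files
  # A successful dump ends its log with "Dump done".
  if ! tail -n 1 "$i" | grep -q "Dump done"; then
    echo "FAIL: $i"
  fi
done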

Hello @EBernhardson, the dumps completed within a few days, and resource usage on the host stayed within limits. This is better and faster than before, when they took more than a week to complete. Thank you!