Page MenuHomePhabricator

Understand why BotCancellazioni is stuck (after every forced wiki read-only mode) apparently for cURL reasons
Closed, ResolvedPublic3 Estimated Story Points

Description

What we know:

/data/project/itwiki/itwiki-deletionbot.err
PHP Fatal error:  Uncaught mw\API\ReadOnlyException: API error code: 'readonly' info: 'The wiki is currently in read-only mode.' in /data/project/itwiki/deletionbot/includes/boz-mw/include/mw/API/Exception.php:81
Stack trace:
#0 /data/project/itwiki/deletionbot/includes/boz-mw/include/mw/API.php(328): mw\API\Exception::createFromApiError()
#1 /data/project/itwiki/deletionbot/includes/boz-mw/include/network/HTTPRequest.php(385): mw\API->onFetched()
#2 /data/project/itwiki/deletionbot/includes/boz-mw/include/mw/API.php(147): network\HTTPRequest->fetch()
#3 /data/project/itwiki/deletionbot/includes/boz-mw/include/network/HTTPRequest.php(409): mw\API->fetch()
#4 /data/project/itwiki/deletionbot/includes/boz-mw/include/mw/API.php(240): network\HTTPRequest->post()
#5 /data/project/itwiki/deletionbot/includes/boz-mw/include/mw/StaticSite.php(239): mw\API->login()
#6 /data/project/itwiki/deletionbot/includes/class-Page.php(90): mw\StaticSite->login()
#7 /data/project/itwiki/deletionbot/includes/class-PageTemplated.php(134): itwikidelb in /data/project/itwiki/deletionbot/includes/boz-mw/include/mw/API/Exception.php on line 81
ls -l /data/project/itwiki/itwiki-deletionbot.err
-rw-r--r-- 1 tools.itwiki tools.itwiki 463227 25 set 15.00 itwiki-deletionbot.err
$ tail /data/project/itwiki/itwiki-deletionbot.out 
[2024-09-27 10:00:04][INFO] start
[2024-09-27 10:00:05][INFO] work on 2024/9/27
[2024-09-27 10:00:06][INFO] found 1 PDCs
[2024-09-27 10:00:07][INFO] login with username 'BotCancellazioni@itwiki@toolforge'
[2024-09-27 10:00:08][INFO] writing [[Wikipedia:Pagine da cancellare/Conta/2024 settembre 27]]
[2024-09-27 10:00:08][INFO] writing [[Wikipedia:Pagine da cancellare/Log/2024 settembre 27]]
[2024-09-27 10:00:09][INFO] work on 2024/9/26
[2024-09-27 10:00:11][INFO] found 7 PDCs
[2024-09-27 10:00:14][INFO] writing [[Wikipedia:Pagine da cancellare/Conta/2024 settembre 26]]
[2024-09-27 10:00:14][INFO] writing [[Wikipedia:Pagine da cancellare/Log/2024 settembre 26]]
[2024-09-27 10:00:15][INFO] work on 2024/9/25
[2024-09-27 10:00:16][WARN] unexpected PDC category 'Categoria:Errori del modulo citazione - citazioni con U
[2024-09-27 10:00:16][WARN]  RL nudi'
[2024-09-27 10:00:17][INFO] found 9 PDCs
[2024-09-27 10:00:20][INFO] writing [[Wikipedia:Pagine da cancellare/Conta/2024 settembre 25]]
[2024-09-27 10:00:21][INFO] writing [[Wikipedia:Pagine da cancellare/Log/2024 settembre 25]]
[2024-09-27 10:00:22][INFO] work on 2024/9/24
[2024-09-27 10:00:24][INFO] found 5 PDCs
[2024-09-27 10:00:26][INFO] writing [[Wikipedia:Pagine da cancellare/Conta/2024 settembre 24]]
[2024-09-27 10:00:26][INFO] writing [[Wikipedia:Pagine da cancellare/Log/2024 settembre 24]]
[2024-09-27 10:00:27][INFO] work on 2024/9/23
[2024-09-27 10:00:29][INFO] found 6 PDCs
[2024-09-27 10:00:31][INFO] writing [[Wikipedia:Pagine da cancellare/Conta/2024 settembre 23]]
[2024-09-27 10:00:32][INFO] writing [[Wikipedia:Pagine da cancellare/Log/2024 settembre 23]]
[2024-09-27 10:00:33][INFO] work on 2024/9/22
[2024-09-27 10:00:35][WARN] unexpected PDC category 'Categoria:Errori del modulo citazione - citazioni con U
[2024-09-27 10:00:35][WARN]  RL nudi'
[2024-09-27 10:00:35][INFO] found 5 PDCs
[2024-09-27 10:00:37][INFO] writing [[Wikipedia:Pagine da cancellare/Conta/2024 settembre 22]]
[2024-09-27 10:00:37][INFO] writing [[Wikipedia:Pagine da cancellare/Log/2024 settembre 22]]
[2024-09-27 10:00:38][INFO] work on 2024/9/21
[2024-09-27 10:00:41][INFO] found 8 PDCs
[2024-09-27 10:00:48][INFO] writing [[Wikipedia:Pagine da cancellare/Conta/2024 settembre 21]]
[2024-09-27 10:00:49][INFO] writing [[Wikipedia:Pagine da cancellare/Log/2024 settembre 21]]

Note that the last log line in standard output is very probably this one:

https://gitlab.wikimedia.org/valeriobozzolan/itwiki-deletions-info-bot/-/blob/0d4bff70f40f85ee1f93c0d5a6af88ae4b9c1d89/includes/class-Page.php#L101

ls -la /data/project/itwiki/itwiki-deletionbot.out
-rw-r--r-- 1 tools.itwiki tools.itwiki 340677627 27 set 08.01 itwiki-deletionbot.out
$ toolforge-jobs list
+----------------------------+-------------------------+------------------------------------------+
|         Job name:          |        Job type:        |                 Status:                  |
+----------------------------+-------------------------+------------------------------------------+
|     itwiki-archivebot      | schedule: 0 22,23 * * * | Last schedule time: 2024-09-26T23:00:00Z |
|    itwiki-cleansandbox     | schedule: 23 */2 * * *  | Last schedule time: 2024-09-27T20:23:00Z |
|    itwiki-datiwikipedie    |  schedule: 21 2 * * *   | Last schedule time: 2024-09-27T02:21:00Z |
|     itwiki-deletionbot     |  schedule: */8 * * * *  |          Running for 12h50m46s           |
| itwiki-draftbot-logrotate  |    schedule: @daily     | Last schedule time: 2024-09-27T12:33:00Z |
| itwiki-draftbot-scheduled  | schedule: 0,30 * * * *  | Last schedule time: 2024-09-27T20:30:00Z |
|    itwiki-lavorosporco     | schedule: 0 3 28-31 * * |        Waiting for scheduled time        |
|     itwiki-listdrafts      | schedule: 15,45 * * * * | Last schedule time: 2024-09-27T20:45:00Z |
|     itwiki-markadmins      |   schedule: 5 * * * *   | Last schedule time: 2024-09-27T20:05:00Z |
|    itwiki-orphanizerbot    |  schedule: */3 * * * *  | Last schedule time: 2024-09-27T20:48:00Z |
|   itwiki-rimuoviprotetta   |   schedule: 0 4 * * *   | Last schedule time: 2024-09-27T04:00:00Z |
|   itwiki-wikiquotestats    |  schedule: 58 21 * * *  | Last schedule time: 2024-09-26T21:58:00Z |
| itwiki-draftbot-continuous |       continuous        |                 Running                  |
+----------------------------+-------------------------+------------------------------------------+
IMPORTANT: Note: Running for 12h50m46s

Assumptions

What's happening?It seems the software is "successfully" running, but for hours instead of minutes, probably while doing an HTTP POST request, probably during a readonly wiki switch.
Is it something related to cURL?✅ Yes. It seems related to cURL. Looking at the standard output logs, the last log line is just writing, so it's very probably just stuck on a cURL call. There are no relevant visible crashes so the program is probably very probably just running successfully but stuck on that cURL call.
Is CURLOPT_CONNECTTIMEOUT maybe needed?❌ No. Premising that PHP has not a default timeout https://www.php.net/manual/en/curl.constants.php#constant.curlopt-connecttimeout but the operating system default seems 300 seconds https://manpages.debian.org/testing/libcurl4-doc/CURLOPT_CONNECTTIMEOUT.3.en.html - and that would be OK. So this is very probably not the root cause, since the software was stuck for hours, not for minutes. So it seems the initial part of the connection is successfully established but then the transfer is idle for hours, or something like that. So this configuration (that only affects the first part of the connection) will be left untouched.
Is CURLOPT_CONNECTTIMEOUT_MS maybe needed?❌ No. See above.
Is CURLOPT_DNS_CACHE_TIMEOUT maybe needed?❌ No. See above.
Is CURLOPT_TIMEOUT maybe needed?🔶 This imposes an absolute time execution limit in seconds, whenever the thing is successfully downloading something or not. This is definitely a possible solution, however, I don't like to impose this kind of limits in my framework, that also affects "big transfers" that may take long time. Probably cURL should stop if the connection dies, not if the time gets longer.
Is CURLOPT_TIMEOUT_MS maybe needed?🔶 See above.
Are CURLOPT_LOW_SPEED_LIMIT and CURLOPT_LOW_SPEED_TIME maybe needed?✅ I like this. We can impose a minimum of 1 byte every 120 seconds as bare minimum. Semantically this probably means that if the server does not reply anything in that amount of seconds, then the connection is dropped. This should not impede any meaningful workflow and it would very probably allow cURL to abort the connection in 5 minutes instead of 10 hours.

Event Timeline

valerio.bozzolan created this task.
valerio.bozzolan renamed this task from Understand why BotCancellazioni does not start automagically (apparently after every forced wiki read-only mode) to Understand why BotCancellazioni is stuck (after every forced wiki read-only mode) apparently for cURL reasons.Sun, Oct 13, 5:57 PM
valerio.bozzolan set the point value for this task to 3.

Trying with

diff --git a/include/network/HTTPRequest.php b/include/network/HTTPRequest.php
index 5e93643..45f556f 100644
--- a/include/network/HTTPRequest.php
+++ b/include/network/HTTPRequest.php
@@ -1,6 +1,6 @@
 <?php
 # boz-mw - Another MediaWiki API handler in PHP
-# Copyright (C) 2017-2023 Valerio Bozzolan, contributors
+# Copyright (C) 2017-2024 Valerio Bozzolan, contributors
 #
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU Affero General Public License as published by
@@ -317,6 +317,12 @@ class HTTPRequest {
                // URL
                curl_setopt( $curl, CURLOPT_URL, $url );
 
+               // Avoid to be stuck forever (!!!) if the connection is dropped for whatever reason.
+               // https://phabricator.wikimedia.org/T375937
+               // https://gitpull.it/T1274
+               curl_setopt( $curl, CURLOPT_LOW_SPEED_LIMIT, 1 );
+               curl_setopt( $curl, CURLOPT_LOW_SPEED_TIME, 120 );
+
                // cURL execution will return the result on success, false on failure
                curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
 
@@ -324,7 +330,9 @@ class HTTPRequest {
                curl_setopt( $curl, CURLOPT_HEADER, true );
 
                // this contains also the headers
+               var_dump("CURL EXEC RUNNING");
                $http_response_raw = curl_exec( $curl );
+               var_dump("CURL EXEC EXECUTED");
 
                // yeah! another one
                $this->queries++;

Aaand...

Reproducing again...

[2024-10-15 16:39:51][INFO] found 5 PDCs
string(17) "CURL EXEC RUNNING"
string(18) "CURL EXEC EXECUTED"
string(17) "CURL EXEC RUNNING"
string(18) "CURL EXEC EXECUTED"
string(17) "CURL EXEC RUNNING"
string(18) "CURL EXEC EXECUTED"
string(17) "CURL EXEC RUNNING"
string(18) "CURL EXEC EXECUTED"

 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ DETACHING MY LAPTOP NETWORK ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑

 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ NOTHING HAPPENS FOR 119 seconds ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
 ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓ then \o/ finally a lovely crash instead of being stuck forever ↓↓↓↓↓↓↓↓↓↓↓↓↓

[2024-10-15 16:41:54][ERROR] unexpected crash:
PHP Fatal error:  Uncaught network\CURLException: Error reported from cURL [error n. 28]: Operation too slow. Less than 1 bytes/sec transferred the last 120 seconds in /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/boz-mw/include/network/CURLException.php:95
Stack trace:
#0 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/boz-mw/include/network/CURLException.php(69): network\CURLException::createFromErrorMsgAndNum()
#1 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/boz-mw/include/network/HTTPRequest.php(344): network\CURLException::throwFromErrorMsgAndNum()
#2 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/boz-mw/include/mw/API.php(147): network\HTTPRequest->fetch()
#3 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/boz-mw/include/mw/StaticSite.php(125): mw\API->fetch()
#4 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-Page.php(156): mw\StaticSite->fetch()
#5 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-Page.php(201): itwikidelbot\Page->fetchFirstRevisionDateByDirection()
#6 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PDC.php(435): itwikidelbot\Page->fetchLasteditDate()
#7 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PDC.php(640): itwikidelbot\PDC->getLasteditDate()
#8 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PDC.php(670): itwikidelbot\PDC->getDurationDays()
#9 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageYearMonthDayPDCsCount.php(153): itwikidelbot\PDC->getHumanDuration()
#10 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageYearMonthDayPDCsCount.php(57): itwikidelbot\PageYearMonthDayPDCsCount->createPDCEntryContent()
#11 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageTemplated.php(116): itwikidelbot\PageYearMonthDayPDCsCount->getTemplateArguments()
#12 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageTemplated.php(37): itwikidelbot\PageTemplated->getTemplatedTitle()
#13 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageYear.php(39): itwikidelbot\PageTemplated->__construct()
#14 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageYearMonth.php(40): itwikidelbot\PageYear->__construct()
#15 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageYearMonthDay.php(43): itwikidelbot\PageYearMonth->__construct()
#16 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageYearMonthDayPDCs.php(42): itwikidelbot\PageYearMonthDay->__construct()
#17 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-PageYearMonthDayPDCs.php(53): itwikidelbot\PageYearMonthDayPDCs->__construct()
#18 /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/class-Bot.php(284): itwikidelbot\PageYearMonthDayPDCs::createFromDateTimePDCs()
#19 /home/boz/repos/itwiki/itwiki-deletions-info-bot/bot.php(90): itwikidelbot\Bot->run()
#20 {main}
  thrown in /home/boz/repos/itwiki/itwiki-deletions-info-bot/includes/boz-mw/include/network/CURLException.php on line 95

I absolutely love this new exception message:

Operation too slow. Less than 1 bytes/sec transferred the last 120 seconds

Committing and landing in production right now.

I sincerely believe that the root problem is now resolved. Let's re-open if I'm wrong. Thanks everyone. Special thanks to Parma1983.

Ah, to apply the update, I've just followed the documentation here:

https://wikitech.wikimedia.org/wiki/Tool:Itwiki

So:

ssh login.toolforge.org
become itwiki

Then:

cd deletionbot
git submodule update --init --recursive

And nothing else

Mentioned in SAL (#wikimedia-cloud) [2024-10-15T16:36:25Z] <wmbot~valeriobozzolan@tools-bastion-13> Update deletionbot to finally fix T375937 since 2019 (just adopting CURLOPT_LOW_SPEED_LIMIT)