Page MenuHomePhabricator

Bot (written in Mono, running on Labs) has lags during stats reading through API
Closed, ResolvedPublic

Description

My bot reads stats through API from all wikis. Fragment of code (C#):

foreach (var c in codes)
{
        apiout = WebClient.DownloadString("https://" + c + ".wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics");
}

When I run it from my PC (Win 8.1, Visual Studio), it works fast, with an approximate rate 1-2 requests per second. But when I run it through mono from Labs, it takes 5-6 seconds for every request. If I add Console.WriteLine(DateTime.Now); to the loop, it displays

10/1/2016 2:27:00 AM
10/1/2016 2:27:05 AM
10/1/2016 2:27:11 AM
10/1/2016 2:27:17 AM
10/1/2016 2:27:22 AM
10/1/2016 2:27:28 AM
10/1/2016 2:27:34 AM
etc.

I have many other C# bots that run with mono from Labs, and none of them have this lag. When I run wget with this urls from Labs, it works fast. When my acquaintance run this program with the same version of mono on linux, it runs without this lags. When I use asynchronous http client, lags remains.

Event Timeline

MBH created this task.Oct 1 2016, 3:02 AM
Restricted Application added subscribers: Base, Aklapper. · View Herald TranscriptOct 1 2016, 3:02 AM

I have many other C# bots that run with mono from Labs, and none of them have this lag. When I run wget with this urls from Labs, it works fast.

What exactly was the intention behind creating this task?

MBH added a comment.Oct 1 2016, 8:20 AM

I want to make this problem, relates with Labs mono or Wikimedia servers, has been fixed, that my bot updates this page https://ru.wikipedia.org/w/index.php?title=Module:NumberOf/data&action=history correcty. Now it works about 40 minutes, i.e. takes data about different wikis with 30 minute intervals, but this data uses in "Pages, created this day" counter (see https://ru.wikipedia.org/wiki/Википедия:Статистика), so counter's data are distorted, because of data about different wikis are collected in significantly different time.

Where is this bot hosted on Labs? Any public URL?

MBH added a comment.Oct 1 2016, 3:24 PM

/data/project/mbh/bots/numof_updater.exe
What "public URL"?

Reedy updated the task description. (Show Details)Oct 1 2016, 8:28 PM
krenair@tools-bastion-03:~$ time curl "https://{en,fr,es,de}.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics"

[1/4]: https://en.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics --> <stdout>
--_curl_--https://en.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics
<?xml version="1.0"?><api batchcomplete=""><query><statistics pages="40375830" articles="5253048" edits="851254321" images="853479" users="29186468" activeusers="120758" admins="1282" jobs="5107" queued-massmessages="0" /></query></api>
[2/4]: https://fr.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics --> <stdout>
--_curl_--https://fr.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics
<?xml version="1.0"?><api batchcomplete=""><query><statistics pages="8332351" articles="1798309" edits="132635321" images="49509" users="2611411" activeusers="14743" admins="161" jobs="4" queued-massmessages="0" /></query></api>
[3/4]: https://es.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics --> <stdout>
--_curl_--https://es.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics
<?xml version="1.0"?><api batchcomplete=""><query><statistics pages="5618274" articles="1287304" edits="93694529" images="0" users="4362173" activeusers="15787" admins="71" jobs="18" queued-massmessages="0" /></query></api>
[4/4]: https://de.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics --> <stdout>
--_curl_--https://de.wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics
<?xml version="1.0"?><api batchcomplete=""><query><statistics pages="5715173" articles="1982931" edits="163307038" images="131278" users="2491829" activeusers="18175" admins="204" jobs="11" queued-massmessages="0" /></query></api>
real	0m0.406s
user	0m0.023s
sys	0m0.009s

I made a file called T147109.cs with this as the contents:

using System;
using System.Net;
namespace HelloWorld
{
    class Hello 
    {
        static void Main() 
        {
		WebClient client = new WebClient();
		foreach (var c in new string[]{"en", "fr", "es", "de"})
		{
		        client.DownloadString("https://" + c + ".wikipedia.org/w/api.php?action=query&format=xml&meta=siteinfo&siprop=statistics");
			Console.WriteLine(DateTime.Now);
		}

        }
    }
}

(Yes, I based it off a hello world program - I do not really do C#/.NET)
Ran openssl x509 -in /etc/ssl/certs/GlobalSign_Root_CA.pem -out GlobalSign_Root_CA.crt -outform der and certmgr -add -c Trust GlobalSign_Root_CA.crt because Mono doesn't trust anything by default.
Then:

krenair@tools-bastion-03:~$ mcs T147109.cs; mono T147109.exe
10/1/2016 10:33:37 PM
10/1/2016 10:33:41 PM
10/1/2016 10:33:45 PM
10/1/2016 10:33:48 PM
Reedy added a subscriber: Reedy.Oct 1 2016, 10:37 PM

Have you tested it with mono locally?

There's too many differences between .NET and Mono... Though, the fact you say it works fine for someone else is more weird

I tried it locally (Mono 4.2.1.102+dfsg2-7ubuntu4 instead of the 3.2.8+dfsg-4ubuntu1.1 package we have on tools-bastion-03, as I run a newer version of Ubuntu) and it was much faster:

01/10/2016 23:36:16
01/10/2016 23:36:17
01/10/2016 23:36:18
01/10/2016 23:36:18

I suspect the problem is either going to be in tools or upstream in mono. I don't know what headers etc. it's sending. strace -cf mono T147109.exe starts with:

tools-bastion-03
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.68    9.326124       13361       698        74 futex
 13.09    1.858183       45322        41         1 epoll_wait
 13.06    1.854909       48813        38         9 nanosleep
  3.33    0.472220          32     14873       370 stat
  3.17    0.449590         220      2040       349 lstat
alex-laptop
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 73.93    1.816126        1381      1315       171 futex
 11.56    0.284000        8353        34           clock_nanosleep
 11.40    0.280000        3333        84           poll
  3.09    0.076000        1652        46        46 rt_sigsuspend
Anomie added a subscriber: Anomie.

This seems very unlikely to have anything at all to do with the API beyond that the program is trying to access it. Removing MediaWiki-API.

Anomie removed a subscriber: Anomie.Oct 3 2016, 2:58 PM
Aklapper renamed this task from I get lags during stats reading through API to Bot (written in Mono, running on Labs) has lags during stats reading through API.Nov 14 2016, 8:37 PM
scfc added a subscriber: scfc.Jan 30 2017, 6:18 AM

My (default) assumption is that this is due to Mono at the start of each web request reading some files from the tool's home directory (certificates?) that are on NFS and thus slow(ish).

I wanted to confirm that by taking @Krenair's example above and straceing it, but these fail now with:

Unhandled Exception:
System.Net.WebException: Error getting response stream (Write: The authentication or decryption has failed.): SendFailure ---> System.IO.IOException: The authentication or decryption has failed. ---> Mono.Security.Protocol.Tls.TlsException: Invalid certificate received from server. Error code: 0xffffffff800b010a
  at Mono.Security.Protocol.Tls.Handshake.Client.TlsServerCertificate.RemoteValidation (Mono.Security.Protocol.Tls.ClientContext context, AlertDescription description) [0x00000] in <filename unknown>:0 
  at Mono.Security.Protocol.Tls.Handshake.Client.TlsServerCertificate.validateCertificates (Mono.Security.X509.X509CertificateCollection certificates) [0x00000] in <filename unknown>:0 
  at Mono.Security.Protocol.Tls.Handshake.Client.TlsServerCertificate.ProcessAsTls1 () [0x00000] in <filename unknown>:0 
  at Mono.Security.Protocol.Tls.Handshake.HandshakeMessage.Process () [0x00000] in <filename unknown>:0 
  at (wrapper remoting-invoke-with-check) Mono.Security.Protocol.Tls.Handshake.HandshakeMessage:Process ()
  at Mono.Security.Protocol.Tls.ClientRecordProtocol.ProcessHandshakeMessage (Mono.Security.Protocol.Tls.TlsStream handMsg) [0x00000] in <filename unknown>:0 
  at Mono.Security.Protocol.Tls.RecordProtocol.InternalReceiveRecordCallback (IAsyncResult asyncResult) [0x00000] in <filename unknown>:0 
  --- End of inner exception stack trace ---
  at Mono.Security.Protocol.Tls.SslStreamBase.AsyncHandshakeCallback (IAsyncResult asyncResult) [0x00000] in <filename unknown>:0 
  --- End of inner exception stack trace ---
  at System.Net.HttpWebRequest.EndGetResponse (IAsyncResult asyncResult) [0x00000] in <filename unknown>:0 
  at System.Net.HttpWebRequest.GetResponse () [0x00000] in <filename unknown>:0 
  at System.Net.WebClient.GetWebResponse (System.Net.WebRequest request) [0x00000] in <filename unknown>:0 
  at System.Net.WebClient.ReadAll (System.Net.WebRequest request, System.Object userToken) [0x00000] in <filename unknown>:0 
  at System.Net.WebClient.DownloadDataCore (System.Uri address, System.Object userToken) [0x00000] in <filename unknown>:0 
[ERROR] FATAL UNHANDLED EXCEPTION: System.Net.WebException: Error getting response stream (Write: The authentication or decryption has failed.): SendFailure ---> System.IO.IOException: The authentication or decryption has failed. ---> Mono.Security.Protocol.Tls.TlsException: Invalid certificate received from server. Error code: 0xffffffff800b010a
  at Mono.Security.Protocol.Tls.Handshake.Client.TlsServerCertificate.RemoteValidation (Mono.Security.Protocol.Tls.ClientContext context, AlertDescription description) [0x00000] in <filename unknown>:0 
  at Mono.Security.Protocol.Tls.Handshake.Client.TlsServerCertificate.validateCertificates (Mono.Security.X509.X509CertificateCollection certificates) [0x00000] in <filename unknown>:0 
  at Mono.Security.Protocol.Tls.Handshake.Client.TlsServerCertificate.ProcessAsTls1 () [0x00000] in <filename unknown>:0 
  at Mono.Security.Protocol.Tls.Handshake.HandshakeMessage.Process () [0x00000] in <filename unknown>:0 
  at (wrapper remoting-invoke-with-check) Mono.Security.Protocol.Tls.Handshake.HandshakeMessage:Process ()
  at Mono.Security.Protocol.Tls.ClientRecordProtocol.ProcessHandshakeMessage (Mono.Security.Protocol.Tls.TlsStream handMsg) [0x00000] in <filename unknown>:0 
  at Mono.Security.Protocol.Tls.RecordProtocol.InternalReceiveRecordCallback (IAsyncResult asyncResult) [0x00000] in <filename unknown>:0 
  --- End of inner exception stack trace ---
  at Mono.Security.Protocol.Tls.SslStreamBase.AsyncHandshakeCallback (IAsyncResult asyncResult) [0x00000] in <filename unknown>:0 
  --- End of inner exception stack trace ---
  at System.Net.HttpWebRequest.EndGetResponse (IAsyncResult asyncResult) [0x00000] in <filename unknown>:0 
  at System.Net.HttpWebRequest.GetResponse () [0x00000] in <filename unknown>:0 
  at System.Net.WebClient.GetWebResponse (System.Net.WebRequest request) [0x00000] in <filename unknown>:0 
  at System.Net.WebClient.ReadAll (System.Net.WebRequest request, System.Object userToken) [0x00000] in <filename unknown>:0 
  at System.Net.WebClient.DownloadDataCore (System.Uri address, System.Object userToken) [0x00000] in <filename unknown>:0

I tried to remedy that by adding the certificate /etc/ssl/certs/GlobalSign_Organization_Validation_CA_-_SHA256_-_G2.pem with openssl x509 -in /etc/ssl/certs/GlobalSign_Organization_Validation_CA_-_SHA256_-_G2.pem -out a.crt -outform der and certmgr -add -c Trust a.crt (I believe that is the certificate missing?), but that didn't change anything.

@MaxBioHazard, you wrote: "I have many other C# bots that run with mono from Labs, and none of them have this lag." Do these other bots run as part of the same tool (mbh) or as separate ones? If the latter, what are the names of tools where this does not happen?

scfc triaged this task as Low priority.Jan 30 2017, 6:25 AM
scfc added a project: Toolforge.
scfc moved this task from Triage to Backlog on the Toolforge board.
MBH raised the priority of this task from Low to Normal.Jan 30 2017, 1:49 PM

All these bots runs under mbh account.

In last 3 months this bot works unstable, occasionally don't save pages (see https://ru.wikipedia.org/wiki/module:NumberOf/today?action=history and https://ru.wikipedia.org/wiki/module:NumberOf/data?action=history , first page should be updated every day, second one - every 2 hours, crontab "0 */2 * * * jsub -N numof_upd -mem 1G mono /data/project/mbh/bots/numof_updater.exe > /dev/null"). numof_upd.err file contains a lot of error types: "Invalid CSRF token", "Login failed. Check your username and password", "The remote server returned an error: (503) Backend fetch failed", "Error getting response stream (Write: BeginWrite failure): SendFailure ---> System.IO.IOException: BeginWrite failure", "The request timed out" etc.

There is important factor, that I don't mention earlier: this problem occurs when process runs from cron, but don't occurs when I run process directly. Last 2 edits on /data page ( :42 and :46 timestamps) done by bot, runned directly.

In T147109#2981777, @MaxBioHazard wrote:

There is important factor, that I don't mention earlier: this problem occurs when process runs from cron, but don't occurs when I run process directly. Last 2 edits on /data page ( :42 and :46 timestamps) done by bot, runned directly.

Where do you mean by run directly? More specifically: do they run on grid with jsub (and if so, is there release=precise?), or do they run on bastion without jsub?

MBH added a comment.Jan 30 2017, 2:00 PM

Without jsub. I never specify release. I run bot directly with jsub now to test.

MBH added a comment.Jan 30 2017, 2:21 PM

Problem occurs, when bot runned directly (not from cron) with jsub.

scfc added a comment.Jan 31 2017, 3:24 AM

If an error occurs when a task is run on the grid (with jsub/cron), but not interactively on the bastion hosts, this often means that the job requires more memory than specified via the -mem option. You could try increasing that request by 1 GByte or so, and look if the error still occurs.

Side note: In general, Mono/C# is rarely used in Toolforge. In the recent survey less than 2 % of the 175 participants stated that they prefer to program in that language, i. e. three people in total (and I'm not one of them :-)). That means almost nobody can help with problems because the probability that they have seen that problem before (and found a solution for it) is very small. If the source code of your bot(s) is not very large (yet), I would consider rewriting the logic for example in Python with Pywikibot. For simple scripts this should be easy, and there are a lot more people who can help if you have questions.

MBH closed this task as Resolved.Aug 27 2019, 1:10 PM
MBH claimed this task.

Looks like it is fixed now (maybe because of updating mono version on Toolforge). Now this bot works very fast.