IABot is over-ingesting template data
Closed, Resolved · Public · BUG REPORT

Description

IABot is ingesting far more data than it needs for its CS1 configuration. This is causing performance issues and needless bloat.
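
For context on the fix: the bot only needs the parameter names and aliases of the CS1 citation templates, not every field of their metadata. A minimal sketch of that kind of selective ingestion, assuming the data comes through the MediaWiki TemplateData API (the API is real; the helper is hypothetical, and Python is used purely for illustration since IABot itself is written in PHP):

```
# Hypothetical sketch: fetch CS1 template metadata and keep only the
# fields the bot actually needs, instead of ingesting everything.
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_cs1_params(template_title):
    """Return only the parameter names and aliases for one CS1 template."""
    resp = requests.get(API, params={
        "action": "templatedata",
        "titles": template_title,
        "format": "json",
    }, timeout=30)
    resp.raise_for_status()
    params = {}
    for page in resp.json().get("pages", {}).values():
        for name, info in page.get("params", {}).items():
            # Keep names and aliases; drop descriptions, examples, and
            # other payload that only bloats the cached configuration.
            params[name] = info.get("aliases", [])
    return params

print(fetch_cs1_params("Template:Cite web"))
```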

Event Timeline

From the other ticket, now closed:

Thanks for the work in getting the IABot working again. Several articles I tried this morning worked fine, but one article, Tellico Dam, didn't work. The IABot screen went blank, and when I refreshed there was a red banner error message: "Missing token: The required token was missing for this request. Please try your request again." I have no idea what this means, or whether there is anything I can do. Any help would be greatly appreciated.

I am also occasionally (but not always) getting this error. I assume it is related to the same underlying issue.

If the continuing problem relates to running the IABot on larger articles, this is ironic and frustrating, because it is with larger pages that the bot can deliver the most benefit. These larger articles represent a huge investment of time by the volunteer editor community. Sources have been added to them over time, and may not have been protected by archiving. They often have relatively high page views, and some may be candidates for GA, making it all the more important that we protect them against link rot. Yet the tedious and repetitive work involved in manually adding archives to a large number of references is a real disincentive.
I recognise that the underlying issue with this bot is no doubt difficult, and that work is taking place in good faith to address it. However, is there a case for bringing in new expertise to share the load and perhaps provide new insights? Does WMF officially know about the continuing difficulties with this valuable tool?

What's happening here is that, because the bot is taking longer to work on the article, it's hitting the server-side execution time limit and timing out. Ordinarily, large articles are blocked from even running on the tool, and the user is instead directed to run a bot job with the desired article(s). I can try to push the time limit a little higher to get around this, but that will only be temporary. The limit exists to prevent rogue processes from indefinitely hanging up a connection resource, which would end up blocking someone else from using the tool.
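
To illustrate the mechanism described above, here is a hedged sketch of a per-request wall-clock limit enforced with SIGALRM (Unix-only, main thread only). The limit value and names are illustrative assumptions, not IABot's actual implementation:

```
# Illustrative only: cap the wall-clock time of one analysis request so a
# slow job cannot hold a connection resource open indefinitely.
import signal

TIME_LIMIT_SECONDS = 300  # assumed limit, not IABot's real value

class RequestTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    raise RequestTimeout("execution time limit exceeded")

def run_with_limit(job, *args):
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(TIME_LIMIT_SECONDS)
    try:
        return job(*args)
    except RequestTimeout:
        # Too big for the interactive tool: the user should queue a
        # bot job for the article instead.
        return None
    finally:
        signal.alarm(0)  # always clear any pending alarm
```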

Thanks, I just tried to analyse the article Tellico Dam again, but no good; same message about the missing token. If I understand your comment about a bot job correctly, does it refer to scheduling in a queue? I have set up a queued job for this article to see if that does the trick.

Try one more time. There was a small bug causing a fatal error in the workaround I deployed.

It worked! It has successfully analysed the article Tellico Dam, where it was previously failing. It analysed 44 links, rescued 40, and added 12 archives. It took 703 minutes to run (longer than I would have expected prior to the recent problems), but it worked :). I have yet to check through the archives, but they look good on a quick inspection. Many thanks, M

703 MINUTES???!?!?!? Holy bananas, that is way too long. That's 11.7 hours.

Oops, it was 703 seconds, sorry. I was thinking about how many minutes that was when I wrote the comment, and ended up typing minutes instead of seconds. The article was "raw": none of the references had citations, so that seems like a lot of work. Even if it was slower than previously, I was still really happy to have it work. Many thanks. M

Bot seems to be working pretty quickly on large numbers of links now. It acted oddly with 'Russian occupation of Kharkiv oblast' for me (300+ seconds for 1 link?) but did 52 links on 'Climate change and cities' in only about 120 seconds and also did well on a number of other articles. Thanks for the improvements!

I don't know if this is the right place to discuss this, but now I get a 503 error every time I try to open IABot. Is it connected to this over-ingestion of template data?

It's now coming up with "DB ERROR : QUERY: CREATE DATABASE IF NOT EXISTS s51059__cyberbot; ERROR - : Error encountered while creating the database. Exiting...".
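
For reference, that failing bootstrap step can be reproduced and diagnosed in isolation. A minimal sketch, assuming a MySQL connection via pymysql; the database name comes from the error message above, while the host and credentials are placeholders:

```
# Sketch of the failing statement, with the underlying MySQL error surfaced
# instead of the opaque "Error encountered while creating the database".
import pymysql

conn = pymysql.connect(host="tools.db.svc.wikimedia.cloud",  # assumed host
                       user="s51059", password="...")        # placeholders
try:
    with conn.cursor() as cur:
        # The exact statement from the error message above.
        cur.execute("CREATE DATABASE IF NOT EXISTS s51059__cyberbot;")
except pymysql.MySQLError as exc:
    # Printing the real error code/message makes the failure actionable
    # (e.g. a permissions problem vs. an unreachable database server).
    print(f"DB ERROR while creating the database: {exc}")
finally:
    conn.close()
```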

I have found the same error. Let's hope this valuable tool can be brought back online again soon. I won't notify my local editor community as yet, and will wait to see what happens.

Update: I was able to start the IABot, but when I ran it on a page that needed archives added, it made no changes.

I just ran the bot as well, but (somewhat surprisingly) didn't run into this problem; it worked fine for me. Hopefully it can improve from hit-and-miss to working normally again soon.

I'm not aware of any issues that should prevent the bot from doing what it's supposed to do. At this point, this ticket is merely addressing the major performance penalty the bot is currently incurring as a result of the data-ingestion bug.

The 503 errors and DB ERRORs you've been getting were unrelated issues.

I don't know if this is the right place to discuss this, but today I ran the bot twice and it claimed success both times, yet no links were analysed. Is this because of the over-ingestion of template data?

It seems to be working again; I'm not having any problems.

Cyberpower678 claimed this task.

It's recently stopped working for me again, and it's happening consistently. Is anyone else having the same issue?