
Petscan is being used with excessive parallelism by a user on Wikidata
Closed, Resolved, Public

Description

See the background: https://www.wikidata.org/wiki/User_talk:GZWDer#GZWDer_.28flood.29

The tool was creating 300-400 new pages per minute, so many that it was creating too many database locks with itself.

According to the user, it ran with 5 parallel threads. That is not OK for our API, whose etiquette suggests making only serial requests.

While it was not causing major infrastructure issues, it creates problems for other users. The users, however, cannot change the source code as this is a hosted tool. Can this be limited in any way? Globally? Per user? Per wiki? Per shard?
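For illustration, the serial-request etiquette mentioned above could be sketched as follows. This is a minimal sketch under assumptions, not PetScan's code: `run_serially`, its parameters, and the one-second default delay are hypothetical names and values chosen for this example, and each "request" is just a callable standing in for an API call.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def run_serially(
    requests: Iterable[Callable[[], T]],
    delay_seconds: float = 1.0,  # hypothetical default, not a documented limit
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Run each request one at a time, pausing between consecutive requests.

    A request only starts after the previous one has returned, so there is
    never more than one request in flight from this caller.
    """
    results: List[T] = []
    for index, send in enumerate(requests):
        if index > 0:
            # Wait before every request except the first.
            sleep(delay_seconds)
        results.append(send())
    return results
```

A caller would pass one callable per page to create, for example `run_serially([create_a, create_b], delay_seconds=2.0)`, where `create_a` and `create_b` are placeholders for whatever performs the edit. Note that this only serializes requests within one process; it does nothing about a second tab or a second tool running at the same time.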

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

The five threads are for bot accounts only. Normal user accounts get a single thread with a delay.
I have used my own bot account a lot over the years, with previous tools (also 5 threads) and PetScan, without issue.
As far as I understand, GZWDer used (at least) two tabs running in parallel. Even if I reduce that to a single thread, he could just open 10 tabs. This is not a code issue, but a user issue.

@Magnus, as you can see in the discussion, I agreed with you initially, and in no way am I holding you responsible for this particular incident. However, the user says he doesn't have the *option* of running it slower, in a single thread, if I understood correctly.

So consider this a *feature request* from that user, which I happen to be indirectly interested in.

Just run it as a normal user and not a bot user!

I will see what the user responds, and act depending on it.

Running it as a normal user will flood Recent Changes. See https://www.wikidata.org/wiki/Wikidata:Administrators%27_noticeboard/Archive/2014/05#Flooding_of_Special:RecentChanges

However, the current Recent Changes is worse than in 2014.

I try to limit the negative effect of running the tools.

At the beginning, at most 6-7 tabs were running. After the warning, I kept only one tab. Now I use (at most) two tabs, and I would like jcrespo to report any issues on my talk page. Currently no problems are occurring.

Oops, this user is not active on Phabricator.

" 6-7 tabs are run"

I believe we found the root problem :-)

That was earlier, when I didn't know how many tabs could be run at most, or what problems would occur when too many tabs were running. Since the warning, at most two tabs have been running.

I think we agreed to use only one "tab" at a time to follow API:Etiquette. I will block all your queries if they continue producing errors in the next 10 minutes, as I have warned you 3 or 4 times.

Currently (and since 20+ minutes ago) only one is running.

jcrespo claimed this task.

Thank you. I see a lower number of errors in the last 20 minutes. I will be monitoring the logs in case the errors return.

jcrespo renamed this task from "Petscan is running too fast for Wikidata" to "Petscan is being used with excessive parallelism by a user on Wikidata". · Jul 7 2016, 6:21 PM

This is still ongoing.

As there are issues even with only one tab, the code must be modified.

Bugreporter changed the task status from Open to Stalled. · Jul 11 2016, 9:34 AM

There is nothing I can do other than stop creating items (and probably other semi-automatic work) until https://bitbucket.org/magnusmanske/petscan/issues/48/make-number-of-threads-an-option-for-bot is fixed.

Added option for bot-account users to set concurrent threads (1-5).
https://bitbucket.org/magnusmanske/petscan/commits/dbefbbd3dab0
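As a rough illustration of what such an option implies (this is not taken from the linked commit, whose actual code may differ), a configurable thread count can be clamped to the 1-5 range for bot accounts, with normal accounts held to a single thread. The function names, the `is_bot` flag, and the use of a thread pool are all assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

MIN_THREADS = 1
MAX_THREADS = 5  # upper bound taken from the "1-5" range stated above


def clamp_thread_count(requested: int, is_bot: bool) -> int:
    """Return the number of worker threads this account may use.

    Assumption for this sketch: non-bot accounts always get one thread,
    and bot accounts get their requested value limited to 1..5.
    """
    if not is_bot:
        return MIN_THREADS
    return max(MIN_THREADS, min(MAX_THREADS, int(requested)))


def run_edits(
    edits: Sequence[Callable[[], T]], requested_threads: int, is_bot: bool
) -> List[T]:
    """Run the edit callables with at most the allowed number of threads."""
    workers = clamp_thread_count(requested_threads, is_bot)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input callables.
        return list(pool.map(lambda edit: edit(), edits))
```

With this shape, a bot user who wants to follow the serial-request etiquette would pass `requested_threads=1`. As noted earlier in the thread, a per-run limit like this cannot stop a user from opening several tabs at once.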