Upload problems : Slow / timeouts
Closed, ResolvedPublic

Description

I've been having some upload problems recently. I've tested this in different situations:

  • Upload Wizard upload is very slow (see also #30027)
  • Uploads using a bot from my home connection using the API will time out with larger files
  • Uploads using a bot from the Toolserver using the API will time out with larger files
  • Uploads from my (fast) work connection using classic Special:Upload and using the API will time out with larger files

Version: unspecified
Severity: critical

bzimport set Reference to bz30086.
Multichill created this task. Via Legacy, Jul 27 2011, 6:47 PM

Reedy added a comment. Via Conduit, Jul 27 2011, 6:53 PM

*** Bug 30027 has been marked as a duplicate of this bug. ***

brion added a comment. Via Conduit, Jul 28 2011, 11:33 PM

Do you have some sample files and sample code to upload them that regularly reproduces the timeouts?

bzimport added a comment. Via Conduit, Jul 31 2011, 9:21 PM

mattwj2002 wrote:

It has been very slow uploading to the Wikimedia Commons on both the basic upload and the regular upload.

In addition, I have been having problems uploading with upload.py. I have the newest version from Subversion.

It eventually uploads, but an upload that should take minutes is taking hours.

I have a 22 Mbps / 7 Mbps connection.

Here is a log of an example upload:

http://pastebin.com/A5Upvr31

Please fix this as soon as possible. Some people probably are not uploading because it is so slow. I think this issue is very important.

bzimport added a comment. Via Conduit, Jul 31 2011, 9:37 PM

mattwj2002 wrote:

Correction, the pastebin should be the following:

http://pastebin.com/A5Upvr31

Sorry!

bzimport added a comment. Via Conduit, Jul 31 2011, 9:40 PM

mattwj2002 wrote:

Bugzilla is having issues. When I post the link it is changing it. I am trying it with a space.

http://pastebin.com/ A5Upvr31

MarkAHershberger added a comment. Via Conduit, Aug 1 2011, 1:31 PM

The link problem is reported at Bug 30161.

brion added a comment. Via Conduit, Aug 3 2011, 2:20 PM

I'm definitely seeing a verrrry slow upload of a 78mb Ogg file to Commons, though I can't be sure whether it's the server end or the Wikimania network.

It seems to be spiking up briefly, then halting for a while, which could be an indication of lost packets delaying the upload stream as it waits to time out.

Peaks are 60-130 KB/sec, but ongoing rates are often ..... 6, 12, 25.

Prolineserver added a comment. Via Conduit, Aug 3 2011, 8:55 PM

I cannot upload files of ~2-3MB from the toolserver either:
Uploading file to commons:commons via API....
<urlopen error timed out>

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 2 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 4 minutes...
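
The retry intervals in this log (1, 2, 4 minutes) double after each failure. A minimal sketch of that kind of exponential backoff follows. It is not pywikipedia's actual code; `backoff_delays`, `retry`, the cap, and the attempt count are hypothetical names and values chosen for illustration.

```python
import time


def backoff_delays(first=1, factor=2, cap=30, attempts=5):
    """Yield wait times (in minutes) that grow by `factor`, limited to `cap`."""
    delay = first
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def retry(operation, delays, sleep=time.sleep):
    """Call `operation` until it succeeds or the delays run out.

    `operation` is any callable that raises OSError on a failed attempt
    (urllib's URLError is a subclass of OSError).
    """
    delays = iter(delays)
    while True:
        try:
            return operation()
        except OSError as error:
            minutes = next(delays, None)
            if minutes is None:
                raise  # no delays left: give up and surface the last error
            print(f"Retrying in {minutes} minutes... {error}")
            sleep(minutes * 60)


print(list(backoff_delays()))
```

Backoff like this keeps a client from hammering a struggling server, but it also means a persistent timeout costs over half an hour of waiting before the fifth retry.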
bzimport added a comment. Via Conduit, Aug 5 2011, 7:07 AM

ralf wrote:

In Commonist 0.4.17: "unexpected response: HTTP/1.0 502 Bad Gateway"

bzimport added a comment. Via Conduit, Aug 5 2011, 10:51 AM

sumanah wrote:

Just heard another report of this from Martina Nolte, who ten minutes ago tried again to upload via Commonist: "Commonist now starts to upload a second image and then fails with "HTTP/1.0 502 Bad Gateway"."

bzimport added a comment. Via Conduit, Aug 5 2011, 12:28 PM

kontakt wrote:

(In reply to comment #7)

> I can't be sure whether it's the server end or the Wikimania network.

It's not a Wikimania problem. "Homies" have had the same bug since 3 Aug.:
http://commons.wikimedia.org/wiki/Commons_talk:Tools/Commonist#Upload_problem
http://commons.wikimedia.org/wiki/Commons:Forum#unexpected_response:_HTTP.2F1.0_502_Bad_Gateway

bzimport added a comment. Via Conduit, Aug 5 2011, 1:34 PM

sumanah wrote:

Ryan Lane just mentioned to me that this seems like a problem with the Java app (Commonist?); any issues on the Wikimedia side seem fixed.

Reedy added a comment. Via Conduit, Aug 5 2011, 1:36 PM

(In reply to comment #12)

> Ryan Lane just mentioned to me that this seems like a problem with the Java app
> (Commonist?); any issues on the Wikimedia side seem fixed.

Mark fixed an issue with one of the API apaches being incorrectly configured earlier today. Waiting to see if that fixes the 502 issues we've been seeing.

Prolineserver added a comment. Via Conduit, Aug 5 2011, 1:41 PM

Still not working with pywikipedia from the toolserver:

Uploading file to commons:commons via API....
HTTPError: 502 Bad Gateway

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server is down. Retrying in 1 minutes...
Reedy added a comment. Via Conduit, Aug 5 2011, 1:44 PM

HTTPError: 502 Bad Gateway

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'.
bzimport added a comment. Via Conduit, Aug 5 2011, 4:16 PM

neilk wrote:

For everyone following this bug -- the 502/504 error issue is being separately tracked in #30201.

Reedy added a comment. Via Conduit, Aug 5 2011, 9:12 PM

The API-specific errors have been fixed. I wonder if this has any benefit for the upload issues.

bzimport added a comment. Via Conduit, Aug 6 2011, 10:46 AM

kontakt wrote:

Commonist upload runs perfectly now. Thanks to all who helped!

bzimport added a comment. Via Conduit, Aug 7 2011, 1:48 AM

inductiveload wrote:

Upload through the upload form is still pretty slow: 16 minutes for a 40MB file (i.e. an average speed of 42 kB/s). My connection is 7+ Mbps upload (tested just after uploading), and the Internet Archive uploads are as fast as I expected (about a minute), so it must be a Commons-related problem.
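
The 42 kB/s figure follows from dividing the file size by the elapsed time. A small sketch of that arithmetic, assuming decimal units (1 MB = 1000 kB); `average_speed_kb_per_s` is a hypothetical helper name, not part of any tool mentioned here.

```python
def average_speed_kb_per_s(size_mb, seconds):
    """Average transfer speed in kB/s for a file of `size_mb` megabytes."""
    return size_mb * 1000 / seconds


# 40 MB in 16 minutes, as reported in this comment
speed = average_speed_kb_per_s(40, 16 * 60)
print(f"measured: {speed:.1f} kB/s")

# For comparison: a 7 Mbit/s uplink moves at most 7 * 1000 / 8 kB/s
uplink = 7 * 1000 / 8
print(f"uplink ceiling: {uplink:.0f} kB/s")
print(f"best-case time for 40 MB: {40 * 1000 / uplink:.0f} s")
```

The gap between the measured rate and the uplink ceiling is what points away from the reporter's own connection as the bottleneck.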

I have heard, but not yet checked myself, that pywikipedia has upload token issues too.

Multichill added a comment. Via Conduit, Aug 8 2011, 9:29 AM

No guys. This is not resolved at all. I guess we have two problems giving the same result: Upload problems

  1. Bugged proxy giving 502's (solved in #30201)
  2. General slowness of upload

Just take a look at the timeline at https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=BotMultichillT to see how slow it is.
These are pictures uploaded from the toolserver.

MarkAHershberger added a comment. Via Conduit, Aug 8 2011, 7:21 PM

> Just take a look at the timeline at
> https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=BotMultichillT
> to see how slow it is.
> These are pictures uploaded from the toolserver.

I don't know anything about that bot, but, using the API, I charted
the time between uploads against the size of the uploads (the closest
approximation I could think of for speed). I did notice a little
slowdown yesterday but it seems to be back to normal now.
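
The approximation described here (upload size divided by the gap since the previous upload) could be computed roughly as follows. This is a sketch under assumptions, not the script that was actually used: the `(timestamp, size_bytes)` record shape and the function name are hypothetical, and the result is only a lower bound on speed because any idle time between uploads is counted as transfer time.

```python
from datetime import datetime


def approximate_speeds(uploads):
    """Return (timestamp, bytes_per_second) for each upload after the first.

    `uploads` is a list of (ISO 8601 timestamp, size in bytes) tuples.
    """
    parsed = sorted((datetime.fromisoformat(ts), size) for ts, size in uploads)
    speeds = []
    for (prev_time, _), (curr_time, size) in zip(parsed, parsed[1:]):
        gap = (curr_time - prev_time).total_seconds()
        if gap > 0:  # skip uploads logged in the same second
            speeds.append((curr_time.isoformat(), size / gap))
    return speeds


# Made-up sample records, for illustration only
sample = [
    ("2011-08-08T10:00:00", 2_000_000),
    ("2011-08-08T10:01:40", 3_000_000),
    ("2011-08-08T10:02:40", 1_500_000),
]
for when, speed in approximate_speeds(sample):
    print(when, f"{speed / 1000:.0f} kB/s")
```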

The timeline, AFAICT, does not support your assertion that something
is still unresolved.

Feel free to reopen and let me know what I should look for in the timeline of
that bot if you feel like there is still a problem.

bzimport added a comment. Via Conduit, Aug 8 2011, 7:36 PM

neilk wrote:

Mark: how far back does your chart go? Reedy believes this started to become an issue around July 23rd.

I'm more inclined to believe this is a real issue -- we're hearing about this from lots of people. It might be localized to Europe, like the last problems were.

MarkAHershberger added a comment. Via Conduit, Aug 8 2011, 7:38 PM

I'll post a chart for as far back as I can once I've generated it.

Multichill added a comment. Via Conduit, Aug 9 2011, 12:59 AM

I'm a user. I have a problem. I open an incident. Once the user confirms it is fixed, you close the incident; don't just close it *twice* because you think it's solved.

Commons upload is slow as hell, so yes, this is still an issue. So please, before you close this incident again: Verify with the user who reported this if it's really solved.

MarkAHershberger added a comment. Via Conduit, Aug 9 2011, 2:12 AM

(In reply to comment #22)

> Mark: how far back does your chart go? Reedy believes this started to become an
> issue around July 23rd.

I have data going back to March, now.

(In reply to comment #24)

> Commons upload is slow as hell, so yes, this is still an issue.

I'm just trying to get some numbers to back up these data-less assertions. I know people don't usually keep numbers like this handy, so I'm sympathetic to what you're saying. However, objective numbers are more reliable than user reports of "slowness". I'll work with NeilK to get some.

Prolineserver added a comment. Via Conduit, Aug 9 2011, 11:55 AM

I still get the following error:
Uploading file to commons:commons via API....
<urlopen error timed out>

WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 1 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 2 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 4 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 8 minutes...
<urlopen error timed out>
WARNING: Could not open 'http://commons.wikimedia.org/w/api.php'. Maybe the server or your connection is down. Retrying in 16 minutes...
MarkAHershberger added a comment. Via Conduit, Aug 9 2011, 4:47 PM

(In reply to comment #25)

> I have data going back to March, now.

And now, back to 2009 for BotMultichillT. I've posted the raw data at http://mah.everybody.org/chart.zip (8 MB). I have also asked a researcher if she could help with visualizing the data.

There are problems with it, so I'm going to see if I can clean it up some. I also saw problems with the API while generating the report.

Multichill added a comment. Via Conduit, Aug 9 2011, 6:40 PM

I did some tests using my office PC (very fast uplink). I downloaded two +/- 10MB files from http://www.openbeelden.nl/ . That took about 1 second.

Uploading a file through the upload wizard took about 6 minutes, i.e. roughly 30 KB/sec.
Uploading a file through the old upload seems to take about the same amount of time.

I'm coming from Europe (AS1103 to be exact). I wonder if someone from the USA could do the same test (download a file from http://www.openbeelden.nl/ and upload it to Commons and time it) to see if the problem might be location related.

Reedy added a comment. Via Conduit, Aug 9 2011, 6:49 PM

(In reply to comment #28)

> I'm coming from Europe (AS1103 to be exact). I wonder if someone from the USA
> could do the same test (download a file from http://www.openbeelden.nl/ and
> upload it to Commons and time it) to see if the problem might be location
> related.

I did note before somewhere, possibly on another bug, that all the reporters seemed EU-based, but it wasn't necessarily a complete survey.

bzimport added a comment. Via Conduit, Aug 9 2011, 7:33 PM

inductiveload wrote:

(In reply to comment #27)

> And now, back to 2009 for BotMultichillT.

A simple graph of the data from 2008-2011 with a 1000-element moving average can be seen at http://commons.wikimedia.org/wiki/File:Commons_upload_speeds_2008-2011.png

A graph of the data from 2011 with a 100-element moving average can be seen at http://commons.wikimedia.org/wiki/File:Commons_upload_speeds_2011.png

The moving average is very buggy and a couple of very fast outliers distort it badly, but a dramatic reduction can be seen firstly in January this year and again in July.
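
A few very fast outliers can dominate a moving average, as noted above. One common alternative is a rolling median, which is far less sensitive to isolated extreme values. A minimal sketch on made-up numbers (not the actual dataset); the function name and window size are illustrative assumptions.

```python
from statistics import mean, median


def rolling(values, window, reducer):
    """Apply `reducer` to each full window of `window` consecutive values."""
    return [
        reducer(values[i : i + window])
        for i in range(len(values) - window + 1)
    ]


# Made-up speeds in kB/s with one very fast outlier
speeds = [30, 28, 32, 5000, 29, 31, 27]

print("moving average:", [round(v) for v in rolling(speeds, 3, mean)])
print("rolling median:", rolling(speeds, 3, median))
```

With a window of 3, the single outlier inflates three consecutive averages, while the median never moves far from the typical speed.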

If it helps, I am based in the UK, but I have heard about this problem from American editors too.

bzimport added a comment. Via Conduit, Aug 9 2011, 8:40 PM

neilk wrote:

Just tried it from the USA. Uploading an 8.4MB file to Commons took about 5 minutes 20 seconds (320 seconds). So that should be about 26 KB/sec upload.

Download (from http://www.openbeelden.nl/) was much faster, and took about 30 seconds.

Re: the graph -- the reduction in upload speed might coincide with how we introduced UploadWizard. It may be that the API method has always been slower. That would not explain why the upload speed seems to be dramatically slowing down in recent months, since we haven't altered anything about the upload protocol recently.

Multichill -- I would like to see the same graph with outliers removed, if you please?

MarkAHershberger added a comment. Via Conduit, Aug 10 2011, 2:38 AM

(In reply to comment #31)

> Just tried it from the USA. Uploading an 8.4MB file to Commons took about 5
> minutes 20 seconds (320 seconds). So that should be about 26 KB/sec upload.
>
> Download (from http://www.openbeelden.nl/) was much faster, and took about 30
> seconds.

Upload vs download is, of course, not the same and depends on your provider.

Multichill added a comment. Via Conduit, Aug 10 2011, 8:54 PM

We did some debugging last night. The chain when uploading is:

Me -> Europe squid -> US squid -> application server (apache) -> NFS -> ms7

Multiple people on different continents have this problem so it's probably not the Europe squids.
NFS copy from the apache to the nfs share on ms7 is fast so that doesn't seem to be the bottleneck either.
Upload to http://test.wikipedia.org is fast, but upload to secure test is very slow (even slower than Commons).
Unsecure test uses different apaches than secure test or secure/unsecure Commons.

Could an operations person please look into this? Bumping this to highest because Commons is becoming unusable. Lots of reports are coming in.

Catrope added a comment. Via Conduit, Aug 10 2011, 8:56 PM

(In reply to comment #33)

> Upload to http://test.wikipedia.org is fast, but upload to secure test is very
> slow (even slower than Commons).

Slower than Commons, really? How does it compare to Commons via secure? I just wanna know whether we really are on to something here or whether we're just noticing a 'tax' being added by the secure gateway.

kaldari added a comment. Via Conduit, Aug 10 2011, 11:02 PM

I just tried uploading a 3.06MB file from the Toolserver to Commons via the API. It took a little over 2 minutes, so roughly equivalent to the speed Neil was reporting.

bzimport added a comment. Via Conduit, Aug 10 2011, 11:23 PM

juancho2291 wrote:

I'm from Colombia and I have the same problem.

Chad added a comment. Via Conduit, Aug 11 2011, 1:21 AM

We realize this issue is affecting many users and we're looking into various causes of the problem. If people could avoid "me too" style comments, that would help keep the signal:noise ratio up.

Multichill added a comment. Via Conduit, Aug 11 2011, 5:46 AM

Chad: That's what you get if a problem like this stays open for more than two weeks.

Who from the operations team is actually working on getting this fixed right now? The bug is assigned to Sam and AFAIK he's not on it.

Chad added a comment. Via Conduit, Aug 11 2011, 12:16 PM

(In reply to comment #38)

> Chad: That's what you get if a problem like this stays open for more than two
> weeks.

I understand the problem is frustrating, but +1s don't help :)

> Who from the operations team is actually working on getting this fixed right
> now? The bug is assigned to Sam and AFAIK he's not on it.

I was working with Roan, Sam, RobLa, and Asher last night on this (so there's 4 people plus me). We added some additional profiling late last night that today should give us some more insights.

bzimport added a comment. Via Conduit, Aug 12 2011, 5:04 PM

ebe123_wiki wrote:

Commons Helper is going so slow because of commons. Ebe123

bzimport added a comment. Via Conduit, Aug 12 2011, 5:09 PM

ebe123_wiki wrote:

Commons Helper is going so slow because of commons. Ebe123

(In reply to comment #22)

> Mark: how far back does your chart go? Reedy believes this started to become an
> issue around July 23rd.
>
> I'm more inclined to believe this is a real issue -- we're hearing about this
> from lots of people. It might be localized to Europe, like the last problems
> were.

I'm in Halifax, and it is taking forever. Ebe123

Reedy added a comment. Via Conduit, Aug 12 2011, 5:56 PM

(In reply to comment #41)

> Commons Helper is going so slow because of commons. Ebe123
>
> (In reply to comment #22)
> > Mark: how far back does your chart go? Reedy believes this started to become an
> > issue around July 23rd.
> >
> > I'm more inclined to believe this is a real issue -- we're hearing about this
> > from lots of people. It might be localized to Europe, like the last problems
> > were.
>
> I'm in Halifax, and it is taking forever. Ebe123

Halifax, Yorkshire or Halifax, Canada?

Also, this is still an issue for you? Operations believe this should be fixed (and was in their tests) as of a couple of hours ago.

kaldari added a comment. Via Conduit, Aug 12 2011, 11:18 PM

Uploading speed via the API seems to be about 3 times faster now. It would be nice if a baseline speed were defined which could be tested against (or a range of speeds), so that we don't have to rely on people deciding that uploading is just "too slow" and filing a bug before anyone takes notice.

Multichill added a comment. Via Conduit, Aug 13 2011, 10:03 AM

Over the last couple of days several people turned around the whole cluster trying to pinpoint the bottleneck. The Squids were ruled out, and ms7 and NFS were ruled out. It ended up being a low-level problem:

[11:57] <mark> it was a nasty problem with TSO/GRO being broken with linux 802.1q tagged interfaces
[11:57] <multichill> So really low level problem?
[11:58] <mark> yeah
[11:58] <mark> so, the nic on lvs4 was reassembling tcp packets into jumbo packets before presenting them to the OS
[11:58] <mark> after which LVS would forward them
[11:58] <mark> and then they wouldn't be split back up again by the nic after sending out
[11:58] <multichill> And fragmentation?
[11:58] <mark> and dropped as jumbo packets
[11:58] <mark> so, tcp delays, icmp "frag needed" messages being sent
[11:58] <mark> really hard to see because on the wire, they were < 1500 byte packages as usual
[12:00] <mark> the fix was disabling GRO on all lvs servers
[12:00] <mark> no idea why it was on by default anyway, on most servers it isn't
[12:00] <mark> probably some nic drivers enable it, most don't
[12:01] <mark> i bet TSO wasn't happening because of the added 802.1q vlan tag
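
On Linux, offload settings such as GRO are typically inspected with `ethtool -k <interface>` and changed with `ethtool -K <interface> gro off`. The sketch below only parses `ethtool -k` style output to report whether GRO is enabled. The sample text is illustrative rather than captured from the servers discussed here, the exact commands run on the lvs hosts are not stated in the log, and `gro_enabled` is a hypothetical helper name.

```python
def gro_enabled(ethtool_output):
    """Return True/False for the GRO state, or None if the line is absent."""
    for line in ethtool_output.splitlines():
        name, _, value = line.strip().partition(":")
        if name == "generic-receive-offload":
            # The value may carry a suffix such as "[fixed]"
            words = value.split()
            return words[0] == "on" if words else None
    return None


# Illustrative output in the format printed by `ethtool -k eth0`
sample = """\
Features for eth0:
rx-checksumming: on
tcp-segmentation-offload: off
generic-receive-offload: on
"""
print(gro_enabled(sample))
```

A check like this could be run across a fleet to spot hosts where a NIC driver has enabled GRO by default.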

Thanks everyone for debugging this problem. I confirmed on Commons that upload is fast again (17MB file uploaded in less than 10 seconds).

Closing this bug as resolved.

bzimport added a comment. Via Conduit, Aug 14 2011, 2:15 PM

ebe123_wiki wrote:

(In reply to comment #42)

> (In reply to comment #41)
> > Commons Helper is going so slow because of commons. Ebe123
> >
> > (In reply to comment #22)
> > > Mark: how far back does your chart go? Reedy believes this started to become an
> > > issue around July 23rd.
> > >
> > > I'm more inclined to believe this is a real issue -- we're hearing about this
> > > from lots of people. It might be localized to Europe, like the last problems
> > > were.
> >
> > I'm in Halifax, and it is taking forever. Ebe123
>
> Halifax, Yorkshire or Halifax Canada?
>
> Also, this is still an issue for you? Operations believe this should be fixed
> (and was in their tests) as of a couple of hours ago

Canada, the capital of Nova Scotia.

bzimport added a comment. Via Conduit, Aug 14 2011, 2:16 PM

ebe123_wiki wrote:

(In reply to comment #42)

> (In reply to comment #41)
> > Commons Helper is going so slow because of commons. Ebe123
> >
> > (In reply to comment #22)
> > > Mark: how far back does your chart go? Reedy believes this started to become an
> > > issue around July 23rd.
> > >
> > > I'm more inclined to believe this is a real issue -- we're hearing about this
> > > from lots of people. It might be localized to Europe, like the last problems
> > > were.
> >
> > I'm in Halifax, and it is taking forever. Ebe123
>
> Halifax, Yorkshire or Halifax Canada?
>
> Also, this is still an issue for you? Operations believe this should be fixed
> (and was in their tests) as of a couple of hours ago

Canada, the capital of Nova Scotia. It's still an issue.

Catrope added a comment. Via Conduit, Aug 14 2011, 2:22 PM

(In reply to comment #46)

> Canada, the capital of nova scotia. Its still an issue.

So how slow are uploads for you?

Multichill added a comment. Via Conduit, Aug 14 2011, 9:29 PM

I doubt this is server side. See for example how fast https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=US+National+Archives+bot is going.

Etienne: How fast is your internet connection (up and download)? What file size are you trying to upload and how long did this take?

Etienne: You are https://secure.wikimedia.org/wikipedia/commons/w/index.php?title=Special:ListFiles&user=Ebe123 right? What tool do you use for that? Maybe the tool is just slow (I know commonshelper can be very slow).....

bzimport added a comment. Via Conduit, Nov 2 2011, 8:13 PM

M8R-udfkkf wrote:

Uploads to https://commons.wikimedia.org/w/api.php time out (the response times out), whereas uploads to https://secure.wikimedia.org/wikipedia/commons/w/api.php work fine.

bzimport added a comment. Via Conduit, Nov 2 2011, 9:31 PM

neilk wrote:

Smallman: other people are using the API successfully... I think that issue has to be either transient or local to your own situation.

We can't just keep reopening the same bug any time somebody has a network issue connecting to Commons.

bzimport added a comment. Via Conduit, Nov 2 2011, 9:34 PM

neilk wrote:

I just want to clarify: I'm not saying your problem isn't real. I'm saying that we can't keep abusing Bugzilla so that we keep reopening the same bug for any and all network issues.

Please document your issue in a way we can replicate. Your issue seems to be some asymmetry between secure.wikimedia.org and https://commons, which might be a problem, but it's not THIS problem.

MarkAHershberger added a comment. Via Conduit, Nov 4 2011, 4:30 PM

(In reply to comment #49)

> Uploads to https://commons.wikimedia.org/w/api.php timeout (response times out)
> whereas uploads to https://secure.wikimedia.org/wikipedia/commons/w/api.php
> work fine.

That is a different problem than the one described here. Please open a new bug.

Multichill added a comment. Via Conduit, Nov 15 2011, 2:58 PM

Resolved invalid? Don't think so. This was fixed back in August.
