
First chunk can be smaller than 1024 bytes
Closed, Invalid (Public)

Description

When uploading a file in chunked mode it now complains that the chunk is too small, but only with the second chunk. I was able to upload the first chunk with only 512 bytes.

I also tried uploading the first chunk with 1024 bytes and the second with fewer than 1024 bytes, to verify that it isn't actually allowing any one smaller chunk (instead of only the last one).

I was using the API via Pywikibot and tried to upload a file in 512-byte chunks, which worked until recently. Interestingly, all tests in Gerrit 234851 worked except the chunked one (as of PS5); the others also upload in 512-byte chunks, but only the first chunk.

Event Timeline

XZise raised the priority of this task from to Needs Triage.
XZise updated the task description.
XZise added a subscriber: XZise.
Restricted Application added subscribers: Steinsplitter, Aklapper.

Can you provide the actual requests, or a set of pywikibot or (ideally) curl commands reproducing the issue?

Okay, here are the curl commands to reproduce it. If you are logged in, you can skip the login section. If you already have a CSRF token, you can also skip that section. The login will store the cookie in the cookie.lwp file.

Login

$ PASS=
$ NAME=
$ curl -b cookie.lwp -c cookie.lwp -d action=login -d lgname=$NAME -d lgpassword=$PASS -d format=json https://test.wikipedia.org/w/api.php
{"login":{"result":"NeedToken","token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","cookieprefix":"testwiki","sessionid":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}

Now use the token in the next request and you are logged in:

$ curl -b cookie.lwp -c cookie.lwp -d action=login -d lgname=$NAME -d lgpassword=$PASS -d lgtoken=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx -d format=json https://test.wikipedia.org/w/api.php
{"login":{"result":"Success","lguserid":x,"lgusername":"x","lgtoken":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","cookieprefix":"testwiki","sessionid":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}

CSRF token

Now request the CSRF token (I request the userinfo too, to verify that I'm logged in):

$ curl -b cookie.lwp -d action=query -d meta="userinfo|tokens" -d type=csrf -d format=json https://test.wikipedia.org/w/api.php
{"batchcomplete":"","query":{"userinfo":{"id":x,"name":"x"},"tokens":{"csrftoken":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx+\\"}}}
$ TOKEN="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx+\\"

Upload

So now we get to this bug. I'm using the MP_sounds.png test file and split it into three chunks:

$ wget https://raw.githubusercontent.com/wikimedia/pywikibot-core/master/tests/data/images/MP_sounds.png
$ split -a 1 --numeric-suffixes=1 -b 512 MP_sounds.png c512.
$ curl -b cookie.lwp -F format=json -F action=upload -F stash=1 -F filesize=1276 -F filename=File:MP_sounds-pwb.png -F token=$TOKEN -F offset=0 -F chunk=@c512.1 https://test.wikipedia.org/w/api.php
{"upload":{"result":"Warning","warnings":{"badfilename":"MP_sounds-pwb.png","exists":"MP_sounds-pwb.png"},"filekey":"13fzhwpp679k.rt8iip.28718.png","sessionkey":"13fzhwpp679k.rt8iip.28718.png"}}
$ FILEKEY=13fzhwpp679k.rt8iip.28718.png

It is important to store the filekey. We can also verify that something was actually uploaded using query+stashimageinfo:

$ curl -b cookie.lwp -d format=json -d action=query -d prop=stashimageinfo -d siifilekey=$FILEKEY -d siiprop="size|sha1|timestamp" https://test.wikipedia.org/w/api.php
{"batchcomplete":"","query":{"stashimageinfo":[{"timestamp":"2015-09-13T09:40:08Z","size":512,"width":24,"height":27,"sha1":"7c00fe81d1f2318b4311baf99219582ab037e00a"}]}}
$ sha1sum c512.*
7c00fe81d1f2318b4311baf99219582ab037e00a  c512.1
88c60bcac3807becfa4e386382d1e874952804ef  c512.2
2ed6566594d8893d55f5fbbc279f096ee157e28d  c512.3

So the uploaded chunk seems fine, now continue uploading the second chunk:

$ curl -b cookie.lwp -F format=json -F action=upload -F stash=1 -F filesize=1276 -F filename=File:MP_sounds-pwb.png -F token=$TOKEN -F offset=512 -F filekey=$FILEKEY -F chunk=@c512.2 -F ignorewarnings=1 https://test.wikipedia.org/w/api.php
{"servedby":"mw1017","error":{"code":"chunk-too-small","info":"Minimum chunk size is 1024 bytes for non-final chunks","*":"See https://test.wikipedia.org/w/api.php for API usage"}}

And tada the error.
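
The rule in that error message can be checked locally before any request is sent. The following is a minimal sketch, assuming the 1024-byte minimum quoted in the error above; `check_chunk_plan` is a hypothetical helper name, and it only does the offset arithmetic without contacting the API:

```shell
# check_chunk_plan FILESIZE CHUNKSIZE [MINSIZE]
# Hypothetical helper: prints the offset and length of every chunk and
# returns non-zero if any non-final chunk is smaller than MINSIZE
# (default 1024, the value quoted in the error message; an assumption,
# since a wiki may be configured differently).
check_chunk_plan() {
    filesize=$1
    chunksize=$2
    minsize=${3:-1024}
    if [ "$chunksize" -le 0 ]; then
        echo "chunk size must be positive" >&2
        return 2
    fi
    offset=0
    status=0
    while [ "$offset" -lt "$filesize" ]; do
        remaining=$((filesize - offset))
        if [ "$remaining" -gt "$chunksize" ]; then
            length=$chunksize
            final=no
        else
            # Whatever is left fits in one chunk, so this is the last one.
            length=$remaining
            final=yes
        fi
        if [ "$final" = no ] && [ "$length" -lt "$minsize" ]; then
            echo "offset=$offset length=$length TOO SMALL"
            status=1
        else
            echo "offset=$offset length=$length ok"
        fi
        offset=$((offset + length))
    done
    return $status
}
```

For the 1276-byte test file, `check_chunk_plan 1276 512` flags the chunks at offsets 0 and 512, while `check_chunk_plan 1276 1024` accepts the plan, because only the final 252-byte chunk falls below the minimum.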

Anomie claimed this task.
$ curl -b cookie.lwp -F format=json -F action=upload -F stash=1 -F filesize=1276 -F filename=File:MP_sounds-pwb.png -F token=$TOKEN -F offset=0 -F chunk=@c512.1 https://test.wikipedia.org/w/api.php
{"upload":{"result":"Warning","warnings":{"badfilename":"MP_sounds-pwb.png","exists":"MP_sounds-pwb.png"},"filekey":"13fzhwpp679k.rt8iip.28718.png","sessionkey":"13fzhwpp679k.rt8iip.28718.png"}}

You're not ignoring warnings here, so the badfilename warning prevents processing of the chunk. If you ignore warnings (or just don't trigger one) it properly complains about the chunk size.
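
A client can spot this case from the response alone. The following is a minimal sketch, assuming the response shape shown above; the helper names are hypothetical, and the sed-based extraction is a stand-in for a real JSON parser:

```shell
# upload_result JSON
# Hypothetical helper: prints the value of the "result" field from an
# action=upload response, or nothing if there is no such field (for
# example in an error response).
upload_result() {
    printf '%s\n' "$1" | sed -n 's/.*"result":"\([^"]*\)".*/\1/p'
}

# chunk_needs_retry JSON
# Succeeds if the response reports a warning, meaning the data was only
# stashed and not processed as a chunk.
chunk_needs_retry() {
    [ "$(upload_result "$1")" = "Warning" ]
}

# The response to the first chunk, copied from the transcript above.
response='{"upload":{"result":"Warning","warnings":{"badfilename":"MP_sounds-pwb.png","exists":"MP_sounds-pwb.png"},"filekey":"13fzhwpp679k.rt8iip.28718.png","sessionkey":"13fzhwpp679k.rt8iip.28718.png"}}'

if chunk_needs_retry "$response"; then
    echo "warning returned: resend this chunk with ignorewarnings=1"
fi
```

Checking the first response this way surfaces the problem immediately, rather than at the second chunk.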

It is important to store the filekey. We can also verify that something was actually uploaded using query+stashimageinfo:

$ curl -b cookie.lwp -d format=json -d action=query -d prop=stashimageinfo -d siifilekey=$FILEKEY -d siiprop="size|sha1|timestamp" https://test.wikipedia.org/w/api.php
{"batchcomplete":"","query":{"stashimageinfo":[{"timestamp":"2015-09-13T09:40:08Z","size":512,"width":24,"height":27,"sha1":"7c00fe81d1f2318b4311baf99219582ab037e00a"}]}}

That shows that it stashed your chunk, yes. But trying to check the status of the chunked upload shows there isn't actually one in progress:

$ curl -b cookie.lwp -c cookie.lwp -F format=json -F action=upload -F stash=1 -F filekey=$FILEKEY -F checkstatus=1 -F token=$TOKEN https://test.wikipedia.org/w/api.php
{"servedby":"mw1017","error":{"code":"missingresult","info":"No result in status data","*":"See https://test.wikipedia.org/w/api.php for API usage"}}

now continue uploading the second chunk:

$ curl -b cookie.lwp -F format=json -F action=upload -F stash=1 -F filesize=1276 -F filename=File:MP_sounds-pwb.png -F token=$TOKEN -F offset=512 -F filekey=$FILEKEY -F chunk=@c512.2 -F ignorewarnings=1 https://test.wikipedia.org/w/api.php
{"servedby":"mw1017","error":{"code":"chunk-too-small","info":"Minimum chunk size is 1024 bytes for non-final chunks","*":"See https://test.wikipedia.org/w/api.php for API usage"}}

This time you did ignore warnings, so it does get as far as checking that the chunk is valid.

If you try it with a full-sized chunk you get a different error:

$ curl -b cookie.lwp -F format=json -F action=upload -F stash=1 -F filesize=1276 -F filename=File:MP_sounds-pwb.png -F token=$TOKEN -F offset=512 -F filekey=$FILEKEY -F chunk=@c512.2+3 -F ignorewarnings=1 https://test.wikipedia.org/w/api.php
{"servedby":"mw1017","error":{"code":"stashfailed","info":"No chunked upload session with this key","*":"See https://test.wikipedia.org/w/api.php for API usage"}}

So what is the result from stashimageinfo good for? Afaik it should return that the chunk is invalid as well.

Additionally is there a security concern why it isn't further processed? Or should we just upload a 1 KiB chunk to check if any warnings happen so that we don't need to wait just to get an error?

So what is the result from stashimageinfo good for?

It shows you that the single "file" was stashed and gives you information about it. Which admittedly isn't terribly useful in this case since you can't currently use the stash as input into the chunked-uploading process. Feel free to file a feature request for that if you'd like.

Afaik it should return that the chunk is invalid as well.

Except it doesn't know that the stashed "file" was intended to be a chunk of a chunked upload.

Additionally is there a security concern why it isn't further processed?

Not that I know of, it's just the way the upload API works: if there's a non-ignored warning, it stashes whatever was uploaded and returns the warning without doing further processing.

Or should we just upload a 1 KiB chunk to check if any warnings happen so that we don't need to wait just to get an error?

Note that the 1 KiB limit might be increased in the future. The limit on any particular wiki may be queried from api.php?action=query&meta=siteinfo (the value is in "minuploadchunksize").
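
Reading that value from a script could look like the following. This is a minimal sketch: both helper names are hypothetical, and it assumes the JSON response carries "minuploadchunksize" as a bare number, which should be checked against a real response. The sed expression again stands in for a proper JSON parser.

```shell
# min_chunk_size_from_siteinfo JSON
# Hypothetical helper: prints the numeric "minuploadchunksize" value from
# a siteinfo response, or nothing if the field is absent.
# Assumption: the value is a bare JSON number, not a quoted string.
min_chunk_size_from_siteinfo() {
    printf '%s\n' "$1" | sed -n 's/.*"minuploadchunksize": *\([0-9][0-9]*\).*/\1/p'
}

# fetch_min_chunk_size API_URL
# Queries the wiki (network access required) and prints its minimum
# chunk size, using the siteinfo query named in the comment above.
fetch_min_chunk_size() {
    min_chunk_size_from_siteinfo \
        "$(curl -s -d action=query -d meta=siteinfo -d format=json "$1")"
}
```

A client would then run something like `MIN=$(fetch_min_chunk_size https://test.wikipedia.org/w/api.php)` once per wiki and size its non-final chunks to at least `$MIN` bytes instead of hard-coding 1024.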