
Frequent chunk-too-small errors
Closed, Duplicate · Public

Description

During the UK Legislation batch upload project, a significant number of files are being rejected by the API with 'chunk-too-small'. Is there a work-around or a fix that could be applied for this Pywikibot-based mass upload?

This error has not been a problem for uploads of image mimetypes, but appears quite likely to occur for document mimetypes.

Event Timeline

I think it is related to T132676; I checked the first file and it ends in '\r'.
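
For anyone wanting to repeat that check, a minimal sketch (the filename is the test file linked further down in this task):

import os

# Read the last byte of the local PDF; the affected files end in b'\r'.
with open('ukpga_18440061_en.pdf', 'rb') as f:
    f.seek(-1, os.SEEK_END)
    print(repr(f.read(1)))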

As my 'personal' work-around, just for UK Legislation PDFs that the API flags with chunk-too-small and that fail on a second upload attempt, the PDF is trimmed of its final byte and the upload is re-attempted. In my view this is a terrible hack rather than a fix.

However, this initially appears to work, with the files both uploading and displaying successfully, though it may later cause unpredictable errors as it is hardly an intelligent fix. See this category for examples.

Code snippet:

import os

# uptry() is the uploader helper used throughout this batch job; it
# returns the API error code when an upload fails.
rec = uptry(local, fn, dd, comment, False)
if rec == 'chunk-too-small':
    print "Chunk-too-small, so trying trimming off 1 byte"
    # Trim the final byte of the local file in place, then retry with a
    # tracking category added to the file description.
    with open(local, 'rb+') as filehandle:
        filehandle.seek(-1, os.SEEK_END)
        filehandle.truncate()
    rec = uptry(local, fn, dd + "\n[[Category:Work around of byte trimmed for chunk-too-small API error]]", comment, False)
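
A slightly safer variant of the same hack would be to trim a temporary copy rather than the original file; a sketch, using the same uptry() helper as above:

import os
import shutil
import tempfile

def upload_trimmed(local, fn, dd, comment):
    # Copy the file, trim the copy's final byte and retry the upload,
    # leaving the original file untouched.
    handle, tmp = tempfile.mkstemp(suffix='.pdf')
    os.close(handle)
    shutil.copyfile(local, tmp)
    with open(tmp, 'rb+') as filehandle:
        filehandle.seek(-1, os.SEEK_END)
        filehandle.truncate()
    try:
        return uptry(tmp, fn, dd, comment, False)
    finally:
        os.remove(tmp)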

Could not reproduce. Please provide at least the following information:

  • Operating system
  • Python environment and version (import sys; print(sys.version))
  • Pywikibot version (import pywikibot; print(pywikibot.__version__))
  • Relevant code or command used, including the chunk size configuration
  • Complete logs of upload attempt (VERBOSE-level or lower preferred)
  • Hash of the file that could not be uploaded

It would also be very useful if you could provide the request and response headers for the failed chunk upload, including the exact size of the chunk. Information about if the file appears in Special:UploadStash would also be helpful.
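
For the hash requested above: MediaWiki stores files by SHA-1, so that is the most useful digest to report. A minimal sketch:

import hashlib

def file_sha1(path, blocksize=1 << 20):
    # Stream the file in 1 MiB blocks so large PDFs need not fit in memory.
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(blocksize), b''):
            digest.update(block)
    return digest.hexdigest()

print(file_sha1('ukpga_18440061_en.pdf'))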

Could not reproduce. Please provide at least the following information:

  • Operating system

Ubuntu Release 18.04.5 LTS (Bionic Beaver) 64-bit

  • Python environment and version (import sys; print(sys.version))

2.7.17 (default, Sep 30 2020, 13:38:04) [GCC 7.5.0]

  • Pywikibot version (import pywikibot; print(pywikibot.__version__))

3.1.dev0

  • Relevant code or command used, including the chunk size configuration

site.upload(pywikibot.FilePage(site, 'File:' + pagetitle),
            source_filename=source_filename,
            source_url=source_url,
            comment=comment,
            text=desc,
            ignore_warnings=False,
            chunk_size=400000,  # 1048576
            # async=True,
            )

  • Complete logs of upload attempt (VERBOSE-level or lower preferred)

pywikibot.data.api.APIError: chunk-too-small: Minimum chunk size is 1,024 bytes for non-final chunks. [help:See https://commons.wikimedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.]
  • Hash of the file that could not be uploaded

Test file is https://www.legislation.gov.uk/ukpga/1844/61/pdfs/ukpga_18440061_en.pdf
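
A side note for reproduction: the error above says the 1,024-byte minimum applies to non-final chunks only, so a small last chunk should normally be accepted. A quick sketch to compute the expected final chunk size for the configured chunk_size:

import os

chunk_size = 400000  # as configured in the snippet above

size = os.path.getsize('ukpga_18440061_en.pdf')
final_chunk = size % chunk_size or chunk_size
print('%d bytes total, expected final chunk of %d bytes' % (size, final_chunk))
# If this final chunk is well under 1,024 bytes and is still rejected, the
# server is apparently not recognising it as the final chunk.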

It would also be very useful if you could provide the request and response headers for the failed chunk upload, including the exact size of the chunk. Information about if the file appears in Special:UploadStash would also be helpful.

Can't recall how to dig out the WMF server stash log. This upload was under User:Fæ and would have been at 2020-10-25 10:37 UK time.

@Fae, I proposed a fix/hack at T132676.
As you have several cases, I would appreciate it if you could try it and provide feedback.

I think a better workaround, if it works, is to use source_url in site.upload().
It delegates the task of fetching the file to the API; if it works, the file is hopefully the original.
See https://en.wikisource.org/w/api.php?action=help&modules=upload
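
For reference, a minimal sketch of that approach, reusing pagetitle, comment and desc from the earlier snippet, and assuming upload-by-URL is permitted for the target wiki and the source domain:

import pywikibot

site = pywikibot.Site('commons', 'commons')
page = pywikibot.FilePage(site, 'File:' + pagetitle)
# Hand the URL to the server and let it fetch the file itself, so no
# client-side chunking is involved:
site.upload(page,
            source_url='https://www.legislation.gov.uk/ukpga/1844/61/pdfs/ukpga_18440061_en.pdf',
            comment=comment,
            text=desc,
            ignore_warnings=False)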

That can only happen when T265690 is complete.