
metadata param in PageGenerator might lead to huge data transfer
Open, Needs Triage, Public

Description

A simple listpages.py query on a category full of djvu files requires a lot of bandwidth.
Moreover, pages are duplicated (this should be solvable with https://gerrit.wikimedia.org/r/#/c/174827/, I think)

metadata should be used with care, I guess.
I cannot recall why it is used by default, there must have been a good reason ...

user@pc:~/python/core {listpages}$ python scripts/listpages.py -cat:'Works of Jules Verne (1911)' -lang:commons -family:commons
WARNING: API warning (result): This result was truncated because it would otherwise  be larger than the limit of 12,582,912 bytes
   1 Works of Jules Verne - Parke - Vol 1.djvu
   2 Works of Jules Verne - Parke - Vol 10.djvu
   3 Works of Jules Verne - Parke - Vol 11.djvu
   4 Works of Jules Verne - Parke - Vol 12.djvu
   5 Works of Jules Verne - Parke - Vol 13.djvu
   6 Works of Jules Verne - Parke - Vol 14.djvu
   7 Works of Jules Verne - Parke - Vol 15.djvu
   8 Works of Jules Verne - Parke - Vol 2.djvu
   9 Works of Jules Verne - Parke - Vol 3.djvu
  10 Works of Jules Verne - Parke - Vol 4.djvu
  11 Works of Jules Verne - Parke - Vol 5.djvu
  12 Works of Jules Verne - Parke - Vol 6.djvu
  13 Works of Jules Verne - Parke - Vol 7.djvu
  14 Works of Jules Verne - Parke - Vol 8.djvu
  15 Works of Jules Verne - Parke - Vol 9.djvu
  16 Works of Jules Verne - Parke - Vol 1.djvu
  17 Works of Jules Verne - Parke - Vol 10.djvu
  18 Works of Jules Verne - Parke - Vol 11.djvu
  19 Works of Jules Verne - Parke - Vol 12.djvu
  20 Works of Jules Verne - Parke - Vol 13.djvu
  21 Works of Jules Verne - Parke - Vol 14.djvu
  22 Works of Jules Verne - Parke - Vol 15.djvu
  23 Works of Jules Verne - Parke - Vol 2.djvu
  24 Works of Jules Verne - Parke - Vol 3.djvu
  25 Works of Jules Verne - Parke - Vol 4.djvu
  26 Works of Jules Verne - Parke - Vol 5.djvu
  27 Works of Jules Verne - Parke - Vol 6.djvu
  28 Works of Jules Verne - Parke - Vol 7.djvu
  29 Works of Jules Verne - Parke - Vol 8.djvu
  30 Works of Jules Verne - Parke - Vol 9.djvu
30 page(s) found
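
The listing above shows each of the 15 files twice. As a minimal sketch of the kind of filtering that would hide the repeats on the client side, the generator below yields each title only once. The name unique_titles is hypothetical and is not a Pywikibot function; it works on plain strings and only illustrates the idea, not the change proposed in the linked Gerrit patch.

```python
def unique_titles(titles):
    """Yield each title once, keeping the order of first appearance."""
    seen = set()
    for title in titles:
        if title not in seen:
            seen.add(title)
            yield title


# Shortened sample of the duplicated listing (titles taken from the output above).
listing = [
    "Works of Jules Verne - Parke - Vol 1.djvu",
    "Works of Jules Verne - Parke - Vol 10.djvu",
    "Works of Jules Verne - Parke - Vol 1.djvu",
    "Works of Jules Verne - Parke - Vol 10.djvu",
]

for number, title in enumerate(unique_titles(listing), start=1):
    print(f"{number:4d} {title}")
```

Filtering like this only hides the duplicates; it does not reduce the amount of data the API sends back.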

Event Timeline

Mpaa created this task. Oct 7 2015, 7:22 PM
Mpaa raised the priority of this task to Needs Triage.
Mpaa updated the task description.
Mpaa added a project: Pywikibot.
Mpaa added a subscriber: Mpaa.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Oct 7 2015, 7:22 PM
XZise added a subscriber: XZise. Oct 7 2015, 8:15 PM

If I remember correctly, the issue is that the metadata is transferred unsolicited, but I can't find the task for it.

XZise added a comment. Oct 7 2015, 8:22 PM

Okay, I looked through my IRC logs and found something from June. pywikibot.data.api.PageGenerator always adds iiprop=metadata, which pulls in a lot of data for djvu files such as File:Alberti - De re aedificatoria, 1541.djvu.
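
To make the effect of iiprop=metadata concrete, here is a rough sketch of how the MediaWiki API query parameters could be built with metadata left out unless the caller asks for it. The function imageinfo_query and the DEFAULT_IIPROP tuple are hypothetical names for illustration; the tuple is an assumed property list, not necessarily the one PageGenerator uses. The parameter names themselves (action, generator, gcmtitle, prop, iiprop) are standard MediaWiki API parameters.

```python
from urllib.parse import urlencode

# Assumed property list for illustration only; not copied from Pywikibot.
DEFAULT_IIPROP = ("timestamp", "user", "comment", "url", "size", "sha1", "metadata")


def imageinfo_query(category, iiprop=DEFAULT_IIPROP, include_metadata=False):
    """Build query parameters, dropping 'metadata' unless explicitly requested."""
    props = [p for p in iiprop if include_metadata or p != "metadata"]
    return {
        "action": "query",
        "generator": "categorymembers",
        "gcmtitle": "Category:" + category,
        "prop": "imageinfo",
        "iiprop": "|".join(props),
        "format": "json",
    }


params = imageinfo_query("Works of Jules Verne (1911)")
print(urlencode(params))
```

Making metadata opt-in like this would keep ordinary listings small, while callers that really need the embedded file metadata could still request it with include_metadata=True.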