
Circular continue response with generator categorymembers and large category
Open, Needs Triage | Public | BUG REPORT

Assigned To: None
Authored By: Bindestrich, Jan 21 2022, 12:31 PM
Referenced Files (all uploaded Jan 21 2022, 12:31 PM):
F34925660: grafik.png
F34925643: grafik.png
F34925664: grafik.png

Description

Steps to reproduce:

Make an API request for a large category and iterate through the results using continue.

Eventually the iteration hits a loop on the gcmcontinue value: the API returns a continue value that an earlier query already returned.

Sometimes the loop occurs only after several requests; in this example it occurs within a single request.
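The iteration described above can be sketched as a client loop that follows each continue object and stops as soon as a continuation state repeats. This is a minimal sketch, not the code I originally ran: the helper names fetch_json and iterate_with_continue and the User-Agent string are hypothetical. It compares the whole continue object rather than gcmcontinue alone because, as far as I understand the continuation scheme, with prop modules such as imageinfo the generator value can legitimately repeat while the prop module's own value advances.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

API_URL = "https://commons.wikimedia.org/w/api.php"


def fetch_json(params: Dict[str, str]) -> dict:
    """Send one GET request to the API and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Hypothetical User-Agent string; Wikimedia asks API clients to identify themselves.
    req = urllib.request.Request(url, headers={"User-Agent": "continue-loop-check/0.1 (example)"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iterate_with_continue(
    params: Dict[str, str],
    fetch: Callable[[Dict[str, str]], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield each response, following 'continue' until the API stops sending it.

    Raises RuntimeError if a complete 'continue' object repeats, because
    following it again would request the same data forever.
    """
    seen = set()
    request = dict(params)
    while True:
        data = fetch(request)
        yield data
        cont = data.get("continue")
        if cont is None:
            return  # no 'continue' object: all results have been returned
        state = frozenset(cont.items())
        if state in seen:
            raise RuntimeError(f"continuation loop detected: {cont}")
        seen.add(state)
        # Send every continuation value back, on top of the original parameters.
        request = {**params, **cont}
```

For this report, iterate_with_continue would be called with the parameters from the Request in this report (dropping gcmcontinue to start at the beginning of the category) inside a for loop; a RuntimeError then marks the step at which the loop begins. The fetch argument exists so the loop logic can be exercised without network access.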

Request

{
"action": "query",
"format": "json",
"generator": "categorymembers",
"gcmtitle": "Category:Nazi_symbols_status",
"gcmprop": "ids|title|sortkey",
"gcmtype": "file",
"gcmcontinue": "file|4445555453434845532052454943485347455345545a424c41545420333954322030313320303237392e4a5047|77119651",
"gcmlimit": "500"
}

(view in ApiSandbox)
Response (truncated):

[Screenshot: grafik.png]

{
    "batchcomplete": "",
    "continue": {
        "gcmcontinue": "file|4445555453434845532052454943485347455345545a424c41545420333954322030323220303738302e4a5047|77119651",
        "continue": "gcmcontinue||"
    },
    "query": {
        "pages": {
            "77117729": {
                "pageid": 77117729,
                "ns": 6,
                "title": "File:Deutsches Reichsgesetzblatt 39T2 013 0280.jpg"
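To compare the gcmcontinue values it helps to decode them. As far as I know, MediaWiki builds the categorymembers continuation value as type|hex-encoded sortkey|page ID. The sketch below assumes that layout (decode_cmcontinue is a hypothetical helper, not part of any API client) and decodes the value sent in the request and the one returned in this response.

```python
def decode_cmcontinue(value: str):
    """Split a categorymembers continue value into (type, sortkey, page ID).

    Assumes the layout "type|hex-encoded sortkey|page ID".
    """
    cl_type, hex_sortkey, page_id = value.split("|")
    # The middle field is the binary sortkey, hex-encoded.
    sortkey = bytes.fromhex(hex_sortkey).decode("utf-8", errors="replace")
    return cl_type, sortkey, int(page_id)


SENT = "file|4445555453434845532052454943485347455345545a424c41545420333954322030313320303237392e4a5047|77119651"
RETURNED = "file|4445555453434845532052454943485347455345545a424c41545420333954322030323220303738302e4a5047|77119651"

print(decode_cmcontinue(SENT))
# ('file', 'DEUTSCHES REICHSGESETZBLATT 39T2 013 0279.JPG', 77119651)
print(decode_cmcontinue(RETURNED))
# ('file', 'DEUTSCHES REICHSGESETZBLATT 39T2 022 0780.JPG', 77119651)
```

Decoded this way, the sortkey part moves from "39T2 013 0279.JPG" to "39T2 022 0780.JPG", while the page ID part is 77119651 in both values.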

The issue occurs both with and without extmetadata:

{
"action": "query",
"format": "json",
"prop": "imageinfo",
"generator": "categorymembers",
"iiprop": "extmetadata|url",
"gcmtitle": "Category:Nazi_symbols_status",
"gcmprop": "ids|title|sortkey",
"gcmtype": "file",
"gcmcontinue": "file|4445555453434845532052454943485347455345545a424c41545420333954322030313320303237392e4a5047|77119651",
"gcmlimit": "500"
}

(view in ApiSandbox)
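With prop=imageinfo added, a gcmcontinue value that comes back unchanged is not by itself proof of a loop: as I understand MediaWiki's continuation scheme, while a prop module still has its own continuation value (for imageinfo, iicontinue), the generator value from the request is returned unchanged so that the same batch of files is generated again. The hypothetical helper classify_continue below sketches how a client could tell that case apart from a genuine stall; the labels it returns are my own, not API terms.

```python
from typing import Mapping, Optional


def classify_continue(
    sent: Mapping[str, str],
    cont: Optional[Mapping[str, str]],
    gen_key: str = "gcmcontinue",
) -> str:
    """Classify a response's 'continue' object relative to the request that produced it."""
    if not cont:
        return "done"  # no continue object: iteration is finished
    if cont.get(gen_key) != sent.get(gen_key):
        return "generator advanced"  # next batch of category members
    # Other modules' continuation values that changed, e.g. iicontinue from prop=imageinfo.
    changed = [
        k for k, v in cont.items()
        if k not in (gen_key, "continue") and v != sent.get(k)
    ]
    if changed:
        return "prop continuation"  # same generator batch, prop module still paging
    return "stalled"  # nothing changed: following this 'continue' would loop
```

Without prop=imageinfo there are no other continuation values, so an unchanged gcmcontinue would be reported as "stalled".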

Python example

import sys
import json
import requests

url = "https://commons.wikimedia.org/w/api.php"
title = "Category:Nazi_symbols_status"

# Fixed first part of the continue value used in this report ("file|<sortkey>|")
prefix = "file|4445555453434845532052454943485347455345545a424c41545420333954322030313320303237392e4a5047|"

# Page ID part of the continue value (the digits after the last "|").
# Enter it here as contval, or pass it as a command-line argument,
# e.g. "python circular_api.py 77119651". A value entered here takes precedence.
contval = ""
if not contval:
    contval = sys.argv[1] if len(sys.argv) > 1 else "77119651"

params = dict(
    action="query",
    format="json",
    # prop="imageinfo",
    generator="categorymembers",
    # iiprop="extmetadata|url",
    gcmtitle=title,
    gcmprop="ids|title|sortkey",
    gcmtype="file",
    gcmcontinue=prefix + contval,
    gcmlimit="500",
)

print("Used Parameters")
print(params)

resp = requests.get(url=url, params=params)
resp.raise_for_status()
parsed = resp.json()
print("############################################")
print("Response:")
print("############################################")
# Pretty-print only the first 1000 characters of the response.
print(json.dumps(parsed, indent=4, sort_keys=True)[:1000])

Run in Google Colab
I now find that this returns a different value in Google Colab.

However, on my server, on my home computer, and in ApiSandbox, the issue persists.

What happens?:
Instead of a new gcmcontinue value, the API returns the same value that was specified in the API call.

What should have happened instead?:
The API should return a different gcmcontinue value so that iteration through the generator can continue.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

'https://commons.wikimedia.org/w/api.php'

Server output:

[Screenshot: grafik.png]

Local execution:

[Screenshot: grafik.png]

My original code iterated through the responses without problems for a while before the loop occurred.
The loop also occurred with a smaller batch size (I tried gcmlimit=50).