Snapshot API: unable to download chunks
Open, Needs Triage, Public

Description

User Story: “As an enterprise API user, I want to download snapshot chunks”

I'm trying to download snapshot chunks using the Python SDK.

 filters = [
     Filter(field="is_part_of.identifier", value="enwiktionary")
 ]
 req = Request(filters=filters)
 snapshots = api_client.get_snapshots(req)
 # server-side NS filtering doesn't work, see T393198
 mainspace = [s for s in snapshots if s['namespace']['identifier'] == 0]
 for snapshot in mainspace:
     snapshot_identifier = snapshot['identifier']
     for chunk in snapshot['chunks']:
         with open(f"{chunk}", "wb") as f:
             print(f"downloading {chunk}")
             api_client.download_chunk(snapshot_identifier, chunk, f)

This returns an error:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.enterprise.wikimedia.com/v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_0/download

It happens on the first HEAD request issued by the SDK.

Running this from curl gives the same result:

$ curl -I -H @header -X HEAD https://api.enterprise.wikimedia.com/v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_0/download
HTTP/2 403
date: Fri, 02 May 2025 17:21:49 GMT
content-type: application/json
content-length: 37
x-request-id: 0298630d-31be-48cd-b8b3-3003a6d02207
x-envoy-upstream-service-time: 2
server: istio-envoy

Event Timeline

See T389548; known issue for free user tier; in sprint now.

Will these requests work when sent from WMCS?

@jberkel It will work if the request is coming from WMCS.
If the request is coming as a free user, it will not work (yet). (see previously mentioned ticket, and also this epic T386058 for status on that fix)

Snapshot chunking endpoints are now enabled for free accounts.

Confirmed, works now, thanks.

jberkel claimed this task.

How many free chunk requests are there? I'm now getting 429 responses on chunk downloads, and the API dashboard confusingly says: "0 / 0 Chunk requests left" (from a free account outside WMCS).

I'm assuming the account has hit the snapshots limit, but I'm confirming that and the dashboard issue with ENG; stand by.

Hey @jberkel, on-call ENG confirmed the account has reached 1500 requests for the chunks endpoint. I'll flag it so that on Monday the main crew can take a look and see whether we can open up some more requests before the monthly reset, but for now there's not much I can do.
As for the dashboard - that seems to be a front-end caching/rendering bug, and it will get its own ticket. Apologies for the confusion.

@creynolds Thanks for investigating. With those 1500 requests I was only able to download 35 of the 72 chunks of the Wiktionary dump. I suspect this is because the SDK makes several requests for each chunk, one per transfer-sized piece. In my case the transfer size was set to 5 MB (the default is 25 MB), and each chunk is ~200 MB, so the full dump needs (72 * 200) / 5 = 2880 requests. With the default transfer size of 25 MB it would be only ~576 requests, which is still within the free tier. Given that the transfer size is configurable, it's strange to account for usage by the number of requests sent. It might make more sense to cap by the amount of data transferred instead.
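For reference, the arithmetic above can be reproduced with a small sketch. The function name `requests_needed` and its parameters are hypothetical and not part of the SDK; it assumes every chunk is the same size and that the SDK issues one GET per transfer-sized piece, ignoring the initial HEAD request and any retries.

```python
import math


def requests_needed(num_chunks: int, chunk_size_mb: float, transfer_size_mb: float) -> int:
    """Estimate GET requests for a full snapshot download.

    Assumption (not confirmed by the SDK docs): one GET per
    transfer-sized piece of each chunk, all chunks equally large.
    """
    pieces_per_chunk = math.ceil(chunk_size_mb / transfer_size_mb)
    return num_chunks * pieces_per_chunk


# 72 chunks of ~200 MB each, as reported in this task
print(requests_needed(72, 200, 5))   # 2880
print(requests_needed(72, 200, 25))  # 576
```

Under these assumptions, the number of requests counted against the quota scales inversely with the transfer size, while the amount of data transferred stays the same.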

Also, it looks like there's another bug where the SDK doesn't handle the request limit situation properly and keeps on retrying the requests in quick succession before seemingly getting rate-limited for the retries:

HEAD /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
GET /v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
Traceback (most recent call last):
  File "/Volumes/Case/wme-sdk-python/./example/snapshots/snapshots.py", line 94, in <module>
    main()
  File "/Volumes/Case/wme-sdk-python/./example/snapshots/snapshots.py", line 87, in main
    download(api_client, chunk, snapshot_identifier)
  File "/Volumes/Case/wme-sdk-python/./example/snapshots/snapshots.py", line 47, in download
    api_client.download_chunk(snapshot_identifier, chunk, f)
  File "/Volumes/Case/wme-sdk-python/modules/api/api_client.py", line 271, in download_chunk
    self._download_entity(f"snapshots/{sid}/chunks/{idr}/download", writer)
  File "/Volumes/Case/wme-sdk-python/modules/api/api_client.py", line 147, in _download_entity
    future.result()
  File "/Users/jan/.local/share/mise/installs/python/3.12.3/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/jan/.local/share/mise/installs/python/3.12.3/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/jan/.local/share/mise/installs/python/3.12.3/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Volumes/Case/wme-sdk-python/modules/api/api_client.py", line 140, in download_chunk
    res = self._do(req)
          ^^^^^^^^^^^^^
  File "/Volumes/Case/wme-sdk-python/modules/api/api_client.py", line 85, in _do
    response.raise_for_status()
  File "/Volumes/Case/wme-sdk-python/venv/lib/python3.12/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://api.enterprise.wikimedia.com/v2/snapshots/enwiktionary_namespace_0/chunks/enwiktionary_namespace_0_chunk_35/download
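One way a client could avoid the rapid retries shown in the log above is to back off on 429 responses and give up after a bounded number of attempts. The following is only a sketch and not the SDK's implementation or the change that was later merged: `send_with_backoff`, `QuotaExhausted`, and all parameters are hypothetical names, and whether the API sends a `Retry-After` header is an assumption. The `send` callable is expected to return an object with `status_code` and `headers`, such as a `requests.Response`.

```python
import time


class QuotaExhausted(Exception):
    """Raised when the server keeps answering 429 (hypothetical helper)."""


def send_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or retries run out.

    Waits for the Retry-After value when it is a number of seconds
    (assumed, the API may not send it); otherwise uses exponential backoff.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break
        try:
            delay = float(response.headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not a plain number: 1s, 2s, 4s, ...
            delay = base_delay * (2 ** attempt)
        sleep(min(delay, max_delay))
    raise QuotaExhausted(f"still rate-limited after {max_retries} retries")
```

With a monthly quota, retrying cannot succeed until the reset, so a small `max_retries` that fails fast is probably the more useful setting here.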

Thanks for that PR, @jberkel - the team appreciates the help; I see @REsquito-WMF merged it into the SDK repo.