To allow downloading the chunks of a snapshot (in parallel) and querying chunk metadata, we need to provide endpoints for chunks similar to those for /snapshots.
Refer to RfC-chuncked-snapshots for details.
To do
- Add Chunks field to the snapshot schema
Note that we will use the same snapshot schema for both snapshot metadata and chunk metadata instances.
In schema/snapshot.go, add the following:
Chunks []string `json:"chunks,omitempty"`
chunks is an array of chunk identifiers, for instance ["enwiki_namespace_0_chunk_0", "enwiki_namespace_0_chunk_1", …]
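As a rough illustration of how the new field behaves when serialized, here is a minimal sketch. The Snapshot type below is a trimmed-down stand-in, not the real schema: only the Chunks field comes from this task, while the Identifier field and its JSON tag are assumptions added to make the example readable.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Snapshot is a hypothetical, trimmed-down stand-in for the real schema type.
// Only the Chunks field is taken from this task; Identifier is an assumption.
type Snapshot struct {
	Identifier string   `json:"identifier,omitempty"`
	Chunks     []string `json:"chunks,omitempty"`
}

// encode marshals a Snapshot into its JSON string form.
func encode(s Snapshot) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Snapshot metadata instance: lists the identifiers of its chunks.
	fmt.Println(encode(Snapshot{
		Identifier: "enwiki_namespace_0",
		Chunks:     []string{"enwiki_namespace_0_chunk_0", "enwiki_namespace_0_chunk_1"},
	}))

	// Chunk metadata instance: same schema, and omitempty drops the empty "chunks" key.
	fmt.Println(encode(Snapshot{Identifier: "enwiki_namespace_0_chunk_0"}))
}
```

Because of omitempty, a chunk metadata instance (or a snapshot without chunks) serializes without a "chunks" key at all, rather than with "chunks": null.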
- Create GET/POST /v2/snapshots/{snapshot_identifier}/chunks endpoint. Use proxy.NewGetEntities handler for this.
This will return a list of chunk metadata for snapshots that have chunks. For snapshots that do not have chunks, it will return an empty list [].
Resolves from s3 keys as follows:
For enwiki_namespace_0
chunks/enwiki_namespace_0_chunk_0.json
chunks/enwiki_namespace_0_chunk_1.json
.
.
- Create GET/POST /v2/snapshots/{snapshot_identifier}/chunks/{identifier} endpoint. Use proxy.NewGetEntity handler for this.
Allows users to query metadata about a specific chunk using the chunk name.
Make it possible to call it using the full chunk identifier or just the chunk index as follows:
GET /v2/snapshots/enwiki_namespace_0/chunks/enwiki_namespace_0_chunk_2
is the same as
GET /v2/snapshots/enwiki_namespace_0/chunks/2
Resolves from s3 keys as follows:
For this example: chunks/enwiki_namespace_0_chunk_2.json
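The identifier handling described for this endpoint could be sketched roughly as follows. The function names chunkIdentifier and chunkKey are hypothetical, and the sketch assumes that any non-negative integer is a chunk index while everything else is already a full chunk identifier; it does not check that a full identifier actually belongs to the given snapshot.

```go
package main

import (
	"fmt"
	"strconv"
)

// chunkIdentifier (hypothetical helper) expands a bare chunk index such as "2"
// into a full chunk identifier such as "enwiki_namespace_0_chunk_2".
// Any value that is not a non-negative integer is returned unchanged.
func chunkIdentifier(snapshotID, identifier string) string {
	if idx, err := strconv.Atoi(identifier); err == nil && idx >= 0 {
		return fmt.Sprintf("%s_chunk_%d", snapshotID, idx)
	}
	return identifier
}

// chunkKey (hypothetical helper) builds the S3 key for a chunk.
// ext is the file extension of the object, e.g. ".json" for chunk metadata.
func chunkKey(snapshotID, identifier, ext string) string {
	return "chunks/" + chunkIdentifier(snapshotID, identifier) + ext
}

func main() {
	// Both forms of the identifier resolve to the same metadata key.
	fmt.Println(chunkKey("enwiki_namespace_0", "2", ".json"))
	fmt.Println(chunkKey("enwiki_namespace_0", "enwiki_namespace_0_chunk_2", ".json"))
}
```

Keeping the extension as a parameter means the same normalization can serve any endpoint that accepts either form of the identifier.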
- Create HEAD /v2/snapshots/{snapshot_identifier}/chunks/{identifier}/download. Use proxy.NewHeadDownload handler for this.
This endpoint can be called with the full chunk name or the chunk index, as above.
Resolves from s3 keys as follows:
For example: chunks/enwiki_namespace_0_chunk_2.tar.gz
- Create GET /v2/snapshots/{snapshot_identifier}/chunks/{identifier}/download. Use proxy.NewGetDownload handler.
Just like the GET for snapshots, it will return a pre-signed S3 link for a specific chunk of a specific snapshot.
This endpoint can be called with the full chunk name or the chunk index, as above.
Resolves from s3 keys as follows:
For example: chunks/enwiki_namespace_0_chunk_2.tar.gz
- In infrastructure, in access/main.csv, add the following access rules for chunks:
p, chunks, /v2/snapshots/:identifier/chunks, GET
p, chunks, /v2/snapshots/:identifier/chunks, POST
p, chunks, /v2/snapshots/:identifier/chunks/:identifier, GET
p, chunks, /v2/snapshots/:identifier/chunks/:identifier, POST
p, chunks, /v2/snapshots/:identifier/chunks/:identifier/download, GET
p, chunks, /v2/snapshots/:identifier/chunks/:identifier/download, HEAD
g, group_2, chunks
g, group_3, chunks
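To make it easier to reason about which requests the "p" rows above are meant to allow, here is a small standard-library sketch of the matching. It is not the project's actual access-control code. It assumes that each ":identifier" placeholder matches exactly one path segment (similar to how Casbin's keyMatch2 treats such patterns, though whether that matcher is used here is an assumption), and the names rule, matches, and allowed are hypothetical. The "g" rows, which attach groups to the chunks role, are left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule mirrors one "p" row of the policy: role, path pattern, HTTP method.
type rule struct{ role, path, method string }

// chunkRules repeats the six "p" rows for the chunks role.
var chunkRules = []rule{
	{"chunks", "/v2/snapshots/:identifier/chunks", "GET"},
	{"chunks", "/v2/snapshots/:identifier/chunks", "POST"},
	{"chunks", "/v2/snapshots/:identifier/chunks/:identifier", "GET"},
	{"chunks", "/v2/snapshots/:identifier/chunks/:identifier", "POST"},
	{"chunks", "/v2/snapshots/:identifier/chunks/:identifier/download", "GET"},
	{"chunks", "/v2/snapshots/:identifier/chunks/:identifier/download", "HEAD"},
}

// param finds ":name" placeholders inside a path pattern.
var param = regexp.MustCompile(`:[^/]+`)

// matches reports whether path fits pattern, with every ":name" placeholder
// standing for exactly one path segment (an assumption of this sketch).
func matches(pattern, path string) bool {
	re := "^" + param.ReplaceAllString(regexp.QuoteMeta(pattern), `[^/]+`) + "$"
	return regexp.MustCompile(re).MatchString(path)
}

// allowed reports whether any rule grants the role this method on this path.
func allowed(rules []rule, role, path, method string) bool {
	for _, r := range rules {
		if r.role == role && r.method == method && matches(r.path, path) {
			return true
		}
	}
	return false
}

func main() {
	download := "/v2/snapshots/enwiki_namespace_0/chunks/2/download"
	fmt.Println(allowed(chunkRules, "chunks", download, "HEAD"))
	fmt.Println(allowed(chunkRules, "chunks", download, "POST"))
}
```

Under these assumptions the download path is granted for GET and HEAD only, while the list and single-chunk metadata paths are granted for GET and POST, which matches the endpoints described in the items above.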
QA/Acceptance criteria
- All of the above endpoints should work using the mock JSON and mock tar.gz files in S3.