
Update snapshot schema and build chunks endpoint
Open, Needs Triage | Public | 3 Estimated Story Points

Description

To allow chunks of a snapshot to be downloaded in parallel and chunk metadata to be queried, we need to provide endpoints for chunks similar to those for /snapshots.

Refer to RfC-chuncked-snapshots for details.

To do

  • Add Chunks field to the snapshot schema

Note that the same snapshot schema will be used for both snapshot metadata instances and chunk metadata instances.
In schema/snapshot.go, add the following:

Chunks      []string    `json:"chunks,omitempty"`

chunks is an array of chunk identifiers, for instance ["enwiki_namespace_0_chunk_0", "enwiki_namespace_0_chunk_1", …]
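A minimal sketch of how the new field behaves when serialized. The Snapshot struct here is a trimmed-down, hypothetical stand-in for the real one in schema/snapshot.go; only the Chunks field and its tag come from this task, and the Identifier field is an assumption added for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Snapshot is a hypothetical, reduced version of the snapshot schema.
// Only Chunks is taken from the task; Identifier is assumed for illustration.
type Snapshot struct {
	Identifier string   `json:"identifier,omitempty"`
	Chunks     []string `json:"chunks,omitempty"`
}

// encode marshals a Snapshot into its JSON string form.
func encode(s Snapshot) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Snapshot metadata instance: lists the identifiers of its chunks.
	snapshot := Snapshot{
		Identifier: "enwiki_namespace_0",
		Chunks:     []string{"enwiki_namespace_0_chunk_0", "enwiki_namespace_0_chunk_1"},
	}
	fmt.Println(encode(snapshot))

	// Chunk metadata instance: same schema, Chunks left empty,
	// so omitempty drops the "chunks" key from the output.
	chunk := Snapshot{Identifier: "enwiki_namespace_0_chunk_0"}
	fmt.Println(encode(chunk))
}
```

Because of omitempty, a chunk metadata instance that reuses the schema does not carry an empty chunks key.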

  • Create GET/POST /v2/snapshots/{snapshot_identifier}/chunks endpoint. Use proxy.NewGetEntities handler for this.

Returns a list of chunk metadata for snapshots that have chunks. For snapshots that do not have chunks, it returns an empty list [].
Resolves from s3 keys as follows:
For enwiki_namespace_0
chunks/enwiki_namespace_0_chunk_0.json
chunks/enwiki_namespace_0_chunk_1.json
.
.

  • Create GET/POST /v2/snapshots/{snapshot_identifier}/chunks/{identifier} endpoint. Use proxy.NewGetEntity handler

Allows users to query metadata about a specific chunk using the chunk name.
It should be callable with either the full chunk identifier or just the chunk index, as follows:
GET /v2/snapshots/enwiki_namespace_0/chunks/enwiki_namespace_0_chunk_2
is the same as
GET /v2/snapshots/enwiki_namespace_0/chunks/2

Resolves from s3 keys as follows:
For this example: chunks/enwiki_namespace_0_chunk_2.json
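One possible way to accept both forms of the identifier is sketched below. The function names are hypothetical, and the rule "a purely numeric identifier is a chunk index" is an assumption based on the example above, not a confirmed design.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveChunkIdentifier (hypothetical) accepts either a full chunk identifier
// or a bare chunk index and returns the full chunk identifier.
// Assumption: an identifier that parses as an unsigned integer is an index.
func resolveChunkIdentifier(snapshot, identifier string) string {
	if idx, err := strconv.ParseUint(identifier, 10, 32); err == nil {
		return fmt.Sprintf("%s_chunk_%d", snapshot, idx)
	}
	return identifier
}

// chunkMetadataKey (hypothetical) builds the s3 key of the chunk metadata file.
func chunkMetadataKey(snapshot, identifier string) string {
	return "chunks/" + resolveChunkIdentifier(snapshot, identifier) + ".json"
}

func main() {
	// Both calls resolve to the same key.
	fmt.Println(chunkMetadataKey("enwiki_namespace_0", "enwiki_namespace_0_chunk_2"))
	fmt.Println(chunkMetadataKey("enwiki_namespace_0", "2"))
}
```

The same resolution step could be reused by the download endpoints, swapping the .json suffix for .tar.gz.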

  • Create HEAD /v2/snapshots/{snapshot_identifier}/chunks/{identifier}/download. Use proxy.NewHeadDownload handler for this.

This endpoint can be called with the full chunk name or the chunk index, as above.

Resolves from s3 keys as follows:
For example: chunks/enwiki_namespace_0_chunk_2.tar.gz

  • Create GET /v2/snapshots/{snapshot_identifier}/chunks/{identifier}/download. Use proxy.NewGetDownload handler.

As with the corresponding GET for snapshots, this returns a pre-signed s3 link for a specific chunk of a specific snapshot.
This endpoint can be called with the full chunk name or the chunk index, as above.

Resolves from s3 keys as follows:
For example: chunks/enwiki_namespace_0_chunk_2.tar.gz

  • In infrastructure, add the following access rules for chunks to access/main.csv:
p, chunks, /v2/snapshots/:identifier/chunks, GET
p, chunks, /v2/snapshots/:identifier/chunks, POST
p, chunks, /v2/snapshots/:identifier/chunks/:identifier, GET
p, chunks, /v2/snapshots/:identifier/chunks/:identifier, POST
p, chunks, /v2/snapshots/:identifier/chunks/:identifier/download, GET
p, chunks, /v2/snapshots/:identifier/chunks/:identifier/download, HEAD
g, group_2, chunks
g, group_3, chunks
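To illustrate what these rows permit, here is a small stand-alone matcher. It assumes the rows are Casbin-style policies in which a :identifier segment matches any single path segment (similar in spirit to Casbin's keyMatch2); this is a sketch of the matching rule, not the enforcer actually used, and the policy type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// policy mirrors one "p" row: role, path pattern, HTTP method.
type policy struct{ role, path, method string }

// pathMatches reports whether path fits pattern, where any pattern segment
// starting with ":" matches exactly one non-empty path segment (assumed rule).
func pathMatches(pattern, path string) bool {
	ps := strings.Split(strings.Trim(pattern, "/"), "/")
	xs := strings.Split(strings.Trim(path, "/"), "/")
	if len(ps) != len(xs) {
		return false
	}
	for i := range ps {
		if strings.HasPrefix(ps[i], ":") {
			if xs[i] == "" {
				return false
			}
			continue
		}
		if ps[i] != xs[i] {
			return false
		}
	}
	return true
}

// allowed checks a request against the policy rows for the given role.
func allowed(policies []policy, role, path, method string) bool {
	for _, p := range policies {
		if p.role == role && p.method == method && pathMatches(p.path, path) {
			return true
		}
	}
	return false
}

func main() {
	policies := []policy{
		{"chunks", "/v2/snapshots/:identifier/chunks", "GET"},
		{"chunks", "/v2/snapshots/:identifier/chunks", "POST"},
		{"chunks", "/v2/snapshots/:identifier/chunks/:identifier", "GET"},
		{"chunks", "/v2/snapshots/:identifier/chunks/:identifier", "POST"},
		{"chunks", "/v2/snapshots/:identifier/chunks/:identifier/download", "GET"},
		{"chunks", "/v2/snapshots/:identifier/chunks/:identifier/download", "HEAD"},
	}
	fmt.Println(allowed(policies, "chunks", "/v2/snapshots/enwiki_namespace_0/chunks/2/download", "HEAD"))
	fmt.Println(allowed(policies, "chunks", "/v2/snapshots/enwiki_namespace_0/chunks", "HEAD"))
}
```

Under this reading, HEAD is permitted only on the download path, and the two g rows grant the chunks role to group_2 and group_3.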

QA/Acceptance criteria

  • All of the above endpoints should work against the mock JSON and mock tar.gz files in s3

Event Timeline

prabhat renamed this task from "Update snapshot schema and build chunk endpoint" to "Update snapshot schema and build chunks endpoint". Jan 19 2024, 5:44 PM
prabhat updated the task description.