Update snapshot schema and build chunks endpoint
Open, Needs Triage, Public, 3 Estimated Story Points

Assigned To

None

Authored By

	prabhat
	Jan 19 2024, 5:32 PM

Description

In order to allow the chunks of a snapshot to be downloaded (in parallel) and their metadata to be queried, we need to provide endpoints for chunks similar to those for /snapshots.

Refer to RfC-chuncked-snapshots for details.

To do

Add Chunks field to the snapshot schema

Note that we will use the same snapshot schema for both snapshot metadata instances and chunk metadata instances.
In schema/snapshot.go, add the following field:

Chunks      []string    `json:"chunks,omitempty"`

chunks is an array of chunk identifiers, for instance ["enwiki_namespace_0_chunk_0", "enwiki_namespace_0_chunk_1", …].
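A minimal sketch of how the new field behaves when serialized, assuming a Go struct trimmed down to a hypothetical Identifier field plus Chunks (the real type in schema/snapshot.go has more fields). Because of omitempty, a chunk metadata instance, which has no chunks of its own, simply omits the key:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Snapshot is a trimmed-down stand-in for the type in schema/snapshot.go.
// Identifier is assumed here for illustration; only Chunks comes from this task.
type Snapshot struct {
	Identifier string   `json:"identifier,omitempty"`
	Chunks     []string `json:"chunks,omitempty"`
}

// toJSON serializes a snapshot or chunk metadata instance.
func toJSON(s Snapshot) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	// Snapshot metadata: lists the identifiers of its chunks.
	snapshot := Snapshot{
		Identifier: "enwiki_namespace_0",
		Chunks:     []string{"enwiki_namespace_0_chunk_0", "enwiki_namespace_0_chunk_1"},
	}
	// Chunk metadata: same type, Chunks left nil, so the key is omitted.
	chunk := Snapshot{Identifier: "enwiki_namespace_0_chunk_0"}

	for _, s := range []Snapshot{snapshot, chunk} {
		out, err := toJSON(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

Reusing one type keeps the two kinds of metadata interchangeable for the existing proxy handlers; the cost is that clients must treat an absent chunks key as "no chunks".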

Create the GET/POST /v2/snapshots/{snapshot_identifier}/chunks endpoint. Use the proxy.NewGetEntities handler for this.

This returns a list of chunk metadata for snapshots that have chunks. For snapshots that do not have chunks, it returns an empty list, [].
Resolves from S3 keys as follows, for enwiki_namespace_0:
chunks/enwiki_namespace_0_chunk_0.json
chunks/enwiki_namespace_0_chunk_1.json
…
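The key resolution for the list endpoint could look roughly like the sketch below. It assumes the S3 keys have already been listed (the listing call is not shown), and chunkMetadataKeys is a hypothetical helper, not part of proxy.NewGetEntities. Two details are worth noting: chunks are ordered by numeric index, since plain string order would place chunk_10 before chunk_2, and the result is always a non-nil slice, because encoding/json renders a nil slice as null rather than the required [].

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// chunkMetadataKeys (hypothetical) selects the chunk metadata objects for
// snapshotID from already-listed S3 keys, ordered by chunk index.
func chunkMetadataKeys(snapshotID string, keys []string) []string {
	prefix := "chunks/" + snapshotID + "_chunk_"
	type entry struct {
		key   string
		index int
	}
	var found []entry
	for _, k := range keys {
		if !strings.HasPrefix(k, prefix) || !strings.HasSuffix(k, ".json") {
			continue // another snapshot's chunk, or a .tar.gz archive
		}
		// The part between the prefix and ".json" must be a chunk index.
		idx, err := strconv.Atoi(strings.TrimSuffix(strings.TrimPrefix(k, prefix), ".json"))
		if err != nil || idx < 0 {
			continue
		}
		found = append(found, entry{key: k, index: idx})
	}
	// Numeric order: a plain string sort would put chunk_10 before chunk_2.
	sort.Slice(found, func(i, j int) bool { return found[i].index < found[j].index })

	// make(..., 0, n) is never nil, so encoding/json emits [] rather than null.
	out := make([]string, 0, len(found))
	for _, e := range found {
		out = append(out, e.key)
	}
	return out
}

func main() {
	keys := []string{
		"chunks/enwiki_namespace_0_chunk_10.json",
		"chunks/enwiki_namespace_0_chunk_2.json",
		"chunks/enwiki_namespace_0_chunk_2.tar.gz",
		"chunks/enwiki_namespace_14_chunk_0.json",
	}
	fmt.Println(chunkMetadataKeys("enwiki_namespace_0", keys))
	fmt.Println(len(chunkMetadataKeys("dewiki_namespace_0", keys)))
}
```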

Create the GET/POST /v2/snapshots/{snapshot_identifier}/chunks/{identifier} endpoint. Use the proxy.NewGetEntity handler for this.

Allows users to query metadata about a specific chunk by name.
It should be possible to call it using either the full chunk identifier or just the chunk index, as follows:
GET /v2/snapshots/enwiki_namespace_0/chunks/enwiki_namespace_0_chunk_2
is the same as
GET /v2/snapshots/enwiki_namespace_0/chunks/2

Resolves from S3 keys as follows:
For this example: chunks/enwiki_namespace_0_chunk_2.json
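One way to accept both forms of {identifier}, sketched with hypothetical helper names: an all-digit identifier is treated as a chunk index and expanded to <snapshot>_chunk_<n>, and anything else is treated as a full chunk name. The check that a full name belongs to the snapshot named in the path is an assumption on our part; the task does not say how a mismatch should be handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveChunkName (hypothetical) turns a bare index such as "2" into
// "<snapshotID>_chunk_2" and passes full chunk names through unchanged.
// It reports false for names that belong to a different snapshot.
func resolveChunkName(snapshotID, identifier string) (string, bool) {
	prefix := snapshotID + "_chunk_"
	if idx, err := strconv.Atoi(identifier); err == nil && idx >= 0 {
		return prefix + strconv.Itoa(idx), true
	}
	if strings.HasPrefix(identifier, prefix) {
		return identifier, true
	}
	return "", false
}

// chunkMetadataKey builds the S3 key of a chunk's metadata object.
func chunkMetadataKey(snapshotID, identifier string) (string, bool) {
	name, ok := resolveChunkName(snapshotID, identifier)
	if !ok {
		return "", false
	}
	return "chunks/" + name + ".json", true
}

func main() {
	for _, id := range []string{"enwiki_namespace_0_chunk_2", "2", "dewiki_namespace_0_chunk_2"} {
		fmt.Println(chunkMetadataKey("enwiki_namespace_0", id))
	}
}
```

Because the index is re-formatted with strconv.Itoa, a request for chunk "02" resolves to the same key as "2".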

Create the HEAD /v2/snapshots/{snapshot_identifier}/chunks/{identifier}/download endpoint. Use the proxy.NewHeadDownload handler for this.

As above, this endpoint can be called with either the full chunk name or the chunk index.

Resolves from S3 keys as follows:
For example: chunks/enwiki_namespace_0_chunk_2.tar.gz
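A stdlib-only sketch of the HEAD behaviour, assuming a plain map in place of an S3 HeadObject lookup and manual path parsing in place of the real router. objectSizes, chunkArchiveKey, headChunkDownload and probe are hypothetical and do not reflect how proxy.NewHeadDownload is implemented. The handler reports the archive size without sending a body and returns 404 for a chunk that does not exist:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// objectSizes is a hypothetical stand-in for S3 HeadObject: key -> size in bytes.
type objectSizes map[string]int64

// chunkArchiveKey maps a chunk name or bare index to its tar.gz object key.
func chunkArchiveKey(snapshotID, identifier string) string {
	if idx, err := strconv.Atoi(identifier); err == nil && idx >= 0 {
		identifier = snapshotID + "_chunk_" + strconv.Itoa(idx)
	}
	return "chunks/" + identifier + ".tar.gz"
}

// headChunkDownload serves /v2/snapshots/{snapshot}/chunks/{chunk}/download.
// Restricting it to HEAD is assumed to happen in the router.
func headChunkDownload(store objectSizes) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// "/v2/snapshots/<snapshot>/chunks/<chunk>/download" splits into 7 parts.
		p := strings.Split(r.URL.Path, "/")
		if len(p) != 7 || p[1] != "v2" || p[2] != "snapshots" || p[4] != "chunks" || p[6] != "download" {
			http.NotFound(w, r)
			return
		}
		size, ok := store[chunkArchiveKey(p[3], p[5])]
		if !ok {
			http.NotFound(w, r)
			return
		}
		// Report the size only; a HEAD response carries no body.
		w.Header().Set("Content-Length", strconv.FormatInt(size, 10))
		w.WriteHeader(http.StatusOK)
	}
}

// probe issues a HEAD request against the handler and returns status and size.
func probe(store objectSizes, path string) (int, string) {
	rec := httptest.NewRecorder()
	headChunkDownload(store).ServeHTTP(rec, httptest.NewRequest(http.MethodHead, path, nil))
	return rec.Code, rec.Header().Get("Content-Length")
}

func main() {
	store := objectSizes{"chunks/enwiki_namespace_0_chunk_2.tar.gz": 1048576}
	fmt.Println(probe(store, "/v2/snapshots/enwiki_namespace_0/chunks/2/download"))
	fmt.Println(probe(store, "/v2/snapshots/enwiki_namespace_0/chunks/enwiki_namespace_0_chunk_2/download"))
	fmt.Println(probe(store, "/v2/snapshots/enwiki_namespace_0/chunks/7/download"))
}
```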

Create the GET /v2/snapshots/{snapshot_identifier}/chunks/{identifier}/download endpoint. Use the proxy.NewGetDownload handler for this.

Just like the GET download endpoint for snapshots, it returns a pre-signed S3 link, in this case for a specific chunk of a specific snapshot.
As above, it can be called with either the full chunk name or the chunk index.

Resolves from S3 keys as follows:
For example: chunks/enwiki_namespace_0_chunk_2.tar.gz
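The GET variant differs from HEAD mainly in what it returns. In the sketch below, the presigner interface and fakePresigner are hypothetical stand-ins so the example runs without AWS credentials; in a real service the URL would come from an SDK call such as the AWS SDK for Go v2 presign client (s3.NewPresignClient and PresignGetObject). The 15-minute expiry is an arbitrary example value, not something this task specifies.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"time"
)

// presigner is a hypothetical interface over whatever signs S3 URLs.
type presigner interface {
	PresignGet(key string, ttl time.Duration) (string, error)
}

// downloadURL resolves a chunk name or index to its tar.gz key and asks
// the presigner for a time-limited link to it.
func downloadURL(p presigner, snapshotID, identifier string, ttl time.Duration) (string, error) {
	if idx, err := strconv.Atoi(identifier); err == nil && idx >= 0 {
		identifier = snapshotID + "_chunk_" + strconv.Itoa(idx)
	}
	return p.PresignGet("chunks/"+identifier+".tar.gz", ttl)
}

// fakePresigner returns links shaped like pre-signed URLs but carrying no
// real signature; it exists only so this example runs offline.
type fakePresigner struct{ bucket string }

func (f fakePresigner) PresignGet(key string, ttl time.Duration) (string, error) {
	u := url.URL{Scheme: "https", Host: f.bucket + ".s3.amazonaws.com", Path: "/" + key}
	q := url.Values{}
	q.Set("X-Amz-Expires", strconv.Itoa(int(ttl.Seconds())))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	link, err := downloadURL(fakePresigner{bucket: "example-bucket"}, "enwiki_namespace_0", "2", 15*time.Minute)
	if err != nil {
		panic(err)
	}
	fmt.Println(link)
}
```

Keeping signing behind an interface lets the key resolution be unit-tested against the mock objects without network access.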

In infrastructure, add the following access rules for chunks to access/main.csv:

p, chunks, /v2/snapshots/:identifier/chunks, GET
p, chunks, /v2/snapshots/:identifier/chunks, POST
p, chunks, /v2/snapshots/:identifier/chunks/:identifier, GET
p, chunks, /v2/snapshots/:identifier/chunks/:identifier, POST
p, chunks, /v2/snapshots/:identifier/chunks/:identifier/download, GET
p, chunks, /v2/snapshots/:identifier/chunks/:identifier/download, HEAD
g, group_2, chunks
g, group_3, chunks
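The p and g lines appear to follow the Casbin policy format (p, role, path, method and g, group, role). Assuming the model matches paths with something like Casbin's keyMatch2, the sketch below re-implements a simplified version of that check in plain Go to show which requests the new rules admit; it is an approximation for illustration, not Casbin itself. The repeated :identifier placeholder is harmless for this kind of matching, since each occurrence becomes an independent one-segment wildcard.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule mirrors one "p, role, path, method" line.
type rule struct{ role, path, method string }

var policies = []rule{
	{"chunks", "/v2/snapshots/:identifier/chunks", "GET"},
	{"chunks", "/v2/snapshots/:identifier/chunks", "POST"},
	{"chunks", "/v2/snapshots/:identifier/chunks/:identifier", "GET"},
	{"chunks", "/v2/snapshots/:identifier/chunks/:identifier", "POST"},
	{"chunks", "/v2/snapshots/:identifier/chunks/:identifier/download", "GET"},
	{"chunks", "/v2/snapshots/:identifier/chunks/:identifier/download", "HEAD"},
}

// roles mirrors the "g, group, role" lines.
var roles = map[string][]string{
	"group_2": {"chunks"},
	"group_3": {"chunks"},
}

var placeholder = regexp.MustCompile(`:[^/]+`)

// pathMatches approximates keyMatch2: each ":name" matches exactly one
// path segment, and the whole path must match.
func pathMatches(pattern, path string) bool {
	re := "^" + placeholder.ReplaceAllString(regexp.QuoteMeta(pattern), `[^/]+`) + "$"
	return regexp.MustCompile(re).MatchString(path)
}

// allowed reports whether any role of group grants method on path.
func allowed(group, path, method string) bool {
	for _, role := range roles[group] {
		for _, p := range policies {
			if p.role == role && p.method == method && pathMatches(p.path, path) {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(allowed("group_2", "/v2/snapshots/enwiki_namespace_0/chunks/2/download", "HEAD"))
	fmt.Println(allowed("group_2", "/v2/snapshots/enwiki_namespace_0/chunks/2/download", "POST"))
	fmt.Println(allowed("group_1", "/v2/snapshots/enwiki_namespace_0/chunks", "GET"))
}
```

Here group_1 is a hypothetical group with no rule in the list, used only to show a denied request.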

QA/Acceptance criteria

All of the above endpoints should work against the mock JSON and mock tar.gz files in S3.

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		ROdonnell-WMF	T353881 Create an RFC for chunked snapshots
		Open		None	T355443 Update snapshot schema and build chunks endpoint

Event Timeline

prabhat created this task. Jan 19 2024, 5:32 PM

Restricted Application added a subscriber: Aklapper. Jan 19 2024, 5:32 PM

prabhat renamed this task from Update snapshot schema and build chunk endpoint to Update snapshot schema and build chunks endpoint. Jan 19 2024, 5:44 PM

prabhat updated the task description.

prabhat updated the task description. Jan 19 2024, 5:50 PM

prabhat updated the task description.

prabhat added a parent task: T353881: Create an RFC for chunked snapshots. Jan 19 2024, 5:52 PM

creynolds subscribed. Jan 19 2024, 6:56 PM

JArguello-WMF set the point value for this task to 3. Jan 22 2024, 2:38 PM

JArguello-WMF moved this task from To Be Estimated/To Be Discussed to Sprint 54 on the Wikimedia Enterprise board.

JArguello-WMF edited projects, added Wikimedia Enterprise (Sprint 54); removed Wikimedia Enterprise.

JArguello-WMF edited projects, added Wikimedia Enterprise; removed Wikimedia Enterprise (Sprint 54). Jan 23 2024, 2:30 PM

JArguello-WMF moved this task from Incoming to Estimated /Discussed on the Wikimedia Enterprise board.

dr0ptp4kt subscribed. Mar 22 2024, 4:31 PM

JArguello-WMF moved this task from Estimated /Discussed to API Usability on the Wikimedia Enterprise board. Tue, Apr 9, 3:17 PM