This task is done when we have a batch process that generates token persistence data and an API for accessing that data in useful ways.
Token/Word persistence dataset
- Basic schema: (token, character_offset, rev_id, page_id, user_id, revisions_persisted(rev_id, character_offset, user_id)) (see the record sketch after this list)
- Tree structure: user -> page -> edit -> token(s) changed
- Size calculation
- 350 MB * 2000 * (1 GB / 1000 MB) * (1 TB / 1000 GB) = 350 * 2000 / 1,000,000 TB = 0.7 TB
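A minimal sketch of one record in the basic schema above, written as Python dataclasses. The field names are taken directly from the schema bullet; the comments and the nesting of revisions_persisted as a list of sub-records are my reading of it, not a confirmed layout.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Persisted:
    """One later revision in which the token was still present."""
    rev_id: int
    character_offset: int
    user_id: int

@dataclass
class TokenRecord:
    token: str
    character_offset: int    # offset of the token in its originating revision
    rev_id: int              # revision that introduced the token
    page_id: int
    user_id: int             # editor credited with introducing the token
    revisions_persisted: List[Persisted] = field(default_factory=list)
```

Grouping these records by user_id, then page_id, then rev_id yields the user -> page -> edit -> token(s) tree described above.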
Dataset
(generated for 2015-06-02)
Code
- https://github.com/halfak/measuring-edit-productivity
- Basic library: https://github.com/mediawiki-utilities/python-mwpersistence
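The heavy lifting is done by the mwpersistence library linked above. As a rough illustration of what the batch process computes, here is a self-contained sketch that tracks token persistence across one page's revision history using only difflib from the standard library. It is a conceptual stand-in, not the library's API, and it ignores things the real implementation handles (reverts, proper tokenization, character offsets).

```python
import difflib

def track_persistence(revisions):
    """Compute which revisions each token survives, for one page.

    revisions: chronological list of (rev_id, user_id, text) tuples.
    Returns the final token state, one record per surviving token.
    """
    state = []  # current page content as a list of token records
    for rev_id, user_id, text in revisions:
        new_words = text.split()
        old_words = [rec["token"] for rec in state]
        matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
        next_state = []
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                # Tokens that survived this revision get persistence credit.
                for rec in state[i1:i2]:
                    rec["revisions_persisted"].append(rev_id)
                    next_state.append(rec)
            else:
                # "insert"/"replace": new tokens originate in this revision;
                # "delete": the old tokens simply drop out of the state.
                for word in new_words[j1:j2]:
                    next_state.append({
                        "token": word,
                        "rev_id": rev_id,      # revision that introduced the token
                        "user_id": user_id,    # editor credited with the token
                        "revisions_persisted": [],
                    })
        state = next_state
    return state

history = [
    (101, "alice", "apples are red"),
    (102, "bob", "apples are red and sweet"),
    (103, "carol", "oranges are orange and sweet"),
]
for rec in track_persistence(history):
    print(rec["token"], rec["user_id"], rec["revisions_persisted"])
```

In the real pipeline each record would also carry page_id and character offsets, matching the schema above.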
Use cases
- https://meta.wikimedia.org/wiki/Research:WikiCredit
- https://en.wikipedia.org/wiki/Wikipedia:WikiBlame