
Add revision ID to ReadingDepth Schema and Data
Closed, Duplicate (Public)

Description

Currently we do not log the revision ID of the wiki pages that are viewed. This makes it quite expensive to join revision data to reading depth data, because we have to join on page and then find the most recent revision.

If we save the revision ID, we can join on revision instead, and it will be fast!
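To illustrate the difference, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`revision`, `reading_depth`), column names, and sample rows are all hypothetical and do not reflect the real ReadingDepth schema or the MediaWiki tables. It also assumes that "most recent revision" means the latest revision at or before the time of the view.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revision (
    rev_id INTEGER PRIMARY KEY,
    page_id INTEGER,
    rev_timestamp TEXT
);
CREATE TABLE reading_depth (
    event_id INTEGER PRIMARY KEY,
    page_id INTEGER,
    rev_id INTEGER,          -- the field this task proposes to log
    event_timestamp TEXT
);
INSERT INTO revision VALUES
    (100, 1, '2018-10-01T00:00:00'),
    (101, 1, '2018-10-20T00:00:00'),
    (200, 2, '2018-10-05T00:00:00');
INSERT INTO reading_depth VALUES
    (1, 1, 100, '2018-10-10T12:00:00'),
    (2, 1, 101, '2018-10-25T12:00:00'),
    (3, 2, 200, '2018-10-06T12:00:00');
""")

# Without a logged revision ID: join on page, then use a correlated
# subquery to pick the latest revision at or before each view.
by_page = conn.execute("""
SELECT e.event_id, r.rev_id
FROM reading_depth AS e
JOIN revision AS r ON r.page_id = e.page_id
WHERE r.rev_timestamp = (
    SELECT MAX(r2.rev_timestamp)
    FROM revision AS r2
    WHERE r2.page_id = e.page_id
      AND r2.rev_timestamp <= e.event_timestamp
)
ORDER BY e.event_id
""").fetchall()

# With a logged revision ID: a plain equality join on the key.
by_revision = conn.execute("""
SELECT e.event_id, r.rev_id
FROM reading_depth AS e
JOIN revision AS r ON r.rev_id = e.rev_id
ORDER BY e.event_id
""").fetchall()
```

Both queries return the same pairs for this sample data, but the second one avoids the per-event search through each page's revision history.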

Event Timeline

Groceryheist renamed this task from Add revision ID to ReadingDepth Schema to Add revision ID to ReadingDepth Schema and Data. Oct 29 2018, 11:10 PM

For the record: I understand that the imminent performance problem that gave rise to this task has been largely solved, so this task is not timely any more. But adding this field might still facilitate future analysis.

A related problem is that pages can move. Right now we record page_title, but different pages can have the same page title at different times. Having page_id in the schema would also make downstream analysis much more convenient.
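A small sketch of why titles are an unreliable key once pages move. The events, titles, and IDs below are invented for illustration: page 11 is viewed under its original title, then again after being moved, and a different page 12 later takes over the old title.

```python
from collections import defaultdict

# Hypothetical (page_title, page_id) pairs, one per logged page view.
events = [
    ("Mercury", 11),           # page 11 under its original title
    ("Mercury (planet)", 11),  # the same page after a move
    ("Mercury", 12),           # a different page that reused the old title
]

def count_views(events, key_index):
    """Count events grouped by the tuple field at key_index."""
    counts = defaultdict(int)
    for event in events:
        counts[event[key_index]] += 1
    return dict(counts)

views_by_title = count_views(events, 0)
views_by_id = count_views(events, 1)
```

Grouping by title merges views of pages 11 and 12 under "Mercury" and splits page 11 across two titles, while grouping by page_id keeps each page's views together.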

We've identified this field and the page ID as useful fields (see also T208478).

Any other fields we would like to add?

Good question. I don't think so, unless there are additional schemas that we might need to join with that have keys other than page_id or revision_id. We already record namespace.