As categories change, we need to update the contents of the graph database that hosts them. For this, we need to figure out a mechanism for applying those updates.
Current thinking is:
- Every day, generate the RDF for updated categories as a SPARQL Update file.
- Load it into Blazegraph once it has been created.
- This will be done for each wiki that has the functionality enabled.
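The daily update file described above could be assembled roughly as follows. This is only a minimal sketch, not the script from the patches on this task: the `ex:` ontology prefix, the `ex:isInCategory` predicate, the input record shape, and the function names are all hypothetical, and the real category RDF vocabulary may differ. For each changed category it drops the existing triples about that category and inserts the fresh ones.

```python
from typing import Dict, Iterable, List

# Hypothetical prefixes; the real dump format may use different ones.
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX ex: <http://example.org/ontology#>\n"
)


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def category_update(iri: str, label: str, parents: List[str]) -> str:
    """Build the delete-then-insert operations for one changed category."""
    triples = [f'<{iri}> rdfs:label "{escape_literal(label)}" .']
    triples += [f"<{iri}> ex:isInCategory <{parent}> ." for parent in parents]
    body = "\n  ".join(triples)
    return (
        f"DELETE WHERE {{ <{iri}> ?p ?o }} ;\n"
        f"INSERT DATA {{\n  {body}\n}}"
    )


def build_update_file(changes: Iterable[Dict]) -> str:
    """Join all per-category operations into one SPARQL Update request."""
    operations = [
        category_update(c["iri"], c["label"], c["parents"]) for c in changes
    ]
    return PREFIXES + " ;\n".join(operations) + "\n"


if __name__ == "__main__":
    sample = [
        {
            "iri": "http://example.org/wiki/Category:Physics",
            "label": "Physics",
            "parents": ["http://example.org/wiki/Category:Science"],
        }
    ]
    print(build_update_file(sample))
```

The resulting text is a single SPARQL 1.1 Update request, so it can be written to a file and submitted to the triple store's update endpoint in one go; how exactly it gets loaded is left to the reload script.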
Reality check:
enwiki had 73662 category updates and 498 category creations on August 19th 2017, and other days show similar numbers. This seems to be a completely workable number to process daily. Moreover, many of the updates turn out to hit the same categories: the real number of distinct categories updated on enwiki seems to be around 25K/day.
On commons, the numbers seem to be about 2-3x higher for modifications and about 5x higher for creations. That still seems workable, and commons is probably the upper bound of what we're going to get.
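To illustrate why the distinct count is so much lower than the raw event count, here is a small sketch of collapsing update events by category. The `(title, timestamp)` event shape and the sample titles are made up for illustration and are not taken from the actual change logs.

```python
from collections import Counter
from typing import Iterable, Tuple


def distinct_updates(events: Iterable[Tuple[str, str]]) -> Counter:
    """Count update events per category title."""
    return Counter(title for title, _timestamp in events)


# Hypothetical sample: three events, two of them on the same category.
events = [
    ("Category:Physics", "2017-08-19T01:00:00Z"),
    ("Category:Physics", "2017-08-19T02:30:00Z"),
    ("Category:Chemistry", "2017-08-19T03:15:00Z"),
]
per_category = distinct_updates(events)
print(len(events), "events touch", len(per_category), "distinct categories")
```

Only the distinct categories need to be regenerated in the daily file, so the workload scales with the smaller number.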
Change 392736 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Create script for automatic reload of categories
Change 394021 had a related patch set uploaded (by Gehel; owner: Guillaume Lederrey):
[operations/puppet@production] wdqs: schedule cronjob to reload categories
Change 392736 merged by Gehel:
[operations/puppet@production] Create script for automatic reload of categories
Change 394021 merged by Gehel:
[operations/puppet@production] wdqs: schedule cronjob to reload categories