
Ingest user similarity data for June 2021
Open, Needs Triage, Public

Description

The Similarusers database should be refreshed with June 2021 data.
This is a maintenance ticket to coordinate all parties involved and to set an ETA.

This action requires:

New run of the algorithm that generates user similarity data.
MySQL ingestion.
During ingestion, the service will enter a maintenance window of approximately 4 to 6 hours. During maintenance, recommendations won't be served.

References:

Event Timeline

The June run completed successfully on 2021-07-13 at 16:00 UTC (18:00 CEST).

Model=Temporal	Read=19932894	Skipped=0	Inserted=19932894
Model=UserMetadata	Read=8898300	Skipped=0	Inserted=8898300
Model=Coedit	Read=120067390	Skipped=0	Inserted=120067390
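
The summary lines above can be checked mechanically before declaring a run good. The following is a minimal sketch, assuming the whitespace-separated `Key=Value` layout shown above and assuming that every row read is either skipped or inserted. The names `parse_summary_line` and `is_consistent` are hypothetical and are not part of the ingestion tooling.

```python
def parse_summary_line(line):
    """Parse one 'Model=... Read=... Skipped=... Inserted=...' line into a dict."""
    # split() with no argument splits on any whitespace, including tabs.
    fields = dict(part.split("=", 1) for part in line.split())
    return {
        "model": fields["Model"],
        "read": int(fields["Read"]),
        "skipped": int(fields["Skipped"]),
        "inserted": int(fields["Inserted"]),
    }


def is_consistent(row):
    """Assumption: every row that was read was either skipped or inserted."""
    return row["read"] == row["skipped"] + row["inserted"]


# The three summary lines reported for the June run.
SUMMARY = (
    "Model=Temporal\tRead=19932894\tSkipped=0\tInserted=19932894\n"
    "Model=UserMetadata\tRead=8898300\tSkipped=0\tInserted=8898300\n"
    "Model=Coedit\tRead=120067390\tSkipped=0\tInserted=120067390\n"
)

rows = [parse_summary_line(line) for line in SUMMARY.splitlines()]
inconsistent_models = [row["model"] for row in rows if not is_consistent(row)]
```

For this run `inconsistent_models` is empty, since each model reports `Skipped=0` and matching `Read` and `Inserted` counts.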

@Marostegui @hnowlan would you object to having this ingestion process automated (i.e., scheduled by a cron job) moving forward? What kind of safeguards would you need in place?

I am fine with that as long as we use the same parameters we've been using for the previous runs.
We should probably try to report or graph these runs, if possible, in order to detect anomalies (e.g., unusually short or long runtimes might indicate problems). How would this automation deal with errors (e.g., the host being down for maintenance)?
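
As a starting point for discussion, the two safeguards raised here (flagging unusual runtimes and coping with an unreachable host) could look roughly like the sketch below. Everything in it is an assumption made for illustration: the function names, the 50% tolerance, the retry count, and the choice of `ConnectionError` as the signal that the host is down. None of it reflects how the existing ingestion script behaves.

```python
import statistics
import time


def runtime_is_anomalous(runtime_hours, previous_runtimes, tolerance=0.5):
    """Flag a run whose duration deviates from the median of previous runs.

    `tolerance` is a fraction of the median (0.5 means +/- 50%); the value
    is an illustrative assumption, not a tuned threshold.
    """
    if not previous_runtimes:
        return False  # nothing to compare against yet
    baseline = statistics.median(previous_runtimes)
    return abs(runtime_hours - baseline) > tolerance * baseline


def run_with_retries(job, attempts=3, delay_seconds=600, sleep=time.sleep):
    """Call job(); on ConnectionError wait and retry.

    The error is re-raised after the last attempt so that the scheduler
    sees a failure instead of a silent skip.
    """
    for attempt in range(1, attempts + 1):
        try:
            return job()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(delay_seconds)
```

With hypothetical previous runtimes of 4, 5 and 6 hours, a 30-minute run or a 9-hour run would be flagged, while a 5.5-hour run would not. The `sleep` parameter exists only so the retry delay can be replaced in tests.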