Ingest user similarity data for April 2021
Closed, Resolved | Public | 1 Estimated Story Points


The Similarusers database should be refreshed with April 2021 data.
This is a maintenance ticket to coordinate all parties involved and to set an ETA.

This action requires:

A new run of the algorithm that generates user similarity data.
MySQL ingestion.
During ingestion, the service will enter a maintenance window of approximately 4 to 6 hours. During maintenance, recommendations won't be served.


Event Timeline

gmodena renamed this task from Ingest user similarity data for March 2021 to Ingest user similarity data for April 2021. May 17 2021, 7:18 AM
gmodena created this task.
gmodena set the point value for this task to 1.

@Marostegui today we ran Similarusers ingestion of April data. Some stats:

Loading /home/gmodena/similar-users-private/data/2021-04/temporal.tsv: 18742350rows [57:18, 5450.29rows/s]
Loading /home/gmodena/similar-users-private/data/2021-04/metadata.tsv: 8370839rows [29:25, 4742.03rows/s]
Loading /home/gmodena/similar-users-private/data/2021-04/coedit_counts.tsv: 112013275rows [5:29:18, 5669.14rows/s]
Model=Temporal  Read=18742350   Skipped=0       Inserted=18742350
Model=UserMetadata      Read=8370839    Skipped=0       Inserted=8370839
Model=Coedit    Read=112013275  Skipped=0       Inserted=112013275

I kept the process monitored, and from what I could tell, all seemed fine on the database I/O front.
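
For anyone who wants to cross-check the stats above, here is a minimal Python sketch that parses the three progress lines and totals the rows and elapsed time. The regular expression, the helper names (`to_seconds`, `summarize`) and the shortened file names are illustrative assumptions, not part of the ingestion tooling.

```python
import re

# Progress lines copied from the ingestion output above
# (paths shortened to file names for readability).
LOG = """\
temporal.tsv: 18742350rows [57:18, 5450.29rows/s]
metadata.tsv: 8370839rows [29:25, 4742.03rows/s]
coedit_counts.tsv: 112013275rows [5:29:18, 5669.14rows/s]
"""

# Hypothetical pattern: file name, row count, elapsed time up to the comma.
LINE = re.compile(r"(?P<name>\S+): (?P<rows>\d+)rows \[(?P<elapsed>[\d:]+),")


def to_seconds(elapsed):
    """Convert 'MM:SS' or 'H:MM:SS' to a number of seconds."""
    total = 0
    for part in elapsed.split(":"):
        total = total * 60 + int(part)
    return total


def summarize(log):
    """Return a list of (file name, rows, elapsed seconds) tuples."""
    return [
        (m["name"], int(m["rows"]), to_seconds(m["elapsed"]))
        for m in LINE.finditer(log)
    ]


stats = summarize(LOG)
total_rows = sum(rows for _, rows, _ in stats)
total_seconds = sum(secs for _, _, secs in stats)

hours, rem = divmod(total_seconds, 3600)
minutes, seconds = divmod(rem, 60)
print(f"rows={total_rows} elapsed={hours}:{minutes:02d}:{seconds:02d}")
```

The row total matches the sum of the three Inserted counts (139126464). The elapsed figure is only the sum of the three per-file timers; whether the loads ran strictly back to back is an assumption here.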

Excellent! Thanks for the ping :)

Hey @Marostegui, is it OK with you if we ingest the May data without giving a heads-up in advance (like we did for April)?

gmodena triaged this task as Low priority.