
Ingest user similarity data for February 2021
Closed, Resolved, Public


Sockpuppet's database should be refreshed with February 2021 data.
This is a maintenance ticket to coordinate all parties involved, and set an ETA.

This action requires:

  • New run of the algorithm that generates user similarity data.
  • MySQL ingestion.

During ingestion the service will enter a maintenance window of approx 4 hours. During maintenance,
recommendations won't be served.


Event Timeline

Hey @Marostegui; would it work for us to schedule this data ingestion in the week of March 15? Do you have a preferred date/time?

That should be ok, anytime between 06:00 UTC and 15:00 UTC should be fine.
I am off Friday 19th though.

Hey @Marostegui: the ingestion is scheduled for today. We expect the import to start around 1200CEST.
We are using the following parameters:

  • batch size (num rows): 7000
  • throttle between batch insertions: 1000ms

Me and @hnowlan will be monitoring the process.
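
For reference, the batch size and throttle above could be applied roughly as in the sketch below. This is not the actual ingestion script: `batched`, `ingest`, and the `insert_batch` callable are hypothetical names used for illustration, and `insert_batch` stands in for whatever performs the MySQL write (for example a DB-API `executemany` call).

```python
import itertools
import time
from typing import Callable, Iterable, Iterator, List, Sequence

Row = Sequence[object]


def batched(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch


def ingest(
    rows: Iterable[Row],
    insert_batch: Callable[[List[Row]], None],  # hypothetical: performs one batch insert
    batch_size: int = 7000,                     # batch size (num rows) from this ticket
    throttle_s: float = 1.0,                    # 1000ms throttle from this ticket
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Insert rows batch by batch, pausing after each batch. Returns rows inserted."""
    inserted = 0
    for batch in batched(rows, batch_size):
        insert_batch(batch)
        inserted += len(batch)
        sleep(throttle_s)  # give the database room between batch insertions
    return inserted
```

With these values the write rate cannot exceed 7000 rows per second, since each batch of 7000 rows is followed by a pause of one second.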

An update on this:
we are ~35% into ingesting the final (largest) dataset, but we are experiencing a slightly lower throughput than anticipated (based on previous runs). At this rate the import is expected to complete between 1700 and 1800CEST (if things stay as they are, the ETA is closer to 1700CEST).

Looking at grafana, it seems to me that MySQL is doing ok. Write rate seems consistent, and I did not spot any i/o or latency spike.

Ingestion started at 2021-03-16 12:19:55,924CEST, and successfully completed at 2021-03-16 18:35:00,545CEST.
Some stats:

Model=Temporal      Read=17407009   Skipped=0  Inserted=17407009
Model=UserMetadata  Read=7783760    Skipped=0  Inserted=7783760
Model=Coedit        Read=103019281  Skipped=0  Inserted=103019281
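
As a rough cross-check, the average throughput can be derived from the timestamps and row counts reported above. This is a small sketch, assuming the three models were ingested one after another within the reported start/end window; the variable names are illustrative only.

```python
from datetime import datetime

# Timestamps as reported in this ticket (timezone suffix dropped; both are in the same zone).
FMT = "%Y-%m-%d %H:%M:%S,%f"
start = datetime.strptime("2021-03-16 12:19:55,924", FMT)
end = datetime.strptime("2021-03-16 18:35:00,545", FMT)

# Inserted row counts per model, copied from the stats above.
inserted = {
    "Temporal": 17407009,
    "UserMetadata": 7783760,
    "Coedit": 103019281,
}

total_rows = sum(inserted.values())
elapsed_s = (end - start).total_seconds()
rows_per_s = total_rows / elapsed_s

print(f"total rows: {total_rows}")
print(f"elapsed:    {elapsed_s:.3f} s")
print(f"throughput: {rows_per_s:.0f} rows/s")
```

This works out to roughly 5,700 rows per second averaged over the whole run, which is below the 7000 rows per second ceiling implied by the batch size and throttle used.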

Thanks @gmodena - it doesn't seem to have caused anything on the master/slave so the values at T276948#6916816 looked good.