Create cron tasks for archiving aggregates
Closed, Resolved | Public | 3 Estimated Story Points

Description

Three management commands that archive aggregates were created for T370980: Export aggregates to static files. Archiving is a further effort to keep the database small and to improve performance in the UI. Once the two pending PRs for that ticket are merged, we should add the following management commands to the production crontab:

  • 'archive_link_aggregates' command
  • 'archive_user_aggregates' command
  • 'archive_pageproject_aggregates' command

These commands should be run on the 10th of every month.
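
As a rough sketch of what the resulting crontab entries could look like, the snippet below builds one line per command. The ticket only specifies the command names and the day of the month. The time of day (midnight), the `python manage.py` invocation, and the `crontab_lines` helper are assumptions made for illustration, not the project's actual configuration.

```python
# Command names come from the ticket; everything else is a placeholder.
COMMANDS = [
    "archive_link_aggregates",
    "archive_user_aggregates",
    "archive_pageproject_aggregates",
]


def crontab_lines(commands, day_of_month=10, hour=0, minute=0,
                  manage_py="python manage.py"):
    """Return one crontab entry per command, scheduled once a month.

    Cron field order: minute, hour, day of month, month, day of week.
    The hour, minute, and manage_py defaults are assumed values.
    """
    schedule = f"{minute} {hour} {day_of_month} * *"
    return [f"{schedule} {manage_py} {name}" for name in commands]


for line in crontab_lines(COMMANDS):
    print(line)
```

In a real deployment the entries would likely also need the project's working directory and environment, which the ticket does not describe.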

Details

Other Assignee
Scardenasmolinar

Event Timeline

Kgraessle triaged this task as Medium priority. Jun 17 2025, 4:39 PM
Kgraessle moved this task from To be estimated to Kanban on the Moderator-Tools-Team board.
Kgraessle set the point value for this task to 3.

We also need to run the new fill commands so the program totals continue to render properly. How far back do we want to go when archiving aggregates? Would archiving aggregates older than a year work?

Amdrel changed the task status from Open to In Progress. Jun 24 2025, 11:13 PM
Amdrel claimed this task.
Kgraessle changed the task status from In Progress to Open. Jun 25 2025, 2:23 PM
Kgraessle subscribed.

@Amdrel I'm looking through the order of operations for reviewing your PRs and just wanted to make sure I was reviewing them in the order you intended:

Is this the correct order?

  1. https://github.com/WikipediaLibrary/externallinks/pull/429
  2. https://github.com/WikipediaLibrary/externallinks/pull/431
  3. https://github.com/WikipediaLibrary/externallinks/pull/440
  4. https://github.com/WikipediaLibrary/externallinks/pull/439

Yes, that is correct!

@Amdrel Ok perfect!

I went ahead and merged #429 yesterday.
It looks like #431 will depend on having the aggregates populated in object storage, but we don't actually start doing that until #440 is merged in.
My suggestion is to pull the crontab entries for the aggregates out of #440 and create a separate PR for adding the cron commands for filling the top aggregates later.

Sounds good to me. I split the PR in two.

Kgraessle changed the task status from Open to Stalled. Jun 30 2025, 3:56 PM

Stalling this ticket because https://github.com/WikipediaLibrary/externallinks/pull/431 is blocked until aggregates are populated in object storage.

Kgraessle changed the task status from Stalled to Open. Jul 10 2025, 2:26 PM
Kgraessle moved this task from Eng review to In Progress on the Moderator-Tools-Team (Kanban) board.
Kgraessle updated Other Assignee, added: Scardenasmolinar; removed: Kgraessle.
Kgraessle changed the task status from Open to Stalled. Jul 14 2025, 5:40 PM

Stalling because container endpoints aren't available.

Kgraessle changed the task status from Stalled to Open. Jul 15 2025, 1:03 PM
Kgraessle changed the task status from Stalled to Open. Jul 22 2025, 5:54 PM
Kgraessle moved this task from In Progress to QA on the Moderator-Tools-Team (Kanban) board.
Scardenasmolinar moved this task from QA to Done on the Moderator-Tools-Team (Kanban) board.

Aggregates are now being archived correctly.