Page MenuHomePhabricator

Wikilink: Improve Database performance by introducing additional indices
Closed, ResolvedPublic3 Estimated Story Points

Description

As an admin of wikilink, I should have more performant database queries to display data in a timelier fashion on the website.

Improving indexes on our existing schema may improve our database performance.

Per Django's documentation:

Consider adding indexes to fields that you frequently query using filter(), exclude(), order_by(), etc. as indexes may help to speed up lookups. Note that determining the best indexes is a complex database-dependent topic that will depend on your particular application. The overhead of maintaining an index may outweigh any gains in query speed.

Acceptance Criteria:

  • Investigate which fields should be indexed and add them.
  • Add additional indexes to fields we frequently use fitler, exclude, or order by, etc...
  • Investigate on the write performance of the database.

Event Timeline

Kgraessle renamed this task from Wikilink improve indexes to Wikilink: Improve Database performance by introducing additional indices.Jul 19 2024, 3:04 PM
Kgraessle updated the task description. (Show Details)
Scardenasmolinar moved this task from To be estimated to Up next on the Moderator-Tools-Team board.
Scardenasmolinar updated the task description. (Show Details)
Amdrel changed the task status from Open to In Progress.Feb 21 2025, 7:07 PM

Hi. I have a little time but haven’t contributed before. I’m well experienced in index tuning so feel free to reach out if I can help.

I have a work in progress PR here that adds a couple: https://github.com/WikipediaLibrary/externallinks/pull/417

I've had difficulty finding performance improving indexes for couple of our slowest endpoints, those being project 'top_projects' and 'top_users'. Since the data that these endpoints use only changes daily from the cron jobs we can cache it in memcached with a long expiration to workaround the performance issues with those endpoints, though that's out of scope for this ticket.

I've marked the PR as ready to review. I briefly looked at write performance and haven't identified any unused indexes or any other issues with the writes themselves causing slowdowns. We're still bottlenecked by SELECT operations that precede writes to both aggregates and link events.

jsn.sherman moved this task from QA to Done on the Moderator-Tools-Team (Kanban) board.
jsn.sherman subscribed.

verified that the migration completed and loaded up pageproject and user aggregates; we're still having timeouts on both, but we can assume that we just can't make it over the line of request timeouts. Accepting this as an improvement.

verified that the migration completed and loaded up pageproject and user aggregates; we're still having timeouts on both, but we can assume that we just can't make it over the line of request timeouts. Accepting this as an improvement.

Those were the queries that I had difficulty improving. By their nature they need to scan almost everything in their respective tables (for program 1) so an alternative solution would probably be needed to speed those up.