
Identify bottlenecks in Hashtags tool data collection performance
Open, Needs Triage, Public

Description

As seen in T343104, the Hashtags tool has recently been unable to keep up with live data as it monitors and collects edits containing hashtags from the recent changes event stream. It's unclear why these slowdowns are happening: we don't know which part of checking and recording a new edit takes the most time.

We could do some profiling to understand what exactly is taking so long.
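As a starting point, here is a minimal sketch of what such profiling could look like using Python's built-in cProfile. The functions `check_edit`, `record_edit` and `process_events` are hypothetical stand-ins, not the tool's actual code, and the in-memory list stands in for the database. The idea is only to show how the time spent checking an edit could be separated from the time spent recording it.

```python
import cProfile
import io
import pstats


def check_edit(event):
    # Hypothetical stand-in: pull hashtag-like words out of the edit summary.
    return [w for w in event.get("comment", "").split() if w.startswith("#")]


def record_edit(event, hashtags, store):
    # Hypothetical stand-in for the database write.
    store.append((event["id"], tuple(hashtags)))


def process_events(events, store):
    # Check every event and record only those that contain hashtags.
    for event in events:
        hashtags = check_edit(event)
        if hashtags:
            record_edit(event, hashtags, store)


def profile_processing(events, top=20):
    # Run the processing loop under cProfile and return the recorded rows
    # together with a text report sorted by cumulative time.
    store = []
    profiler = cProfile.Profile()
    profiler.enable()
    process_events(events, store)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return store, out.getvalue()
```

Running `profile_processing` on a captured sample of real events and reading the report would show whether the check step or the record step dominates.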

Event Timeline

Some notes from our meeting:

  • The current process is very thread bound. Fixing this would be a big project.
  • Starting with a smaller project to improve the current state would be the better idea.
  • Stealing from Wikilink-Tool is probably the best first step. Database writing was the bottleneck there.
  • Since QuickStatements is the biggest contributor to the current backlog, and no one seems to care about its hashtag data, we could simply add quickstatements to the list of excluded hashtags. This may help resolve the immediate issue. Each of these edits also carries a corresponding temporary_batch hashtag, which we could exclude too (unless excluding quickstatements already discards the whole edit). I'll file a new ticket for this.
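
To illustrate the last point, here is a rough sketch of how an exclusion list could drop these edits before any database work happens. The names `EXCLUDED_HASHTAGS` and `hashtags_to_record` and the regular expression are assumptions for illustration, not the tool's actual code. The sketch also assumes that an edit is discarded entirely as soon as it contains one excluded hashtag, which is the open question in the parenthesis above.

```python
import re

# Assumed exclusion list; the real tool's list and its format may differ.
EXCLUDED_HASHTAGS = {"quickstatements", "temporary_batch"}

# Simplified hashtag pattern, assumed for this sketch.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags_to_record(comment, excluded=EXCLUDED_HASHTAGS):
    # Return the hashtags worth recording for an edit summary, or an
    # empty list if the edit has no hashtags or contains an excluded one.
    tags = [tag.lower() for tag in HASHTAG_RE.findall(comment)]
    if any(tag in excluded for tag in tags):
        return []
    return tags
```

Under that assumption, excluding quickstatements alone would already remove these edits, and listing temporary_batch as well would only matter for edits that carry that hashtag without the other one.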