Performance review of checkuser database queries [NOT READY]
Open, Needs Triage, Public

Description

As part of the CheckUser improvements work, the tool now allows searching for multiple targets and retrieving large sets of data. We are working to improve the performance of these queries.

Preview environment

Because of the sensitive nature of CheckUser, the tool is only available in limited spaces and to a limited number of people. The preview is even more limited.

At the moment the new CheckUser queries are available on testwiki and only for users with the staff right: https://test.wikipedia.org/wiki/Special:Investigate

[@Niharika do we have test users we can provide for an initial query?]

Which code to review

Queries built by PreliminaryCheckService, CompareService and TimelineService, in particular by these methods (or methods that they call):

  • CompareService::getQueryInfo (paginated)
  • CompareService::getTotalEditsFromIp
  • PreliminaryCheckService::preprocessResults
  • TimelineService::getQueryInfo (paginated)

Note that each paginated query is built across two classes: the service's getQueryInfo supplies the base query, and IndexPager (via ComparePager and TimelinePager) adds the limits and offsets.
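
For reviewers unfamiliar with this split, here is a rough Python sketch of the idea; it is an illustration, not the MediaWiki code. The `changes` table, its columns, and the functions `get_query_info`, `build_page_query` and `fetch_page` are all hypothetical. The sketch also assumes the pager's "offset" is a value of the index field (here `id`) rather than a row count, and that it fetches one extra row to detect whether another page exists.

```python
import sqlite3


def get_query_info(targets):
    """Hypothetical stand-in for a service's getQueryInfo: base query only,
    with no LIMIT and no offset condition."""
    placeholders = ", ".join("?" for _ in targets)
    return {
        "table": "changes",
        "fields": ["id", "actor", "ip"],
        "where": [f"actor IN ({placeholders})"],
        "params": list(targets),
    }


def build_page_query(info, index_field, offset, limit):
    """Hypothetical stand-in for the pager: adds the offset condition,
    the ordering and the limit on top of the base query."""
    where = list(info["where"])
    params = list(info["params"])
    if offset is not None:
        # Assumption: the offset is the index value of the last row seen.
        where.append(f"{index_field} > ?")
        params.append(offset)
    sql = (
        f"SELECT {', '.join(info['fields'])} FROM {info['table']} "
        f"WHERE {' AND '.join(where)} "
        f"ORDER BY {index_field} LIMIT ?"
    )
    params.append(limit + 1)  # one extra row shows whether more pages exist
    return sql, params


def fetch_page(conn, info, index_field, offset, limit):
    """Run one page of the query and report whether another page follows."""
    sql, params = build_page_query(info, index_field, offset, limit)
    rows = conn.execute(sql, params).fetchall()
    return rows[:limit], len(rows) > limit


def demo():
    """Build a tiny in-memory table and fetch two consecutive pages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE changes (id INTEGER PRIMARY KEY, actor TEXT, ip TEXT)"
    )
    conn.executemany(
        "INSERT INTO changes (actor, ip) VALUES (?, ?)",
        [
            ("Alice", "10.0.0.1"),
            ("Bob", "10.0.0.2"),
            ("Alice", "10.0.0.3"),
            ("Alice", "10.0.0.1"),
            ("Carol", "10.0.0.4"),
        ],
    )
    info = get_query_info(["Alice", "Bob"])
    first, more_after_first = fetch_page(conn, info, "id", None, 2)
    last_id = first[-1][0]
    second, more_after_second = fetch_page(conn, info, "id", last_id, 2)
    return first, more_after_first, second, more_after_second
```

The point for the review is that neither class shows the final SQL on its own: the base conditions and the paging conditions only meet when the page query is assembled, so that combined query is the one to assess.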

Performance assessment

Please initiate the performance assessment by answering the questions below:

  • What work has been done to ensure the best possible performance of the feature?
  • What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?
  • Are there potential optimisations that haven't been performed yet?
  • Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org.

Event Timeline

ARamirez_WMF updated the task description.
ARamirez_WMF updated the task description.
ARamirez_WMF added a subscriber: Tchanders.
ARamirez_WMF added a subscriber: dbarratt.

Assuming this is about CheckUser, I'm adding the tag so that this task can be found.

@Tchanders How is this ticket related to T248588? Should we be doing this first?

@Niharika I'm not entirely sure, but I think T248588 was created first, before we decided to contact the Performance team. While working on the Compare and Timeline tabs, the engineers raised a few queries in code review about whether what we're doing is scalable, but we didn't have the means to test it out until recently. I think ideally we'd have done the testing before merging the code, so we're sort of catching up with T248588.

Note: We should contact the DBAs about this and for general advice.

We will put together the information about which queries need a review, what the scale of typical results is, what the outlier number of results might be, and what application features/details we are using the data for. We hope that will help the DBAs more efficiently review the queries and their impact.

Additionally, we've identified a few places where queries might be batched or the code structured otherwise in an effort to provide overall performance benefits. We think that the Performance team would be best suited to help work through our questions there. We will provide more direct pointers to those areas in the code.

Note: We should contact the DBAs about this and for general advice.

If you think the DBAs' input is needed, feel free to tag DBA once ready (or create a dedicated subtask), I'd say. :) In general, as the task summary says "Performance Review", see https://www.mediawiki.org/wiki/Wikimedia_Performance_Team/Performance_Review for how to request one from the Performance team (not DBA).