Page MenuHomePhabricator

Cannot filter Phase 2 export list
Closed, ResolvedPublic

Description

In "Reports/Phase 2", when I try to filter the list by a combination of Global North/South and a specific language community size, I receive a "504 Gateway Timeout message". However, I do not receive an error message when filtering by just Global North/South or a language community.

I have tried this multiple times on both Safari (Version 8.0) and Chrome (Version 41.0.2272.104 (64-bit)).

Event Timeline

Shouston_WMF raised the priority of this task from to Needs Triage.
Shouston_WMF updated the task description. (Show Details)
Shouston_WMF added a subscriber: Shouston_WMF.
Niharika set Security to None.

I think the problem is triggered by the language community size option independent of any other settings for the report.

I haven't been able to replicate this on my local instance. Could it be because of large amount of data? (Seems unlikely)

In T94373#1167980, @NiharikaKohli wrote:

I haven't been able to replicate this on my local instance. Could it be because of large amount of data? (Seems unlikely)

Yeah I think that is exactly the problem. The SQL that is being used for the report is horribly inefficient and apparently always has been. I'll take a crack at making it sane since I copy-pasted the original bad query which has just been made worse and worse as we have added more features following the original bad patterns.

Hideous SQL:

SELECT s.id,
  s.fname,
  s.lname,
  s.email,
  s.residence,
  s.exclude,
  s.gender,
  YEAR(NOW()) - YEAR(s.dob) AS age,
  c.country_name,
  COALESCE(p1score, 0) AS p1score,
  COALESCE(nscorers, 0) AS nscorers,
  rkrexp.relexp AS relexp,
  rkexps.expshare AS expshare,
  (COALESCE(:relexp * rkrexp.relexp, 0) + COALESCE(:expshare * rkexps.expshare, 0)) as p2score
FROM scholarships s 
  LEFT OUTER JOIN (
    SELECT scholarship_id,
      AVG(rank) AS relexp
    FROM rankings
    WHERE criterion = 'relexp'
    GROUP BY scholarship_id
  ) rkrexp ON s.id = rkrexp.scholarship_id
  LEFT OUTER JOIN (
    SELECT scholarship_id,
      AVG(rank) AS expshare
    FROM rankings
    WHERE criterion = 'expshare'
    GROUP BY scholarship_id
  ) rkexps ON s.id = rkexps.scholarship_id
  LEFT OUTER JOIN (
    SELECT scholarship_id,
      SUM(rank) AS p1score
    FROM rankings
    WHERE criterion = 'valid'
    GROUP BY scholarship_id
  ) p1 ON s.id = p1.scholarship_id
  LEFT OUTER JOIN (
    SELECT scholarship_id,
      COUNT(DISTINCT user_id) AS nscorers
    FROM rankings
    WHERE criterion <> 'valid'
    GROUP BY scholarship_id
  ) ns ON s.id = ns.scholarship_id
  LEFT OUTER JOIN iso_countries c ON s.residence = c.code
  LEFT OUTER JOIN language_communities l ON s.community = l.code
  INNER JOIN iso_countries c1 ON c.region = :region
  INNER JOIN iso_countries c2 ON c.globalns = :globalns
  INNER JOIN language_communities l1 ON l.size = :languageGroup
  GROUP BY s.id, s.fname, s.lname, s.email, s.residence
  HAVING p1score >= :int_phase1pass
    AND s.exclude = 0
  ORDER BY p2score DESC, s.id ASC

I really don't know if this even ever worked as hoped. The inner joins being used for filtering rather than these being constraints of the having clause is horrible.

Better SQL:

SELECT s.id,
  s.fname,
  s.lname,
  s.email,
  s.residence,
  s.exclude,
  s.gender,YEAR(NOW()) - YEAR(s.dob) AS age,
  c.country_name,COALESCE(p1score, 0) AS p1score,
  COALESCE(nscorers, 0) AS nscorers,
  rkrexp.relexp AS relexp,
  rkexps.expshare AS expshare,
  (COALESCE(:relexp * rkrexp.relexp, 0) + COALESCE(:expshare * rkexps.expshare, 0)) as p2score,
  c.region,
  c.globalns,
  l.size
FROM scholarships s 
  LEFT OUTER JOIN (
    SELECT scholarship_id,
      AVG(rank) AS relexp
    FROM rankings
    WHERE criterion = 'relexp'
    GROUP BY scholarship_id
  ) rkrexp ON s.id = rkrexp.scholarship_id
  LEFT OUTER JOIN (
    SELECT scholarship_id,
      AVG(rank) AS expshare
    FROM rankings
    WHERE criterion = 'expshare'
    GROUP BY scholarship_id
  ) rkexps ON s.id = rkexps.scholarship_id
  LEFT OUTER JOIN (
    SELECT scholarship_id,
      SUM(rank) AS p1score
    FROM rankings
    WHERE criterion = 'valid'
    GROUP BY scholarship_id
  ) p1 ON s.id = p1.scholarship_id
  LEFT OUTER JOIN (
    SELECT scholarship_id,
      COUNT(DISTINCT user_id) AS nscorers
    FROM rankings
    WHERE criterion <> 'valid'
    GROUP BY scholarship_id
  ) ns ON s.id = ns.scholarship_id
  LEFT OUTER JOIN iso_countries c ON s.residence = c.code
  LEFT OUTER JOIN language_communities l ON s.community = l.code
GROUP BY s.id, s.fname, s.lname, s.email, s.residence
HAVING (p1score >= :int_phase1pass)
  AND (s.exclude = 0)
  AND (c.region = :region)
  AND (c.globalns = :globalns)
  AND (l.size = :languageGroup)
ORDER BY p2score DESC, s.id ASC

Patch coming soon.

Change 201070 had a related patch set uploaded (by BryanDavis):
Improve performance of phase2 report query

https://gerrit.wikimedia.org/r/201070

Change 201070 merged by jenkins-bot:
Improve performance of phase2 report query

https://gerrit.wikimedia.org/r/201070

bd808 moved this task from Needs Review to Done on the Wikimedia-Wikimania-Scholarships board.

Update deployed to production host