Page MenuHomePhabricator

Requesting SQL code review for application on Toolforge
Closed, ResolvedPublic

Description

Short version

Please help me determine if the SQL queries below violate Toolforge rule #6 - Do not provide direct access to Cloud Services resources to unauthenticated users.

Long version

Technical documentation dashboard (T334839, code, app) will soon be able to query replicas for information about edits, edit sizes, and top editors. This is implemented in replica_client.clj. I want to make sure my implementation does not violate Toolforge rule #6 (Do not provide direct access to Cloud Services resources to unauthenticated users).

The dashboard does not authenticate users and is publicly available. When this feature is rolled out, users will be able to specify a page pile (https://pagepile.toolforge.org) with a list of pages they want to query the replicas about. Each request will issue three parametrized SQL SELECT queries like below. The date range is currently hard-coded to the last 90 days. Page titles and namespaces are taken from the page pile.

Results are cached and SQL queries are rate limited.

Please evaluate whether these SQL queries and implementation details violate Toolforge rule #6.

Page edits

SELECT DATE(LEFT(r.rev_timestamp, 8)) AS d, COUNT(r.rev_id) AS c
FROM page AS p
LEFT JOIN revision AS r ON r.rev_page = p.page_id
LEFT JOIN actor AS a ON r.rev_actor = a.actor_id
WHERE (r.rev_deleted = ?) AND DATE(r.rev_timestamp) BETWEEN ? AND ?
AND ((p.page_title IN (?, ?)) OR (((p.page_title = ?) AND (p.page_namespace = ?)) OR ((p.page_title = ?) AND (p.page_namespace = ?))))
GROUP BY d
ORDER BY r.rev_timestamp ASC

Top editors

SELECT a.actor_name AS name, COUNT(r.rev_id) AS edits
FROM page AS p
LEFT JOIN revision AS r ON r.rev_page = p.page_id
LEFT JOIN actor AS a ON r.rev_actor = a.actor_id
WHERE (r.rev_deleted = ?) AND DATE(r.rev_timestamp) BETWEEN ? AND ?
AND ((p.page_title IN (?, ?)) OR (((p.page_title = ?) AND (p.page_namespace = ?)) OR ((p.page_title = ?) AND (p.page_namespace = ?))))
GROUP BY name
ORDER BY edits DESC
LIMIT ?

Big/small and major/minor edits

SELECT q.big_edit AS big_edit, q.minor_edit AS minor_edit, COUNT(*) AS number_of_edits
FROM
(SELECT IF(ABS(CAST(r.rev_len AS SIGNED) - CAST(IFNULL(pr.rev_len, ?) AS SIGNED)) >= ?, TRUE, FALSE) AS big_edit, r.rev_minor_edit AS minor_edit
FROM page AS p
LEFT JOIN revision AS r ON r.rev_page = p.page_id
LEFT JOIN revision AS pr ON pr.rev_id = r.rev_parent_id
WHERE (r.rev_deleted = ?) AND DATE(r.rev_timestamp) BETWEEN ? AND ?
AND ((p.page_title IN (?, ?)) OR (((p.page_title = ?) AND (p.page_namespace = ?)) OR ((p.page_title = ?) AND (p.page_namespace = ?)))) AS q
GROUP BY big_edit, minor_edit

Event Timeline

taavi claimed this task.
taavi subscribed.

Hi. We talked about this in our team meeting yesterday. This kind of usage is fine (and relatively common thing to do in a tool) - that rule is meant to apply to tools that just have a free-text SQL input or something similar which does not seem to be the case here.