Request for access for user dr0ptp4kt for 'admin' tool
Open, Needs Triage, Public

Description

This is an access request following up on https://wm-bot.wmcloud.org/logs/%23wikimedia-cloud-admin/20240118.txt:

[15:45:41] <dr0ptp4kt>	 i'm looking to correlate some toolforge.org web access and if i understand the https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#Webserver_statistics and https://phabricator.wikimedia.org/T178963 i'd need to shell into tools-proxy-05.tools.eqiad1.wikimedia.cloud and tools-proxy-06.tools.eqiad1.wikimedia.cloud . do i have that right? what would be the best way for me to gain that access?
[15:46:17] <dr0ptp4kt>	 i was thinking about whether i should be filing a phabricator task or be putting in for a puppet posix role thing or something else...if i should ask on another channel LMK!
[15:55:58] <taavi>	 dr0ptp4kt: you would need to be a maintainer of the 'admin' tool to do that. the process is outlined at https://wikitech.wikimedia.org/wiki/Help:Access_policies, although not all of that applies to you since it's primarily written with volunteers in mind and not non-WMCS foundation staff
[15:58:46] <taavi>	 i guess you should file a phab task with your use case, just to make sure what you're looking for isn't already available via https://toolviews.toolforge.org/ or similar
[15:59:39] <andrewbogott>	 dr0ptp4kt: I'd support adding you to that group as long as you accept the spider-man rule. Otherwise I'm ok acting as your remote hands if it's not hours and hours of poking around.
[16:00:13] <taavi>	 spider-man rule?
[16:01:35] <andrewbogott>	 https://en.wikipedia.org/wiki/With_great_power_comes_great_responsibility
[16:02:16] <dr0ptp4kt>	 :) - thanks, will circle back / around / er...i mean, scale the side of a building...a little later

The intended use is to correlate request patterns in the data lake with requests originating from tools.

Event Timeline

I don't have any objections to @dr0ptp4kt becoming a Toolforge admin, but I am curious about his stated intentions. Adam, is there a task that gives more information on what you are hoping to do here? On the surface, I'm not sure what the Toolforge front proxy access logs would tell you that relates to the production traffic recorded in the data lake.

In the immediate term, I'm checking query density for WDQS, in particular for scholarly-article-oriented queries, as part of the WDQS graph split.

For example, I'm trying to align the page-load-scoped HTTP 200 counts contained in responses like this one:

curl -X GET "https://toolviews.toolforge.org/api/v1/tool/scholia/day/2024-05-05" -H "accept: application/json"

with query.wikidata.org SPARQL query counts like those in https://superset.wikimedia.org/superset/explore/p/N9kQ7ZanEZD/, which uses this query:

SELECT "http_method" AS "http_method",
       "content_type" AS "content_type",
       "http_status" AS "http_status",
       "cache_status" AS "cache_status",
       "access_method" AS "access_method",
       "agent_type" AS "agent_type",
       "referer_class" AS "referer_class",
       COUNT(*) AS "count"
FROM "wmf"."webrequest"
WHERE "year" = 2024
  AND "month" = 5
  AND "day" = 5
  AND "uri_host" = 'query.wikidata.org'
  AND "referer" LIKE 'https://scholia.toolforge.org/%'
GROUP BY "http_method",
         "content_type",
         "http_status",
         "cache_status",
         "access_method",
         "agent_type",
         "referer_class"
ORDER BY "count" DESC
LIMIT 10000;
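The query above returns one row per combination of the grouped columns, so the rows have to be collapsed before they can be compared with a single toolviews page-load count. A minimal sketch, assuming the Superset result has been exported as a list of dicts keyed by the SQL column aliases; the export step and the helper name `successful_sparql_requests` are hypothetical:

```python
from collections import Counter
from typing import Iterable, Mapping


def successful_sparql_requests(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Sum the "count" column of HTTP 200 rows, broken down by agent_type.

    Assumes each row carries the aliases used in the SQL query
    ("http_status", "agent_type", "count"). http_status is compared as a
    string so that exports holding either "200" or 200 both work.
    """
    totals: Counter = Counter()
    for row in rows:
        if str(row["http_status"]) == "200":
            totals[str(row["agent_type"])] += int(row["count"])
    return totals
```

Keeping the agent_type breakdown, rather than one grand total, leaves room to compare only user traffic if spider or automated requests turn out to skew the numbers.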

In this tool's case, a single page load can trigger multiple queries, and that number most likely scales with the JS-capable clients that actually issue those requests.
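Because one page load can fan out into several SPARQL requests, the quantity worth tracking is a per-day ratio rather than a raw count. A rough sketch, assuming both inputs are already keyed by the same day (for example "2024-05-05") and that the toolviews figure counts page loads; the function name `queries_per_page_load` is hypothetical:

```python
from typing import Dict, Mapping


def queries_per_page_load(
    sparql_by_day: Mapping[str, int],
    page_loads_by_day: Mapping[str, int],
) -> Dict[str, float]:
    """Ratio of tool-referred SPARQL requests to tool page loads, per day.

    Days missing from either source, or with zero page loads, are skipped
    rather than filled in with a guess.
    """
    ratios: Dict[str, float] = {}
    for day, queries in sparql_by_day.items():
        loads = page_loads_by_day.get(day, 0)
        if loads > 0:
            ratios[day] = queries / loads
    return ratios
```

Skipping incomplete days is a deliberate choice: a day covered by only one source would otherwise show up as an artificial spike or dip in the ratio.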

The requested access is a subset of the "Tool root" permissions [1] that are usually assigned to users who do administrative work in Toolforge. Given that Adam's needs are more limited, I don't think we need to grant any permission beyond membership of the "admin" tool.

I am fairly certain he will also need to be added to the "roots" sudoers group via Horizon to be able to read the nginx logs on the front proxies, which is the data he is interested in mining. See https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin#What_makes_a_root/Giving_root_access
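Once the front-proxy logs are readable, counting page-load 200s for one tool comes down to parsing each line and filtering on host and status. A sketch, assuming (hypothetically) that each line is the virtual host followed by nginx's standard "combined" log format; the actual log_format on the Toolforge proxies should be checked before relying on this, and the names `LINE_RE` and `daily_200s` are illustrative only:

```python
import re
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable

# Hypothetical layout: "$host " followed by nginx's "combined" log format.
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def daily_200s(lines: Iterable[str], host: str) -> Counter:
    """Count HTTP 200 responses for one virtual host, keyed by UTC date (YYYY-MM-DD)."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["host"] != host or m["status"] != "200":
            continue  # skip unparseable lines, other tools, and non-200 responses
        # $time_local looks like "05/May/2024:15:45:41 +0000"; normalise to UTC
        # so the buckets line up with day-partitioned data lake tables.
        ts = datetime.strptime(m["time_local"], "%d/%b/%Y:%H:%M:%S %z")
        counts[ts.astimezone(timezone.utc).date().isoformat()] += 1
    return counts
```

Converting to UTC before bucketing matters because `$time_local` carries the proxy's local offset, and a mismatch there would shift requests near midnight into the wrong day.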