Page MenuHomePhabricator

Flame graphs don't seem to be collected on auth.wikimedia.org
Open, LowPublic

Description

I wanted to check what time is spent on during a login, but https://performance.wikimedia.org/arclamp/svgs/daily/2025-10-30.excimer-wall.index.svgz?x=591.8&y=869 only shows method calls which happen on the local leg of login.

Event Timeline

It also only shows the SUL login stack (private wikis should call LocalPasswordAuthenticationProvider::beginPrimaryAuthentication()) but I assume that's just how sampling works out?

@Krinkle suggested this might just be sampling. It certainly works when manually enabled for single requests. (E.g. https://performance.wikimedia.org/excimer/profile/3a2a986048c0239d Also, wow, 30% of that login attempt's time is spent in wmfGetPrivilegedGroups...)

I don't think that matches the data though. There are 1,8K samples for CentralAuthRedirectingPrimaryAuthenticationProvider (the main handler for the local leg of login) calling CentralAuthTokenManager::tokenize(), which is only done before redirecting to the central domain. But there are zero calls to CentralAuthPrimaryAuthenticationProvider (the main handler for the central leg of login). The user would have to submit the central login page for that to happen, so it will happen less often than the tokenize calls. But zero out of 1800 redirects to the central login page leading to a login attempt seems suspect.

OK. I'll do some deeper tracing to see if it gets lost for different reasons. In the Arc Lamp client instrument, we set the main frame based on $_SERVER['SCRIPT_FILENAME'] when collecting trace samples. Then in arclamp-log.py we use the same first frame to set a tag, which determines the log filename, and thus the svg filename.

Given that auth.wikimedia has a special RewriteRule, maybe thats getting messed up? I don't see why it would be very different from any other rewrite like /wiki/: and such, but let's have a look.

Also, wow, 30% of that login attempt's time is spent in wmfGetPrivilegedGroups...

Filed as T410878: wmfGetPrivilegedGroups is slow.

[…] In the Arc Lamp client instrument, we set the main frame based on $_SERVER['SCRIPT_FILENAME'] when collecting trace samples. […] Given that auth.wikimedia has a special RewriteRule, maybe thats getting messed up? […]

diff --git a/src/Profiler.php b/src/Profiler.php
index 11b074a342..e9ab9de795 100644
--- a/src/Profiler.php
+++ b/src/Profiler.php
@@ -278,6 +279,9 @@ class Profiler {
 			// from there, if this causes a problem.
 			$publicLink = $client->getUrl();
 			header( 'excimer-ui-link: ' . $publicLink );
+
+			header( 'X-Krinkle-SCRIPT_FILENAME_0_raw: ' . $_SERVER['SCRIPT_FILENAME'] );
+			header( 'X-Krinkle-SCRIPT_FILENAME_1_realpath: ' . realpath( $_SERVER['SCRIPT_FILENAME'] ) );
 		}
 	}

The auth domain does make a small changer to this value, but that difference is gone after the realpath normalization. I could not find a notable difference in other _SERVER values used by Excimer / Arc Lamp instrumentation.

URL = https://test.wikipedia.org/w/load.php?xyz

server: mw-experimental.eqiad.pinkllama-…
x-powered-by: PHP/8.3.29
x-krinkle-script_filename_0_raw: /srv/mediawiki/docroot/wikipedia.org/w/load.php
x-krinkle-script_filename_1_realpath: /srv/mediawiki/w/load.php
URL = https://auth.wikimedia.org/testwiki/w/load.php

server: mw-experimental.eqiad.pinkllama-…
x-powered-by: PHP/8.3.29
x-krinkle-script_filename_0_raw: /srv/mediawiki/docroot/auth/w/load.php
x-krinkle-script_filename_1_realpath: /srv/mediawiki/w/load.php