Phonos links to an unauthorized URL
Closed, Resolved | Public | 5 Estimated Story Points | BUG REPORT

Description

Current issue

Steps to replicate the issue (include links if applicable):

What happens?:
The Phonos play button shows an error, "Unable to play audio. Refresh the page and try again." Going directly to the inspected URL (e.g. https://upload.wikimedia.beta.wmflabs.org/phonos/0/h/0hp7eif2wwbuhif94n42bzm95o71z9i.mp3) gives the error "Unauthorized. This server could not verify that you are authorized to access the document you requested."

What should have happened instead?:
The audio plays.


Old issue

Steps to replicate the issue (include links if applicable):

What happens?:
The play button shows the "loading crosshatching" indefinitely (or until it maybe times out), as it's trying to play http://deployment-ms-fe03.deployment-prep.eqiad.wmflabs/v1/AUTH_mw/global-data-phonos/4b67c6b189cdf4ad2d46a9fa6bac2813.mp3?temp_url_sig=[snip]&temp_url_expires=1662820524. The domain deployment-ms-fe03.deployment-prep.eqiad.wmflabs is not publicly routable.

What should have happened instead?:
The file URI is publicly routable, and the audio plays. I believe this should be upload.wikimedia.beta.wmflabs.org?

Software version (skip for WMF-hosted wikis like Wikipedia):

  • Beta cluster
$wgPhonosEngine = 'google';
$wgPhonosFileBackend = 'global-multiwrite';

Other information (browser name/version, screenshots, etc.):

image.png (454×590 px, 69 KB)

Event Timeline

I suspect the issue is at Engine.php#L119-L121 which has never been tested. FileBackend::getFileHttpUrl() sounded like what we wanted, so I just put it there... hehe. Extension:Score uses a different system. Perhaps getFileHttpUrl() isn't what we want at all. I'll try to investigate.

Yeah, the URL in the description is the internal Swift URL, not what's publicly exposed.
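
For reference, the public URL quoted in the task description nests the file under its first two characters (0/h/0hp7...mp3). Below is a minimal Python sketch of building such a URL by hand instead of asking the file backend for one. The function name and the `base_url` parameter are hypothetical, the layout is only inferred from that example URL, and this is not the Phonos code itself (which is PHP).

```python
def public_phonos_url(base_url: str, filename: str) -> str:
    """Build a public URL with a two-level shard taken from the filename.

    Hypothetical helper: the '<c1>/<c2>/<filename>' layout is inferred from
    the example URL in the task description, not taken from Phonos itself.
    """
    if len(filename) < 2:
        raise ValueError("filename too short to shard")
    # Strip a trailing slash so the result never contains '//' after the base.
    return f"{base_url.rstrip('/')}/{filename[0]}/{filename[1]}/{filename}"


url = public_phonos_url(
    "https://upload.wikimedia.beta.wmflabs.org/phonos",
    "0hp7eif2wwbuhif94n42bzm95o71z9i.mp3",
)
print(url)
```

The point of the sketch is only that the public path can be derived from the file name alone, with no reference to the internal Swift host.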

I know @MusikAnimal is working on this (thank you!), so this is more of a general question: from what I've found on wikitech and elsewhere, we have little to no documentation on setting up a local Swift cluster for development. Am I correct, or is there a tutorial that I've missed?
If and when we look back on this in the future, this thread on Slack had some good discussion about this issue.

Change 831934 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] Manually construct storage paths

https://gerrit.wikimedia.org/r/831934

Change 831934 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Manually construct storage paths

https://gerrit.wikimedia.org/r/831934

Change 831941 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[operations/mediawiki-config@master] InitialiseSettings-labs.php: Set $wgPhonosPath

https://gerrit.wikimedia.org/r/831941

Change 831941 merged by jenkins-bot:

[operations/mediawiki-config@master] InitialiseSettings-labs.php: Set $wgPhonosPath

https://gerrit.wikimedia.org/r/831941

Mentioned in SAL (#wikimedia-operations) [2022-09-13T18:28:49Z] <TheresNoTime> deploying a beta cluster only config change, T317417

Mentioned in SAL (#wikimedia-operations) [2022-09-13T18:31:48Z] <samtar@deploy1002> Synchronized wmf-config/InitialiseSettings-labs.php: Config: [[gerrit:831941|InitialiseSettings-labs.php: Set $wgPhonosPath (T317417)]] (duration: 03m 45s)

Change 831955 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[operations/puppet@production] rewrite.py: changes for Phonos deployment

https://gerrit.wikimedia.org/r/831955

Mentioned in SAL (#wikimedia-releng) [2022-09-21T18:49:51Z] <TheresNoTime> cherry-picked [[gerrit:833839]] to deployment-puppetmaster04, testing T317417

^ that above is a cherry pick of 831955, which has given us a bit of progress!

Unfortunately, on visiting the generated URL (e.g. https://upload.wikimedia.beta.wmflabs.org/phonos/0/h/0hp7eif2wwbuhif94n42bzm95o71z9i.mp3) the error has changed from "Regexp failed to match URI [...]" to "This server could not verify that you are authorized to access the document you requested."

Noting that T316845: deployment-ms-fe03 puppet failure could be preventing the puppet changes being fully applied, as this is the server these changes are directed at (?)

The proxy-access log shows

Sep 21 19:41:46 deployment-ms-fe03 proxy-server: 86.143.146.126 172.16.1.160 21/Sep/2022/19/41/46 GET /v1/AUTH_mw/global-data-phonos-render/6/i/6imkecj6i6uq78j1h3t0rvw2ytlncvp.mp3 HTTP/1.0 401 https://en.wikipedia.beta.wmflabs.org/ [snip]
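
To illustrate the kind of mapping involved, here is a hedged Python sketch of a regex-based rewrite from a public /phonos/... path to the Swift object path seen in the log line above. It is not the actual rewrite.py logic. The pattern, the function name, and the default arguments are assumptions; the account and container names are simply copied from that log line.

```python
import re

# Hypothetical pattern: /phonos/<c1>/<c2>/<name>.mp3, where the two shard
# characters must equal the first two characters of the file name.
PUBLIC_PATH = re.compile(
    r"^/phonos/(?P<a>[0-9a-z])/(?P<b>[0-9a-z])/"
    r"(?P<name>(?P=a)(?P=b)[0-9a-z]*\.mp3)$"
)


def to_swift_path(public_path, account="AUTH_mw",
                  container="global-data-phonos-render"):
    """Return the Swift v1 object path, or None if the URI does not match."""
    m = PUBLIC_PATH.match(public_path)
    if m is None:
        return None  # comparable to a "Regexp failed to match URI" outcome
    return f"/v1/{account}/{container}/{m['a']}/{m['b']}/{m['name']}"


print(to_swift_path("/phonos/6/i/6imkecj6i6uq78j1h3t0rvw2ytlncvp.mp3"))
```

A successful match only yields a path; whether Swift then answers 200 or 401 depends on the target container existing and being readable.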

Reading the discussion on https://gerrit.wikimedia.org/r/c/operations/puppet/+/831955/ I'm left a bit confused - is the aim to use swift as essentially a cache for these sound files? If so, how is expiring them being managed? If not, is the problem with writing a unit test that currently this is only deployed on beta and so there just aren't any present in prod-swift as yet?

Reading the discussion on https://gerrit.wikimedia.org/r/c/operations/puppet/+/831955/ I'm left a bit confused - is the aim to use swift as essentially a cache for these sound files? If so, how is expiring them being managed?

Correct (though we've been reminded that "persistent storage" is the more appropriate term, as we kept referring to it as "cache"). Because these files are small and storage space is relatively cheap, expiry is being handled by a maintenance script that is run infrequently (yearly?).

is the problem with writing a unit test that currently this is only deployed on beta and so there just aren't any present in prod-swift as yet?

The problem, from my understanding, is that we are unsure how to write a unit test which doesn't "check against response codes" (see @MusikAnimal's comment on 831955 from September 19th). If there is any documentation or similar unit tests you could point us to, that would be very much appreciated.

For what it's worth, I've deployed this change to the beta cluster (see above comment) and we are now facing a "This server could not verify that you are authorized to access the document you requested." error.
T316845 seems to be preventing a clean puppet change from occurring on deployment-ms-fe03, so perhaps that could be what is causing the above error?

Change 837107 had a related patch set uploaded (by Samtar; author: Samtar):

[operations/puppet@production] hieradata, beta cluster: Add phonos to `shard_container_list`

https://gerrit.wikimedia.org/r/837107

How big are the files themselves (min, max, average)? How many do we expect; What's the anticipated total storage? Read & write rates?

We aren't setting a maximum file size currently, but we do have a maximum amount of IPA that can be passed to the parser tag. That is currently set to 300 bytes (T316641). The generated MP3 in that case is somewhere in the neighborhood of 40-50 KB, but on average files are probably going to be 3-5 KB. The number of files generated depends on how well the communities adopt this feature. If fully rolled out across all wikis (i.e. we use Phonos everywhere we show IPA), we're probably looking at many hundreds of thousands of files, but I think it will be quite a while before we reach that point. Read rates are estimated at around 1.8 million a month (T307625). Write rates are harder to estimate, again depending on how communities adopt this feature. Since most wikis have a Template:IPA or something similar, the Phonos parser tag will likely be put there. Thus, the initial rollout will see a very high number of writes (directly proportional to the number of transclusions of Template:IPA), but we are building our own type of job and making it go slower than the normal job queue rate (T318086). After the rollout, writes will likely be relatively rare by comparison. Note, however, that some communities might prefer to opt in to using Phonos on a case-by-case basis, making the rollout longer and hence the write rates lower.
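
As a rough back-of-the-envelope check on those figures, here is a small Python sketch. The file count of 500,000 is an assumption standing in for "many hundreds of thousands", "KB" is taken as 1,000 bytes, and a month is approximated as 30 days; none of these come from a measurement.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, an approximation


def total_storage_bytes(file_count, avg_file_bytes):
    """Total bytes stored for file_count files of a given average size."""
    return file_count * avg_file_bytes


def reads_per_second(reads_per_month):
    """Average read rate, assuming reads are spread evenly over the month."""
    return reads_per_month / SECONDS_PER_MONTH


files = 500_000  # assumption, not a figure from the task
low = total_storage_bytes(files, 3 * 1000)   # 3 KB average
high = total_storage_bytes(files, 5 * 1000)  # 5 KB average

print(f"storage: {low / 1e9:.1f} to {high / 1e9:.1f} GB")
print(f"reads:   {reads_per_second(1_800_000):.2f} per second on average")
```

Under these assumptions the total is on the order of a few gigabytes and the average read rate is below one request per second, although real traffic would be burstier than a monthly average suggests.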

TheresNoTime renamed this task from "Phonos links to unroutable domain/URL for the MP3 file" to "Phonos links to an unauthorized URL". Oct 10 2022, 12:13 PM
TheresNoTime updated the task description.
TheresNoTime changed the point value for this task from 3 to 5. Oct 10 2022, 12:15 PM

@MatthewVernon continuing from T316845, (and I know I'm pushing my luck here, sorry!)... I ended up fixing that stray .

I think I've narrowed this down to a part of our config being incorrect somewhere.

If I wget the file as requested (seen in /var/log/swift/proxy-access.log), I replicate the "401 unauthorized" error

samtar@deployment-ms-fe03:~$ wget localhost/v1/AUTH_mw/global-data-phonos-render/r/m/rm8vqwp931szd29vm2xo3ctknptsp0d.mp3
--2022-10-11 11:24:53--  http://localhost/v1/AUTH_mw/global-data-phonos-render/r/m/rm8vqwp931szd29vm2xo3ctknptsp0d.mp3
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:80... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 401 Unauthorized

Username/Password Authentication Failed.

However, if I change /v1/AUTH_mw/global-data-phonos-render/... to /v1/AUTH_mw/global-data-phonos/..., I can wget the file just fine

samtar@deployment-ms-fe03:~$ wget localhost/v1/AUTH_mw/global-data-phonos/r/m/rm8vqwp931szd29vm2xo3ctknptsp0d.mp3
--2022-10-11 11:24:58--  http://localhost/v1/AUTH_mw/global-data-phonos/r/m/rm8vqwp931szd29vm2xo3ctknptsp0d.mp3
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:80... failed: Connection refused.
Connecting to localhost (localhost)|127.0.0.1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5088 (5.0K) [audio/mpeg]
Saving to: ‘rm8vqwp931szd29vm2xo3ctknptsp0d.mp3’

rm8vqwp931szd29vm2xo3ctknptsp0d.mp3            100%[===================================================================================================>]   4.97K  --.-KB/s    in 0s

2022-10-11 11:24:58 (839 MB/s) - ‘rm8vqwp931szd29vm2xo3ctknptsp0d.mp3’ saved [5088/5088]

Running swift list:

samtar@deployment-ms-fe03:~$ swift list | grep phonos
global-data-phonos

shows the container is called global-data-phonos and not global-data-phonos-render...

Where have we set this? 🙃


I see it's (maybe) set in https://gerrit.wikimedia.org/r/c/operations/puppet/+/831955, but I tried removing zone (caused errors) and setting zone to '' (did not help). It feels like we're infuriatingly close to figuring this out.

So that render is coming from the zone setting in your rewrite.py change. But it defaults (I think - rewrite.py is a horrible minefield) to public (rather than an empty string), and in any case, that container doesn't exist:

mvernon@deployment-ms-fe03:~$ swift list | grep phon
global-data-phonos
mvernon@deployment-ms-fe03:~$ swift list global-data-phonos
0/h/0hp7eif2wwbuhif94n42bzm95o71z9i.mp3
0435883e6f8142bced9ce621ef74233ac971bfc6.mp3
4b67c6b189cdf4ad2d46a9fa6bac2813.mp3
6/i/6imkecj6i6uq78j1h3t0rvw2ytlncvp.mp3
l/i/li7anc18p009fwl9u7ovnoae7lzx0pd.mp3
l/u/lupe69kf885492csbqusm0r8i7amht1.mp3
p/s/ps5wmw93grzjpn4n84el8ehd0hqdf59.mp3
q/8/q862dkpbsvwgsx29kzsf1789p0wv8uk.mp3
r/m/rm8vqwp931szd29vm2xo3ctknptsp0d.mp3

So I think your writer is writing to the wrong location? I think global-data-phonos cannot be correct; but I'm rather out of my comfort zone here...

I think global-data-phonos-render is likely the correct location (per https://wikitech.wikimedia.org/wiki/Media_storage#File/object_structure ), so it's your writer that needs to be updated (and that container will need creating).
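
The mismatch diagnosed above (requests going to a container that `swift list` does not show) can be sketched as a small check in Python. The helper names are hypothetical, and the set of existing containers is typed in by hand from the `swift list` output earlier in this thread rather than fetched from Swift.

```python
def split_swift_path(path):
    """Split '/v1/<account>/<container>/<object>' into its three parts."""
    parts = path.lstrip("/").split("/", 3)
    if len(parts) != 4 or parts[0] != "v1":
        raise ValueError(f"not a Swift v1 object path: {path!r}")
    _, account, container, obj = parts
    return account, container, obj


def missing_container(path, existing_containers):
    """Return the container named in path if it is absent, else None."""
    _, container, _ = split_swift_path(path)
    return None if container in existing_containers else container


existing = {"global-data-phonos"}  # copied from the `swift list` output
requested = ("/v1/AUTH_mw/global-data-phonos-render"
             "/r/m/rm8vqwp931szd29vm2xo3ctknptsp0d.mp3")

print(missing_container(requested, existing))
```

Here the check reports global-data-phonos-render as missing, which is consistent with the 401 on that path and the 200 on the global-data-phonos path.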

Change 841488 had a related patch set uploaded (by Samtar; author: Samtar):

[mediawiki/extensions/Phonos@master] Engine: Set STORAGE_PREFIX to `phonos-render`

https://gerrit.wikimedia.org/r/841488

Change 841488 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Engine: Set STORAGE_PREFIX to `phonos-render`

https://gerrit.wikimedia.org/r/841488

Change 837107 abandoned by Samtar:

[operations/puppet@production] hieradata, beta cluster: Add phonos to `shard_container_list`

Reason:

https://gerrit.wikimedia.org/r/837107

With the utmost thanks to @MatthewVernon and everyone else who has commented & given suggestions elsewhere, this now works.

E.g. https://en.wikipedia.beta.wmflabs.org/wiki/User:TheresNoTime/sandbox

Outstanding work, as I see it, is:

@TheresNoTime I viewed a page on the Beta Cluster with a Phonos parser function. Audio now plays successfully! Moving to production sign-off, thanks!

Tested link: https://en.wikipedia.beta.wmflabs.org/wiki/User:TheresNoTime/sandbox

T317417_Phonos_UnauthorizedURL.png (1×3 px, 239 KB)

Thanks for the info; Apologies for not following up on this sooner. I asked about utilization because we (Data-Persistence) are trying to engage with projects that need storage earlier, understand the requirements, be in a position to offer feedback, and plan accordingly. Ideally, earlier would be earlier than where we are now, but it would still be great to run through your storage requirements.

Since this seems out of scope for this ticket (sorry about that), and since I didn't find a suitable existing issue, I've stubbed out T320675 for this.

TheresNoTime changed the task status from Open to Stalled. Nov 11 2022, 3:17 PM

This operations/puppet patch requires +2. It is currently cherry-picked to the beta cluster, and it needs to be merged prior to any production rollout of Phonos.

Change 831955 merged by MVernon:

[operations/puppet@production] rewrite.py: changes for Phonos deployment

https://gerrit.wikimedia.org/r/831955

Mentioned in SAL (#wikimedia-operations) [2022-11-17T12:06:11Z] <Emperor> restart swift proxies to deploy phonos changes to rewrite.py T317417

There's nothing to QA here I don't think? The patch was for production and is identical to what Sammy already cherry-picked on Beta. The real test comes when we get Phonos deployed to our first production wiki (T321084). I'm going to be bold and mark this as resolved. Thanks to all who helped with this task!