
Add page properties for Phonos usage data
Closed, Resolved · Public · 3 Estimated Story Points

Description

As a follow-up to T324233: Create maintenance script to count/delete orphaned Phonos files, it would be more scalable to leverage the page_props table to store which Phonos files are used on a page. These properties are regenerated on every reparse, which is exactly when Phonos files are created/changed.

This still wouldn't allow us to easily find unused Phonos files, but the countOrphanFiles.php maintenance script could be dramatically faster if we go off of page properties rather than looping through the tracking category and scraping the HTML.

Acceptance criteria

  • Have Phonos store a single page property that records all Phonos files that are used on the page
  • Update the countOrphanFiles.php script to go off of page props rather than the tracking category.

QA notes

  • If you're testing on a local wiki that already has Phonos pages, make sure to run maintenance/run refreshLinks --category='Pages that use Phonos' after pulling in the new code.
  • Changes to page props require an actual page edit or a null edit (which is essentially what the refreshLinks maintenance script does).
  • Testing is probably easiest using the newly updated countOrphanFiles.php script (T324233).
  • You can check which Phonos page props exist by querying the database: SELECT * FROM page_props WHERE pp_propname = 'phonos-files'
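The rows returned by that query can be collapsed into one set of in-use file names. This is a minimal sketch, assuming (not confirmed in this task) that each `pp_value` for `phonos-files` holds a JSON-encoded list of file names. The helper `files_in_use` is hypothetical, and an in-memory SQLite table stands in for the wiki's real `page_props` table.

```python
import json
import sqlite3


def files_in_use(conn: sqlite3.Connection) -> set[str]:
    """Collect every file name recorded in the 'phonos-files' page property.

    Assumption: each pp_value is a JSON array of file names (one row per page).
    """
    used: set[str] = set()
    rows = conn.execute(
        "SELECT pp_value FROM page_props WHERE pp_propname = 'phonos-files'"
    )
    for (value,) in rows:
        used.update(json.loads(value))  # a file used on several pages counts once
    return used
```

Because the property is regenerated on every reparse, this set only reflects pages that have been reparsed since the feature was deployed, which is why the refreshLinks step above matters.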

Event Timeline

Restricted Application added a subscriber: Aklapper.

Have Phonos store a page property, one for each Phonos file that is used

What would this look like?


One possibility: every time we save a file, we would also store the filename as a page property. There could of course be multiple files, so we'd number them, e.g. {phonos-audio-1: a9djqcrc40bj921favxipg3987kqshe.mp3}, {phonos-audio-2: ayg9wa1waee3zzn1fesbpu84wviqyuy.mp3}, …. (The .mp3 could be left out.)
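A minimal Python sketch of this numbered-property proposal. The helper `numbered_phonos_props` is hypothetical: it only builds the name/value pairs that would be stored, drops the .mp3 suffix as suggested, and also skips a file repeated on the same page, which is a design choice not stated in the proposal.

```python
def numbered_phonos_props(
    file_names: list[str], prefix: str = "phonos-audio-"
) -> dict[str, str]:
    """Map each distinct file on a page to its own numbered page property."""
    props: dict[str, str] = {}
    seen: set[str] = set()
    for name in file_names:
        key = name.removesuffix(".mp3")  # the extension carries no information
        if key in seen:  # same audio used twice on one page: store it once
            continue
        seen.add(key)
        props[f"{prefix}{len(props) + 1}"] = key
    return props


print(numbered_phonos_props([
    "a9djqcrc40bj921favxipg3987kqshe.mp3",
    "ayg9wa1waee3zzn1fesbpu84wviqyuy.mp3",
]))
# {'phonos-audio-1': 'a9djqcrc40bj921favxipg3987kqshe', 'phonos-audio-2': 'ayg9wa1waee3zzn1fesbpu84wviqyuy'}
```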

Then, for counting orphaned files we'd be able to do something like

  1. count all files in storage;
  2. count all files in use, e.g. SELECT COUNT(DISTINCT pp_value) FROM page_props WHERE pp_propname LIKE 'phonos-audio-%';
  3. subtract one from the other.
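The three counting steps can be sketched in Python against an in-memory SQLite stand-in for `page_props`, assuming the numbered `phonos-audio-N` properties proposed above and a set of stored file names obtained elsewhere (how storage is listed is left out). The helper `count_orphans` is hypothetical. Note that plain subtraction assumes every file referenced in `page_props` really exists in storage; if not, it undercounts.

```python
import sqlite3

IN_USE_SQL = (
    "SELECT COUNT(DISTINCT pp_value) FROM page_props "
    "WHERE pp_propname LIKE 'phonos-audio-%'"
)


def count_orphans(conn: sqlite3.Connection, stored_files: set[str]) -> int:
    """Count stored files, count distinct files in use, subtract."""
    total_stored = len(stored_files)                       # step 1
    (total_in_use,) = conn.execute(IN_USE_SQL).fetchone()  # step 2
    return total_stored - total_in_use                     # step 3


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value BLOB)")
conn.executemany("INSERT INTO page_props VALUES (?, ?, ?)", [
    (1, "phonos-audio-1", "aaa"),
    (1, "phonos-audio-2", "bbb"),
    (2, "phonos-audio-1", "aaa"),  # same file reused on another page
])
print(count_orphans(conn, {"aaa", "bbb", "ccc"}))  # 1
```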

And to delete orphaned files, I guess it'd be a matter of looping through all stored files, and for each one checking if it's in the page_props table.
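The per-file check described here might look like the following sketch. The `delete_file` callback is hypothetical and stands in for whatever storage backend actually holds the files; SQLite again stands in for the wiki database. This approach issues one query per stored file.

```python
import sqlite3
from typing import Callable, Iterable

IS_USED_SQL = (
    "SELECT 1 FROM page_props "
    "WHERE pp_propname LIKE 'phonos-audio-%' AND pp_value = ? LIMIT 1"
)


def delete_orphans(
    conn: sqlite3.Connection,
    stored_files: Iterable[str],
    delete_file: Callable[[str], None],
) -> list[str]:
    """Delete each stored file that no page_props row references."""
    deleted: list[str] = []
    for name in stored_files:
        if conn.execute(IS_USED_SQL, (name,)).fetchone() is None:
            delete_file(name)  # hypothetical storage call
            deleted.append(name)
    return deleted
```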

One complication is that usage would be stored on all the separate wikis, and so the querying would have to be repeated on all databases.

It may also be quicker to go the other way around: first query for all files in use, and then use that list to check the stored files.
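A sketch of this "in-use first" variant, which also handles usage being recorded separately on each wiki. It assumes the stored files are shared across wikis, so a file counts as orphaned only if no wiki uses it. The function `find_orphans` is hypothetical, and the SQLite connections stand in for the per-wiki databases.

```python
import sqlite3
from typing import Iterable

IN_USE_SQL = (
    "SELECT DISTINCT pp_value FROM page_props "
    "WHERE pp_propname LIKE 'phonos-audio-%'"
)


def find_orphans(
    wiki_dbs: Iterable[sqlite3.Connection], stored_files: Iterable[str]
) -> list[str]:
    """Query each wiki once for in-use files, then check storage against that set."""
    in_use: set[str] = set()
    for conn in wiki_dbs:  # usage is recorded separately on each wiki
        in_use.update(value for (value,) in conn.execute(IN_USE_SQL))
    # A set lookup replaces one database query per stored file.
    return [name for name in stored_files if name not in in_use]
```

The trade-off is memory: the whole in-use set is held at once, in exchange for one query per wiki rather than one per stored file.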

I'm doing this as part of T324233: Create maintenance script to count/delete orphaned Phonos files since it will make that work much easier and more efficient.

Change 893838 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] Store usage of Phonos files as page properties

https://gerrit.wikimedia.org/r/893838

Change 893838 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Store usage of Phonos files as page properties

https://gerrit.wikimedia.org/r/893838

Mentioned in SAL (#wikimedia-releng) [2023-03-07T17:48:38Z] <TheresNoTime> (deployment-prep) samtar@deployment-mwmaint02:~$ mwscript maintenance/refreshLinks.php --wiki enwiki --category='Pages that use Phonos' for T326163

Mentioned in SAL (#wikimedia-releng) [2023-03-07T17:52:04Z] <TheresNoTime> (deployment-prep) Ctrl+C'd mwscript maintenance/refreshLinks.php --wiki enwiki --category='Pages that use Phonos', taking "a long time", saw GlobalVarConfig::get: undefined option: 'PhonosStoreFilesAsMp3', T326163

Moving this back to "In development" as well, because for QA purposes it will be easier to test this alongside T324233.

Mentioned in SAL (#wikimedia-releng) [2023-03-14T07:17:26Z] <dwalden> dwalden@deployment-mwmaint02:~$ mwscript maintenance/refreshLinks.php --wiki enwiki --category='Pages that use Phonos' for T326163

Mentioned in SAL (#wikimedia-releng) [2023-03-14T07:24:47Z] <dwalden> dwalden@deployment-mwmaint02:~$ mwscript maintenance/refreshLinks.php --wiki enwiktionary --category='Pages that use Phonos' for T326163

Mentioned in SAL (#wikimedia-releng) [2023-03-14T07:25:14Z] <dwalden> dwalden@deployment-mwmaint02:~$ mwscript maintenance/refreshLinks.php --wiki en_rtlwiki --category='Pages that use Phonos' for T326163

dom_walden closed this task as Resolved. (Edited, Mar 14 2023, 2:33 PM)
dom_walden subscribed.

I used a script to check that all the Phonos files which appear on pages on enwiki beta also appear in the phonos-files property of the page (via API:Pageprops).

EDIT: I also tested that Phonos files transcluded via templates appear in the phonos-files property of the page as well (e.g. https://en.wikipedia.beta.wmflabs.org/wiki/Phonos_Template_Usage and https://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&titles=Phonos_Template_Usage&prop=pageprops&format=json)
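A check of this kind could be sketched as follows. This is not the script that was used; it is a hypothetical reconstruction that builds the same API:Pageprops query shown above and compares the property against a set of files found on the rendered page (how that set is scraped is left out). It assumes the `phonos-files` value is a JSON-encoded list of file names, and it only parses a response rather than fetching one.

```python
import json
from urllib.parse import urlencode

API = "https://en.wikipedia.beta.wmflabs.org/w/api.php"


def pageprops_url(title: str) -> str:
    """Build an API:Pageprops query for a single page title."""
    return API + "?" + urlencode(
        {"action": "query", "titles": title, "prop": "pageprops", "format": "json"}
    )


def missing_from_props(response: dict, files_on_page: set[str]) -> set[str]:
    """Return files seen on the page that the phonos-files property does not list."""
    recorded: set[str] = set()
    for page in response["query"]["pages"].values():
        raw = page.get("pageprops", {}).get("phonos-files")
        if raw is not None:
            recorded.update(json.loads(raw))  # assumption: JSON-encoded list
    return files_on_page - recorded
```

An empty result means every file found on the page is recorded in its page property.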

Test environment: https://en.wikipedia.beta.wmflabs.org Phonos 0.1.0 (feea481) 09:53, 13 March 2023.