Page MenuHomePhabricator

Treat phonos parser function as expensive
Closed, ResolvedPublic3 Estimated Story Points

Description

Background

To guard against too many API requests to Google, and to prevent colossal misuse of Phonos in general, it can be registered as an expensive parser function. What this does is prevent too many uses on a single page, as specified by $wgExpensiveParserFunctionLimit. When this is exceeded, we're supposed to skip over whatever is the "expensive" part, in our case doing any sort of API call. Pages that have too many expensive parser function calls are categorized into Category:Pages with too many expensive parser function calls. The "expensive parser function count" may also be influenced by other functions, such as {{#ifexist:}}, related to Phonos. Regardless, this doesn't happen too often in the wild and when it does communities generally know how to debug and handle it.

Note that this doesn't matter very much at the time of writing since the Google API call only happens on-click, but we're slated to make all the calls to the engine from the parser hook (T315481).

Acceptance criteria

  • If the limit is exceeded after we increment the count, Phonos shouldn't render the Phonos button, unless the file already exists.

Other notes

  • Unfortunately we can't determine if there's a cache hit without making a trip to Shellbox, so we may have to increment the counter even if we aren't actually making external API calls. However at least if the file= parameter is set, we should bypass calling Parser::incrementExpensiveFunctionCount() since we know that's cheap.
  • A better design would probably be to show an informative error. This could be shown instead of the play button.

Event Timeline

MusikAnimal renamed this task from Set IPA parser func as expensive to Treat phonos parser function as expensive.Aug 17 2022, 11:45 PM

Change 824318 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] Treat phonos parser function as expensive

https://gerrit.wikimedia.org/r/824318

Change 824318 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Treat phonos parser function as expensive

https://gerrit.wikimedia.org/r/824318

MusikAnimal subscribed.

I'm going to rework this a tiny bit for T315481 so I'll keep it in the "In Development" column for now.

QA notes: Can be tested as part of T315481; namely, the "expensive parser function count" should not be incremented when a cached file is already available.

An easy way to test locally is to have a page with >1 uses of the parser function, and set $wgExpensiveParserFunctionLimit to just 1. Mind the note above; on the first render you should see no output from the second parser function since the limit was exceeded. But if you then increase $wgExpensiveParserFunctionLimit, reload the page, then change $wgExpensiveParserFunctionLimit back to 1, the second parser function should still have output since the file is now cached.

@MusikAnimal How is this supposed to work?

I've set $wgExpensiveParserFunctionLimit = 10; and saved a page with > 20 {{#phonos}} templates.

Only 10 of the templates produce output on the page, but I see 20 files in the local cache.

EDIT: Actually, I think this only happens when you click "Show preview".

Testing notes
  1. In LocalSettings.php, enter: $wgExpensiveParserFunctionLimit = 10;
  2. Enter a large number of {{#phonos}} templates on an article (e.g. you can copy and paste this P33287)
  3. After you save the article, only 10 instances of the phonos template will render on the page
  4. Also look in images/my_wiki-phonos/ and you will see 10 more audio files

Every time you save the article, look at the preview or diff while editing, or purge the article, 10 more phonos templates will be generated.

@MusikAnimal How is this supposed to work?

I've set $wgExpensiveParserFunctionLimit = 10; and saved a page with > 20 {{#phonos}} templates.

Only 10 of the templates produce output on the page, but I see 20 files in the local cache.

EDIT: Actually, I think this only happens when you click "Show preview".

Sorry I missed your ping! I think this logic might be partially reworked as part of T318086, but either way I'll get this out of the QA column for now until we know what we're doing.

Also FYI, in case you didn't see the updates on Slack, Phonos now uses a parser tag (docs), and the storage location is different too. We're now doing two hash level directories (to match what Swift wants in production), but it's always the first 2 characters of the file name. I.e. for a file named akz3ngftxhvamgs9p0hfttybgavgig6.mp3, it should be in the /images/phonos/a/k/ directory. There's a chance this will change again, as we're still trying to figure out why things aren't working with Swift (T317417).

I believe this can be moved to QA now(?)

Apologies for taking so long to get back to this!

Testing notes
  1. In LocalSettings.php, enter: $wgExpensiveParserFunctionLimit = 10;
  2. Enter a large number of {{#phonos}} templates on an article (e.g. you can copy and paste this P33287)
  3. After you save the article, only 10 instances of the phonos template will render on the page
  4. Also look in images/my_wiki-phonos/ and you will see 10 more audio files

Every time you save the article, look at the preview or diff while editing, or purge the article, 10 more phonos templates will be generated.

Things have changed now, following T318086. Now when we exceed the expensive parser function limit, a job is spawned to create the file. Here's the steps I took:

  • Remove all the old images first (rm -rf images/phonos-render)
  • Copy/paste P33287 on to a page and save (do not preview first)
  • Very quickly run find images/phonos-render/ -name "*.mp3" | wc -l and you should get 10 (or whatever $wgExpensiveParserFunctionLimit is set to)
  • Continually rerun find images/phonos-render/ -name "*.mp3" | wc -l and you should see the number go up (assuming your job queue is set up properly, which should be the case out-of-the-box for MediaWiki-Docker).
  • I end up with 63, the same number of <phonos> tags in P33287.

What you described made it sound like something was broken before, but it seems to be working as expected now. Thus, moving back to QA.

Acceptance criteria

  • If the limit is exceeded after we increment the count, Phonos shouldn't render the Phonos button, unless the file already exists.

This behaviour has been superseded in favour of T318979. The phonos button always displays, even if the file does not exist.

When a user clicks on it, they see an error: Unable to play audio. Refresh the page and try again.

Other notes

  • Unfortunately we can't determine if there's a cache hit without making a trip to Shellbox, so we may have to increment the counter even if we aren't actually making external API calls. However at least if the file= parameter is set, we should bypass calling Parser::incrementExpensiveFunctionCount() since we know that's cheap.

When a page has more than $wgExpensiveParserFunctionLimit Phonos tags, the remainder get generated by jobs in the job queue. I checked that this did not delay the saving of a page and that the remaining phonos files get generated in the background (by doing much the same as suggested in T315483#8309803).

This might mean that after saving a page some of the phonos buttons will not play (instead returning the error noted above). Once the background jobs have completed the buttons will play (you don't even need to refresh the page, unless you have already clicked a button and seen an error condition).

I believe $wgExpensiveParserFunctionLimit is 500, so in practice it is unlikely to be hit for one page.

It should be noted that there are several events other than saving an article which will cause phonos tags to be generated. These include:

  • preview (Show preview)
    • including RTP
  • diff (Show changes)
  • purging a page (action=purge)
  • stashedit (this is called in the background while a user is using the source editor, and can be called many times during an editing session)
  • adding a template in VE (although only for the template added)

(and there are probably more events which I am unaware of)

Each time one of these events happen the expensive parser function limit is reset. So, for example, if I am editing a page with 1000 phonos tags on it and stashedit is fired it will "immediately" generate 500 files and 500 jobs will be added to the job queue. Then, if I refresh RTP a further 500 files will be "immediately" generated. The 500 jobs in the job queue will continue to be executed one-by-one but, on finding the file has already been generated, each will return early.

  • A better design would probably be to show an informative error. This could be shown instead of the play button.

See above.

Test environment: