[EPIC] Support Moderator Tools with using Codex PHP for Special:Nuke
Open, Medium, Public

Description

Once an early release of Codex PHP is available for use in Composer, a WMF feature team should try to use it in a project to provide some initial feedback and validation of our assumptions. A good pilot project would be something that is small, discrete (maybe housed in an extension rather than in core), and fairly simple.

Based on a recent conversation I had with @jsn.sherman, @Scardenasmolinar, and @Kgraessle, it sounds like the Special:Nuke feature could be a promising pilot project – it aligns with the criteria above and Moderator Tools is already interested in migrating a lot of older community-facing features to Codex for easier maintenance.

Acceptance Criteria
  • Identify a pilot project for the introduction of Codex PHP which fits the above criteria
  • Align schedules so that DST engineers (and possibly @Dogu) have availability to provide some support, code review, etc. during initial implementation
  • Summarize any feedback here to inform future adoption of this library

Event Timeline

We'll keep an eye on this since GrowthExperiments also has ad-hoc code for Codex-like markup generation on its Special homepage. We'd be happy to move it to codex-php, also because I believe it would fix issues along the way. Maybe we can collaborate on a second round of adoption, as right now we have little capacity for it.

Our parent task for implementing Codex in the Nuke extension: T153988: Migrate Special:Nuke to Codex
@egardner, we had an alignment session on our team, and this has bumped up a bit in priority vs our initial discussion. We still think this is the right path to take vs an ad hoc approach, even if it is blocked for a bit. We now have @Chlod aboard to help us knock out work on the extension, so we're on track to dig into Codex implementation whenever that initial Codex PHP library becomes available.

@jsn.sherman Codex PHP is available in packagist now (meaning that you can use it in Composer):

https://packagist.org/packages/wikimedia/codex

But please be advised that this is still early alpha stage software and will be for some time. I will do another release tomorrow, and we'll probably do another one in the next 1-2 weeks after that.

The project is here in Gerrit too: https://gerrit.wikimedia.org/r/admin/repos/design/codex-php,general

Feel free to start using this in local prototyping at least, and if you post questions (here or in Slack) we'll try to help out.

Awesome!

Just pushed up v0.2.0 to Packagist, which may help you if you were experiencing issues with Intuition and namespace conflicts in MediaWiki.

Noting that we plan to work on T153988 over the coming month using Codex PHP.

CCiufo-WMF renamed this task from Codex PHP: Use the new library in a pilot project and evaluate to Support using Codex PHP for Special:Nuke. Dec 12 2024, 2:28 PM
CCiufo-WMF renamed this task from Support using Codex PHP for Special:Nuke to Support Moderator Tools with using Codex PHP for Special:Nuke.
CCiufo-WMF updated the task description.

Thanks for the heads up @Samwalton9-WMF

CCiufo-WMF triaged this task as Medium priority. Dec 12 2024, 2:29 PM
CCiufo-WMF moved this task from Now to Supporting on the Design-System-Team (Roadmap) board.

@Dogu just mentioned to me elsewhere that he's currently working on version 0.3.0 and would like us to wait until that is ready. Do you have a timeline estimate for when this version would be ready?

IMO, once the pending patches are merged, version 0.3.0 will be ready for release. They’re currently awaiting code review.

This is something we can wrap up this week.

CCiufo-WMF renamed this task from Support Moderator Tools with using Codex PHP for Special:Nuke to [EPIC] Support Moderator Tools with using Codex PHP for Special:Nuke. Dec 12 2024, 6:53 PM
CCiufo-WMF moved this task from Supporting to Next on the Design-System-Team (Roadmap) board.

Is there a Phab ticket or similar where we can track this?

Hi @Samwalton9, I just published v0.3.0 – you can see the latest version live on Packagist here: https://packagist.org/packages/wikimedia/codex

We don't have a dedicated Codex PHP phab board set up yet, but we can create one in the new year. Until then, feel free to tag any issues you file with Design-System-Team.

Good luck and let me know if you encounter problems! This is a very new project so I expect there will be growing pains.

For any developers trying to get up to speed with Codex PHP, I recommend spinning up the developer sandbox locally – install the composer deps and run composer start-sandbox and then navigate to localhost:8000 – you'll see a demo page with a bunch of usage examples. There is also auto-generated documentation available at https://doc.wikimedia.org/design-codex-php/main/.

Noting @Chlod's first look:

Taking an early look at this, there's some things we're currently missing in Codex that we need to have to be 1:1 equivalent to the old form:

  • There's no DateTimeInputWidget equivalent, which is used for the date filters. There also doesn't seem to be a PopupWidget equivalent (besides Tooltip?) where the "calendar" would appear. We might have to roll our own implementation of this.
  • There's no TagMultiselectWidget/MultiselectWidget equivalent, which is used for the namespace filters. Having this be a radio group temporarily would be too much clutter, and making it take in only one namespace would be a regression of the work done in T376379.

These are all preliminary looks at what we have; I could be entirely wrong. If any of these exist in some form, or can be implemented from what we have now, please do tell!

For TagMultiselectWidget, we recently added a similar component to Codex called MultiselectLookup. Take a look at that and see if it can cover your use-case.

For DateTimeInputWidget, Codex does not have any dedicated calendar component. Our philosophy here has been to try to rely on the built-in tools that browsers provide (which are generally quite good at this point) for date pickers. If you are working with dates in a Codex-based form, here's what I'd recommend:

Hopefully this will let you cover most of what you need. If you have specific needs that are still not covered let us know and we can look at adding some additional functionality.

Also, if you are trying to do everything in PHP, you can call setType( 'date' ) on the TextInput builder – see the docs here. That will give you access to the built-in browser date-picker on the client.

Hello! I've just finished up the first pass on getting Codex PHP integrated into Nuke. The patch can be found here, and the relevant file is SpecialNukeCodexUIRenderer.php. Note that it's currently mysteriously failing in CI, but it's working on my local machine when testing. More info in T153988#10425286.

Firstly, some general comments: Codex PHP has been very easy to use from an engineering perspective. I didn't run into any situation where the components didn't let me do something that should have been supported. Wonderful work from Dogu, the Design team, and everyone involved. :D

For TagMultiselectWidget, we recently added a similar component to Codex called MultiselectLookup. Take a look at that and see if it can cover your use-case.

This looks perfect! Currently, it doesn't seem to be supported on Codex PHP though (or at least I can't find it in the sandbox). This is currently required for the namespace selector. For now (since the patch is still WIP anyway), I've used a text area, matching our current no-JavaScript experience.

Thanks for the explanation! For now, I haven't added in the date filters, pending merging of the patch that introduces them in the old form. However, I'll keep this in mind for when I'm implementing the new fields!

One concern that I have is that we're currently using a user search box for the "Username, IP address or blank" field. This automatically suggests usernames to the user for selection. I'm unsure if there's currently a way to extend TextInput in a way that would allow fields like this to be built, especially since I'm also unsure how JavaScript would work for the field as there doesn't seem to be any JavaScript loaded for Codex PHP currently. I'm assuming this is something that needs to be done on the client side, but I'm also wondering if that's something that should be a part of Nuke, MediaWiki core, or Codex. If it's something that should be a part of Nuke, I'm unsure of what approach to take here, so I'll leave it to discussion for now.

Hi there! We've started work on changing Nuke's page list to a table (T381660), and I've encountered the following issues while trying to build that interface. I've documented it on the task mentioned, but I'll copy it over here for reference.

Looks like there's currently no way to feed in raw HTML input into the Table component. As a result, all of the links, checkboxes, and buttons do not render properly.

Screenshot_13.png (853×1 px, 136 KB)

Two other component-related issues in trying to get this off the ground:

  • The Checkbox component also requires a label and an ID, which the design does not have. It's probably good to have a label for the checkbox, but it should probably be hideable (similar to the "invisible label" approach that OOUI has).
  • The Table component also has its own <form>, which conflicts with the existing <form> that we have wrapping the entire UI for form handling.

On the topic of components, something came up in our meeting last Thursday. For a lot of components, we usually need to insert some level of rich text formatting or add in some other elements (buttons, thumbnails, etc.). Currently, most components run input data through a sanitizer which, although useful when Codex is being used as a standalone library, acts as a second layer of sanitization on top of what's already being done inside of MediaWiki's Message class. This ends up being an issue, since the restrictiveness prevents us from placing HTML elements where we want them to be placed, while also being essentially moot due to sanitization that already happens earlier (inside of MediaWiki core).

Dogu filed a patch allowing raw HTML to be passed into the Table class (see T381660#10462214), but we probably need this behavior in general across many other components, rather than just on Table. One such example is on the "SQL LIKE pattern" label, where a non-breaking space added in by MediaWiki ends up becoming an HTML entity, that in turn is re-escaped by Codex.

image.png (105×796 px, 6 KB)

A way to tell Codex PHP that sanitization is being done prior to input (and that it doesn't need to do extra sanitization) would be useful, though I understand there could be security implications from this. Perhaps further discussion is required on this?
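
The double-escaping effect described above can be reproduced outside of Codex PHP. The following is only a minimal sketch in Python, using the standard library's html.escape as a stand-in for a UI layer's sanitizer; it is not Codex PHP's or MediaWiki's actual code, and the sample string is an assumption modelled on the "SQL LIKE pattern" label.

```python
from html import escape

# Text as it might arrive from an upstream layer that has already encoded
# the non-breaking space as a numeric character reference (assumed sample).
already_encoded = "SQL LIKE pattern (e.g.&#160;%)"

# A second escaping pass rewrites the "&" of the entity as "&amp;", so a
# browser would show the literal characters "&#160;" instead of a space.
double_escaped = escape(already_encoded)

print(double_escaped)  # SQL LIKE pattern (e.g.&amp;#160;%)
```

The second pass cannot tell an intentional entity from raw user input, which is why the caller needs some way to signal that a value has already been made safe.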

Hey @Chlod, I've just released Codex PHP v0.4.0, which includes a commit that should address this problem. Please update and let me know when you have a chance.

Could you speak to the ability for us to insert arbitrary elements as mentioned here?

The 0.4.0 release includes this change, which allows HTML content in Table cells: https://gerrit.wikimedia.org/r/c/design/codex-php/+/1111614

Most other components allow HTML content in certain places too. Let me know if you are still having issues rendering HTML content inside Codex PHP components after updating to 0.4.0 and we'll follow up on it.

We currently also need this functionality on labels. The label for the "pattern" field is still double-escaped, leading to odd output. I've edited the MediaWiki:Nuke-pattern file to be [[SQL LIKE]] pattern (e.g. %) for the page name: here as a test, but even without doing that, this affects the non-breaking space before the % sign (output as an HTML entity: &#160;):

Screenshot_16.png (109×1 px, 9 KB)

The table header also probably needs some way to insert normal HTML elements other than just labels. In following the design in F58023559, there needs to be a button next to the "X pages selected" label. TableBuilder::setHeaderContent advertises that functionality: "This method allows custom content to be added to the table's header, such as actions or additional text." But attempting to insert HTML content here does not work (also ends up getting sanitized, see https://gerrit.wikimedia.org/r/plugins/gitiles/design/codex-php/+/refs/tags/v0.4.0/src/Renderer/TableRenderer.php#120).

Screenshot_17.png (103×545 px, 3 KB)

Update on this: Dogu and I had a meeting today to discuss these, and he's been informed of which parts need changing to avoid the double sanitization issue. The latter issue already has a patch filed (see https://gerrit.wikimedia.org/r/1114724); it just needs review.

Change #1116494 had a related patch set uploaded (by Abaris; author: Abaris):

[design/codex-php@main] Label: Do not sanitize label text and add taint annotations

https://gerrit.wikimedia.org/r/1116494

Change #1116494 merged by jenkins-bot:

[design/codex-php@main] Label: Do not sanitize label text and add taint annotations

https://gerrit.wikimedia.org/r/1116494

@Chlod Codex PHP 0.5.0 is now available on Packagist – please file any other bug fixes as they come up and we'll do more releases as needed.

Hi, @egardner! Thanks for the update. I've tested v0.5.0 out and resolved some of the issues, though the following still remain:

  • Codex PHP's LabelBuilder::setLabelText() taints argument #1 with exec_html, but MediaWiki's Message::text() taints its return type with tainted. This is causing Phan SecurityCheck-XSS errors.
  • We also need to add raw HTML into table headers, to implement the "select all" checkbox found at the top of the Nuke table in F58023559/T381660: Change Nuke's page list to a table.
  • Table injects a <form> into the page, likely to support pagination. This is conflicting with the <form> that Nuke generates to submit both prompt input and selected pages. Under Firefox, the inner <form> tag will be ignored as nested form tags are banned, but the inner </form> closing tag won't be ignored, and causes some of our fields to get dropped from the form altogether. We need a way to disable the <form> tag from being included (i.e. allow the caller to handle everything related to pagination, instead of handling that inside of Codex PHP). Though an option to add form fields could also work, this would be incredibly cumbersome as the entire prompt section of the Nuke form will need to be passed into TableBuilder.

On a personal note, the sanitization that comes with most of the library has proved to be incredibly difficult to work with so far. I'm unsure why this exists, considering Codex PHP is a server-side library and the onus of ensuring that input is clean should be on the calling software (in this case, the Nuke extension) and not the UI library. This is especially the case when MediaWiki itself already sanitizes input that goes through the Message class.

Would it make sense to create a toggle that turns off all sanitization in the library? Then the library could default to sanitizing, but be turned off by power users that know what they're doing and need to do complex things.

Quick comments re escaping:

  • Codex PHP's LabelBuilder::setLabelText() taints argument #1 with exec_html, but MediaWiki's Message::text() taints its return type with tainted. This is causing Phan SecurityCheck-XSS errors.

These would be genuine XSSs, assuming that the annotations are correct. However, I'm seeing the return value of getLabelText being sanitized in a few places. I think r1116494 might have been an incomplete fix.

This is especially the case when MediaWiki itself already sanitizes input that goes through the Message class.

Not when you use Message::text() (or plain()).

A toggle wouldn't scale easily. There should be a way to bypass escaping for single calls. For example, OOUI achieves this via HtmlSnippet, and MW core via HtmlArmor. Codex-PHP also has a HtmlSnippet class, but it seems to have different semantics, and it cannot be used interchangeably with strings.
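
To make the per-call idea concrete, here is a rough sketch in Python of the general wrapper pattern: plain strings are escaped by default, and a caller opts out for one value by wrapping it in a marker type. TrustedHtml and render_label are hypothetical names invented for this illustration; this is not the implementation of OOUI's HtmlSnippet, MediaWiki's HtmlArmor, or Codex PHP's HtmlSnippet.

```python
from dataclasses import dataclass
from html import escape
from typing import Union


@dataclass(frozen=True)
class TrustedHtml:
    """Hypothetical marker: the caller vouches that `markup` is safe HTML."""
    markup: str


def render_label(content: Union[str, TrustedHtml]) -> str:
    """Escape plain strings; pass explicitly wrapped markup through as-is."""
    if isinstance(content, TrustedHtml):
        body = content.markup
    else:
        body = escape(content)
    return f"<label>{body}</label>"


# Default path: untrusted text is escaped.
print(render_label("a < b"))

# Opt-out for a single call: the caller takes responsibility for the markup.
print(render_label(TrustedHtml("<a href='/wiki/SQL'>SQL LIKE</a> pattern")))
```

Because the opt-out travels with the individual value rather than with a library-wide setting, every other call site keeps the safe default, which is the scaling concern raised about a global toggle.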

Change #1119210 had a related patch set uploaded (by Abaris; author: Abaris):

[design/codex-php@main] Do not sanitize label text in renderer files

https://gerrit.wikimedia.org/r/1119210

In that case I'll change these to use ::parse() instead. This also has the advantage of being able to use wikitext inside of message strings, which would be useful for linking to some of the more odd terms in the field labels (like "SQL LIKE", which I don't think the average admin would know about).

A toggle wouldn't scale easily. There should be a way to bypass escaping for single calls. For example, OOUI achieves this via HtmlSnippet, and MW core via HtmlArmor. Codex-PHP also has a HtmlSnippet class, but it seems to have different semantics, and it cannot be used interchangeably with strings.

This would probably be the best approach. It's also familiar to me as someone who's worked with OOUI before. There's some sort of wheel reinventing that's happening here, but that's to be expected when we're shifting from a UI library that closely resembles its predecessor (OOUI).

Change #1119210 merged by jenkins-bot:

[design/codex-php@main] Do not sanitize label text in renderer files

https://gerrit.wikimedia.org/r/1119210

Hey @Chlod, just a heads up that I've opened https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1122263 to add Codex PHP to MW Vendor (the library has passed security review) – this might simplify your work because you won't need to treat this as an optional dependency once this merges.

Indeed, it will! This sounds great, thanks! :D

Change #1122960 had a related patch set uploaded (by Abaris; author: Abaris):

[design/codex-php@main] table.mustache: Remove <form> wrapper from template

https://gerrit.wikimedia.org/r/1122960

Change #1122960 merged by jenkins-bot:

[design/codex-php@main] table.mustache: Remove <form> wrapper from template

https://gerrit.wikimedia.org/r/1122960

@egardner Just checking in: where do we stand on multiselect support for this library?

Can you clarify what you mean by multiselect?

Oh I now see the activity on T376492, but not sure what exactly HTMLMultiSelectField is. Is this about providing a wrapper field for binary inputs like radios and checkboxes?

@jsn.sherman are you asking about Codex PHP support for the MultiselectLookup component, or are you asking if the standard Select component in Codex PHP can accept something like the multiple attribute from HTML?

Currently we don't really support either of these things. If we need dropdowns that allow multiple selection of items without JS, then we could probably look into implementing something similar to the multiple attribute from the native <select> element, but we'd want to also ensure this gets reflected in the Vue version of the Select component. We'd need to settle on a new design/UX for this first (I believe @bmartinezcalvo has explored this in the past).

If this is what you need, I'd recommend filing a task to request support for selection of multiple items in the existing Codex select component – including info about your use-case would be helpful too.

Adding @OTichonova to provide details about requirements from Mod Tools side.

MultiselectLookup support in Codex PHP would be a great path forward if feasible (see T377494#10425289).
I'll file a task. Thanks for the followup!

Maybe this is a non issue, per T153988#10603930? Looping you into that work to see.