Page MenuHomePhabricator

[EPIC] Create a PHP library for Codex markup generation
Closed, ResolvedPublic

Description

The Codex component library is primarily developed as a set of Vue.js components, meaning that users must have JavaScript enabled in their browser in order to interact with them properly. However, Codex will ultimately need to support a variety of use-cases where JS is either not available at all, or at least is not available immediately.

For these situations, our solution thus far has been to provide a collection of CSS-only components. For most Codex components, we ensure that the styles can be used independently from the Vue.js component code in order to create a "static" component in HTML and CSS that matches the "dynamic" Vue-based version. This is useful for presenting the user with a server-rendered interface that matches what they might see when the full JS-based UI initializes (which may happen some time after the initial page load, depending on various factors). This also allows developers to use Codex components in places where they cannot (or don't wish to) use JS for whatever reason.

Thus far, the developer experience of using CSS-only components has been less than ideal. We provide example markup on the Codex docs site that users must copy and paste into their own applications (potentially with some modifications, especially where things like HTML form submission is involved). This has been okay for small-scale adoption of CSS-only components in some parts of the MediaWiki ecosystem, but we must clearly do better here if we want to enable wider usage of Codex components in PHP-driven user interfaces. In the past we've discussed this problem (see T326850 and T330469) but so far we have not invested much time and energy into addressing it.

We have an opportunity now to make serious progress in this area. Volunteer developer @Dogu has put together a comprehensive prototype of a stand-alone PHP library that generates most of the existing Codex CSS-only components. After some discussion (T372811), the design system team has decided to adopt this prototype as another piece of the Codex design system. We will work with @Dogu and other members of the WMF developer community to move this code into the official Codex repo, conduct review and testing of the code-base, and (eventually) publish this code on Composer as a library that any PHP project can include. We will also maintain this code going forward (though support from the wider community will be particularly welcomed in this area, as the DST team skews more to the front-end in terms of experience and expertise).

We hope that this project will greatly simplify the process for both WMF and community developers to use Codex components in their projects, regardless of whether or not they are presenting rich JS applications to their users.

Engineering driver: @egardner

Steps

  • Finalize the draft ADR about Codex PHP (T372811)
  • Set up a new workspace within the Codex mono-repo in Gerrit suitable for @Dogu's initial Codex PHP code (https://gerrit.wikimedia.org/r/c/1069297)
  • Prepare this codebase for the initial round of code review
  • Initial round of code review (T373939)
  • Put out a call for feedback internally
  • Make necessary revisions, adding tests, demos, and documentation where necessary
  • Publish an initial release in Composer (T376059)
  • Announce initial release (T378363)

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I have completed the preparation of the Codex PHP codebase for the initial round of code review as outlined in this task. The code has been organized, and initial documentation has been added to facilitate the review process.

Documentation Note:
If you would like to view the documentation, which includes usage examples and details based on the source code, please run the following script:

php phpDocumentor.phar -d src -t docs

This will generate the documentation in the docs folder.

We've fully transitioned to using Mustache templates for all components, replacing PHP-based markup generation entirely.

We've fully transitioned to using Mustache templates for all components, replacing PHP-based markup generation entirely.

Hey @Dogu, this is great. I have one last request that is hopefully straightforward.

In the Codex Vue workspace, we have a "sandbox" file that is very useful for development. This isn't published as part of the normal documentation site, nor is it a test exactly – instead, it's a page that you can see when you run a development script locally, and there is some scaffolding there to set up the various Codex components. This is useful during development because it gives us a way to test things out in a playground environment before we have 100% specified the behavior of a given component.

I was wondering if it would be possible to introduce a similar page in the codex-php codebase. The ideal workflow would be to run composer dev in the project, navigate to localhost:8000 or similar, and see a page with various components, with appropriate styles already loaded. This would give us a good way to test things visually as we add or update components going forward. Having an index.php or demo.php file where we can just drop simple examples into would be useful.

Hey @egardner, yes, this is definitely possible. The unit tests are almost finished (except for the create and build methods), and once I complete them, I can add a command that generates and lists the output of the components, just as you've requested.

I’ve added a script to launch a sandbox. You can run composer start-sandbox to start the server and view the sandbox at localhost:8000.

@egardner asked me to take a look from the perspective of PHP best practices and comparison to OOUI PHP.


I like the use of the builder pattern – it's common in modern PHP and increasingly common in MediaWiki. OOUI allowed it too, although for some reason it never gained much adoption, with users preferring to pass everything as config parameters to the constructor. I don't know why that was the case, maybe it's just that builders were less common in PHP (and in our PHP code) circa 2014 than they are circa 2024, and people cared a lot less about type hints and static analysis.

I think it's very important to design the PHP library so that it helps you avoid XSS vulnerabilities, and I think the current proposal doesn't do enough towards this. For example, the Accordion component has setTitle() and setContent() methods, and at a glance it's not obvious that the first takes plaintext and the second takes HTML. Message also has a setContent(), which takes plaintext! This is a bug waiting to happen. There are several things you could do to mitigate this (IMO you really need to do at least one of them):

  • Rename all of the methods to make it clear what arguments they expect, e.g. setContentText() and setContentHtml(). This still requires quite a bit of diligence from the authors and reviewers, but it at least draws attention to the fact that they should think of what is passed to the method.
  • Require all HTML values to be provided through a wrapper class. For example, OOUI has a HtmlSnippet, and MediaWiki's LinkRenderer has a HtmlArmor. If you add appropriate type hints, passing plaintext to a method that expects HTML will fail Phan static analysis checks and at runtime. (As a bonus, this makes it very easy for any method that accepts plaintext to also accept HTML, a.k.a. formatted text, which will come in handy when your users will want to include a link, or bidi markup, or italics, or who knows what else in some text where you didn't plan for that.)
  • Add our custom Phan annotations @param-taint $text escapes_html and @param-taint $html exec_html to all such methods, which will allow Phan to detect at least some cases where user input (a.k.a. tainted values) is passed to a method that outputs HTML, or where already escaped HTML is escaped again.

By the way, using Mustache makes it quite difficult to avoid XSS and double-escaping bugs in your own code. You lose all such semantic information described above when all of your data is passed to the template in a giant string array, we have no way to statically analyze them, and the difference between escaping and not-escaping HTML is just a single {}. If you must use it (I understand that it has upsides), I suggest adding -text or -html as a suffix to all template variables, to make it easier to notice when the wrong kind of data is passed and when the wrong kind of output is used.

I saw some comments about using inheritance on the patch, and I have to say I like the current approach of not using inheritance much. We overused it in OOUI, and it led to problems like different subclasses of InputWidget having different types on the value property. etc. We had designed some of the widgets to be inherited from, and hardly anyone made use of that, and when they did, they've done in different ways than we anticipated, complicating maintenance. Just say no and use composition (or traits, they're not too annoying).

I also saw you wondering how to connect messages from MediaWiki into Codex PHP. I feel like it would be easiest for Codex to just have its own localisation messages, which could be synced with Translatewiki like everything else. But it seems you don't want that, so my suggestion is: an adapter with an interface similar to MessageLocalizer (or if that seems too heavyweight, just a well-documented callback method, kind of like we did for OOUI JS). Since you already have the Codex giga object that everything has to go through, you could just pass it when constructing that object (or give it a setter), and it would then pass it to all the other builders, so that you wouldn't have to pass it manually.

You should decide early on how to generate unique HTML id attributes (which you'll need sooner or later for aria-* or for attributes), and how to ensure that they actually are unique within a single page view (you might want to guarantee this even when using separate Codex giga objects). Also, make sure they don't conflict with those generated by Codex JS :)

I would also think about testing, and particularly making sure that the PHP and JS implementations don't drift away. There is a lot of test code in the patch, but the tests are very low value (mostly just testing setters and getters), and there are no tests for the generated HTML. I would at least want to set up snapshot tests (this article explains the idea well, I'm not necessarily suggesting using the library) to define what HTML is supposed to be output with some common builder methods. In OOUI we have JS/PHP comparison tests (which are the same idea as snapshot tests, but comparing the two implementations to each other, rather than an implementation to expected output), and they've caught many mistakes before they shipped – I'm not sure if Codex JS and Codex PHP/CSS-only are close enough for this to be viable.

One last thought: what's your idea for doing progressive enhancement with Codex JS on top of the interfaces generated by Codex PHP? I assume server-side Vue is still just a dream. That leaves you with either sticking to Codex CSS-only components, and manipulating the HTML using plain DOM methods or jQuery (which is something we've already had enough of in 2014 when designing OOUI PHP), or with duplicating the PHP code in Codex JS components that replace the original ones (which seems like a pain to keep in sync).


Sorry, that got a bit long. The only important paragraph is the one about designing to avoid XSS vulnerabilities. You can ignore everything else :)

Hey @matmarex, thank you for all of your comments. Yes, I'm already aware of some of them and have been working on them. In the meantime, I’ve split some of the logic for objects, creating a new structure with DTOs and using them with Builder objects. I'll then apply all the penetration tests to check for possible XSS vulnerabilities. For the message keys, I’ve already found a solution using the Intuition library. I'll ping you when I upload my new patch. Thanks for all of your reviews!

I suggest adding -text or -html as a suffix to all template variables, to make it easier to notice when the wrong kind of data is passed and when the wrong kind of output is used.

This is also how the Web team uses Mustache, I believe. Maybe it should be codified in the coding style docs.

@matmarex, I uploaded a new patch. Could you please take a look?

I suggest adding -text or -html as a suffix to all template variables, to make it easier to notice when the wrong kind of data is passed and when the wrong kind of output is used.

This is also how the Web team uses Mustache, I believe. Maybe it should be codified in the coding style docs.

Yeah, I think I got that idea from Jon some time.

For the record, I talked with @Dogu on Telegram about the specifics of the builder pattern, and some of the results we arrived at were:

  • The builder should not be used directly as the HTML result; it should have a method (e.g. ->build()) which returns a separate type. (For instance, an AccordionBuilder would return an Accordion.) Normally, each builder chain should end with this method.
  • The separate type should be an immutable value object, wrapping the different parameters that can be set in the builder. (To create it, the builder would itself have mutable instance variables to keep track of the parameters, and only construct the e.g. Accordion in the ->build() method.) Instances of this type can, if needed, be passed around between methods (“here’s an Accordion to render later”), or they can be turned into HTML immediately.
  • The separate type should have another method that returns the HTML, e.g. ->render() or ->getHtml(). (I’m not a fan of doing this implicitly in ->__toString(), though if someone else talks Doğu into it again I won’t object ^^)

A simple use of the builder would be:

$accordionHtml = $codex->accordion()
	->setTitle( 'my title' )
	// ->setXYZ()...
	->build()
	->getHtml();

Or you can use the intermediate values if needed:

public function showPage() {
	// ...
	$accordionBuilder = $codex->accordion()
		->setTitle( 'my title' );
	if ( $someCondition ) {
		$accordionBuilder = $accordionBuilder->setDescription( 'my description' );
	}
	$this->showAccordion( $accordionBuilder->build() );
	// ...
}

private function showAccordion( Accordion $accordion ) {
	// TODO more code
	$this->getOutput()->addHTML( $accordion->toHTML() );
}

(There’s also an ongoing poll in the Wikimedia Hackathon Telegram channel about which version people prefer, so this is still subject to change I guess ^^)

I agree with @LucasWerkmeister, so after the discussion, I started a refactoring process again. I can finish this refactoring within this week and discuss it again. Anyway, refactoring doesn't change the API; on the contrary, it allows us to create more API approaches.

I agree with @LucasWerkmeister, so after the discussion, I started a refactoring process again. I can finish this refactoring within this week and discuss it again. Anyway, refactoring doesn't change the API; on the contrary, it allows us to create more API approaches.

I support an explicit build() method at the end – I might advocate for calling it render() since that's also the language we use on the Vue.js side. "Render" or "Build" seems better than "toHTML" or "toString" IMO.

I've completed the refactoring and uploaded the patch. The builder pattern has been implemented as shown below, but we can adjust it if needed.

$codex->InfoChip()->setText( 'Info Chip' )->build()->getHtml();

->build()->getHtml() seems like extra typing. Would it make sense to combine that into just ->getHtml()?

When you want to generate the HTML for an InfoChip using the InfoChipBuilder, after configuring the related object, calling the ->build() method returns the constructed InfoChip object. You can then call ->getHtml() on that object to retrieve its output. The final method doesn’t necessarily have to be ->getHtml(); it could be named ->render() or something else.

@Dogu would it make sense to have a build method to return the constructed object as well as a render method which 1) constructs the object and 2) returns the final HTML in a single method? Then we could provide a simple approach for convenience as well as a possibility to do more involved things (pass around a constructed component with all data before rendering) where necessary?

@egardner When the InfoChip is built using the ->build() method, we could technically get the output directly using the __toString() method, as it would allow us to implicitly render the object.

For instance:

$codex->InfoChip()->setText('Info Chip')->build();

In this case, the __toString() method would return the HTML output directly. However, I prefer to use ->build()->getHtml() for a clearer separation between object construction and rendering, which aligns more closely with the builder pattern principles:

$codex->InfoChip()->setText('Info Chip')->build()->getHtml();

This approach provides better modularity, allowing us to treat object creation and rendering as distinct steps, offering more flexibility for further manipulation before rendering.

Change #1076232 had a related patch set uploaded (by Abaris; author: Abaris):

[design/codex-php@main] Introduce the Codex PHP library

https://gerrit.wikimedia.org/r/1076232

Change #1069297 abandoned by Abaris:

[design/codex@main] Introduce the Codex PHP library

Reason:

per: https://gerrit.wikimedia.org/r/c/design/codex-php/+/1076232

https://gerrit.wikimedia.org/r/1069297

Change #1076232 merged by Eric Gardner:

[design/codex-php@main] Introduce the Codex PHP library

https://gerrit.wikimedia.org/r/1076232

From my comment on the initial change at https://gerrit.wikimedia.org/r/c/design/codex-php/+/1076232 and

This change has introduced a 15MBytes opaque phar file which cause several problems:

  • the 15 MBytes will be kept forever in the git repository
  • being a zip file, it is not auditable by code review short of unpacking it
  • that third party library will not be detected by our updating stack https://www.mediawiki.org/wiki/LibUp or security scans
  • any update will add 15 MBytes more forever

Given the repository is brand new and barely has any history, the easiest is probably to overwrite the current history or rewrite it without the large phar file. It might be possible to keep some of the existing code review, but it is noticeable amount of work and I am not sure there is much values in the existing changes.

This also deviate from our standard of using Doxygen for MediaWiki PHP code, and introducing yet another tool has a noticeable overhead. I'd stick to Doxygen as the other MediaWiki libraries.

From my comment on the initial change at https://gerrit.wikimedia.org/r/c/design/codex-php/+/1076232 and

This change has introduced a 15MBytes opaque phar file which cause several problems:

  • the 15 MBytes will be kept forever in the git repository
  • being a zip file, it is not auditable by code review short of unpacking it
  • that third party library will not be detected by our updating stack https://www.mediawiki.org/wiki/LibUp or security scans
  • any update will add 15 MBytes more forever

Given the repository is brand new and barely has any history, the easiest is probably to overwrite the current history or rewrite it without the large phar file. It might be possible to keep some of the existing code review, but it is noticeable amount of work and I am not sure there is much values in the existing changes.

This also deviate from our standard of using Doxygen for MediaWiki PHP code, and introducing yet another tool has a noticeable overhead. I'd stick to Doxygen as the other MediaWiki libraries.

@hashar Yes, the phpDocumentor.phar file does take up quite a lot of space, and I can't say that I'm not a bit uncomfortable with that as well. However, the documentation interface and ease of use are more modern and cleaner compared to Doxygen. Could we perhaps consider an option like this: downloading the file via a CI job and then using it to generate the documentation? If the only real solution is to stick to Doxygen, then there's nothing more we can do.

@hashar I am okay with re-writing the history to remove this file if necessary. I'm also open to changing the way that documentation for this library is produced if what we have is too unconventional.

We have a dedicated task for discussing how to handle documentation over at T375939, perhaps we should move this discussion there.

Change #1079381 had a related patch set uploaded (by Krinkle; author: Krinkle):

[labs/codesearch@master] write_config: Add design/codex-php to Codesearch "Libraries" index

https://gerrit.wikimedia.org/r/1079381

Change #1079381 merged by jenkins-bot:

[labs/codesearch@master] write_config: Add design/codex-php to Codesearch "Libraries" index

https://gerrit.wikimedia.org/r/1079381

Version 0.1.0 of Codex PHP is now installable in Composer projects. The project is not yet in Packagist, so you'll need to manually specify the git URL in order to use it. IMO this is fine for now, we can reduce friction once we're ready for wider usage.

To install, add the following information to your composer.json file:

{
    "repositories": [
        {
            "type": "vcs",
            "url": "https://gerrit.wikimedia.org/r/design/codex-php"
        }
    ],
    "require": {
        "wikimedia/codex": "0.1.0"
    },
}

Codex PHP 0.1.0 is now available on Packagist.

It can be installed directly with Composer:

composer require wikimedia/codex

Change #1087213 had a related patch set uploaded (by Abaris; author: Abaris):

[mediawiki/core@master] composer.json: Add wikimedia/codex to dependencies

https://gerrit.wikimedia.org/r/1087213

Change #1087213 abandoned by Abaris:

[mediawiki/core@master] composer.json: Add wikimedia/codex to require

Reason:

Per: https://www.mediawiki.org/wiki/Manual:External_libraries

https://gerrit.wikimedia.org/r/1087213

Change #1087213 restored by Abaris:

[mediawiki/core@master] composer.json: Add wikimedia/codex to require

https://gerrit.wikimedia.org/r/1087213

CCiufo-WMF updated the task description. (Show Details)

The initial version of Codex PHP has been released and is available for use through Composer. Follow-up work to make it available in MediaWiki core is at T379662.