Page MenuHomePhabricator

Teach LibUp how to migrate to new phan using ConfigBuilder
Open, Needs TriagePublic

Description

https://gerrit.wikimedia.org/r/#/c/mediawiki/tools/phan/+/589603/ changed mediawiki-phan-config to use a new ConfigBuilder class. This new class is used in place of direct array manipulation (details on gerrit). The new config is not compatible with the previous one, so we need to migrate it. I think it's trivial to add a couple regexps to LibUp for common fixes (detailed below), that should work for 99% of the config files (manual review will do the rest).


(Note, the following are just high-level ideas)

Find: return require __DIR__ . '/../vendor/mediawiki/mediawiki-phan-config/src/config.php';
Replace: $cfg = require ...;\n return $cfg->make();

Find: $varname['directory_list'] = array_merge( $varname['directory_list'], [ /* only extension PATHS here */ ] );
Replace: $varname->addExtensionDependencies( /*List of extension NAMES extracted from the match above */ );

Find: $varname['exclude_analysis_directory_list'] = array_merge( $varname['exclude_analysis_directory_list'], [ /* only extension PATHS here */ ] );
Replace: ''

Find: $varname['file_list'][] = 'Something';
Replace: $varname->addFiles( 'Something' );

Find: $varname['suppress_issue_types'][] = 'Something';
Replace: $varname->suppressIssueTypes( 'Something' );

Find: $varname['null_casts_as_any_type'] = true;
Replace: $varname->allowNullCastsAsAnyType( true )

Find: $varname['scalar_implicit_cast'] = true;
Replace: $varname->allowScalarImplicitCasts( true )

Find: return $varname;/*end of file*/
Replace: return $varname->make();


The checks above should work fine most of the times, with a few exceptions. Manual review is always necessary, and is not limited to fixing those exceptions. We should also simplify code like

$varname->addFiles( 'File1' );
$varname->addFiles( 'File2' );
$varname->addFiles( 'File3' );

to

$varname->addFiles( 'File1', 'File2', 'File3' );

Event Timeline

If we're going to redo how we configure phan, can we adopt a machine-readable format like JSON/yaml/TOML? (JSON is my pick, but people like their comments...). Long-term I think it will make migrations much easier because regexes dont' cut it after a while, with people doing their own customizations, etc.

The main advantage of sticking with upstream's config.php array format is that we can just point people to the upstream documentation, but if we're going to walk away from that, I'd rather go to a machine-readable format instead of some other custom PHP format.

Maybe ConfigBuilder could learn a ->readFromJsonFile() or something, which handles 95% of extensions, and anything needing more sophisticated stuff can use the PHP way?

If we're going to redo how we configure phan, can we adopt a machine-readable format like JSON/yaml/TOML?

In theory, yes, but would there be any additional (practical) benefit? I went for the Builder approach because it fixes the most obvious problems of array manipulation (e.g. typos, and IDEs can autocomplete stuff). As you noted, machine-readable means less expressive power (e.g no way to conditionally load stuff), so we should understand whether we can afford that. I don't have an answer right now.

The main advantage of sticking with upstream's config.php array format is that we can just point people to the upstream documentation

Good point. Perhaps all methods in ConfigBuilder should link to the respective piece of docs in phan. Note, I strongly suggest not to copy the docs verbatim from phan (as we previously would) because that has a bad tendency of getting outdated.

Maybe ConfigBuilder could learn a ->readFromJsonFile() or something, which handles 95% of extensions, and anything needing more sophisticated stuff can use the PHP way?

That's true, but if a config file is simple enough to be easily migrated to JSON/yaml, I also expect it to be easily readable with regexps (as you mentioned, 95% of extensions are probably in this category).

If we're going to redo how we configure phan, can we adopt a machine-readable format like JSON/yaml/TOML?

In theory, yes, but would there be any additional (practical) benefit?

Migrations are usually 100% possible to do automatically, like all the composer.json manipulations we've done. Compare that with us trying to manipulate Gruntfile.js, which has been not that possible (e.g. removing grunt-jsonlint, automatically migrating to grunt-eslint). I can't predict how often

I went for the Builder approach because it fixes the most obvious problems of array manipulation (e.g. typos, and IDEs can autocomplete stuff). As you noted, machine-readable means less expressive power (e.g no way to conditionally load stuff), so we should understand whether we can afford that. I don't have an answer right now.

Based on this search it doesn't look like conditionally loading stuff is a use-case that we should be primarily supporting, it's just used by the few massive/big projects that are already outliers.

The main advantage of sticking with upstream's config.php array format is that we can just point people to the upstream documentation

Good point. Perhaps all methods in ConfigBuilder should link to the respective piece of docs in phan. Note, I strongly suggest not to copy the docs verbatim from phan (as we previously would) because that has a bad tendency of getting outdated.

Maybe ConfigBuilder could learn a ->readFromJsonFile() or something, which handles 95% of extensions, and anything needing more sophisticated stuff can use the PHP way?

That's true, but if a config file is simple enough to be easily migrated to JSON/yaml, I also expect it to be easily readable with regexps (as you mentioned, 95% of extensions are probably in this category).

The main trouble I've run into with using regexes has been slight changes to spacing and order that throw things off. So far all of the phan config files were mostly created by a few people using scripts/copy&paste, but it'll all diverge over time. Of course, it's all workaroundable, but testing regexes against all MW extensions/skins constantly gets annoying. That's why I just prefer machine readable formats which are more predictable.

Migrations are usually 100% possible to do automatically, like all the composer.json manipulations we've done. Compare that with us trying to manipulate Gruntfile.js, which has been not that possible (e.g. removing grunt-jsonlint, automatically migrating to grunt-eslint). I can't predict how often

Yeah, sorry, I meant something like "how often might we need that?". And it's certainly not easy to estimate that.

I went for the Builder approach because it fixes the most obvious problems of array manipulation (e.g. typos, and IDEs can autocomplete stuff). As you noted, machine-readable means less expressive power (e.g no way to conditionally load stuff), so we should understand whether we can afford that. I don't have an answer right now.

Based on this search it doesn't look like conditionally loading stuff is a use-case that we should be primarily supporting, it's just used by the few massive/big projects that are already outliers.

I think rather than conditional loading, we should add built-in ways (methods in the ConfigBuilder, a special key for JSON, etc.) to cover the existing use cases.

The main trouble I've run into with using regexes has been slight changes to spacing and order that throw things off. So far all of the phan config files were mostly created by a few people using scripts/copy&paste, but it'll all diverge over time. Of course, it's all workaroundable, but testing regexes against all MW extensions/skins constantly gets annoying. That's why I just prefer machine readable formats which are more predictable.

Regexps usually get annoying pretty quick, yes. Just to be clear, I'm not opposed to the machine readable format; my main concern is that we would lose one of the advantages of the ConfigBuilder class, i.e. autocompletion. However, it's still better than the "old" approach with direct array manipulation.

@Daimona: is this ready to be implemented with the current version of mediawiki-phan-config? What's the minimum version that phan needs to be at before this can be used?

@Daimona: is this ready to be implemented with the current version of mediawiki-phan-config? What's the minimum version that phan needs to be at before this can be used?

Almost. There's a one-line BC hack for now, we can remove it and tag a release once LibUp is ready. I think a sensible approach might be to merge the patch against LibUp, then tag a release of mw-phan and adjust the minimum version in LibUp to whatever the release will be.