Page MenuHomePhabricator

The Great Namespaceization and Reorg
Open, LowPublic

Description

Objective

  1. MediaWiki should use namespaces in a consistent way with low maintenance overhead.
  1. Extension registration should be fast.

Problem statement

  • The use of namespaces in MediaWiki is inconsistent. Directory names are not necessarily related to namespace names.
  • The convention of listing the path of all classes in autoload.php has become unwieldy.
  • Extension registration is slow, due to the need to merge enormous class maps. Class names in the top level use generic names which risk conflicting with the PHP core.

Proposal

I propose moving all MW core classes to the MediaWiki\ namespace, with class_alias() for backwards compatibility. Autoloading would be done using PSR-4, which maps namespaces to directory names.

To work around bugs in PHP, class_alias() calls will be inserted in the file scope at the end of the file defining the new class. The autoloader will have an array mapping each legacy class name to the path of the new file. This is how class_alias() is already used in MW core. Whether this legacy autoloader is integrated with the PSR-4 autoloader or is registered separately is an implementation detail.

A notable difference between the current layout and that described by PSR-4 is that the directory names must match the namespace names, including capitalisation. We already have a MediaWiki\Linker namespace, but the files would be moved from includes/linker to MediaWiki/Linker.

Another difference is that there must only be one class per file.

MediaWiki has more directories than namespaces, and this makes sense given how directories are used in editors. Additional namespaces only really become essential when there is a risk of class name conflicts. To use PSR-4, we will have to trade off between our desire to limit the number of namespaces for usability, while still making files easy to find. The plan I detail below generally leans towards introducing additional namespaces.

There is the question of plural names. We (inconsistently) use plurals to indicate that a directory contains multiple things of a certain type, e.g. each member of includes/ is an include, each member of includes/specials is a special. This makes more sense for directories than for namespaces, given the way each is accessed: the class name MediaWiki\Specials\Activeusers would be strange to see in calling code or an error message. The namespace hierarchy is more of an ontology than a collection. I propose to use singular namespace names where the members of the namespace are are singular instance of the parent concept. So: specials -> Special, jobs -> Job, skins -> Skin. But perhaps jobqueue/utils -> JobQueue\Utils since each class also contains utils, not a single util. And perhaps services -> Services, since it has container classes, the members are not themselves services.

There is the question of whether to retain class name prefixes and suffixes. For example, we have 116 special pages, certainly enough to deserve a separate directory. But should we have MediaWiki\Special\SpecialActiveusers or MediaWiki\Special\Activeusers? I think redundant prefixes should be removed, except when the resulting name becomes very ambiguous. We should keep in mind that text editors often only display the base name.

Some directories do not make sense as modules or namespaces, and exist only out of a desire to reduce the number of PHP files in includes/. These should probably be moved into their respective parent directories. Specifically:

  • cache
  • clientpool
  • compat
  • debug
  • exception
  • json

But in the other direction, I would propose:

  • Sanitizer, MagicWord, MagicWordArray -> Parser\
  • ForkController -> Maintenance\
  • MessageCache -> Language\
  • Message, RawMessage -> Language
  • cache/localisation -> Language\LocalisationCache
  • A directory for non-class code: WebStart.php, Setup.php, DefaultSettings.php, Defines.php, GlobalFunctions.php, OutputHandler.php, NoLocalSettings.php. Namespacing of global functions and constants might be easier with PHP 7 group use, I'm not proposing that at the moment.
    • Also shell scripts, config files?
  • A namespace for the web app setup, request routing and response (Request?) MediaWiki, PathRouter, AjaxDispatcher, AjaxResponse, WebRequest, WebRequestUpload, FauxRequest, DerivativeRequest, WebResponse, OutputPage, HeaderCallback, exception/*
  • Watchlist: WatchedItem, WatchedItemQueryService, WatchedItemQueryServiceExtension
  • StubObject: StubObject, DeprecatedGlobal
  • A namespace for revision, page storage (ArticleStore?): Revision, RevisionList, MergeHistory, MovePage, HistoryBlob, LinkBatch, LinkCache,
  • A namespace for links storage: BacklinkCache, LinksUpdate, LinksDeletionUpdate
  • UserCache -> User\UserCache
  • Feed: FeedItem, ChannelFeed, RSSFeed, AtomFeed, FeedUtils

The classes in libs and utils would be better placed under the Wikimedia\ namespace, corresponding to the Composer vendor namespace, reflecting our aspirations for them to be separate from MediaWiki.

Other special cases:

  • jobqueue/jobs -> JobQueue\Job
  • includes/libs/rdbms/defines.php: migrate to namespaced constants.
  • specials -> Special
  • specials/helpers -> Special\Helper
  • specials/pagers -> Special\Pager
  • languages/classes -> MediaWiki\Language
  • languages/data/ZhConversion -> MediaWiki\Languages\Data\ZhConversion
  • maintenance: we can map MediaWiki\Maintenance to this directory for now
  • tests:
    • Some test classes are currently in the namespace they cover, I don't think that works with PSR-4. They should be in MediaWiki\Test\PHPUnit instead.
    • Test classes in MediaWiki\Test\PHPUnit, parser test runner in MediaWiki\Test\Parser
    • Classes under libs/ will be under the Wikimedia namespace, so associated tests should also be in the Wikimedia namespace
    • Map namespace MediaWiki\Test to directory tests\MediaWiki

New directory names will match namespace names, except for maintenance and tests. So for example, the language class files will not stay in languages/.

Existing namespaced extensions mostly use the top level, which I think is fine as long as the name is a distinctive product name. For generic descriptive names, I would prefer MediaWiki\Extension over MediaWiki\Extensions.

The transition should be fully scripted so that it can be used to rebase open changesets in Gerrit.

Related Objects

StatusSubtypeAssignedTask
OpenNone
ResolvedLegoktm
ResolvedLegoktm
Resolvedtstarling
ResolvedLucas_Werkmeister_WMDE
ResolvedTheresNoTime
ResolvedJdforrester-WMF
ResolvedJoe
ResolvedDzahn
Resolvedhashar
ResolvedJdforrester-WMF
Resolved Ladsgroup
ResolvedMoritzMuehlenhoff
Resolvedjijiki
ResolvedMoritzMuehlenhoff
ResolvedTrizek-WMF
ResolvedDzahn
Resolved Gilles
ResolvedDzahn
ResolvedRequestPapaul
Resolvedjijiki
DeclinedNone
ResolvedDzahn
ResolvedDzahn
ResolvedPapaul
ResolvedCmjohnson
ResolvedRequestCmjohnson
ResolvedRequestPapaul
ResolvedAndrew
ResolvedArielGlenn
ResolvedDzahn
ResolvedLegoktm
ResolvedPapaul
ResolvedDzahn
Declined Gilles
ResolvedVolans
ResolvedDzahn
ResolvedLegoktm
ResolvedPleaseStand
ResolvedJoe
Resolvedtstarling
ResolvedArielGlenn
ResolvedJoe
Resolvedtstarling
ResolvedJdforrester-WMF
ResolvedJdforrester-WMF
ResolvedLegoktm
ResolvedJdforrester-WMF
ResolvedDaimona
ResolvedDaimona
ResolvedJdforrester-WMF
ResolvedJoe
ResolvedJMeybohm
ResolvedJoe
ResolvedJoe
ResolvedJoe
ResolvedJoe
ResolvedKrinkle
ResolvedJoe
ResolvedClement_Goubert
ResolvedClement_Goubert
ResolvedClement_Goubert
ResolvedMainframe98
ResolvedJoe
ResolvedZabe
Open Ladsgroup

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

One concern I have with this is that it may make eval.php/shell.php more annoying to use. I’m already noticing this when working with Wikibase (having to add \Wikibase, \Wikibase\Repo etc. all the time), and I fear it will become much worse when core classes are namespaced too. For a while, the backwards compatibility aliases would solve that, but I assume we don’t want to keep those around forever. Perhaps we can have some kind of auto-use in PsySH (assuming many class names would still be unambiguous), or pre-import the most common names during shell bootstrap?

(This isn’t meant as an argument against the namespaceization, nor does it need to block it as long as we keep the compatibility aliases, but I’m curious what others think about it.)

Probably could be handled with a custom tab completer in PsySH, I'd like to look into that at some point (tab completion is frustratingly random currently - it probably doesn't help that PHP is using a crippled cléone of readline by default).

I boldly propose implanting this at least partially in Wikimania hackathon if anyone wants to help out.

One concern I have with this is that it may make eval.php/shell.php more annoying to use. I’m already noticing this when working with Wikibase (having to add \Wikibase, \Wikibase\Repo etc. all the time), and I fear it will become much worse when core classes are namespaced too. For a while, the backwards compatibility aliases would solve that, but I assume we don’t want to keep those around forever. Perhaps we can have some kind of auto-use in PsySH (assuming many class names would still be unambiguous), or pre-import the most common names during shell bootstrap?

(This isn’t meant as an argument against the namespaceization, nor does it need to block it as long as we keep the compatibility aliases, but I’m curious what others think about it.)

I would personally make a new php file in my IDE and copy paste the code back and forth most of the time. This is safer specially when you want to run code against production (things can go wrong so easily there).

In comparison, python also has lots of namespacing and I never heard anyone would be against because in python REPL there is no auto completion.

Change 573775 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Fix the namespace of SpecialPageFactory

https://gerrit.wikimedia.org/r/573775

Change 573775 merged by jenkins-bot:
[mediawiki/core@master] Fix the namespace of SpecialPageFactory

https://gerrit.wikimedia.org/r/573775

Change 577012 had a related patch set uploaded (by Tim Starling; owner: Tim Starling):
[mediawiki/core@master] Add PSR-4 mappings for existing namespaces and the top level

https://gerrit.wikimedia.org/r/577012

We already ran into this issue with renaming RDBMS classes, and found a very simple workaround for it: The alias must be in the same file as the real class, and both the class and alias must be in the autoloader. E.g. Foo.php will contain class Foo {} and class_alias( Foo::class, 'FooAlias' ) and the autoloader will contain an entry for both Foo and FooAlias. This is already enforced by a phpunit structure test in MediaWiki core.

I tried moving the parser classes to a namespace in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/578487, and this workaround didn't work, it issues a warning. To boil it down, in eval.php:

> spl_autoload_register( function( $class ){ print "Autoload: $class\n"; } );
> class A { function foo( A $a ) {} }
> class C extends A { function foo( B $b ) { var_dump( $b ); } }
Warning: Declaration of C::foo(B $b) should be compatible with A::foo(A $a) in /srv/mw/core/maintenance/eval.php(78) : eval()'d code on line 1
> class_alias( 'A', 'B' );
> ( new C )->foo( new A );
object(A)#349 (0) {
}

The autoloader does not fire for B when defining C, which we knew already, but for some reason the consequences didn't click. We can't have production code issuing warnings when loading class files. Calling the function works, the type hint passes, but declaring the class does not work. Moving core classes breaks all parameter type declarations in extensions that are validated at parse-time due to extending a core class or interface.

Running this on 3v4l.org shows that the autoloader *does* fire for B on PHP 7.4+. So the solution is to delay this project until MW requires PHP 7.4.

We already ran into this issue with renaming RDBMS classes, and found a very simple workaround for it: […]

[…] this workaround didn't work, it issues a warning. To boil it down, in eval.php:

> spl_autoload_register( function( $class ){ print "Autoload: $class\n"; } );
> class A { function foo( A $a ) {} }
> class C extends A { function foo( B $b ) { var_dump( $b ); } }
Warning: Declaration of C::foo(B $b) should be compatible with A::foo(A $a) in /srv/mw/core/maintenance/eval.php(78) : eval()'d code on line 1
> class_alias( 'A', 'B' );
> ( new C )->foo( new A );
object(A)#349 (0) {
}

I can reproduce this warning indeed. The above code runs class_alias for class A running after class C, though,, not after class A as I'd expect. Is that intentional? With the below order PHP does not issue any warning:

> class A { function foo( A $a ) {} }
> class_alias( 'A', 'B' );

> class C extends A { function foo( B $b ) { var_dump( $b ); } }

> ( new C )->foo( new A );
class A#182 (0) {
}

It is intentional. Sorry, I should have used different classes for the type declaration and the instantiated class to make this clearer. Let me just use the names from the actual test failure:

class ContentHandler {
    function getDataForSearchIndex(MediaWiki\Parser\ParserOutput $out) {}
}

class EntityHandler extends ContentHandler {
    function getDataForSearchIndex(ParserOutput $out) {}
}

The second class declaration produces warning. An EntityHandler is evidently created before the ParserOutput class is loaded.

Change 577012 merged by jenkins-bot:
[mediawiki/core@master] Add PSR-4 mappings for existing namespaces and the top level

https://gerrit.wikimedia.org/r/577012

It is intentional. Sorry, I should have used different classes for the type declaration and the instantiated class to make this clearer. Let me just use the names from the actual test failure:

class ContentHandler {
    function getDataForSearchIndex(MediaWiki\Parser\ParserOutput $out) {}
}

class EntityHandler extends ContentHandler {
    function getDataForSearchIndex(ParserOutput $out) {}
}

The second class declaration produces warning. An EntityHandler is evidently created before the ParserOutput class is loaded.

Ah okay. There's something I'm missing though as I currently see two possible scenarios in my mind, neither of which is an issue:

  • Maybe PHP does autoload classes it sees in typehints. If this is the case, then on either type hint, it would load the class accordingly and always get both the class and the typehint in one go.
  • Maybe PHP doesn't autoload classes it sees in typehints. If this is the case, then both for aliased classes and regular classes mentioned in typehint it would not know the true identify of what the typehint refers to. But maybe it makes up the identity on the spot based solely on the string, and thus not have the knowledge it would normally be able to get about classes being identical.

I suspect the latter is the case, which is rather dissapointing from PHP's side. In my opinion, either PHP should trigger the autoloader and support aliases via the typehints true identity, or decide not to autoload classes and thus not decide on the compatibility until such time it does decide resolve both typehint(s).

  • Maybe PHP does autoload classes it sees in typehints. If this is the case, then on either type hint, it would load the class accordingly and always get both the class and the typehint in one go.
  • Maybe PHP doesn't autoload classes it sees in typehints. If this is the case, then both for aliased classes and regular classes mentioned in typehint it would not know the true identify of what the typehint refers to. But maybe it makes up the identity on the spot based solely on the string, and thus not have the knowledge it would normally be able to get about classes being identical.

It is the former in PHP 7.4+ and the latter in earlier versions. So as I say, after PHP 7.4 we are good to go. Right now, as you say, it compares the class names of the undeclared classes in the type hints and raises a warning if they are not the same.

kostajh changed the task status from Open to Stalled.Jun 1 2021, 8:38 PM

It is the former in PHP 7.4+ and the latter in earlier versions. So as I say, after PHP 7.4 we are good to go. Right now, as you say, it compares the class names of the undeclared classes in the type hints and raises a warning if they are not the same.

Marking as stalled pending use of PHP 7.4 (blocked on T278139: Drop PHP 7.3 support in MediaWiki when appropriate)

It is the former in PHP 7.4+ and the latter in earlier versions. So as I say, after PHP 7.4 we are good to go. Right now, as you say, it compares the class names of the undeclared classes in the type hints and raises a warning if they are not the same.

Marking as stalled pending use of PHP 7.4 (blocked on T278139: Drop PHP 7.3 support in MediaWiki when appropriate)

Noting that as of today we are 100% on php 7.4. We probably need to wait a bit before moving forward with this but I think it's close to being unblocked now.

Correction: We are mostly done but apis and job runners have not moved there yet.

taavi changed the task status from Stalled to Open.Oct 9 2022, 7:49 PM

For migration of extensions depending on core classes, you can use https://lsc.toolforge.org (let me know if you want to be added to the list of people allowed making large scale patches)

Change 842932 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Move ForkController to MediaWiki/Maintenance/ namespace

https://gerrit.wikimedia.org/r/842932

Change 842932 merged by jenkins-bot:

[mediawiki/core@master] Move ForkController to MediaWiki/Maintenance/ namespace

https://gerrit.wikimedia.org/r/842932

Change 842944 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Maintenance: Move OrderedStreamingForkController to PSR-4 namespace

https://gerrit.wikimedia.org/r/842944

Change 842944 merged by jenkins-bot:

[mediawiki/core@master] Maintenance: Move OrderedStreamingForkController to PSR-4 namespace

https://gerrit.wikimedia.org/r/842944

Change 843471 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Feed: Move feed-related classes to Feed/ and namespace them

https://gerrit.wikimedia.org/r/843471

Change 843471 merged by jenkins-bot:

[mediawiki/core@master] Feed: Move feed-related classes to Feed/ and namespace them

https://gerrit.wikimedia.org/r/843471

Change 849126 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Reorg: Move StubObject classes in includes to its own directory

https://gerrit.wikimedia.org/r/849126

Change 849126 merged by jenkins-bot:

[mediawiki/core@master] Reorg: Move StubObject classes in includes to its own directory

https://gerrit.wikimedia.org/r/849126

Change 849554 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Reorg: Move some of request related classes to MediaWiki/Request

https://gerrit.wikimedia.org/r/849554

Change 849554 merged by jenkins-bot:

[mediawiki/core@master] Reorg: Move some of request related classes to MediaWiki/Request

https://gerrit.wikimedia.org/r/849554

From https://integration.wikimedia.org/ci/job/mwext-php74-phan-docker/6764/console:

includes/View/AbuseFilterViewList.php:164 UnusedPluginSuppression Plugin BuiltinSuppressionPlugin suppresses issue SecurityCheck-ReDoS on this line but this suppression is unused or suppressed elsewhere

(for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/AbuseFilter/+/849574)

This just made me realize that taint-check does not read class aliases when determining the taintedness of hardcoded methods. In practice, if WebRequest is namespaced, taint-check no longer knows that getVal() etc. return an unsafe value. I think the patch above should be reverted immediately, then we can fix this, and then re-do the patch.

Change 850081 had a related patch set uploaded (by Zabe; author: Zabe):

[mediawiki/core@master] Revert "Reorg: Move some of request related classes to MediaWiki/Request"

https://gerrit.wikimedia.org/r/850081

This just made me realize that taint-check does not read class aliases when determining the taintedness of hardcoded methods. In practice, if WebRequest is namespaced, taint-check no longer knows that getVal() etc. return an unsafe value. I think the patch above should be reverted immediately, then we can fix this, and then re-do the patch.

So, thinking about this... In the long run, I believe it would be better to have those annotations directly on the method, rather than hardcoded in the plugin. That way, the annotations are much easier to find, change, read, and document. This should be done carefully though, to make sure that we don't break anything. I think this could be a possible game plan:

  • Copy annotations from taint-check to WebRequest (and potentially other classes, but not all of them for now)
    • Backport said change to all supported releases, just to be sure.
  • Add a test in core to ensure that said annotations are working (i.e., write a test file that uses the unsafe methods, make sure that it's analyzed by phan, make sure it reports issues). Ideally, this would be done for all methods hardcoded in taint-check, not just the ones we're migrating now.
  • Resolve T291743 so that caused-by lines are easier to read when annotations are involved
  • In taint-check, remove hardcoded taintedness from methods that are now annotated in core
  • Release new version of taint-check and mw-phan, upgrade to it in core
  • Finally, move the WebRequest class and its friends.

If it sounds complex, it's because it is complex. However, some of the steps would make it easier to rename other classes in the future.

Just quick q: What classes are hard-coded like this? While you change the phan rules, I move the other classes.

Change 850081 merged by jenkins-bot:

[mediawiki/core@master] Revert "Reorg: Move some of request related classes to MediaWiki/Request"

https://gerrit.wikimedia.org/r/850081

Just quick q: What classes are hard-coded like this? While you change the phan rules, I move the other classes.

Eh... See https://github.com/wikimedia/phan-taint-check-plugin/blob/master/MediaWikiSecurityCheckPlugin.php#L57 :)

I started working on T321806 to unblock this task, but it could take some time.

Change 850424 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] Reorg: Move some of request related classes to MediaWiki/Request

https://gerrit.wikimedia.org/r/850424

I have been creating a lot of patches here which can be spammy for everyone subscribed to this ticket. I'm moving them to T321882: Reduce number of files directly under includes/

Change 842538 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] title: Move Title.php to includes/title/ where related classes are

https://gerrit.wikimedia.org/r/842538

Change 842538 merged by jenkins-bot:

[mediawiki/core@master] title: Move Title.php to includes/title/ where related classes are

https://gerrit.wikimedia.org/r/842538

I have been creating a lot of patches here which can be spammy for everyone subscribed to this ticket. I'm moving them to T321882: Reduce number of files directly under includes/

Can I suggest we also do something like that for maintenance/? That folder has a bunch of partially related files that could be organized. Things like

  • update scripts that are part of update.php or manual updating, but unlikely to be run independently (eg migrateRevisionActorTemp.php, updateRestrictions.php, populateChangeTagDef,php)
  • scripts that are for people developing in core, like generateConfigSchema.php, formatInstallDoc.php, manageForeignResources.php, findClasses.php, findDeprecated.php
  • scripts relating to changing server configuration (eg migrateUserGroup.php, namespaceDupes.php, renameRestrictions.php
  • scripts to get some information shown on the command line (eg showSiteStats.php, version.php, showJobs.php, parse.php, lag.php)

and other potential groups. If there were some folders organizing these then it would be easier to find things in my view. You could have folders like maintenance/updates/, maintenance/configuration/, maintenance/developers/, etc.

I totally agree we need to clean up maintenance/. I have removed numerous scripts after the clean up of T259771: RFC: Drop support for older database upgrades but the biggest complexity here is that they are being used by being called directly and thus depend on the path and there is not an easy way to make a "class alias" for a path, possibly you can do symlinks but they won't work cross platforms and all sorts of mess around that.

The good news is yesterday I closed T99268: RfC: Create a proper command-line runner for MediaWiki maintenance tasks making maint scripts runnable via class names rather than path. The only thing left is to actually deprecate calling maint scripts directly (via a deprecating warning, I don't know how yet tbh?) switch all of the calls to use run.php instead and then drop the support for direct call in 1.41 and then let the party begin. Does that sound good to you?

Change 893535 had a related patch set uploaded (by Jforrester; author: Jforrester):

[mediawiki/core@master] Reorg: Namespace the Title class

https://gerrit.wikimedia.org/r/893535

Change 893535 merged by jenkins-bot:

[mediawiki/core@master] Reorg: Namespace the Title class

https://gerrit.wikimedia.org/r/893535