Page MenuHomePhabricator

Investigate coupling in PHP code
Closed, ResolvedPublic

Description

Event Timeline

ItamarWMDE renamed this task from Investigate Coupling in PHP code to Investigate coupling in PHP code.Jun 11 2020, 9:56 AM

@Tarrow @Lucas_Werkmeister_WMDE @Ladsgroup When we say "PHP code" (which is pretty broad), do we mean just class dependencies, or is there anything else you had in mind?

I think I was mainly thinking of class dependencies, though I guess there are a few other types of coupling. The one that comes to mind right now is how some callbacks are moved between Lib, Repo, and Client; for instance, both Repo and Client will define a formatter-factory-callback for a data type or data value type, and WikibaseLib’s DataTypeDefinitions::getFormatterFactoryCallbacks() returns those callbacks. The signature of those callbacks (for formatter-factory-callback, two arguments of type string and FormatterOptions, and returns a ValueFormatter) is a type of coupling, I think.

Some hints that might be useful:
You can use https://github.com/mamuz/PhpDependencyAnalysis (the docker work is pretty good actually)

Here's what I use to build graph of classes:

mode: 'usage'
source: './lib'
filePattern: '*.php'
formatter: 'PhpDA\Writer\Strategy\Svg'
target: './phpda-class-lib.svg'
ignore: 'tests'
groupLength: 3
visitorOptions:
   PhpDA\Parser\Visitor\TagCollector: {excludePattern: '/^((?!Wikibase)\w)/'}
   PhpDA\Parser\Visitor\Required\UsedNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/'}
   PhpDA\Parser\Visitor\Required\MetaNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/'}
   PhpDA\Parser\Visitor\Required\DeclaredNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/'}

The /^((?!Wikibase)\w)/ drops any class that doesn't start with Wikibase making the graph much smaller (so your system can easily generate it)

For namespaces:

mode: 'usage'
source: '.'
filePattern: '*.php'
ignore: 'tests'
formatter: 'PhpDA\Writer\Strategy\Svg'
target: './phpda.svg'
groupLength: 1
visitor:
  - PhpDA\Parser\Visitor\TagCollector
  - PhpDA\Parser\Visitor\SuperglobalCollector
visitorOptions:
  PhpDA\Parser\Visitor\Required\DeclaredNamespaceCollector: {minDepth: 2, sliceLength: 2}
  PhpDA\Parser\Visitor\Required\MetaNamespaceCollector: {minDepth: 2, sliceLength: 2}
  PhpDA\Parser\Visitor\Required\UsedNamespaceCollector: {minDepth: 2, sliceLength: 2}
  PhpDA\Parser\Visitor\TagCollector: {minDepth: 2, sliceLength: 2}
ItamarWMDE updated the task description. (Show Details)Jun 12 2020, 3:44 PM

High-level overview of the five aspects Client, DataAccess, Lib, Repo, View:

In the “who uses this” direction:

  • Client is nice and isolated. It’s not used from anywhere else.
  • View is nice and isolated. It’s used by Repo and nowhere else.
  • Repo is almost nice and isolated. It’s used from Client when registering the pageterms API query module, but only when the repo is also installed, so that on repo wikis, pageterms gets the terms of the item on the page you specified, not of the item linked to the page you specified. (This is ugly, and T115117 is the task for fixing it. Little activity since 2015.) It’s used from Lib, in EntityChange::setRevisionInfo(), which gets a generic Content and casts it to an EntityContent, a Repo class. Client and Lib also use it in tests, mostly skipped when the repo is not installed. DataAccess and View don’t use Repo.
  • DataAccess is not used from View, but is used from everywhere else.
  • Lib is used from everywhere.

In the “whom does this use” direction:

  • Lib uses DataAccess, and (in one instance, the aforementioned EntityChange::setRevisionInfo(), plus tests) Repo.
  • DataAccess uses Lib.
  • View uses Lib.
  • Repo uses DataAccess, Lib, and View.
  • Client uses DataAccess and Lib, and (in one instance, the aforementioned pageterms API module, plus tests) Repo.

Lib and DataAccess are sufficiently entangled that I figure we should, for now, treat them as one (and indeed DataAccess is rolled out as part of the Lib extension). I think the first coupling we should get rid of is the use of Repo in Client and Lib.

  • Use of Repo in Client
    • pageterms: Probably fix T115117, should be doable IMHO. But I think Product needs to decide what the expected outcome is – not sure if that would be Sam or Lydia, to be honest.
    • tests: I think we should find a decent solution for tests that require Repo and/or Client, even if the code they’re meant to cover is elsewhere (in Client/Repo or in Lib). Maybe we can put these directly in tests/ rather than repo/tests/, client/tests/, etc.?
  • Use of Repo in Lib
    • EntityChange::setRevisionInfo(): The situation is that EntityChange::setRevisionInfo() is called with a RevisionRecord, and to set the object_id field of the change that it represents, it gets the Content of that RevisionRecord, assumes that it’s an EntityContent, and calls getEntityId() on it. I think it was @Ladsgroup who suggested that we should introduce an interface here, and initially I thought something similar – we already have an EntityHolder interface, we could introduce a similar EntityIdHolder and make EntityContent implement it. (Though EntityHolder is currently in Repo, whereas EntityIdHolder would have to be elsewhere to be useful for this, e. g. in Lib or somewhere in DataModel(Services).) But I’m not convinced this would be very useful. Even if it would remove the reference from EntityChange to the \Wikibase\Repo namespace, EntityChange::setRevisionInfo() would still expect EntityContent in practice, it just wouldn’t be as visible anymore.

      Instead, I think we should challenge the role of this EntityChange class overall. It’s used in Client, so why can it even expect to access a Repo class without crashing on wikis where Repo doesn’t exist? And the answer is that while the class as a whole is used on Client and Repo, this particular method, setRevisionInfo() is only called from Repo. I think this reflects a more general split within this EntityChange class: it has “getter” methods which can be called from Client and Repo, and “setter” methods such as setRevisionInfo() which can only be called from Repo. (With one exception: Client’s InjectRCRecordsJob::getChange() calls $change->setField(), to update the “info” field.) So maybe we should split this class into two parts, so that the parts that can only be called by the Repo, including setRevisionInfo(), are also defined in the Repo extension, where they are free to access classes like EntityContent.

      Alternatively, and much more simply, I suddenly realize: all three non-test callers of setRevisionInfo() already have a Content object. Why not make them get the entity ID (casting to EntityContent as needed, which they, being in Repo, are allowed to do) and pass it into setRevisionInfo()?
    • tests: same as for “Use of Repo in Client”, I figure.

(I don’t think this investigation is complete yet, but what I wrote above can be reviewed already, and is probably enough for today.)

Change 605835 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Make NoBadDependencyUsageTest thresholds sharp

https://gerrit.wikimedia.org/r/605835

Change 605835 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Make NoBadDependencyUsageTest thresholds sharp

https://gerrit.wikimedia.org/r/605835

  • tests: I think we should find a decent solution for tests that require Repo and/or Client, even if the code they’re meant to cover is elsewhere (in Client/Repo or in Lib). Maybe we can put these directly in tests/ rather than repo/tests/, client/tests/, etc.?

Not anything against this proposal just want to mention we have dedicated jenkins jobs that run the whole tests on "repo" mode and one dedicated "client" mode to make sure these two work independently. If "shared tests" starts running them without making sure it's not taken into account, it'll explode majestically.

Instead, I think we should challenge the role of this `EntityChange` class overall. It’s used in Client, so why can it even expect to access a Repo class without crashing on wikis where Repo doesn’t exist? And the answer is that while the class as a whole is used on Client and Repo, this particular method, `setRevisionInfo()` is only called from Repo. I think this reflects a more general split within this `EntityChange` class: it has “getter” methods which can be called from Client and Repo, and “setter” methods such as `setRevisionInfo()` which can only be called from Repo. (With one exception: Client’s `InjectRCRecordsJob::getChange()` calls `$change->setField()`, to update the “info” field.) So maybe we should split this class into two parts, so that the parts that can only be called by the Repo, including `setRevisionInfo()`, are also defined in the Repo extension, where they are free to access classes like `EntityContent`.

Looking beyond the code, this seems like a coupling on the logic too which is actually a legitimate one. The client needs to use the repo data to inject RC records and show it. What usually happens is having an API (in the general sense of the term) to make this coupling as loose as possible which can happen though shared class/interface in lib/. Not anything against your comment though.

I'm a very visual person so I went ahead and built some graphs. The first one is high level look of dependencies in the components (lib, view, client, repo, etc.).


The second one is classes that have cross component dependency (Class foo in client depending on class bar in repo):

The third one is basically the second graph but removing wikibase libraries that are out of the extension:

The last one is the third graph but assumes repo and view and the same (and ignores their dependencies):

(It's basically done mostly manually)

The phpda files:

mode: 'usage'
source: '.'
filePattern: '*.php'
formatter: 'PhpDA\Writer\Strategy\Script'
target: './phpda-all.dot'
ignore: 'tests'
groupLength: 3
visitor:
  - PhpDA\Parser\Visitor\TagCollector
  - PhpDA\Parser\Visitor\SuperglobalCollector
visitorOptions:
   PhpDA\Parser\Visitor\TagCollector: {excludePattern: '/^((?!Wikibase)\w)/', minDepth: 2, sliceLength: 2}
   PhpDA\Parser\Visitor\Required\UsedNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/', minDepth: 2, sliceLength: 2}
   PhpDA\Parser\Visitor\Required\MetaNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/', minDepth: 2}
   PhpDA\Parser\Visitor\Required\DeclaredNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/', minDepth: 2, sliceLength: 2}

This produces the dot files for classes and then you can modify the file.

Building graph on component level:

mode: 'usage'
source: '.'
filePattern: '*.php'
formatter: 'PhpDA\Writer\Strategy\SVG'
target: './phpda-all-ns.svg'
ignore: 'tests'
groupLength: 3
visitor:
  - PhpDA\Parser\Visitor\TagCollector
  - PhpDA\Parser\Visitor\SuperglobalCollector
visitorOptions:
   PhpDA\Parser\Visitor\TagCollector: {excludePattern: '/^((?!Wikibase)\w)/', minDepth: 2, sliceLength: 2}
   PhpDA\Parser\Visitor\Required\UsedNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/', minDepth: 2, sliceLength: 1}
   PhpDA\Parser\Visitor\Required\MetaNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/', minDepth: 2, sliceLength: 1}
   PhpDA\Parser\Visitor\Required\DeclaredNamespaceCollector: {excludePattern: '/^((?!Wikibase)\w)/', minDepth: 2, sliceLength: 2}

Copy over from meeting notes:

SummaryRepo has a few uses in Lib and Client. DataAccess and Lib are entangled, Repo and Client use Lib extensively.
Risks, threats, challenges identifiedLots of Lib cannot be extracted into composer packages without some MediaWiki packages (e. g. rdbms) being extracted first.
Opportunities noticedUses of Repo in Lib and Client can be removed with not too much effort. View can be merged into Repo.
Other remarksWe may want to monorepo-ize MediaWiki core as well (to make rdbms and other libs available as composer packages).
Ladsgroup closed this task as Resolved.Jun 24 2020, 10:11 AM