
RfC: Create a proper command-line runner for MediaWiki maintenance tasks
Open, NormalPublic

Event Timeline


Do you envision this as somehow interacting with the mwscript stuff, or are concerns separate?

Ltrlg added a subscriber: Ltrlg.Jun 10 2015, 4:01 PM

the files in maintenance/ must be audited to ensure they are safe to include (i.e., they have no code with side-effects in file scope)

Doesn't that conflict with being a maintenance script that is executable by itself?

The top-level entry-point script could then walk the maintenance/ hierarchy and include each PHP file it encounters. It could then iterate on get_declared_classes(), looking for Maintenance subclasses.

That doesn't work for extensions. Please come up with a way that does. E.g. some way to register maintenance scripts instead.
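For illustration, the discovery step quoted above could look roughly like the sketch below. The `Maintenance` class here is a simplified stand-in, and `DemoUpdate`, `DemoDumpBase`, `DemoDump` and `findMaintenanceClasses()` are hypothetical names, not existing MediaWiki code. Note that it only sees classes that have already been loaded, which is why it would not cover extension scripts without some form of registration.

```php
<?php
// Simplified stand-in for MediaWiki's Maintenance base class (assumption for this sketch).
abstract class Maintenance {
	abstract public function execute();
}

// Hypothetical scripts. In the proposal these would be declared by
// including each PHP file found under maintenance/.
class DemoUpdate extends Maintenance {
	public function execute() {
		return 'update ran';
	}
}

// An abstract intermediate class, which must not be offered as a runnable script.
abstract class DemoDumpBase extends Maintenance {
}

class DemoDump extends DemoDumpBase {
	public function execute() {
		return 'dump ran';
	}
}

/**
 * Return the sorted names of all concrete Maintenance subclasses declared so far.
 */
function findMaintenanceClasses(): array {
	$found = [];
	foreach ( get_declared_classes() as $class ) {
		// Skip anything that does not descend from Maintenance.
		if ( !is_subclass_of( $class, Maintenance::class ) ) {
			continue;
		}
		// Skip abstract helpers; they cannot be instantiated and run.
		if ( ( new ReflectionClass( $class ) )->isAbstract() ) {
			continue;
		}
		$found[] = $class;
	}
	sort( $found );
	return $found;
}

print_r( findMaintenanceClasses() );
```

Including every file just to discover classes is also what makes the "no side effects in file scope" audit necessary.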

Spage assigned this task to ori.Jun 10 2015, 11:08 PM
Spage added a subscriber: Spage.

This was discussed in an IRC office hour (summary). There is general support for the idea; Tim Starling concluded:

update the RFC, propose an interface, write to wikitech-l, then maybe we should schedule another meeting

Reedy added a comment.EditedDec 7 2015, 11:37 PM

Copied from T120757, mostly complementary stuff/extra ideas

// Detect $IP
$IP = getenv( 'MW_INSTALL_PATH' );
if ( $IP === false ) {
	$IP = __DIR__ . '/../..';
}
// Require base maintenance class
require_once "$IP/maintenance/Maintenance.php";

All maintenance scripts in extensions end up with code like this. It's messy, and it's a PITA, even more so with things like WikimediaMaintenance.

Don't know if this should be an rfc... But IMHO, we should have some sort of wrapper script, which invokes maintenance scripts, wherever they lie.

So all maintenance scripts should be in the autoloader, and then you run something like

php maintenance/run.php ScriptNameHere --arg=1 --arg2=3

I'd hazard a guess, that something like this would be further beneficial when running hhvm in repo authoritative mode, where we end up with an "all in one" type of file...

In the end, we wouldn't need the block of code at the top of every extension maintenance script, or the

require_once __DIR__ . '/Maintenance.php';

in core ones, which amounts to almost randomly guessing where Maintenance.php is; that is not always obvious, especially if you symlink things in.

I'm up for a discussion on "how" scripts are invoked (wrapper script? some other magic?). I'm certainly not married to the idea of having a wrapper runner script, but essentially I'd want them "registered" as maintenance scripts (some new tracking global? Ugh. I certainly don't think doing it via reflection is the best idea either) and in the autoloader. There would be minimal extra overhead if they're not used: mostly an entry in an array.
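As a rough sketch of the "registered scripts plus wrapper" idea, assuming a plain array as the registry: the names `$maintenanceScripts`, `DemoScript`, `parseOptions()` and `runScript()` are all made up for this example and are not an existing or proposed MediaWiki interface.

```php
<?php
// Hypothetical script class; a real one would extend Maintenance and be autoloaded.
class DemoScript {
	public function execute( array $options ): int {
		echo 'Running with options: ' . json_encode( $options ) . "\n";
		return 0;
	}
}

// Hypothetical registry: script name => class name.
// Core and each extension would add their own entries.
$maintenanceScripts = [
	'ScriptNameHere' => 'DemoScript',
];

/**
 * Turn "--key=value" and "--flag" arguments into an options array.
 */
function parseOptions( array $args ): array {
	$options = [];
	foreach ( $args as $arg ) {
		if ( preg_match( '/^--([^=]+)(?:=(.*))?$/s', $arg, $m ) ) {
			// A bare "--flag" has no value group, so it is stored as true.
			$options[$m[1]] = $m[2] ?? true;
		}
	}
	return $options;
}

/**
 * Look up the requested script in the registry and run it.
 * Returns a process exit code.
 */
function runScript( array $registry, array $argv ): int {
	$name = $argv[1] ?? null;
	if ( $name === null || !isset( $registry[$name] ) ) {
		fwrite( STDERR, 'Unknown script. Available: ' .
			implode( ', ', array_keys( $registry ) ) . "\n" );
		return 1;
	}
	$class = $registry[$name];
	$script = new $class();
	return $script->execute( parseOptions( array_slice( $argv, 2 ) ) );
}

// Equivalent of: php maintenance/run.php ScriptNameHere --arg=1 --arg2=3
$status = runScript(
	$maintenanceScripts,
	[ 'run.php', 'ScriptNameHere', '--arg=1', '--arg2=3' ]
);
```

With something like this, an unused script costs one array entry, and no script needs to locate Maintenance.php itself.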

RobLa-WMF mentioned this in Unknown Object (Event).May 4 2016, 7:33 PM
Krinkle renamed this task from RfC: Create a proper command-line runner for MediaWiki maintenance tasks to RfC: Create a proper command-line runner for MediaWiki maintenance tasks.Jan 10 2018, 10:04 PM
Legoktm removed ori as the assignee of this task.Jun 5 2018, 1:37 AM
Legoktm added a subscriber: Legoktm.

Unassigning from ori since I assume he's not planning to work on this soon. I'm looking into writing up a more detailed RfC for this soon, but not cookie-licking it yet.

fbstj added a subscriber: fbstj.Jun 16 2018, 3:53 PM

This was discussed in IRC office hour, summary.

(The above link is broken, apparently due to T113000. The logs are available on-wiki.)

Legoktm claimed this task.Aug 17 2018, 7:08 PM
Legoktm updated the task description.

This RfC is now ready for discussion. I've taken Ori's initial details and expanded them into a proposed implementation: https://www.mediawiki.org/wiki/Requests_for_comment/Proper_command-line_runner_for_maintenance_tasks

daniel moved this task from Backlog to Inbox on the TechCom-RFC board.Aug 17 2018, 8:34 PM
daniel moved this task from Inbox to Request IRC meeting on the TechCom-RFC board.Aug 22 2018, 8:44 PM
Anomie added a subscriber: Anomie.Aug 24 2018, 3:30 PM

With regard to update.php, we sometimes need to run maintenance tasks from in the middle of update.php's updating of the database schema.

The main requirement this puts on a maintenance task under this RFC is that one task (update.php) needs to be able to launch another and get back some indication of success or failure; the child task can't just exit() on failure.

The classes that power maintenance scripts will be moved out of the entrypoints, probably into includes/Maintenance. This will probably happen gradually. The entry points will continue to function, but eventually will emit deprecation notices.
They'll still extend the Maintenance class, and continue to be included in the autoloader.

I note the existing Maintenance class has both definitions of parameters and such, and also code for setting up the environment and parsing command lines. Somehow or other we'd need to split those two concerns; the new backend classes likely don't need all the environment setup and command line parsing code in their base class.

By default when you ask the runner for the list of scripts, it'll provide just the sysadmin ones, and there will be a flag so you can ask for the developer ones as well.

A command line flag, a LocalSettings.php flag, or both?

It'll be called mwcmd.php (please, come up with a better name) in the root directory.

maintenance.php?

Syntax: php mwcmd.php update (for update.php), php mwcmd.php MassMessage:sendMessages (for extension scripts).

We could probably also use a syntax to somehow use a class from an out-of-tree file that's not in the registry. For example, if I copy a maintenance script to my home directory and change it somehow (for debugging or a quick hack). Or maybe an extension's "cleanup" script that will fix stuff in the database after the extension has been uninstalled without having to reinstall it first.

With regard to update.php, we sometimes need to run maintenance tasks from in the middle of update.php's updating of the database schema.
The main requirement this puts on a maintenance task under this RFC is that one task (update.php) needs to be able to launch another and get back some indication of success or failure; the child task can't just exit() on failure.

Right. We'd probably want fatalError() to do something else for children. Maybe throw an exception or return a StatusValue?

The classes that power maintenance scripts will be moved out of the entrypoints, probably into includes/Maintenance. This will probably happen gradually. The entry points will continue to function, but eventually will emit deprecation notices.
They'll still extend the Maintenance class, and continue to be included in the autoloader.

I note the existing Maintenance class has both definitions of parameters and such, and also code for setting up the environment and parsing command lines. Somehow or other we'd need to split those two concerns; the new backend classes likely don't need all the environment setup and command line parsing code in their base class.

Good point. I think for nearly all scripts we don't need to maintain PHP interface compatibility (there might be a few that get extended?). We could probably extract a smaller "MaintenanceTask" (or another name) base class from Maintenance that doesn't include that stuff, but keeps a similar interface for most maintenance scripts to use.

By default when you ask the runner for the list of scripts, it'll provide just the sysadmin ones, and there will be a flag so you can ask for the developer ones as well.

A command line flag, a LocalSettings.php flag, or both?

I was intending for a command line flag.

It'll be called mwcmd.php (please, come up with a better name) in the root directory.

maintenance.php?

I like!

Syntax: php mwcmd.php update (for update.php), php mwcmd.php MassMessage:sendMessages (for extension scripts).

We could probably also use a syntax to somehow use a class from an out-of-tree file that's not in the registry. For example, if I copy a maintenance script to my home directory and change it somehow (for debugging or a quick hack). Or maybe an extension's "cleanup" script that will fix stuff in the database after the extension has been uninstalled without having to reinstall it first.

Hmm, so you'd need to specify both an extra file name to include (since it wouldn't be in the autoloader), and then the class name to run. php maintenance.php --extra-include=~/foo.php class:MyFoo. (Can we just look up the name in the registry, and then check if that name is a class that implements MaintenanceTask? Or should we require a class: prefix to force it into a class?).

Right. We'd probably want fatalError() to do something else for children. Maybe throw an exception or return a StatusValue?

Probably throw an exception, which would be caught by maintenance.php or runChild() and handled appropriately.
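To make the exception idea concrete, here is a minimal sketch. `MaintenanceFatalError`, `DemoTask`, `FailingChild`, `DemoUpdater` and `runTopLevel()` are hypothetical names, and the `runChild()` shown here is a simplified stand-in with a different signature from the existing `Maintenance::runChild()`; none of this is the agreed design.

```php
<?php
// Hypothetical exception type thrown instead of calling exit().
class MaintenanceFatalError extends RuntimeException {
}

// Hypothetical, stripped-down task base class.
abstract class DemoTask {
	abstract public function execute(): void;

	// Signal a fatal error without ending the PHP process.
	protected function fatalError( string $msg, int $exitCode = 1 ): void {
		throw new MaintenanceFatalError( $msg, $exitCode );
	}

	// Run another task and report success or failure to the caller.
	protected function runChild( DemoTask $child ): bool {
		try {
			$child->execute();
			return true;
		} catch ( MaintenanceFatalError $e ) {
			echo "Child task failed: {$e->getMessage()}\n";
			return false;
		}
	}
}

class FailingChild extends DemoTask {
	public function execute(): void {
		$this->fatalError( 'could not populate table' );
	}
}

// Plays the role of update.php launching a child mid-run.
class DemoUpdater extends DemoTask {
	/** @var bool|null Result of the child run, null until execute() is called */
	public $childOk = null;

	public function execute(): void {
		$this->childOk = $this->runChild( new FailingChild() );
		echo "Updater decides what to do next\n";
	}
}

// Plays the role of maintenance.php: the only place that maps a
// fatal error to a process exit code.
function runTopLevel( DemoTask $task ): int {
	try {
		$task->execute();
		return 0;
	} catch ( MaintenanceFatalError $e ) {
		fwrite( STDERR, $e->getMessage() . "\n" );
		return $e->getCode() ?: 1;
	}
}

$updater = new DemoUpdater();
$exitCode = runTopLevel( $updater );
```

The parent sees the child's failure as a return value, while a task run directly from the command line still ends with a non-zero exit code.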

Hmm, so you'd need to specify both an extra file name to include (since it wouldn't be in the autoloader), and then the class name to run. php maintenance.php --extra-include=~/foo.php class:MyFoo. (Can we just look up the name in the registry, and then check if that name is a class that implements MaintenanceTask? Or should we require a class: prefix to force it into a class?).

There should be the option to still go via the registry, in case ~/foo.php is just a hacked copy of a registered class. We probably shouldn't allow naming classes directly at all unless --extra-include is given. I don't have an opinion on whether a "class:" prefix should be required.
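One possible reading of these lookup rules, as a sketch only: registry names always work, and a `class:` name is accepted only when `--extra-include` was given. The function `resolveScript()`, the registry contents and the class names `DemoUpdate` and `DemoSendMessages` are assumptions made for this example, and requiring the `class:` prefix is just one of the two options discussed.

```php
<?php
/**
 * Resolve a command-line script name to a class name, or null if not allowed.
 *
 * - "update"                   looked up in the registry (core script)
 * - "MassMessage:sendMessages" looked up in the registry (extension script)
 * - "class:MyFoo"              used directly, but only with --extra-include
 */
function resolveScript( string $name, array $registry, bool $extraIncludeGiven ): ?string {
	$prefix = 'class:';
	if ( strpos( $name, $prefix ) === 0 ) {
		$class = substr( $name, strlen( $prefix ) );
		// Direct class names are refused unless an extra file was included.
		if ( !$extraIncludeGiven || $class === '' ) {
			return null;
		}
		return $class;
	}
	// Registry lookup stays available either way, so a hacked copy of a
	// registered class in the extra file can still be run by its usual name.
	return $registry[$name] ?? null;
}

// Hypothetical registry contents.
$registry = [
	'update' => 'DemoUpdate',
	'MassMessage:sendMessages' => 'DemoSendMessages',
];

var_dump( resolveScript( 'update', $registry, false ) );
var_dump( resolveScript( 'class:MyFoo', $registry, false ) );
var_dump( resolveScript( 'class:MyFoo', $registry, true ) );
```

A required prefix avoids ambiguity between registry names and class names, at the cost of reserving "class" so that no extension could use it as its own prefix.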

TechCom is hosting an IRC meeting to discuss this RFC tomorrow. The meeting is scheduled for Tuesday 11 September at 2pm PST (21:00 UTC, 23:00 CET) in #wikimedia-office. NOTE: this meeting is one day earlier than TechCom IRC discussions normally are.

I strongly support this as it seems like it would resolve T195082

Volans added a subscriber: Volans.Sep 11 2018, 9:34 PM
TheDJ added a subscriber: TheDJ.EditedSep 12 2018, 10:58 AM

I noted this on the RFC talk page, but let me also put it here: Why limit this to maintenance tasks?
Why not a general architecture to run (command line) tools, of which the maintenance tool is just a more specific set/scope (maybe with its own registry?)

I advise people to take a look at something like https://laravel.com/docs/5.6/artisan
which also allows you to document the command and its arguments, to call command line programs programmatically, to run them using a scheduler or to queue the command itself as job.

I've been enjoying using that a lot and I hope it can inspire some of the work that would go into a tool like this.

I noted this on the RFC talk page, but let me also put it here: Why limit this to maintenance tasks?
Why not a general architecture to run (command line) tools, of which the maintenance tool is just a more specific set/scope (maybe with its own registry?)

We tend to use "maintenance script" to refer to any "command line tool".

I advise people to take a look at something like https://laravel.com/docs/5.6/artisan
which also allows you to document the command and its arguments,

MediaWiki's existing Maintenance class does that too.

to call command line programs programmatically,

If you look at T99268#4530407, I was talking about much the same thing.

Note that in general it's better architecture to have shared backend logic than to "call command line programs programmatically" with the attendant serialization of parameters to $argv-style string arrays.

to run them using a scheduler

I don't know what you mean here, unless you're trying to say we should have some sort of cron reimplementation built into MediaWiki (which would probably use the job queue?).

Although really if you want scheduled runs of a "command line tool", you're probably best served by actually using cron.

or to queue the command itself as job.

See above regarding shared backend logic.

Based on the log, I think this was supposed to go to last call, but I'm not sure that ever happened (I don't see it in the TechCom notes...). Can we send it to last call now? I don't believe anything has significantly changed since it was last discussed.

daniel moved this task from Under discussion to Inbox on the TechCom-RFC board.Jul 9 2019, 11:30 AM

up for review per legoktm

daniel moved this task from Inbox to Last Call on the TechCom-RFC board.Jul 11 2019, 1:51 PM

Per the TechCom meeting on July 10, this goes on Last Call until July 24. If no relevant objections remain unaddressed by that time, this RFC will be approved as proposed.

daniel edited projects, added TechCom-RFC (TechCom-Approved); removed TechCom-RFC.EditedAug 13 2019, 9:22 AM

This RFC has been approved as proposed per the TechCom meeting on 2019-08-07.

Implementation will probably be taken on by the Core Platform Team at some point, but it's currently not high priority.

@CCicalese_WMF, do you have an idea how to fit this into our processes? Writing the framework shouldn't take long, it's not much work. Should be doable in a week or two of focused work. Converting existing scripts would take longer, but would be trivial.

@daniel, we are adding this as a future initiative.

CCicalese_WMF removed Legoktm as the assignee of this task.Sep 11 2019, 2:30 PM
CCicalese_WMF triaged this task as Normal priority.

I'm an outreachy applicant. Can I take up this task? How should I get started with this?