
RfC: Create a proper command-line runner for MediaWiki maintenance tasks
Open, Medium, Public

Event Timeline

Legoktm removed ori as the assignee of this task. Jun 5 2018, 1:37 AM
Legoktm added a subscriber: Legoktm.

Unassigning from ori since I assume he's not planning to work on this soon. I'm looking into writing up a more detailed RfC for this soon, but not cookie-licking it yet.

fbstj added a subscriber: fbstj. Jun 16 2018, 3:53 PM

This was discussed in an IRC office hour (summary).

(The above link is broken, apparently due to T113000. The logs are available on-wiki.)

Legoktm claimed this task. Aug 17 2018, 7:08 PM
Legoktm updated the task description.

This RfC is now ready for discussion. I've taken Ori's initial details and expanded them into a proposed implementation: https://www.mediawiki.org/wiki/Requests_for_comment/Proper_command-line_runner_for_maintenance_tasks

daniel moved this task from Old to P1: Define on the TechCom-RFC board. Aug 17 2018, 8:34 PM
Anomie added a subscriber: Anomie. Aug 24 2018, 3:30 PM

With regard to update.php, we sometimes need to run maintenance tasks from in the middle of update.php's updating of the database schema.

The main requirement this puts on a maintenance task under this RFC is that one task (update.php) needs to be able to launch another and get back some indication of success or failure; the child task can't just exit() on failure.

The classes that power maintenance scripts will be moved out of the entrypoints, probably into includes/Maintenance. This will probably happen gradually. The entry points will continue to function, but eventually will emit deprecation notices.

They'll still extend the Maintenance class, and continue to be included in the autoloader.

I note the existing Maintenance class has both definitions of parameters and such, and also code for setting up the environment and parsing command lines. Somehow or other we'd need to split those two concerns; the new backend classes likely don't need all the environment setup and command line parsing code in their base class.

By default, when you ask the runner for the list of scripts, it'll provide just the sysadmin ones, and there will be a flag so you can ask for the developer ones as well.

A command line flag, a LocalSettings.php flag, or both?

It'll be called mwcmd.php (please, come up with a better name) in the root directory.

maintenance.php?

Syntax: php mwcmd.php update (for update.php), php mwcmd.php MassMessage:sendMessages (for extension scripts).

We could probably also use a syntax to somehow use a class from an out-of-tree file that's not in the registry. For example, if I copy a maintenance script to my home directory and change it somehow (for debugging or a quick hack). Or maybe an extension's "cleanup" script that will fix stuff in the database after the extension has been uninstalled without having to reinstall it first.

With regard to update.php, we sometimes need to run maintenance tasks from in the middle of update.php's updating of the database schema.

The main requirement this puts on a maintenance task under this RFC is that one task (update.php) needs to be able to launch another and get back some indication of success or failure; the child task can't just exit() on failure.

Right. We'd probably want fatalError() to do something else for children. Maybe throw an exception or return a StatusValue?

The classes that power maintenance scripts will be moved out of the entrypoints, probably into includes/Maintenance. This will probably happen gradually. The entry points will continue to function, but eventually will emit deprecation notices.

They'll still extend the Maintenance class, and continue to be included in the autoloader.

I note the existing Maintenance class has both definitions of parameters and such, and also code for setting up the environment and parsing command lines. Somehow or other we'd need to split those two concerns; the new backend classes likely don't need all the environment setup and command line parsing code in their base class.

Good point. I think for nearly all scripts we don't need to maintain PHP interface compatibility (there might be a few that get extended?). We could probably extract a smaller "MaintenanceTask" (or another name) base class from Maintenance that doesn't include that stuff, but keeps a similar interface for most maintenance scripts to use.
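To make the proposed split concrete, here is a minimal sketch, written in Python purely for illustration (the real classes would be PHP). Every name in it (MaintenanceTask, CliRunner, PurgeTask, the options tuple layout) is a hypothetical stand-in, not existing MediaWiki API. The task only declares its parameters and does the work; a separate front end owns the command-line parsing.

```python
import argparse


class MaintenanceTask:
    """Hypothetical slim base class: declares parameters and does the work."""

    description = ""
    # Each option is a (name, help text, required) tuple.
    options = []

    def execute(self, params):
        raise NotImplementedError


class CliRunner:
    """Hypothetical front end: it owns command-line parsing, the task does not."""

    def parse(self, task, argv):
        parser = argparse.ArgumentParser(description=task.description)
        for name, help_text, required in task.options:
            parser.add_argument("--" + name, help=help_text, required=required)
        # Convert the parsed namespace into a plain dict of parameters.
        return vars(parser.parse_args(argv))

    def run(self, task, argv):
        return task.execute(self.parse(task, argv))


class PurgeTask(MaintenanceTask):
    """Example task; the name and option are made up for this sketch."""

    description = "Example task"
    options = [("namespace", "Namespace to purge", False)]

    def execute(self, params):
        return "purging namespace {}".format(params["namespace"])
```

With this shape, `CliRunner().run(PurgeTask(), ["--namespace", "4"])` goes through argument parsing, while other code can call `PurgeTask().execute({"namespace": "4"})` directly without any command-line handling.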

By default, when you ask the runner for the list of scripts, it'll provide just the sysadmin ones, and there will be a flag so you can ask for the developer ones as well.

A command line flag, a LocalSettings.php flag, or both?

I was intending for a command line flag.
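As a rough illustration of the command line flag idea (Python used only for brevity): the registry layout, the "audience" labels, and the generateFixtures entry below are assumptions made for this sketch, not part of the RFC.

```python
# Hypothetical registry mapping script names to their intended audience.
SCRIPT_AUDIENCE = {
    "update": "sysadmin",
    "MassMessage:sendMessages": "sysadmin",
    "generateFixtures": "developer",  # made-up developer-only script
}


def list_scripts(registry, include_developer=False):
    """Return script names, hiding developer scripts unless asked for."""
    wanted = {"sysadmin", "developer"} if include_developer else {"sysadmin"}
    return sorted(name for name, audience in registry.items() if audience in wanted)
```

The runner would call `list_scripts(SCRIPT_AUDIENCE)` by default and pass `include_developer=True` when the (yet to be named) flag is given.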

It'll be called mwcmd.php (please, come up with a better name) in the root directory.

maintenance.php?

I like!

Syntax: php mwcmd.php update (for update.php), php mwcmd.php MassMessage:sendMessages (for extension scripts).

We could probably also use a syntax to somehow use a class from an out-of-tree file that's not in the registry. For example, if I copy a maintenance script to my home directory and change it somehow (for debugging or a quick hack). Or maybe an extension's "cleanup" script that will fix stuff in the database after the extension has been uninstalled without having to reinstall it first.

Hmm, so you'd need to specify both an extra file name to include (since it wouldn't be in the autoloader), and then the class name to run. php maintenance.php --extra-include=~/foo.php class:MyFoo. (Can we just look up the name in the registry, and then check if that name is a class that implements MaintenanceTask? Or should we require a class: prefix to force it into a class?).

Right. We'd probably want fatalError() to do something else for children. Maybe throw an exception or return a StatusValue?

Probably throw an exception, which would be caught by maintenance.php or runChild() and handled appropriately.
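A minimal sketch of that control flow, in Python only for illustration: the exception class, the method names, and the exit codes are assumptions for this example, not the actual Maintenance API. The point is that only the top-level runner turns a failure into an exit code, so a parent such as update.php can observe a child's failure and decide what to do.

```python
class MaintenanceFatalError(Exception):
    """Hypothetical exception raised where a script would otherwise exit()."""

    def __init__(self, message, exit_code=1):
        super().__init__(message)
        self.exit_code = exit_code


class Task:
    """Hypothetical stand-in for a maintenance task."""

    def fatal_error(self, message, exit_code=1):
        # Raise instead of exiting so a parent task can react.
        raise MaintenanceFatalError(message, exit_code)

    def run_child(self, child):
        """Run another task and report (ok, message) instead of terminating."""
        try:
            child.execute()
        except MaintenanceFatalError as err:
            return False, str(err)
        return True, ""

    def execute(self):
        raise NotImplementedError


class FailingChild(Task):
    def execute(self):
        self.fatal_error("populate step failed")


class Update(Task):
    def execute(self):
        ok, message = self.run_child(FailingChild())
        if not ok:
            # The parent decides how to handle it; here it escalates.
            self.fatal_error("update aborted: " + message, exit_code=2)


def main(task):
    """Top-level runner: the only place a failure becomes an exit code."""
    try:
        task.execute()
    except MaintenanceFatalError as err:
        return err.exit_code
    return 0
```

A real entry point would pass the result of `main()` to the process exit call; a child run through `run_child()` never terminates the process itself.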

Hmm, so you'd need to specify both an extra file name to include (since it wouldn't be in the autoloader), and then the class name to run. php maintenance.php --extra-include=~/foo.php class:MyFoo. (Can we just look up the name in the registry, and then check if that name is a class that implements MaintenanceTask? Or should we require a class: prefix to force it into a class?).

There should be the option to still go via the registry, in case ~/foo.php is just a hacked copy of a registered class. We probably shouldn't allow naming classes directly at all unless --extra-include is given. I don't have an opinion on whether a "class:" prefix should be required.
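One possible reading of these rules, sketched in Python for illustration only: the function name, the registry contents, and the class names UpdateTask and SendMessagesTask are hypothetical, and the sketch assumes the "class:" prefix is required, which is still an open question above. Registered names always resolve through the registry; a direct class name is accepted only when an extra include was given.

```python
# Hypothetical registry of task names to class names.
TASK_REGISTRY = {
    "update": "UpdateTask",
    "MassMessage:sendMessages": "SendMessagesTask",
}


def resolve(name, registry, extra_include=None, loaded_classes=()):
    """Return the class name to run for a task name given on the command line.

    loaded_classes stands in for the classes defined by the --extra-include
    file; a real runner would inspect the loaded code instead.
    """
    if name.startswith("class:"):
        # Direct class names are refused unless --extra-include was given.
        if extra_include is None:
            raise ValueError("class: names require --extra-include")
        class_name = name[len("class:"):]
        if class_name not in loaded_classes:
            raise LookupError("class not found in " + extra_include)
        return class_name
    if name in registry:
        # Still go via the registry, e.g. when the included file is just a
        # hacked copy of a registered class.
        return registry[name]
    raise LookupError("unknown task: " + name)
```

For example, `resolve("class:MyFoo", TASK_REGISTRY, extra_include="~/foo.php", loaded_classes=("MyFoo",))` yields `"MyFoo"`, while the same name without an extra include is rejected.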

TechCom is hosting an IRC meeting to discuss this RFC tomorrow. The meeting is scheduled for Tuesday 11 September at 2pm PDT (21:00 UTC, 23:00 CEST) in #wikimedia-office. NOTE: this meeting is one day earlier than TechCom IRC discussions normally are.

I strongly support this, as it seems like it would resolve T195082.

Volans added a subscriber: Volans. Sep 11 2018, 9:34 PM
TheDJ added a subscriber: TheDJ. Edited Sep 12 2018, 10:58 AM

I noted this on the RFC talk page, but let me also put it here: Why limit this to maintenance tasks?
Why not a general architecture to run (command line) tools, of which the maintenance tool is just a more specific set/scope (maybe with its own registry?)

I advise people to take a look at something like https://laravel.com/docs/5.6/artisan
which also allows you to document the command and its arguments, to call command line programs programmatically, to run them using a scheduler or to queue the command itself as job.

I've been enjoying using that a lot and I hope it can inspire some of the work that would go into a tool like this.

I noted this on the RFC talk page, but let me also put it here: Why limit this to maintenance tasks?
Why not a general architecture to run (command line) tools, of which the maintenance tool is just a more specific set/scope (maybe with its own registry?)

We tend to use "maintenance script" to refer to any "command line tool".

I advise people to take a look at something like https://laravel.com/docs/5.6/artisan
which also allows you to document the command and its arguments,

MediaWiki's existing Maintenance class does that too.

to call command line programs programmatically,

If you look at T99268#4530407, I was talking about much the same thing.

Note that in general it's better architecture to have shared backend logic than to "call command line programs programmatically" with the attendant serialization of parameters to $argv-style string arrays.
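A small sketch of what "shared backend logic" could look like, in Python for illustration only; all function names and parameters here are made up. The logic lives in one typed function, and both the command-line entry point and a job call it, so the job never has to serialize its parameters into $argv-style strings.

```python
def rebuild_counts(namespace, dry_run=False):
    """Hypothetical backend function holding the actual logic."""
    action = "would rebuild" if dry_run else "rebuilt"
    return "{} counts for namespace {}".format(action, namespace)


def cli_entry(argv):
    """Thin command-line wrapper: parse strings, then delegate."""
    dry_run = "--dry-run" in argv
    # The first argument that is not a flag is taken as the namespace.
    namespace = int(next(arg for arg in argv if not arg.startswith("--")))
    return rebuild_counts(namespace, dry_run)


def job_entry(job_params):
    """Thin job wrapper: values are already typed, no argv round trip."""
    return rebuild_counts(job_params["namespace"], job_params.get("dry_run", False))
```

Both wrappers stay trivial, and any new caller (a scheduler, another task) can use `rebuild_counts()` directly.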

to run them using a scheduler

I don't know what you mean here, unless you're trying to say we should have some sort of cron reimplementation built into MediaWiki (which would probably use the job queue?).

Although really if you want scheduled runs of a "command line tool", you're probably best served by actually using cron.

or to queue the command itself as job.

See above regarding shared backend logic.

Based on the log, I think this was supposed to go to last call, but I'm not sure that ever happened (I don't see it in the TechCom notes...). Can we send it to last call now? I don't believe anything has significantly changed since it was last discussed.

up for review per legoktm

daniel moved this task from P1: Define to P5: Last Call on the TechCom-RFC board. Jul 11 2019, 1:51 PM

Per the TechCom meeting on July 10, this goes on Last Call until July 24. If no relevant objections remain unaddressed by that time, this RFC will be approved as proposed.

This RFC has been approved as proposed per the TechCom meeting on 2019-08-07.

Implementation will probably be taken on by the Platform Engineering team at some point, but it's currently not high priority.

@CCicalese_WMF, do you have an idea how to fit this into our processes? Writing the framework isn't much work and should be doable in a week or two of focused work. Converting existing scripts would take longer, but would be trivial.

@daniel, we are adding this as a future initiative.

CCicalese_WMF removed Legoktm as the assignee of this task. Sep 11 2019, 2:30 PM
CCicalese_WMF triaged this task as Medium priority.

I'm an Outreachy applicant. Can I take up this task? How should I get started with this?

@WDoranWMF @kchapman @daniel Would you all be interested in promoting/mentoring this project via Google Summer of Code 2020 or Outreachy Round 20?

@WDoranWMF @kchapman @daniel Would you all be interested in promoting/mentoring this project via Google Summer of Code 2020 or Outreachy Round 20?

Generally yes - which time periods are we talking about exactly?

which time periods are we talking about exactly?

See https://developers.google.com/open-source/gsoc/timeline

Hi! This project seems interesting to me and I would like to contribute to it via GSoC'20. But before that I have some questions:

  • Is the command runner we are planning similar to the way we use Git? For example, when we enter git in the terminal or command prompt, we see all the commands that can be run, with a one-line description of each.
  • Can you please link the source of the maintenance scripts we are planning to include?

Thanks in advance!

Hi, see the /maintenance folder in mediawiki/core.

@Soumyaa1804 Thanks for your interest! FYI, we are not planning to promote this project via GSoC and might end up promoting via Outreachy. We will finalize in the next few days.

Okay! I would be interested in working on this through Outreachy as well, if it gets promoted there and my initial application gets selected. :)

Krinkle removed a subscriber: Krinkle. Feb 4 2020, 6:09 PM
srishakatux changed the visibility from "Public (No Login Required)" to "Outreachy Mentors (Project)".
srishakatux moved this task from Backlog to Featured projects on the Outreachy (Round 20) board.

(As per the program rules, we need to restrict access to this project task to the Outreachy Mentors group until the application period begins.)

srishakatux changed the visibility from "Outreachy Mentors (Project)" to "Public (No Login Required)". Mar 5 2020, 6:16 PM
srishakatux updated the task description. Mar 18 2020, 11:44 PM

Hi! @daniel I would like to contribute, can you please assign me some task to get started with?

@Aklapper Yes, that's a good point; I've added the link from the Outreachy/Participants page.

srishakatux updated the task description. Mar 27 2020, 7:09 PM
srishakatux removed a project: Outreachy (Round 20).

We are unlisting this project from Outreachy (Round 20). If you are a potential intern, please explore other projects here: https://www.mediawiki.org/wiki/Outreachy/Round_20#Ideas_for_projects.

Aklapper removed subscribers: Anomie, Spage. Oct 16 2020, 5:02 PM