
Add a command-line option to mwgrep to allow it to search a particular page across all wikis
Closed, Resolved · Public

Description

It would be really nice if I could use mwgrep to search all instances of MediaWiki:Gadgets-definition across all wikis (for example to solve T115681), or to search MediaWiki:Common.js across all wikis to see which ones are using WikiMiniAtlas (random example). I realize the second case could be accomplished with the existing functionality, but it would be a lot less efficient and you would have to analyze the results more closely. The first case is currently impossible, since mwgrep implements a filter that restricts the search to pages whose titles end in .js or .css (which MediaWiki:Gadgets-definition doesn't).

For some reason the HEAD link is broken, but you can see a recent version at https://git.wikimedia.org/blob/operations%2Fpuppet/5b7895dcd5b49b385f97e99438acf837f6a1a1d8/files%2Fmisc%2Fscripts%2Fmwgrep

Event Timeline

kaldari created this task. Oct 16 2015, 5:04 AM
kaldari raised the priority of this task from to Needs Triage.
kaldari updated the task description.
kaldari added a project: acl*sre-team.
kaldari added subscribers: kaldari, ori.
Restricted Application added subscribers: Matanya, Aklapper. Oct 16 2015, 5:04 AM

Basically this would just entail setting the title.keyword filter to either be '.*\\.(js|css)' (the current default) or whatever was passed as the --title option. Should be pretty simple.
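A minimal sketch of that idea, not the actual mwgrep code: it assumes the title restriction is expressed as an Elasticsearch `regexp` clause on `title.keyword`, and that the value of a `--title` option is used as the pattern in place of the default. The helper names `build_title_filter` and `parse_args` are hypothetical and exist only for illustration.

```python
import argparse

# Default pattern quoted in the comment above: titles ending in .js or .css.
DEFAULT_TITLE_PATTERN = '.*\\.(js|css)'


def build_title_filter(title=None):
    """Return a regexp clause restricting results by page title.

    Hypothetical helper: falls back to the default .js/.css pattern
    when no --title value was supplied.
    """
    pattern = title if title is not None else DEFAULT_TITLE_PATTERN
    return {'regexp': {'title.keyword': pattern}}


def parse_args(argv):
    """Parse a search term plus an optional --title override (sketch only)."""
    parser = argparse.ArgumentParser(description='sketch of a --title option')
    parser.add_argument('term', help='regex to search for in page source')
    parser.add_argument('--title', default=None,
                        help='pattern for page titles (default: .js/.css pages)')
    return parser.parse_args(argv)


if __name__ == '__main__':
    # Example invocation with an explicit argument list instead of sys.argv.
    args = parse_args(['WikiMiniAtlas', '--title', 'Gadgets-definition'])
    print(build_title_filter(args.title))
```

With no `--title` given, the clause keeps the current .js/.css behaviour, so existing invocations would be unaffected under this sketch.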

kaldari renamed this task from "Add a command like option to mwgrep to allow it to search a particular page across all wikis" to "Add a command-line option to mwgrep to allow it to search a particular page across all wikis". Oct 16 2015, 5:11 AM
kaldari added a project: Community-Tech.
kaldari updated the task description.
kaldari updated the task description.
kaldari moved this task from Untriaged to Epic backlog on the Community-Tech board.

While I can understand the motivation for creating this task, I'm pretty wary of continuing to extend and enhance mwgrep. It seems particularly wrong for the Community Tech team to be working on this, given that mwgrep is exclusively used by shell users. Instead of working on a tool that 99.99% of users can't use, the Community Tech team could work on:

Special:Search already supports an intitle: option, by the way: https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=intitle%3A%22Barack+Obama%22&fulltext=Search. If intitle: can't do an exact title match (and there's not another keyword that can), enhancing intitle: to support regex, for example, would be another, better use of Community Tech resources, in my opinion.

I agree that we (Wikimedia engineers) should be working towards making mwgrep mostly obsolete (as far as public wikis are concerned) and that we shouldn't be expanding its capabilities at the moment. When I want to do stuff like this, I take a copy into my home directory and alter it there, because otherwise I need to write a patch for the puppet repository, etc.

@MZMcBride: Good point. If mwgrep were available to the community, they would be able to keep a better eye on on-wiki JS and CSS across all the projects (and then wouldn't have to ask Community Tech to do tasks like T110149). I took a look at the bugs you mentioned:

  • T71489: Expose mwgrep functionality on-wiki
    • I think there's little chance this one will happen, as T109715 would be a better solution for several reasons: it gives a lot more flexibility to users, it doesn't have to be as performant, and it makes more sense (since this feature isn't of much use to most third-party MediaWiki users).
  • T88247: insource should search article text on non-wikitext pages. Probably.
    • Looks like a valid bug, but not high priority. This would definitely be in the domain of the Search team though and not Community Tech.
  • T109715: Replicate production elasticsearch indices to labs
    • This would be hugely useful (both to the community and Community Tech). It looks like Yuvi is making good progress on it. I'll ask him if Community Tech can help beta test it or if there's anything else we can do to help move it along.

In the meantime, I do still think this would be a useful feature to add to mwgrep, and considering it would only be a small code change (2 or 3 lines), I don't think it would consume undue resources from any team. The alternative (in order to resolve T115681) is for us to write a bot or our own shell script (per Krenair). Writing a bot would be hugely inefficient, so I guess creating our own copy of mwgrep would be the next logical alternative. Personally, though, I would prefer to share useful code with other devs, even if it is only shell users for the time being. The upside is that once T109715 is complete, hopefully it could be reused by anyone on Tool Labs.

kaldari updated the task description. Oct 16 2015, 5:52 PM
kaldari closed this task as Resolved. Oct 16 2015, 6:47 PM
kaldari claimed this task.

FWIW, I went ahead and just ran my query with a modified mwgrep (per Krenair), since it only required modifying 1 line of code (changing '\\.(js|css)' to 'definition'), but it looks like Ori added this as an actual command-line option in the meantime (https://gerrit.wikimedia.org/r/#/c/246891/).

I agree that we still need to get this capability into the hands of the community and will see what I can do to help that along.

kaldari moved this task from Epic backlog to Backlog on the Community-Tech board. Oct 16 2015, 6:52 PM