
Expose mwgrep functionality on-wiki
Open, Lowest, Public

Tokens
"Like" token, awarded by Esanders.
"Love" token, awarded by Framawiki.
"Orange Medal" token, awarded by Krinkle.
"Like" token, awarded by Pathoschild.
"Like" token, awarded by Ricordisamoa.
Assigned To
None
Authored By
Legoktm, Aug 13 2014

Description

mwgrep is a fantastic tool, but it's only available to shell users. There have been some discussions on IRC about making it available to other users.

I'm wondering if we can just turn this into a special page on-wiki that allows for cross-wiki regex searches. We can already use regex on individual wikis via the API and the search page (T45652); how difficult would it be to make that cross-wiki?

Details

Reference
bz69489

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:44 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz69489.
bzimport added a subscriber: Unknown Object (MLST).
Legoktm created this task. Aug 13 2014, 7:22 PM

We could try it. We'd have to make sure the pool counter is pretty tight for it, like we do for single-wiki regexps, and we might have to fiddle with timeouts. Otherwise we should be able to do it.
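For readers unfamiliar with the idea, here is a minimal sketch of what a "tight" pool with a wait timeout could look like. It is an illustration only, not the actual pool counter used in production: the class name `QueryPool`, its parameters, and the error message are all hypothetical.

```python
import threading
from contextlib import contextmanager


class QueryPool:
    """Hypothetical stand-in for a pool counter.

    Caps how many expensive queries run at once and refuses new work
    if no slot frees up within wait_timeout seconds.
    """

    def __init__(self, max_concurrent: int, wait_timeout: float):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._wait_timeout = wait_timeout

    @contextmanager
    def slot(self):
        # acquire() returns False if the timeout expires before a slot is free.
        if not self._slots.acquire(timeout=self._wait_timeout):
            raise TimeoutError("regex search pool is full; try again later")
        try:
            yield
        finally:
            self._slots.release()


if __name__ == "__main__":
    pool = QueryPool(max_concurrent=2, wait_timeout=0.5)
    with pool.slot():
        pass  # run one expensive regex query here
```

Making the pool "tighter" here just means lowering `max_concurrent` or `wait_timeout`, so extra requests are rejected quickly instead of piling up.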

demon added a comment. Aug 13 2014, 7:56 PM

IMHO not going to happen because of the performance considerations. I'd much rather see us complete labs replication where we'd be able to expose mwgrep + any other crazy tool you can think up.

Is there a bug about replicating to labs?

I don't think so. We talked about it on IRC[1] but no one created a bug, AFAIK
[1] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-dev/20140809.txt

Deskana removed a subscriber: Deskana. Jan 21 2015, 3:32 PM
demon triaged this task as Lowest priority. Mar 12 2015, 10:08 PM
Restricted Application added a project: Discovery. Aug 16 2015, 9:24 PM
Restricted Application added a subscriber: Aklapper.

That should be created rather than allowing this to rot.

Change 232668 had a related patch set uploaded (by Ori.livneh):
WIP: Add mwgrep-web

https://gerrit.wikimedia.org/r/232668

IMHO not going to happen because of the performance considerations. I'd much rather see us complete labs replication where we'd be able to expose mwgrep + any other crazy tool you can think up.

Complete Labs replication of what? The search indices?

demon added a comment. Aug 20 2015, 2:43 PM

IMHO not going to happen because of the performance considerations. I'd much rather see us complete labs replication where we'd be able to expose mwgrep + any other crazy tool you can think up.

Complete Labs replication of what? The search indices?

Yep, that was always the idea I had. There's tons and tons of cool tools I can envision people building around this data.

ashley added a subscriber: ashley. Aug 20 2015, 5:26 PM

Complete Labs replication of what? The search indices?

Yep, that was always the idea I had. There's tons and tons of cool tools I can envision people building around this data.

Now filed as T109715: Replicate production elasticsearch indices to labs.

I think this task is framed incorrectly. My understanding is that mwgrep is a specific implementation of a solution, but this task should better describe what search-related functionality is desired via the wiki.

Off-hand, if the goal here is to have better on-wiki search, we should probably use the existing infrastructure (i.e., Special:Search with the "insource:" keyword).
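As a rough sketch of what going through the existing infrastructure looks like from a client, the snippet below builds a search API URL for a single wiki using the "insource:" keyword with a regex. The function name `build_insource_query` and the example host are hypothetical, and escaping forward slashes with a backslash is an assumption about the regex syntax. No request is sent.

```python
from urllib.parse import urlencode


def build_insource_query(host: str, pattern: str, limit: int = 10) -> str:
    """Return an API URL that runs an insource: regex search on one wiki."""
    # Assumption: a literal "/" inside the regex must be escaped as "\/".
    escaped = pattern.replace("/", r"\/")
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"insource:/{escaped}/",
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    print(build_insource_query("it.wikipedia.org", "importScript"))
```

This covers one wiki per request; a cross-wiki search would have to repeat it per wiki, which is where the cost concerns in this task come from.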

If the goal here is to have cross-wiki search, we should still probably go through Special:Search or perhaps a dedicated Special page, but the user interface will be a bit tricky.

Exposing mwgrep as it is, a command-line tool with a specific output format, seems kind of silly and shortsighted. We should look at what functionality mwgrep provides that Special:Search (and the accompanying API search module[s]...) do not currently provide and we should implement better search capabilities from there, in my opinion.

demon added a comment. Aug 21 2015, 1:55 AM

I think this task is framed incorrectly. My understanding is that mwgrep is a specific implementation of a solution, but this task should better describe what search-related functionality is desired via the wiki.
Off-hand, if the goal here is to have better on-wiki search, we should probably use the existing infrastructure (i.e., Special:Search with the "insource:" keyword).
If the goal here is to have cross-wiki search, we should still probably go through Special:Search or perhaps a dedicated Special page, but the user interface will be a bit tricky.

This.

Exposing mwgrep as it is, a command-line tool with a specific output format, seems kind of silly and shortsighted. We should look at what functionality mwgrep provides that Special:Search (and the accompanying API search module[s]...) do not currently provide and we should implement better search capabilities from there, in my opinion.

Also this. I'm proposing this task be declined as-is.

<legoktm> [...] the general request really is: allow people to do regex cross-wiki searches so they are not dependent upon shell users

This is a perfectly valid request.

But looking at tasks such as T46420: Restore interwiki (sister projects) results in search queries, it seems we may already have interwiki search support? I'm not sure I've ever used it, but a number of tasks seem to suggest it exists.

The other issue we're hitting here is T108149: "insource" search doesn't find matches in js/css pages, I think. A lot of the current requests for mwgrep runs seem to stem from the inability to search JS/CSS pages using the "insource:" keyword.

demon added a comment. Aug 21 2015, 2:32 PM

<legoktm> [...] the general request really is: allow people to do regex cross-wiki searches so they are not dependent upon shell users
This is a perfectly valid request.
But looking at tasks such as T46420: Restore interwiki (sister projects) results in search queries, it seems we may already have interwiki search support? I'm not sure I've ever used it, but a number of tasks seem to suggest it exists.

It does. The UX is pretty terrible which is why it hasn't been widely enabled. It's currently only enabled on Italian projects (because they're awesome beta testers :P)

The other issue we're hitting here is T108149: "insource" search doesn't find matches in js/css pages, I think. A lot of the current requests for mwgrep runs seem to stem from the inability to search JS/CSS pages using the "insource:" keyword.

That too ^

Pathoschild rescinded a token.
Pathoschild awarded a token.

Well, the summary says "mwgrep functionality", not mwgrep interface. It can be clarified, but the meaning seems clear enough. That said, a web interface for regex searches, aimed mainly at global interface editors and the like, could also be a service similar to Quarry, based on the mwgrep interface plus OAuth; that wouldn't seem that bad.

A special page initially restricted to sysops may be even easier to code, though. First patch wins, I'd say. ;)

Gilles added a subscriber: Gilles. Sep 29 2015, 8:21 PM
Deskana moved this task from Needs triage to Search on the Discovery board. Dec 23 2015, 5:22 AM

But looking at tasks such as T46420: Restore interwiki (sister projects) results in search queries, it seems we may already have interwiki search support? I'm not sure I've ever used it, but a number of tasks seem to suggest it exists.

It does. The UX is pretty terrible which is why it hasn't been widely enabled. It's currently only enabled on Italian projects (because they're awesome beta testers :P)

I just noticed https://gerrit.wikimedia.org/r/283107.

Why is mwgrep still needed? Can we enable interwiki search on Wikimedia wikis?

I think most of the insource: issues have been fixed.

Restricted Application added a project: Discovery-Search. Apr 14 2016, 6:10 AM

Mwgrep is the equivalent of issuing over 1000 queries at a time. Productionizing interwiki search requires rethinking how indexes are handled. If anything the linked patch makes mwgrep even more expensive and less desirable to expose via the web.

If anything the linked patch makes mwgrep even more expensive and less desirable to expose via the web.

Then it's entirely the wrong direction to move in.

EBernhardson added a comment. Edited Apr 14 2016, 4:15 PM

If anything the linked patch makes mwgrep even more expensive and less desirable to expose via the web.

Then it's entirely the wrong direction to move in.

The referenced patch has absolutely nothing to do with this ticket. It offers the functionality needed to the existing scope of users of mwgrep. Offering regexp or not through mwgrep is completely tangential to exposing mwgrep on-wiki. As I said above, the issue with exposing mwgrep on-wiki has nothing to do with how expensive the individual queries it issues are; it has to do with it issuing more than 1000 queries in parallel from a single invocation. Please try and understand the issues before declaring someone else's work entirely wrong.
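To make the fan-out concrete, here is a small sketch of the pattern being described: one invocation turns into one backend query per wiki. It is not mwgrep's actual code; `fan_out`, `search_one`, and the wiki names are hypothetical placeholders, and the per-wiki search is faked so nothing touches a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(wikis, search_one, max_parallel):
    """Run search_one(wiki) for every wiki and collect results by wiki.

    A single call issues len(wikis) backend queries, with up to
    max_parallel of them in flight at the same time.
    """
    wikis = list(wikis)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves input order, so zip pairs each wiki with its result.
        return dict(zip(wikis, pool.map(search_one, wikis)))


if __name__ == "__main__":
    # Hypothetical wiki names; a real run would cover every wiki in the farm.
    wikis = [f"wiki{i}" for i in range(1000)]
    results = fan_out(wikis, lambda wiki: [], max_parallel=50)
    print(len(results), "backend queries issued by one invocation")
```

Even if each individual query is cheap, a web form that anyone can submit multiplies every click by the number of wikis, which is the concern raised above.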

Krinkle removed a subscriber: Krinkle.

Change 232668 abandoned by Ori.livneh:
WIP: Add mwgrep-web

https://gerrit.wikimedia.org/r/232668

demon removed a subscriber: demon. Feb 7 2017, 5:53 AM
Krinkle updated the task description.
Krinkle removed a subscriber: wikibugs-l-list.
Framawiki added a subscriber: Framawiki.

Not exactly on-wiki, but this is being worked on in a tool: https://tools.wmflabs.org/global-search/