Expose mwgrep functionality on-wiki
Open, Lowest, Public

Assigned To
None
Authored By
Legoktm
Aug 13 2014, 7:22 PM
Tokens
"Yellow Medal" token, awarded by Tgr. "Like" token, awarded by Esanders. "Love" token, awarded by Framawiki. "Orange Medal" token, awarded by Krinkle. "Like" token, awarded by Pathoschild. "Like" token, awarded by Ricordisamoa.

Description

mwgrep is a fantastic tool, but it's only available to shell users. There have been some discussions on IRC about making it available to other users.

I'm wondering if we can just turn this into an on-wiki special page that allows cross-wiki regex searches. We can already use regex on individual wikis via the API and the search page (T45652); how difficult would it be to make that cross-wiki?
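
For illustration only, here is a minimal sketch of what the single-wiki regex search mentioned above looks like through the Action API, and what "making it cross-wiki" would naively mean: one such request per wiki. The helper names `build_insource_regex_url` and `build_cross_wiki_urls` are hypothetical, the host list is just an example, and the sketch only builds URLs; it sends no requests and says nothing about how mwgrep itself is implemented.

```python
from urllib.parse import urlencode


def build_insource_regex_url(wiki_host, pattern, namespace=8, limit=20):
    """Build an Action API URL for an insource: regex search on one wiki.

    Hypothetical helper. `pattern` is assumed to contain no unescaped "/",
    since the regex is delimited by slashes. Namespace 8 is the MediaWiki
    namespace.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"insource:/{pattern}/",
        "srnamespace": namespace,
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{wiki_host}/w/api.php?{urlencode(params)}"


def build_cross_wiki_urls(wiki_hosts, pattern, **kwargs):
    """Naive cross-wiki version: one search request per wiki."""
    return {
        host: build_insource_regex_url(host, pattern, **kwargs)
        for host in wiki_hosts
    }


if __name__ == "__main__":
    # Example hosts only; a real run would cover every wiki.
    hosts = ["en.wikipedia.org", "it.wikipedia.org"]
    for host, url in build_cross_wiki_urls(hosts, "importScript").items():
        print(host, url)
```

The number of requests grows linearly with the number of wikis, so any cross-wiki version of this has to account for the total load, not just the cost of one query.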

Details

Reference
bz69489

Related Objects

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 3:44 AM
bzimport added a project: CirrusSearch.
bzimport set Reference to bz69489.
bzimport added a subscriber: Unknown Object (MLST).

We could try it. We'd have to make sure the pool counter is pretty tight for it, as we do for single-wiki regexps, and we might have to fiddle with timeouts. Otherwise we should be able to do it.
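
As a rough in-process analogy for the "tight pool counter plus timeouts" idea (not the actual PoolCounter service or its configuration), the sketch below caps how many expensive searches may run at once and rejects callers that cannot get a slot quickly. The class name `RegexSearchPool` and its parameters are hypothetical.

```python
import threading


class RegexSearchPool:
    """Cap concurrent expensive searches; reject when no slot frees up in time."""

    def __init__(self, max_concurrent, acquire_timeout):
        # One semaphore slot per search allowed to run at the same time.
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._acquire_timeout = acquire_timeout

    def run(self, query_fn):
        # Wait at most acquire_timeout seconds for a free slot.
        if not self._slots.acquire(timeout=self._acquire_timeout):
            raise TimeoutError("search pool is full; try again later")
        try:
            return query_fn()
        finally:
            # Always hand the slot back, even if the query raised.
            self._slots.release()


if __name__ == "__main__":
    pool = RegexSearchPool(max_concurrent=2, acquire_timeout=0.5)
    print(pool.run(lambda: "search finished"))
```

A smaller `max_concurrent` protects the cluster at the cost of rejecting more users, which is the trade-off a cross-wiki regex search would make sharper.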

IMHO not going to happen because of the performance considerations. I'd much rather see us complete labs replication where we'd be able to expose mwgrep + any other crazy tool you can think up.

Is there a bug about replicating to labs?

I don't think so. We talked about it on IRC[1] but no one created a bug, AFAIK
[1] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-dev/20140809.txt

demon triaged this task as Lowest priority. Mar 12 2015, 10:08 PM
Restricted Application added a subscriber: Aklapper.

That should be created rather than allowing this to rot.

Change 232668 had a related patch set uploaded (by Ori.livneh):
WIP: Add mwgrep-web

https://gerrit.wikimedia.org/r/232668

IMHO not going to happen because of the performance considerations. I'd much rather see us complete labs replication where we'd be able to expose mwgrep + any other crazy tool you can think up.

Complete Labs replication of what? The search indices?

IMHO not going to happen because of the performance considerations. I'd much rather see us complete labs replication where we'd be able to expose mwgrep + any other crazy tool you can think up.

Complete Labs replication of what? The search indices?

Yep, that was always the idea I had. There's tons and tons of cool tools I can envision people building around this data.

Complete Labs replication of what? The search indices?

Yep, that was always the idea I had. There's tons and tons of cool tools I can envision people building around this data.

Now filed as T109715: Replicate production elasticsearch indices to labs.

I think this task is framed incorrectly. My understanding is that mwgrep is a specific implementation of a solution, but this task should better describe what search-related functionality is desired via the wiki.

Off-hand, if the goal here is to have better on-wiki search, we should probably use the existing infrastructure (i.e., Special:Search with the "insource:" keyword).

If the goal here is to have cross-wiki search, we should still probably go through Special:Search or perhaps a dedicated Special page, but the user interface will be a bit tricky.

Exposing mwgrep as it is, a command-line tool with a specific output format, seems kind of silly and shortsighted. We should look at what functionality mwgrep provides that Special:Search (and the accompanying API search module[s]...) do not currently provide and we should implement better search capabilities from there, in my opinion.

I think this task is framed incorrectly. My understanding is that mwgrep is a specific implementation of a solution, but this task should better describe what search-related functionality is desired via the wiki.

Off-hand, if the goal here is to have better on-wiki search, we should probably use the existing infrastructure (i.e., Special:Search with the "insource:" keyword).

If the goal here is to have cross-wiki search, we should still probably go through Special:Search or perhaps a dedicated Special page, but the user interface will be a bit tricky.

This.

Exposing mwgrep as it is, a command-line tool with a specific output format, seems kind of silly and shortsighted. We should look at what functionality mwgrep provides that Special:Search (and the accompanying API search module[s]...) do not currently provide and we should implement better search capabilities from there, in my opinion.

Also this. I'm proposing this task be declined as-is.

<legoktm> [...] the general request really is: allow people to do regex cross-wiki searches so they are not dependent upon shell users

This is a perfectly valid request.

But looking at tasks such as T46420: Restore interwiki (sister projects) results in search queries, it seems we may already have interwiki search support? I'm not sure I've ever used it, but a number of tasks seem to suggest it exists.

The other issue we're hitting here is T108149: "insource" search doesn't find matches in js/css pages, I think. A lot of the current requests for mwgrep runs seem to stem from the inability to search JS/CSS pages using the "insource:" keyword.

<legoktm> [...] the general request really is: allow people to do regex cross-wiki searches so they are not dependent upon shell users

This is a perfectly valid request.

But looking at tasks such as T46420: Restore interwiki (sister projects) results in search queries, it seems we may already have interwiki search support? I'm not sure I've ever used it, but a number of tasks seem to suggest it exists.

It does. The UX is pretty terrible which is why it hasn't been widely enabled. It's currently only enabled on Italian projects (because they're awesome beta testers :P)

The other issue we're hitting here is T108149: "insource" search doesn't find matches in js/css pages, I think. A lot of the current requests for mwgrep runs seem to stem from the inability to search JS/CSS pages using the "insource:" keyword.

That too ^

Well, the summary says "mwgrep functionality", not "mwgrep interface". It can be clarified, but the meaning seems clear enough. That said, a web interface for regex searches, aimed mainly at global interface editors and the like, could also be a service similar to Quarry, based on the mwgrep interface plus OAuth; that wouldn't seem so bad.

A special page initially restricted to sysops may be even easier to code, though. First patch wins, I'd say. ;)

But looking at tasks such as T46420: Restore interwiki (sister projects) results in search queries, it seems we may already have interwiki search support? I'm not sure I've ever used it, but a number of tasks seem to suggest it exists.

It does. The UX is pretty terrible which is why it hasn't been widely enabled. It's currently only enabled on Italian projects (because they're awesome beta testers :P)

I just noticed https://gerrit.wikimedia.org/r/283107.

Why is mwgrep still needed? Can we enable interwiki search on Wikimedia wikis?

I think most of the insource: issues have been fixed.

Mwgrep is the equivalent of issuing over 1000 queries at a time. Productionizing interwiki search requires rethinking how indexes are handled. If anything the linked patch makes mwgrep even more expensive and less desirable to expose via the web.

If anything the linked patch makes mwgrep even more expensive and less desirable to expose via the web.

Then it's entirely the wrong direction to move in.

If anything the linked patch makes mwgrep even more expensive and less desirable to expose via the web.

Then it's entirely the wrong direction to move in.

The referenced patch has absolutely nothing to do with this ticket. It offers the functionality needed by the existing users of mwgrep. Whether or not mwgrep offers regexp is completely tangential to exposing mwgrep on-wiki. As I said above, the issue with exposing mwgrep on-wiki has nothing to do with how expensive the individual queries it issues are; it has to do with it issuing more than 1000 queries in parallel from a single invocation. Please try to understand the issues before declaring someone else's work entirely wrong.

Change 232668 abandoned by Ori.livneh:
WIP: Add mwgrep-web

https://gerrit.wikimedia.org/r/232668

not exactly on-wiki, but this is being worked on in a tool: https://tools.wmflabs.org/global-search/

Tgr added a subscriber: Tgr.

not exactly on-wiki, but this is being worked on in a tool: https://tools.wmflabs.org/global-search/

That seems like an adequate replacement. Any reason to keep the task open?

@Legoktm: Do you agree with the last comment?