
Add a list=recentchanges result property for title without namespace
Closed, DeclinedPublic

Description

Add a list=recentchanges result property for title without namespace prefix. See bug 57084#c17.

I'm not sure what we'd call it. strippedtitle? parsedtitle? unprefixedtitle? It might have been better if from the beginning we'd called the title with namespace prefix fulltitle or something.

I'm thinking unprefixedtitle might be the least ambiguous result property name.


Version: 1.24rc
Severity: enhancement

Details

Reference
bz69005

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 22 2014, 3:27 AM
bzimport set Reference to bz69005.
bzimport added a subscriber: Unknown Object (MLST).

Also, page move log events should have a new "unprefixedtarget" log parameter. We originally had:

4::target
5::noredir

Gerrit change 150969 adds:

6::nullrevid
7::redirpageid

So I guess this would be:

8::unprefixedtarget
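As a rough illustration of how a client might read parameters keyed this way, here is a minimal sketch. It assumes each key has the shape index:type:name with the type left empty, as in "4::target"; the helper names and the sample values are hypothetical and not part of MediaWiki.

```python
def parse_param_key(key):
    """Split a key such as '4::target' into (index, name).

    Assumption: keys look like 'index:type:name', with an empty type here.
    """
    index, _type, name = key.split(":", 2)
    return int(index), name


def named_params(params):
    """Re-key a log parameter mapping by parameter name only."""
    return {parse_param_key(key)[1]: value for key, value in params.items()}


# Hypothetical move-log parameters, for illustration only.
sample = {"4::target": "Talk:Foo", "5::noredir": "0"}
print(named_params(sample))
```

Under this scheme a new "8::unprefixedtarget" entry would simply show up as one more name in the resulting mapping.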

I'd really rather not clutter the API result with duplicates of the same information.

If a client needs an unprefixed title, it can easily enough remove everything up to the first colon for namespaces other than 0, or it can use meta=siteinfo&siprop=namespaces to fetch the actual prefix for each namespace and strip it explicitly.
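A minimal sketch of both approaches, assuming the client already has the "ns" and "title" values from a recentchanges result. The function names are hypothetical, and the prefix table is hand-written sample data standing in for what meta=siteinfo&siprop=namespaces would return.

```python
from typing import Dict


def unprefixed_title(title: str, ns: int) -> str:
    """Drop everything up to the first colon, except in namespace 0."""
    if ns == 0:
        # Main-namespace titles carry no prefix, even if they contain a colon.
        return title
    _prefix, sep, rest = title.partition(":")
    return rest if sep else title


def unprefixed_title_explicit(title: str, ns: int, prefixes: Dict[int, str]) -> str:
    """Strip the known prefix for this namespace, if the title starts with it."""
    prefix = prefixes.get(ns, "")
    if prefix and title.startswith(prefix + ":"):
        return title[len(prefix) + 1:]
    return title


# Sample prefix table (assumed data, not fetched from a wiki).
PREFIXES = {0: "", 1: "Talk", 2: "User"}

print(unprefixed_title("Talk:Foo:Bar", 1))                     # Foo:Bar
print(unprefixed_title("Foo:Bar", 0))                          # Foo:Bar
print(unprefixed_title_explicit("User:Example", 2, PREFIXES))  # Example
```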

As for the log event parameters, that's outside the scope of the API. The API just returns whatever parameters were logged with the event.

(In reply to Brad Jorsch from comment #2)

I understand your concern, but I think it is worthwhile to entertain this bug.

You're right that you can get namespaces from siteinfo and parse titles with them, but it's hardly easy and is certainly error-prone. I know because I end up doing it a lot, and I wrote a library to make the work easier. See https://pythonhosted.org/mediawiki-utilities/lib/title.html#mw-lib-title. For what it is worth, your "easy" solution wouldn't work. For example, "Foo:Bar" would be in the main namespace since "Foo" is not a valid namespace prefix. In other words, splitting on the colon doesn't work. Also, you can't just handle the regular namespace names. You also need to be able to handle canonical names and aliases. Sure, this information is available from siteinfo, but how many people know this? The people who have spent hours dealing with errors.
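To make the difficulty concrete, here is a minimal sketch of parsing a raw title string when the namespace number is not supplied alongside it. It is not the library linked above; the function names are hypothetical, the name table is hand-written sample data, and it assumes prefixes match case-insensitively with underscores treated as spaces.

```python
def build_prefix_index(names):
    """Map each known prefix (lowercased, underscores as spaces) to its namespace id."""
    return {name.replace("_", " ").lower(): ns_id for name, ns_id in names}


def parse_title(raw, prefix_index):
    """Return (namespace id, unprefixed title) for a raw title string."""
    prefix, sep, rest = raw.partition(":")
    key = prefix.replace("_", " ").strip().lower()
    if sep and key in prefix_index:
        return prefix_index[key], rest
    # No colon, or the text before it is not a known prefix: main namespace.
    return 0, raw


# Sample data: a local name, a canonical name, and an alias for namespace 4.
INDEX = build_prefix_index([
    ("Talk", 1),
    ("Wikipedia", 4),  # local name
    ("Project", 4),    # canonical name
    ("WP", 4),         # alias
])

print(parse_title("WP:Sandbox", INDEX))   # (4, 'Sandbox')
print(parse_title("Foo:Bar", INDEX))      # (0, 'Foo:Bar')
```

Even this short version needs three kinds of names in the table, which is the part that is easy to miss.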

Really, the information that is currently stored in params is bad. It's not just the pre-pended namespace prefix that is problematic. The fact that spaces are underscores in log_title, but spaces are spaces in log_params has probably cost a substantial amount of developer time handling errors.
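A minimal sketch of the normalization a client ends up writing before it can compare the two forms. The function names are hypothetical, and it handles only the underscore/space difference described here.

```python
def to_text_form(title):
    """Underscores become spaces, as in the titles stored in log_params."""
    return title.replace("_", " ")


def to_db_key(title):
    """Spaces become underscores, as in log_title."""
    return title.replace(" ", "_")


def same_title(a, b):
    """Compare two titles regardless of which form each one uses."""
    return to_text_form(a) == to_text_form(b)


print(same_title("Main_Page", "Main Page"))  # True
```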

The current proposal is to duplicate badly formatted information with nicely formatted information. The only other alternative is to break backwards compatibility by changing how params are stored to something that is more sane.

(In reply to Aaron Halfaker from comment #3)

For what it is worth, your "easy" solution wouldn't work. For example,
"Foo:Bar" would be in the main namespace since "Foo" is not a valid
namespace prefix.

If it's in the main namespace it'll have namespace 0, so it'll fail the "for namespaces other than 0" check.

You also need to be able to handle canonical names and aliases.

Where do API title result properties contain non-canonical names? Or are we getting into special cases of log entry parameters (which, again, is outside the scope of the API) or the like?

Really, the information that is currently stored in params is bad. It's not
just the pre-pended namespace prefix that is problematic. The fact that
spaces are underscores in log_title, but spaces are spaces in log_params has
probably cost a substantial amount of developer time handling errors.

Again, log entry parameter formatting is mainly outside the scope of the API; certain types are specially handled for hysterical raisins, but mostly the API generally doesn't even know that some parameter is a title. And the few that are specially handled are consistent with how titles are returned in the rest of the API.

The current proposal is to duplicate badly formatted information with nicely
formatted information.

I'd remove "badly" and "nicely" from that sentence: "The current proposal is to duplicate formatted information with [slightly-differently] formatted information."

You're right. This should be addressed in a bug about changing log_params.

But I'd still argue that having two different data formats for title is less desirable than having one data format and that the data formats that don't require parsing strings are more desirable than those that do.

Also, given that API results are designed primarily for machines rather than humans to read, and that text doesn't use a lot of bandwidth, does it matter much if there's clutter in the API result? I think bot programmer labor is the scarcer resource to take into consideration. It probably makes the bot code more complex and susceptible to bugginess when people have to include code to handle stripping the namespace prefixes, rather than using a result property that doesn't require so much additional processing. Shall we reopen the bug?

(In reply to Nathan Larson from comment #6)

does it matter much if there's clutter in the API result?

Yes.

Shall we reopen the bug?

No.

(In reply to Brad Jorsch from comment #7)

(In reply to Nathan Larson from comment #6)

does it matter much if there's clutter in the API result?

Yes.

Why does it matter, and how much does it matter? This is useful information not just for this bug but for MediaWiki API engineering in general.

Maybe it's "useful information", but rather high-level for this specific bug report - something like wikitech-l might be a better suited venue.

Would it be an acceptable compromise to add an "unprefixedtitle" option to rcprop for optionally displaying the unprefixed title, kinda like we have a "parsedcomment" parameter for optionally providing the "comment" data in a different form? Then those users who don't want what would be, for them, clutter, can simply omit "unprefixedtitle" from the rcprop part of their query.
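For reference, a minimal sketch of how the existing opt-in pattern looks from the client side: the caller lists the properties it wants in rcprop and nothing else is returned. The endpoint and the function name are examples only. "unprefixedtitle" is the value proposed here and is not an existing rcprop value, so it is left out of the request.

```python
from urllib.parse import urlencode

# Example endpoint; any MediaWiki api.php URL could be substituted.
API_URL = "https://www.mediawiki.org/w/api.php"


def recentchanges_url(props, limit=10):
    """Build a list=recentchanges query that requests only the given properties."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "|".join(props),
        "rclimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


# "parsedcomment" is opt-in today; the proposal would add "unprefixedtitle" the same way.
url = recentchanges_url(["title", "comment", "parsedcomment"])
print(url)
```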