
New Hook TitleUserCase for manipulation of names in MW.
Open, Low, Public

Description

Author: bacher

Description:
Patches the Title class, adds a new hook to manipulate Title dbkeys

For our project (http://bowiki.net) we need to be able to use our own naming conventions for MediaWiki (MW) pages.

We need to match pages by their names against names that are additionally stored in another piece of software, accessed via a special protocol, and we need to be sure that objects with the same pronunciation are identified as the same regardless of case.

We therefore need a standard method to deal with names on both sides (the external program AND MW): applying strtolower and then ucfirst to each page name.
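As a minimal sketch of the normalization described here, the two calls can be combined in a single helper. The function name normalizePageName is hypothetical and not part of MediaWiki; the sketch also assumes ASCII names, since strtolower and ucfirst work byte-wise.

```php
<?php
// Hypothetical helper, not part of MediaWiki: lower-case the whole name,
// then upper-case its first character.
function normalizePageName(string $name): string
{
    // strtolower() and ucfirst() work byte-wise, so only ASCII letters are
    // guaranteed to change case; non-ASCII names would need mb_* functions.
    return ucfirst(strtolower($name));
}

// Differently cased spellings collapse to the same key.
var_dump(normalizePageName('SomeIdentifier') === normalizePageName('someIDENTIFIER')); // bool(true)
echo normalizePageName('SomeIdentifier'), "\n"; // Someidentifier
```

Applying the same function in the external program and in the wiki would give both sides the same key for a given name.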

MW currently has no such functionality. That's what I learned from discussing this issue on the MW IRC channel (I also found nothing appropriate in the MW developer documentation or on the MW homepage itself).

We solved this by changing the $dbkey variable in the Title class of the MW API, at just the place where ucfirst would be applied (see patch). The provided patch adds a hook to handle this as well.

We would be really happy to see this functionality in a future MW version.

Thanks in advance

Isnogud@#mediawiki aka Joshua Bacher


Version: unspecified
Severity: enhancement

attachment Title(Hook)_TitleUserCase.patch ignored as obsolete

Details

Reference
bz13166

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:02 PM
bzimport added a project: MediaWiki-General.
bzimport set Reference to bz13166.
bzimport added a subscriber: Unknown Object (MLST).
bzimport created this task. Feb 26 2008, 2:46 PM

Modified version of patch applied in r31505

brion added a comment. Mar 3 2008, 7:55 PM

Reverted in r31519

bacher wrote:

Well. In response to your svn comment:

It would be sufficient for us to access only the $dbkey variable; the instance itself is of no interest to us.

I think it's a really nice feature that people might like to use to have their own control over the names in the wiki.

As we are using an ontology and a case-sensitive reasoner, we absolutely need some level of control over how names are treated, to be sure that the names match.

We use a syntax similar to that of Semantic MediaWiki to define relations between pages, but we want to be sure that if a page is referenced in a link with different casing, the wiki automatically knows which object the link refers to:

If we have a page called Someidentifier and a user writes something like SomeIdentifier in a relation

[[relation::SomeIdentifier]]

we want SomeIdentifier to show up as a known page. In combination with a redirect, a user following the link will then be sent to the Someidentifier page.

I think there is no other way to handle this situation.

So, what do you think?

thanks isnogud

bacher wrote:

A second patch

attachment Tpatch2.p ignored as obsolete

bacher wrote:

Hello again,

I was thinking about the problem again, as it didn't let me sleep. I stared at the code for a while, and here is what I found:

  1. I think that adding a hook at the beginning of the secureAndSplit function is a good idea. Here is why:

The code in secureAndSplit is quite important; it performs a number of checks. If we place the manipulation a little earlier, the developer also benefits from all the checks that are applied there to his dbkey.

A good place for the hook is, in my opinion, right after NS and dbkey are split. There we are able to manipulate both. I think this is at line 1840: the namespace and the dbkey are both set at that point and ready to be manipulated.

As far as I can see, secureAndSplit only works on NS and title, so a hook placed there should receive $this, $dbkey, and $ns at the suggested line. That causes a slight problem, because the namespace is not rechecked if a new NS is passed after this line, so we just have to set it again.

  2. I think that functionality that manipulates the dbkey conflicts with wgCapitals, since using the hook means we choose manual manipulation.

Anyone who decides to manipulate the title should automatically be excluded from the internal ucfirst, but is free to use it in their own function.

  3. A suggestion for a better name for the hook:

Whether we change the namespace or directly alter the dbkey, both actions manipulate the title's dbkey, so a good name might be AlternativeTitleDBkey: directly access and alter the dbkey and the namespace of the title object.

I created a patch according to these suggestions: it adds the following hook and a check for the wgCapitals manipulation.

if ( array_key_exists( 'AlternativeTitleDBkey', $wgHooks ) ) {
    // A handler is registered: let it alter $dbkey and $ns.
    if ( wfRunHooks( 'AlternativeTitleDBkey', array( $this, &$dbkey, &$ns ) ) ) {
        $this->mNamespace = $ns;
    } else {
        return false;
    }
}
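To illustrate what a handler for such a hook might do, here is a self-contained sketch that runs without MediaWiki. The dispatcher runDbKeyHandlers is a hypothetical stand-in for the hook call, not MediaWiki's actual API; it only assumes that handlers receive the dbkey and namespace by reference and return false to reject a title.

```php
<?php
// Hypothetical stand-in for the hook dispatch, so the sketch runs without
// MediaWiki. Each handler may rewrite $dbkey and $ns in place; a handler
// returning false rejects the title.
function runDbKeyHandlers(array $handlers, string &$dbkey, int &$ns): bool
{
    foreach ($handlers as $handler) {
        if ($handler($dbkey, $ns) === false) {
            return false;
        }
    }
    return true;
}

// Example handler: force "first letter upper, rest lower" on the key.
$handlers = [
    function (string &$dbkey, int &$ns): bool {
        $dbkey = ucfirst(strtolower($dbkey));
        return true;
    },
];

$dbkey = 'SomeIdentifier';
$ns = 0; // main namespace
$ok = runDbKeyHandlers($handlers, $dbkey, $ns);
var_dump($ok);     // bool(true)
echo $dbkey, "\n"; // Someidentifier
```

Because the key is rewritten before any later checks would run, the rewritten value would still pass through them, which is the benefit argued for in point 1.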

Just check the attached patch, Tpatch.p.

Thanks in advance

I would like to note that a rewrite of the Title class may be done.

It was partially discussed in wikitech-l, mostly between me and Simetrical.

The idea was primarily to introduce a ''real title'' in addition to the current page title which is basically a db key.
This would mean that you could create or move a page as [[_main_Page]], and while it would still be the same page as [[Main Page]] (going to either title would lead to the same page), the title would actually display as [[_main_Page]] and stay that way. In other words, underscores and various other characters that are normally strictly normalized would become valid for use without normalization or the need for {{DISPLAYTITLE:...}}.

The part relevant to this bug is that, in addition, the idea was to create an extensible normalization function. In other words, rather than this ugly hack (and yes, this is an ugly hack, worse than DISPLAYTITLE), an extension could easily extend or hook into this normalization function to provide exactly what you are trying to do, but in a robust and clean way.

Additionally, looking over the comments in the "[Wikitech-l] [MediaWiki-CVS] SVN: [31519] trunk/phase3" discussion on secureAndSplit, I have a feeling that I am likely going to section secureAndSplit off into more intuitive parts reflecting what it actually does (splitting off the interwiki prefix, disallowing illegal characters, normalizing case, ripping out dangerous or other Unicode characters that shouldn't be there, splitting off the namespace, pulling out dangerous ../ and ./ sequences, disallowing tilde (~) sequences, limiting title length, etc.). Rather than sticking it all inside a single function, I'm likely going to make it a sequence or list of actions to apply to the title, with data on how and where to apply each one stored in what will probably be an array or list of some sort. That way you can insert extra things to do at any point in the process, and also remove or replace parts of the sequence (such as the case normalization).
This of course, will completely void out that hook and break anything using it because there will be no sane place for it to exist.
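The "sequence of actions" idea might look roughly like the following sketch. It is only an illustration under assumptions: the step names and the applySteps function are hypothetical, the three steps are a tiny subset of what secureAndSplit does, and arrow functions require PHP 7.4 or later.

```php
<?php
// Hypothetical sketch of title normalization as an ordered list of named
// steps. None of these names exist in MediaWiki.
$steps = [
    'trim'       => fn(string $t): string => trim($t, " _"),
    'spaces'     => fn(string $t): string => str_replace(' ', '_', $t),
    'capitalize' => fn(string $t): string => ucfirst($t),
];

// Apply every step in array order, feeding each result into the next step.
function applySteps(array $steps, string $title): string
{
    foreach ($steps as $step) {
        $title = $step($title);
    }
    return $title;
}

echo applySteps($steps, ' main Page '), "\n"; // Main_Page

// An extension replaces the case step without touching the others;
// assigning to an existing key keeps its position in the sequence.
$steps['capitalize'] = fn(string $t): string => ucfirst(strtolower($t));
echo applySteps($steps, ' main Page '), "\n"; // Main_page

// Or removes case normalization entirely.
unset($steps['capitalize']);
echo applySteps($steps, ' main Page '), "\n"; // main_Page
```

Keying the steps by name is what makes replacing or removing a single part possible without a dedicated hook at a fixed line.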

bacher wrote:

The last patch was broken.

The former patch was broken:

'if (if (' <- looks nasty

attachment Tpatch2.p ignored as obsolete

bacher wrote:

> Additionally, looking over the comments inside the "[Wikitech-l]
> [MediaWiki-CVS] SVN: [31519] trunk/phase3" discussion on secureAndSplit I have
> a feeling that I am likely going to section off secureAndSplit into more
> intuitive parts of what it actually does (Splitting interwiki, disallowing
> illegal characters, normalizing case, rip out dangerous or other unicode
> characters which shouldn't be there, split namespace, pull out dangerous ../
> and ./ sequences and disallow tilde(~) sequences, limit title length, etc...)
> And rather than sticking it all inside of a single function, I'm likely going
> to do it in a sort of sequence or list of actions to apply to the title with
> data on how and where to apply it stored in what will probably be an array or
> list of some sort. That way you can insert extra things to do at any point in
> the process, and also remove or replace parts of the sequence (Such as the case
> normalization).

That's a really good idea. As far as I could tell from the code, secureAndSplit is indeed a function that could be split into separate functional parts, and the idea of a list of actions is a good one. What came to my mind was that you would introduce something like a global array and store the different checks or transformations there. But it will not be sufficient to apply that only to the title; you need it for the namespaces too, right? Maybe a global array is not such a good idea, since someone could go there and delete checks that are pretty important.

The tricky part here is how to deal with the priority of jobs. That sounds like a good fit for a priority queue (http://en.wikipedia.org/wiki/Priority_queue).
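As a rough sketch of the priority-queue idea, PHP's built-in SplPriorityQueue can order the steps. The steps and priority values below are made up for illustration and are not taken from MediaWiki; arrow functions require PHP 7.4 or later.

```php
<?php
// Hypothetical sketch: title steps ordered by priority with PHP's built-in
// SplPriorityQueue. Steps and priority values are made up for illustration.
$queue = new SplPriorityQueue();

// insert($value, $priority): entries with a higher priority are extracted first.
$queue->insert(fn(string $t): string => ucfirst($t), 10);
$queue->insert(fn(string $t): string => str_replace(' ', '_', $t), 20);
$queue->insert(fn(string $t): string => trim($t), 30);

$title = ' main Page ';
while (!$queue->isEmpty()) {
    $step = $queue->extract(); // removes and returns the highest-priority step
    $title = $step($title);
}
echo $title, "\n"; // Main_Page
```

Two caveats apply to this sketch: extracting empties the queue, so a real implementation would clone it or rebuild it for each title, and SplPriorityQueue does not define the order of entries with equal priority.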

> This of course, will completely void out that hook and break anything using it
> because there will be no sane place for it to exist.

That would be a really cool thing. I would be happy to lend a hand with that.

Maybe I'll start implementing a priority queue. What do you developers think?

You may also contact me on MediaWiki-General irc (isnogud).

Restricted Application added a subscriber: Aklapper. Mar 13 2016, 10:16 AM