Replace by providing improved handling for external id formatter urls
Open, Medium, Public


It should be possible to specify more complex formatter URLs than is currently possible.

There is probably already a bug about this, but I'm filing it in case it got forgotten since we first discussed it. Probably one for T150179.

Event Timeline

Can you please add a link to the sourcecode of the tool so we can evaluate what kind of operations it does?

Sure, added to the summary above.

matej_suchanek renamed this task from Replace by providing improved handeling for external id formatter urls to Replace by providing improved handling for external id formatter urls. Mar 22 2017, 5:59 PM

As background, I'm seeing about 2000 "hits" per day on this service right now, with about a dozen properties linking through it to their databases.

I believe one way to do this would be to allow regular expressions to be attached to the formatter URL, and to have the external id URL conversion code understand them. For example, if there were a qualifier property that specified a "regex substitution", the ISNI problem (additional spaces within the id that must be removed for the formatter URL) would be handled by a value something like "s/\s+//g" (remove all spaces). Some of the others might need a "regex match" on the id that allows specifying a $1, $2, $3 grouping pattern, with the formatter URL then looking something like http://...../$1/$2/$3 (or that could possibly also be handled by a substitution, as in the ISNI case). The IMDb case is more difficult because it's essentially 4 different formatter URLs based on the first characters of the id, so it might need a "regex filter" that limits the scope of each formatter URL based on the id; Wikibase would then need to look through the filter regexes to find a matching formatter URL and use that.
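To make the three ideas concrete, here is a rough Python sketch of how a "regex substitution", a "regex match" and a "regex filter" could combine. This is not how Wikibase works today. The rule layout, the `format_url` helper, the id prefixes and the example.org URLs are all made up for illustration, and a real implementation would also need to URL-encode the values.

```python
import re
from typing import Optional

# Hypothetical rule sets: the keys ("filter", "substitute", "match", "url")
# and the example.org URLs are illustrative assumptions, not Wikibase config.
RULES = {
    # ISNI-like case: strip whitespace from the id before building the URL.
    "isni": [
        {"substitute": (r"\s+", ""), "url": "https://example.org/isni/$1"},
    ],
    # Grouping case: split the id into $1, $2, $3 with a regex match.
    "split": [
        {"match": r"(\d{2})(\d{2})(\d+)", "url": "https://example.org/$1/$2/$3"},
    ],
    # IMDb-like case: several formatter URLs, each scoped by a filter regex.
    "imdb": [
        {"filter": r"^tt", "url": "https://example.org/title/$1"},
        {"filter": r"^nm", "url": "https://example.org/name/$1"},
    ],
}


def format_url(rules: list, external_id: str) -> Optional[str]:
    """Return the URL from the first applicable rule, or None if none applies."""
    for rule in rules:
        # "regex filter": skip formatter URLs that are out of scope for this id.
        if "filter" in rule and not re.search(rule["filter"], external_id):
            continue

        value = external_id
        # "regex substitution": e.g. s/\s+//g removes all whitespace.
        if "substitute" in rule:
            pattern, replacement = rule["substitute"]
            value = re.sub(pattern, replacement, value)

        # "regex match": capture groups become $1, $2, $3, ...
        if "match" in rule:
            matched = re.fullmatch(rule["match"], value)
            if matched is None:
                continue
            groups = matched.groups()
        else:
            groups = (value,)

        # Fill every $n placeholder in a single pass over the URL template.
        return re.sub(r"\$(\d)", lambda p: groups[int(p.group(1)) - 1], rule["url"])
    return None


print(format_url(RULES["isni"], "0000 0001 2103 2683"))
print(format_url(RULES["split"], "123456"))
print(format_url(RULES["imdb"], "nm0000151"))
```

In this sketch the rules are tried in order and the first applicable one wins, which is what lets several filter-scoped formatter URLs live on one property. An id that no rule accepts yields no link at all rather than a broken one.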