
[Migrated] External to Interwiki
Open, High, Public

Description

I think AWB should have a feature that changes external links to sister projects into interwiki links, like changing [http://en.wikibooks.org/wiki/Main_Page Main Page] to [[b:Main Page|Main Page]]. @Wikihermit 00:45, 11 June 2007 (UTC)


Event Timeline

Reguyla raised the priority of this task from to Needs Triage.
Reguyla updated the task description.
Reguyla added a project: AutoWikiBrowser.
Reguyla moved this task to General fixes on the AutoWikiBrowser board.
Reguyla added subscribers: Reguyla, Aklapper.

@Reedy, 11:10, 12 June 2007 (UTC) wrote:

By MaxSem from Wikihermit's talk page:

  1. find \[http://en\.wikibooks\.org/wiki/(\S*) (.*)\], replace with [[b:$1|$2]]
  2. find \[http://en\.wikibooks\.org/wiki/(\S*)], replace with [[b:$1]].

@Reedy, 16:57, 12 June 2007 (UTC) wrote:

  1. find \[http://en\.wikisource\.org/wiki/(\S*) (.*)\], replace with [[s:$1|$2]]
  2. find \[http://en\.wikisource\.org/wiki/(\S*)], replace with [[s:$1]].
  3. find \[http://en\.wikiquote\.org/wiki/(\S*) (.*)\], replace with [[q:$1|$2]]
  4. find \[http://en\.wikiquote\.org/wiki/(\S*)], replace with [[q:$1]].
  5. find \[http://en\.wiktionary\.org/wiki/(\S*) (.*)\], replace with [[wiktionary:$1|$2]]
  6. find \[http://en\.wiktionary\.org/wiki/(\S*)], replace with [[wiktionary:$1]].
  7. find \[http://commons\.wikimedia\.org/wiki/(\S*) (.*)\], replace with [[commons:$1|$2]]
  8. find \[http://commons\.wikimedia\.org/wiki/(\S*)], replace with [[commons:$1]].
  9. find \[http://en\.wikinews\.org/wiki/(\S*) (.*)\], replace with [[n:$1|$2]]
  10. find \[http://en\.wikinews\.org/wiki/(\S*)], replace with [[n:$1]].
  11. find \[http://species\.wikimedia\.org/wiki/(\S*) (.*)\], replace with [[species:$1|$2]]
  12. find \[http://species\.wikimedia\.org/wiki/(\S*)], replace with [[species:$1]].
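
The pairs above differ only in host and prefix, so a single table-driven pass could cover them. A minimal Python sketch, assuming the prefixes quoted in the list; `SISTER_PREFIXES`, `LINK_RE`, and `external_to_interwiki` are illustrative names and not AWB code, and the table holds only a subset of the hosts.

```python
import re

# Host -> interwiki prefix (a subset of the hosts listed in this thread).
SISTER_PREFIXES = {
    "en.wikibooks.org": "b",
    "en.wikisource.org": "s",
    "en.wikiquote.org": "q",
    "en.wiktionary.org": "wiktionary",
    "commons.wikimedia.org": "commons",
    "en.wikinews.org": "n",
}

# Matches [http://host/wiki/Title] with an optional label after the title.
LINK_RE = re.compile(r"\[http://([a-z.]+)/wiki/([^\s\]]+)(?: +([^\]\n]+))?\]")


def external_to_interwiki(text):
    """Rewrite bracketed sister-project URLs as interwiki links."""
    def repl(m):
        prefix = SISTER_PREFIXES.get(m.group(1))
        if prefix is None:
            return m.group(0)  # unknown host: leave the link untouched
        if m.group(3):
            return "[[%s:%s|%s]]" % (prefix, m.group(2), m.group(3))
        return "[[%s:%s]]" % (prefix, m.group(2))
    return LINK_RE.sub(repl, text)
```

Keeping the hosts in a dictionary means other language editions only need a different table, not a new set of regexes.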

Implementation...?

@JLaTondre, 00:44, 13 June 2007 (UTC) wrote:
Another common pattern is ''word [http://en.--whateversite--.org/wiki/word]'' which should be replaced by ''[[whatever:word]]''.
I would be wary of implementing the ''[http://en.--whateversite--.org/wiki/word]'' versions on their own. I have seen quite a few cases where that is used as footnotes. That may not be the correct usage, but converting it to an interwiki link would be worse as it would result in an unintelligible sentence.
Example: ''Alfred Tennyson's works [http://en.wikisource.org/wiki/Author:Alfred_Tennyson] are'' should not become ''Alfred Tennyson's works [[s:Author:Alfred_Tennyson]] are''.
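
One way to respect both points is to convert only the ''word [url]'' form where the word repeats the page title, and to leave every other bare link alone. A rough Python sketch under that assumption; `WORD_LINK_RE`, `PREFIXES`, and `merge_word_links` are hypothetical names, not anything in AWB.

```python
import re

# "word [http://en.SITE.org/wiki/word]": the backreference \1 requires the
# page title to repeat the word before the link. Other bare links are skipped.
WORD_LINK_RE = re.compile(r"\b(\w+) \[http://en\.(\w+)\.org/wiki/\1\]")

# Illustrative family -> prefix table.
PREFIXES = {
    "wikibooks": "b",
    "wikisource": "s",
    "wikiquote": "q",
    "wiktionary": "wiktionary",
    "wikinews": "n",
}


def merge_word_links(text):
    """Fold 'word [url-to-word]' into a single interwiki link."""
    def repl(m):
        prefix = PREFIXES.get(m.group(2))
        if prefix is None:
            return m.group(0)  # unknown family: leave untouched
        return "[[%s:%s]]" % (prefix, m.group(1))
    return WORD_LINK_RE.sub(repl, text)
```

With this rule the Tennyson footnote-style link is not touched, because the word before the link ("works") does not match the linked title.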

@OsamaK, 15:28, 17 June 2007 (UTC) wrote:
This code is for en.wiki only! We use AWB on other wikis!

@Reedy, 17 June 2007 (UTC) wrote:
We know. It hasn't been implemented as of yet (it may not ever be), so it doesn't really matter atm.

@Dispenser, 04:13, 9 June 2008 (UTC) wrote:
I needed code for [[tools:~dispenser/view/Main_Page|my tool]] since people didn't know which form to enter in. It has since become convenient to just paste the URL in and watch the magic happen. I hope the AWB devs implement this for the list maker parts of the interface.

<source lang="javascript">
function fixTitle(e) {
    // Convert from the escaped UTF-8 byte code into Unicode
    var s = unescape(decodeURI(e.value));
    // Convert secure URLs into non-secure equivalents (note the secure system is considered a 'hack')
    s = s.replace(/\w+:\/\/secure\.wikimedia\.org\/(\w+)\/(\w+)\//, 'http://$2.$1.org/');
    // Convert http://lang.domain.org/wiki/ into interwiki format
    s = s.replace(/http:\/\/(\w+)\.(\w+)\.org\/wiki\/([^#{|}\[\]]*).*/i, '$2:$1:$3');
    // Scripts paths (/w/index.php?...) into interwiki format
    s = s.replace(/http:\/\/(\w+)\.(\w+)\.org\/.*?title=([^#&{|}\[\]]*).*/i, '$2:$1:$3');
    // Remove [[brackets]] from link
    s = s.replace(/[^\n]*?\[\[([^[\]{|}]+)[^\n]*/g, '$1');
    // '_' -> ' ' and hard coded home wiki
    s = s.replace(/_/g, ' ').replace(/^ *(w:|wikipedia:|)(en:|([a-z\-]+:)) */i, '$3');
    // Use short prefix form (wiktionary:en:Wiktionary:Main Page -> wikt:en:Wiktionary:Main Page)
    s = s.replace(/^ *(?:wikimedia:(m)eta|wikimedia:(commons)|(wikt)ionary|wiki(?:(n)ews|(b)ooks|(q)uote|(s)ource|(v)ersity))(:[a-z\-]+:)/i, '$1$2$3$4$5$6$7$8$9');
    // Put back in
    e.value = s;
}
</source>

A general implementation (suitable for general fixes) for foundation links from the code above:

  1. Find \[http://(\w+)\.(\w+)\.org/wiki/([^{|}\[\]<>"\s]+) +([^]]+)\] replace with [[$2:$1:$3|$4]]
  2. Find \[\[(?:wikimedia:(m)eta|wikimedia:(commons)|(wikt)ionary|wiki(?:(n)ews|(b)ooks|(q)uote|(s)ource|(v)ersity))(:[a-z\-]+:[^{}\[\]]+)\]\] replace with [[$1$2$3$4$5$6$7$8$9]]

It avoids the flaws from above and works across all languages.
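
For anyone wanting to try the two steps outside AWB, here is a minimal Python sketch of the same idea. It assumes the title part may not contain whitespace, so that a multi-word label splits at the first space; `STEP1`, `STEP2`, and `to_interwiki` are illustrative names only.

```python
import re

# Step 1: labeled external link -> long-form interwiki link (family:lang:Title|label).
# Assumption: the title class excludes whitespace, so the label starts at the first space.
STEP1 = re.compile(r'\[http://(\w+)\.(\w+)\.org/wiki/([^{|}\[\]<>"\s]+) +([^\]\n]+)\]')

# Step 2: shorten the family name to its short prefix (wikibooks -> b, wiktionary -> wikt).
STEP2 = re.compile(
    r'\[\[(?:wikimedia:(m)eta|wikimedia:(commons)|(wikt)ionary'
    r'|wiki(?:(n)ews|(b)ooks|(q)uote|(s)ource|(v)ersity))'
    r'(:[a-z\-]+:[^{}\[\]]+)\]\]',
    re.IGNORECASE,
)


def to_interwiki(text):
    """Apply step 1, then step 2, to a piece of wikitext."""
    text = STEP1.sub(r'[[\2:\1:\3|\4]]', text)
    # Only one of the prefix groups can match, so joining the non-empty
    # groups yields "<short prefix>:lang:Title|label".
    return STEP2.sub(
        lambda m: '[[' + ''.join(g for g in m.groups() if g) + ']]', text
    )
```

Splitting the work this way keeps step 1 language-agnostic and confines the list of known families to step 2.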

@Reedy, 12:57, 9 June 2008 (UTC) wrote:
Cool, thanks!

@Rocket000, 00:17, 20 June 2008 (UTC) wrote:
I've been using my own regexes for this (though, not as good as the combo above) and would love to see this implemented.

@Reedy, 17:44, 23 June 2008 (UTC) wrote:
Would be good to implement this. Not sure why the first one is needed in the ListMaker? If you can elaborate or be a bit more specific, Dispenser, I shall get this implemented.

@Reedy, 22:20, 3 July 2008 (UTC) wrote:
Partially implemented. {{awbsvn|3036}} (code exists, but not in use. As per the discussion page, it doesn't seem to actually work as a general fix or in list maker.)

@Dispenser, 01:33, 13 October 2008 (UTC) and Updated: 01:56, 23 November 2008 (UTC) wrote:
While the code I provided above is good for what I had coded it for (a user page input routine), it wasn't good enough for a general fix (potential language issues). Thus I've coded the following, which should be nearly problem free:

<source lang="python">
import re

# `text` is assumed to already hold the page's wikitext.
familiesIWlist = {
        'wikipedia':    'w',
        'wiktionary':   'wikt',
        'wikinews':     'n',
        'wikibooks':    'b',
        'wikiquote':    'q',
        'wikisource':   's',
        'wikiversity':  'v',
}
for m in re.finditer(r'\[http://([a-z0-9\-]+)\.(\w+)\.org/wiki/([^{|}\[\]<>"\s?]+) +([^]\n]+)\]', text):
    if m.group(1) == 'commons':
        iwPrefix = 'commons'
    elif m.group(1) == 'meta':
        iwPrefix = 'm'
    elif m.group(1) in familiesIWlist:
        # don't allow http://sources.wikipedia.org
        continue
    elif m.group(2) in familiesIWlist:
        iwPrefix = '%s:%s' % (familiesIWlist[m.group(2)], m.group(1))
    else:
        # TODO: Prevent iw linking on [[Wikipedia]] where it's used as references
        continue
    text = text.replace(m.group(0), '[[%s:%s|%s]]' % (iwPrefix, m.group(3), m.group(4)))
</source>

@Dispenser, 01:56, 23 November 2008 (UTC) wrote:
I've implemented the above code in my [[tools:~dispenser/view/Pywikipedia|commonfixes.py library]]. Over time I've noticed a few problems:

  1. Due to the haphazard way naming was done, the above code will try to turn download.wikipedia.org and sources.wikipedia.org into invalid interwikis. This can be solved using the interwiki table from https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=interwikimap&sifilteriw=local, which would also include more links than we can cover in regexes.
  2. The edit http://en.wikipedia.org/w/index.php?title=Wikipedia&diff=248964771&oldid=248921495 on the "Wikipedia" article raises the question of whether those links can be linked. I ''think'' I read somewhere that they aren't supposed to be linked directly, but that still doesn't answer the question about interwiki links.
  3. [ { | } ] and more are not valid title characters, nor are their escaped counterparts like %7B. However, this is highly unlikely to matter, since people typically don't link to invalid pages anyway.
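
To make the first point concrete, here is a hedged Python sketch of a lookup driven by the interwiki table instead of regexes. The sample entries only imitate the shape of the `interwikimap` list in the API response (`prefix` and `url` fields); they are illustrative data, and `build_lookup` and `url_to_interwiki` are hypothetical helper names.

```python
import re

# Illustrative excerpt shaped like the API's query.interwikimap entries.
# A real run would load the full JSON response instead of this sample.
SAMPLE_INTERWIKIMAP = [
    {"prefix": "b", "url": "https://en.wikibooks.org/wiki/$1"},
    {"prefix": "commons", "url": "https://commons.wikimedia.org/wiki/$1"},
    {"prefix": "wikt", "url": "https://en.wiktionary.org/wiki/$1"},
]

# Strips "http://", "https://", or a protocol-relative "//".
SCHEME_RE = re.compile(r"^(?:https?:)?//")


def build_lookup(interwikimap):
    """Map each URL base (scheme stripped, text before $1) to its prefix."""
    lookup = {}
    for entry in interwikimap:
        base = SCHEME_RE.sub("", entry["url"].split("$1")[0])
        lookup.setdefault(base, entry["prefix"])  # keep the first prefix seen
    return lookup


def url_to_interwiki(url, lookup):
    """Return 'prefix:Title' if the URL is covered by the table, else None."""
    bare = SCHEME_RE.sub("", url)
    for base, prefix in lookup.items():
        if bare.startswith(base):
            return "%s:%s" % (prefix, bare[len(base):])
    return None
```

Hosts that have no entry in the table, such as download.wikipedia.org in this sample, simply return `None` and stay as external links.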

For those of you still interested in linking, I've come up with a set of regexes. The whatever.wikipedia.org glitch is avoided by only allowing language codes of two to three characters.

**Regex:**      \[http://([a-z0-9\-]{2,3})\.(?:(wikt)ionary|wiki(n)ews|wiki(b)ooks|wiki(q)uote|wiki(s)ource|wiki(v)ersity)\.(?:com|net|org)/wiki/([^][{|}\s"]*) +([^\n\]]+)\]
**Replace:**    [[$2$3$4$5$6$7:$1:$8|$9]]
**Regex:**      \[http://(?:(m)eta|(commons)|(incubator)|(quality))\.wikimedia\.(?:com|net|org)/wiki/([^][{|}\s"]*) +([^\n\]]+)\]
**Replace:**    [[$1$2$3$4:$5|$6]]
**Regex:**      \[http://([a-z0-9\-]+)\.wikia\.(?:com|net|org)/wiki/([^][{|}\s"]+) +([^\n\]]+)\]
**Replace:**    [[wikia:$1:$2|$3]]

However, you should likely add <code>(?<![*#:;]{2})</code> to the beginning to avoid changing lists and <code>(?![^<>]*</ref>)</code> to the end to avoid changing links in references.
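
As a quick check of how the two guards behave, here is a Python sketch of the first regex with both assertions attached. It takes the language code as two to three characters, as described above; `GUARDED` and `convert` are illustrative names, and Python's `re` accepts the lookbehind because it has a fixed width of two characters.

```python
import re

GUARDED = re.compile(
    r'(?<![*#:;]{2})'  # skip links directly preceded by two list-marker characters
    r'\[http://([a-z0-9\-]{2,3})\.'
    r'(?:(wikt)ionary|wiki(n)ews|wiki(b)ooks|wiki(q)uote|wiki(s)ource|wiki(v)ersity)'
    r'\.(?:com|net|org)/wiki/([^\]\[{|}\s"]*) +([^\n\]]+)\]'
    r'(?![^<>]*</ref>)'  # skip links followed by </ref> with no tag in between
)


def convert(text):
    """Rewrite unguarded sister-project links as [[prefix:lang:Title|label]]."""
    def repl(m):
        # Groups 2-7 are the short prefixes; exactly one of them matched.
        family = ''.join(g for g in m.groups()[1:7] if g)
        return '[[%s:%s:%s|%s]]' % (family, m.group(1), m.group(8), m.group(9))
    return GUARDED.sub(repl, text)
```

Note that the lookbehind only fires when two marker characters sit immediately before the bracket, so a list item written with a space after a single marker would still be converted.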

@Dispenser, 08:07, 29 December 2008 (UTC) wrote:
According to https://en.wikipedia.org/wiki/WP:WAWI, AWB would have to tell a convenience link to material, such as a Wikisource page, from a reference link, such as a Wikisource policy page. Luckily that's why we created namespaces.

@Reedy, 13:52, 29 December 2008 (UTC) wrote:
Soo.. Are we alright to implement this or something? ;P

@Magioladitis, 17:08, 19 September 2009 (UTC) wrote:
Yes. Haven't you already?

@Magioladitis, 19:12, 22 February 2010 (UTC) wrote:
http://en.wikipedia.org/w/index.php?title=List_of_Italian_Americans&diff=prev&oldid=345188320

Magioladitis raised the priority of this task from Medium to High. Jul 7 2015, 2:14 PM