[IABot] URLs with periods after them are recorded with the periods in the externallinks_global table
Closed, Resolved · Public

Description

If a URL appears on Wikipedia with a period after it, for example...

<ref>http://www.statistics.sk/mosmis/eng/run.html.</ref>

... it is recorded in the externallinks_global table with the period included. MediaWiki ignores periods at the end of URLs, so IABot should as well.

Event Timeline

This is easy enough to do, but it would be nice to know which trailing characters are ignored in a URL that is not contained in brackets. I've been trying to research that, but haven't had any luck.

It looks like these are the characters that are excluded from a URL if they appear at the end of it:

.,:;?!)”<>[]\
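
For illustration, stripping those characters could be sketched roughly as below. This is only a sketch in Python, not IABot's actual code: the constant `TRAILING_CHARS` and the function `strip_trailing_punctuation` are hypothetical names, the character set is copied as-is from the list above, and special cases (for instance a closing parenthesis that is really part of the URL) are not handled.

```python
# Characters taken verbatim from the list in this thread; whether this
# matches MediaWiki's behaviour exactly is an assumption.
TRAILING_CHARS = '.,:;?!)”<>[]\\'


def strip_trailing_punctuation(url: str) -> str:
    """Return the URL with any run of trailing listed characters removed."""
    # str.rstrip with an argument removes every trailing character that
    # appears in the given set, so "page.html.," becomes "page.html".
    return url.rstrip(TRAILING_CHARS)


if __name__ == "__main__":
    # The example from the task description, with its trailing period.
    print(strip_trailing_punctuation("http://www.statistics.sk/mosmis/eng/run.html."))
```

Applied before a URL is written to externallinks_global, something like this would make variants that differ only in trailing punctuation collapse into a single entry.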

Do you have a page I can test this on?

Here's a page you can test against a fresh database: https://test.wikipedia.org/wiki/Links

All the links at the top should get recorded as a single entry: https://test.wikipedia.org/wiki/Main_Page

All the links at the bottom should get their own separate entries.