Deploy InternetArchiveBot on the Dutch Wikipedia (nlwiki)
Closed, ResolvedPublic

Description

The Dutch Wikipedia is the fifth largest Wikipedia, with just under 2 million articles. This should be the fifth wiki to be deployed to.

Event Timeline

The translations of the discussion are a bit rough, but it seems we have consensus here.

Seems like consensus to me (Dutch speaker) as well.

In that case, I need to know which page to go to in order to get my bot approved.

https://nl.wikipedia.org/wiki/Wikipedia:Aanmelding_botgebruikers, instructions in English are also there.

@Sjoerddebruin Do you have any templates for archive services? For example, when a URL in a reference is dead, enwiki uses Template:Webarchive to associate the dead URL with a snapshot taken while it was still alive.

I can't find any.

I think only our cite web template has parameters for that. No separate template that I'm aware of.

Okay, thanks. This is all needed to configure IABot to run on the wiki. If you can, could you list all of the Cite templates in your native language that have url and archiveurl parameters? I need to add them to IABot's list of recognized international cite templates.

There's:

Then there are templates that should have an archiveurl, but don't yet:

Not sure if I'm overlooking some.

(and the redirects linking there, of course)

Awaiting approval or another trial.

@Effeietsanders The bot approval doesn't seem to be moving along. It's the final hurdle to fully deploying the bot.

@Cyberpower678 as I understand it, they need someone to adapt the template so that the original link is no longer displayed once it is marked as dead. How would you normally do that: through an additional parameter 'deadurl' set to 'yes'? (That is what I understand from your talk page.)

Correct. I could probably adapt them, but I doubt I have the permissions needed to do so. They're probably protected from editing.

Changed!

In the template Citeer web on Dutch Wikipedia, if dode-url = ja or dead-url = yes, the archiefurl/archiveurl is now used as the link instead of the url.
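The rule described above can be modelled roughly as follows. This is only a Python sketch of the selection logic, not the actual wikitext of Citeer web; the function name `displayed_url`, the case-insensitive comparison, and the fallback to `url` when no archive is given are assumptions made for illustration.

```python
def displayed_url(params: dict[str, str]) -> str:
    """Return the URL a citation should link to, given its template parameters."""
    # The citation counts as dead if either the Dutch or the English flag is set.
    dead = (
        params.get("dode-url", "").strip().lower() == "ja"
        or params.get("dead-url", "").strip().lower() == "yes"
    )
    # Accept either spelling of the archive parameter.
    archive = params.get("archiefurl") or params.get("archiveurl")
    if dead and archive:
        return archive
    # Assumption: without an archive, or when not flagged dead, keep the original.
    return params.get("url", "")
```

For instance, a citation with `dode-url = ja` and an `archiefurl` would link to the archive, while the same citation without the flag would keep linking to `url`.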

Installed the parameters in the config file.

Order is incorrect. Plural for "bronnen" is also incorrect. See https://nl.wikipedia.org/w/index.php?title=...Something_to_Be_%28single%29&type=revision&diff=49365878&oldid=37572390

"Redden 1 bronnen en labelen 0 als dood. InternetArchiveBot (v1.4beta4)"

Plural for "bronnen" and order:
-> 1 bron redden en 0 labelen als dood. InternetArchiveBot (v1.4beta4)
-> 2 bronnen redden en 0 labelen als dood. InternetArchiveBot (v1.4beta4)

NOTICE: "bron" instead of "bronnen" (only when amount = 1)
NOTICE: amount first, then "bronnen" or "labelen"
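The requested wording rule can be sketched in a few lines of Python. This only illustrates the rule being asked for (amount first, "bron" for exactly one source); it is not how IABot builds its summaries, and the function name `edit_summary` is hypothetical.

```python
def edit_summary(rescued: int, tagged: int, version: str = "v1.4beta4") -> str:
    """Build a Dutch edit summary with the amount first and the right noun form."""
    # "bron" only when exactly one source was rescued, "bronnen" otherwise.
    noun = "bron" if rescued == 1 else "bronnen"
    return (
        f"{rescued} {noun} redden en {tagged} labelen als dood. "
        f"InternetArchiveBot ({version})"
    )
```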

Edit summaries do not have {{plural}} support, and are controlled from https://nl.wikipedia.org/wiki/User:InternetArchiveBot/Dead-links.js so this cannot be fixed. You can reword the summary to make it more consistent across the board though.

@Cyberpower678 I think the following would make the most sense: "X bron(nen) gered, en Y gelabeld als onbereikbaar"

  • perfect tense because the action is completed
  • bron(nen) is a common way to phrase if singular/plural is unknown
  • I think 'dood' sounds a bit.. morbid :) Onbereikbaar means you cannot connect to it - which is more correct anyway.

And of course advertise this tool on nlwiki.

Is there anything else I need to do? All the concerns so far have been addressed.

It seems some errors are being reported - which puts the decision in question. Could you take a look at them here? https://nl.wikipedia.org/wiki/Wikipedia:Aanmelding_botgebruikers#InternetArchiveBot

Some do indeed seem to be unintended results (which may have a reasonable cause, but I'd like you to confirm), and a few others are perhaps more a matter of clearer edit summaries.

I've responded to the English comments, but I can't follow the Dutch conversation. I would highly recommend advertising the URL https://tools.wmflabs.org/iabot?wiki=nlwiki which is the maintenance tool that comes with the bot and offers features such as on-demand page checking.

Thanks. I'll keep an eye on it.

In the meantime, I also noticed that when people have added an explicit archive.org link, you replace it with a regular link if the regular link is live. How can we signal that we want to use the archive.org link anyway (aside from reporting the URL as dead)? In some cases it is desirable to link to an old version, for example of a government website with variable content. I should check with the template folks, but I guess the best approach is to add a parameter for that?

In the meantime, it seems there are two categories where people run into issues: where you're replacing explicit archive.org links, and where you replace standalone links (false positives). Is there a way to raise the threshold a bit to reduce the false positives?

ps: the link https://tools.wmflabs.org/iabot/index.php?wiki=nlwikipage=manageurlsingle&url=http%3A%2F%2Fwww.isuresults.com%2Fbios%2Fisufs00008719.htm you pasted, doesn't work here.

IABot NEVER replaces archives with the original URL. It's not programmed to do that. I need an example of what you mean.

I fixed the link on the discussion.

It's not replacing archive URLs, but moving them to the correct parameter. The archives are still accessible on the page.

Thanks for fixing the link. I'm not sure where on that page it shows that the URL was found dead 3 times, though. Or am I overlooking something? (In the meantime, I reported it as a false positive, which may have affected the page.)

It did, if you look at https://tools.wmflabs.org/iabot/index.php?wiki=nlwiki&page=manageurlsingle&url=http%3A%2F%2Fwww.isuresults.com%2Fbios%2Fisufs00008719.htm

The bottom of the page contains a change log.

This is very odd - I only see one entry in the log, my own. Is it a matter of not having sufficient rights?

What you're seeing in that log entry is the false positive report you submitted taking effect and resetting the bot's perception of the URL. The bot now sees the URL as alive. If you had visited that page before reporting, you would have seen it marked as dead. If you want to know whether a URL is failing intermittent checks, it shows up as Dying. After 3 consecutive failed checks, spaced at least 3 days apart, the status changes from Dying to Dead and the URL is treated as dead.
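The status progression described above can be pictured as a small state machine. The sketch below is an assumption-laden illustration in Python, not IABot's actual code: the class and method names are hypothetical, and it assumes that a successful check resets the count and that a failure arriving less than 3 days after the last counted one is simply ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MIN_SPACING = timedelta(days=3)  # minimum gap between counted failures
FAILS_TO_DEAD = 3                # consecutive failures needed for "Dead"


@dataclass
class UrlStatus:
    state: str = "Alive"
    failed_checks: int = 0
    last_counted_failure: Optional[datetime] = None

    def _reset(self) -> None:
        self.state = "Alive"
        self.failed_checks = 0
        self.last_counted_failure = None

    def record_check(self, ok: bool, when: datetime) -> None:
        """Update the status after one liveness check."""
        if ok:
            # Assumption: a success breaks the run of consecutive failures.
            self._reset()
            return
        last = self.last_counted_failure
        if last is not None and when - last < MIN_SPACING:
            # Assumption: failures closer together than 3 days are not counted.
            return
        self.failed_checks += 1
        self.last_counted_failure = when
        self.state = "Dead" if self.failed_checks >= FAILS_TO_DEAD else "Dying"

    def report_false_positive(self) -> None:
        """A false positive report makes the bot see the URL as alive again."""
        self._reset()
```

Under these assumptions, a URL that fails on day 0, day 3 and day 6 goes Alive, Dying, Dying, Dead, and a false positive report afterwards returns it to Alive.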

OK, so it's not really a log, but a status indicator with some added info. You can only see the current status and the report(s) belonging to that - but if the status changed, it will no longer show previous statuses. Correct?

@Smile4ever that would be a workaround, but a rather undesirable one. After all, the link still works fine, and changing it to yes might cause it to be reported as dead in the database. It may very well be that we want to use this specific archive version only in that one article. For example, take a government page listing the makeup of the cabinet. In some articles (A) we may want to refer to a specific version (the page as it was in 2007), in other articles (B) to another version (the page as it was in 2010), and in yet other articles (C) to the most up-to-date version (the current page). If I report the URL as dead in A, I fear that this will eventually trickle through to B and C as well. It would probably be better to have some kind of override (archiveonly = yes, for example) that we would use only when we really want to point to a very specific version.

The original link doesn't work - my first example was a bad example. Check the two other example links please.

The problem currently is that the bot makes archived links unarchived, which is not what we want. "dodeurl=ja" links the archived URL in the template instead of the original url, which is what was intended by the last article editor.

Yes, I know. The thing is, that is a workaround, not a solution, for the reasons I laid out. We would have to adapt the template, though; it is not something the bot can help with at this point. An additional parameter would be needed.

@Cyberpower678 what is the best practice on this from other language editions?

What @Effeietsanders meant to say:

We should have a parameter "archiveonly" to indicate we want to link the archived URL instead of the original link (in the cite template) when the original link is not dead (yet).
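To make the proposal concrete, here is a rough Python model of a per-citation override. Everything in it is hypothetical: `archive_only` stands in for the proposed "archiveonly" parameter, which does not exist in the template at this point, and the example.org and archive.example addresses are placeholders. The point of the sketch is that the override lives on the individual citation, so the URL never has to be marked dead anywhere else.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    url: str
    archive_url: str = ""
    dead_url: bool = False      # models dode-url = ja
    archive_only: bool = False  # hypothetical: models the proposed archiveonly = yes


def link_target(c: Citation) -> str:
    """Pick the link to display for one citation."""
    # Use the archive if the original is dead OR this citation asks for it explicitly.
    if c.archive_url and (c.dead_url or c.archive_only):
        return c.archive_url
    return c.url


# Three articles citing the same live page, each wanting a different version.
page = "https://example.org/cabinet"
article_a = Citation(page, "https://archive.example/2007/cabinet", archive_only=True)
article_b = Citation(page, "https://archive.example/2010/cabinet", archive_only=True)
article_c = Citation(page)  # wants the current version
```

With this model, articles A and B link to their own snapshots while article C keeps linking to the live page.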

Enwiki uses deadurl=bot: unknown for that setting.