
Page links (templates, categories, etc.) may be out of date
Open, Low, Public

Description

(This may only be an obscure problem without much impact, but we need to be very sure that we know the limitations, and Pywikibot developers and script writers need to be made aware of them.)

We know that the links on a page (as exposed via the API) may be out of sync with the page. There isn't clear documentation explaining all the ways this can occur.

Page.templatesWithParams has this warning as a comment:

# WARNING: may not return all templates used in particularly
# intricate cases such as template substitution

Usually bots don't need 100% accurate information, but there are times when they need to be accurate, e.g. honouring {{nobots}} templates is a pretty hard requirement. However, if {{nobots}} is only used on pages in ways that are always reported accurately, the botMayEdit function doesn't need to worry about the edge cases in the underlying MediaWiki parser. Sticking with the {{nobots}} example, it is typically a top-level object in the parse tree, not included in intricate templates or transcluded. The documentation at https://en.wikipedia.org/wiki/Template:Bots is fairly clear that usage of the template should be simple, and that bots may ignore the template if it is not used in a bot-friendly way.
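
For illustration, a minimal sketch of the kind of check involved (assumptions: the page title is made up, the real botMayEdit logic also honours {{bots}} allow/deny parameters, and the title() keyword spelling differs between Pywikibot versions):

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    page = pywikibot.Page(site, 'Example')

    # Whether this check sees {{nobots}} depends on the template being used
    # in a simple, top-level way, as recommended on the Template:Bots page.
    for tmpl, params in page.templatesWithParams():
        if tmpl.title(with_ns=False).lower() == 'nobots':
            print('{{nobots}} present; a well-behaved bot should skip this page')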

One example: parser functions are lazily evaluated (T10314), leading to bugs like T20478: Time-dependent conditionals (#ifeq, #switch, etc.) can leave link tables inconsistent. For instance, a page containing {{#ifeq: {{CURRENTDOW}} | 0 | [[Category:Sunday]] | [[Category:Weekday]] }} keeps whichever category was recorded when the page was last parsed, even after the day changes.

It seems that MediaWiki also doesn't know that the links are out of date, so this isn't just intentional deferment of an expensive DB update.

mwparserfromhell and the various regexes in pywikibot are also far from perfect, and are unlikely to ever support parser functions, so they can't be used as a way to reliably get links. It wouldn't surprise me if these approaches also match links (templates, categories, etc.) in unevaluated portions of the wikitext, which means they would include links which do not exist in the rendered page.
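
A rough illustration of that over-matching (the wikitext is made up; mwparserfromhell never evaluates parser functions, so it reports nodes from both branches):

    import mwparserfromhell

    text = ("{{#ifexpr: {{CURRENTDOW}} = 0 "
            "| {{Sunday notice}} [[Category:Sunday pages]] "
            "| {{Weekday notice}} [[Category:Weekday pages]] }}")

    code = mwparserfromhell.parse(text)
    # Both {{Sunday notice}} and {{Weekday notice}} are listed, even though
    # only one branch is ever transcluded on the rendered page.
    print([str(t.name) for t in code.filter_templates()])
    # Likewise, both category links are reported.
    print([str(link.title) for link in code.filter_wikilinks()])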

Maybe the only way to make the Page methods reliable (with respect to *used* links) is to have a parameter that causes PWB to do a purge with forcelinkupdate before using these API methods.
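
Roughly like this (a sketch only; the page title is made up, and whether purge() forwards forcelinkupdate to APISite.purgepages() like this should be verified against the Pywikibot version in use):

    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')
    page = pywikibot.Page(site, 'Example')

    # Ask MediaWiki to reparse the page and refresh its link tables,
    # then read the (hopefully now current) link-based properties.
    page.purge(forcelinkupdate=True)
    templates = list(page.templates())
    categories = list(page.categories())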

Another option is to use the API's action=parse to obtain the links.
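
Something like this (a sketch using the low-level Request class; the page title is made up, and the exact Request constructor and result keys vary with the Pywikibot version and API formatversion):

    import pywikibot
    from pywikibot.data import api

    site = pywikibot.Site('en', 'wikipedia')

    # action=parse returns links taken from an actual parse of the page,
    # rather than from possibly stale link tables (though it may still
    # serve a cached parse).
    req = api.Request(site=site, parameters={
        'action': 'parse',
        'page': 'Example',
        'prop': 'templates|links|categories',
    })
    data = req.submit()['parse']
    templates = [t['*'] for t in data['templates']]
    categories = [c['*'] for c in data['categories']]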

This may be a situation where we need to ensure the Pywikibot devs and Wikimedia ops / dbs / devs are all on the same page before we proceed with a solution that works for Wikimedia sites.

Related:

Event Timeline

jayvdb raised the priority of this task to Low.
jayvdb updated the task description. (Show Details)
jayvdb added a project: Pywikibot.
jayvdb added subscribers: jayvdb, Legoktm.
Restricted Application added subscribers: Aklapper, Unknown Object (MLST). · Jun 6 2015, 5:18 AM
jayvdb set Security to None.