Sometimes a slave server stops replicating, for instance due to some transitory funky error:
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_do_db:
Replicate_ignore_db:
Last_errno: 1205
Last_error: Error 'Lock wait timeout exceeded; Try restarting transaction' on query. Default database: 'enwiki'. Query: 'UPDATE /* HTMLCacheUpdate::invalidateIDs This flag once ... */ `page` SET page_touched = '20090127180707' WHERE (page_id IN ('14890591'))'
In this case there's no end-user-visible report of lag, but odd things happen, such as Special:Contributions failing to show updated information.
After restarting the slave thread, we get a nice big warning like this:
Due to high database server lag, changes newer than 2146 seconds might not be shown in this list.
which is neat. It would be nice to have a similar warning when we're pulling from a server that isn't replicating at all. It may be difficult to tell how far behind it is in this case, but even a "we're broken" warning would help.
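A minimal sketch of the idea in Python, under stated assumptions: treat a stopped I/O or SQL thread as its own condition instead of folding it into a lag number. The function name `replication_warning`, the 30-second threshold and the "broken" wording are hypothetical, not MediaWiki's actual code; `status` is assumed to be one row of `SHOW SLAVE STATUS` as a dict, with SQL NULL mapped to `None`.

```python
from typing import Any, Mapping, Optional

# Hypothetical threshold (seconds) above which the lag warning is shown.
LAG_WARNING_THRESHOLD = 30


def replication_warning(status: Mapping[str, Any]) -> Optional[str]:
    """Return a user-facing warning for one slave, or None if it looks healthy.

    Assumes `status` holds the columns of one SHOW SLAVE STATUS row,
    with NULL values represented as None.
    """
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    if not (io_ok and sql_ok):
        # A stopped thread means the lag is unknown, not zero.
        return ("Database replication is currently broken; "
                "recent changes might not be shown in this list.")
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # Threads report running but lag is NULL: still not trustworthy.
        return "Database lag is unknown; recent changes might not be shown in this list."
    if int(lag) > LAG_WARNING_THRESHOLD:
        return (f"Due to high database server lag, changes newer than "
                f"{lag} seconds might not be shown in this list.")
    return None
```

For the case in this report (`Slave_SQL_Running: No`), the function returns the "broken" warning even though no lag figure is available, which is exactly the situation that currently produces no warning at all.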
Note that the lag report in the API shows "" instead of, say, "0" in this case, whereas the 'lagtop' script reports 0. Perhaps lagtop should also be updated to show a visible warning if this condition is detectable.
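MySQL reports `Seconds_Behind_Master` as NULL when the slave SQL thread is not running, so the API's "" is presumably that NULL, and lagtop's 0 presumably comes from coercing it to a number. A minimal sketch of how a lagtop-style display could keep the two apart, assuming the lag value is read straight from `Seconds_Behind_Master` (the function name and the `NOT REPLICATING` label are hypothetical, not lagtop's actual code):

```python
from typing import Optional


def format_lag_cell(lag: Optional[int]) -> str:
    """Render one server's replication lag for a lagtop-style display.

    `lag` is assumed to come from Seconds_Behind_Master, where NULL
    (None here) means replication is not running. Printing it as 0
    would make a stopped slave look perfectly up to date.
    """
    if lag is None:
        return "NOT REPLICATING"
    return str(lag)


# Hypothetical host names, for illustration only.
for host, lag in [("db1", 0), ("db2", None), ("db3", 2146)]:
    print(f"{host:<6}{format_lag_cell(lag)}")
```

The same distinction would apply to the API: emitting an explicit "not replicating" marker (or a JSON null) is less ambiguous than an empty string.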