Author: bugzilla_wikipedia_org.to.jamesd
Description:
Sequences like this for recentchanges happen several
times an hour:
DB      Time  Query or State
--      ----  --------------
enwiki  239   UPDATE recentchanges SET rc_this_oldid=6638303 WHERE rc_namespace=0 AND rc_title='Legend_tripping' AND rc_timestamp='20040912055536'
enwiki  205   UPDATE recentchanges SET rc_this_oldid=6638305 WHERE rc_namespace=0 AND rc_title='Gholamhossein_Mosahab' AND rc_timestamp='20040229205048'
enwiki  204   UPDATE recentchanges SET rc_this_oldid=6638306 WHERE rc_namespace=6 AND rc_title='ZeroG.jpg' AND rc_timestamp='20040623185548'
enwiki  190   UPDATE recentchanges SET rc_this_oldid=6638307 WHERE rc_namespace=0 AND rc_title='Australian_hornet' AND rc_timestamp='20040828135413'
enwiki  189   UPDATE recentchanges SET rc_this_oldid=6638308 WHERE rc_namespace=0 AND rc_title='Nima_Yooshij' AND rc_timestamp='20040326005112'
enwiki  184   UPDATE recentchanges SET rc_this_oldid=6638309 WHERE rc_namespace=6 AND rc_title='As15-86-11603.jpg' AND rc_timestamp='20040704033819'
enwiki  170   UPDATE recentchanges SET rc_this_oldid=6638313 WHERE rc_namespace=6 AND rc_title='Apollo15missionpatch.png' AND rc_timestamp='20040726010236'
enwiki  170   UPDATE recentchanges SET rc_this_oldid=6638314 WHERE rc_namespace=0 AND rc_title='City_of_Sydney' AND rc_timestamp='20041009112904'
Notice the timestamps: each is earlier than any existing
recentchanges record. SHOW INNODB STATUS indicates
that the queries are waiting for a lock on the
rc_timestamp index, presumably all on its first block,
since every timestamp precedes all existing entries.
Suggested change:
A. Check the timestamp to see if it is earlier than
the recentchanges purge time limit. If it is, don't
update: if the record still exists it is going to be
purged soon anyway.
This would be less pleasant for low-traffic wikis,
where purges may be relatively infrequent and records
lacking the updated rc_this_oldid would stay around
for a while.
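
A minimal Python sketch of the check in option A. The seven-day purge limit, the constant RC_MAX_AGE and both function names are assumptions made for illustration, not MediaWiki code; the timestamp format is the 14-digit one seen in the queries above.

```python
from datetime import datetime, timedelta

# Assumed purge limit; the real value depends on the wiki's configuration.
RC_MAX_AGE = timedelta(days=7)


def parse_rc_timestamp(ts: str) -> datetime:
    """Parse a 14-digit YYYYMMDDHHMMSS timestamp string."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def should_update_recentchanges(rc_timestamp: str, now: datetime) -> bool:
    """Return False when the row is older than the purge limit (option A)."""
    cutoff = now - RC_MAX_AGE
    return parse_rc_timestamp(rc_timestamp) >= cutoff
```

With this check, an UPDATE for a timestamp such as '20040229205048' would be skipped entirely once it falls outside the purge window, so it never queues for the index lock.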
B. Unnecessary if A is done, but better overall.
If the timestamp is within the purge threshold (newer
than the purge cutoff), just do the update. If not:
Use the rc_timestamp index to search for any entry
with the timestamp from the planned update, limit 1.
If there are no results, there is no work to do and no
write lock is needed.
Do the search on a DB slave. There are two cases where
a slave could return the wrong result:
- New record or edit. The timestamp will be within
the purge threshold, so this code path will never
be taken.
- Record just purged from the master, with the purge
still replicating. The slave will show a hit and a need
to update the record. The master will get an
unnecessary update, just as it does now, but far fewer
of them.
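
The flow in option B could be sketched as follows. This is an illustration under stated assumptions, not MediaWiki code: in-memory sqlite3 connections stand in for the MySQL master and slave, and RC_MAX_AGE, make_db and update_rc_this_oldid are hypothetical names.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumed purge limit; the real value depends on the wiki's configuration.
RC_MAX_AGE = timedelta(days=7)


def make_db(rows):
    """Build an in-memory stand-in for a recentchanges table (illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE recentchanges (rc_namespace INTEGER, rc_title TEXT,"
        " rc_timestamp TEXT, rc_this_oldid INTEGER)"
    )
    # Stand-in for the rc_timestamp index mentioned in the report.
    db.execute("CREATE INDEX idx_rc_timestamp ON recentchanges (rc_timestamp)")
    db.executemany("INSERT INTO recentchanges VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return db


def update_rc_this_oldid(master, slave, namespace, title, rc_timestamp, oldid, now):
    """Issue the UPDATE on the master only when it can match a row.

    Returns True if the UPDATE was sent to the master, False if it was skipped.
    """
    # 14-digit timestamps compare correctly as plain strings.
    cutoff = (now - RC_MAX_AGE).strftime("%Y%m%d%H%M%S")
    if rc_timestamp < cutoff:
        # Older than the purge cutoff: probe the slave through the index first.
        hit = slave.execute(
            "SELECT 1 FROM recentchanges WHERE rc_timestamp = ? LIMIT 1",
            (rc_timestamp,),
        ).fetchone()
        if hit is None:
            return False  # nothing to update, so no write lock on the master
    master.execute(
        "UPDATE recentchanges SET rc_this_oldid = ?"
        " WHERE rc_namespace = ? AND rc_title = ? AND rc_timestamp = ?",
        (oldid, namespace, title, rc_timestamp),
    )
    master.commit()
    return True
```

Timestamps inside the purge window skip the probe and go straight to the master, which is why a slave that has not yet received a new row cannot cause a missed update.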
Version: 1.3.x
Severity: normal