Don't regenerate sitemap files if nothing changed
Closed, Declined (Public)

Description

Author: sergey.chernyshev

Description:
It's not necessary to regenerate sitemap files if they didn't actually change.

This might not make sense for Wikipedia, where something is sure to change within a day, but it might be quite useful for smaller sites.

One simple solution (considering the current code structure) would be to generate the files into a temporary folder while simultaneously calculating last-modified timestamps, and only move the new files in place of the old ones if they actually have new entries (that is, the calculated timestamps are newer than the old files' timestamps). It might also be a good idea to change each file's timestamp to match its calculated last-modified timestamp.
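
The replace-only-if-newer step described above could look roughly like the following. This is a minimal sketch in Python rather than the generator's own code; the function name `publish_if_newer` and its parameters are hypothetical, and it assumes the caller has already written the new file to a temporary path and computed the newest page modification time as a Unix timestamp.

```python
import os
import shutil


def publish_if_newer(tmp_path, final_path, last_modified):
    """Move tmp_path over final_path only if it contains newer entries.

    last_modified is a Unix timestamp: the newest page modification time
    among the entries written to tmp_path (computed by the caller).
    Returns True if final_path was replaced, False if it was left alone.
    """
    if os.path.exists(final_path) and last_modified <= os.path.getmtime(final_path):
        # Nothing newer than the published file: discard the fresh copy.
        os.remove(tmp_path)
        return False
    shutil.move(tmp_path, final_path)
    # Stamp the file with the content's timestamp, not the generation time,
    # so the next run can compare against it.
    os.utime(final_path, (last_modified, last_modified))
    return True
```

Setting the file's mtime to the calculated timestamp is what makes the comparison meaningful on the next run; without it, the old file's mtime would only reflect when the generator last ran.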

The index sitemap should also be created using these last-modified timestamps.
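
As an illustration of the index idea, here is a hedged Python sketch that writes a `<lastmod>` per sitemap file from that file's mtime. It assumes each file's mtime has already been set to its calculated last-modified timestamp, as suggested above; `build_sitemap_index` and `base_url` are hypothetical names, not part of the existing generator.

```python
import os
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def build_sitemap_index(sitemap_paths, base_url):
    """Return sitemap index XML with <lastmod> taken from each file's mtime."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for path in sitemap_paths:
        # Interpret the mtime as UTC and format it as a W3C datetime.
        mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
        loc = base_url.rstrip("/") + "/" + os.path.basename(path)
        lines.append("  <sitemap>")
        lines.append(f"    <loc>{escape(loc)}</loc>")
        lines.append(f"    <lastmod>{mtime.strftime('%Y-%m-%dT%H:%M:%SZ')}</lastmod>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"
```

With this approach the index only reports a new date for a sitemap file whose entries actually changed.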


Version: unspecified
Severity: minor

Details

Reference
bz12862

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 10:02 PM
bzimport set Reference to bz12862.
bzimport added a subscriber: Unknown Object (MLST).

All I know is I use a cronjob to generate the sitemaps every month.

Well, indeed for small wikis, if no page at all in a namespace
changes, then that sitemap.gz for that namespace does not need to be
replaced.

Anyway, Google etc. will still see that the dates of the individual files are the
same as last time.

So though the idea is worthy, the savings aren't very big.

On a small wiki the sitemap is fast to produce, so I don't think there is any point in saving a few CPU cycles at the cost of added complexity.