Sitemap generation is operationally complicated because sitemaps are treated like dumps: big batch jobs that periodically write out static files. But generating a sitemap is fast enough that sitemaps could be served like an API instead.
Consider an endpoint like /w/rest.php/site/v1/sitemap/<indexId>/<fileId>. If you allow N URLs per sitemap file and M sitemap files per index, then for indexId j and fileId i, this endpoint would stream out the URLs for the page_id range N(jM + i) to N(jM + i + 1) - 1. The number of URLs actually produced would depend on how many pages in the range have been deleted, but would be at most N.
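The range arithmetic above can be sketched as follows. This is a hypothetical helper, not existing MediaWiki code; n and m stand for the N and M limits described above.

```python
def sitemap_page_id_range(index_id, file_id, n, m):
    """Inclusive page_id range served by sitemap file <file_id> of
    index <index_id>, with n URLs per file and m files per index."""
    start = n * (index_id * m + file_id)
    return start, start + n - 1
```

For example, with n = 30000 and m = 2000, file 1 of index 0 covers page_ids 30000 through 59999.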
In WMF production, on Commons, the query
SELECT page_namespace, page_title, page_touched FROM page WHERE page_id BETWEEN 4000000 AND 4030000
takes only 40-50ms. We might not even need an object cache; we could just rely on the CDN to coalesce requests and cache responses.
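Rendering those query rows as a sitemap file is a straightforward transformation. A minimal sketch, assuming rows of (page_namespace, page_title, page_touched) and ignoring namespace prefixes and XML/URL escaping for brevity; the base URL is illustrative:

```python
from datetime import datetime, timezone

def sitemap_xml(rows, base_url="https://commons.wikimedia.org/wiki/"):
    """Render (page_namespace, page_title, page_touched) rows as a
    sitemap <urlset>. page_touched is MediaWiki's 14-digit UTC
    timestamp; it maps directly onto <lastmod>."""
    out = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for ns, title, touched in rows:
        lastmod = datetime.strptime(touched, "%Y%m%d%H%M%S").replace(
            tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        out.append(f"  <url><loc>{base_url}{title}</loc>"
                   f"<lastmod>{lastmod}</lastmod></url>")
    out.append("</urlset>")
    return "\n".join(out)
```

Since the rows can be streamed from the database cursor, the response can be written incrementally rather than buffered.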
The index file, say /w/rest.php/site/v1/sitemap/<indexId>, only needs to know the maximum page_id in order to determine how many sitemap files it needs to link to.
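Concretely, the file count and the links for a given index follow from the maximum page_id and the N/M limits. A hypothetical sketch (function names and the base path are assumptions):

```python
import math

def sitemap_file_count(max_page_id, n):
    """Total sitemap files needed to cover page_ids 0..max_page_id,
    at n page_ids per file."""
    return math.ceil((max_page_id + 1) / n)

def sitemap_index_entries(index_id, max_page_id, n, m, base):
    """File URLs that index <index_id> should link to, given m files
    per index. The last index may link to fewer than m files."""
    total = sitemap_file_count(max_page_id, n)
    first = index_id * m  # global number of the first file in this index
    count = max(0, min(m, total - first))
    return [f"{base}/{index_id}/{i}" for i in range(count)]
```
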
It's not clear right now whether T54647 would benefit from having sitemaps for Commons. But if we did need sitemaps for Commons, this is how I think we should make them.
This would be a useful facility to have in core for the benefit of third-party users: they could alias /sitemap.xml to /w/rest.php/site/v1/sitemap/0 to enable search engine discovery of the pages on their wiki.
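The alias could be done at the web server. A hypothetical nginx sketch (the endpoint path assumes the REST route proposed above):

```nginx
# Serve /sitemap.xml from the proposed sitemap REST endpoint.
location = /sitemap.xml {
    rewrite ^ /w/rest.php/site/v1/sitemap/0 last;
}
```

An equivalent Apache RewriteRule, or an entry in robots.txt ("Sitemap: ..."), would serve the same discovery purpose.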