Currently the only page titles available as a separate download are those for namespace 0: all-titles-in-ns0.gz
Apart from this, most other titles are available in pages-articles.xml.bz2, except for those of User pages and Talk pages, which are available in pages-meta-current.xml.bz2.
The articles and meta-current dumps are typically a couple of orders of magnitude larger than the all-titles-in-ns0 dump.
The only ways to get a complete list of page titles are to download and process these two enormous dump files or to make excessive use of the API.
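To illustrate what "download and process" involves today, here is a minimal sketch of pulling titles out of one of the big dumps. It assumes the usual MediaWiki XML export layout (a root element containing <page> elements, each with a <title> child); the function names iter_titles and iter_dump_titles are made up for this example. Even when streamed like this, the whole multi-gigabyte file still has to be downloaded and decompressed just to obtain the titles.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(source):
    """Yield page titles from a MediaWiki XML export stream.

    `source` is a binary file-like object. Assumes each <page>
    element carries exactly one <title> child.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        # Export elements live in a versioned XML namespace; strip it so
        # the code does not depend on a particular schema version.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "title":
            yield elem.text
        elif tag == "page":
            # Discard pages already handled so memory use stays flat.
            root.clear()


def iter_dump_titles(path):
    """Stream titles from a bz2-compressed dump file at `path`."""
    with bz2.open(path, "rb") as stream:
        yield from iter_titles(stream)
```

Usage would be something like `for title in iter_dump_titles("pages-meta-current.xml.bz2"): ...`. Titles outside namespace 0 carry their namespace prefix (for example "Talk:Foo"), so they can be filtered by prefix afterwards.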
- We could dump a page title list to accompany each of pages-articles.xml.bz2 and pages-meta-current.xml.bz2
- We could dump a page title list for all namespaces
- We could dump a page title list for all pages not already covered by all-titles-in-ns0.gz
- We could dump a page title list for each namespace.
For my current purpose I already need to process pages-articles.xml.bz2, so I only lack page titles for User and Talk pages. A dump of the titles for those namespaces would be enough for me, but it might not be the best option for other potential users of the data.
Version: unspecified
Severity: enhancement