|Resolved||Addshore||T76705 SiteStore / SiteList performance and caching (tracking)|
|Invalid||None||T47532 Add file-based cached implementation of SiteStore|
|Invalid||None||T47531 Sites class should work with alternative storage implementations|
|Invalid||None||T77991 [Story] File based caching: Decide on file format|
|Resolved||aude||T77990 File based caching: Split of site lookup for readonly access|
|Invalid||None||T77993 [Task] File based caching: Implement new interface based on pre-generated files|
|Invalid||None||T77994 [Task] File based caching: Script that generates files|
|Declined||None||T77995 File based caching: Minimize access to local-id lookup|
- Mentioned In
- T93375: vagrant default wiki is not in interwiki table / sites table which means it can not be linked in wikidata
T76706: Design caching infrastructure for SiteStore
T58602: avoid fetching SiteList object from memcached
- Mentioned Here
- T77990: File based caching: Split of site lookup for readonly access
T77991: [Story] File based caching: Decide on file format
T77993: [Task] File based caching: Implement new interface based on pre-generated files
T77994: [Task] File based caching: Script that generates files
T77995: File based caching: Minimize access to local-id lookup
T77997: Investigate how to best move the sites component out of core
For deployment on high-traffic sites like Wikipedia, there should be an implementation of the SiteStore interface backed by CDB. This allows fast access to the site information, similar to the way the interwiki mappings are currently stored using CDB.
Suggest this approach:
- SiteStore class remains as is
- CachingSiteStore decorator is added
- CachingSiteStore has a field of type "general caching interface", which might be an existing interface or a new one
- An implementation of this "general caching interface" is created (or re-used if it exists) that does the type of caching deemed best at this time for the site info on the WMF cluster
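The decorator approach listed above could look roughly like this. It is a minimal sketch, not the actual MediaWiki code: the SiteStore interface is reduced to a single method, and SimpleCache, ArrayCache and DbSiteStore are hypothetical stand-ins for whichever caching interface and backing store end up being used.

```php
<?php
// Simplified stand-in for the real SiteStore interface (assumption: one method only).
interface SiteStore {
	public function getSite( $globalId );
}

// Hypothetical "general caching interface"; an existing one could be used instead.
interface SimpleCache {
	public function get( $key );
	public function set( $key, $value );
}

// In-process cache, used here only for illustration.
class ArrayCache implements SimpleCache {
	private $data = [];

	public function get( $key ) {
		return array_key_exists( $key, $this->data ) ? $this->data[$key] : null;
	}

	public function set( $key, $value ) {
		$this->data[$key] = $value;
	}
}

// Hypothetical slow backing store that counts how often it is queried.
class DbSiteStore implements SiteStore {
	public $queries = 0;

	public function getSite( $globalId ) {
		$this->queries++;
		return [ 'globalId' => $globalId ];
	}
}

// Decorator: same interface as the wrapped store, with a cache in front of it.
class CachingSiteStore implements SiteStore {
	private $store;
	private $cache;

	public function __construct( SiteStore $store, SimpleCache $cache ) {
		$this->store = $store;
		$this->cache = $cache;
	}

	public function getSite( $globalId ) {
		$site = $this->cache->get( $globalId );
		if ( $site === null ) {
			// Cache miss: ask the wrapped store and remember the answer.
			$site = $this->store->getSite( $globalId );
			$this->cache->set( $globalId, $site );
		}
		return $site;
	}
}

$db = new DbSiteStore();
$store = new CachingSiteStore( $db, new ArrayCache() );
$store->getSite( 'enwiki' );
$store->getSite( 'enwiki' );
echo $db->queries, "\n"; // prints 1: the second lookup is served from the cache
```

Because CachingSiteStore only depends on the two interfaces, the cache implementation can be swapped without touching SiteStore or its callers.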
I'm not sure the file should really be a cache. If we have the site info in a file, just use that file, period. It can and probably should be read-only.
We may want two or three files (corresponding to $wgInterwikiScopes).
This file is going to be read and parsed a *lot*, so file format and encoding matter. I think we should at least consider and benchmark CDB, JSON, and CSV.
By the way, if we don't need Unicode, we should try to bypass UTF-8 decoding, which tends to be slow.
If we make it a file store (suggest a name?), that might work, but the SiteStore interface is not very suitable, since it contains a bunch of write methods that don't make sense in a file store.
I think we need something like:
public function getSites();
public function getSite( $siteGlobalId );
If it's JSON, then for getSite we could load the array mapping $globalSiteId => $siteData and lazily initialize the Site objects when requested. There are numerous places where we want just one Site object or just a few. This doesn't necessarily fit well with the design of the SiteList class, though.
Good idea to benchmark CDB, JSON and other options.
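The lazy-initialization idea for getSite might be sketched as follows. SimpleSite and JsonSiteLookup are hypothetical names, the JSON layout is an assumption, and the real Site class carries many more fields than shown here.

```php
<?php
// Hypothetical minimal Site value object, for illustration only.
class SimpleSite {
	public $globalId;
	public $group;

	public function __construct( $globalId, $group ) {
		$this->globalId = $globalId;
		$this->group = $group;
	}
}

// Read-only lookup: decodes the JSON once, builds Site objects only on request.
class JsonSiteLookup {
	private $data;
	private $sites = [];

	public function __construct( $json ) {
		// Assumed layout: { "<globalId>": { "group": "<group>" }, ... }
		$this->data = json_decode( $json, true ) ?: [];
	}

	public function getSite( $siteGlobalId ) {
		if ( !isset( $this->data[$siteGlobalId] ) ) {
			return null;
		}
		if ( !isset( $this->sites[$siteGlobalId] ) ) {
			// First request for this site: build the object and keep it.
			$this->sites[$siteGlobalId] = new SimpleSite(
				$siteGlobalId,
				$this->data[$siteGlobalId]['group']
			);
		}
		return $this->sites[$siteGlobalId];
	}

	public function getSites() {
		// Only the full listing forces every object to be built.
		return array_map( [ $this, 'getSite' ], array_keys( $this->data ) );
	}
}

$json = '{"enwiki":{"group":"wikipedia"},"dewiktionary":{"group":"wiktionary"}}';
$lookup = new JsonSiteLookup( $json );
$site = $lookup->getSite( 'enwiki' );
echo $site->group, "\n"; // prints "wikipedia"
```

A caller that needs only one site pays for one json_decode and one object construction; the cost of building every object is deferred to getSites().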
During the SprintStart meeting today, we split the remaining work into several tasks:
T77990 Split of site lookup for readonly access
T77991 Decide on file format: CDB, JSON, CSV
T77993 Implement new interface based on pre-generated files (one file per group and one file with everything indexed by global site id)
-> load everything when local lookup is needed
T77994 Script that generates files
T77995 Minimize access to local-id-lookup
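The file layout described for T77993 (one file per group, plus one file with everything indexed by global site id) could be generated along these lines. This is only a sketch: buildSiteFiles, the row format and the file names are made up for illustration, and JSON is used as a placeholder because the format is still to be decided in T77991.

```php
<?php
// Assumed input: flat rows, each with a 'globalId' and a 'group' key.
// Returns file name => encoded contents: a global index plus one entry per group.
function buildSiteFiles( array $rows ) {
	$files = [ 'sites-all.json' => [] ];
	foreach ( $rows as $row ) {
		$files['sites-all.json'][$row['globalId']] = $row;
		$files["sites-{$row['group']}.json"][$row['globalId']] = $row;
	}
	// array_map keeps the string keys when given a single array.
	return array_map( 'json_encode', $files );
}

$files = buildSiteFiles( [
	[ 'globalId' => 'enwiki', 'group' => 'wikipedia' ],
	[ 'globalId' => 'dewiki', 'group' => 'wikipedia' ],
	[ 'globalId' => 'enwiktionary', 'group' => 'wiktionary' ],
] );
echo implode( ', ', array_keys( $files ) ), "\n";
// prints "sites-all.json, sites-wikipedia.json, sites-wiktionary.json"
```

A generation script (T77994) would then only need to write each entry to disk.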
Not a blocker, but should be done at some point: T77997 Investigate how to best move the sites component out of core
Static PHP files (which can be created at deploy time if needed, like the l10n CDB files are on the WMF cluster) are much better than JSON for server-side performance. HHVM and PHP5/7 can both cache PHP bytecode and eliminate the file read and parse stages that JSON requires.
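A static PHP file of this kind could be written and loaded roughly as follows. This is a sketch under assumptions: the site data and the temporary file path are illustrative, and whether the bytecode is actually cached depends on the opcode cache configuration of the runtime.

```php
<?php
// Site data as it might be produced at deploy time (values are illustrative).
$sites = [
	'enwiki' => [ 'group' => 'wikipedia', 'language' => 'en' ],
	'dewiki' => [ 'group' => 'wikipedia', 'language' => 'de' ],
];

// Write a PHP file that simply returns the array; var_export emits valid PHP source.
$file = sys_get_temp_dir() . '/sites-example-' . getmypid() . '.php';
file_put_contents( $file, "<?php\nreturn " . var_export( $sites, true ) . ";\n" );

// At request time the file is included. With an opcode cache enabled, the
// compiled form can be reused instead of re-reading and re-parsing the source.
$loaded = require $file;
echo $loaded['dewiki']['language'], "\n"; // prints "de"

// Clean up the temporary file used for this example.
unlink( $file );
```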