Description: The property suggester provides property recommendations while editing Wikidata items. This request is about deploying a new backend (SchemaTree) for this service. The initial goal is to perform A/B testing to determine whether the new system provides better recommendations without sacrificing performance. If it does, we will switch completely to the new backend. The service is already running on Wikimedia Cloud VPS (project:schematreerecommender) and is used on https://wikidata.beta.wmflabs.org
See also https://phabricator.wikimedia.org/project/profile/2796/ and https://phabricator.wikimedia.org/T285098
Timeline: Ideally this service would be in production before April 2022, because the volunteers have limited availability from April onward.
Point person:
- Marta Jansone @Martaannaj https://wikitech.wikimedia.org/wiki/User:Martaannaj
- Michael Cochez, @Michaelcochez https://wikitech.wikimedia.org/wiki/User:Michaelcochez
- (backup: Wikidata Team)
Technologies: The backend service (the index) is written in the Go programming language. This service is called from the PropertySuggester MediaWiki extension. Go was chosen for this backend service because it keeps resource usage low and allows very fast processing of requests.
Technical details:
This service can be run on a virtual machine.
It should be noted that this service is completely stateless. At the start it loads an index from disk, which is then used to serve any future requests. This means:
- That the service can be replicated easily.
- That in case of a failure of the service, it can be restarted without consequences to future requests.
The service does not access any other services.
- For the A/B testing, all event logging is done before this service is called
We periodically need to recreate that index (as with the current recommender, this will be done outside of the production environment).
- This index is stored as a file and must be readable/retrievable by the service. The file is currently 85MB (compressed) and grows sublinearly in the number of Wikidata entities.
Given the statelessness of this service, a cache could be used. However,
- Since we are A/B testing, this cache will skew our measurements, specifically the ones related to timing.
- The recommender server is essentially an index optimized specifically for this task, so it is unclear whether adding an additional caching layer will help.
- Given the current traffic patterns (https://grafana.wikimedia.org/d/000000559/api-requests-breakdown?orgId=1&refresh=5m&var-metric=p99&var-module=wbsgetsuggestions&from=now-30d&to=now), the service would be able to handle the traffic, even if the number of requests doubled.
Diagram: