Now that we have a WDQS cluster that can be accessed only from inside production (cf. T178492), we should switch the Recommendation API service to use wdqs-internal.discovery.wmnet for issuing requests.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Recommendation API: Migrate to the new WDQS internal cluster | operations/puppet | production | +1 -1
Status | Assigned | Task
---|---|---
Resolved | mobrovac | T190266 Switch the Recommendation API to use the internal WDQS cluster
Resolved | Smalyshev | T178492 Create a more controlled WDQS cluster
Restricted Task | |
Resolved | mpopov | T179850 Run analysis of WDQS internal and external traffic
Resolved | Gehel | T184083 Define the constraints of the new WDQS cluster
Resolved | Gehel | T187766 Install / configure new WDQS servers
Resolved | Gehel | T187800 rack/setup/install wdqs200[4-6]
Resolved | ayounsi | T188303 switch port configuration for wdq200[4-6]
Resolved | RobH | T182991 New WDQS clusters eqiad + codfw
Unknown Object (Task) | |
Unknown Object (Task) | |
Resolved | Smalyshev | T187767 Choose a service name for the new internal WDQS cluster and configure LVS
Resolved | Smalyshev | T192835 Monitoring for internal cluster
Resolved | Smalyshev | T192942 Identify and migrate existing internal clients of wdqs to the new internal cluster
Event Timeline
Noticed that it works for some wikis. Example: https://es.wikipedia.org/api/rest_v1/data/recommendation/article/creation/translation/en/
The query does not always fail; for example, I have just executed https://en.wikipedia.org/api/rest_v1/data/recommendation/article/creation/translation/es successfully. The issue you are experiencing is known, and the real culprit is the WDQS cluster, which the Recommendation API service uses internally. Options for making the cluster more robust are currently being considered, cf. T178492: Create a more controlled WDQS cluster.
@Gehel @Smalyshev should we now point the Recommendation API service at a different LVS for WDQS? We are currently using wdqs.discovery.wmnet.
Yes, we should migrate! I'm not sure where the Recommendation API is configured. MediaWiki was migrated to wdqs-internal.discovery.wmnet (T192942) around 0:00 UTC Apr 1st (just 10 hours ago), and I have had no reports of any issues yet. Note that wdqs-internal.discovery.wmnet only supports HTTP at this point (T193473).
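For illustration only, here is a minimal Python sketch of what the switch amounts to from a client's point of view. It is not the actual patch (that is the one-line operations/puppet change listed above). The two hostnames come from this task; the `/sparql` path, the `format=json` parameter, and the helper names `build_wdqs_url` and `migrate_url` are assumptions made for the example.

```python
from urllib.parse import urlencode, urlsplit

# Hostnames as mentioned in this task.
OLD_HOST = "wdqs.discovery.wmnet"
INTERNAL_HOST = "wdqs-internal.discovery.wmnet"


def build_wdqs_url(host, sparql, scheme="http", path="/sparql"):
    """Build a GET URL for a SPARQL query.

    The path and the format parameter are assumptions for this sketch,
    not values taken from the Recommendation API configuration.
    """
    params = urlencode({"query": sparql, "format": "json"})
    return f"{scheme}://{host}{path}?{params}"


def migrate_url(url, old_host=OLD_HOST, new_host=INTERNAL_HOST):
    """Rewrite a URL that targets the old LVS name to the internal one.

    The scheme is forced to plain HTTP because the internal cluster
    only supports HTTP at this point (T193473). URLs for any other
    host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != old_host:
        return url
    return parts._replace(scheme="http", netloc=new_host).geturl()


if __name__ == "__main__":
    print(build_wdqs_url(INTERNAL_HOST, "SELECT 1"))
    print(migrate_url("https://wdqs.discovery.wmnet/sparql?query=x"))
```

Forcing the scheme in `migrate_url` is deliberate: a client that kept `https://` after the hostname change would fail against the internal cluster until T193473 is addressed.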
Change 430052 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/puppet@production] Recommendation API: Migrate to the new WDQS internal cluster
Change 430052 merged by Gehel:
[operations/puppet@production] Recommendation API: Migrate to the new WDQS internal cluster
Mentioned in SAL (#wikimedia-operations) [2018-05-07T08:58:05Z] <mobrovac@tin> Started restart [recommendation-api/deploy@ac66089]: Use the internal WDQS cluster LVS - T190266
The Recommendation API service is now using the internal WDQS cluster, which should greatly improve the service's stability. Resolving.