
Investigate recommendation-api-ng internal endpoint failure
Closed, Resolved (Public)

Description

In T347015#9188042, the recommendation-api-ng was deployed on LiftWing staging and an internal endpoint was set up.

When we query the /api/spec endpoint, it works well:

# working within 15716.rec-api-staging-endpoint screen
kevinbazira@deploy2002:/srv/deployment-charts/helmfile.d/ml-services$ curl https://recommendation-api-ng.k8s-ml-staging.discovery.wmnet:31443/api/spec
{"basePath":"/api","consumes":["application/json"],"definitions":{"Article":{"properties":{"pageviews":{"description":"pageviews","type":"integer"},"rank":{"description":"rank","type":"number"},"title":{"description":"title","type":"string"},"wikidata_id":{"description":"wikidata_id","type":"string"}},"required":["rank","title","wikidata_id"],"type":"object"}},"info":{"title":"API","version":"1.0"},"paths":{"/":{"get":{"deprecated":true,"description":"Gets recommendations of source articles that are missing in the target","operationId":"get_legacy_article","parameters":[{"description":"Source wiki project language code","in":"query","name":"s","required":true,"type":"string"},{"description":"Target wiki project language code","in":"query","name":"t","required":true,"type":"string"},{"default":12,"description":"Number of recommendations to fetch","in":"query","maximum":24,"minimum":0,"name":"n","type":"integer"},{"description":"Seed article for personalized recommendations that can also be a list separated by \"|\"","in":"query","name":"article","pattern":"^([^|]+(\\|[^|]+)*)?$","type":"string"},{"default":true,"description":"Whether to include pageview counts","in":"query","name":"pageviews","type":"boolean"},{"collectionFormat":"multi","default":"morelike","description":"Which search algorithm to use if a seed is specified","enum":["morelike","wiki"],"in":"query","name":"search","type":"string"}],"responses":{"200":{"description":"Success","schema":{"items":{"$ref":"#/definitions/Article"},"type":"array"}}},"tags":["default"]}},"/spec":{"get":{"operationId":"get_spec","responses":{"200":{"description":"Success"}},"tags":["default"]}}},"produces":["application/json"],"responses":{"MaskError":{"description":"When any error occurs on mask"},"ParseError":{"description":"When a mask can't be parsed"}},"swagger":"2.0","tags":[{"description":"Default namespace","name":"default"}]}

But when we query the recommendation endpoint with other parameters, the request hangs, and on checking the pod we see a readiness probe failure:

# working within 15716.rec-api-staging-endpoint screen
kevinbazira@deploy2002:~$ curl 'https://recommendation-api-ng.k8s-ml-staging.discovery.wmnet:31443/api/?s=en&t=fr&n=3&article=Apple'

# working outside the screen
kevinbazira@deploy2002:~$ kube_env recommendation-api-ng ml-staging-codfw

kevinbazira@deploy2002:~$ kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
recommendation-api-ng-main-7b8659c6f7-46dpd   1/2     Running   0          5d13h

kevinbazira@deploy2002:~$ kubectl describe pod recommendation-api-ng-main-7b8659c6f7-46dpd
...
Readiness probe failed: Get "http://10.194.61.138:8080/api/spec": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

The goal of this task is to find out what is causing this internal endpoint failure and to identify a possible solution.

Event Timeline

Change 960681 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: fix recommendation-api-ng readiness probe failure

https://gerrit.wikimedia.org/r/960681

Change 960681 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: fix recommendation-api-ng readiness probe failure

https://gerrit.wikimedia.org/r/960681

Change 961509 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: increase recommendation-api-ng uwsgi workers

https://gerrit.wikimedia.org/r/961509

Investigation Report

Since I was unable to fully test the recommendation-api on wmflabs and LiftWing, I ran a couple of experiments to investigate the cause of the performance disparity between the instance hosted on wmflabs and the one on LiftWing, which is causing the internal endpoint to hang (as shown in this task's description). The steps I took and the results achieved are detailed below.

Query API, record duration, and develop hypothesis

Querying the recommendation-api hosted on wmflabs returned results within ~3 seconds:

$ time curl 'https://recommend.wmflabs.org/types/translation/v1/articles?source=en&target=fr&seed=Apple'
[{"pageviews": 136, "title": "Ceratotheca_sesamoides", "wikidata_id": "Q16751030", "rank": 499.0}, {"pageviews": 359, "title": "Balady_citron", "wikidata_id": "Q4849788", "rank": 497.0}, {"pageviews": 181, "title": "York_Imperial", "wikidata_id": "Q8055480", "rank": 496.0}, {"pageviews": 126, "title": "Bidni", "wikidata_id": "Q25095805", "rank": 492.0}, {"pageviews": 841, "title": "Ilama_(fruit)", "wikidata_id": "Q24849787", "rank": 487.0}, {"pageviews": 26, "title": "Christmas_tree_pests_and_weeds", "wikidata_id": "Q5111561", "rank": 484.0}, {"pageviews": 11, "title": "Kentville_Research_and_Development_Centre", "wikidata_id": "Q4816432", "rank": 483.0}, {"pageviews": 39, "title": "Apis_mellifera_pomonella", "wikidata_id": "Q310606", "rank": 480.0}, {"pageviews": 353, "title": "EverCrisp", "wikidata_id": "Q65057164", "rank": 478.0}, {"pageviews": 72, "title": "Splendour_(apple)", "wikidata_id": "Q19840849", "rank": 474.0}, {"pageviews": 27, "title": "Specific_replant_disease", "wikidata_id": "Q889601", "rank": 473.0}, {"pageviews": 1078, "title": "Butterfly_gardening", "wikidata_id": "Q3243966", "rank": 468.0}]

real	0m3.817s
user	0m0.049s
sys	0m0.016s

Since I couldn't run tests on LiftWing, I used the same configuration that we run on LiftWing and tested the recommendation-api in a Docker container on my local machine, which returned results within ~13 seconds:

$ time curl 'http://127.0.0.1/api/?s=en&t=fr&n=3&article=Apple'
[{"title": "Ceratotheca_sesamoides", "rank": 499.0, "pageviews": 136, "wikidata_id": "Q16751030"}, {"title": "Balady_citron", "rank": 497.0, "pageviews": 359, "wikidata_id": "Q4849788"}, {"title": "York_Imperial", "rank": 496.0, "pageviews": 181, "wikidata_id": "Q8055480"}]

real	0m13.885s
user	0m0.010s
sys	0m0.009s

Considering that a similar codebase runs on both wmflabs and the local container, my hypothesis was that the performance disparity was caused by the difference between the resources available to the local container and those available on the wmflabs servers.

Test the hypothesis by profiling the code and comparing resources in the local container vs the stat1008 server

Since I could not run code profiling on the recommendation-api instance hosted on wmflabs, I ran it on stat1008 to simulate wmflabs resources.

To test the hypothesis, I profiled the translation recommendation service in the local container and on stat1008 using the code below:

$ echo "
> import cProfile
> import pstats
> from recommendation.api.types.translation import translation
> profile = cProfile.Profile()
> profile.run(\"translation.recommend('en', 'fr', 'morelike', 'Apple', 400, include_pageviews=False)\")
> # profile.print_stats()
> results = pstats.Stats(profile)
> results.sort_stats('time')
> results.print_stats()
> " > profile_translation_recommendation.py
...
$ python profile_translation_recommendation.py > profile_results.log

Here are the top 20 code profiling results returned from the container in my local environment:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      130    5.775    0.044    5.775    0.044 {method 'acquire' of '_thread.lock' objects}
        3    1.499    0.500    1.499    0.500 {method 'do_handshake' of '_ssl._SSLSocket' objects}
        6    1.416    0.236    1.416    0.236 {method 'read' of '_ssl._SSLSocket' objects}
        3    0.753    0.251    0.753    0.251 {method 'connect' of '_socket.socket' objects}
        1    0.090    0.090    5.878    5.878 /home/recommendation-api/recommendation/api/external_data/wikidata.py:56(get_items)
        3    0.071    0.024    0.071    0.024 {built-in method getaddrinfo}
        3    0.068    0.023    0.068    0.023 {method 'load_verify_locations' of '_ssl._SSLContext' objects}
        3    0.006    0.002    0.008    0.003 /usr/local/lib/python3.4/site-packages/urllib3/connectionpool.py:407(close)
        1    0.005    0.005    9.774    9.774 <string>:1(<module>)
        6    0.004    0.001    0.007    0.001 /usr/local/lib/python3.4/email/feedparser.py:217(_parsegen)
        3    0.003    0.001    0.003    0.001 /usr/local/lib/python3.4/json/decoder.py:349(raw_decode)
        1    0.002    0.002    2.827    2.827 /home/recommendation-api/recommendation/api/types/translation/candidate_finders.py:27(get_morelike_candidates)
       13    0.002    0.000    0.002    0.000 {built-in method start_new_thread}
        6    0.002    0.000    0.002    0.000 {method 'set_ciphers' of '_ssl._SSLContext' objects}
      100    0.001    0.000    0.001    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
      222    0.001    0.000    0.001    0.000 {built-in method __new__ of type object at 0x7fe5518c2d40}
      501    0.001    0.000    0.001    0.000 /home/recommendation-api/recommendation/api/types/translation/recommendation.py:2(__init__)
      213    0.001    0.000    0.001    0.000 /usr/local/lib/python3.4/collections/__init__.py:62(__setitem__)
        3    0.001    0.000    0.001    0.000 {method 'peer_certificate' of '_ssl._SSLSocket' objects}
      102    0.001    0.000    0.003    0.000 /usr/local/lib/python3.4/urllib/parse.py:286(urlparse)

The resources available on my local container are:

root@49af7b8f623c:/home/recommendation-api# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 142
Model name:            Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
Stepping:              10
CPU MHz:               1021.928
CPU max MHz:           4200.0000
CPU min MHz:           400.0000
BogoMIPS:              4199.88
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities

Here are the top 20 code profiling results returned from a Python virtual env set up on stat1008 to run the rec-api:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      126    2.194    0.017    2.194    0.017 {method 'acquire' of '_thread.lock' objects}
        5    0.687    0.137    0.687    0.137 {method 'read' of '_ssl._SSLSocket' objects}
        1    0.057    0.057    2.259    2.259 /srv/home/kevinbazira/profile-rec-api/research-recommendation-api/recommendation/api/external_data/wikidata.py:56(get_items)
        3    0.028    0.009    0.028    0.009 {method 'load_verify_locations' of '_ssl._SSLContext' objects}
        3    0.006    0.002    0.006    0.002 {method 'do_handshake' of '_ssl._SSLSocket' objects}
        3    0.005    0.002    0.006    0.002 /srv/home/kevinbazira/profile-rec-api/rec-api-venv/lib/python3.7/site-packages/urllib3/connection.py:270(close)
        3    0.005    0.002    0.005    0.002 {built-in method _socket.getaddrinfo}
        1    0.004    0.004    3.037    3.037 <string>:1(<module>)
        3    0.002    0.001    0.002    0.001 /usr/lib/python3.7/json/decoder.py:343(raw_decode)
      251    0.002    0.000    0.002    0.000 {built-in method __new__ of type object at 0x81d8a0}
        3    0.001    0.000    0.001    0.000 {method 'connect' of '_socket.socket' objects}
       13    0.001    0.000    0.001    0.000 {built-in method _thread.start_new_thread}
        2    0.001    0.000    0.001    0.001 /srv/home/kevinbazira/profile-rec-api/research-recommendation-api/recommendation/api/external_data/fetcher.py:99(<listcomp>)
      119    0.001    0.000    0.001    0.000 {method 'search' of 're.Pattern' objects}
       18    0.001    0.000    0.001    0.000 /srv/home/kevinbazira/profile-rec-api/rec-api-venv/lib/python3.7/site-packages/urllib3/util/url.py:227(_encode_invalid_chars)
        2    0.001    0.000    0.001    0.000 {method 'decompress' of 'zlib.Decompress' objects}
        6    0.001    0.000    0.001    0.000 /usr/lib/python3.7/http/cookiejar.py:452(parse_ns_headers)
      102    0.001    0.000    0.002    0.000 /usr/lib/python3.7/urllib/parse.py:361(urlparse)
        1    0.001    0.001    0.680    0.680 /srv/home/kevinbazira/profile-rec-api/research-recommendation-api/recommendation/api/types/translation/candidate_finders.py:27(get_morelike_candidates)
     1738    0.001    0.000    0.001    0.000 {method 'lower' of 'str' objects}

The resources available on the stat1008 server:

(rec-api-venv) kevinbazira@stat1008:~/profile-rec-api/research-recommendation-api$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz
Stepping:            7
CPU MHz:             800.352
CPU max MHz:         3200.0000
CPU min MHz:         800.0000
BogoMIPS:            4200.00
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            11264K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

The results above demonstrate that the hypothesis was correct: the performance disparity was caused by my local configuration having fewer resources than the servers.

Conclusion

In conclusion, for the recommendation-api instance hosted on LiftWing to achieve the same fast performance as the one on wmflabs, we need to use resources similar to those on the stat server. Currently, I am not aware of the amount of resources available on LiftWing. However, I will engage the SREs to get more information on the resources we can configure on LiftWing.

@kevinbazira I really like the profiling that you did, but I think that the conclusion may not be what we are looking for. You are comparing a result obtained on your local machine (that connects to your Internet ISP with a certain latency etc..) vs a host that is running inside the WMF network. I assume that the HTTP URL that we are testing fetches data from the network, so this surely explains part of the differences that you are seeing.

One interesting thing that I noticed is the amount of time it takes to execute a similar bit of code:

Local env:

1    0.005    0.005    9.774    9.774 <string>:1(<module>)
6    0.004    0.001    0.007    0.001 /usr/local/lib/python3.4/email/feedparser.py:217(_parsegen)
3    0.003    0.001    0.003    0.001 /usr/local/lib/python3.4/json/decoder.py:349(raw_decode)
1    0.002    0.002    2.827    2.827 /home/recommendation-api/recommendation/api/types/translation/candidate_finders.py:27(get_morelike_candidates)

The profiling on stat1008 doesn't show the same profile, but I'd be curious to know why it takes so much time (~9s afaics) to execute that. Maybe we are heavily parsing JSON every time?

On stat1008 we have:

  1    0.004    0.004    3.037    3.037 <string>:1(<module>)
  3    0.002    0.001    0.002    0.001 /usr/lib/python3.7/json/decoder.py:343(raw_decode)
251    0.002    0.000    0.002    0.000 {built-in method __new__ of type object at 0x81d8a0}
  3    0.001    0.000    0.001    0.000 {method 'connect' of '_socket.socket' objects}
 13    0.001    0.000    0.001    0.000 {built-in method _thread.start_new_thread}
  2    0.001    0.000    0.001    0.001 /srv/home/kevinbazira/profile-rec-api/research-recommendation-api/recommendation/api/external_data/fetcher.py:99(<listcomp>)

Again some JSON decoding popping up, but probably the hw on stat1008 works faster and we see only 3s, which is still a lot anyway.

The other bit to work on, in my opinion, is:

/home/recommendation-api/recommendation/api/external_data/wikidata.py:56(get_items)

In both cases it seems that we spend a lot of time fetching data from Wikidata; maybe we can tune the HTTP connection pool used by uwsgi? Is it related to the number of workers, completely separate, etc..? Are we missing some config that was added on wmflabs?

@Isaac looping you in since you may be already aware of some bottlenecks :) Don't feel that you need to code or anything, if you could give us hints as you see updates coming we'd be really thankful :)

Thanks @elukey. Currently, these are the resources allocated to the recommendation-api pod. I am not aware of the amount of resources available on LiftWing. Is it possible to increase the resources allocated to the pod so that we can eliminate the possibility of resource constraints affecting the performance of the endpoint?

I'll push a patch to test this once you confirm.

We'll have to change the settings of the application in Lift Wing for sure; the point is that we should figure out what the bottleneck is. One possible first step could be:

  1. Increase the number of uwsgi workers to 2
  2. Increase the CPU allocation to 2

This will alleviate the problem, but not resolve it (what if we need more workers etc..). We should figure out where the bottleneck in the code is, and possibly improve it, before proceeding further.

Have you tried to check my suggestions in T347475#9220460? The profiling that you did is good, but we are missing the follow-ups.

Keep in mind that on k8s we have a minimal set of dependencies deployed, so we will not be able to perform any profiling in there. We should concentrate on local/stat1008 testing, and see if we can improve settings or code.

For example, how much time does it take to complete a request on stat1008? From the profiling IIUC it seems that we are around several seconds; you could keep that result as a baseline and try to see if changing settings etc.. could help (either code bottlenecks or uwsgi settings that we are missing etc..). Once we have a good improvement it will be clear what to add to Lift Wing :)

Change 961509 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: increase recommendation-api-ng uwsgi workers

https://gerrit.wikimedia.org/r/961509

Leaving some thoughts: as Kevin's profiling shows, the majority of the work is going on under the umbrella of /home/recommendation-api/recommendation/api/external_data/wikidata.py:56(get_items) (code). This function takes the set of related or popular articles in the source language (max of 500, and usually this high in source languages like English) and filters them down to only the articles that don't exist yet in the target language. It does this by calling the wbgetentities endpoint in batches of 50 articles via multiprocessing, so we're talking 10 API calls most of the time here. Options for how to reduce this:

  • Reduce the number of candidates checked from 500: If all 500 are needed, then doing them all at once via multiprocessing is obviously fastest. But if we can get our final recommendation set of 10 items with just, say, 100 candidates, then that's likely to be far faster. This should be simple to implement -- instead of this line with the single call on all 500 candidates, you instead chunk the candidates in perhaps sets of 100 with a basic for loop and stop building the list of recs when it hits, say, 5x of the count variable (see the sketch after this list). The reason not to stop as soon as you hit count recommendations is that you want some overhead so the sorting has more items to work with. I forget where, but there's also some randomness built in so two people making the same request don't get the same list, and that would also require more recs generated than you return. Trivially, you could also just remove all multiprocessing from the Wikidata get_items code and do it in sets of 50, but I'm not sure that's a great idea because it'll be quite slow when you do need to check e.g. 200 or 300 candidates.
  • Add more CPUs so the multiprocessing can handle all 10 API calls at once. Self-explanatory but expensive.
  • Reduce the size of the Wikidata API result objects: A while back we added code to allow for filtering the results list to just biographies of women (as part of a WikiGap campaign). That code still exists in the codebase even though it's no longer used. It was a reasonable patch at the time, but rolling it back I think is also fine as we think about how to do this in a more sustainable way. Most of the code is trivial (things like this), but the big change it made was requiring that we fetch each item's claims alongside sitelinks/urls. Removing that piece of the configuration should massively drop the size of the returned API objects and I suspect improve latency. The campaign was the only thing that used it, so I think it probably wouldn't create errors to just adjust the configuration to remove claims from the parameter, but you'd have to test to be sure.
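
A minimal sketch of the chunking idea from the first option above; fetch_chunk is a hypothetical stand-in for the existing wikidata.get_items() call, and the chunk size and overhead factor are illustrative, not the actual patch:

def filter_in_chunks(candidates, fetch_chunk, count, chunk_size=100, overhead=5):
    """Filter candidates in chunks, stopping once enough recommendations exist."""
    recs = []
    for start in range(0, len(candidates), chunk_size):
        # One batched Wikidata lookup per chunk instead of a single
        # call covering all 500 candidates at once.
        recs.extend(fetch_chunk(candidates[start:start + chunk_size]))
        # Keep some overhead (e.g. 5x count) so the sorting and built-in
        # randomness still have enough items to work with, then stop
        # issuing further API calls.
        if len(recs) >= overhead * count:
            break
    return recs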

@kevinbazira thanks a lot for all the thorough investigation!
@Isaac thanks a lot for your recommendations; they are really helpful.
In the codebase there is a script for profiling under recommendation/data/profile_performance.py.
After running it, the most important part is the following (which shows what you all have been saying already about wikidata.py):

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    5.367    5.367 /Users/isaranto/repoz/recommendation-api/recommendation/api/types/related_articles/related_articles.py:128(recommend)
     1    0.000    0.000    5.366    5.366 /Users/isaranto/repoz/recommendation-api/recommendation/api/types/related_articles/candidate_finder.py:24(get_candidates)
     3    0.000    0.000    4.903    1.634 /Users/isaranto/repoz/recommendation-api/recommendation/utils/logger.py:19(timed)
     3    0.016    0.005    4.903    1.634 /Users/isaranto/repoz/recommendation-api/recommendation/api/external_data/wikidata.py:56(get_items)
     3    0.000    0.000    4.885    1.628 /Users/isaranto/repoz/recommendation-api/recommendation/api/external_data/wikidata.py:71(chunk_query_for_parameter)
     2    0.000    0.000    4.236    2.118 /Users/isaranto/repoz/recommendation-api/recommendation/api/external_data/wikidata.py:47(get_titles_from_wikidata_items)
     1    0.000    0.000    0.667    0.667 /Users/isaranto/repoz/recommendation-api/recommendation/api/external_data/wikidata.py:42(get_wikidata_items_from_titles)
     1    0.000    0.000    0.425    0.425 /Users/isaranto/repoz/recommendation-api/recommendation/api/types/related_articles/candidate_finder.py:61(resolve_seed)
     1    0.000    0.000    0.425    0.425 /Users/isaranto/repoz/recommendation-api/recommendation/api/external_data/fetcher.py:83(wiki_search)
     1    0.000    0.000    0.423    0.423 /Users/isaranto/repoz/recommendation-api/recommendation/api/external_data/fetcher.py:11(get)

I suggest we cut the candidates down to half and also play around with the uwsgi settings as @elukey suggested above (which Kevin is already doing).
If these don't give us a serving time similar to the wmflabs instance, then we can focus on further improving the code, but not before that, as our initial purpose is just to port the app to LW. This is open for discussion of course, but improving the rec-api would be part of another task.

Thanks @isarantopoulos and @Isaac for the suggestions!

@elukey advised that we limit the recommendation-api resources to 2 uwsgi workers and 2 cpus. We implemented this, and when testing it in the staging environment, the message below was returned:

kevinbazira@deploy2002:~$ curl 'https://recommendation-api-ng.k8s-ml-staging.discovery.wmnet:31443/api/?s=en&t=fr&n=3&article=Apple'
upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: delayed connect error:

I used the same settings we set above and ran the container on the ml-sandbox as shown below:

# run in first terminal
$ ssh ml-sandbox.machine-learning.eqiad1.wikimedia.cloud
kevinbazira@ml-sandbox:~$ docker pull docker-registry.wikimedia.org/wikimedia/research-recommendation-api:stable
kevinbazira@ml-sandbox:~$ docker run -p 80:8080 -it --memory=1g  --cpus=2 --entrypoint=/bin/bash docker-registry.wikimedia.org/wikimedia/research-recommendation-api:stable
kevinbazira@ml-sandbox:~$ uwsgi --http :8080 --workers 2 --wsgi-file recommendation/data/recommendation.wsgi
...
# run in second terminal
kevinbazira@ml-sandbox:~$ time curl 'http://127.0.0.1/api/?s=en&t=fr&n=3&article=Apple'
[{"pageviews": 343, "title": "Treculia_africana", "wikidata_id": "Q2017779", "rank": 494.0}, {"pageviews": 24, "title": "Christmas_tree_pests_and_weeds", "wikidata_id": "Q5111561", "rank": 489.0}, {"pageviews": 1123, "title": "Banana_bunchy_top_virus", "wikidata_id": "Q4853861", "rank": 483.0}]

real	0m3.708s
user	0m0.014s
sys	0m0.005s

Having used similar resource limits on both the ml-sandbox and LiftWing, I wonder why the endpoint on LiftWing is still not working as expected.

I could see time-outs being related to resources, but the memory usage of the app should be incredibly low, so I wouldn't expect resources to be connected to other types of errors. The only other thing I could think of would be if the multiprocessing library didn't play well with the staging environment for some reason (code).

@kevinbazira on Lift Wing we also have an envoy proxy as TLS terminator, which handles TLS traffic on port 8443 and proxies to 8080 (that is the port bound by uwsgi). There is probably an issue between the two with the new settings.

@Isaac do you reckon if we could use multi-threading instead of multiprocessing? Are those all HTTP-like calls (hence preemptable) or are we talking about cpu bound code?

It's just API calls to Wikidata that can be slow to return given their size. They're calls to the wbgetentities endpoint via requests.post() calls.
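
For reference, a minimal standalone sketch of what one such batched lookup can look like; the site id, props value, and timeout below are assumptions based on this discussion, not the exact parameters used by the codebase:

import requests

def fetch_batch(titles, source_wiki="enwiki"):
    """Fetch Wikidata entities for a batch of titles in one wbgetentities call."""
    params = {
        "action": "wbgetentities",
        "sites": source_wiki,
        "titles": "|".join(titles),  # the API accepts at most 50 titles per call
        "props": "sitelinks/urls",   # adding "claims" here greatly inflates the response
        "format": "json",
    }
    # POST keeps long title lists out of the URL; 500 candidates means ~10 such calls.
    response = requests.post("https://www.wikidata.org/w/api.php", data=params, timeout=30)
    response.raise_for_status()
    return response.json().get("entities", {})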

Change 964629 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: enable the uwsgi master process for rec-api-ng

https://gerrit.wikimedia.org/r/964629

Change 964629 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: enable the uwsgi master process for rec-api-ng

https://gerrit.wikimedia.org/r/964629

@Isaac do you reckon if we could use multi-threading instead of multiprocessing? Are those all HTTP-like calls (hence preemptable) or are we talking about cpu bound code?

It's just API calls to Wikidata that can be slow to return given their size. They're calls to the wbgetentities endpoint via requests.post() calls.

After a closer look I think that we are already using a thread pool:

https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.dummy

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.

In particular, the Pool function provided by multiprocessing.dummy returns an instance of ThreadPool, which is a subclass of Pool that supports all the same method calls but uses a pool of worker threads rather than worker processes.

See this line. Does it make sense?
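
A minimal illustration of that pattern; despite the multiprocessing-style API, the pool below is backed by threads, which suits I/O-bound API calls since each thread spends most of its time waiting on the network rather than holding the GIL (fetch_batch is a hypothetical per-chunk fetch function):

from multiprocessing.dummy import Pool  # thread-backed drop-in for multiprocessing.Pool

def fetch_all(chunks, fetch_batch, num_threads=10):
    """Run one fetch per chunk concurrently on a pool of worker threads."""
    with Pool(num_threads) as pool:
        # map() blocks until every chunk's API call has returned.
        return pool.map(fetch_batch, chunks)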

Change 964859 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] ml-services: add listener for mw-api in the rec-api-ng's config

https://gerrit.wikimedia.org/r/964859

After a closer look I think that we are already using a thread pool:
https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.dummy

Yeah, I think you're right, though threading etc. has never been my strong suit. A few additional thoughts:

  • When I run it locally, I get this error (not sure how important this is): https://stackoverflow.com/questions/54042038/uswgi-unable-to-load-configuration-from-from-multiprocessing-semaphore-tracker
  • More long-term as we think about latency: I played around non-scientifically with the number of candidates (250 vs. 500) and with removing the claims component from the API calls, and I see the largest improvements from dropping claims. This does make some sense: the threading allows the API calls to not block each other, and most of the processing is happening on the Wikibase API side, so reducing the number of candidates doesn't have a major impact as even a single API call takes some time. Removing the claims from the call, however, greatly reduces the size of the response object and presumably the work done by the Wikibase API, so the responses are noticeably quicker. So I know you don't really want to mess with the core logic of the API, but that might be the sensible pathway here if adding resources and reducing the number of candidates doesn't have the desired impact.

Change 964859 merged by Elukey:

[operations/deployment-charts@master] ml-services: add listener for mw-api in the rec-api-ng's config

https://gerrit.wikimedia.org/r/964859

Following T347475#9235065, it was clear that the rec-api-ng instance on LiftWing doesn't work like the rec-api container outside LiftWing, despite having similar resources allocated. We did some troubleshooting with @elukey on IRC and discovered that the container on LiftWing cannot access the wikimedia, wikidata, and wikipedia endpoints. In T348607 we are configuring envoy to enable this instance to access endpoints external to k8s/LiftWing.

We resolved the envoy proxy issues as shown in T348607#9264192. However, the rec-api instance hosted on LiftWing is still not performing as expected. The container logs on staging show uwsgi HARAKIRI messages, i.e. worker processes being killed for exceeding the request timeout, while the rec-api is processing a large number of results (501) from the "morelike" external endpoint.

Wed Oct 18 10:32:52 2023 - *** HARAKIRI ON WORKER 1 (pid: 134, try: 1) ***
Wed Oct 18 10:32:52 2023 - HARAKIRI !!! worker 1 status !!!
Wed Oct 18 10:32:52 2023 - HARAKIRI [core 0] 127.0.0.1 - GET /api/?s=en&t=fr&n=3&article=Apple since 1697625156
Wed Oct 18 10:32:52 2023 - HARAKIRI !!! end of worker 1 status !!!
2023-10-18 10:32:52,799 recommendation.utils.event_logger log_api_request():39 INFO -- Logging event: {"schema": "TranslationRecommendationAPIRequests", "$schema": "/analytics/legacy/$translationrecommendationapirequests/1.0.0", "revision": 16261139, "event": {"timestamp": 1697625172, "sourceLanguage": "en", "targetLanguage": "fr", "seed": "Apple", "searchAlgorithm": "morelike"}, "webHost": "recommendation-api-ng.k8s-ml-staging.discovery.wmnet:31443", "client_dt": "2023-10-18T10:32:52.799807", "meta": {"stream": "eventlogging_TranslationRecommendationAPIRequests", "domain": "recommendation-api-ng.k8s-ml-staging.discovery.wmnet:31443"}}
2023-10-18 10:32:53,089 recommendation.api.types.translation.candidate_finders get_morelike_candidates():39 INFO -- morelike returned 501 results
DAMN ! worker 1 (pid: 134) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 1 (new pid: 183)
[pid: 183|app: 0|req: 8/35] 10.192.0.201 () {30 vars in 376 bytes} [Wed Oct 18 10:33:02 2023] GET /api/spec => generated 1908 bytes in 4 msecs (HTTP/1.1 200) 5 headers in 198 bytes (1 switches on core 0)
Wed Oct 18 10:33:08 2023 - *** HARAKIRI ON WORKER 2 (pid: 166, try: 1) ***
Wed Oct 18 10:33:08 2023 - HARAKIRI !!! worker 2 status !!!
Wed Oct 18 10:33:08 2023 - HARAKIRI [core 0] 127.0.0.1 - GET /api/?s=en&t=fr&n=3&article=Apple since 1697625172
Wed Oct 18 10:33:08 2023 - HARAKIRI !!! end of worker 2 status !!!
DAMN ! worker 2 (pid: 166) died, killed by signal 9 :( trying respawn ...
Respawned uWSGI worker 2 (new pid: 187)

We are going to experiment with different resource limits and monitor the rec-api-ng performance on staging to identify the optimal settings that will enable the API to return results and perform as expected.

Change 966830 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update rec-api-ng resource limits to match wmflabs

https://gerrit.wikimedia.org/r/966830

Change 966831 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[research/recommendation-api@master] Reduce the number candidates processed by the rec-api-ng

https://gerrit.wikimedia.org/r/966831

Change 966831 merged by jenkins-bot:

[research/recommendation-api@master] Reduce the number candidates processed by the rec-api-ng

https://gerrit.wikimedia.org/r/966831

Change 966833 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update the recommendation-api-ng image

https://gerrit.wikimedia.org/r/966833

Change 966833 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update the recommendation-api-ng image

https://gerrit.wikimedia.org/r/966833

Change 966830 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update recommendation-api-ng max_candidates

https://gerrit.wikimedia.org/r/966830

In T348607#9267956 we fixed the envoy listener for the pageviews endpoint. Now the rec-api-ng is able to access all external endpoints from k8s/LiftWing.

We also ran load tests on staging which showed that 4 cpus, 4 workers, and fewer than 500 candidates didn't add much improvement to the API performance:

results for 250 candidates, 4 cpus, and 4 workers

$ time curl 'https://recommendation-api-ng.k8s-ml-staging.discovery.wmnet:31443/api/?s=en&t=fr&n=3&article=Apple'
[{"pageviews": 0, "title": "Beetroot", "wikidata_id": "Q99548274", "rank": 243.0}, {"pageviews": 0, "title": "Malus_\u00d7_zumi", "wikidata_id": "Q5990804", "rank": 239.0}, {"pageviews": 0, "title": "PRI_disease_resistant_apple_breeding_program", "wikidata_id": "Q19840594", "rank": 237.0}]

real	0m2.658s
user	0m0.013s
sys	0m0.000s

results for 500 candidates, 2 cpus, and 2 workers

$ time curl 'https://recommendation-api-ng.k8s-ml-staging.discovery.wmnet:31443/api/?s=en&t=fr&n=3&article=Apple'
[{"pageviews": 0, "title": "Peanut_stunt_virus", "wikidata_id": "Q7157945", "rank": 496.0}, {"pageviews": 0, "title": "Kentville_Research_and_Development_Centre", "wikidata_id": "Q4816432", "rank": 495.0}, {"pageviews": 0, "title": "Specific_replant_disease", "wikidata_id": "Q889601", "rank": 494.0}]

real	0m2.141s
user	0m0.010s
sys	0m0.005s

The final rec-api-ng configuration that we've deployed on staging and prod uses 500 candidates, 2 cpus, and 2 workers.

@Isaac, the rec-api-ng endpoint now works as shown below. Please let us know whether there are edge cases we might have missed:

$ time curl "https://recommendation-api-ng.discovery.wmnet:31443/api/?s=en&t=fr&n=3&article=Basketball"
[{"pageviews": 0, "title": "2019_United_States_FIBA_Basketball_World_Cup_team", "wikidata_id": "Q56042822", "rank": 495.0}, {"pageviews": 0, "title": "Molly_Bolin", "wikidata_id": "Q27451749", "rank": 482.0}, {"pageviews": 0, "title": "Dom_Flora", "wikidata_id": "Q5289728", "rank": 478.0}]

real	0m1.597s
user	0m0.007s
sys	0m0.007s

thanks @kevinbazira ! That API call worked for me as well. A few notes:

  • When I add the &pageviews parameter, the pageviews values in the response all become null instead of populated. I'm not sure whether that's actually something you all want to fix or not -- aka whether it's easy to fix and whether Content Translation actually needs that or not. My guess is no, but they could tell you. I mainly mention it because perhaps it's related to the point below.
  • When I leave off the seed parameter, I'm not getting results. The Content Translation folks use this when a user shows up for general recommendations (not based on past edit history or a specific search). For instance, when I go to Content Translation on a wiki where I qualify for it (probably not English, but most other wikis will let you access it if you're signed in), you'll see something in the network tab like: https://recommend.wmflabs.org/types/translation/v1/articles?source=en&target=es&seed=&search=related_articles&application=CX. Replacing that with the internal endpoint version (curl "https://recommendation-api-ng.discovery.wmnet:31443/types/translation/v1/articles?source=en&target=es&seed=&search=related_articles&application=CX") returned an empty list. Not sure where it's breaking -- it's supposed to grab popular articles in the wiki when the seed isn't set (code).
  • Thanks for doing the comparison of 500 vs. 250 candidates! Seems that any latency improvements will have to come from a more in-depth consideration of existing features and what to retain.

Thank you for sharing the notes, @Isaac. We were able to reproduce this issue in T348607#9275075 and fixed it in T348607#9275348. The rec-api-ng now returns results whether or not the seed parameter is specified, as shown below:

results when seed is Apple

$ time curl "https://recommendation-api-ng.discovery.wmnet:31443/types/translation/v1/articles?source=en&target=es&seed=Apple&search=related_articles&application=CX"
[{"pageviews": 1411, "title": "Agriculture_in_California", "wikidata_id": "Q65090775", "rank": 498.0}, {"pageviews": 98, "title": "Splendour_(apple)", "wikidata_id": "Q19840849", "rank": 495.0}, {"pageviews": 878, "title": "EverCrisp", "wikidata_id": "Q65057164", "rank": 494.0}, {"pageviews": 346, "title": "Genetically_modified_tree", "wikidata_id": "Q17016206", "rank": 490.0}, {"pageviews": 42, "title": "Strawberry_vein_banding_virus", "wikidata_id": "Q7622526", "rank": 488.0}, {"pageviews": 26, "title": "Grossane", "wikidata_id": "Q3117044", "rank": 486.0}, {"pageviews": 884, "title": "Cider_in_the_United_States", "wikidata_id": "Q21028582", "rank": 481.0}, {"pageviews": 14, "title": "Pam's_Delight", "wikidata_id": "Q19598371", "rank": 479.0}, {"pageviews": 188, "title": "Boscia_senegalensis", "wikidata_id": "Q4947162", "rank": 476.0}, {"pageviews": 619, "title": "Rambo_apple", "wikidata_id": "Q3418406", "rank": 475.0}, {"pageviews": 635, "title": "Euonymus_hamiltonianus", "wikidata_id": "Q11340926", "rank": 474.0}, {"pageviews": 986, "title": "Butterfly_gardening", "wikidata_id": "Q3243966", "rank": 473.0}]

real	0m2.230s
user	0m0.006s
sys	0m0.009s

results when seed is not specified

$ time curl "https://recommendation-api-ng.discovery.wmnet:31443/types/translation/v1/articles?source=en&target=es&seed=&search=related_articles&application=CX"
[{"pageviews": 11635, "title": "Murder_of_Grace_Millane", "wikidata_id": "Q59600568", "rank": 489.0}, {"pageviews": 21420, "title": "Danielle_Lloyd", "wikidata_id": "Q2429311", "rank": 483.0}, {"pageviews": 17802, "title": "Joshua_Dobbs", "wikidata_id": "Q18637320", "rank": 469.0}, {"pageviews": 30073, "title": "Vikram_(2022_film)", "wikidata_id": "Q102147287", "rank": 465.0}, {"pageviews": 12294, "title": "Pigging", "wikidata_id": "Q338447", "rank": 459.0}, {"pageviews": 63601, "title": "Tiger_Nageswara_Rao", "wikidata_id": "Q118695775", "rank": 455.0}, {"pageviews": 19448, "title": "Kyle_Whittingham", "wikidata_id": "Q1795070", "rank": 441.0}, {"pageviews": 43083, "title": "Ravindra_Jadeja", "wikidata_id": "Q2721457", "rank": 440.0}, {"pageviews": 21970, "title": "57_Seconds", "wikidata_id": "Q111945315", "rank": 409.0}, {"pageviews": 14891, "title": "Patrick_Grant_(designer)", "wikidata_id": "Q7146595", "rank": 402.0}, {"pageviews": 17978, "title": "KL_Rahul", "wikidata_id": "Q6668646", "rank": 396.0}, {"pageviews": 35053, "title": "Shaun_King", "wikidata_id": "Q21012565", "rank": 377.0}]

real	0m3.849s
user	0m0.009s
sys	0m0.004s

Excellent, thanks @kevinbazira ! I'm also seeing that the semantics for &pageviews are that it defaults to true and any other value nulls it out, so that all looks good too.

calbon triaged this task as Medium priority. Nov 2 2023, 7:25 PM

Thank you all for your help with this issue. The rec-api-ng internal endpoint failures were resolved. We will continue to monitor this issue in case there are any further updates to the envoy settings that may affect the internal endpoints.