We are currently calling the MW API in kserve with blocking code, provided by the mwapi package. We should figure out whether we could use some other approach, like https://pypi.org/project/async-mediawiki/
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T272917 Lift Wing proof of concept
Resolved | | elukey | T296173 Load test the Lift Wing cluster
Resolved | | kevinbazira | T309623 Test async preprocess on kserve
Resolved | | achou | T313493 Add support for async session to python-mwapi
Event Timeline
While working on another task, I had to check some details of how revscoring currently handles HTTP connections to the MW API. For T301878 it would be nice to make a single async HTTP call to the MW API and share the results. As far as I can see, revscoring uses mwapi behind the scenes and it doesn't seem to be configurable. There is a cache parameter in the api extractor that seems to indicate that if we provide some cached results (like the feature list for a certain rev-id, etc.) the mwapi library may not be called, but after some tests with the revscoring lib on stat1004 I am not sure whether it is viable.
Ideally we could think about adding async support to revscoring, but it may be a long endeavor.
I tested an async call for the description edit model https://gitlab.wikimedia.org/mnz/description-edits/-/tree/liftwing locally with docker (non-transformer arch). I turned the get_revision function in preprocess into a coroutine by replacing mwapi with a Tornado async HTTP call. I tested the latency using wrk and it seems improved.
On stat1004 I created this small script:
```python
import revscoring
from revscoring import Model
from revscoring.extractors import api
import mwapi
import nltk
import logging

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
nltk.download('stopwords')

model_fd = open('model.bin', 'rb')
model = Model.load(model_fd)
s = mwapi.Session('https://en.wikipedia.org')
extractor = api.Extractor(s)
features = extractor.extract(123456, model.features)
print(list(features))
```
With debug logs, I can see the following entries:
```
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): en.wikipedia.org:443
DEBUG:urllib3.connectionpool:https://en.wikipedia.org:443 "GET /w/api.php?action=query&prop=revisions&revids=123456&rvslots=main&rvprop=comment%7Cuserid%7Ccontent%7Ctimestamp%7Cids%7Cuser%7Csize%7Ccontentmodel&format=json HTTP/1.1" 200 570
DEBUG:urllib3.connectionpool:https://en.wikipedia.org:443 "GET /w/api.php?action=query&prop=revisions&revids=123443&rvslots=main&rvprop=comment%7Cuserid%7Ccontent%7Ctimestamp%7Cids%7Cuser%7Csize%7Ccontentmodel&format=json HTTP/1.1" 200 575
DEBUG:urllib3.connectionpool:https://en.wikipedia.org:443 "GET /w/api.php?action=query&list=users&ususers=Fredbauder&usprop=gender%7Cregistration%7Ceditcount%7Cgroups&format=json HTTP/1.1" 200 187
```
In theory the above are all calls made to the MW API for a single API extract action (for a given rev-id, in this case 123456).
The api calls are all blocking, so when executed in the Tornado ioloop they will limit the concurrency of other users trying to score another rev-id for the same model.
This is the idea: if user1 wants to score rev-id 123456 for enwiki, they will cause several blocking mwapi HTTP calls to be made in the Tornado ioloop. If user2 and user3 come and ask for other scores, their preprocess code will cause other/similar HTTP calls via mwapi (all blocking). With async, while any user's HTTP call is waiting for the response from the MW API, any other HTTP call (from other users) can be scheduled in the Tornado ioloop, improving performance.
In our case everything is blocking, and once an HTTP call to the MW API is made, the Tornado ioloop needs to wait for the response before scheduling another HTTP call (unless we add more Tornado ioloops to the pod by tuning the workers parameter, but that forks more Python processes, etc.; not ideal if we can use async).
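To make the scheduling argument concrete, here is a minimal standalone sketch (not Lift Wing code) that uses asyncio.sleep as a stand-in for one MW API round-trip: run one after the other, the three users' calls take roughly three times the latency, while awaited concurrently they finish in roughly one.

```python
import asyncio
import time

LATENCY = 0.1  # stand-in for one MW API round-trip

async def fake_mwapi_call(user: str) -> str:
    # asyncio.sleep stands in for awaiting the HTTP response
    await asyncio.sleep(LATENCY)
    return f"score-for-{user}"

async def blocking_style() -> float:
    # One call must finish before the next is scheduled,
    # which is what blocking calls force on the ioloop.
    start = time.monotonic()
    for user in ("user1", "user2", "user3"):
        await fake_mwapi_call(user)
    return time.monotonic() - start

async def async_style() -> float:
    # While one call waits on I/O, the others make progress.
    start = time.monotonic()
    await asyncio.gather(*(fake_mwapi_call(u) for u in ("user1", "user2", "user3")))
    return time.monotonic() - start

sequential = asyncio.run(blocking_style())
concurrent = asyncio.run(async_style())
print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")
```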
As I was thinking through async, I noticed the feature extraction code is being called twice here: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/revscoring/editquality/model-server/model.py#L47-L48
This will be a good starting point to refactor and reduce the number of blocking calls we make.
Had a quick chat with @achou and we agreed on this. Will start from here as we continue to make this async.
One clarification: let's time-box the amount of time we spend trying to run revscoring in async mode. I am a little skeptical that we'll be able to make it, given the obstacles we are seeing, so we'll probably need to have/test multiple workers in kserve. This translates into a simpler architecture (still blocking, sadly) that forks Tornado ioloops in multiple processes, to increase parallelism a little for a single pod.
This may be a problem due to https://github.com/kserve/kserve/issues/1759: in kserve 0.7, our current version, we cannot fork more than one Tornado ioloop due to the aforementioned issue (kudos to Aiko for the finding). Migrating to kserve 0.8 is also problematic due to https://github.com/kserve/kserve/pull/1969, since the controller starts to use more up-to-date knative APIs (and we can't upgrade to knative 1.x until we upgrade k8s as well).
Maybe it could be possible to update kserve only in our Docker image, leaving the controller at 0.7, but compatibility may not be 100%.
Change 805388 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):
[machinelearning/liftwing/inference-services@main] editquality: refactor setting of the HTTP host header into its own method
https://www.mediawiki.org/wiki/ORES/Feature_injection#Feature_injection:_playing_with_what_ORES_sees is really interesting, we could try to figure out how it works and see if we can do the same (thanks to Aaron for the link).
Aiko and I investigated the possibility of moving mwapi to asyncio, but we concluded that it is not a viable road for us: revscoring would need to be changed to support an event loop and coroutines, and ORES would probably need changes too.
We can try to see if feature injection helps us, and if it doesn't we'll just live with performance bottlenecks on revscoring-based models.
I think that I got, more or less, how the ORES feature injection works (https://www.mediawiki.org/wiki/ORES/Feature_injection#Feature_injection:_playing_with_what_ORES_sees).
This is the example code that I have on stat1004:
```python
import revscoring
from revscoring import Model
from revscoring.extractors import api
import mwapi
import nltk
import logging

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
nltk.download('stopwords')

model_fd = open('model.bin', 'rb')
model = Model.load(model_fd)
s = mwapi.Session('https://en.wikipedia.org')
extractor = api.Extractor(s)

caches = {
    123456: {
        "feature.english.badwords.revision.diff.match_delta_decrease": 0,
        "feature.english.badwords.revision.diff.match_delta_increase": 0,
        "feature.english.badwords.revision.diff.match_delta_sum": 0,
        "feature.english.badwords.revision.diff.match_prop_delta_decrease": 0.0,
        "feature.english.badwords.revision.diff.match_prop_delta_increase": 0.0,
        "feature.english.badwords.revision.diff.match_prop_delta_sum": 0.0,
        "feature.english.dictionary.revision.diff.dict_word_delta_decrease": -2,
        "feature.english.dictionary.revision.diff.dict_word_delta_increase": 2,
        "feature.english.dictionary.revision.diff.dict_word_delta_sum": 0,
        "feature.english.dictionary.revision.diff.dict_word_prop_delta_decrease": -2.0,
        "feature.english.dictionary.revision.diff.dict_word_prop_delta_increase": 2.0,
        "feature.english.dictionary.revision.diff.dict_word_prop_delta_sum": 0.0,
        "feature.english.dictionary.revision.diff.non_dict_word_delta_decrease": 0,
        "feature.english.dictionary.revision.diff.non_dict_word_delta_increase": 0,
        "feature.english.dictionary.revision.diff.non_dict_word_delta_sum": 0,
        "feature.english.dictionary.revision.diff.non_dict_word_prop_delta_decrease": 0.0,
        "feature.english.dictionary.revision.diff.non_dict_word_prop_delta_increase": 0.0,
        "feature.english.dictionary.revision.diff.non_dict_word_prop_delta_sum": 0.0,
        "feature.english.informals.revision.diff.match_delta_decrease": 0,
        "feature.english.informals.revision.diff.match_delta_increase": 0,
        "feature.english.informals.revision.diff.match_delta_sum": 0,
        "feature.english.informals.revision.diff.match_prop_delta_decrease": 0.0,
        "feature.english.informals.revision.diff.match_prop_delta_increase": 0.0,
        "feature.english.informals.revision.diff.match_prop_delta_sum": 0.0,
        "feature.len(<datasource.tokenized(datasource.revision.parent.text)>)": 223.0,
        "feature.len(<datasource.tokenized(datasource.revision.text)>)": 223.0,
        "feature.len(<datasource.wikitext.revision.markups>)": 16.0,
        "feature.len(<datasource.wikitext.revision.parent.markups>)": 16.0,
        "feature.len(<datasource.wikitext.revision.parent.uppercase_words>)": 0.0,
        "feature.len(<datasource.wikitext.revision.parent.words>)": 100.0,
        "feature.len(<datasource.wikitext.revision.words>)": 100.0,
        "feature.revision.comment.has_link": False,
        "feature.revision.comment.suggests_section_edit": False,
        "feature.revision.diff.longest_new_repeated_char": 1,
        "feature.revision.diff.longest_new_token": 1,
        "feature.revision.page.is_articleish": True,
        "feature.revision.page.is_draftspace": False,
        "feature.revision.page.is_mainspace": True,
        "feature.revision.user.has_advanced_rights": False,
        "feature.revision.user.is_admin": False,
        "feature.revision.user.is_anon": False,
        "feature.revision.user.is_bot": False,
        "feature.revision.user.is_curator": False,
        "feature.revision.user.is_patroller": False,
        "feature.revision.user.is_trusted": False,
        "feature.temporal.revision.user.seconds_since_registration": 12696497,
        "feature.wikitext.revision.chars": 621.0,
        "feature.wikitext.revision.diff.markup_delta_decrease": 0.0,
        "feature.wikitext.revision.diff.markup_delta_increase": 0.0,
        "feature.wikitext.revision.diff.markup_delta_sum": 0.0,
        "feature.wikitext.revision.diff.markup_prop_delta_decrease": 0.0,
        "feature.wikitext.revision.diff.markup_prop_delta_increase": 0.0,
        "feature.wikitext.revision.diff.markup_prop_delta_sum": 0.0,
        "feature.wikitext.revision.diff.number_delta_decrease": 0.0,
        "feature.wikitext.revision.diff.number_delta_increase": 0.0,
        "feature.wikitext.revision.diff.number_delta_sum": 0.0,
        "feature.wikitext.revision.diff.number_prop_delta_decrease": 0.0,
        "feature.wikitext.revision.diff.number_prop_delta_increase": 0.0,
        "feature.wikitext.revision.diff.number_prop_delta_sum": 0.0,
        "feature.wikitext.revision.diff.uppercase_word_delta_decrease": 0.0,
        "feature.wikitext.revision.diff.uppercase_word_delta_increase": 0.0,
        "feature.wikitext.revision.diff.uppercase_word_delta_sum": 0.0,
        "feature.wikitext.revision.diff.uppercase_word_prop_delta_decrease": 0.0,
        "feature.wikitext.revision.diff.uppercase_word_prop_delta_increase": 0.0,
        "feature.wikitext.revision.diff.uppercase_word_prop_delta_sum": 0.0,
        "feature.wikitext.revision.external_links": 0.0,
        "feature.wikitext.revision.headings": 0.0,
        "feature.wikitext.revision.parent.chars": 622.0,
        "feature.wikitext.revision.parent.external_links": 0.0,
        "feature.wikitext.revision.parent.headings": 0.0,
        "feature.wikitext.revision.parent.ref_tags": 0.0,
        "feature.wikitext.revision.parent.tags": 1.0,
        "feature.wikitext.revision.parent.templates": 0.0,
        "feature.wikitext.revision.parent.wikilinks": 7.0,
        "feature.wikitext.revision.ref_tags": 0.0,
        "feature.wikitext.revision.tags": 1.0,
        "feature.wikitext.revision.templates": 0.0,
        "feature.wikitext.revision.wikilinks": 7.0
    }
}

features = extractor.extract(123456, model.features, caches=caches)
print("\n\nRESULTS\n\n")
for feature in features:
    print(feature)
```
The model.bin is s3://wmf-ml-models/goodfaith/enwiki/20220214192144/model.bin, while the list of features was retrieved from https://ores.wikimedia.org/v3/scores/enwiki/123456/goodfaith?features. With DEBUG logging enabled, I don't see any calls to the MW API when all the features are injected.
The main question: is the feature list consistent, for a given model, between scores? Namely, is there any chance that the list of features differs from one revision to another (for the same model, of course)? I'd say no, but I'll follow up with Aaron.
If the above is true, then we could do the following:
- For every model, we get the list of features that the api extractor would fetch from the mwapi.
- We call the MW API asynchronously from preprocess, fetching what we need.
- We then pass the features to predict.
The above, in combination with ray workers, should give us a serious performance boost in my opinion. It could also open the way to having a full and separate transformer, but we'd need to look more into this.
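A minimal sketch of what the per-model flow above could look like as a preprocess/predict pair. This is hypothetical: fetch_mw_data, MODEL_FEATURES and the cache shape are illustrative stand-ins, not the real model-server code.

```python
import asyncio

# Hypothetical feature list a model's api extractor would need.
MODEL_FEATURES = ["feature.revision.user.is_anon", "feature.wikitext.revision.chars"]

async def fetch_mw_data(rev_id: int) -> dict:
    # Stand-in for the async MW API calls; returns the values the
    # extractor would otherwise fetch itself with blocking HTTP.
    await asyncio.sleep(0)  # simulate awaiting the HTTP responses
    return {"feature.revision.user.is_anon": False,
            "feature.wikitext.revision.chars": 621.0}

async def preprocess(inputs: dict) -> dict:
    # Step 1+2: fetch what we need from the MW API asynchronously.
    rev_id = inputs["rev_id"]
    cache = await fetch_mw_data(rev_id)
    return {"rev_id": rev_id, "cache": cache}

def predict(request: dict) -> dict:
    # Step 3: with the features injected, no blocking HTTP calls remain.
    features = [request["cache"][name] for name in MODEL_FEATURES]
    return {"rev_id": request["rev_id"], "features": features}

result = predict(asyncio.run(preprocess({"rev_id": 123456})))
print(result)
```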
@achou @kevinbazira @klausman @calbon thoughts?
Reporting a summary of what has been discussed over IRC. The extractor calculates most of the above features, so one way forward could be to give it a sort of http_cache parameter able to cache raw results from the MW API.
I investigated the async-mediawiki (aiowiki) library. I feel it is not a suitable replacement for the mwapi package. A few reasons:
- the purpose of this library is not to be a generic tool for the MediaWiki API, so it only supports a limited set of hard-coded queries to fetch certain data types: pages, wikitext, summary, images (see https://github.com/Gelbpunkt/aiowiki/blob/34f24bdcaedfd5da96cb6125714505945211287c/aiowiki/http.py#L91, https://github.com/Gelbpunkt/aiowiki/blob/34f24bdcaedfd5da96cb6125714505945211287c/aiowiki/http.py#L103, https://github.com/Gelbpunkt/aiowiki/blob/34f24bdcaedfd5da96cb6125714505945211287c/aiowiki/http.py#L115, https://github.com/Gelbpunkt/aiowiki/blob/34f24bdcaedfd5da96cb6125714505945211287c/aiowiki/http.py#L161)
- it has very different usage from mwapi, see aiowiki example
- it doesn't support automatic query continuation: when not all the data is returned in the response of a query, there is a continue attribute indicating that there is more data, and to get the rest one should add its values to the original request (see the mwapi example).
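For reference, the continuation protocol described in the last point can be sketched as follows, with a faked session standing in for mwapi.Session: the values of each response's continue attribute are merged into the next request until the attribute disappears.

```python
# Faked two-page response sequence standing in for the MW API.
PAGES = [
    {"query": {"allpages": [{"title": "A"}, {"title": "B"}]},
     "continue": {"apcontinue": "C", "continue": "-||"}},
    {"query": {"allpages": [{"title": "C"}]}},
]

class FakeSession:
    """Stand-in for an mwapi session; replays canned responses."""
    def __init__(self, responses):
        self._responses = iter(responses)

    def get(self, **params):
        return next(self._responses)

def get_all(session, **params):
    """Follow 'continue' until the API stops returning it."""
    while True:
        doc = session.get(**params)
        yield doc
        if "continue" not in doc:
            break
        # Merge the continue values into the next request.
        params.update(doc["continue"])

titles = [p["title"]
          for doc in get_all(FakeSession(PAGES), action="query", list="allpages")
          for p in doc["query"]["allpages"]]
print(titles)  # ['A', 'B', 'C']
```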
I'd be inclined to add async support to mwapi so that it can be used by future models on Lift Wing.
@kevinbazira any thought?
@achou thank you for digging into the async-mediawiki library. Following yesterday's chat in the meeting, I wonder whether we would benefit more from adding async/await to the preprocess method in model.py or from adding async support to the mwapi library, considering we wouldn't want to disrupt other mwapi library users that don't have our specific use case.
@kevinbazira As I showed in the meeting, there will be an AsyncSession class for the async use case, so it won't disrupt other mwapi library users who are using the original Session objects. I created a pull request for mwapi https://github.com/mediawiki-utilities/python-mwapi/pull/46, please review it and suggestions are welcome :)
One thing to remember is that once async is used in model.py (so a function turns into a coroutine), the code that it calls should be non-blocking as well, otherwise we won't get any benefit. In this case, we have three main actors:
- model.py needs to set preprocess as async.
- the preprocess function needs to await the MediaWiki calls that retrieve feature data, but at the moment this is not possible; Aiko is trying to resolve this via https://github.com/mediawiki-utilities/python-mwapi/pull/46. Right now it is revscoring that calls the MW API directly via the mwapi code, which is blocking.
- we need to inject the above HTTP call results into revscoring's api extractor, which will require a small code change. This way revscoring doesn't need to be aware of things being async, and it doesn't need to call the MW API itself. This should remove the blocking HTTP API calls and let our pods scale better (in theory).
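Since the MW API requests needed for a score (the revision, its parent, and the user) are independent, the async preprocess could issue them concurrently. A hedged sketch with a faked session (build_http_cache and FakeAsyncSession are illustrative names; the real code would use the AsyncSession from the mwapi pull request):

```python
import asyncio

class FakeAsyncSession:
    """Stand-in for an async MW API session; returns canned docs."""
    async def get(self, **params):
        await asyncio.sleep(0)  # simulate awaiting the HTTP response
        return {"params": params}

async def build_http_cache(session, rev_id: int, parent_id: int, user: str) -> dict:
    # The three requests are independent, so gather() runs them concurrently.
    rev_doc, parent_doc, user_doc = await asyncio.gather(
        session.get(action="query", prop="revisions", revids=rev_id),
        session.get(action="query", prop="revisions", revids=parent_id),
        session.get(action="query", list="users", ususers=user),
    )
    # Shape the results the way the extractor's cache expects them.
    return {"revisions": {(rev_id,): rev_doc, (parent_id,): parent_doc},
            "users": {(user,): user_doc}}

cache = asyncio.run(build_http_cache(FakeAsyncSession(), 123456, 123443, "Fredbauder"))
print(sorted(cache))  # ['revisions', 'users']
```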
Another experiment that I made in these days, namely adding a simple HTTP cache to revscoring: https://github.com/elukey/revscoring/commit/aaa8a59c6f25ff9ba5c6ac718010e0837cdd8d3d
```python
import revscoring
from revscoring import Model
from revscoring.extractors import api
import mwapi
import nltk
import logging

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
nltk.download('stopwords')

model_fd = open('model.bin', 'rb')
model = Model.load(model_fd)
s = mwapi.Session('https://en.wikipedia.org')
extractor = api.Extractor(s)

http_cache = {
    "revisions": {
        (123456,): {'batchcomplete': '', 'query': {'pages': {'64216': {'pageid': 64216, 'ns': 0, 'title': 'Basic training', 'revisions': [{'revid': 123456, 'parentid': 123443, 'user': 'Fredbauder', 'userid': 744, 'timestamp': '2002-07-25T05:02:10Z', 'size': 621, 'slots': {'main': {'contentmodel': 'wikitext', 'contentformat': 'text/x-wiki', '*': "'''Basic Training''' or bootcamp is a short intensive program for induction of recruits into an [[army]] or [[navy]]. In the [[United States]] a few military bases have special units devoted to basic training. The [[jargon]] of the service is introduced as well as the fundamentals of military discipline and the recruits are trained in the basic skills of their service. A great deal of emphasis is placed on proper wearing of the uniform, grooming, and drill.\r\n\r\nArmy recruits are often instructed in the firing and care of [[gun|guns]]; Navy recruits get a short introduction to [[fire fighting]].\r\n\r\nSee [[saltpetre]]"}}, 'comment': ''}]}}}},
        (123443,): {'batchcomplete': '', 'query': {'pages': {'64216': {'pageid': 64216, 'ns': 0, 'title': 'Basic training', 'revisions': [{'revid': 123443, 'parentid': 0, 'user': 'Fredbauder', 'userid': 744, 'timestamp': '2002-07-25T05:00:59Z', 'size': 622, 'slots': {'main': {'contentmodel': 'wikitext', 'contentformat': 'text/x-wiki', '*': "'''Basic Training''' or bootcamp is a short intensive program for induction of recruits into an [[army]] or [[navy]]. In the [[United States]] a few military bases have special units devoted to basic training. The [[jargon]] of the service is introduced as well as the fundamentals of military discipline and the recruits are trained in the basic skills of their service. A great deal of emphasis is place upon proper wearing of the uniform, grooming, and drill.\r\n\r\nArmy recruits are often instructed in the firing and care of [[gun|guns]]; Navy recruits get a short introduction to [[fire fighting]].\r\n\r\nSee [[saltpetre]]"}}, 'comment': 'basic indeed'}]}}}}
    },
    "users": {
        ('Fredbauder',): {'batchcomplete': '', 'query': {'users': [{'userid': 744, 'name': 'Fredbauder', 'editcount': 2319, 'registration': '2002-02-28T06:13:53Z', 'groups': ['*', 'user', 'autoconfirmed'], 'gender': 'unknown'}]}}
    }
}

features = extractor.extract(123456, model.features, http_cache=http_cache)
print("\n\nRESULTS\n\n")
for feature in features:
    print(feature)
```
The above code works nicely (with the updated version of revscoring, of course): no HTTP call is made, whereas before the change there were 3 MW API HTTP calls plus the initial attempt to connect to the MW API from urllib. In theory we could use something like the above in kserve's preprocess to populate the http_cache via Aiko's mwapi AsyncSession module, and then inject the results directly into revscoring (which will not make any blocking calls to the API).
New version (still missing tests, will add them tomorrow): https://github.com/elukey/revscoring/commit/63a7fd1ec6040a5d11c5cae033b2baf13fa1076e
```python
import revscoring
from revscoring import Model
from revscoring.extractors import api
from revscoring.extractors.api import MWAPICache
import mwapi
import nltk
import logging

logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
nltk.download('stopwords')

model_fd = open('model.bin', 'rb')
model = Model.load(model_fd)
s = mwapi.Session('https://en.wikipedia.org')
extractor = api.Extractor(s)

http_cache = MWAPICache()
http_cache.add_revisions_batch_doc([123456], {'batchcomplete': '', 'query': {'pages': {'64216': {'pageid': 64216, 'ns': 0, 'title': 'Basic training', 'revisions': [{'revid': 123456, 'parentid': 123443, 'user': 'Fredbauder', 'userid': 744, 'timestamp': '2002-07-25T05:02:10Z', 'size': 621, 'slots': {'main': {'contentmodel': 'wikitext', 'contentformat': 'text/x-wiki', '*': "'''Basic Training''' or bootcamp is a short intensive program for induction of recruits into an [[army]] or [[navy]]. In the [[United States]] a few military bases have special units devoted to basic training. The [[jargon]] of the service is introduced as well as the fundamentals of military discipline and the recruits are trained in the basic skills of their service. A great deal of emphasis is placed on proper wearing of the uniform, grooming, and drill.\r\n\r\nArmy recruits are often instructed in the firing and care of [[gun|guns]]; Navy recruits get a short introduction to [[fire fighting]].\r\n\r\nSee [[saltpetre]]"}}, 'comment': ''}]}}}})
http_cache.add_revisions_batch_doc([123443], {'batchcomplete': '', 'query': {'pages': {'64216': {'pageid': 64216, 'ns': 0, 'title': 'Basic training', 'revisions': [{'revid': 123443, 'parentid': 0, 'user': 'Fredbauder', 'userid': 744, 'timestamp': '2002-07-25T05:00:59Z', 'size': 622, 'slots': {'main': {'contentmodel': 'wikitext', 'contentformat': 'text/x-wiki', '*': "'''Basic Training''' or bootcamp is a short intensive program for induction of recruits into an [[army]] or [[navy]]. In the [[United States]] a few military bases have special units devoted to basic training. The [[jargon]] of the service is introduced as well as the fundamentals of military discipline and the recruits are trained in the basic skills of their service. A great deal of emphasis is place upon proper wearing of the uniform, grooming, and drill.\r\n\r\nArmy recruits are often instructed in the firing and care of [[gun|guns]]; Navy recruits get a short introduction to [[fire fighting]].\r\n\r\nSee [[saltpetre]]"}}, 'comment': 'basic indeed'}]}}}})
http_cache.add_users_batch_doc(['Fredbauder'], {'batchcomplete': '', 'query': {'users': [{'userid': 744, 'name': 'Fredbauder', 'editcount': 2319, 'registration': '2002-02-28T06:13:53Z', 'groups': ['*', 'user', 'autoconfirmed'], 'gender': 'unknown'}]}})

features = extractor.extract(123456, model.features, http_cache=http_cache)
print("\n\nRESULTS\n\n")
for feature in features:
    print(feature)
```
This should be easier to use and to understand, lemme know :)
This has been worked on in various tasks; we decided to:
- add async support for the mwapi package
- move the outlink code (preprocess) to it
- move the revscoring model.py code (preprocess) to it, also using the MWAPICache
We tested this and realized that async in preprocess works (namely, scales) much better than regular blocking calls.
Change 805388 abandoned by Kevin Bazira:
[machinelearning/liftwing/inference-services@main] editquality: refactor setting of the HTTP host header into its own method
Reason:
refactored revscoring model servers into one common class