JSON became the default serializer in Celery 4, and Celery experts recommend it both for security reasons and for easier debugging (pickle is a binary format).
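To illustrate both points, here is a minimal sketch using only the standard library. The `task_args` payload and the `Surprise` class are made up for illustration and are not taken from ORES.

```python
import json
import pickle

# Hypothetical task arguments; the real ORES payload differs.
task_args = {"rev_id": 34568, "models": ["damaging", "goodfaith"]}

as_json = json.dumps(task_args, sort_keys=True)
as_pickle = pickle.dumps(task_args)

# JSON is plain text, so a queued message can be read straight from a log.
print(as_json)

# Pickle is an opaque byte string that has to be unpickled to be inspected.
print(type(as_pickle).__name__)


class Surprise:
    """Illustrative only: an object that asks pickle to call print() on load."""

    def __reduce__(self):
        return (print, ("executed during pickle.loads",))


# Unpickling calls whatever callable the producer of the bytes chose.
# json.loads never calls anything; it only builds plain data structures.
pickle.loads(pickle.dumps(Surprise()))
```

A worker that accepts pickle therefore has to trust everything that can write to the broker, which is the security argument for switching.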
Status | Assigned | Task
---|---|---
Resolved | Ladsgroup | T178441 Upgrade celery to 4.1.0 for ORES
Resolved | Ladsgroup | T206333 Change default serializer of celery from pickle to json
Resolved | Ladsgroup | T206334 Make ScoreReqeust json seriablizable
Resolved | Halfak | T215406 Rebuild all models with revscoring 2.3.3
Resolved | Ladsgroup | T215671 Implement better defaults for autolabel utlity
Event Timeline
We need to do this but it got deprioritized over very much needed operational improvements.
Mentioned in SAL (#wikimedia-operations) [2018-11-28T10:11:25Z] <ladsgroup@deploy1001> Started deploy [ores/deploy@9b9ba06]: T206333
Mentioned in SAL (#wikimedia-operations) [2018-11-28T10:26:13Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@9b9ba06]: T206333 (duration: 14m 48s)
The task tracker needs more work:
Traceback (most recent call last):
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 50, in _reraise_errors
    yield
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/utils/json.py", line 69, in dumps
    **dict(default_kwargs, **kwargs))
  File "/usr/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/utils/json.py", line 59, in default
    return super(JSONEncoder, self).default(o)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: {<datasource.revision.parent.text>, <datasource.revision.page.namespace.id>, <datasource.revision.user.info.groups>, <datasource.revision.comment>, <datasource.revision.timestamp>, <datasource.revision.user.id>, <datasource.revision.user.info.registration>, <datasource.revision.text>} is not JSON serializable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./ores/wsgi/routes/v3/util.py", line 101, in process_score_request
    score_response = scoring_system.score(score_request)
  File "./ores/scoring_systems/scoring_system.py", line 59, in score
    response = self._score(request)
  File "./ores/scoring_systems/celery_queue.py", line 195, in _score
    return super()._score(*args, **kwargs)
  File "./ores/scoring_systems/scoring_system.py", line 118, in _score
    inprogress_results=inprogress_results)
  File "./ores/scoring_systems/celery_queue.py", line 100, in _process_missing_scores
    request.to_json(), list(missing_models), rev_id, root_cache)
  File "/srv/ores/venv/lib/python3.5/site-packages/celery/app/task.py", line 412, in delay
    return self.apply_async(args, kwargs)
  File "/srv/ores/venv/lib/python3.5/site-packages/celery/app/task.py", line 535, in apply_async
    **options
  File "/srv/ores/venv/lib/python3.5/site-packages/celery/app/base.py", line 728, in send_task
    amqp.send_task_message(P, name, message, **options)
  File "/srv/ores/venv/lib/python3.5/site-packages/celery/app/amqp.py", line 552, in send_task_message
    **properties
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/messaging.py", line 169, in publish
    compression, headers)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/messaging.py", line 252, in _prepare
    body) = dumps(body, serializer=serializer)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 54, in _reraise_errors
    reraise(wrapper, wrapper(exc), sys.exc_info()[2])
  File "/srv/ores/venv/lib/python3.5/site-packages/vine/five.py", line 178, in reraise
    raise value.with_traceback(tb)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 50, in _reraise_errors
    yield
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/utils/json.py", line 69, in dumps
    **dict(default_kwargs, **kwargs))
  File "/usr/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/utils/json.py", line 59, in default
    return super(JSONEncoder, self).default(o)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
kombu.exceptions.EncodeError: {<datasource.revision.parent.text>, <datasource.revision.page.namespace.id>, <datasource.revision.user.info.groups>, <datasource.revision.comment>, <datasource.revision.timestamp>, <datasource.revision.user.id>, <datasource.revision.user.info.registration>, <datasource.revision.text>} is not JSON serializable
But the result backend can definitely go JSON, and the serialized results are actually smaller.
Change 477289 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] ores: Use json for result serializer
Change 477302 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/puppet@production] ores: Change result serializer to json
Change 477289 merged by Alexandros Kosiaris:
[operations/puppet@production] ores: Accept json for result serializer
Change 477302 abandoned by Ladsgroup:
ores: Change result serializer to json
Reason:
It's being done through deployment.
Mentioned in SAL (#wikimedia-operations) [2018-12-17T11:48:20Z] <ladsgroup@deploy1001> Started deploy [ores/deploy@18d3657]: T206333 T211267
Mentioned in SAL (#wikimedia-operations) [2018-12-17T12:02:34Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@18d3657]: T206333 T211267 (duration: 14m 14s)
The last bit is root_caches, which is a dictionary of its own:
{'datasource.revision.text': "Lots of text", 'datasource.revision.user.id': 66, 'datasource.revision.user.info.doc': {'groups': ['extendedconfirmed', '*', 'user', 'autoconfirmed'], 'editcount': 95329, 'registration': '2001-08-21T08:34:39Z', 'name': 'Bryan Derksen', 'gender': 'male', 'userid': 66}, 'datasource.revision.comment': 'homo sapiens link', 'datasource.revision.id': 34568, 'datasource.extractor.dependents': {<datasource.revision.text>, <datasource.revision.parent.text>, <datasource.revision.user.id>, <datasource.revision.comment>, <datasource.revision.user.info.registration>, <datasource.revision.timestamp>, <datasource.revision.page.namespace.id>, <datasource.revision.user.info.groups>}, 'datasource.revision.doc': {'revid': 34568, 'page': {'ns': 0, 'title': 'Neanderthal', 'pageid': 27298083},'slots': {'main': {'contentformat': 'text/x-wiki', '*': "Lots of text", 'contentmodel': 'wikitext'}}, 'user': 'Bryan Derksen', 'timestamp': '2002-03-21T18:03:09Z', 'comment': 'homo sapiens link', 'parentid': 34566, 'size': 2087, 'userid': 66}, 'datasource.revision.user.info.registration': Timestamp('2001-08-21T08:34:39Z'), 'datasource.revision.timestamp': Timestamp('2002-03-21T18:03:09Z'), 'datasource.revision.page.namespace.id': 0, 'datasource.revision.parent.doc': {'revid': 34566, 'page': {'ns': 0, 'title': 'Neanderthal', 'pageid': 27298083}, 'slots': {'main': {'contentformat': 'text/x-wiki', '*': "Lots of text"}
Three things here are not json-able: datasource.revision.user.info.registration and datasource.revision.timestamp (both Timestamp objects), and datasource.extractor.dependents (a set of datasource objects). I will give it a try.
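For reference, the problem can be reproduced with the standard library alone. In this sketch `datetime` stands in for the Timestamp type and plain strings stand in for the datasource objects; both substitutions, and the `to_jsonable` helper, are assumptions made for illustration rather than the actual revscoring change.

```python
import json
from datetime import datetime, timezone

# Stand-in cache: datetime replaces Timestamp, strings replace datasources.
root_cache = {
    "datasource.revision.id": 34568,
    "datasource.revision.timestamp": datetime(
        2002, 3, 21, 18, 3, 9, tzinfo=timezone.utc
    ),
    "datasource.extractor.dependents": {
        "datasource.revision.text",
        "datasource.revision.comment",
    },
}

# The stock JSON encoder rejects both the datetime and the set.
try:
    json.dumps(root_cache)
    encodable = True
except TypeError:
    encodable = False


def to_jsonable(value):
    """Hypothetical helper: map unsupported values to JSON-friendly ones."""
    if isinstance(value, datetime):
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, (set, frozenset)):
        return sorted(value)
    return value


clean_cache = {key: to_jsonable(value) for key, value in root_cache.items()}
encoded = json.dumps(clean_cache, sort_keys=True)
```

Converting before the cache is handed to Celery keeps the task arguments limited to strings, numbers, lists and dicts, at the cost of having to turn lists back into sets on the worker side where set semantics are needed.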
Merged the revscoring change, but I moved this back to "Active" because I guess we'll need to test in ORES.
I'm rebuilding the articlequality models and putting them in another branch. I will deploy them on ores-staging and change the config to see how things work.
I rebuilt all the models, uploaded them to ores-staging and changed the serialization to json, but it's still not working. Something is missing here; I'm debugging.
Okay, it seems I made some progress:
Traceback (most recent call last):
  File "./ores/wsgi/routes/v3/util.py", line 101, in process_score_request
    score_response = scoring_system.score(score_request)
  File "./ores/scoring_systems/scoring_system.py", line 59, in score
    response = self._score(request)
  File "./ores/scoring_systems/celery_queue.py", line 196, in _score
    return super()._score(*args, **kwargs)
  File "./ores/scoring_systems/scoring_system.py", line 104, in _score
    request, missing_model_set_revs)
  File "./ores/scoring_systems/scoring_system.py", line 151, in _extract_root_caches
    model_set, rev_ids, injection_caches=request.injection_caches)
  File "./ores/scoring_context.py", line 173, in extract_root_dependency_caches
    for rev_id, (error, _) in zip(rev_ids, error_root_vals):
  File "/srv/ores/venv/lib/python3.5/site-packages/revscoring/extractors/api/extractor.py", line 110, in _extract_many
    if self.revision & all_dependents:
  File "/srv/ores/venv/lib/python3.5/site-packages/revscoring/dependencies/dependent.py", line 150, in __and__
    return self._dependents.union(*self._dependent_sets) & other
TypeError: unsupported operand type(s) for &: 'set' and 'list'
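The last frame is the usual side effect of making a set json-able by turning it into a list: the `&` operator only works between two sets. A small sketch with made-up names (not the revscoring code, and not necessarily how the fix was done):

```python
dependents = {"revision.text", "revision.comment"}

# A value that used to be a set but became a list to be JSON-friendly.
other = ["revision.text", "revision.timestamp"]

# The & operator requires a set on both sides.
try:
    dependents & other
    operator_ok = True
except TypeError:
    operator_ok = False

# Either convert the list back to a set ...
common = dependents & set(other)

# ... or use the method form, which accepts any iterable.
also_common = dependents.intersection(other)
```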
After merging PR #429 on revscoring and releasing 2.3.2, I now get this on ores-staging:
Traceback (most recent call last):
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 50, in _reraise_errors
    yield
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/utils/json.py", line 69, in dumps
    **dict(default_kwargs, **kwargs))
  File "/usr/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/utils/json.py", line 59, in default
    return super(JSONEncoder, self).default(o)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <datasource.revision.comment> is not JSON serializable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./ores/wsgi/routes/v3/util.py", line 101, in process_score_request
    score_response = scoring_system.score(score_request)
  File "./ores/scoring_systems/scoring_system.py", line 59, in score
    response = self._score(request)
  File "./ores/scoring_systems/celery_queue.py", line 196, in _score
    return super()._score(*args, **kwargs)
  File "./ores/scoring_systems/scoring_system.py", line 118, in _score
    inprogress_results=inprogress_results)
  File "./ores/scoring_systems/celery_queue.py", line 101, in _process_missing_scores
    request.to_json(), list(missing_models), rev_id, root_cache)
  File "/srv/ores/venv/lib/python3.5/site-packages/celery/app/task.py", line 412, in delay
    return self.apply_async(args, kwargs)
  File "/srv/ores/venv/lib/python3.5/site-packages/celery/app/task.py", line 535, in apply_async
    **options
  File "/srv/ores/venv/lib/python3.5/site-packages/celery/app/base.py", line 728, in send_task
    amqp.send_task_message(P, name, message, **options)
  File "/srv/ores/venv/lib/python3.5/site-packages/celery/app/amqp.py", line 552, in send_task_message
    **properties
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/messaging.py", line 169, in publish
    compression, headers)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/messaging.py", line 252, in _prepare
    body) = dumps(body, serializer=serializer)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 54, in _reraise_errors
    reraise(wrapper, wrapper(exc), sys.exc_info()[2])
  File "/srv/ores/venv/lib/python3.5/site-packages/vine/five.py", line 178, in reraise
    raise value.with_traceback(tb)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 50, in _reraise_errors
    yield
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/serialization.py", line 221, in dumps
    payload = encoder(data)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/utils/json.py", line 69, in dumps
    **dict(default_kwargs, **kwargs))
  File "/usr/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/srv/ores/venv/lib/python3.5/site-packages/kombu/utils/json.py", line 59, in default
    return super(JSONEncoder, self).default(o)
  File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
kombu.exceptions.EncodeError: <datasource.revision.comment> is not JSON serializable
And this is the root_cache datasource:
{'datasource.revision.comment': 'Wiktionary', 'datasource.revision.text': "Like for hr:, I tried to put interwiki between the bosnian version and the others versions, in both direction for de:, en: and fr:. 'Hope it helps and I don't built to much bullshit... [[User:Traroth|Traroth]] 09:58, 12 Sep 2003 (CEST)\r\n\r\nMy other Wikipedia pages :\r\n\r\n[http://meta.wikipedia.org/wiki/User:Traroth Meta]\r\n[http://fr.wikipedia.org/wiki/Utilisateur:Traroth French]\r\n[http://en.wikipedia.org/wiki/User:Traroth English]\r\n[http://de.wikipedia.org/wiki/Benutzer:Traroth German]\r\n[http://hr.wikipedia.org/wiki/User:Traroth Croatian]\r\n[http://cz.wikipedia.org/wiki/Wikipedista:Traroth Czech]\r\n[http://wiktionary.org/wiki/User:Traroth Wiktionary]", 'datasource.revision.doc': {'parentid': 182, 'size': 653, 'page': {'pageid': 120, 'ns': 2, 'title': 'Korisnik:Traroth'}, 'userid': 10, 'comment': 'Wiktionary', 'slots': {'main': {'contentmodel': 'wikitext', 'contentformat': 'text/x-wiki', '*': "Like for hr:, I tried to put interwiki between the bosnian version and the others versions, in both direction for de:, en: and fr:. 'Hope it helps and I don't built to much bullshit... 
[[User:Traroth|Traroth]] 09:58, 12 Sep 2003 (CEST)\r\n\r\nMy other Wikipedia pages :\r\n\r\n[http://meta.wikipedia.org/wiki/User:Traroth Meta]\r\n[http://fr.wikipedia.org/wiki/Utilisateur:Traroth French]\r\n[http://en.wikipedia.org/wiki/User:Traroth English]\r\n[http://de.wikipedia.org/wiki/Benutzer:Traroth German]\r\n[http://hr.wikipedia.org/wiki/User:Traroth Croatian]\r\n[http://cz.wikipedia.org/wiki/Wikipedista:Traroth Czech]\r\n[http://wiktionary.org/wiki/User:Traroth Wiktionary]"}}, 'timestamp': '2003-09-16T10:57:52Z', 'revid': 212, 'user': 'Traroth'}, 'datasource.extractor.dependents': [<datasource.revision.comment>, <datasource.revision.text>, <datasource.revision.parent.text>, <datasource.revision.user.info.registration_str>, <datasource.revision.page.namespace.id>, <datasource.revision.user.info.groups>, <datasource.revision.timestamp_str>, <datasource.revision.user.id>], 'datasource.revision.parent.text': "Like for hr:, I tried to put interwiki between the bosnian version and the others versions, in both direction for de:, en: and fr:. 'Hope it helps and I don't built to much bullshit... 
[[User:Traroth|Traroth]] 09:58, 12 Sep 2003 (CEST)\r\n\r\nMy other Wikipedia pages :\r\n\r\n[http://meta.wikipedia.org/wiki/User:Traroth Meta]\r\n[http://fr.wikipedia.org/wiki/Utilisateur:Traroth French]\r\n[http://en.wikipedia.org/wiki/User:Traroth English]\r\n[http://de.wikipedia.org/wiki/Benutzer:Traroth German]\r\n[http://hr.wikipedia.org/wiki/User:Traroth Croatian]\r\n[http://cz.wikipedia.org/wiki/Wikipedista:Traroth Czech]", 'datasource.revision.user.info.registration_str': None, 'datasource.revision.page.namespace.id': 2, 'datasource.revision.parent.doc': {'parentid': 181, 'size': 599, 'page': {'pageid': 120, 'ns': 2, 'title': 'Korisnik:Traroth'}, 'userid': 10, 'comment': '', 'slots': {'main': {'contentmodel': 'wikitext', 'contentformat': 'text/x-wiki', '*': "Like for hr:, I tried to put interwiki between the bosnian version and the others versions, in both direction for de:, en: and fr:. 'Hope it helps and I don't built to much bullshit... [[User:Traroth|Traroth]] 09:58, 12 Sep 2003 (CEST)\r\n\r\nMy other Wikipedia pages :\r\n\r\n[http://meta.wikipedia.org/wiki/User:Traroth Meta]\r\n[http://fr.wikipedia.org/wiki/Utilisateur:Traroth French]\r\n[http://en.wikipedia.org/wiki/User:Traroth English]\r\n[http://de.wikipedia.org/wiki/Benutzer:Traroth German]\r\n[http://hr.wikipedia.org/wiki/User:Traroth Croatian]\r\n[http://cz.wikipedia.org/wiki/Wikipedista:Traroth Czech]"}}, 'timestamp': '2003-09-16T10:21:02Z', 'revid': 182, 'user': 'Traroth'}, 'datasource.revision.user.info.groups': ['*', 'user', 'autoconfirmed'], 'datasource.revision.user.info.doc': {'name': 'Traroth', 'userid': 10, 'gender': 'unknown', 'registration': None, 'editcount': 43, 'groups': ['*', 'user', 'autoconfirmed']}, 'datasource.revision.user.id': 10, 'datasource.revision.timestamp_str': '2003-09-16T10:57:52Z', 'datasource.revision.id': 212}
https://github.com/wikimedia/revscoring/pull/430 should fix it. @Halfak, I'm not sure if it's correct, please take a look.
Okay, after rebuilding the bswiki models with revscoring 2.3.3, it works. Now I'm rebuilding all models with that version.
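To show what is still failing in the dump above: the dependents value is now a list, but its elements are datasource objects rather than strings, and the JSON encoder rejects those too. The sketch below uses a hypothetical `Datasource` stand-in and replaces the objects with their names; this is only one possible approach and may not be what the pull request does.

```python
import json


class Datasource:
    """Hypothetical stand-in for a revscoring datasource object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "<" + self.name + ">"


dependents = [
    Datasource("datasource.revision.comment"),
    Datasource("datasource.revision.text"),
]

# A list is fine for JSON, but the custom objects inside it are not.
try:
    json.dumps(dependents)
    encodable = True
except TypeError:
    encodable = False

# Sending only the names keeps the payload to plain strings.
names = [datasource.name for datasource in dependents]
encoded = json.dumps(names)
```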
Change 492013 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/services/ores/deploy@master] Change default task serializer of celery from pickle to json
Change 492013 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Change default task serializer of celery from pickle to json
Mentioned in SAL (#wikimedia-operations) [2019-02-21T18:42:36Z] <ladsgroup@deploy1001> Started deploy [ores/deploy@2d84709]: Change default task serializer of celery from pickle to json (T206333)
Mentioned in SAL (#wikimedia-operations) [2019-02-21T18:59:29Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@2d84709]: Change default task serializer of celery from pickle to json (T206333) (duration: 16m 54s)
Change 492033 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/services/ores/deploy@master] Drop accepting pickle altogether
Change 492033 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Drop accepting pickle altogether
Mentioned in SAL (#wikimedia-operations) [2019-02-21T19:52:43Z] <ladsgroup@deploy1001> Started deploy [ores/deploy@5d937b1]: Drop accepting pickle altogether (T206333)
Mentioned in SAL (#wikimedia-operations) [2019-02-21T20:06:01Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@5d937b1]: Drop accepting pickle altogether (T206333) (duration: 13m 17s)
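For anyone repeating this migration, the last two deploys correspond roughly to the two settings stages sketched below. The setting names are Celery 4's lowercase names; the values and the `accepts` helper are illustrative assumptions, and the actual ORES config layout is not shown in this task.

```python
# Stage 1: produce JSON, but keep accepting pickle so that messages
# queued before the deploy can still be consumed.
TRANSITION = {
    "task_serializer": "json",
    "result_serializer": "json",
    "accept_content": ["json", "pickle"],
}

# Stage 2: once no pickled messages remain, stop accepting pickle.
FINAL = dict(TRANSITION, accept_content=["json"])


def accepts(settings, content_type):
    """Hypothetical helper: would a worker with these settings take it?"""
    return content_type in settings["accept_content"]


# With a celery.Celery instance named app, either dict could be applied
# with app.conf.update(FINAL).
print(accepts(TRANSITION, "pickle"), accepts(FINAL, "pickle"))
```

Splitting the change in two means a worker never rejects a message that an older producer has already put on the queue.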