
Upgrade xgboost in knowledge_integrity
Closed, Resolved, Public

Description

Hi folks!

After upgrading to KServe 0.11 we started noticing a performance regression, and after a long debugging session we traced it to:

https://github.com/dmlc/xgboost/pull/9651

More context in: T347550

The fix was included in xgboost 2.0.1, so I was wondering whether knowledge_integrity's xgboost dependency could be upgraded as well, or whether there are blockers.

Details

Due Date
Nov 30 2023, 5:00 AM

Event Timeline

Ilias is working on a proposal for the upgrade in T349844, but if anybody from Research could do it beforehand we'd be grateful :)

fkaelin moved this task from Backlog to Staged on the Research board.
fkaelin set Due Date to Nov 30 2023, 5:00 AM.

Hi, I've opened an MR to drop support for Python 3.7 in KI. This was already on the roadmap after Python 3.7 reached EOL in June 2023, and it also unblocks this change (xgboost 2.x requires Python 3.8 or newer).

fkaelin subscribed.

@isarantopoulos the Python 3.8 MR is merged, and here is the MR for the xgboost bump.

Knowledge Integrity v0.5.0 has been released, and it now depends on xgboost 2.x. Upgrading xgboost also required re-serializing the classifier with the new version, so RevertRiskModel has been bumped to v3. I've shared the model file with @achou. The SHA512 sum for RevertRiskModel v3 is:

fb6d76b105b7e8198cee47f779c69f1bd85be61075061665bdd0811e8d52e1d4f793dacb4a00fc3776ace8c518f1fa5f653879cec81777854cf235b0483156e7 *revert_risk_language_agnostic_model_v3.pkl
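
For anyone who wants to check a downloaded copy against the sum above, here is a minimal sketch using only the Python standard library. The helper names and the local file path are hypothetical (not part of knowledge_integrity); the expected digest is copied from the line above.

```python
import hashlib

# Digest copied from the comment above (RevertRiskModel v3).
EXPECTED_SHA512 = (
    "fb6d76b105b7e8198cee47f779c69f1bd85be61075061665bdd0811e8d52e1d4"
    "f793dacb4a00fc3776ace8c518f1fa5f653879cec81777854cf235b0483156e7"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA512):
    """Return True if the file's SHA512 equals `expected` (case-insensitive)."""
    return sha512_of_file(path) == expected.strip().lower()


# Example usage; the path below is a hypothetical local copy of the model:
# print(matches_expected("revert_risk_language_agnostic_model_v3.pkl"))
```

This should give the same result as running `sha512sum` on the file and comparing the output by eye.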

Going to resolve this now but please feel free to reopen if something does not look right!

Thanks a lot folks for this work! And also @MunizaA thanks a lot for the sha512! \o/

I verified the sha512 checksum. The model file has been uploaded to Swift:

aikochou@stat1005:~$ s3cmd -c /etc/s3cmd/cfg.d/ml-team.cfg ls s3://wmf-ml-models/revertrisk/language-agnostic/20231117132654/
2023-11-17 13:26       167741  s3://wmf-ml-models/revertrisk/language-agnostic/20231117132654/model.pkl

The model is also downloadable publicly at https://analytics.wikimedia.org/published/wmf-ml-models/revertrisk/language-agnostic/20231117132654/ (thanks @elukey for creating the script to automate this step; it works like a charm :)