Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | achou | T314810 Deploy NSFW model to production
Resolved | | achou | T313526 Deploy NSFW model using experimental local docker kserve container
Event Timeline
The current iteration of the model is on GitHub: https://github.com/htried/Image-Content-Filtration/tree/statbox-retrain-test.
Training data was compiled using this GitHub repo (which pulls images largely from Reddit): https://github.com/alex000kim/nsfw_data_scraper, supplemented with ~2,000 potentially NSFW images from Commons, which were selected based on their associated Wikidata category tags. The GitHub repo splits images into five categories: porn, hentai, sexy, neutral, and (SFW) drawings. For our purposes, {porn, hentai, sexy, commons} ➡️ NSFW (~12,000 images), and {neutral, drawings} ➡️ SFW (~35,000 images). These images were then randomly split into non-overlapping train, validation, and test sets (85%, 10%, and 5% of the data, respectively).
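For anyone reproducing the split, here is a minimal sketch of a random, non-overlapping 85/10/5 partition. It is not the code used for the actual run; the function name, the seed, and the file names are hypothetical and only illustrate the idea.

```python
import random


def split_dataset(paths, fractions=(0.85, 0.10, 0.05), seed=0):
    """Shuffle paths once and cut them into train/validation/test lists.

    Every item lands in exactly one list, so the three sets cannot overlap.
    """
    items = list(paths)
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    rng.shuffle(items)
    n_train = round(len(items) * fractions[0])
    n_val = round(len(items) * fractions[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever remains, roughly 5%
    return train, val, test


# Hypothetical file names standing in for the real image paths.
paths = [f"img_{i:05d}.jpg" for i in range(1000)]
train, val, test = split_dataset(paths)
print(len(train), len(val), len(test))
```

Note that a purely random split like this does not stratify by class, so the NSFW/SFW ratio in each set will only approximately match the full dataset.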
Following along with @Harshineesriram's previous work, I used an instance of MobileNet that was pre-trained on ImageNet to enable faster convergence, and retrained the last few layers over 25 epochs. Training was conducted on stat1005 on CPU.
Test results on the 5% of images not used in training or validation (2,332 images total):
overall AUC: 0.9825
with threshold 0.9:
accuracy: 0.9623
confusion matrix: tp: 587, fn: 47, fp: 41, tn: 1657
fpr: 0.0241
fnr: 0.0741
precision: 0.9347
recall: 0.9259
f1: 0.9303
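As a sanity check, the threshold-0.9 figures can be recomputed from the four confusion-matrix counts alone. The sketch below uses the standard definitions; the function name is just for illustration and is not taken from the notebook. AUC is not included, since it needs the raw scores rather than the counts.

```python
def binary_metrics(tp, fn, fp, tn):
    """Derive the usual binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "fpr": fp / (fp + tn),  # SFW images wrongly flagged as NSFW
        "fnr": fn / (fn + tp),  # NSFW images that were missed
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts reported for the 2,332-image test set at threshold 0.9.
metrics = binary_metrics(tp=587, fn=47, fp=41, tn=1657)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The four counts sum to 2,332, matching the stated test-set size.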
Current weights are available on GitHub at trained/hal-retraining_run_3.h5. An evaluation of the model on a proportionate test set that was removed from the dataset prior to training is available at notebooks/test_model.ipynb.
Also tagging @Milimetric on this task so that he can follow along/chime in.
Hi @Htriedman! This is the KServe documentation: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/KServe. There is a section called "New service". If you want, we can plan a meeting to discuss it. :)
In the documentation, there are steps that show you how to deploy a model on a local docker instance of KServe. You can also have a look and try it yourself. If you have any questions, please let me know.
Based on Hal's work at https://gitlab.wikimedia.org/htriedman/image-content-filtration-serve, we were able to get the local docker instance up and running without problems. The model appears to work well locally and on our ml-sandbox clusters. The next step is to deploy the service on Lift Wing. I also added a parent task to track the production deployment on Lift Wing.
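For reference, a local KServe container can usually be queried through the V1 inference protocol, i.e. a POST to /v1/models/<model_name>:predict with an "instances" list in the JSON body. The sketch below only builds such a request. The model name, host and port, and the shape of each instance (a base64-encoded image under an "image" key) are assumptions for illustration, not the actual interface of the image-content-filtration-serve repo.

```python
import base64
import json
import urllib.request


def build_predict_request(image_bytes,
                          model_name="nsfw-model",        # hypothetical model name
                          host="http://localhost:8080"):  # assumed local port
    """Build a KServe V1-protocol predict request for a single image."""
    url = f"{host}/v1/models/{model_name}:predict"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # The per-instance layout is an assumption; check the service's
    # preprocessing code for the format it really expects.
    payload = {"instances": [{"image": {"b64": encoded}}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request(b"placeholder image bytes")
print(req.full_url)
# With the container running, the request could be sent with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```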