
Deploy NSFW model using experimental local docker kserve container
Closed, Resolved (Public)

Event Timeline

calbon added a subscriber: AikoChou.

The current iteration of the model is on GitHub here: https://github.com/htried/Image-Content-Filtration/tree/statbox-retrain-test.

Training data was compiled using this GitHub repo (which pulls images largely from Reddit): https://github.com/alex000kim/nsfw_data_scraper, supplemented with ~2,000 potentially NSFW images from Commons, which were selected based on their associated Wikidata category tags. The GitHub repo splits images into five categories: porn, hentai, sexy, neutral, and (SFW) drawings. For our purposes, {porn, hentai, sexy, commons} ➡️ NSFW (~12,000 images), and {neutral, drawings} ➡️ SFW (~35,000 images). These images were then randomly split into non-overlapping train, validation, and test sets (85%, 10%, and 5% of the data, respectively).
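For anyone following along, here is a minimal sketch of the label mapping and the random 85/10/5 split described above. It is not the code from the repo: the function names, the fixed seed, and the use of `round` for the set sizes are all assumptions made for illustration.

```python
import random

# Category names as listed above; "commons" is the supplementary Commons set.
NSFW_CATEGORIES = {"porn", "hentai", "sexy", "commons"}
SFW_CATEGORIES = {"neutral", "drawings"}


def binary_label(category):
    """Map a source category to 1 (NSFW) or 0 (SFW)."""
    if category in NSFW_CATEGORIES:
        return 1
    if category in SFW_CATEGORIES:
        return 0
    raise ValueError(f"unknown category: {category}")


def split_dataset(items, fractions=(0.85, 0.10), seed=0):
    """Shuffle items and cut them into non-overlapping train/val/test lists.

    The test set is whatever remains after train and validation are taken.
    The seed is a hypothetical choice so the split is reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_val = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Because each image lands in exactly one slice of the shuffled list, the three sets cannot overlap.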

Following along with @Harshineesriram's previous work, I used an instance of MobileNet pre-trained on ImageNet to enable faster convergence, and retrained the last few layers over 25 epochs. Training was conducted on CPU on stat1005.

Test results on the 5% of images not used in training or validation (2,332 images total):

overall AUC: 0.9825

with threshold 0.9:
accuracy:	0.9623

confusion matrix:
tp:	587	fn:	47
fp:	41	tn:	1657

fpr:		0.0241
fnr:		0.0741
precision:	0.9347
recall:		0.9259
f1:		0.9303
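
As a sanity check, the rates above can be recomputed from the four confusion-matrix counts alone. This is a small standalone sketch, not the code in the evaluation notebook, and the function name is made up for illustration.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Derive the standard binary-classification rates from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "fpr": fp / (fp + tn),  # share of SFW images flagged as NSFW
        "fnr": fn / (fn + tp),  # share of NSFW images that were missed
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts from the confusion matrix at threshold 0.9.
metrics = confusion_metrics(tp=587, fn=47, fp=41, tn=1657)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The four counts sum to 2,332, matching the stated size of the test set.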

Current weights are available on GitHub at trained/hal-retraining_run_3.h5, and an evaluation of the model on the held-out test set (a proportionate sample removed from the dataset prior to training) is available at notebooks/test_model.ipynb.

Also tagging @Milimetric on this task so that he can follow along/chime in.

Hi @Htriedman! This is the KServe documentation: https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/KServe. There is a section called "New service". If you want, we can plan a meeting to discuss it. :)

The documentation includes steps that show you how to deploy a model on a local Docker instance of KServe. You can also have a look and try it yourself. If you have any questions, please let me know.

Based on Hal's work (https://gitlab.wikimedia.org/htriedman/image-content-filtration-serve), we were able to get the local Docker instance up and running without problems. The model seems to be working well locally and on our ml-sandbox clusters. The next step is to deploy the service on Lift Wing. I also added a parent task to track the production deployment on Lift Wing.
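
For reference, a rough sketch of how a request to a locally running KServe container could be built using the KServe v1 `:predict` endpoint. The host, port, model name (`nsfw-model`), and the base64 `instances` payload shape are all assumptions for illustration; the actual service in Hal's repo may expect a different model name and input format.

```python
import base64
import json
import urllib.request


def build_predict_request(image_bytes,
                          host="http://localhost:8080",  # assumed local port
                          model_name="nsfw-model"):      # hypothetical name
    """Build a POST request for the KServe v1 predict endpoint.

    The payload layout (one base64-encoded image per instance) is an
    assumption; check the service's own input schema before relying on it.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = {"instances": [{"image": {"b64": encoded}}]}
    return urllib.request.Request(
        f"{host}/v1/models/{model_name}:predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the container running, the request could be sent with `urllib.request.urlopen(build_predict_request(open("example.jpg", "rb").read()))`, where `example.jpg` is a placeholder file name.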