The goal of this task is to stand up a working, queryable Kubeflow instance that replicates the functionality of our article topic models.
For example, if you provide a revision ID to ORES, it gives you a topic prediction. https://en.wikipedia.org/w/index.php?title=Ann_Bishop_(biologist)&oldid=937084701 links to a specific version of the article about Ann Bishop. We can provide that ID (937084701) to ORES and ask for a topic prediction with this query: https://ores.wikimedia.org/v3/scores/enwiki/937084701/articletopic
We get a result that looks like this:
    [...]
    "prediction": [
      "Culture.Biography.Biography*",
      "Culture.Biography.Women",
      "History and Society.History",
      "STEM.Medicine & Health",
      "STEM.STEM*"
    ],
    "probability": {
      "Culture.Biography.Biography*": 0.9897817848322134,
      "Culture.Biography.Women": 0.9723014590702798,
      "Culture.Food and drink": 0.00035026153227330815,
      "Culture.Internet culture": 0.00018265332725578013,
      "Culture.Linguistics": 0.000538261749894609,
      "Culture.Literature": 0.03619774139697521,
      "Culture.Media.Books": 0.002291345684133623,
      "Culture.Media.Entertainment": 0.0006008800036119385,
      "Culture.Media.Films": 0.000301603482794946,
      "Culture.Media.Media*": 0.006322510257768613,
      "Culture.Media.Music": 0.0003659734842755226,
      "Culture.Media.Radio": 9.197231374768e-05,
      "Culture.Media.Software": 0.00015020779276450603,
      "Culture.Media.Television": 0.00030685931095892125,
      "Culture.Media.Video games": 3.5082704979932084e-05,
      "Culture.Performing arts": 0.0010131663084430853,
      "Culture.Philosophy and religion": 0.07665653521788206,
      "Culture.Sports": 0.0047852147462091885,
      "Culture.Visual arts.Architecture": 0.0016610955992546084,
      [...]
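For reference, the ORES query above can be sketched in Python. This is a minimal sketch, not a definitive client: the URL pattern is taken from the example query, but the helper names (`score_url`, `extract_score`, `ranked_topics`, `fetch_score`), the User-Agent string, and the assumed nesting of the JSON response (wiki, then "scores", then revision ID, then model, then "score") are assumptions that should be checked against a live response.

```python
import json
from urllib.request import Request, urlopen

# Base URL taken from the example query in this task.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def score_url(wiki, rev_id, model="articletopic"):
    """Build the scoring URL for one revision (pattern from the example)."""
    return f"{ORES_BASE}/{wiki}/{rev_id}/{model}"


def extract_score(response, wiki, rev_id, model="articletopic"):
    """Pull the score object out of a parsed response.

    Assumption: the response nests as wiki -> "scores" -> rev ID -> model
    -> "score". Verify this against a real response before relying on it.
    """
    return response[wiki]["scores"][str(rev_id)][model]["score"]


def ranked_topics(score):
    """Return (topic, probability) pairs, most probable first."""
    probabilities = score["probability"]
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


def fetch_score(wiki, rev_id, model="articletopic", timeout=10):
    """Query the service over HTTP and return the score object.

    The User-Agent value is a hypothetical placeholder.
    """
    request = Request(
        score_url(wiki, rev_id, model),
        headers={"User-Agent": "articletopic-example/0.1"},
    )
    with urlopen(request, timeout=timeout) as resp:
        return extract_score(json.load(resp), wiki, rev_id, model)
```

Under these assumptions, `fetch_score("enwiki", 937084701)` would return an object with the "prediction" and "probability" keys shown above, and `ranked_topics` would list its topics from most to least probable. A Kubeflow-served replacement could be checked by comparing its output for the same revision ID against this result.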
Our pipeline looks roughly like this:
- Model
- Extracted features and labels