
Request to host article-country model on Lift Wing
Closed, Resolved · Public · 14 Estimated Story Points

Description

What use case is the model going to support/resolve?
Expanding the region component of the articletopic taxonomy to be country-specific. This work is happening as part of the WE2.1.1 hypothesis in the FY 24-25 Annual Plan.

Note that I'm currently thinking of this as a stand-alone model rather than an adjustment to the articletopic model. I'm open to discussing whether we want to merge the two, but because they operate in different ways, I assume it's smarter to keep them modular and separate.

Do you have a model card?
Yes: https://meta.wikimedia.org/wiki/Machine_learning_models/Proposed/Article_country

What team created/trained/etc. the model? What tools and frameworks have you used?
Research. This is a very basic rule-based model at this stage, so almost all of its dependencies are API calls or pre-computed data that the model needs access to. The one exception is the shapely Python library, which is used to determine whether a given lat-lon point is within a country.
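The shapely dependency answers one question: which country polygon contains a given point. In shapely that is roughly `shape(geojson_geometry).contains(Point(lon, lat))`. The stdlib-only sketch below illustrates the same test with the even-odd ray-casting rule instead of shapely, assuming the GeoJSON has already been parsed into a hypothetical `{country: [polygon, ...]}` dict. `countries_containing` is an illustrative name, not the model's function, and it ignores antimeridian and on-boundary edge cases that shapely handles.

```python
from typing import Dict, List, Tuple

# GeoJSON stores positions as [lon, lat]; a ring is a closed list of such vertices.
Ring = List[Tuple[float, float]]
Polygon = List[Ring]  # outer ring first, then any holes


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Even-odd ray casting: toggle on each edge crossed by a ray cast east from the point."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges straddling the point's latitude can be crossed (this also skips horizontal edges).
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def countries_containing(lon: float, lat: float, countries: Dict[str, List[Polygon]]) -> List[str]:
    """Names of every country with a polygon that contains the point (and not one of its holes)."""
    matches = []
    for name, polygons in countries.items():
        for outer, *holes in polygons:
            if point_in_ring(lon, lat, outer) and not any(point_in_ring(lon, lat, h) for h in holes):
                matches.append(name)
                break
    return matches
```

A list is returned rather than a single name because a coordinate can legitimately fall in more than one polygon when borders in the source data overlap.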

What kind of data was the model trained with, and what kind of data is the model going to need in production (for example, calls to internal/external services, special data sources for features, etc.)?
For an input Wikipedia article, the following API calls are needed:

  • Single call to Wikibase API for Wikidata properties (wbgetentities)
  • Single call to Mediawiki API for categories (categories)
  • Single call to Mediawiki API for pagelinks (links)
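For reference, the three per-article calls can be written as plain Action API requests. This is a sketch under assumptions: `build_requests` is a hypothetical helper, the Wikidata site ID is assumed to be `{lang}wiki` (true for most but not all language editions), and the links request uses `generator=links` with `prop=pageprops` so that each linked page's Wikidata ID comes back in the same call, which is one option rather than necessarily what the model does.

```python
from urllib.parse import urlencode


def build_requests(lang: str, title: str) -> dict:
    """Return one request URL per signal for a single article (hypothetical helper)."""
    wiki_api = f"https://{lang}.wikipedia.org/w/api.php"
    wikidata_api = "https://www.wikidata.org/w/api.php"
    return {
        # Wikidata claims for the item connected to this article.
        "claims": wikidata_api + "?" + urlencode({
            "action": "wbgetentities",
            "sites": f"{lang}wiki",  # assumption: site ID is simply <lang>wiki
            "titles": title,
            "props": "claims",
            "format": "json",
        }),
        # Categories the article belongs to.
        "categories": wiki_api + "?" + urlencode({
            "action": "query",
            "prop": "categories",
            "titles": title,
            "cllimit": "max",
            "format": "json",
        }),
        # Outgoing article-namespace links, each with its Wikidata ID via pageprops.
        "links": wiki_api + "?" + urlencode({
            "action": "query",
            "generator": "links",
            "titles": title,
            "gplnamespace": 0,
            "gpllimit": "max",
            "prop": "pageprops",
            "ppprop": "wikibase_item",
            "format": "json",
        }),
    }
```

Pages with many links or categories return continuation tokens, so a production client would also need to follow `continue` parameters.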

To transform these API results into predictions, the following data-dependencies are required (outside of a few hard-coded parameters in the code):

  • GeoJSON of countries for determining lat-lon from Wikidata -> country (25M)
  • TSV of categories for determining category -> country (2M)
  • Actual list of countries (54K)

As you can see, the footprint is small. The main challenge is just having a good way of updating these files. The GeoJSON and list of countries should be fairly static, but the TSV of categories is something that ideally could be updated on a regular cadence (open to discussion about what's feasible). Not listed above, but something that probably should be separated out rather than hard-coded, is a simple dictionary of tf-idf transformation values for each of the ~250 countries. This is also fairly static but should probably be refreshed occasionally.

If you have a minimal codebase that you used to run the first tests with the model, could you please share it?
https://github.com/wikimedia/research-api-endpoint-template/blob/region-api/model/wsgi.py

There is one major change between the above API code and how I expect it would work on LiftWing. The model has a step where it gathers all the wikilinks in the article (as represented by their Wikidata IDs) and maps them to whatever countries they are associated with, in order to determine whether any countries are prevalent enough in the links to be elevated to a prediction. This mapping of link QIDs -> countries requires having the model's ground truth available for all Wikipedia articles as a fast lookup. Otherwise, to make a prediction for a single article, the model might first have to generate predictions for, e.g., the 50 other articles it links to, which obviously is not feasible.

In the API above, I solved this with a fairly simple SQLite database of all the articles and their predicted countries, which I use for the wikilink inference stage. It's only 715MB, so not an awful dependency, but large enough to not be ideal. For LiftWing, I was envisioning depending on the Search API instead. The goal is to have a pipeline similar to articletopic that loads the country predictions into the Search index. Then, with a single call to the MediaWiki API, we can gather these predictions for all of a page's links and use them instead of the static database dependency (example API call). This would require some collaboration with Search to do an initial loading of the Search index, decide on the tag name we're going to use, etc., but it would greatly simplify the LiftWing component.
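The lookup stage described above reduces to fetching the stored countries for each linked QID and counting them. A minimal sketch against SQLite, assuming a hypothetical `article_country(qid, country)` table with one row per (article, country) pair; the real database layout isn't described here, and the model's tf-idf weighting of these counts is not reproduced. Queries are batched because SQLite limits the number of bound parameters per statement.

```python
import sqlite3
from collections import Counter
from typing import Iterable


def link_country_counts(conn: sqlite3.Connection, link_qids: Iterable[str], batch_size: int = 500) -> Counter:
    """Count how many of an article's outgoing links map to each country.

    Assumes a hypothetical table article_country(qid TEXT, country TEXT) holding
    precomputed predictions, one row per (article, country) pair.
    """
    counts: Counter = Counter()
    qids = list(dict.fromkeys(link_qids))  # de-duplicate while keeping order
    for start in range(0, len(qids), batch_size):
        batch = qids[start:start + batch_size]
        placeholders = ",".join("?" * len(batch))
        query = f"SELECT country FROM article_country WHERE qid IN ({placeholders})"
        for (country,) in conn.execute(query, batch):
            counts[country] += 1
    return counts
```

Swapping SQLite for the Search index would only change where these rows come from; the counting step stays the same.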

State what team will own the model and please share some main points of contact (see more info in Ownership of a model).
Research

What is the current latency and throughput of the model, if you have tested it?
Fast, though I haven't done any optimization yet. The three separate API calls listed above could be made in parallel, which would likely help a bit.
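Running the three calls in parallel is a natural fit for `asyncio.gather`, which runs coroutines concurrently and returns their results in argument order. In this sketch the `fetch_*` coroutines are hypothetical stand-ins that sleep to mimic network latency; a real version would issue HTTP requests with an async client.

```python
import asyncio
import time


# Hypothetical stand-ins for the three API calls; each sleeps to mimic latency.
async def fetch_claims(title: str) -> dict:
    await asyncio.sleep(0.2)
    return {"signal": "claims", "title": title}


async def fetch_categories(title: str) -> dict:
    await asyncio.sleep(0.2)
    return {"signal": "categories", "title": title}


async def fetch_links(title: str) -> dict:
    await asyncio.sleep(0.2)
    return {"signal": "links", "title": title}


async def gather_signals(title: str) -> dict:
    # All three run concurrently, so total latency is roughly the slowest call.
    claims, categories, links = await asyncio.gather(
        fetch_claims(title), fetch_categories(title), fetch_links(title)
    )
    return {"claims": claims, "categories": categories, "links": links}


if __name__ == "__main__":
    start = time.perf_counter()
    signals = asyncio.run(gather_signals("Toni Morrison"))
    print(sorted(signals), f"{time.perf_counter() - start:.2f}s")
```

With three sequential calls the wall time would be the sum of their latencies; with `gather` it is close to the maximum.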

Is there an expected frequency in which the model will have to be retrained with new data? What are the resources required to train the model and what was the dataset size?
Retraining is really just occasionally refreshing the two core data dependencies. I'm open to discussion, but even every six months or once a year would be fine:

  • Category -> country mapping. This is something that I can work with Research Engineering to build an Airflow job for recalculating.
  • Country tfidf values. Same as above, just a simple data job that I can work with Research Engineering to make into an Airflow job.

Have you checked if the output of your model is safe from a human rights point of view? Is there any risk of it being offensive for somebody? Even if you have any slight worry or corner case, please tell us!
More details are in the model card, but here's a quick summary. The model combines three signals. Two of these (Wikidata properties and Wikipedia categories) are very "safe" in that they just pass along decisions made by Wikimedians, and the model doesn't add any further risk of harm. The model also infers some countries from the wikilinks, which does risk false positives that some might find objectionable. To take a "trivial" case, for the French article on meatballs the model suggests Tunisia as the associated country because of a Tunisian Cuisine navigation box with a ton of links at the bottom of the article. But in all these cases, we can at least point to which links led to the inference, so again I think the risk of offense is quite low. And the intended use of this model is as a filter, so we won't be saying, e.g., that "meatballs are Tunisian"; instead, if someone requested a list of foods filtered to Tunisia, meatball might show up on that list.

The other challenge is what is a "country". For this, we're using our official internal countries list that's based on ISO codes so at least it's defensible even if some might wish it to be slightly different.

Everything else that is relevant in your opinion.

  • The model isn't quite ready for deployment -- I'm working on evaluation at the moment. But I'm creating this task with the hope of opening up discussion on whether you all see any potential issues in hosting so we can hopefully address them early before committing to a specific approach. Ideally we would work on deployment towards the end of Q1 (late September).
  • I would also like a stream for this model to incorporate the predictions into the Search index. I probably lean towards adjusting the existing articletopic stream to also call this new model and then merge the predictions if that's possible. Otherwise it could be a separate stream.
  • Right now, the model outputs just the country name -- e.g., American Samoa. This is different from the articletopic models which actually output a hierarchy of names. The equivalent for this model would be e.g., Oceania.Polynesia.American Samoa. This would be a very minor change as there's a direct mapping between country names and their full continent.subcontinent.country name, but I'm not sure at this point which is preferred by the end users.

Details

Related Changes in Gerrit:
Repo (branch): lines +/-
operations/deployment-charts (master): +14 -0
operations/puppet (production): +18 -0
operations/deployment-charts (master): +1 -1
operations/deployment-charts (master): +2 -2
machinelearning/liftwing/inference-services (main): +2 -1
operations/deployment-charts (master): +1 -1
machinelearning/liftwing/inference-services (main): +8 -2
machinelearning/liftwing/inference-services (main): +9 -7
machinelearning/liftwing/inference-services (main): +6 -5
machinelearning/liftwing/inference-services (main): +15 -0
operations/deployment-charts (master): +41 -1
operations/deployment-charts (master): +1 -1
machinelearning/liftwing/inference-services (main): +43 -0
machinelearning/liftwing/inference-services (main): +38 -23
operations/deployment-charts (master): +1 -1
machinelearning/liftwing/inference-services (main): +17 -8
machinelearning/liftwing/inference-services (main): +13 -12
machinelearning/liftwing/inference-services (main): +43 -12
machinelearning/liftwing/inference-services (main): +1 -1
machinelearning/liftwing/inference-services (main): +1 -15
machinelearning/liftwing/inference-services (main): +59 -0
operations/deployment-charts (master): +2 -0
operations/deployment-charts (master): +18 -0
machinelearning/liftwing/inference-services (main): +45 -3
integration/config (master): +15 -0
machinelearning/liftwing/inference-services (main): +107 -0
machinelearning/liftwing/inference-services (main): +828 -0
integration/config (master): +0 -15
integration/config (master): +15 -0


Event Timeline


Change #1080592 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] locust: add article-country load test

https://gerrit.wikimedia.org/r/1080592

Change #1080592 merged by Kevin Bazira:

[machinelearning/liftwing/inference-services@main] locust: add article-country load test

https://gerrit.wikimedia.org/r/1080592

Thanks @kevinbazira !! Predictions are looking good. Some questions / thoughts:

  • I see the note about shifting to more async calls, and I think that makes sense. The point_in_country method involves some local compute, and the current API calls should all be quite fast, so there's likely no noticeable effect right now. But once we add wikilink support, that has the potential to be a slightly slower API call.
  • I think we can probably get rid of the support for someone including an article's QID in their request (code) just to simplify the code a little and reduce the possibility of weird behavior where someone accidentally gives the wrong QID for a given article. I originally included that option because the model was at that time operating purely with information from Wikidata (so it was feasible that someone would just want a result based on a Wikidata item and not one of the articles that it was linked from) but as the model has grown to include the category-based predictions and wikilink-based predictions, that makes less sense. I should have just removed it myself.
  • If the next step is the event stream, I assume the main thing is to decide on the official response schema. I don't have strong feelings on the exact keys/structure, but a few thoughts:
    • My default is to align with how the articletopic model works (example copied below). I usually try to include the "source" of the predictions in the response. For articletopic, that's just the Wikipedia article itself. For this model, it's both the Wikipedia article and its QID, so we probably want to modify it to include both "article": ... and "item": ...
    • Then results is a list of predictions with confidence scores. In this case, we essentially have 100% confidence for outputs from Wikidata and categories, but will have something closer to real confidence values for the wikilink-based predictions once they are incorporated. Given that we can directly track the exact source of each prediction without additional effort, I think that would be great information to include too. That had been under separate keys for each source, but I wonder if we should merge it more directly. So maybe something like: "results": [{"country": "United States", "score": 1, "source": {"wikidata": ["P27"], "category": ["Category:21st-century American women writers"]}... ? Thoughts?
{
  "prediction": {
    "article": "https://en.wikipedia.org/wiki/Douglas_Adams",
    "results": [
      {
        "topic": "Culture.Media.Media*",
        "score": 0.6723417043685913
      },
      {
        "topic": "Culture.Biography.Biography*",
        "score": 0.5156299471855164
      }
    ]
  }
}
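The merged "source" shape proposed in the last bullet amounts to grouping each signal's hits by country. A minimal sketch, assuming the hits arrive as hypothetical (property, country) and (category, country) pairs; `merge_sources` is an illustrative name, the key names follow the example in the bullet rather than any final schema, and the score is a flat 1 as in that proposal.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_sources(
    wikidata_hits: List[Tuple[str, str]],
    category_hits: List[Tuple[str, str]],
) -> List[dict]:
    """Group (property, country) and (category, country) pairs into one result per country."""
    sources: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: {"wikidata": [], "category": []})
    for prop, country in wikidata_hits:
        if prop not in sources[country]["wikidata"]:
            sources[country]["wikidata"].append(prop)
    for category, country in category_hits:
        if category not in sources[country]["category"]:
            sources[country]["category"].append(category)
    # Flat score of 1: Wikidata and category evidence isn't calibrated.
    return [
        {"country": country, "score": 1, "source": source}
        for country, source in sorted(sources.items())
    ]
```

Keeping both signals as lists under one "source" key means a country supported by several properties or categories needs no extra keys.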

Change #1081809 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: remove support for QID input

https://gerrit.wikimedia.org/r/1081809

Change #1081809 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: remove support for QID input

https://gerrit.wikimedia.org/r/1081809

Hey @kevinbazira -- I just discovered a failure mode in my original code. When an article lacks a Wikidata item, an uncaught exception is thrown because the model expects the claims parameter to be iterable, but it's actually just None. I think this is an easy fix: change the fall-back from claims = None to claims = {} in this line: https://github.com/wikimedia/machinelearning-liftwing-inference-services/blob/main/src/models/article_country/model_server/utils.py#L149

I don't know that this example will continue to work (eventually it might get a Wikidata item) but for now:

isaacj@stat1008:~$ time curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-country:predict" -X POST -d '{"lang": "en", "title": "Battle of Kotli"}' -H  "Host: article-country.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1
{"error":"TypeError : argument of type 'NoneType' is not iterable"}
real	0m0.257s
user	0m0.016s
sys	0m0.005s

For checking, here's the result from my API (which I just fixed): https://wiki-region.wmcloud.org/regions?lang=en&title=Battle_of_Kotli

{"qid":null,"countries":["India","Pakistan"],"wikidata":[],"links":[{"country":"India","count":13,"prop-tfidf":0.37032013022246335},{"country":"Pakistan","count":9,"prop-tfidf":0.37174443841562665},{"country":"United Kingdom","count":2,"prop-tfidf":0.04740911557243624},{"country":"China","count":1,"prop-tfidf":0.029944384156266955}],"categories":[]}
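The TypeError in the staging response is Python's standard error for a membership test such as `"P17" in None`. A minimal sketch of the `{}` fall-back, assuming a hypothetical `claims_or_empty` helper that receives the parsed `wbgetentities` entity, or `None` when the article has no item; it is not the function in utils.py.

```python
from typing import Optional


def claims_or_empty(entity: Optional[dict]) -> dict:
    """Return an entity's claims, falling back to {} when there is no Wikidata item."""
    if not entity:
        return {}
    return entity.get("claims") or {}


def has_property(claims: dict, prop: str) -> bool:
    # Membership tests on {} simply return False; on None they raise TypeError.
    return prop in claims
```

Falling back to an empty dict lets the Wikidata branch contribute nothing while the category and wikilink signals still run, which matches the India/Pakistan output from the research API.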

Change #1083916 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: initialize claims as a dict

https://gerrit.wikimedia.org/r/1083916

Change #1083939 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: add support for async api calls

https://gerrit.wikimedia.org/r/1083939

Change #1083916 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: initialize claims as a dict

https://gerrit.wikimedia.org/r/1083916

Change #1083939 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: implement async call for get_claims()

https://gerrit.wikimedia.org/r/1083939

Change #1084766 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: implement async call for title_to_categories()

https://gerrit.wikimedia.org/r/1084766

Change #1084766 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: implement async call for title_to_categories()

https://gerrit.wikimedia.org/r/1084766

Change #1085350 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: implement async call for title_to_qid()

https://gerrit.wikimedia.org/r/1085350

Change #1085350 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: implement async call for title_to_qid()

https://gerrit.wikimedia.org/r/1085350

Change #1085570 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update article-country image in experimental ns

https://gerrit.wikimedia.org/r/1085570

Change #1085570 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update article-country image in experimental ns

https://gerrit.wikimedia.org/r/1085570

@Isaac thank you for testing the article-country experimental endpoint and sharing the issues you found. We have:

  • removed support for QID input
  • initialized claims as a dict
  • added support for async API calls

Regarding the article-country response schema, please see below the current vs proposed response schemas for both single-country and multi-country responses. Let us know whether we should proceed with implementing the proposed schemas.

1. Single-Country Responses

Current Schema
{
  "item": "Q72334",
  "countries": [
    "United States"
  ],
  "wikidata": [
    {
      "P27": {
        "0": "country of citizenship"
      },
      "country": "United States"
    }
  ],
  "categories": [
    {
      "country": "United States",
      "categories": "Category:21st-century American women writers"
    }
  ]
}
Proposed Schema
{
  "prediction": {
    "article": "https://en.wikipedia.org/wiki/Toni_Morrison",
    "wikidata_item": "Q72334",
    "results": [
      {
        "country": "United States",
        "score": 1,
        "source": {
            "wikidata_property": "P27",
            "categories": ["Category:21st-century American women writers"]
        }
      }
    ]
  }
}

2. Multi-Country Responses

Current Schema
{
  "qid": "Q3392",
  "countries": [
    "Democratic Republic of the Congo",
    "Egypt",
    "Eritrea",
    "Kenya",
    "South Sudan",
    "Sudan",
    "Tanzania",
    "Uganda"
  ],
  "wikidata": [
    {
      "P17": {
        "0": "country"
      },
      "country": "Sudan"
    },
    {
      "P17": {
        "0": "country"
      },
      "country": "Egypt"
    },
    {
      "P17": {
        "0": "country"
      },
      "country": "Uganda"
    },
    {
      "P17": {
        "0": "country"
      },
      "country": "South Sudan"
    },
    {
      "P17": {
        "0": "country"
      },
      "country": "Tanzania"
    },
    {
      "P17": {
        "0": "country"
      },
      "country": "Eritrea"
    },
    {
      "P17": {
        "0": "country"
      },
      "country": "Kenya"
    },
    {
      "P17": {
        "0": "country"
      },
      "country": "Democratic Republic of the Congo"
    },
    {
      "P625": "coordinate location",
      "country": "Sudan"
    }
  ],
  "categories": [
    {
      "country": "Sudan",
      "categories": "Category:Rivers of Sudan"
    },
    {
      "country": "Uganda",
      "categories": "Category:Rivers of Uganda"
    },
    {
      "country": "Egypt",
      "categories": "Category:Rivers of Egypt|Category:National parks of Egypt|Category:Water transport in Egypt"
    },
    {
      "country": "South Sudan",
      "categories": "Category:Rivers of South Sudan"
    }
  ]
}
Proposed Schema
{
  "prediction": {
    "article": "https://en.wikipedia.org/wiki/River_Nile",
    "wikidata_item": "Q3392",
    "results": [
      {
        "country": "Democratic Republic of the Congo",
        "score": 1, /* can't be one for all countries */
        "source": {
            "wikidata_property": "P17",
            "categories": []
        }
      },
      {
        "country": "Egypt",
        "score": 1,
        "source": {
            "wikidata_property": "P17",
            "categories": ["Category:Rivers of Egypt|Category:National parks of Egypt|Category:Water transport in Egypt"]
        }
      },
      {
        "country": "Eritrea",
        "score": 1,
        "source": {
            "wikidata_property": "P17",
            "categories": []
        }
      },
      {
        "country": "Kenya",
        "score": 1, /* can't be one here */
        "source": {
            "wikidata_property": "P17",
            "categories": []
        }
      },
      {
        "country": "South Sudan",
        "score": 1,
        "source": {
            "wikidata_property": "P17",
            "categories": ["Category:Rivers of South Sudan"]
        }
      },
      {
        "country": "Sudan",
        "score": 1,
        "source": {
            "wikidata_property": "P625",
            "categories": ["Category:Rivers of Sudan"]
        }
      },
      {
        "country": "Tanzania",
        "score": 1, /* can't be one here */
        "source": {
            "wikidata_property": "P17",
            "categories": []
        }
      },
      {
        "country": "Uganda",
        "score": 1,
        "source": {
            "wikidata_property": "P17",
            "categories": ["Category:Rivers of Uganda"]
        }
      }
    ]
  }
}

> We have: removed support for QID input, initialized claims as a dict, added support for async API calls

@kevinbazira thanks!

Schemas are looking good! A few quick notes/questions:

  • I saw the comment in the schema about /* can't be one for all countries */ -- can you tell me more about that? I will say that anything we put for category/wikidata-based scores is somewhat arbitrary because we don't have a model that's calibrated with confidence scores. I think it would be useful to more highly rank items with multiple supporting signals -- e.g., Egypt goes higher in the above example because it has 3 categories and 1 Wikidata property (4 pieces of support) whereas DRC only has a Wikidata property (1 piece of support). I'm not certain of the best way to do that. Even a single property really is strong evidence of a country's relevance so worth inclusion in the final output. Options:
    • We could allow the score variable to go higher than 1 (e.g., DRC is a 1 in the above example while Egypt is a 4). I don't love this though because it's not really the expected behavior for score when compared to other models and it's unbounded.
    • We could do the above but normalize so the highest-ranked one receives a 1 and lowest-ranked one receives a 0.5 (as the default cut-off we have for a prediction being relevant). So in that case, Egypt would be a 1 and DRC would be a 0.5 and Uganda (with a Wikidata property and category so 2 pieces of support) would be 0.66.
    • Something else?
  • Both wikidata_property and categories could be multiple values, so maybe we make both of them into lists? So taking the Egypt example above, it would be:
...
            "wikidata_properties": ["P17"],
            "categories": ["Category:Rivers of Egypt",
                           "Category:National parks of Egypt",
                           "Category:Water transport in Egypt"]
        }
...
  • As a note for later: we'll eventually have wikilinks as a signal too. For that, in theory we could literally list out all the links that pointed to a given country, but I am worried that would start to be overly verbose (Wikidata properties and categories will at most be a few for any given country, but I could easily find examples with 50+ wikilinks that point to a given country). So instead, maybe we can just put the amount of "support" (a number between 0 and 1) there when that logic gets incorporated?

Change #1088214 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: update response schema

https://gerrit.wikimedia.org/r/1088214

Change #1088214 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: update response schema

https://gerrit.wikimedia.org/r/1088214

Change #1089646 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: normalize score based on categories and properties

https://gerrit.wikimedia.org/r/1089646

Change #1089646 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: normalize score based on categories and properties

https://gerrit.wikimedia.org/r/1089646

Change #1093006 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update article-country response schema

https://gerrit.wikimedia.org/r/1093006

> I saw the comment in the schema about /* can't be one for all countries */ -- can you tell me more about that?

I was wondering if all countries having a score of 1 indicates that they each hold equal relevance in the results. Given the expected behavior of the score in comparison to other models, how will users or tools be able to differentiate between the results?

Your suggestion to rank items based on multiple supporting signals (categories and Wikidata properties) is a good solution. We have implemented it using the following steps:

  1. First, we calculate the sum of the number of categories and the number of Wikidata properties. For example:
    • sum_egypt: 3+1 = 4
    • sum_drc: 0+1 = 1
    • sum_uganda: 1+1 = 2
  2. Next, we normalize the results so that the lowest score is 0.5 and the highest score is 1. For example:
    • normalizing [4, 1, 2] returns [1, 0.5, 0.66]
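The two steps above can be sketched as a count followed by min-max scaling into [0.5, 1]. The names `evidence_count` and `scale_counts` are hypothetical and this is not the deployed normalize_sums; it does reproduce the Sudan score in the output below (3 pieces of support out of a maximum of 4 gives 0.8333). Like any min/max-based version, it raises ValueError on an empty list, so callers need to guard against articles with no countries.

```python
from typing import List


def evidence_count(source: dict) -> int:
    """Pieces of support for one country: Wikidata properties plus categories."""
    return len(source["wikidata_properties"]) + len(source["categories"])


def scale_counts(sums: List[int], low: float = 0.5, high: float = 1.0) -> List[float]:
    """Min-max scale counts so the smallest maps to `low` and the largest to `high`."""
    min_val, max_val = min(sums), max(sums)
    if max_val == min_val:
        # Equal support everywhere (including a single country): all get the top score.
        return [high] * len(sums)
    return [low + (s - min_val) * (high - low) / (max_val - min_val) for s in sums]
```

For the Egypt/DRC/Uganda example, `scale_counts([4, 1, 2])` gives `[1.0, 0.5, 0.666...]`.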

Below is the output with the normalized scores. As per your suggestion, the wikidata_properties and categories are returned with data types that support multiple values. model_name and model_version have been added to match other inference services. Please test the latest endpoint, and let us know whether we can proceed to production with this response schema.

1. Single-Country Response

kevinbazira@deploy2002:~$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-country:predict" -X POST -d '{"lang": "en", "title": "Toni_Morrison"}' -H  "Host: article-country.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1

{
  "model_name": "article-country",
  "model_version": "1",
  "prediction": {
    "article": "https://en.wikipedia.org/wiki/Toni_Morrison",
    "wikidata_item": "Q72334",
    "results": [
      {
        "country": "United States",
        "score": 1,
        "source": {
          "wikidata_properties": {
            "P27": {
              "0": "country of citizenship"
            }
          },
          "categories": [
            "Category:21st-century American women writers"
          ]
        }
      }
    ]
  }
}

2. Multi-Country Response

kevinbazira@deploy2002:~$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-country:predict" -X POST -d '{"lang": "en", "title": "River Nile"}' -H  "Host: article-country.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1

{
  "model_name": "article-country",
  "model_version": "1",
  "prediction": {
    "article": "https://en.wikipedia.org/wiki/River Nile",
    "wikidata_item": "Q3392",
    "results": [
      {
        "country": "Sudan",
        "score": 0.8333333333333333,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            },
            "P625": {
              "0": "coordinate location"
            }
          },
          "categories": [
            "Category:Rivers of Sudan"
          ]
        }
      },
      {
        "country": "Egypt",
        "score": 1,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": [
            "Category:Rivers of Egypt",
            "Category:National parks of Egypt",
            "Category:Water transport in Egypt"
          ]
        }
      },
      {
        "country": "Uganda",
        "score": 0.6666666666666666,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": [
            "Category:Rivers of Uganda"
          ]
        }
      },
      {
        "country": "South Sudan",
        "score": 0.6666666666666666,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": [
            "Category:Rivers of South Sudan"
          ]
        }
      },
      {
        "country": "Tanzania",
        "score": 0.5,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": []
        }
      },
      {
        "country": "Eritrea",
        "score": 0.5,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": []
        }
      },
      {
        "country": "Kenya",
        "score": 0.5,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": []
        }
      },
      {
        "country": "Democratic Republic of the Congo",
        "score": 0.5,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": []
        }
      }
    ]
  }
}

Change #1093006 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update article-country response schema

https://gerrit.wikimedia.org/r/1093006

Change #1098414 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: deploy article-country to the article-models ns

https://gerrit.wikimedia.org/r/1098414

Change #1098414 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: deploy article-country to the article-models ns

https://gerrit.wikimedia.org/r/1098414

Thanks @kevinbazira! Some feedback but this is a big improvement and thanks for implementing the logic:

  • I get {"error":"ValueError : min() arg is an empty sequence"} for articles that lack countries. I think this is from the normalize_sums function so I presume you'll want to skip it when there aren't results.
  • Question though I defer to you on schema choices: why are the Wikidata properties represented as nested dictionaries while categories represented as a list? Personally the latter (a list of values) feels more compact and I think it's nice to have consistency in how they're represented but I'm open to whatever you think is best. So I guess I was expecting wikidata_properties in the below example to be something like "wikidata_properties": ["P17"] or maybe "wikidata_properties": [{"P17":"country"}] if you think it's important to preserve the labels too.
"source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": [
            "Category:Rivers of Egypt",
            "Category:National parks of Egypt",
            "Category:Water transport in Egypt"
          ]
        }
  • It would be nice to sort the results so they're ranked by score in update_scores. So in the multi-country example you gave, Sudan and Egypt would swap positions. I think Python's standard sorted function will do the trick.
  • Sorry, this is me changing the logic, but I'm realizing that instead of always setting the highest support to 1 and the lowest support to 0.5, we should pin the minimum support at 1 piece of evidence. So instead of min_val, max_val = min(sums), max(sums) in normalize_sums, you'd do: min_val, max_val = 1, max(sums). The motivating example was the Rio Grande article below, where Mexico has 2 pieces of supporting evidence and the US has 3, but the current scores suggest they're pretty far apart (my suggested fix would change them from [0.5, 1] to [0.75, 1]):
$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-country:predict" -X POST -d '{"lang": "en", "title": "Rio Grande"}' -H  "Host: article-country.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1

{
  "model_name": "article-country",
  "model_version": "1",
  "prediction": {
    "article": "https://en.wikipedia.org/wiki/Rio Grande",
    "wikidata_item": "Q160636",
    "results": [
      {
        "country": "United States",
        "score": 1.0,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            },
            "P625": {
              "0": "coordinate location"
            }
          },
          "categories": [
            "Category:Mexico\u2013United States border"
          ]
        }
      },
      {
        "country": "Mexico",
        "score": 0.5,
        "source": {
          "wikidata_properties": {
            "P17": {
              "0": "country"
            }
          },
          "categories": [
            "Category:Rivers of Mexico"
          ]
        }
      }
    ]
  }
}
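The empty-result guard, the fixed minimum, and the sorting suggested above can be combined into a short sketch. This is a minimal illustration, not the service's actual code: it assumes the per-country sums are raw evidence counts (so every candidate has at least 1) and that scores are rescaled linearly into [0.5, 1]. Only normalize_sums and the min_val, max_val idea come from the discussion; MIN_SUPPORT and rank_results are hypothetical names.

```python
from typing import Dict, List

# Hypothetical constant: the fixed floor of one piece of supporting evidence.
MIN_SUPPORT = 1


def normalize_sums(sums: Dict[str, float]) -> Dict[str, float]:
    """Rescale per-country evidence sums into scores in [0.5, 1.0]."""
    if not sums:
        # No candidate countries: skip normalization instead of calling
        # min()/max() on an empty sequence (the reported ValueError).
        return {}
    # Freeze the minimum at MIN_SUPPORT rather than using min(sums).
    min_val, max_val = MIN_SUPPORT, max(sums.values())
    if max_val <= min_val:
        # Every candidate has a single piece of evidence; avoid dividing by zero.
        return {country: 1.0 for country in sums}
    return {
        country: 0.5 + 0.5 * (total - min_val) / (max_val - min_val)
        for country, total in sums.items()
    }


def rank_results(scores: Dict[str, float]) -> List[dict]:
    """Return result entries ordered from highest to lowest score."""
    results = [{"country": c, "score": s} for c, s in scores.items()]
    return sorted(results, key=lambda r: r["score"], reverse=True)
```

With the Rio Grande counts (United States: 3, Mexico: 2), this yields 1.0 and 0.75, matching the [0.75, 1] figure above, and ranks the United States first. An article with no candidate countries produces an empty list instead of raising.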

Change #1098901 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: sort results by score

https://gerrit.wikimedia.org/r/1098901

Change #1098901 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: sort results by score

https://gerrit.wikimedia.org/r/1098901

Change #1099158 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: handle empty country results gracefully

https://gerrit.wikimedia.org/r/1099158

Change #1099524 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: return wikidata_properties as a list

https://gerrit.wikimedia.org/r/1099524
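The schema change can be illustrated with a small conversion from the nested form in the earlier output ({"P17": {"0": "country"}}) to a list of single-key dicts. This is a hypothetical helper (properties_to_list is not a name from the service), and it assumes each inner dictionary holds the property label under positional keys such as "0".

```python
from typing import Dict, List


def properties_to_list(nested: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Flatten {"P17": {"0": "country"}} into [{"P17": "country"}]."""
    flat: List[Dict[str, str]] = []
    for pid, labels in nested.items():
        # Emit one single-key dict per label, preserving property order.
        for label in labels.values():
            flat.append({pid: label})
    return flat
```

If the labels turn out not to be needed, list(nested) would give the more compact ["P17"] form that was also suggested.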

Change #1099158 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: handle empty country results gracefully

https://gerrit.wikimedia.org/r/1099158

Change #1100009 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: normalize sums using a fixed minimum sum of 1

https://gerrit.wikimedia.org/r/1100009

Change #1099524 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: return wikidata_properties as a list

https://gerrit.wikimedia.org/r/1099524

Change #1100009 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: normalize sums using a fixed minimum sum of 1

https://gerrit.wikimedia.org/r/1100009

Change #1100571 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update article-country image in experimental ns

https://gerrit.wikimedia.org/r/1100571

@Isaac, thank you for sharing feedback and suggesting improvements. We have:

  • handled empty article-country results gracefully
  • returned wikidata_properties as a list of dicts
  • sorted results by score
  • normalized using a fixed minimum sum of 1

Below is a response from the latest article-country experimental endpoint with the above changes:

kevinbazira@deploy2002:~$ curl "https://inference-staging.svc.codfw.wmnet:30443/v1/models/article-country:predict" -X POST -d '{"lang": "en", "title": "Rio Grande"}' -H  "Host: article-country.experimental.wikimedia.org" -H "Content-Type: application/json" --http1.1

{
    "model_name": "article-country",
    "model_version": "1",
    "prediction": {
        "article": "https://en.wikipedia.org/wiki/Rio Grande",
        "wikidata_item": "Q160636",
        "results": [
            {
                "country": "United States",
                "score": 1.0,
                "source": {
                    "wikidata_properties": [
                        {
                            "P17": "country"
                        },
                        {
                            "P625": "coordinate location"
                        }
                    ],
                    "categories": [
                        "Category:Mexico\u2013United States border"
                    ]
                }
            },
            {
                "country": "Mexico",
                "score": 0.75,
                "source": {
                    "wikidata_properties": [
                        {
                            "P17": "country"
                        }
                    ],
                    "categories": [
                        "Category:Rivers of Mexico"
                    ]
                }
            }
        ]
    }
}

Please test this endpoint, and let us know whether we can proceed to production. Thanks!

Change #1100571 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update article-country image in experimental ns

https://gerrit.wikimedia.org/r/1100571

@kevinbazira one last thing, and then I think we're good to move on to the last stage: English seems to be hard-coded as the article language in the output schema (code), but we want it to reflect the language of the input article. It's a minor thing and doesn't affect the actual predictions, but I noticed it when checking other languages, so I wanted to raise it. Once that little bug is handled, I'm happy to proceed to production and figure out the next steps with the stream+wikilinks. Thanks as always for your work moving this forward!
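The fix amounts to deriving the wiki subdomain of the article URL from the request's lang instead of a hard-coded "en". A minimal sketch, where build_prediction is a hypothetical helper and the title is inserted exactly as received, as in the responses above:

```python
from typing import List


def build_prediction(lang: str, title: str, wikidata_item: str, results: List[dict]) -> dict:
    """Assemble the "prediction" object using the input article's language."""
    return {
        # Use the requested language edition rather than a fixed "en".
        "article": f"https://{lang}.wikipedia.org/wiki/{title}",
        "wikidata_item": wikidata_item,
        "results": results,
    }
```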

Change #1101375 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[machinelearning/liftwing/inference-services@main] article-country: reflect input language in the response

https://gerrit.wikimedia.org/r/1101375

Change #1101375 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] article-country: reflect input language in the response

https://gerrit.wikimedia.org/r/1101375

Change #1101741 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update article-country deployment in the experimental ns

https://gerrit.wikimedia.org/r/1101741

Change #1101743 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] ml-services: update article-country deployment in the article-models ns

https://gerrit.wikimedia.org/r/1101743

Change #1101743 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update article-country deployment in the article-models ns

https://gerrit.wikimedia.org/r/1101743

Change #1101741 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update article-country deployment in the experimental ns

https://gerrit.wikimedia.org/r/1101741

Change #1102150 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/deployment-charts@master] APIGW: Add configuration to expose LW isvc article-country

https://gerrit.wikimedia.org/r/1102150

Change #1102201 had a related patch set uploaded (by Kevin Bazira; author: Kevin Bazira):

[operations/puppet@production] httpbb: add post deployment tests for the article-country endpoint

https://gerrit.wikimedia.org/r/1102201

Change #1102201 merged by Klausman:

[operations/puppet@production] httpbb: add post deployment tests for the article-country endpoint

https://gerrit.wikimedia.org/r/1102201

Change #1102150 merged by jenkins-bot:

[operations/deployment-charts@master] APIGW: Add configuration to expose LW isvc article-country

https://gerrit.wikimedia.org/r/1102150

@Isaac, thank you for the confirmation. The article-country inference service is now live in LiftWing production. It can be accessed through:
1. External endpoint:

$ curl "https://api.wikimedia.org/service/lw/inference/v1/models/article-country:predict" -X POST -d '{"lang": "en", "title": "Toni_Morrison"}' -H "Content-Type: application/json" --http1.1

2. Internal endpoint:

$ curl "https://inference.svc.codfw.wmnet:30443/v1/models/article-country:predict" -X POST -d '{"lang": "en", "title": "Toni_Morrison"}' -H  "Host: article-country.article-models.wikimedia.org" -H "Content-Type: application/json" --http1.1

3. Documentation:

As we prepare to work on the stream, please let us know in case there are any edge cases we may have missed. :)
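For scripted access, the two curl calls above translate roughly into the following standard-library Python sketch. It is not an official client: the helper names and the User-Agent string are placeholders (Wikimedia's APIs generally expect a descriptive User-Agent), and the internal .wmnet endpoint presumably only resolves from inside Wikimedia's production network, just like the corresponding curl example.

```python
import json
import urllib.request

EXTERNAL_URL = "https://api.wikimedia.org/service/lw/inference/v1/models/article-country:predict"
INTERNAL_URL = "https://inference.svc.codfw.wmnet:30443/v1/models/article-country:predict"
INTERNAL_HOST = "article-country.article-models.wikimedia.org"
# Placeholder: replace with a descriptive agent and your contact details.
USER_AGENT = "article-country-example/0.1 (you@example.org)"


def build_request(lang: str, title: str, internal: bool = False) -> urllib.request.Request:
    """Build the same JSON POST request as the curl examples."""
    payload = json.dumps({"lang": lang, "title": title}).encode("utf-8")
    headers = {"Content-Type": "application/json", "User-Agent": USER_AGENT}
    url = EXTERNAL_URL
    if internal:
        # Like the internal curl call, select the service via the Host header.
        url = INTERNAL_URL
        headers["Host"] = INTERNAL_HOST
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def predict(lang: str, title: str, internal: bool = False, timeout: float = 30) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(lang, title, internal), timeout=timeout) as resp:
        return json.load(resp)


def country_scores(response: dict) -> list:
    """Extract (country, score) pairs in the order the service returns them."""
    return [(r["country"], r["score"]) for r in response["prediction"]["results"]]
```

For example, country_scores(predict("en", "Toni_Morrison")) would return the (country, score) pairs; since results are now sorted by score, the first pair is the strongest prediction.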

So exciting to see -- thanks @kevinbazira! It sounds like we can now also move the model card from Proposed to Production :) I've updated it so it refers to the new API Gateway documentation too.

Quoting @kevinbazira: "As we prepare to work on the stream, please let us know in case there are any edge cases we may have missed. :)"

Will do -- let me know if there's anything I can do to support the stream but hopefully much of the logic can be copied from the articletopic stream. Once the task is moving, I'll start coordinating with Search on the ingestion side too.