In the initial prototype of the article-country inference service, developed by the Research team, predictions were generated using 3 components (Wikidata Properties, Categories, and Wikilinks) to determine the country(ies) associated with a Wikipedia page. This prototype used a ~715MB SQLite database to manage Wikilink-related predictions, leading to a static and sizable dependency.
In T371897, the ML team productionized the article-country isvc and deployed it on LiftWing. This production version relies on 2 components (Wikidata Properties and Categories) to make predictions. It integrates those predictions into the Wikipedia Search index through the mediawiki.cirrussearch.page_weighted_tags_change.rc0 event stream, as shown in T382295.
For the production version of this service to use Wikilinks when making predictions, we are going to use the classification.prediction.articlecountry weighted tags from the Wikipedia Search index instead of the static SQLite database dependency. This approach will enable the inference service to fetch up-to-date Wikilink predictions.
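As a rough illustration of what consuming these tags could involve, the sketch below filters a list of weighted-tag strings down to the article-country ones. It assumes each tag is encoded as `prefix/value|weight`; that encoding, the helper name `parse_country_tags`, and the sample values in the usage note are assumptions for illustration, not something confirmed in this task.

```python
# Assumed tag prefix, taken from the tag name mentioned in this task.
PREFIX = "classification.prediction.articlecountry"


def parse_country_tags(weighted_tags, prefix=PREFIX):
    """Return {country: weight} for tags that start with `prefix`.

    Assumes each tag looks like "prefix/value|weight"; tags without a
    weight are kept with a weight of None.
    """
    countries = {}
    for tag in weighted_tags:
        # Split off the optional "|weight" suffix first.
        name, sep, weight = tag.partition("|")
        # Then split "prefix/value".
        tag_prefix, slash, value = name.partition("/")
        if tag_prefix != prefix or not slash or not value:
            continue
        countries[value] = int(weight) if sep and weight.isdigit() else None
    return countries
```

For example, `parse_country_tags(["classification.prediction.articlecountry/Kenya|900"])` would return `{"Kenya": 900}` under the assumed encoding, while tags from other prefixes are ignored.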
UPDATE
Following T385970#10548654, a meeting with the Search, Research, and ML teams (meeting notes) determined that relying on Wikilink predictions from the Wikipedia Search index is not viable because the cirrusdoc API is unstable.
The Research and ML teams then evaluated the following approaches:
| # | Approach | Impact |
| --- | --- | --- |
| 1. | Disable the Wikilinks feature temporarily | This would result in a loss of approximately 20–30% of predictions (varying by wiki). While it avoids instability issues, it sacrifices a significant portion of prediction coverage. |
| 2. | Revert to the static SQLite database (~715MB) | This option would restore most of the previous coverage despite being static. However, it introduces a larger dependency on LiftWing and poses challenges for timely updates, as current pipelines for this database are not optimal. |
| 3. | Conditionally use the Search API | This would involve using the Search API only when Wikidata/Categories predictions are unavailable, possibly via an initial GET request to verify the existence of a value. However, this still results in static Wikilink predictions and might lead to inconsistent or unexplained results. There was also skepticism about achieving an implementation acceptable to the Search Platform. |
After weighing these options, Option 2 was considered the most feasible. This approach aligns with practices already used by another model-server (reference-risk) that relies on a database dependency. The Research team has provided the SQLite database in P73436#294761, and we will integrate it into the article-country model-server to support Wikilink-related predictions.
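A minimal sketch of how the model-server might read Wikilink predictions from a bundled SQLite file and combine them with the other components is shown below. The table name `wikilink_countries`, its columns (`wiki_db`, `page_id`, `country`), and all function names are hypothetical; the actual schema of the database provided by the Research team is not described in this task.

```python
import sqlite3


def open_readonly(path):
    """Open the bundled database read-only, since the file is a static dependency."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def wikilink_countries(conn, wiki_db, page_id):
    """Return the countries stored for a page (hypothetical schema)."""
    rows = conn.execute(
        "SELECT country FROM wikilink_countries "
        "WHERE wiki_db = ? AND page_id = ? ORDER BY country",
        (wiki_db, page_id),
    ).fetchall()
    return [row[0] for row in rows]


def merge_predictions(primary, wikilinks):
    """Append Wikilink countries not already predicted by Wikidata/Categories."""
    merged = list(primary)
    for country in wikilinks:
        if country not in merged:
            merged.append(country)
    return merged
```

Opening the file in read-only mode is one way to guard against accidental writes to a shared static artifact; whether the real integration merges results this way or weights the components differently is left open here.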