📊Add a list of instances to filter
Closed, Resolved | Public | 3 Estimated Story Points

Description

Establish a filtered content list and keep it under version control (this will act as a contract and allow change management/tracking).

Acceptance criteria

  • A list of "instance of" items to be filtered is available under version control
  • All articles that match the filter list have been filtered out
  • Metrics for items that have been filtered out have been added in the data quality reports
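As a rough illustration of the criteria above, the sketch below applies an "instance of" filter list to a handful of records and counts what was removed per wiki. It is not the project's actual implementation: the record layout (`wiki`, `page_id`, `instance_of`), the function name `apply_instance_filter`, and the inline `FILTERED_INSTANCE_OF` set are all assumptions made for the example, and the two Wikidata QIDs are only illustrative entries. In practice the list would be read from the version-controlled file.

```python
from collections import Counter

# Hypothetical filter list. In practice this would be loaded from a file kept
# under version control; the entries here are illustrative only.
FILTERED_INSTANCE_OF = {
    "Q4167410",   # Wikimedia disambiguation page
    "Q13406463",  # Wikimedia list article
}


def apply_instance_filter(records, filtered=FILTERED_INSTANCE_OF):
    """Split records into kept rows and a per-wiki count of removed rows.

    Each record is assumed to be a dict with a "wiki" name and an
    "instance_of" list of Wikidata QIDs (an assumed layout for this sketch).
    """
    kept = []
    removed_per_wiki = Counter()
    for record in records:
        # A record is dropped if any of its "instance of" values is on the list.
        if filtered.intersection(record["instance_of"]):
            removed_per_wiki[record["wiki"]] += 1
        else:
            kept.append(record)
    return kept, removed_per_wiki


# Made-up sample rows, only to show the shape of the input and output.
records = [
    {"wiki": "enwiki", "page_id": 1, "instance_of": ["Q5"]},
    {"wiki": "enwiki", "page_id": 2, "instance_of": ["Q4167410"]},
    {"wiki": "arwiki", "page_id": 3, "instance_of": ["Q13406463", "Q5"]},
    {"wiki": "arwiki", "page_id": 4, "instance_of": []},
]

kept, removed = apply_instance_filter(records)
```

The per-wiki counter is the kind of figure that could feed a "number of records filtered out" column in a data quality report, alongside `len(kept)` as the final number of records.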

Notes

  • This will require re-generating production datasets and shipping them to the service instances

Event Timeline

Clarakosi set the point value for this task to 3.

Is this the final ticket for removing disambiguation pages from the results of the api?

@Cparle Yes, it is. There is one more item to complete but you can track the WIP patch here: https://github.com/mirrys/ImageMatching/pull/14

We've implemented the filtering for disambiguation, lists, and year pages. Below you can find the early rough metrics for this filtering.

     wiki      snapshot   Final number of records + Number of records filtered out (unseparated)
0    arwiki    2021-02    62828711687
1    arzwiki   2021-02    798965732
2    bnwiki    2021-02    418013483
3    cebwiki   2021-02    1360118159779
4    cswiki    2021-02    20258521375
5    dewiki    2021-02    13750530798
6    enwiki    2021-02    2713441348229
7    eswiki    2021-02    68394245943
8    euwiki    2021-02    1191496991
9    fawiki    2021-02    34475920839
10   frwiki    2021-02    911595118009
11   hewiki    2021-02    8063210749
12   huwiki    2021-02    18269317233
13   hywiki    2021-02    1041758011
14   itwiki    2021-02    689371104302
15   kowiki    2021-02    30455225170
16   plwiki    2021-02    56221564970
17   ptwiki    2021-02    561953798
18   ruwiki    2021-02    57811356031
19   srwiki    2021-02    13607922447
20   svwiki    2021-02    165164192064
21   trwiki    2021-02    16172611959
22   ukwiki    2021-02    11538810742
23   viwiki    2021-02    9593659476

Thank you! That filters out a substantial number of articles -- it looks like this work will make a big difference!