Prototype article importance metrics
Closed, Resolved (Public)

Description

Based on the article importance definitions gathered as part of this work, devise metrics, where possible, that operationalize those definitions. Though this task is primarily concerned with designing and computing these metrics, I will approach it from the standpoint of how they could be incorporated into an interface for ranking lists of Wikipedia articles -- e.g., as ranking criteria or filters.

Event Timeline

Weekly update: no progress yet.

Weekly update:

  • Attended Cluster H to get a sense of the challenges that the community sees around identifying high-impact topics. That session emphasized the importance of tools for identifying topics in a consistent way across wikis, but also of flexibility for local communities to define what is important to them. Both are good things to keep in mind when operationalizing these definitions, which build on English Wikipedia conversations.

Weekly update:

  • Prepared proposal for contractor support for these metrics: https://docs.google.com/document/d/1k-zWN-x0RWBvfr3Eec2tM2jQqW19F37RkeZDcUQhQRo/edit#
  • Follow-up from the previous quarter: I have a meeting set up with the Android team next week to discuss the equity results around SuggestedEdits
  • Nothing concrete but continued thinking about what these article importance metrics will look like in practice:
    • I'll want to combine the criteria identified from Vital Articles (historical significance, cultural significance, everyday relevance, general interest, breadth, non-genericism, non-redundancy, contribution to completeness, contribution to neutrality/equity) with themes that have arisen in the Knowledge Gaps Taxonomy, Movement Strategy, and general discussions/feedback around GapFinder, campaigns, etc. (gender, geography, topic, profession, various types of work needed, sections needed, real-world impact of misinformation, etc.). This is obviously a lot, so some prioritization (har har) will probably have to be applied.
    • Some of these might be doable but pretty huge. For example, one approach to operationalizing "contribution to completeness" is to map Wikipedia articles to a hierarchy that groups articles by subject (but in a much more fine-grained manner than, e.g., the ORES topic taxonomy). This would likely draw on a cleaned version of the category network (e.g., all of the planets should be on the same level, but the enwiki article for Jupiter is a member of several categories, only some of which are probably relevant: Solar System, Jupiter, Astronomical objects known since antiquity, Gas giants, Outer planets), parsed link templates (e.g., en:Template:United States topics), and/or sub-article relationships (e.g., History of the United States should be on the same level as Climate of the United States). None of those approaches are solved problems, though. If they were, operationalizing "contribution to completeness" would mean that if one of the articles in a level is recommended (or exists), then all should be.
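
A minimal sketch of that last rule, assuming the hard part (a clean article-to-level mapping) is already available. The `levels` dictionary, the level names, and the function name are hypothetical placeholders, not an existing dataset or API:

```python
from typing import Dict, Set


def completeness_gaps(levels: Dict[str, Set[str]], existing: Set[str]) -> Dict[str, Set[str]]:
    """For every level that has at least one existing article, list the missing siblings."""
    gaps = {}
    for level, members in levels.items():
        present = members & existing
        missing = members - existing
        # Only flag a level once something in it exists; untouched levels are not "incomplete".
        if present and missing:
            gaps[level] = missing
    return gaps


# Hypothetical levels; in practice these would come from cleaned categories,
# link templates, or sub-article relationships.
levels = {
    "planets": {"Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"},
    "united_states_topics": {"History of the United States", "Climate of the United States"},
}
existing = {"Earth", "Mars", "Jupiter"}

gaps = completeness_gaps(levels, existing)
for level, missing in gaps.items():
    print(level, sorted(missing))
```

Here only the planets level is reported, because none of the United States topics exist yet in this toy wiki, so nothing triggers the "then all should be" rule for that level.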

Weekly update:

  • Presented to Android team and Research Weekly on equity analysis of SuggestedEdits: https://docs.google.com/presentation/d/1x5yjoq6RaDLdaUeuqx-j9SVK4hduDW5ypzAK9tWHP8M/edit?usp=sharing
  • Some good conversation came out of both of those meetings and a follow-up is set with JT/SN for next week to discuss further.
  • Some discussion around annual planning and what this project will look like next year. I need to sit down and think more about the trajectory. Regardless, this work touches a number of other projects (Knowledge Gaps, work around equity, any work with Campaigns), so I expect it will continue in some form.

Weekly update:

  • Continued discussions around equity + content recommendations.
  • The article importance side of this will be my focus next week, so hopefully there will be good progress.

Weekly update:

Thinking/planning: I took a step back to think about what I was trying to accomplish with this work and what has changed since the project was initially conceptualized almost a year ago. What started out mostly as a stand-alone research project is now interconnected with other ongoing work. Most directly, the Knowledge Gaps Taxonomy work (Content -- Important Topics) is closely related, and the work from last quarter on our existing recommender systems has made clear that there's broad interest in expanding this focus on content equity as a component of article importance. The use-cases for a more purposeful approach to article importance are also much clearer. One clear learning from the Campaigns work is that it is not enough to identify what the important topic spaces are -- e.g., Sustainable Development Goals -- tooling is also needed to help participants find clear entry points that are relevant to their context. That is, article importance has both a more global component -- e.g., the ability to identify articles that are relevant to topics that are deemed important such as gender equity or climate change -- and a more local component where individuals need to be able to filter down that global topic area to articles relevant to their particular context. I've also been taking cues from the Movement Strategy (Topics for Impact), which has emphasized the importance of not being rigid about which topics are considered important and instead building the tools for communities to define these at their level.

My early idea for this work was that article importance was narrowly about how to rank a given set of articles by their priority for being high quality -- e.g., some version of PageRank for Wikipedia. Based on all of these developments/learnings, I'm recognizing just how contextual article importance is and that in many important use-cases, such as recommender systems or campaigns, the specific ranking is less important than having the tools to correctly specify the topic space and filter down to a relevant set of articles. That is, for many use-cases, it's probably okay if the articles are presented in a random order so long as the population of articles being considered can be appropriately filtered to match an individual's or organizer's definition of importance. While this doesn't mean no ranking strategy is needed, it suggests that a simple ranking strategy would be effective if coupled with good filters to capture different aspects of importance / relevance.

Based on this (and the brainstorming described below), there are a couple of separate but related components to this work:

  • Useful topic taxonomies and tooling are essential to this work and not just a complementary project. There are a number of topic filters that are in various stages of development and should continue to be expanded upon. These are large projects though and won't be done in a single quarter. Examples:
    • People + Identity: gender is the most salient of these, but various campaigns/WikiProjects have also focused on other identities such as race, age, sexuality, religion, or whether someone is indigenous. The Knowledge Gaps Taxonomy work will capture some of these.
    • Broad topic: the ORES topic taxonomy has worked well but we'll want to continue to expand our ability to support ad-hoc topic models and evolve the fixed taxonomy to meet community needs.
    • Geography: filtering by country is very common for campaigns and WikiProjects. Work continues on building out a classifier for mapping articles to countries (T263646).
    • Profession: many people-based projects focus on a particular occupation area (e.g., scientists) or allow for filtering by occupation. When automatic, this is generally based on the Wikidata occupation properties. These have good coverage but would likely require some work to see whether they can be mapped into a simple taxonomy.
  • There are some additional basic research tools that I think will enable a number of additional filters.
    • Types of work needed -- i.e., not just the subject of the article but what is known to be missing from it. Currently this mostly depends on the existence of templates. There are early efforts to expand this -- e.g., citation needed prediction, link/image recommendation -- but more work will be needed.
    • Fine-grained matching of articles: aspects like completeness (e.g., if you have a high-quality article about Jupiter, you should have one about Saturn too) or identifying missing sections require the ability to identify articles that are closely related. Categories are probably a good approach for this.
  • For the actual ranking, sticking with simple approaches might be best. An initial idea that drove this project was the concept of misalignment -- i.e., content with higher demand but lower quality is higher priority to improve -- and this matches some of the qualitative work around article importance that we've done under this project. I consider this the core article importance metric to prototype under this project; it can then be combined with any of the filtering techniques to provide a good worklist.
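
As a rough illustration of the misalignment idea combined with a filter, the sketch below scores each article as its demand percentile minus its quality score and then ranks the filtered set. Everything specific here is an assumption rather than the project's actual metric: pageviews as the demand proxy, a quality score already scaled to [0, 1], the simple difference as the formula, and all function names.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Optional, Tuple


def percentile_ranks(values: Dict[str, float]) -> Dict[str, float]:
    """Fraction of items whose value is <= each item's value, in (0, 1]."""
    ordered = sorted(values.values())
    return {k: bisect_right(ordered, v) / len(ordered) for k, v in values.items()}


def misalignment(pageviews: Dict[str, float], quality: Dict[str, float]) -> Dict[str, float]:
    """Demand percentile minus quality score; higher means higher priority to improve."""
    demand = percentile_ranks(pageviews)
    return {t: demand[t] - quality[t] for t in pageviews if t in quality}


def worklist(scores: Dict[str, float],
             keep: Optional[Callable[[str], bool]] = None) -> List[Tuple[str, float]]:
    """Rank titles by descending misalignment after applying an optional filter."""
    items = [(t, s) for t, s in scores.items() if keep is None or keep(t)]
    return sorted(items, key=lambda item: item[1], reverse=True)


# Toy numbers, invented for illustration only.
pageviews = {"Jupiter": 1000, "Saturn": 400, "Gas giant": 50}
quality = {"Jupiter": 0.9, "Saturn": 0.2, "Gas giant": 0.5}

scores = misalignment(pageviews, quality)
ranked = worklist(scores)

# A stand-in for a topic filter: keep only articles about planets.
planets = {"Jupiter", "Saturn"}
filtered = worklist(scores, keep=lambda title: title in planets)
print(ranked)
print(filtered)
```

The ranking itself stays deliberately simple; the `keep` callback is where a topic, geography, or profession filter would plug in.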

Granular updates:

  • Gathered together potential importance criteria from Vital Articles analysis along with a few other sources (Movement Strategy, campaigns, WikiProjects) to help guide future prototypes
  • For each of the ~15 importance criteria, I identified sources that named it as a component of article importance and brainstormed how one might operationalize it
  • Write-up of the article importance work around Vital Articles continues with a hopeful submission to CSCW in April
  • Some of the identified missing tools will be addressed next year. The main focus of this work will be the misalignment metric, with the understanding that it is incomplete as a tool without the additional fine-grained filters.

Weekly updates:

  • Spent some time on this but wasn't able to make much progress, as part of the challenge was figuring out how to reconfigure my SWAP notebooks to be Newpyter-friendly (the new setup). That's now figured out, so it shouldn't cause any issues going forward.
  • I'm working on how to generalize the quality model in English to other languages where we don't have access to quality scores/assessments for training a model. The data pipeline is in good shape but I'll have to find a good way to choose the model parameters and evaluate them

Weekly updates:

  • Prototyping notebook for computing all language-agnostic quality features for all wikis. Ran into some hiccups with the new Hadoop configuration and data (T278441 and T278551) but I think those are mostly figured out now.
  • For the quality model, I'm testing capping features at their 95th percentile value and normalizing -- e.g., if only 5% of articles in a wiki have more than 10 images, then an article with 8 images should have a feature value of 0.8 for images and an article with 12 images should have a feature value of 1 for images. All features are then bounded between 0 and 1 and normalized with wiki-specific values, and using the 95th percentile hopefully limits the influence of extreme outliers.
    • I can already see some outliers -- e.g., wikis where almost no articles have references and therefore they all do "perfectly" on the reference metric. For something like references (or headings or images), there might be some way to enforce a minimum threshold here too, but that's something I'll return to later. Page length is very character-type dependent, so it's hard to enforce a minimum threshold there, but thankfully text is the first thing an article gets, so even the most underdeveloped wikis have at least some slightly longer articles. See table below for 95th percentiles for each feature by wiki. What this makes most clear is that these quality scores will hopefully be useful for ranking articles w/i a wiki but definitely are not suited for comparing articles across wikis.
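
A minimal sketch of the cap-and-normalize step described above, in plain Python rather than the Hadoop pipeline the notebook actually uses. The linear-interpolation percentile, the function names, and especially the `min_cap` floor are assumptions for illustration; the minimum threshold is only floated as a possibility above, not something that has been implemented.

```python
from typing import Dict, List


def percentile(values: List[float], q: float) -> float:
    """Percentile with linear interpolation between ranks; q is in [0, 1]."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def wiki_caps(articles: List[Dict[str, float]], q: float = 0.95) -> Dict[str, float]:
    """Compute one cap per feature from all articles in a single wiki."""
    features = articles[0].keys()
    return {f: percentile([a[f] for a in articles], q) for f in features}


def normalize(value: float, cap: float, min_cap: float = 0.0) -> float:
    """Scale a feature into [0, 1] by its cap; values above the cap become 1.

    min_cap is a hypothetical floor on the cap so that a wiki where almost
    no article has a feature cannot score every article as "perfect".
    """
    effective = max(cap, min_cap)
    if effective <= 0:
        return 1.0  # degenerate case: with a zero cap every article looks "perfect"
    return min(value / effective, 1.0)


# Worked example from the update: the 95th percentile for images is 10.
print(normalize(8, 10))   # 0.8
print(normalize(12, 10))  # 1.0

# Toy wiki with invented feature values.
toy_wiki = [{"images": float(i), "headings": float(i % 4)} for i in range(21)]
caps = wiki_caps(toy_wiki)
scores = [{f: normalize(a[f], caps[f]) for f in a} for a in toy_wiki]
```

Because `caps` is computed per wiki, the same raw value maps to different scores on different wikis, which is why the scores are only comparable within a wiki. The per-wiki 95th percentiles in the table below would play the role of `caps` here.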
+----------------+---------+------------------+------------------+---------------------+-----------------+
|wiki_db         |num_pages|95p_length        |95p_images        |95p_refs_per_len     |95p_headings     |
+----------------+---------+------------------+------------------+---------------------+-----------------+
|enwiki          |6260556  |26590.0           |14.0              |0.0028462998102466793|13.0             |
|cebwiki         |5546111  |9751.0            |5.0               |0.004062976130015236 |2.0              |
|svwiki          |3398512  |7820.0            |6.0               |0.004127115146512587 |4.0              |
|dewiki          |2543038  |18789.0           |13.0              |0.0021208907741251328|12.0             |
|frwiki          |2304459  |22453.0           |19.0              |0.00236352635790542  |16.0             |
|nlwiki          |2046933  |8214.0            |10.0              |0.0015220700152207   |6.0              |
|ruwiki          |1703247  |30631.0           |17.0              |0.0015754737066212476|11.0             |
|itwiki          |1677064  |19873.0           |25.0              |0.001964154186103609 |13.0             |
|eswiki          |1609903  |20725.0           |14.0              |0.0021014710297208045|12.0             |
|plwiki          |1460488  |13686.649999999907|16.0              |0.002200602270094973 |9.0              |
|warwiki         |1265000  |2600.0            |3.0               |0.005416666666666667 |2.0              |
|viwiki          |1261985  |8890.0            |7.0               |0.0016533480297602646|6.0              |
|jawiki          |1256122  |28894.949999999953|13.0              |0.002600780234070221 |17.0             |
|arzwiki         |1206509  |3523.0            |11.0              |0.0013989927252378287|6.0              |
|zhwiki          |1180437  |18080.199999999953|13.0              |0.0031185031185031187|12.0             |
|arwiki          |1104446  |15209.0           |17.0              |0.0017156262035643009|8.0              |
|ukwiki          |1077033  |20925.0           |16.0              |0.0012538470306622592|10.0             |
|ptwiki          |1058222  |18647.0           |16.0              |0.0024137314500268193|10.0             |
|fawiki          |771124   |10760.0           |9.0               |0.0013333333333333333|6.0              |
|cawiki          |672521   |13979.0           |16.0              |0.0026939655172413795|10.0             |
|srwiki          |643311   |41310.0           |10.0              |0.0012648621300278269|8.0              |
|idwiki          |563587   |11914.0           |9.0               |0.002188183807439825 |8.0              |
|nowiki          |551441   |10088.0           |11.0              |0.0023923444976076554|8.0              |
|kowiki          |534857   |14286.199999999953|11.0              |0.0017505251575472643|11.0             |
|fiwiki          |504132   |12951.0           |10.0              |0.0033068783068783067|9.0              |
|huwiki          |484462   |19272.949999999953|22.0              |0.0022579734688117415|12.0             |
|cswiki          |475348   |17763.649999999965|16.0              |0.0020060180541624875|12.0             |
|shwiki          |454726   |7485.0            |9.0               |0.0017528483786152498|6.0              |
|zh_min_nanwiki  |430768   |2052.0            |7.0               |9.519276534983341E-4 |4.0              |
|rowiki          |417277   |12521.199999999953|12.0              |0.0043397396156230625|9.0              |
|trwiki          |393166   |15913.0           |13.0              |0.003098373353989156 |9.0              |
|euwiki          |368295   |9312.0            |14.0              |0.002026342451874367 |11.0             |
|cewiki          |353901   |5616.0            |9.0               |8.411843876177658E-4 |5.0              |
|mswiki          |347140   |9075.0            |7.0               |0.0014955708095256356|7.0              |
|eowiki          |293038   |7605.149999999965 |12.0              |0.0012427506213753107|8.0              |
|hewiki          |289617   |24865.0           |19.0              |3.778575477045154E-4 |12.0             |
|hywiki          |281603   |21377.699999999895|11.0              |0.0017035775127768314|9.0              |
|bgwiki          |269533   |17974.399999999994|13.0              |0.0022271714922048997|9.0              |
|ttwiki          |265785   |4482.0            |8.0               |0.001771479185119575 |5.0              |
|dawiki          |265146   |11235.75          |11.0              |0.0019329896907216496|9.0              |
|azbwiki         |240077   |5616.0            |8.0               |0.0010755579456843238|4.0              |
|skwiki          |236047   |14462.0           |13.0              |0.001544799176107106 |9.0              |
|kkwiki          |232242   |6995.9499999999825|9.0               |0.0012376237623762376|5.0              |
|minwiki         |224563   |1895.0            |1.0               |6.706908115358819E-4 |3.0              |
|etwiki          |216990   |10074.0           |8.0               |0.003598740440845704 |9.0              |
|hrwiki          |210879   |11810.0           |9.0               |0.002366863905325444 |13.0             |
|bewiki          |201783   |15370.0           |12.0              |0.0017436791630340018|7.0              |
|ltwiki          |199105   |8301.0            |14.0              |0.008018327605956471 |7.0              |
|elwiki          |189017   |30724.0           |14.0              |0.001774622892635315 |12.0             |
|simplewiki      |183418   |9006.0            |8.0               |0.0023781212841854932|7.0              |
|azwiki          |178772   |11374.449999999983|11.0              |0.0029486099410278013|8.0              |
|glwiki          |171663   |15659.899999999994|13.0              |0.002008536279186543 |11.0             |
|slwiki          |171521   |14869.0           |14.0              |0.001836210062431142 |10.0             |
|urwiki          |163782   |8580.0            |9.0               |0.0011971268954509178|7.0              |
|nnwiki          |157495   |7672.0            |7.0               |0.0020026702269692926|7.0              |
|hiwiki          |149556   |22080.0           |8.0               |0.0012224938875305623|11.0             |
|kawiki          |149466   |19494.75          |9.0               |8.633093525179857E-4 |8.0              |
|thwiki          |142601   |36050.0           |24.0              |0.0013693940431359123|13.0             |
|uzwiki          |139916   |3220.25           |7.0               |0.001417434443656981 |4.0              |
|tawiki          |139574   |19923.350000000006|7.0               |0.0013231888852133643|9.0              |
|lawiki          |134936   |5280.5            |7.0               |0.0018399264029438822|7.0              |
|cywiki          |132611   |6856.5            |14.0              |0.0019814052735863436|6.0              |
|vowiki          |126354   |2383.0            |4.0               |0.0                  |2.0              |
|mkwiki          |113258   |23996.149999999994|14.0              |0.0013876040703052729|12.0             |
|astwiki         |108296   |28405.5           |20.0              |0.00215916101172116  |16.0             |
|zh_yuewiki      |107931   |5346.0            |7.0               |0.0014545454545454545|6.0              |
|lvwiki          |106263   |11747.799999999988|15.0              |0.0018944519621109607|8.0              |
|bnwiki          |104447   |33046.09999999996 |14.0              |0.001379462561386084 |15.0             |
|mywiki          |102874   |11730.699999999983|9.0               |9.562514941429596E-4 |6.0              |
|tgwiki          |102867   |6137.0            |7.0               |7.958615200955034E-4 |5.0              |
|afwiki          |96861    |11516.0           |10.0              |0.0017052115528082702|8.0              |
|mgwiki          |93801    |2533.0            |3.0               |0.0013333333333333333|4.0              |
|sqwiki          |91160    |10890.050000000003|8.0               |0.001743510267338241 |8.0              |
|ocwiki          |86657    |8357.199999999997 |16.0              |0.0019047619047619048|11.0             |
|bswiki          |85100    |15791.550000000032|19.0              |0.0031164298180004985|11.0             |
|ndswiki         |82479    |5244.099999999991 |4.0               |7.037297677691766E-4 |7.0              |
|kywiki          |80802    |5692.949999999997 |6.0               |7.91765637371338E-4  |4.0              |
|be_x_oldwiki    |73484    |15530.79999999993 |10.0              |9.628071176407659E-4 |9.0              |
|mlwiki          |73178    |22845.149999999994|11.0              |0.0015160703456640388|9.0              |
|newwiki         |73046    |7504.0            |2.0               |9.074410163339383E-4 |7.0              |
|tewiki          |70773    |23366.199999999983|6.0               |9.621552277100705E-4 |19.0             |
|mrwiki          |70768    |14733.299999999988|8.0               |6.969450574979672E-4 |8.0              |
|brwiki          |69389    |7422.0            |9.0               |0.0014914243102162564|12.0             |
|vecwiki         |67315    |2334.2999999999956|9.0               |0.0                  |3.0              |
|pmswiki         |65780    |1695.0            |18.0              |7.57002271006813E-4  |5.0              |
|jvwiki          |62818    |6619.0            |6.0               |0.0032485110990795887|6.0              |
|htwiki          |62483    |3078.0            |3.0               |5.302226935312832E-4 |11.0             |
|pnbwiki         |61321    |23046.0           |8.0               |0.0013583265417006249|10.0             |
|swwiki          |60857    |5313.199999999997 |6.0               |0.002028397565922921 |6.0              |
|suwiki          |60788    |4328.0            |3.0               |0.002044989775051125 |6.0              |
|lbwiki          |59368    |7296.299999999988 |8.0               |0.0023049061573921633|7.0              |
|tlwiki          |58487    |9116.699999999997 |8.0               |0.0022396416573348264|7.0              |
|bawiki          |55679    |19635.099999999984|12.0              |0.0011484823625922888|10.0             |
|gawiki          |54763    |5476.799999999988 |7.0               |0.0012492192379762648|6.0              |
|szlwiki         |53097    |2217.0            |7.0               |0.0040941658137154556|1.0              |
|iswiki          |52077    |9114.0            |7.0               |0.0019582245430809398|7.0              |
|cvwiki          |45779    |6885.0            |9.0               |0.0012590494176896443|7.0              |
|lmowiki         |45566    |3463.75           |9.0               |8.92458723784025E-4  |4.0              |
|fywiki          |45319    |10684.0           |11.0              |6.49772579597141E-4  |8.0              |
|scowiki         |42582    |8508.0            |13.0              |0.0021037868162692847|7.0              |
|wuuwiki         |41464    |1813.0            |2.0               |0.0                  |3.0              |
|diqwiki         |39948    |1657.0            |6.0               |0.00228310502283105  |5.0              |
|anwiki          |39551    |8743.5            |8.0               |0.0012547051442910915|9.0              |
|kuwiki          |38575    |4481.599999999991 |5.0               |0.0015012965743141805|5.0              |
|pawiki          |37310    |16650.94999999996 |8.0               |0.001718494271685761 |9.0              |
|yowiki          |33614    |3929.3499999999985|5.0               |0.0015200233849751534|3.0              |
|newiki          |32167    |16983.699999999997|10.0              |0.0013812154696132596|9.0              |
|barwiki         |31631    |10542.0           |8.0               |0.0014074595355383533|8.0              |
|iowiki          |30521    |4231.0            |5.0               |6.531678641410843E-4 |5.0              |
|guwiki          |29677    |14782.39999999998 |11.0              |0.0010964912280701754|7.0              |
|ckbwiki         |29093    |11310.999999999978|17.0              |0.0016591609386110452|7.0              |
|alswiki         |27698    |12433.949999999972|12.0              |0.0015005359056806003|11.0             |
|knwiki          |27608    |68769.79999999997 |10.0              |0.0010584250635055038|20.0             |
|nostalgiawiki   |27375    |6350.5999999999985|0.0               |0.0                  |0.0              |
|scnwiki         |26421    |3271.0            |3.0               |0.0                  |4.0              |
|bpywiki         |25249    |3852.0            |7.0               |6.379585326953748E-4 |16.0             |
|iawiki          |23127    |2885.4000000000015|5.0               |5.94883997620464E-4  |5.0              |
|quwiki          |23031    |5603.5            |19.0              |0.0011614401858304297|9.0              |
|mnwiki          |22141    |18483.0           |11.0              |0.0010256410256410256|9.0              |
|siwiki          |20628    |30549.549999999985|8.0               |0.0012360939431396785|11.0             |
|bat_smgwiki     |16997    |1563.199999999999 |3.0               |0.0                  |3.0              |
|nvwiki          |16651    |1792.0            |7.0               |0.0                  |1.0              |
|sdwiki          |15765    |11054.199999999993|6.0               |0.0020876826722338203|12.0             |
|xmfwiki         |15685    |10423.999999999996|13.0              |5.120327700972862E-4 |6.0              |
|orwiki          |15637    |16849.199999999975|9.0               |0.00203210729526519  |17.0             |
|cdowiki         |15513    |1943.3999999999996|4.0               |0.0                  |4.0              |
|amwiki          |15398    |4860.15           |5.0               |3.6589828027808267E-4|4.0              |
|ilowiki         |15390    |8313.899999999987 |11.0              |0.0027472527472527475|5.0              |
|gdwiki          |15332    |4277.449999999999 |10.0              |0.0024826216484607746|5.0              |
|yiwiki          |15223    |8061.5999999999985|10.0              |4.3122035360068997E-4|6.0              |
|napwiki         |14736    |2253.25           |5.0               |0.0                  |3.0              |
|sahwiki         |14565    |12254.399999999983|6.0               |0.0013713658804168952|7.0              |
|maiwiki         |14485    |8429.399999999998 |8.0               |0.00127000254000508  |18.0             |
|bugwiki         |14191    |625.0             |2.0               |0.0                  |1.0              |
|wawiki          |13891    |3705.0            |6.0               |0.0014326647564469914|8.0              |
|map_bmswiki     |13781    |1562.0            |3.0               |0.0012062726176115801|3.0              |
|hsbwiki         |13765    |4490.799999999996 |13.0              |0.0018281535648994515|8.0              |
|pswiki          |13671    |15108.0           |7.0               |0.0012755102040816326|7.0              |
|mznwiki         |13562    |4239.899999999998 |18.0              |8.992805755395684E-4 |6.0              |
|fowiki          |13559    |7349.299999999996 |11.0              |0.001984126984126984 |6.0              |
|liwiki          |13209    |9683.19999999999  |5.0               |4.655493482309125E-4 |8.0              |
|oswiki          |12942    |4609.0            |8.0               |0.0021691973969631237|4.0              |
|frrwiki         |12675    |5095.699999999993 |12.0              |0.002066115702479339 |5.0              |
|emlwiki         |12656    |6101.0            |7.0               |0.0012544428183148652|6.0              |
|avkwiki         |12420    |4466.049999999999 |22.0              |0.0                  |4.0              |
|acewiki         |12348    |1530.0            |3.0               |0.0011655011655011655|1.0              |
|gorwiki         |11864    |1892.7000000000007|3.0               |9.970089730807576E-4 |2.0              |
|bowiki          |11726    |29210.5           |6.0               |0.0                  |9.0              |
|sawiki          |11643    |24047.999999999993|7.0               |3.845660812716318E-4 |18.0             |
|bclwiki         |11011    |8848.0            |6.0               |0.0021008403361344537|12.0             |
|zh_classicalwiki|10666    |4717.75           |5.0               |0.002586206896551724 |5.0              |
|mrjwiki         |10527    |2959.0999999999967|4.0               |8.143322475570033E-4 |3.0              |
|mhrwiki         |10321    |10219.0           |9.0               |0.0014603870025556773|6.0              |
|hifwiki         |10125    |3477.9999999999964|5.0               |0.0017391304347826088|5.0              |
|kmwiki          |10107    |40989.799999999974|7.0               |8.85079424232543E-4  |12.0             |
|hakwiki         |9525     |2456.0            |5.0               |0.0                  |13.0             |
|roa_tarawiki    |9314     |4069.950000000006 |11.0              |8.305647840531562E-4 |5.0              |
|testwiki        |9227     |34031.49999999997 |3.0               |0.0034797738147020443|10.0             |
|pamwiki         |8985     |8398.399999999994 |9.0               |0.001095290251916758 |5.0              |
|crhwiki         |8895     |2320.2999999999993|6.0               |0.0                  |3.0              |
|hywwiki         |8853     |25214.79999999999 |11.0              |0.0016406890894175555|11.0             |
|shnwiki         |8798     |14827.999999999993|6.0               |6.142506142506142E-4 |32.0             |
|nsowiki         |8356     |1593.0            |4.0               |6.788866259334691E-4 |3.0              |
|aswiki          |8164     |29479.549999999967|12.0              |0.0017337699865151224|12.0             |
|ruewiki         |8073     |7817.199999999999 |5.0               |0.0017825311942959   |6.0              |
|sewiki          |7954     |2749.1999999999935|6.0               |0.0024706609017912293|5.0              |
|zuwiki          |7659     |2143.199999999999 |4.0               |0.00265017667844523  |2.0              |
|hawiki          |7616     |7525.5            |11.0              |0.0027397260273972603|7.0              |
|lijwiki         |7608     |4105.099999999995 |7.0               |4.6707146193367583E-4|12.0             |
|ugwiki          |7606     |23390.0           |3.0               |2.527805864509606E-4 |10.0             |
|bhwiki          |7437     |9285.2            |7.0               |0.001603592046183451 |5.0              |
|vlswiki         |7384     |6091.649999999995 |11.0              |2.722570106180234E-4 |6.0              |
|tkwiki          |7308     |8562.199999999983 |4.0               |4.780114722753346E-4 |5.0              |
|miwiki          |7205     |4047.19999999999  |6.0               |0.0                  |17.0             |
|nds_nlwiki      |7203     |6881.3999999999905|7.0               |6.839945280437756E-4 |8.0              |
|nahwiki         |7170     |2416.399999999994 |6.0               |6.553532008830022E-4 |5.0              |
|sowiki          |7137     |9013.399999999998 |13.0              |0.0015811109939917781|6.0              |
|scwiki          |7085     |7352.399999999998 |18.0              |9.111617312072893E-4 |6.0              |
|snwiki          |7074     |2337.149999999995 |2.0               |0.0                  |3.0              |
|vepwiki         |6658     |5851.15           |10.0              |7.102272727272727E-4 |7.0              |
|ganwiki         |6505     |1638.3999999999978|3.0               |0.0                  |9.0              |
|banwiki         |6475     |7931.099999999988 |8.0               |0.0026400704018773834|41.0             |
|glkwiki         |6455     |2643.2999999999993|3.0               |0.0011890606420927466|4.0              |
|myvwiki         |6408     |9142.699999999993 |11.0              |0.0017805915520823028|6.0              |
|abwiki          |6237     |1240.5999999999995|2.0               |0.0014858841010401188|15.0             |
|kabwiki         |6115     |2972.3            |7.0               |0.0016666666666666668|5.0              |
|cowiki          |5973     |5584.199999999997 |6.0               |6.169031462060457E-4 |8.0              |
|satwiki         |5862     |14998.199999999997|12.0              |0.0014102162331557505|11.0             |
|fiu_vrowiki     |5786     |2701.5            |4.0               |0.0                  |14.0             |
|iewiki          |5548     |2661.6499999999996|4.0               |0.0                  |3.0              |
|kvwiki          |5522     |8076.749999999999 |6.0               |2.1110407430863416E-4|13.0             |
|csbwiki         |5404     |2924.8499999999995|5.0               |3.69890882189754E-4  |5.0              |
|pcdwiki         |5172     |7273.699999999999 |15.0              |0.0014184397163120568|8.0              |
|aywiki          |5139     |5084.099999999999 |8.0               |7.358351729212656E-4 |16.0             |
|udmwiki         |5050     |6378.050000000006 |6.0               |3.497726477789437E-4 |4.0              |
|gvwiki          |5043     |6596.499999999998 |10.0              |0.0012391573729863693|5.0              |
|pagwiki         |4946     |1739.5            |3.0               |0.0                  |4.0              |
|zeawiki         |4774     |4115.699999999999 |5.0               |0.0                  |4.0              |
|lfnwiki         |4677     |7935.199999999999 |5.0               |0.0                  |6.0              |
|frpwiki         |4613     |3786.7999999999975|10.0              |0.003153153153153153 |24.0             |
|lowiki          |4584     |17982.59999999996 |11.0              |7.974481658692185E-4 |10.0             |
|nrmwiki         |4581     |2532.0            |8.0               |0.0                  |7.0              |
|kwwiki          |4539     |2764.399999999998 |6.0               |0.002188183807439825 |3.0              |
|dvwiki          |4314     |13393.449999999986|4.0               |7.471607890017931E-5 |8.0              |
|lezwiki         |4198     |16571.49999999999 |12.149999999999636|0.0011585248117397182|8.0              |
|gomwiki         |4195     |23509.59999999999 |5.0               |0.0024183796856106408|15.0             |
|gnwiki          |4134     |9312.849999999999 |8.0               |0.0021321961620469083|8.0              |
|mwlwiki         |4111     |28146.0           |12.0              |0.0022304832713754648|15.0             |
|stqwiki         |4107     |6119.999999999989 |7.0               |3.980891719745223E-4 |6.0              |
|olowiki         |3903     |2945.599999999995 |4.0               |0.0015538290788013318|4.0              |
|szywiki         |3858     |6944.149999999992 |3.0               |0.0                  |10.0             |
|mtwiki          |3772     |26253.199999999993|31.0              |0.0016535758577924762|14.0             |
|rmwiki          |3762     |33809.14999999996 |16.0              |8.264462809917355E-4 |21.0             |
|awawiki         |3710     |5931.899999999995 |3.0               |6.855816896951799E-4 |3.0              |
|dtywiki         |3604     |12130.849999999997|9.0               |0.0013297872340425532|7.0              |
|ladwiki         |3586     |6339.25           |13.0              |0.0017615971814445098|9.0              |
|bjnwiki         |3584     |6382.999999999998 |4.0               |0.0017118707537580912|5.0              |
|arywiki         |3571     |4697.0            |8.0               |0.0015284677111196026|4.0              |
|furwiki         |3556     |5025.25           |5.0               |0.0                  |10.0             |
|koiwiki         |3505     |8799.799999999988 |5.0               |2.090738030524775E-4 |13.0             |
|extwiki         |3420     |5466.3499999999985|7.0               |0.001558846453624318 |6.0              |
|angwiki         |3374     |5020.049999999999 |7.0               |0.001072194424588992 |4.0              |
|dsbwiki         |3311     |5341.0            |18.0              |0.002173913043478261 |8.0              |
|lnwiki          |3304     |2877.349999999999 |5.849999999999909 |0.001277139208173691 |6.0              |
|cbk_zamwiki     |3243     |3502.19999999999  |4.0               |3.696857670979667E-4 |4.0              |
|piwiki          |3216     |971.0             |1.0               |0.0                  |16.0             |
|tyvwiki         |3180     |16844.249999999993|7.0               |0.0012746972594008922|8.049999999999727|
|kshwiki         |2905     |4166.39999999999  |4.0               |0.0                  |7.0              |
|gagwiki         |2888     |6723.9000000000015|75.0              |0.0                  |6.0              |
|pflwiki         |2716     |6527.5            |9.0               |7.863401482812851E-4 |8.0              |
|avwiki          |2587     |11585.299999999988|11.0              |9.775171065493646E-4 |8.0              |
|hawwiki         |2429     |2823.5999999999995|5.0               |0.0013183915622940012|4.0              |
|lgwiki          |2425     |6294.7999999999965|2.0               |0.0                  |3.0              |
|gcrwiki         |2378     |1671.15           |3.0               |0.0                  |7.0              |
|xalwiki         |2321     |4637.0            |5.0               |0.0017775520568816658|4.0              |
|rwwiki          |2219     |4896.399999999998 |5.0               |0.0010857763300760044|6.0              |
|igwiki          |2214     |10077.499999999996|4.0               |0.002664890073284477 |8.0              |
|bxrwiki         |2198     |15595.600000000008|13.0              |0.0014595496246872393|7.0              |
|papwiki         |2193     |5324.4000000000015|5.0               |0.0011769321302471558|5.0              |
|zawiki          |2116     |938.5             |3.0               |0.0                  |4.0              |
|pdcwiki         |2103     |2463.5999999999976|4.0               |0.0                  |3.0              |
|krcwiki         |2074     |13038.949999999997|13.0              |7.125044531528322E-4 |7.0              |
|test2wiki       |2041     |10619.0           |4.0               |0.004032258064516129 |7.0              |
|kaawiki         |2040     |4963.599999999998 |6.0               |9.66183574879227E-4  |4.0              |
|kbpwiki         |1916     |3610.5            |5.0               |0.0                  |13.0             |
|arcwiki         |1811     |2383.5            |3.0               |0.0                  |2.0              |
|novwiki         |1801     |5714.0            |4.0               |0.0                  |7.0              |
|towiki          |1753     |2148.3999999999965|4.0               |0.0                  |3.0              |
|inhwiki         |1722     |7497.499999999991 |7.0               |0.001597444089456869 |4.0              |
|jamwiki         |1720     |2662.5499999999984|6.0               |0.0017574692442882249|1.0              |
|tcywiki         |1691     |18457.0           |6.0               |0.0011045655375552283|12.0             |
|wowiki          |1671     |8737.0            |8.0               |0.0013513513513513514|7.0              |
|tpiwiki         |1664     |3246.649999999999 |5.0               |3.6536353671903543E-4|2.0              |
|kiwiki          |1612     |933.5999999999985 |2.0               |0.0                  |1.0              |
|kbdwiki         |1612     |9988.349999999995 |12.0              |6.097560975609756E-4 |7.0              |
|tetwiki         |1586     |4703.0            |7.0               |0.0028462998102466793|4.0              |
|nawiki          |1580     |1338.2499999999998|5.0               |0.0                  |3.0              |
|akwiki          |1571     |2794.5            |2.0               |5.780346820809249E-4 |5.0              |
|atjwiki         |1470     |1783.55           |2.0               |0.0045871559633027525|4.0              |
|xhwiki          |1415     |7356.199999999998 |5.0               |0.0021413276231263384|6.0              |
|lldwiki         |1414     |7694.499999999986 |16.0              |0.001996007984031936 |6.0              |
|biwiki          |1407     |1145.7            |4.0               |0.0                  |1.0              |
|mdfwiki         |1355     |5854.399999999999 |5.0               |2.0132876988121604E-4|3.0              |
|mnwwiki         |1343     |60260.39999999997 |10.0              |0.0013064576334453153|15.0             |
|jbowiki         |1334     |9456.149999999983 |4.0               |0.0                  |7.0              |
|tywiki          |1332     |915.45            |3.4500000000000455|0.0                  |3.0              |
|roa_rupwiki     |1279     |2178.399999999998 |4.0               |1.144950767117014E-4 |9.0              |
|kgwiki          |1271     |1157.0            |6.0               |0.0                  |2.0              |
|lbewiki         |1257     |2472.4000000000005|4.0               |9.652509652509653E-4 |3.0              |
|omwiki          |1197     |8250.200000000003 |5.0               |9.624639076034649E-4 |5.0              |
|srnwiki         |1188     |2452.0499999999975|4.0               |0.0                  |6.0              |
|fjwiki          |1156     |1812.0            |4.0               |0.0                  |1.0              |
|smwiki          |1038     |3547.0499999999997|4.0               |6.191950464396285E-4 |14.0             |
|ltgwiki         |1008     |4648.999999999998 |7.0               |0.002152852529601722 |8.0              |
|nqowiki         |992      |19985.299999999996|5.0               |0.0                  |10.0             |
|chrwiki         |972      |2749.7499999999854|8.0               |0.00267379679144385  |3.0              |
|stwiki          |959      |4558.299999999988 |4.0               |0.0010575296108291032|2.0              |
|gotwiki         |957      |4345.599999999998 |3.0               |0.0                  |2.0              |
|klwiki          |869      |2479.399999999998 |7.0               |0.0                  |3.0              |
|pihwiki         |850      |2300.599999999995 |7.0               |0.0011001100110011   |2.0              |
|tnwiki          |844      |9040.149999999994 |4.0               |0.0018450184501845018|7.0              |
|nywiki          |830      |7664.749999999995 |4.0               |0.0026109660574412533|5.0              |
|twwiki          |791      |3165.0            |3.0               |0.002593116454502593 |4.0              |
|chywiki         |783      |871.3999999999999 |6.0               |0.0                  |1.0              |
|cuwiki          |780      |4775.449999999997 |12.049999999999955|0.0011441647597254005|3.0              |
|bmwiki          |759      |1933.4000000000026|8.0               |0.0                  |2.0              |
|tswiki          |729      |7353.000000000001 |51.0              |0.0025806451612903226|5.0              |
|tumwiki         |723      |938.4999999999994 |3.0               |0.0                  |0.0              |
|rmywiki         |716      |3411.25           |5.0               |0.0                  |7.0              |
|rnwiki          |715      |3252.7999999999893|2.0               |0.0                  |1.0              |
|ikwiki          |674      |795.5500000000003 |4.0               |0.0                  |1.0              |
|iuwiki          |634      |1248.4            |5.0               |0.0                  |2.0              |
|kswiki          |569      |1879.0000000000005|4.0               |0.0                  |2.0              |
|adywiki         |566      |3097.75           |4.0               |0.0                  |5.0              |
|sswiki          |560      |6051.399999999996 |6.0               |0.00338409475465313  |6.0              |
|chwiki          |547      |1600.9999999999952|4.699999999999932 |0.0                  |2.0              |
|pntwiki         |523      |5468.799999999999 |8.0               |2.6574541589157585E-4|4.0              |
|vewiki          |451      |2135.0            |3.0               |0.001392757660167131 |1.0              |
|eewiki          |388      |5256.549999999984 |4.649999999999977 |0.0013966480446927375|5.0              |
|tiwiki          |373      |6209.999999999998 |6.0               |3.3022917905026086E-5|2.0              |
|ffwiki          |368      |9853.75           |11.0              |9.313877677739833E-4 |5.0              |
|dinwiki         |305      |2780.4000000000005|7.0               |0.0                  |2.0              |
|dzwiki          |295      |13353.900000000012|8.0               |0.0                  |13.0             |
|sgwiki          |295      |1310.1000000000017|63.0              |0.0                  |2.0              |
|crwiki          |175      |1270.9999999999977|4.299999999999983 |0.0                  |1.0              |
+----------------+---------+------------------+------------------+---------------------+-----------------+

Weekly updates:

  • Looked into moving from a straight linear regression to essentially just a weighted average of the four features currently being used. That leads to some sacrifice in accuracy on English (from 0.913 to 0.867 linear correlation between predicted scores and ORES scores) but has a few nice properties:
    • Automatically bounded between 0 and 1 (because each feature is bounded between 0 and 1 and the weights sum to 1, assuming none of them is negative)
    • Very easy to interpret
  • Plots below of linear regression and "weighted average" approach (which is just a linear regression w/ no intercept and the weights normalized so they sum to 1):
    • [Figure: Linear Regression]
    • [Figure: Weighted-Average]
  • Started process of pulling in data from a few other languages to test generalizability of weights learned for English. Two options that I'm pursuing:
    • ORES scores are available in bulk for a few wikis on HDFS: euwiki, glwiki, and a few others. These are likelihoods for each of that wiki's article quality classes, so they would need to be mapped to a float between 0 and 1 if I were to use the same modeling approach.
    • Ground-truth data of WikiProject assessments from PageAssessments is available for Arabic and French (and Turkish and Hungarian) on MariaDB. These would also need to be mapped to floats because they use each wiki's native article quality classes.
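
A minimal sketch of the two ideas above: turning no-intercept regression coefficients into a weighted average, and mapping per-class likelihoods to a float between 0 and 1. The coefficient values, the class names, and the even spacing of classes are all assumptions for illustration; none of them come from the actual model or from any wiki's ORES output.

```python
def normalize_weights(coefs):
    """Rescale no-intercept regression coefficients so they sum to 1."""
    if any(c < 0 for c in coefs):
        # A negative weight would break the [0, 1] bound on the final score.
        raise ValueError("coefficients must be non-negative")
    total = sum(coefs)
    if total <= 0:
        raise ValueError("at least one coefficient must be positive")
    return [c / total for c in coefs]


def weighted_average_score(features, weights):
    """Combine features (each in [0, 1]) into one score that stays in [0, 1]."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * x for w, x in zip(weights, features))


def expected_quality(class_probs, class_order):
    """Map per-class likelihoods to a float in [0, 1] via an expected value.

    Assumption: classes in class_order are evenly spaced, from 0.0 for the
    lowest class to 1.0 for the highest. Other spacings are equally plausible.
    """
    n = len(class_order)
    if n < 2:
        raise ValueError("need at least two quality classes")
    total = sum(class_probs.get(c, 0.0) for c in class_order)
    if total <= 0:
        raise ValueError("class likelihoods must sum to a positive value")
    weighted = sum(class_probs.get(c, 0.0) * i / (n - 1)
                   for i, c in enumerate(class_order))
    return weighted / total


# Hypothetical coefficients for four features (illustrative values only).
raw_coefs = [0.8, 0.6, 0.4, 0.2]
weights = normalize_weights(raw_coefs)
score = weighted_average_score([1.0, 0.5, 0.0, 0.25], weights)

# Hypothetical three-class scheme; real wikis define their own classes.
class_order = ["low", "medium", "high"]
quality = expected_quality({"low": 0.2, "medium": 0.3, "high": 0.5}, class_order)
```

Because the normalized weights are non-negative and sum to 1, the score is a convex combination of the features, which is what keeps it between 0 and 1.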

Weekly update:

  • Resolving this task. I had wanted to get a bit further down the proof-of-concept road before I committed to a metric, but that's really the work of building an API (goal for Q4).
  • We submitted a CSCW paper on this project yesterday. Building on my thoughts in T272175#6894768, one of the really interesting aspects to come out of that work/discussion was that there really isn't a good way to model article importance (a measure of the ideal distribution of quality across the projects, in line with encyclopedic values) separately from the current state, which reflects editor/reader interest. Classic measures of importance like PageRank or pageviews are highly wrapped up with the current state. Comparisons of editor effort (in terms of # of edits) against article importance (in terms of Vital Articles) show some pretty big gaps, indicating that implicit data sources are always going to be an incomplete proxy and that you really need explicit crowdsourced ratings from "experts" (e.g., in the form of Vital Articles or WikiProject Importance ratings) to guide what content should be prioritized for high quality on Wikipedia. This means the project is not just about developing a ranking but also about exposing filters for editors to narrow down content to what they see as important, better annotation systems for WikiProjects (which is outside the scope of this work), and the work to tie edit recommender systems to campaigns (which are in effect community-driven importance assessments).
  • In the absence of those crowdsourced ratings, however, reader demand is the other component. In line with past thoughts, a misalignment metric (reader interest vs. current quality) is not actually the ideal measure of article importance, but it is simple and can complement importance ratings along with content filters. Warncke-Wang et al. used a binned version of this metric because they had access to quality classes. Because I'm building language-agnostic models, the reader demand and quality scores are both bounded to [0, 1], so the difference is nicely scoped to [-1, 1]. This can always be mapped back to qualitative classes for the purposes of explaining recommendations -- e.g., a C-class article w/ FA-level pageviews. Continuing the buildout of this into an API will occur in Q4.
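
The misalignment metric described above can be sketched roughly as follows. The sign convention (demand minus quality), the function names, and the evenly spaced class thresholds are assumptions for illustration; the class labels are English Wikipedia's and would differ on other wikis.

```python
# English Wikipedia quality classes, lowest to highest. Used here only to
# turn scores back into a readable explanation.
CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]


def misalignment(demand, quality):
    """Return demand minus quality, a value in [-1, 1].

    Assumed convention: positive means readers want the article more than
    its current quality would suggest.
    """
    for name, value in (("demand", demand), ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return demand - quality


def to_class(score):
    """Map a score in [0, 1] to a class label using evenly spaced bins.

    The even spacing is a hypothetical choice, not a calibrated mapping.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    index = min(int(score * len(CLASSES)), len(CLASSES) - 1)
    return CLASSES[index]


def explain(demand, quality):
    """Describe a recommendation in qualitative terms."""
    return f"{to_class(quality)}-class article w/ {to_class(demand)}-level pageviews"


gap = misalignment(demand=0.95, quality=0.4)
message = explain(demand=0.95, quality=0.4)
```

Ranking articles by `gap` in descending order would surface the ones where reader demand most exceeds current quality, and `message` gives editors a reason for each suggestion.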