
Newcomer tasks: Discrepancy in article counter on "Select your interests"
Closed, Resolved | Public | Bug Report

Description

There are two types of cases:

  1. The count for the other selected filter(s) is ignored in the total count.
  2. The total count is off by one.

(1) On cswiki betalabs, in the Suggested edits module of the Homepage, select the second option in the Easy filter: Přidání odkazů mezi články ("Adding links between articles").

  • Click "select topic" and select "Ekonomie" (Economics): the article count is 10.
  • Select "Podnikání" (Business): the count is still 10.
  • Deselect "Ekonomie": the count is 4.

(2) On cswiki betalabs, in the Suggested edits module of the Homepage, select the second option in the Easy filter: Přidání odkazů mezi články ("Adding links between articles"), then compare the counts below.

Selected topics | Count when selected separately | Count when selected together
Podnikání (Business) | 4 |
Politika (Politics) | 4 |
Podnikání + Politika | | 7

Selected topics | Count when selected separately | Count when selected together
Zdraví (Health) | 1 |
Společenské (Social sciences) | 4 |
Zdraví + Společenské | | 4

Note: the two cases may in fact be one: an article might be counted twice if it belongs to more than one selected topic.
I looked at which articles were returned in case (2): the "Milton Friedman" article was fetched both when Zdraví was selected and when Společenské was selected (separately).

Event Timeline

This issue is not a blocker for the release of V1.1.0, but we should address it afterward.

kostajh changed the subtype of this task from "Task" to "Bug Report". (Jan 15 2020, 1:16 PM)

Change 565459 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Use queue length, not total result count for initation dialog result counter

https://gerrit.wikimedia.org/r/565459

I can't reproduce the weird part: selecting Zdraví + Společenské vědy gives me 4 hits. Other than that, this is just due to articles (not) being double-counted, as you say. That is currently somewhat incorrect, since the tasks do get duplicated (the patch makes sure that we always report the number of task cards, not the number of matching articles, when the two differ), but once T243036: Newcomer tasks: rules for duplicate results is fixed, this will be the correct behavior.

Change 565459 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Use queue length, not total result count for initation dialog result counter

https://gerrit.wikimedia.org/r/565459

Re-tested https://gerrit.wikimedia.org/r/565459: the test cases from the task description show the same incorrect results. I corrected the Zdraví + Společenské case in the description: the combined count is 4, not 7 as reported before.

This task is a subtask of T243036: Newcomer tasks: rules for duplicate results; moving it back to Ready for Development to wait on T243036.

This seems to me like a duplicate of T243036, but maybe I am misunderstanding the issue. Is there anything here that would be left after T243036 is done, or that could be done without resolving T243036?


Pinging @MMiller_WMF.
I re-tested this issue, and it seems that the actual count (the count of unique articles) is correct (see the testing details below).

However, before closing this task as a non-issue, let's consider what users expect when they browse the topic selection and look at the counter. If I combine topics, why should I see fewer articles than the sum of the article counts for each topic? To improve the user experience, maybe the counter should say "N unique topics"?

https://cs.wikipedia.beta.wmflabs.org/wiki/Speci%C3%A1ln%C3%AD:API_p%C3%ADskovi%C5%A1t%C4%9B#action=query&format=xml&list=growthtasks&gttasktypes=links&gttopics=business%7Ceconomics
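The sandbox link above encodes an ordinary `list=growthtasks` query. To rerun it outside the ApiSandbox, the same parameters can be turned into a direct request URL. A minimal Python sketch using only the standard library; the `/w/api.php` endpoint path is an assumption (the usual MediaWiki layout), and `growthtasks_url` is a hypothetical helper name:

```python
from typing import Iterable
from urllib.parse import urlencode

# Assumption: standard MediaWiki API entry point on the beta cluster wiki.
API_ENDPOINT = "https://cs.wikipedia.beta.wmflabs.org/w/api.php"


def growthtasks_url(task_types: Iterable[str], topics: Iterable[str], fmt: str = "xml") -> str:
    """Build a list=growthtasks query URL (hypothetical helper).

    Multi-value parameters are joined with "|", which urlencode escapes
    as %7C, matching the encoding in the sandbox link.
    """
    params = {
        "action": "query",
        "format": fmt,
        "list": "growthtasks",
        "gttasktypes": "|".join(task_types),
        "gttopics": "|".join(topics),
    }
    return API_ENDPOINT + "?" + urlencode(params)


url = growthtasks_url(["links"], ["business", "economics"])
print(url)
```

Dropping one of the two topics from the list reproduces the per-topic counts compared below.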

Count of unique articles = 10

Economics (count = 10) | Business (count = 4)
Milton Friedman (order 0) |
Mexická ekonomická krize (order 1) |
Ekonomika Robinsona Crusoe (order 2) |
Ekonomický sektor (order 3) | Ekonomický sektor (order 3)
Ekologická daň (order 4) | Ekologická daň (order 0)
Kapitalismus dohledu (order 5) |
Ekonomika Svazijska (order 6) | Ekonomika Svazijska (order 2)
Průmyslová politika Evropské unie (order 7) | Průmyslová politika Evropské unie (order 1)
Helikoptérové peníze (order 8) |
Ekonomika měst a obchod ve středověké Anglii (order 9) |
(All results: tasktype="links", difficulty="easy".)
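In the listing, every Business result also appears among the Economics results, so selecting both topics adds no new articles. A short Python check of the arithmetic, using plain set operations on the titles listed (this illustrates the counting only and is not GrowthExperiments code): the sum of the per-topic counts is 14, the same number the combined query reports as `totalCount`, while the number of unique articles is 10.

```python
# Titles returned for each topic, copied from the listing above.
economics = {
    "Milton Friedman",
    "Mexická ekonomická krize",
    "Ekonomika Robinsona Crusoe",
    "Ekonomický sektor",
    "Ekologická daň",
    "Kapitalismus dohledu",
    "Ekonomika Svazijska",
    "Průmyslová politika Evropské unie",
    "Helikoptérové peníze",
    "Ekonomika měst a obchod ve středověké Anglii",
}
business = {
    "Ekologická daň",
    "Průmyslová politika Evropské unie",
    "Ekonomika Svazijska",
    "Ekonomický sektor",
}

summed = len(economics) + len(business)  # shared articles are counted twice
overlap = economics & business           # articles tagged with both topics
unique = economics | business            # articles a combined selection can show

# Inclusion-exclusion: |E ∪ B| = |E| + |B| - |E ∩ B|
assert len(unique) == summed - len(overlap)

print(summed, len(overlap), len(unique))  # 14 4 10
```

The same reasoning explains case (2): an article such as "Milton Friedman" that matches both Zdraví and Společenské contributes once to the combined count.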
Economics + Business
<growthtasks totalCount="14">
      <suggestions>
        <suggestion title="Ekonomický sektor" tasktype="links" difficulty="easy" order="0">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>100</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Ekonomika Robinsona Crusoe" tasktype="links" difficulty="easy" order="1">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>50</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Mexická ekonomická krize" tasktype="links" difficulty="easy" order="2">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>12.5</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Helikoptérové peníze" tasktype="links" difficulty="easy" order="3">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>11.111111111111</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Milton Friedman" tasktype="links" difficulty="easy" order="4">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>16.666666666667</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Průmyslová politika Evropské unie" tasktype="links" difficulty="easy" order="5">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>10</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Ekonomika měst a obchod ve středověké Anglii" tasktype="links" difficulty="easy" order="6">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>20</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Kapitalismus dohledu" tasktype="links" difficulty="easy" order="7">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>25</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Ekonomika Svazijska" tasktype="links" difficulty="easy" order="8">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>33.333333333333</_v>
            </_v>
          </topics>
        </suggestion>
        <suggestion title="Ekologická daň" tasktype="links" difficulty="easy" order="9">
          <topics>
            <_v>
              <_v>economics</_v>
              <_v>14.285714285714</_v>
            </_v>
          </topics>
        </suggestion>
      </suggestions>
    </growthtasks>
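The response above reports `totalCount="14"` but contains only 10 distinct suggestions. A minimal Python sketch with the standard-library `xml.etree.ElementTree` that reads both numbers from a trimmed copy of that response (only `title` and `order` are kept). `counter_values` is a hypothetical helper; a counter based on the suggestions themselves, in the spirit of change 565459 ("queue length, not total result count"), would show 10 rather than 14. This is not the extension's own code.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Trimmed copy of the combined Economics + Business response above:
# only the title and order attributes are kept; <topics> is omitted.
SAMPLE = """<growthtasks totalCount="14">
  <suggestions>
    <suggestion title="Ekonomický sektor" order="0"/>
    <suggestion title="Ekonomika Robinsona Crusoe" order="1"/>
    <suggestion title="Mexická ekonomická krize" order="2"/>
    <suggestion title="Helikoptérové peníze" order="3"/>
    <suggestion title="Milton Friedman" order="4"/>
    <suggestion title="Průmyslová politika Evropské unie" order="5"/>
    <suggestion title="Ekonomika měst a obchod ve středověké Anglii" order="6"/>
    <suggestion title="Kapitalismus dohledu" order="7"/>
    <suggestion title="Ekonomika Svazijska" order="8"/>
    <suggestion title="Ekologická daň" order="9"/>
  </suggestions>
</growthtasks>"""


def counter_values(xml_text: str) -> Tuple[int, int]:
    """Return (totalCount attribute, number of distinct suggestion titles).

    Accepts a bare <growthtasks> element or a document that wraps it in
    outer elements.
    """
    root = ET.fromstring(xml_text)
    node = root if root.tag == "growthtasks" else root.find(".//growthtasks")
    if node is None:
        raise ValueError("no <growthtasks> element found")
    total = int(node.get("totalCount", "0"))
    titles = {s.get("title") for s in node.iter("suggestion")}
    return total, len(titles)


total, unique = counter_values(SAMPLE)
print(total, unique)  # 14 10
```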

Honestly, I don't think anyone will care about it. Search result counts are often a bit strange, since counting large sets is hard and search engines tend to cheat in various ways (try paging through search results in Google or Gmail, for example, and watch how the counts fluctuate).


I agree (after thinking more about it). It'd be interesting to hear users' feedback, if there is any.

@Tgr @Etonkovidova -- thanks for thinking about this. I also agree that as long as the number is accurate, it will be fine. I think it is the correct behavior for articles that exist in multiple topics to only be counted once.