
Impact module: Add "Reference added" count
Open, Needs Triage, Public

Description

This task covers adding a "References added" count to the new Impact module.

User story & summary:

As a newcomer, I want to see a count of references I have added to articles in my Impact module, so that I can understand that Wikipedia values verifiable accuracy and citing reliable, authoritative sources.

As a librarian participating in the 1Lib1Ref campaign, I want to see a references added count in my Impact module, so that I can track my work and share how many references I have added.

Design:

TBD

Acceptance Criteria:

Given I'm a logged-in editor using a regular account (not a temporary account),
When I'm viewing my New Impact Module on my homepage, Special:Impact, or viewing my Impact module transcluded on another page,
Then I can see a count of how many references I have added to articles

Event Timeline

@Urbanecm_WMF - This task is a request to help support the 1Lib1Ref campaign.

Before I discuss this further with the team and consider whether we should move to design, can you give me a sense of whether this is technically feasible? (And if yes, a guesstimate of how much work it would be.) I assume it might be impossible to calculate a perfect count, but perhaps we could count the edits in which a user added or changed text within <ref> </ref> tags?

> @Urbanecm_WMF - This task is a request to help support the 1Lib1Ref campaign.
>
> Before I discuss this further with the team and consider if we should move to design, can you give me a sense of if this is technically feasible?

It definitely wouldn't be trivial, to say the least. That said, there is a fairly straightforward approach that we could take here.

> (And if yes, a guesstimate at how much work it would be). I assume it might be impossible to calculate a perfect count, but perhaps a count of edits that a user made that included additions that edited text within <ref> </ref>?

I think the easiest solution here would be to introduce a new tag for all edits that add a reference (similar to what happens when you rollback an edit or when your edit is reverted; in both cases, see "Tags: Rollback" or "Tags: Reverted" in the diff view). In the tag metadata, we can then store the total number of references added in that edit. Once the tag is there, we can use its metadata to determine the total number of references added in the Impact module.
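As a rough illustration of the per-edit number that could be stored in the tag metadata, here is a minimal Python sketch that compares the wikitext before and after an edit. It is not how the Cite extension works; the function names are hypothetical, and counting opening `<ref>` tags with a regular expression is an assumed simplification.

```python
import re

# Matches an opening <ref ...> or self-closing <ref ... /> tag.
# The lookahead keeps it from matching <references /> or closing </ref> tags.
REF_OPEN = re.compile(r"<ref(?=[\s/>])", re.IGNORECASE)


def count_ref_tags(wikitext: str) -> int:
    """Count opening and self-closing <ref> tags in a piece of wikitext."""
    return len(REF_OPEN.findall(wikitext))


def references_added(old_wikitext: str, new_wikitext: str) -> int:
    """Net number of <ref> tags gained by an edit (hypothetical helper).

    Returns 0 when the edit removed references or left the count unchanged.
    """
    return max(0, count_ref_tags(new_wikitext) - count_ref_tags(old_wikitext))


old = "Paris is the capital of France."
new = 'Paris is the capital of France.<ref name="atlas">Atlas, p. 12.</ref>'
print(references_added(old, new))
```

A net count like this is only a heuristic: an edit that replaces one reference with another, or moves a reference, registers as zero references added.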

A couple of things to be aware of when considering this approach:

  1. The approach I suggested is based on the latest approach selected in T266067: [L] Create edit tags to measure multimedia edits to Wikipedia articles, which attempted to count added images (from a technical perspective, counting images is comparable to counting references). The work done in that task took several attempts (that accidentally managed to break certain things). Even the latest approach (patch) was ultimately reverted, because "This was inaccurate/incomplete & looks like we will not need this after all" (emphasis mine; see T286362: [XL] mw-add-media and mw-remove-media tags are added to edits without changes in media for details). I'm not yet sure whether what happened with images would happen with references, but it certainly might. FTR, Structured Data was the team that counted images, but I'm unsure whether looping them in would be helpful, as this was ~2 years ago.
  2. The tag would have to be added in the Cite extension, which is owned by Editing-team. I don't see any reason why they might object to adding the tag I suggested above, but we should probably give them a chance to do so before doing things.
  3. We would need to complete the work of adding the tag before the measuring itself is supposed to start. In other words, the Impact module will not see references added before we add the measuring code. I think this point impacts at least planning the work (and potentially also design).

With the above considerations in mind, I think implementation would take about 2 weeks.

Does this help, @KStoller-WMF? Is there anything else I can help with here?

Thank you, @Urbanecm_WMF! It is very helpful to understand not only the time estimate, but also the history of a similar effort. I'll follow up with @SEgt-WMF, and I will also follow up with the Editing team if this is something we decide to move forward with. Thank you so much!

Sounds perfect! Moving this away from Needs Discussion/Analysis for now, since this needs specifications rather than discussion, but do feel free to move back if you think it is appropriate.

Thank you all for your work on this <3
Good, diverse, crowdfunded & assessable references are our strongest card in the face of AI/LLM hallucinations, and I believe, as @KStoller-WMF said, that this would help to highlight how important references are to the wikis!
Let me know how I can help with specifications.
Just in case it is useful: In the Outreach Dashboard references were also counted via ORES using the features' data (if I'm not mistaken it is under feature.wikitext.revision.ref_tags).
Fortunately, this data will also be available via LiftWing.

> [...]
> Just in case it is useful: In the Outreach Dashboard references were also counted via ORES using the features' data (if I'm not mistaken it is under feature.wikitext.revision.ref_tags).
> Fortunately this data will also be available via LiftWing

Oh, interesting! Thanks for highlighting this. Unfortunately, using ORES data from the Impact module context doesn't seem to be straightforward (the module is currently based on a number of SQL queries, and counts are obtained using revision tags), but this is helpful to know.

I also can't locate the data in the link you submitted -- am I missing something here? I'm wondering whether ORES-powered revision counting works across all wikis, or only on wikis where ORES is supported. If the latter, this would be a problem for Growth's use case, as we operate on all Wikipedias.
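To make the SQL-plus-revision-tags counting mentioned above more concrete, here is a small self-contained Python sketch using an in-memory SQLite database. The tables are heavily simplified stand-ins for MediaWiki's `revision`, `change_tag`, and `change_tag_def` tables; the tag name, the JSON shape of `ct_params`, and the `refsAdded` key are all assumptions made for illustration, not an existing format.

```python
import json
import sqlite3

TAG_NAME = "hypothetical-add-reference"  # hypothetical tag name, not a real tag

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the MediaWiki tables of the same names.
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_actor INTEGER);
    CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_name TEXT);
    CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag_id INTEGER, ct_params TEXT);
""")
conn.execute("INSERT INTO change_tag_def VALUES (1, ?)", (TAG_NAME,))
# Revisions 10 and 11 belong to actor 7, revision 12 to actor 8.
conn.executemany("INSERT INTO revision VALUES (?, ?)", [(10, 7), (11, 7), (12, 8)])
# Assumed metadata format: a JSON object with the per-edit count.
conn.executemany(
    "INSERT INTO change_tag VALUES (?, 1, ?)",
    [
        (10, json.dumps({"refsAdded": 2})),
        (11, json.dumps({"refsAdded": 1})),
        (12, json.dumps({"refsAdded": 5})),
    ],
)


def references_added_by(actor_id: int) -> int:
    """Sum the per-edit counts stored on the tagged revisions of one actor."""
    rows = conn.execute(
        """
        SELECT ct_params
        FROM change_tag
        JOIN change_tag_def ON ctd_id = ct_tag_id
        JOIN revision ON rev_id = ct_rev_id
        WHERE ctd_name = ? AND rev_actor = ?
        """,
        (TAG_NAME, actor_id),
    ).fetchall()
    return sum(json.loads(params)["refsAdded"] for (params,) in rows)


print(references_added_by(7))
```

Revisions that carry no tag row contribute nothing to the sum, which is why edits made before such a tag exists would not show up in the count.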

Sorry, I missed your comment, @Urbanecm_WMF! As far as I know, ORES-powered revision counting only works where ORES is supported. This is a problem for many languages, and the reason why we thought LiftWing could be an option.
For instance, references in Spanish, French, and other languages are not being counted in #1Lib1Ref campaigns in the Outreach Dashboard. Fortunately, @Gabinaluz is currently working on solving this issue.

Hey all!
Yes, I'm currently working on improving how the "references added" metric is calculated in the Wiki Ed Dashboard. We already deployed it to staging and we're doing the last tests. Changes should be in production soon (likely by next week).

In case this is useful for you: as part of the project, I built a reference-counter Flask API, hosted in Toolforge. This API was built to easily retrieve the number of references existing in a given revision. It works for every wiki except Wikidata. The counting is based on wikitext and takes ref tags and shortened footnote templates into account.
You can read more details about the API in this doc.
Let me know if I can be of any assistance.
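For readers unfamiliar with wikitext-based counting, the idea of counting both ref tags and shortened footnote templates in a revision can be sketched as follows. This is not the Toolforge API's code; the function name is hypothetical, and the list of template names is an assumed, non-exhaustive sample that would differ between wikis.

```python
import re

# Opening or self-closing <ref> tags (not <references /> or </ref>).
REF_TAG = re.compile(r"<ref(?=[\s/>])", re.IGNORECASE)
# Assumed, non-exhaustive set of shortened-footnote template names.
SFN_TEMPLATE = re.compile(r"\{\{\s*(?:sfn|sfnp|sfnm)\s*[|}]", re.IGNORECASE)


def count_references(wikitext: str) -> int:
    """Count <ref> tags plus shortened-footnote templates in one revision."""
    return len(REF_TAG.findall(wikitext)) + len(SFN_TEMPLATE.findall(wikitext))


sample = (
    "First claim.<ref>Book, p. 1.</ref> "
    "Second claim.{{sfn|Smith|2020|p=3}} "
    'Third claim.<ref name="x" />\n'
    "<references />"
)
print(count_references(sample))
```

Counting the same way on a revision and on its parent, then taking the difference, gives a per-edit figure.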