
Qualitative Checks article-updating with Realtime API
Closed, Resolved · Public · 13 Estimated Story Points

Description

User Story: As a product manager, I need to be able to look at chunks of data coming from our APIs in a readable, comparison-friendly format so that I can identify bugs, errors, or inaccuracies.

QA Question: Does this upcoming credibility signals feature work as anticipated?

Comprehensiveness:

  • Do the signals work for all kinds of revisions?
  • Do the signals work for every language?

Accuracy:

  • Are the revision responses returning as intended/consistently?
  • Is the credibility data accurate to the revision?

What do we need to test this?

Streaming data:

  1. Easier: I would like a zip file with 30 minutes of all revisions passed through the Streaming API (with credibility signals in v2).
  2. Harder: I would like a Google Sheet (formatted like the table below; see the consumer sketch after it) with 30 minutes of all revisions passed through the Streaming API (with credibility signals in v2).
| header column for each field in the v2 payload | ... |
| a revision from the Streaming API | ... |
| a revision from the Streaming API | ... |
| a revision from the Streaming API | ... |
| a revision from the Streaming API | ... |
| a revision from the Streaming API | ... |
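A minimal consumer sketch for the "30 minutes of revisions to CSV" request. The endpoint URL, the SSE transport, and the payload field names below are assumptions for illustration only, not the real v2 Streaming API contract; substitute the actual endpoint and schema.

```python
"""Capture ~30 minutes of revision events and dump them to CSV.

Assumes an SSE-style endpoint (similar to Wikimedia EventStreams) and
JSON event bodies; STREAM_URL and FIELDS are placeholders.
"""
import csv
import json
import time

import requests
import sseclient  # pip install sseclient-py

STREAM_URL = "https://example.org/v2/articles"   # placeholder endpoint
CAPTURE_SECONDS = 30 * 60
FIELDS = ["name", "identifier", "version.identifier", "event.date_created"]  # assumed columns


def get_path(payload: dict, dotted: str):
    """Walk a dotted path such as 'version.identifier' through nested dicts."""
    node = payload
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def capture(url: str = STREAM_URL, out_path: str = "revisions_30min.csv") -> None:
    response = requests.get(url, stream=True, headers={"Accept": "text/event-stream"}, timeout=60)
    response.raise_for_status()
    client = sseclient.SSEClient(response)

    deadline = time.monotonic() + CAPTURE_SECONDS
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(FIELDS)
        for event in client.events():
            if time.monotonic() > deadline:
                break
            if not event.data:
                continue
            payload = json.loads(event.data)
            writer.writerow([get_path(payload, field) for field in FIELDS])


if __name__ == "__main__":
    capture()
```

The same output can be zipped for option 1 or imported into a Google Sheet for option 2.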

Static specific article-based data:

30 articles (listed below) of different types, structures, and editor interest for manual testing. We would like these delivered in either of the same formats as the streaming data, but as separate files/spreadsheets. Please pull the most current version of each of these.
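A small sketch for pulling the most current revision of each listed article. It uses the public MediaWiki Action API as a stand-in for identifying the latest revision; the article list and output path are placeholders, and the v2 payload with credibility signals would still need to be fetched separately for those revisions.

```python
"""Fetch the latest revision metadata for a fixed list of articles."""
import csv

import requests

API_URL = "https://en.wikipedia.org/w/api.php"
ARTICLES = ["Papua New Guinea", "Example article"]  # placeholder for the 30 listed articles


def latest_revisions(titles):
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": "|".join(titles),     # up to 50 titles per request
        "rvprop": "ids|timestamp|user",
        "format": "json",
        "formatversion": "2",
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    for page in response.json()["query"]["pages"]:
        rev = page["revisions"][0]      # most recent revision of the page
        yield page["title"], rev["revid"], rev["timestamp"], rev["user"]


with open("static_articles.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["title", "rev_id", "timestamp", "user"])
    for row in latest_revisions(ARTICLES):
        writer.writerow(row)
```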


Event Timeline

A nice-to-have, @Marcelo.castillo, would be the ability to run this same process easily in the future. We imagine this type of testing will become much more common as credibility signals "grows", so designing this to be repeatable would be helpful.

Lena.Milenko changed the task status from Open to In Progress. Aug 23 2022, 12:02 AM

Update —

After a Slack conversation with Marcelo, we decided to adjust this ticket. The article list suggested in the original description above failed to deliver data for a 30-minute read of the stream, which caused confusion.

So, we decided to adjust the request to include 30 minutes' worth of ESWIKI updates and 30 minutes' worth of ENWIKI updates, for the MOST edited entries of the week, prioritizing data capture over the checker's mastery of the content.
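A minimal sketch for picking the most-edited entries per wiki from a captured window of events. The field names ('is_part_of.identifier' for the wiki code, 'name' for the page title) and the newline-delimited JSON input are assumptions standing in for whatever the real v2 payload uses.

```python
"""Rank the most-edited entries in a capture window, per wiki."""
import json
from collections import Counter


def most_edited(events_path: str, wiki: str, top_n: int = 30):
    counts = Counter()
    with open(events_path, encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            # Assumed fields: wiki code under is_part_of.identifier, page title under name.
            if event.get("is_part_of", {}).get("identifier") == wiki:
                counts[event.get("name")] += 1
    return counts.most_common(top_n)


for title, edits in most_edited("week_of_events.ndjson", "enwiki"):
    print(f"{edits:5d}  {title}")
```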

Marcelo.castillo changed the point value for this task from 5 to 13. Sep 13 2022, 3:14 PM

Update — After working with the Support team to refine this ticket, based on issues organizing and ingesting heavy amounts of data, we realized more specifics were missing.
Support delivered this clean CSV, containing a few signals and other useful information, but we need more for better testing.

In the next iteration, columns A - C will remain the same. Column D, which is currently the URL of the en.wiki entry that carried the edit, will become the URL of the edit itself, as in this example of the Papua New Guinea entry. This URL gives the PM access to the diff of the entry, to better analyze the edit itself.
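A short sketch of how the column-D value could be built from a revision id. The Special:Diff URL pattern is standard MediaWiki; the revision id shown and the idea of taking it from the event payload are illustrative assumptions.

```python
"""Build the column-D diff URL from a revision id."""

def diff_url(rev_id: int, wiki_host: str = "en.wikipedia.org") -> str:
    # Special:Diff/<rev_id> resolves to the diff between <rev_id> and its parent revision.
    return f"https://{wiki_host}/wiki/Special:Diff/{rev_id}"


# Example: an edit to the Papua New Guinea entry (revision id is illustrative).
print(diff_url(1234567890))
```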

After sharing the Master Schema with support, we decided to expand the scope of this ticket to include more signals (a flattening sketch for these dotted field names follows the list below).

protection
version.scores
version.scores.damaging
version.scores.damaging.prediction
version.scores.damaging.probability
version.scores.damaging.probability.false
version.scores.damaging.probability.true
version.scores.goodfaith
version.scores.goodfaith.prediction
version.scores.goodfaith.probability
version.scores.goodfaith.probability.false
version.scores.goodfaith.probability.true
version.diff.dictionary_words
version.diff.dictionary_words.increase
version.diff.dictionary_words.decrease
version.diff.dictionary_words.sum
version.diff.dictionary_words.proportional_increase
version.diff.dictionary_words.proportional_decrease
version.diff.non_dictionary_words
version.diff.non_dictionary_words.increase
version.diff.non_dictionary_words.decrease
version.diff.non_dictionary_words.sum
version.diff.non_dictionary_words.proportional_increase
version.diff.non_dictionary_words.proportional_decrease
version.diff.uppercase_words
version.diff.uppercase_words.increase
version.diff.uppercase_words.decrease
version.diff.uppercase_words.sum
version.diff.uppercase_words.proportional_increase
version.diff.uppercase_words.proportional_decrease
version.diff.non_safe_words
version.diff.non_safe_words.increase
version.diff.non_safe_words.decrease
version.diff.non_safe_words.sum
version.diff.non_safe_words.proportional_increase
version.diff.non_safe_words.proportional_decrease
version.diff.informal_words
version.diff.informal_words.increase
version.diff.informal_words.decrease
version.diff.informal_words.sum
version.diff.informal_words.proportional_increase
version.diff.informal_words.proportional_decrease
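The dotted names above map to nested JSON in the event payload. A minimal sketch, assuming dict-shaped events, that flattens such a payload into those dotted column names for CSV export; the sample COLUMNS subset is just a selection from the list above.

```python
"""Flatten a nested v2 event into dotted column names for CSV export."""
import csv


def flatten(node, prefix=""):
    """Recursively turn nested dicts into {'a.b.c': value} pairs."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


COLUMNS = [
    "protection",
    "version.scores.damaging.prediction",
    "version.scores.damaging.probability.true",
    "version.scores.goodfaith.prediction",
    "version.diff.dictionary_words.sum",
    "version.diff.non_safe_words.proportional_increase",
]  # subset of the requested signals


def write_rows(events, out_path="signals.csv"):
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS, extrasaction="ignore")
        writer.writeheader()
        for event in events:
            writer.writerow(flatten(event))
```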

Based on conversations with @FNavas-foundation and the data served by the version of the Firehose API currently running in production, T2 was unable to produce data samples containing all of the fields requested above. At this time, Firehose is not serving any diff-related data.

Data samples were compiled and sent to @FNavas-foundation for evaluation.

The team also worked on creating a consumer that can easily export credibility data over a fixed time frame, filtering by project and category. The code solution is currently awaiting feedback.

AnnaMikla changed the task status from In Progress to Open. Oct 28 2022, 12:27 PM
AnnaMikla changed the task status from Open to In Progress.

Had a chat with Luvo, and it seems there is a bit of confusion caused by the 30-minute clause in the original ticket. The scope of this ticket has changed from a manual testing mechanism to an integration testing framework that can be used for validating our page updates and code for content integrity signals.

So I am removing the requirement for having the data available for the past 30 minutes.

We had a feedback session with Stephan on the 14th of November. Stephan mentioned that there might be a different way to get the credibility signals than our current approach, one that requires less code, and that a handler might not be required. It was also stressed that, instead of having a txt file with a list of articles as input for mocks and separate mocks for a new handler, the URLs should just be added to the main mock file. In the meantime, a new draft merge request can be made and Stephan will take a look at it; a sync-up might be possible sometime next week to discuss next steps.

Had a sync-up meeting with Stephan today. He advised us to use the control centre for getting data, instead of directly consuming from the topics.

Daria_Kevana changed the task status from In Progress to Open. Jan 26 2023, 1:00 PM

The Due Date set for this open task passed a while ago. Resetting.