
[Curious Facts] improvements to issue descriptions
Closed, ResolvedPublic

Description

Problem:
Users were having trouble understanding exactly which issue the Curious Facts tool had found. We can try to improve the issue descriptions to make this easier.

Example:
"Chau Yee Ping (Q56611515) has educated at (P69): Lam Tai Fai College (Q11105870), not found in: organization (Q43229)." was not understood by the tester.

Acceptance criteria:

  • Issue description texts are easier for editors to understand

Event Timeline

@GoranSMilovanovic could you point me to the messages that generate the text for the issue descriptions? Then I can make some wording suggestions.

@Lydia_Pintscher

The issue descriptions are generated in the final phase of the knowledge extraction modules in the Qurator Curious Facts system. We are talking R code. That makes anyone's intervention there quite complicated, unless I do it. My suggestion is to let me handle the problem and then provide a set of examples in a separate document, so that you can intervene there.

Issue descriptions are problematic because different English auxiliary verbs need to be used with different Wikidata properties, e.g. "someone has something" vs. "someone is something" and similar. What happens in Curious Facts is far from real natural language generation, but the descriptions need to be generated somehow. The approach is simple, but specific cases still have to be anticipated and handled properly. The system takes care of the proper verb-property matches, but I have obviously missed a particular case or two that need to be handled in a specific way.
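To illustrate the verb-property matching described above, here is a minimal sketch in Python. It is not the actual Curious Facts code (which is written in R and is not shown in this ticket); the names `PROPERTY_PHRASES`, `naive_description`, and `describe_statement`, as well as the fallback wording, are assumptions made for illustration only.

```python
# Illustrative sketch only: the production code is in R and is not shown here.
# PROPERTY_PHRASES and describe_statement are hypothetical names.

# Assumed lookup from property ID to a verb phrase that reads naturally.
PROPERTY_PHRASES = {
    "P69": "was educated at",    # property label: "educated at"
    "P31": "is an instance of",  # property label: "instance of"
}


def naive_description(item, item_id, prop_label, prop_id, value, value_id):
    """Always use 'has', which yields awkward text such as 'has educated at'."""
    return f"{item} ({item_id}) has {prop_label} ({prop_id}): {value} ({value_id})"


def describe_statement(item, item_id, prop_label, prop_id, value, value_id):
    """Use a per-property phrase, with a verb-free fallback for unlisted properties."""
    phrase = PROPERTY_PHRASES.get(prop_id)
    if phrase is None:
        # Assumed fallback wording; it sidesteps the choice of auxiliary verb.
        return f'{item} ({item_id}), statement "{prop_label}" ({prop_id}): {value} ({value_id})'
    return f"{item} ({item_id}) {phrase} ({prop_id}): {value} ({value_id})"


example = ("Chau Yee Ping", "Q56611515", "educated at", "P69",
           "Lam Tai Fai College", "Q11105870")
print(naive_description(*example))
print(describe_statement(*example))
```

The naive version reproduces the "has educated at" wording quoted in the task description, while the lookup version needs one entry per property that reads badly with "has". A verb-free fallback is one possible way to cover properties that have no entry yet.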

Since knowledge extraction for Curious Facts is quite complicated and always implies a lengthy update from the Analytics Cluster, my suggestion is to handle the issue in the same run as T277564 (take separators into account for single value constraints).

👍

@Lydia_Pintscher

  • A generic fix was applied across all types of issue descriptions, and they should change as soon as the lengthy update procedure finishes;
  • then we can evaluate them and see if they need further improvement;
  • I will ping here as soon as the update is done.

@Lydia_Pintscher

  • The update is now complete, but it seems like the changes in descriptions did not apply correctly to all reported anomalies.
  • Inspecting now.

@Lydia_Pintscher

  • I will now re-run parts of the update to see what could have gone wrong with some of the issue descriptions.

Ok @Lydia_Pintscher @WMDE-leszek

  • following the latest update of the Curious Facts system I am facing a serious production-side problem related to the {data.table} package;
  • the Curious Facts dashboard will be down for some time;
  • prioritizing this now.

@Lydia_Pintscher @WMDE-leszek

  • Ok, I have figured out the problem.
  • Implementing the fix now. It might take some time.

@Lydia_Pintscher @WMDE-leszek

Ok; the generic fix for issue descriptions is now in place; the system is back online from Wikidata Analytics.

@GoranSMilovanovic Hello, we made some changes to how the issue descriptions should look. Please let us know if there are any other scenarios that we overlooked.

@amy_rc @WMDE-leszek

Hey, do the changes described in T277551#7091927 and discussed in the doc precede or follow the interventions I announced in T277551#7069216?

I am adding some comments in the doc for you right now.

Thank you @GoranSMilovanovic. Only the ISSUE DESCRIPTION topic is related to this ticket. The others are already done.

@amy_rc I understand that, but did you (or anyone else) check the system following the changes announced in T277551#7069216 (this ticket, Friday, May 7)?

Yes. @Lydia_Pintscher and I tested it after reading T277551#7069216. We then made the corresponding notes.

@amy_rc @WMDE-leszek

You must have missed something in that case, because all issue descriptions have already been replaced with generic versions that avoid linguistic confusion such as

"Chau Yee Ping (Q56611515) has educated at (P69): Lam Tai Fai College (Q11105870), not found in: organization (Q43229)." was not understood by the tester.

given in the description of this ticket.

I tried out the tool and came up with some notes, which I compiled into a table. The third column, 'should be like', shows how we would like the descriptions to be improved. Can you have a look? Please let me know if you have any questions.

image.png (327×927 px, 119 KB)

@amy_rc The suggestions given in T277551#7137941 imply a full system update.
I am currently implementing the changes and running the update on the fly to speed up the process.
The necessary changes to the codebase will be made once the update is completed.
Reporting back here as soon as the new issue descriptions are available from the Curious Facts dashboard.

@amy_rc

  • Part 1 (the so-called m1 anomalies): solved; issue descriptions fixed in accordance with T277551#7137941;
  • Note: changes will not be visible on the dashboard before the full system update is completed;
  • Now running a manual update for the so-called m2 class of anomalies and fixing issue descriptions on the fly;
  • Then the m3 update remains to be done before the changes take effect.

Change 699487 had a related patch set uploaded (by GoranSMilovanovic; author: GoranSMilovanovic):

[analytics/wmde/WD/WikidataAnalytics@master] T277551

https://gerrit.wikimedia.org/r/699487

Change 699487 merged by GoranSMilovanovic:

[analytics/wmde/WD/WikidataAnalytics@master] T277551

https://gerrit.wikimedia.org/r/699487

@amy_rc

  • Full system update completed;
  • Issue descriptions are now fixed (local tests completed);
  • deploying soon; it will be ready for tests in an hour or so.

I've gone over the issue descriptions and tested. Looks good :) 🎉 🌺