Solution of "unknown fields" from monument lists
Closed, ResolvedPublic

Description

For several years there has been a problem with feedback on the "Unused fields" lists not being acted upon.

E.g., Unknown_fields/monuments_cz_(cs) reports "Památkou_do" as an unknown field even though the existence and meaning of that field have been reported repeatedly.

Attempts to rename some fields in the template were also unsuccessful, because the feedback never reached the database administrator.

Event Timeline

SJu raised the priority of this task from to Needs Triage.
SJu updated the task description.
SJu added a subscriber: SJu.

OK. Plan of attack is:

Add a violation counter per list and output the top 5 together with the unknown field list.
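
As an illustration of that plan, here is a minimal Python sketch of a per-list violation counter. It is not the code from the heritage repository: the function names and the `(source_page, field)` record shape are assumptions made for the example.

```python
from collections import Counter, defaultdict


def count_unknown_fields(records):
    """Tally unknown fields overall and per source list.

    `records` is a hypothetical iterable of (source_page, field) pairs,
    one pair for every time an unknown field is seen during harvesting.
    """
    usage = Counter()                # field -> total number of uses
    per_list = defaultdict(Counter)  # field -> Counter of source pages
    for source_page, field in records:
        usage[field] += 1
        per_list[field][source_page] += 1
    return usage, per_list


def top_lists(per_list, field, limit=5):
    """Return up to `limit` (source_page, count) pairs, worst offenders first."""
    return per_list[field].most_common(limit)
```

The report row for each unknown field could then show `usage[field]` next to the result of `top_lists(per_list, field)`, so that editors know which lists to fix first.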

Change 378981 had a related patch set uploaded (by Jean-Frédéric; owner: Jean-Frédéric):
[labs/tools/heritage@master] Harvest the source page of unknown fields

https://gerrit.wikimedia.org/r/378981

I was going for T175359 and had forgotten we had already elaborated on this here (back in 2016! :)

My implementation differs from what we had discussed: I just store all violating pages, without a counter per list, and display all of them. I realise this might be a lot for some fields (the ones that should be muted) but I expect the report to still be readable since each page is only displayed as an unnamed external link. Happy to revisit if need be.
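
A rough Python sketch of the approach described in this comment, for readers who want to see its shape: every violating page is stored, and each is rendered as an unnamed wikitext external link (`[url]`). The helper names, the record shape, the table row layout and the `base_url` parameter are all assumptions for the example, not taken from the patch.

```python
from collections import defaultdict
from urllib.parse import quote


def collect_sources(records):
    """Map each unknown field to the set of pages that use it.

    `records` is a hypothetical iterable of (source_page, field) pairs.
    """
    sources = defaultdict(set)
    for source_page, field in records:
        sources[field].add(source_page)
    return sources


def format_row(field, pages, base_url):
    """Build one wikitext table row: field, page count, unnamed links.

    `base_url` is an assumed wiki article prefix such as
    "https://example.org/wiki/".
    """
    links = " ".join(
        "[{}{}]".format(base_url, quote(page.replace(" ", "_")))
        for page in sorted(pages)
    )
    return "| {} || {} || {}".format(field, len(pages), links)
```

Because unnamed links render as short numbered references such as `[1] [2] [3]`, even a long list of pages stays compact in the report.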

Change 379126 had a related patch set uploaded (by Lokal Profil; owner: Lokal Profil):
[labs/tools/heritage@master] Harvest the source page of unknown fields

https://gerrit.wikimedia.org/r/379126

This is of course an alternative implementation of https://gerrit.wikimedia.org/r/#/c/378981/ which combined some of the code and ideas I'd prepared earlier with the code in that patch.

A test for (aq, en) with sample_size=1 was run and the output can be examined at :c:User:Lokal_Profil/unknown_stats.

The main difference is that this only outputs a sample of the source pages. Most issues should be typos limited to a few pages. If a field shows up on loads of pages, we probably need to update the config instead.
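
To illustrate the sampling idea, here is a minimal Python sketch. The `sample_size` name echoes the test run mentioned above, but the function itself, its default value and the use of random selection are assumptions; the merged patch may pick its sample differently.

```python
import random


def sample_sources(pages, sample_size=4, seed=None):
    """Return at most `sample_size` source pages for one unknown field.

    If there are no more pages than `sample_size`, all of them are
    returned. `seed` is only there to make the example reproducible.
    """
    pages = sorted(pages)
    if len(pages) <= sample_size:
        return pages
    rng = random.Random(seed)
    return sorted(rng.sample(pages, sample_size))
```

A typo found on two pages would be listed in full, while a field used on hundreds of pages would show only a handful of examples, which is usually a hint that the field belongs in the config.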

Change 379685 had a related patch set uploaded (by Jean-Frédéric; owner: Lokal Profil):
[labs/tools/heritage@master] Harvest the source page of unknown fields

https://gerrit.wikimedia.org/r/379685

Change 378981 abandoned by Jean-Frédéric:
Harvest the source page of unknown fields

Reason:
https://gerrit.wikimedia.org/r/#/c/379126 and https://gerrit.wikimedia.org/r/379685 are more advanced

https://gerrit.wikimedia.org/r/378981

Change 379126 abandoned by Lokal Profil:
Harvest the source page of unknown fields

Reason:
Superseded by https://gerrit.wikimedia.org/r/#/c/379685

https://gerrit.wikimedia.org/r/379126

Change 379685 merged by jenkins-bot:
[labs/tools/heritage@master] Harvest the source page of unknown fields

https://gerrit.wikimedia.org/r/379685

Mentioned in SAL (#wikimedia-cloud) [2017-09-23T13:44:41Z] <JeanFred> Deploy latest from Git master: 30af42c (T117330)

Yay! I must say − I like how we iterated over the implementation on this :)

@Agathoclea @SJu This was long in the making (2 years ;-) but the data is finally coming through − reports such as https://commons.wikimedia.org/wiki/Commons:Monuments_database/Unknown_fields/monuments_de-nrw-k_(de) should now be more actionable. Let us know if more changes need to be made!

JeanFred claimed this task.

The time had definitely come for this task when we both started working on it on the same day :) Different starting points with different strengths came together well in the end 😁