
[WLM] Analyze existing lists
Closed, Resolved · Public · 3 Estimated Story Points

Description

To find patterns and exceptions.
Fetch lists, parse all tables, check columns.

Event Timeline

Tobi_WMDE_SW raised the priority of this task from to Medium.
Tobi_WMDE_SW updated the task description.
Tobi_WMDE_SW set Security to None.
Tobi_WMDE_SW edited a custom field.
KasiaWMDE raised the priority of this task from Medium to High. Jul 9 2015, 12:45 PM

I've written a bot that analyzes the existing templates in the list pages. As @kai.nissen suspected, each German county uses its own format, with the majority of the pages using a template for each table row. As a next step, I'll analyze the table headers in counties that don't use templates.

You can see the result of my analysis here:
https://docs.google.com/a/wikimedia.de/spreadsheets/d/1pr0TdZE2sMKhP4OPUEQBy0tMtNRZx-2rItQP2hhZkAQ/edit?usp=sharing
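
A minimal sketch of the kind of template tally this analysis involves, assuming the lists live on de.wikipedia.org and using the requests and mwparserfromhell libraries; fetch_wikitext and count_templates are hypothetical helper names, and the page title is only a placeholder, not the actual bot code:

```
# Sketch only: tally which templates a monument list page uses.
import collections
import requests
import mwparserfromhell

def fetch_wikitext(title):
    """Fetch the raw wikitext of a page via index.php?action=raw."""
    resp = requests.get(
        "https://de.wikipedia.org/w/index.php",
        params={"title": title, "action": "raw"},
    )
    resp.raise_for_status()
    return resp.text

def count_templates(title):
    """Count occurrences of each template on the given page."""
    code = mwparserfromhell.parse(fetch_wikitext(title))
    return collections.Counter(
        str(t.name).strip() for t in code.filter_templates()
    )

# Hypothetical example page title ("Musterstadt" is a placeholder).
print(count_templates("Liste der Baudenkmäler in Musterstadt").most_common(5))
```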

Thanks to the work by @Andrew-WMDE, I was able to analyze the pages that still use plain tables instead of templates. I had to revise my analysis of the template usage: there are some categories where templates and tables are mixed (highlighted in yellow instead of green).
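
For the table-based pages, header extraction can be sketched roughly like this; extract_table_headers is a hypothetical helper doing naive line-based parsing, and real pages have edge cases it ignores (e.g. pipes inside links within header cells):

```
def extract_table_headers(wikitext):
    """Collect header cells (lines starting with '!') from wikitext tables."""
    headers = []
    in_table = False
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.startswith("{|"):
            in_table = True
        elif stripped.startswith("|}"):
            in_table = False
        elif in_table and stripped.startswith("!"):
            # "! a !! b" puts several header cells on one line; a single
            # '|' separates optional cell attributes from the content.
            for cell in stripped.lstrip("! ").split("!!"):
                headers.append(cell.split("|")[-1].strip())
    return headers

# Usage with the fetch_wikitext helper sketched above:
# print(extract_table_headers(fetch_wikitext("Liste der Baudenkmäler in Musterstadt")))
```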

The column for images in each table is easily identifiable: it's either "Foto", "Bild", or "Bild neu" (the latter for tables with historic photos of monuments).
The column headings for the monuments' unique IDs are not as easy to identify. I've thrown together a regex that matches all the different "id-like" column names:
(?:ID|Dok|Listen|Akten)-Nr\.?|Erfassungsnummer|ObjektID|(?:lfd?\. |ID-|Dokumenten-)Nummer|Denkmal-?(?:Nr|Nummer)|Nummer|Nr\.?|Zä\.?
However, the ID is sometimes not unique within a page, and sometimes several fields have to be combined; it's definitely a nontrivial task.
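
The regex above can be compiled and used to classify column headings like this; the sample header strings are illustrative, not taken from a specific list page:

```
import re

# The "id-like" column-name regex from above, as a Python pattern.
ID_COLUMN = re.compile(
    r"(?:ID|Dok|Listen|Akten)-Nr\.?|Erfassungsnummer|ObjektID"
    r"|(?:lfd?\. |ID-|Dokumenten-)Nummer|Denkmal-?(?:Nr|Nummer)"
    r"|Nummer|Nr\.?|Zä\.?"
)

# Illustrative header strings only.
for header in ["Denkmal-Nr.", "ObjektID", "lfd. Nummer", "Foto"]:
    print(header, "->", bool(ID_COLUMN.search(header)))
```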

That, together with 150-200 different table formats, leads me to the recommendation that we only support pages with templates for WLM 2015. For 2016, we could suggest that the community write a bot that parses the most common table formats and replaces them with templates, and that also generates a list of pages with unrecognized headers.

Andrew-WMDE claimed this task.