[WLM] Update-bot: parsing the table format
Closed, Declined | Public | 8 Estimated Story Points

Description

  • Determine which table format is used
  • Extract information for a specific table format

Event Timeline

Tobi_WMDE_SW raised the priority of this task from to High.
Tobi_WMDE_SW updated the task description.
Tobi_WMDE_SW set Security to None.
Tobi_WMDE_SW edited a custom field.

After running an analysis bot that returned over 200 possible table formats, I'd recommend splitting the functionality of parsing tables and converting them into agreed-upon templates (one for each county) into a separate bot. That bot could also create lists of tables that can't be parsed. It is probably out of scope for this sprint and maybe out of scope for WLM 2015.

The column headings for the monuments' unique IDs vary greatly. I've thrown together a regex that matches all the different "id-like" column names:
(?:ID|Dok|Listen|Akten)-Nr\.?|Erfassungsnummer|ObjektID|(?:lfd?\. |ID-|Dokumenten-)Nummer|Denkmal-?(?:Nr|Nummer)|Nummer|Nr\.?|Zä\.?
However, sometimes the ID is not page-unique and sometimes fields have to be combined, so it's definitely a nontrivial task.
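
As a rough illustration of how such a pattern might be used, here is a minimal Python sketch. It assumes the `(:?` groups in the regex were meant as non-capturing `(?:` groups, and that a table has already been reduced to a list of heading strings and a list of row lists. The names `find_id_columns` and `duplicate_ids` are hypothetical helpers, not part of any existing bot.

```python
import re

# Assumption: "(:?" in the posted pattern was intended as the non-capturing "(?:".
ID_HEADING = re.compile(
    r"(?:ID|Dok|Listen|Akten)-Nr\.?|Erfassungsnummer|ObjektID"
    r"|(?:lfd?\. |ID-|Dokumenten-)Nummer|Denkmal-?(?:Nr|Nummer)"
    r"|Nummer|Nr\.?|Zä\.?"
)


def find_id_columns(headings):
    """Return the indexes of headings that look like an ID column.

    fullmatch is used so that a heading only counts when the whole
    (whitespace-stripped) text is an id-like name.
    """
    return [i for i, h in enumerate(headings) if ID_HEADING.fullmatch(h.strip())]


def duplicate_ids(rows, column):
    """Return the set of values that occur more than once in the given column."""
    seen, dupes = set(), set()
    for row in rows:
        value = row[column].strip()
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes
```

A table with no matching column, more than one matching column, or a non-empty `duplicate_ids` result could then be put on the list of tables that can't be parsed automatically, since those are the cases where the ID is not page-unique or fields have to be combined.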

Removed from sprint after we decided to only support "compatible" lists.

Should we close/decline this task?

Using templates is the way forward and will make the lists more semantic and useful, so I've declined this.
In the future, we could suggest writing a limited-scope update bot or checker-bot equivalent for the communities that still use tables.