VE automatic citation tool should add section title from hash URL if possible
Open, Stalled, Needs Triage, Public

Description

Steps to reproduce:

  1. Open a page in VE, add a reference.
  2. Use https://developer.mozilla.org/en-US/docs/Web/HTML/Using_the_application_cache#Storage_location_and_clearing_the_offline_cache as URL.
  3. Generate and insert the reference.

Actual result: The generated title ignores the section identified by the hash in the URL and just reads: "Using the application cache"

Expected result: Since the hash identifies the section in a machine-readable way (<h2 id="Storage_location_and_clearing_the_offline_cache">Storage location and clearing the offline cache</h2>), the section title "Storage location and clearing the offline cache" could and should be added automatically.

Schnark created this task. Jul 26 2016, 8:06 AM
Restricted Application added a subscriber: Aklapper. Jul 26 2016, 8:06 AM
Jdforrester-WMF changed the task status from Open to Stalled. Jul 26 2016, 7:05 PM
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

I'm not sure how this would work. Not all <a> anchors are even remotely human-readable; this much more readable example, though nice, is particularly rare, and turning an anchor into something that "looks right" (outside of MediaWiki installs) would probably need some sort of AI.

Also, it's not clear to me how Citoid would even know whether the page it's fetching is a MediaWiki instance, which is the edge case where we could detect this. Any ideas?

Headlines are machine-readable: they are (or should be) <h1> to <h6> elements. So when the hash references an element that is itself a headline (as in the example) or an element that is a direct child of a headline (as in MediaWiki), the headline's text content should be retrieved.
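A minimal sketch of that lookup, in Python using only the standard library's `html.parser`. It returns the text of an `<h1>` to `<h6>` element whose `id` equals the fragment, or of the heading whose direct child carries that `id` (or a legacy `<a name="...">`), and `None` otherwise. `SectionTitleFinder` and `section_title` are hypothetical names, not Citoid code, and the simple element stack only approximates a full HTML parser. On sites whose headings contain extra text, such as section edit links, that text would end up in the result and would need filtering.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Void elements have no end tag, so they are never pushed onto the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SectionTitleFinder(HTMLParser):
    """Collect the text of the heading that a URL fragment points at."""

    def __init__(self, fragment):
        super().__init__(convert_charrefs=True)
        self.fragment = fragment
        self.stack = []            # names of currently open elements
        self.heading_depth = None  # index of the open heading in self.stack
        self.matched = False       # does the open heading hold the target?
        self.parts = []            # text collected inside the open heading
        self.title = None          # text of the first matching heading

    def _is_target(self, tag, attrs):
        # Match an id on any element, or a legacy <a name="..."> anchor.
        return attrs.get("id") == self.fragment or (
            tag == "a" and attrs.get("name") == self.fragment)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADINGS and self.heading_depth is None:
            # A new heading opens; it may carry the target id itself.
            self.heading_depth = len(self.stack)
            self.matched = self._is_target(tag, attrs)
            self.parts = []
        elif (self.heading_depth is not None
              and len(self.stack) == self.heading_depth + 1
              and self._is_target(tag, attrs)):
            # The target is a direct child of the open heading.
            self.matched = True
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it
        while self.stack.pop() != tag:
            pass  # implicitly close any unclosed inner elements
        if self.heading_depth is not None and len(self.stack) <= self.heading_depth:
            # The heading has just closed: keep its text if it matched.
            if self.matched and self.title is None:
                self.title = " ".join("".join(self.parts).split())
            self.heading_depth = None
            self.matched = False

    def handle_data(self, data):
        if self.heading_depth is not None:
            self.parts.append(data)


def section_title(html, fragment):
    """Return the heading text for `fragment`, or None if it is not a heading."""
    finder = SectionTitleFinder(fragment)
    finder.feed(html)
    finder.close()
    return finder.title or None
```

For the MDN example, `section_title(page_html, "Storage_location_and_clearing_the_offline_cache")` would yield "Storage location and clearing the offline cache"; an id on a plain paragraph yields `None`, so nothing changes.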

Of course, there will be cases where the element with the id is just some non-structural element somewhere in the text. In that case Citoid won't be able to get a sensible headline from it, which simply leaves the current behavior.

Adding the headline in those cases where it can be determined automatically would be an improvement over the current behavior.
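The fallback described here could look roughly like the following hypothetical Python sketch: `fragment_of` extracts and percent-decodes the URL fragment with the standard `urllib.parse` module, and `citation_title` adds a section title only when one was found, otherwise returning the page title unchanged. The `": "` separator is an arbitrary assumption; this task does not decide whether the section would go into the title or into a separate field.

```python
from urllib.parse import urlsplit, unquote


def fragment_of(url):
    """Return the percent-decoded fragment of `url`, or None if it has none."""
    fragment = urlsplit(url).fragment
    return unquote(fragment) if fragment else None


def citation_title(page_title, section):
    """Append the section title when known; otherwise keep the current title.

    The ": " separator is an assumption made for this sketch only.
    """
    if not section:
        return page_title  # current behavior: no usable headline found
    return f"{page_title}: {section}"
```

Because the fallback returns the original title untouched, pages whose fragment does not point at a headline behave exactly as they do today.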

Mvolz moved this task from Backlog to Extension on the Citoid board. Jul 29 2016, 1:01 PM

Other example: http://www.chromium.org/blink#vendor-prefixes (this one uses an <a name=""> inside an <h3>).