Enhance extract_sections headings with level and (stripped) name attributes
Closed, Resolved | Public | Feature

Description

extract_sections returns sections in a list, which doesn't replicate the sections' hierarchy determined by heading levels. If a bot is supposed to understand the structure of an article, the hierarchy is crucial.

Providing our implementation will reduce the code needed by Pywikibot users.
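To illustrate the kind of code users currently have to write themselves, here is a minimal sketch that nests a flat list of headings by level. It assumes each heading has already been reduced to a (level, name) pair with level >= 1; Node and build_outline are hypothetical names for illustration and are not part of Pywikibot.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node for illustration; not a Pywikibot class."""
    level: int
    name: str
    children: list = field(default_factory=list)


def build_outline(headings):
    """Nest a flat list of (level, name) pairs by heading level.

    Assumes every level is >= 1, so the virtual root is never popped.
    """
    root = Node(0, '')  # virtual root that sits above all real headings
    stack = [root]      # path from the root to the most recent heading
    for level, name in headings:
        # Drop headings of the same or a deeper level: they cannot be
        # the parent of the heading being added.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(level, name)
        stack[-1].children.append(node)
        stack.append(node)
    return root.children


outline = build_outline([
    (2, 'History of A'),
    (2, 'Usage of A'),
    (3, 'Some details'),
])
```

With these three headings, "Some details" ends up as a child of "Usage of A", while "History of A" has no children.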

Requirements:

  • The level of a heading is the maximal number of equals signs present both at its beginning and its end.
  • The stripped name does not contain the equals signs.
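The two requirements above can be sketched as follows. This is only an illustration of the stated rules, not the implementation that was eventually merged; Heading and parse_heading are hypothetical names, and the input is assumed to be a single heading line such as '== History of A =='.

```python
from typing import NamedTuple


class Heading(NamedTuple):
    """Hypothetical result type for illustration; not part of Pywikibot."""
    level: int
    name: str


def parse_heading(title: str) -> Heading:
    """Derive the level and the stripped name from a raw heading line."""
    text = title.strip()
    # Count the equals signs at each end of the line.
    leading = len(text) - len(text.lstrip('='))
    trailing = len(text) - len(text.rstrip('='))
    # The level is the largest count present at BOTH ends,
    # i.e. the smaller of the two counts.
    level = min(leading, trailing)
    # Remove exactly `level` equals signs from each side, then the
    # surrounding whitespace. Surplus signs on one side stay in the name.
    name = text[level:len(text) - level].strip()
    return Heading(level, name)


history = parse_heading('== History of A ==')
details = parse_heading('=== Some details ===')
unbalanced = parse_heading('== Unbalanced ===')
```

Under these rules an unbalanced heading such as '== Unbalanced ===' has level 2, and the surplus equals sign remains part of the name.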

Event Timeline

"The stripped name does not contain the equals signs."

The section headers aren't stripped:

import pywikibot
from pywikibot.textlib import extract_sections
from pprint import pprint
s = pywikibot.Site('wikipedia:en')
t = """'''A''' is a thing.

== History of A ==
Some history...

== Usage of A ==
Some usage...

=== Some details ===
Details of Usage...

[[Category:Things starting with A]]
"""
x = extract_sections(t, s)
pprint(x.sections)
# Output:
# [_Section(title='== History of A ==', content='\nSome history...\n\n'),
#  _Section(title='== Usage of A ==', content='\nSome usage...\n\n'),
#  _Section(title='=== Some details ===', content='\nDetails of Usage...\n\n')]
x.sections[0].title  # '== History of A =='

That's precisely why such a "stripped name" should exist. ("The stripped name does not contain the equals signs." is the description of the feature.)

Ah, now I understand the feature request.

Xqt triaged this task as Medium priority. (Jun 11 2023, 12:26 PM)

Change 930899 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [IMPR] improvement for textlib.extract_sections() function

https://gerrit.wikimedia.org/r/930899

Change 930899 merged by jenkins-bot:

[pywikibot/core@master] [IMPR] improvement for textlib.extract_sections() function

https://gerrit.wikimedia.org/r/930899