Implement a reasonably elegant and non-labor-intensive means of describing/summarizing pages
Closed, Duplicate
Public

Assigned To

None

Authored By

	• leucosticte
	Jan 4 2014, 8:39 AM

Description

It's handy to have a means of summarizing or describing page contents, so as to generate meta description tags, blurbs for inclusion in feeds, etc. Several approaches have been tried:

1. Grabbing the first x characters of an article without regard to where sentences cut off (e.g. Extension:Blurb or Extension:TextExtracts).
2. Using a template, e.g. {{PageSummary|'''[[Humility]]''' is a psychological state, that is the opposite of [[dominance]]. |Humility allows one to see the intrinsic value of others (as opposed to only extrinsic value), and is therefore the largest factor of [[empathy]]. A person with humility therefore sees minors as having intrinsic value, as contrasted with being objects of domination, which they are mostly regarded as being by the laws and practices of the status quo. Like dominance, humility is an innate psychological trait.}} See docs at http://childwiki.net/wiki/Template:PageSummary . This is implemented by Extension:BedellPenDragon. Notice that there are two parameters here: parameter #1 for the first sentence of the lead and parameter #2 for the remainder of the lead.
3. Adding or modifying the description separately from the article text, by means of a separate text box (Extension:Advanced_Meta), a separate page (Extension:ExplicitDescription), or Wikidata.
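
To make the drawback of the first approach concrete, here is a minimal sketch contrasting a hard character cutoff with a cutoff that backs up to the previous word boundary. The function names are hypothetical and this is not the code of Extension:Blurb or Extension:TextExtracts; it only illustrates the general idea.

```python
def truncate_chars(text, limit):
    """Return the first `limit` characters, cutting wherever the limit falls."""
    return text[:limit]


def truncate_words(text, limit, ellipsis="..."):
    """Return at most `limit` characters of text, never ending mid-word.

    Hypothetical helper: if the cut would land inside a word, back up to the
    last space before it, then append an ellipsis to mark the truncation.
    """
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Only back up when the character right after the cut continues a word.
    if not text[limit].isspace():
        head, sep, _ = cut.rpartition(" ")
        if sep:
            cut = head
    return cut.rstrip() + ellipsis


lead = "Humility is a psychological state."
print(truncate_chars(lead, 18))  # Humility is a psyc
print(truncate_words(lead, 18))  # Humility is a...
```

Neither variant knows anything about sentences, so the result can still stop in the middle of a thought.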

Ideally, we could implement a feature to automatically grab the first sentence of the lead; however, it's hard for software to detect the ends of sentences, since punctuation marks such as the period can appear in the middle of sentences ("Afterward, Mr. Brown went to the U.S. District Courthouse . . . and when he came back, everyone was gone.")
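
A small sketch of why this is hard, using the example sentence above. The naive version splits on the first period followed by a space; the second version skips periods that follow a known abbreviation. The abbreviation list, the regular expression, and the function names are all assumptions made for illustration, not an existing MediaWiki feature.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (lowercase, no final period).
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "st", "vs", "etc", "e.g", "i.e", "u.s"}

# Candidate boundary: closing punctuation followed by whitespace and a capital
# letter (optionally after an opening quote or bracket), or by the end of text.
_BOUNDARY = re.compile(r"[.!?]+(?=\s+[\"'(]?[A-Z]|\s*$)")


def naive_first_sentence(text):
    """Cut at the first period followed by a space."""
    return text.split(". ")[0] + "."


def first_sentence(text):
    """Cut at the first candidate boundary that does not follow an abbreviation."""
    for match in _BOUNDARY.finditer(text):
        words = text[:match.start()].split()
        last = words[-1].lstrip("(\"'").lower() if words else ""
        if last in ABBREVIATIONS:
            continue  # e.g. "Mr." or "U.S." in the middle of a sentence
        return text[:match.end()].strip()
    return text.strip()


SAMPLE = (
    "Afterward, Mr. Brown went to the U.S. District Courthouse . . . "
    "and when he came back, everyone was gone. He left."
)
print(naive_first_sentence(SAMPLE))  # Afterward, Mr.
print(first_sentence(SAMPLE))
```

The heuristic handles this particular example, but it fails in the other direction when an abbreviation really does end a sentence ("He moved to the U.S. Then he came back."), which is why a fixed list alone is not a general solution.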

If you have any ideas on the best way to do this, feel free to post them. Thanks.

See also: T59669

Details

Reference: bz59641

Related Objects

		Status	Subtype	Assigned	Task
		Open	Feature	None	T58604 Add checkbox to Special:AllPages to hide pages lacking page summaries
		Duplicate		None	T61641 Implement a reasonably elegant and non-labor-intensive means of describing/summarizing pages

Event Timeline

• bzimport raised the priority of this task to Lowest. Nov 22 2014, 2:17 AM

• bzimport added a project: MediaWiki-extension-requests.

• bzimport set Reference to bz59641.

• bzimport added a subscriber: Unknown Object (MLST).

• leucosticte created this task. Jan 4 2014, 8:39 AM

MobileFrontend implemented that for one of their APIs:
https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=extracts&format=json&exlimit=1&exintro=&explaintext=&titles=Barack_Obama

It has supposedly since been migrated to https://www.mediawiki.org/wiki/Extension:TextExtracts.
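
For anyone who wants to script this, the sandbox link above corresponds roughly to the request sketched below. The query parameters are copied from that link; the `api.php` endpoint path, the function names, and the response shape handled by `intro_extracts` (the usual `query.pages` mapping for `format=json`) are my assumptions, so check them against the live API before relying on this. No network call is made here.

```python
from urllib.parse import urlencode

# Assumed endpoint; the link above points at Special:ApiSandbox, not api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, endpoint=API_ENDPOINT):
    """Build a URL asking for the plain-text intro extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "exlimit": 1,
        "exintro": "",      # flag parameter: intro section only
        "explaintext": "",  # flag parameter: plain text instead of HTML
        "titles": title,
    }
    return endpoint + "?" + urlencode(params)


def intro_extracts(response):
    """Map page title to extract, assuming the query.pages response layout."""
    pages = response.get("query", {}).get("pages", {})
    return {page["title"]: page.get("extract", "") for page in pages.values()}


print(build_extract_url("Barack_Obama"))
```

The extract returned this way is the whole intro, so picking out the first sentence is still left to the caller.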

*** Bug 5335 has been marked as a duplicate of this bug. ***

(In reply to Nathan Larson from comment #0)

> Ideally, we could implement a feature to automatically grab the first
> sentence of the lead; however, it's hard for software to detect the ends of
> sentences, since punctuation marks such as the period can appear in the
> middle of sentences ("Afterward, Mr. Brown went to the U.S. District
> Courthouse . . . and when he came back, everyone was gone.")
>
> If you have any ideas on the best way to do this, feel free to post them.
> Thanks.

For TextExtracts, that would be bug 57669. And yeah, any insights on better sentence handling would be highly appreciated :)

Tgr updated the task description. Jun 26 2015, 12:54 AM

Tgr set Security to None.

Tgr updated the task description.

This is a common NLP problem called sentence segmentation. Instead of reinventing wheels, just grab some library like OpenNLP or NLTK and see how well it fares?

Tgr mentioned this in T37363: Anchors to first sentence in lead paragraph of articles. Jun 26 2015, 1:06 AM

Tgr mentioned this in T117082: Cached REST endpoint for extracts requests. Nov 9 2015, 10:52 PM

Tgr mentioned this in T156883: Automated article summaries. Feb 1 2017, 4:03 AM

Merged into T127038 to keep discussion centralized in one place.
