Add abstract field to the article
Closed, Resolved · Public · 8 Estimated Story Points

Description

We need to add an abstract field containing the lead section text to the article object so that we can present it in the snapshots, realtime, realtime batch and on-demand API(s).

Acceptance criteria
The abstract field with the lead section text is produced by the structured-data service.

ToDo

  • add a new abstract field to the schema
  • create a function that extracts the lead section text from HTML
    • it should live in the parser repository in wikimedia-enterprise/general, as a method called GetAbstract
  • add extraction of the lead section to the articleupdate handler
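
The extraction step could be sketched roughly as follows. This is a minimal illustration in Python, not the parser repository's actual GetAbstract: the names LeadTextParser and get_abstract are hypothetical, and it assumes that the lead section is every <p> element that appears before the first <h2>, and that <sup> (reference markers), <style> and <script> content counts as noise.

```python
from html.parser import HTMLParser


class LeadTextParser(HTMLParser):
    """Collects the text of <p> elements that occur before the first <h2>."""

    # Assumption: content of these elements is noise for an abstract.
    SKIPPED = {"style", "script", "sup"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraph strings
        self._buffer = None    # text chunks of the <p> being read, or None
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._done = False     # set once the first <h2> is seen

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "h2":
            # First section heading: the lead section ends here.
            self._flush()
            self._done = True
        elif tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "p":
            self._flush()
            self._buffer = []

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._done or self._skip_depth or self._buffer is None:
            return
        self._buffer.append(data)

    def _flush(self):
        # Store the current paragraph with whitespace runs collapsed.
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def get_abstract(html: str) -> str:
    """Returns the lead paragraphs as plain text, one paragraph per line."""
    parser = LeadTextParser()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing <p> that was never closed
    return "\n".join(parser.paragraphs)
```

In the articleupdate handler, the result of such a function would simply be assigned to the new abstract field before the article is produced.
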
Test Strategy

The main strategy is unit testing for now: collect around 10 to 20 articles to run the unit tests against.
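
One possible way to organise those articles is as fixture pairs on disk. The sketch below is only an illustration: the function name run_fixture_cases and the layout (a name.html input next to a name.txt file holding the expected abstract) are assumptions, not something the task prescribes.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def run_fixture_cases(
    fixture_dir: Path, extract: Callable[[str], str]
) -> List[Tuple[str, str, str]]:
    """Runs `extract` on every *.html fixture and compares the result with
    the sibling *.txt file. Returns (name, expected, actual) per mismatch."""
    failures = []
    for html_path in sorted(fixture_dir.glob("*.html")):
        expected = html_path.with_suffix(".txt").read_text(encoding="utf-8").strip()
        actual = extract(html_path.read_text(encoding="utf-8")).strip()
        if actual != expected:
            failures.append((html_path.stem, expected, actual))
    return failures
```

A unit test can then assert that the returned list is empty, so adding another article only means dropping two more files into the fixture directory.
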

Decisions to be made based on technical complexity
  • should we exclude everything between curly braces in the lead section, or remove only the noise such as pronunciation? Fall back to simply removing everything between the braces if cleaning up the noise selectively is too complex
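
The simpler of the two options (dropping everything between the delimiters) might look like the sketch below. The function name strip_enclosed is hypothetical, and the delimiters are parameters because the exact characters to strip are part of the open decision; the defaults follow the wording of this task.

```python
import re


def strip_enclosed(text: str, open_ch: str = "{", close_ch: str = "}") -> str:
    """Removes every span between open_ch and close_ch, including nested ones."""
    kept = []
    depth = 0  # how many unclosed delimiters we are currently inside
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
        elif depth == 0:
            # Outside any span; a stray closing delimiter is kept as-is.
            kept.append(ch)
    cleaned = "".join(kept)
    # Tidy what the removal leaves behind: "Rome , Italy" -> "Rome, Italy".
    cleaned = re.sub(r"[ \t]+([,.;:])", r"\1", cleaned)
    return re.sub(r"[ \t]{2,}", " ", cleaned).strip()
```

Note that an opening delimiter that is never closed causes the rest of the text to be dropped, which is one reason to check this against the collected test articles before settling on it.
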
Notes

Please refer to the leads PoC for an example of the function and its output.

Event Timeline

Protsack.stephan created this task.
Daria_Kevana changed the task status from Open to In Progress. Dec 22 2022, 10:36 AM
Daria_Kevana changed the task status from In Progress to Open. Jan 26 2023, 12:59 PM