T342109 {Machine Readability} Parsing Sections

Status	Assigned	Task
Resolved	None	T350102 OKR 4.3
Open	None	T342109 {Machine Readability} Parsing Sections
Declined	None	T342261 Improve our plain text parsing of Wikipedia sections
Resolved	ROdonnell-WMF	T346251 Investigate Parsing Sections
Resolved	ROdonnell-WMF	T346674 Migrate Section Demo code to Structure Contents endpoint dev
Open	None	T346676 Investigate Huggingface and populate plain text Wikipedia datasets (competitor analysis)
Open	None	T346677 Investigate Sections JSON for Right to Left languages (RtL)
Resolved	ROdonnell-WMF	T346678 Deploy Sections to Structured Contents endpoint prod
Resolved	Protsack.stephan	T350018 Add sections demo to Structured Contents Use Case
Resolved	Protsack.stephan	T351592 [Defect] Sections parser shows Squirels paragraphs twice

SDelbecque-WMF created this task. Jul 18 2023, 12:43 PM

Restricted Application added a subscriber: Aklapper. Jul 18 2023, 12:43 PM

SDelbecque-WMF updated the task description. Jul 18 2023, 1:00 PM

SDelbecque-WMF added a project: Wikimedia Enterprise - Machine Readability. Jul 19 2023, 12:26 PM

JArguello-WMF renamed this task from {Machine Readability} Parsing Sections to Parsing Sections. Jul 19 2023, 2:14 PM

JArguello-WMF updated the task description.

ROdonnell-WMF renamed this task from Parsing Sections to Parsing Sections - Migrate MR Section demo JSON to new MR API Prototype. Jul 19 2023, 2:31 PM

ROdonnell-WMF updated the task description.

ROdonnell-WMF edited projects, added Wikimedia Enterprise Engineering; removed Epic. Jul 19 2023, 3:01 PM

ROdonnell-WMF updated the task description.

SDelbecque-WMF renamed this task from Parsing Sections - Migrate MR Section demo JSON to new MR API Prototype to {Machine Readability} Parsing Sections - Migrate MR Section demo JSON to new MR API Prototype. Jul 27 2023, 1:25 PM

SDelbecque-WMF renamed this task from {Machine Readability} Parsing Sections - Migrate MR Section demo JSON to new MR API Prototype to {Machine Readability} Parsing Sections. Sep 7 2023, 1:56 PM

SDelbecque-WMF updated the task description.

Waiting on the ticket to move to the current Sprint (parser TXXX).

Apart from migrating "Section" logic, this ticket has more sub-tasks. I need clarification on specific coding tasks:

Is there a technical part to "First of all: investigate whether we need to include html (links, references) into sections or just headings and plain text."?
On the dependency with "Credibility work": should this ticket add parsed references to the structured-contents API, or is that already underway by Prabhat in other tickets?
What is the scope of work with "Investigate schema options"? What options are in and out of scope? What is the customer deliverable from this sub-task?

@prabhat for the "credibility work", there is a bit of overlap here with parsing Wikipedia reference links. Shall we do the reference parsing in this ticket? If we add the reference parsing to parser.go we can re-use it in the structured-contents API.

ROdonnell-WMF closed subtask T342261: Improve our plain text parsing of Wikipedia sections as Declined. Sep 11 2023, 11:18 AM

SDelbecque-WMF updated the task description. Sep 12 2023, 12:00 PM

SDelbecque-WMF updated the task description.

SDelbecque-WMF updated the task description. Sep 12 2023, 12:03 PM

SDelbecque-WMF mentioned this in T346247: {Machine Readability} Wikilink QIDs. Sep 13 2023, 1:57 PM

SDelbecque-WMF mentioned this in T346251: Investigate Parsing Sections. Sep 13 2023, 2:08 PM

SDelbecque-WMF added a subtask: T346251: Investigate Parsing Sections.

JArguello-WMF updated the task description. Sep 13 2023, 4:36 PM

ROdonnell-WMF mentioned this in T324671: {Machine Readability} Parsing Tables. Sep 18 2023, 6:20 PM

REsquito-WMF subscribed. Sep 19 2023, 9:55 AM

SDelbecque-WMF updated the task description. Sep 19 2023, 11:29 AM

SDelbecque-WMF updated the task description.

@ROdonnell-WMF do you have your answers already?

The outstanding question is how the credibility signals work uses reference links. Should we extract HTML references in this ticket or leave that for another ticket? The Section demo code already has reference extraction, so it's not much work to include it in the code migration ticket.

We talked to @SDelbecque-WMF yesterday. My understanding is: if it's low effort to add it, let's do it!

OK, I will include it

ROdonnell-WMF added a project: Epic. Sep 25 2023, 12:09 PM

@SDelbecque-WMF can you please update the description with the OKR this one belongs to? Thanks!

JArguello-WMF closed subtask T346251: Investigate Parsing Sections as Resolved. Oct 10 2023, 4:33 PM

JArguello-WMF updated the task description. Oct 19 2023, 3:11 PM

JArguello-WMF added a parent task: T350102: OKR 4.3. Oct 31 2023, 6:42 AM

JArguello-WMF closed subtask T346674: Migrate Section Demo code to Structure Contents endpoint dev as Resolved. Nov 2 2023, 4:29 AM

JArguello-WMF closed subtask T346678: Deploy Sections to Structured Contents endpoint prod as Resolved. Nov 16 2023, 6:05 PM

ROdonnell-WMF added a subtask: T351592: [Defect] Sections parser shows Squirels paragraphs twice. Nov 18 2023, 9:52 PM

JArguello-WMF closed subtask T351592: [Defect] Sections parser shows Squirels paragraphs twice as Resolved. Dec 22 2023, 8:10 PM

JArguello-WMF moved this task from Roadmap (Initiatives Q4 FY23-24) to Machine Readability PB on the Wikimedia Enterprise board. Apr 9 2024, 2:54 PM