
Section Identification & Section Topics
Open, High, Public

Description

Request Status: New Request; FY23
Request Type: project support request
Related OKRs: Product Platform O: Our platform and processes are ready and able to invite all of the world's population to join us. KR 1: Machines are able to recognize Wikimedia content and suggest relations to other Wikimedia content in at least 2 wikis, enabling experimentation with at least two new strategic features.

Request Title: Section Identification & Section Topics

  • Request Description: Identify the start and end of the text that constitutes a section in an article, then use the section's bluelinks (links to existing articles) to create topics for each section
  • Indicate Priority Level: High
  • Main Requestors: Structured Data Team (@SWakiyama & @CBogen)
  • Ideal Delivery Date: Q3
  • Stakeholders: Structured Data team, Growth team, interested community members
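The two steps in the request description (finding where each section starts and ends, then deriving topics from its bluelinks) can be illustrated with a minimal Python sketch. It assumes plain wikitext input and treats every heading line such as "== History ==" as a section boundary. Whether a link is blue is approximated by a hypothetical `existing_titles` set standing in for a real page-existence lookup. The names `Section`, `split_sections` and `section_topics` are illustrative only and do not describe the actual Section Topics pipeline.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

# Wikitext headings look like "== Title ==" (levels 2-6).
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# Matches [[Target]], [[Target|label]] and [[Target#anchor|label]]; captures Target.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# Simplification (assumption): only these namespaces are treated as non-article links.
SKIPPED_NAMESPACES = {"file", "image", "category"}


@dataclass
class Section:
    title: str        # heading text; "" for the lead section
    level: int        # heading level; 1 is used for the lead section
    start_line: int   # first line of the section body (0-based)
    end_line: int     # one past the last line of the section body
    topics: List[str] = field(default_factory=list)


def split_sections(wikitext: str) -> List[Section]:
    """Split wikitext into flat sections, each ending at the next heading line."""
    lines = wikitext.splitlines()
    sections = [Section(title="", level=1, start_line=0, end_line=len(lines))]
    for i, line in enumerate(lines):
        match = HEADING_RE.match(line)
        if match:
            sections[-1].end_line = i  # the previous section stops at this heading
            sections.append(Section(title=match.group(2), level=len(match.group(1)),
                                    start_line=i + 1, end_line=len(lines)))
    return sections


def section_topics(wikitext: str, existing_titles: Set[str]) -> List[Section]:
    """Attach the bluelinked titles found in each section's body as its topics."""
    lines = wikitext.splitlines()
    sections = split_sections(wikitext)
    for section in sections:
        body = "\n".join(lines[section.start_line:section.end_line])
        for target in LINK_RE.findall(body):
            title = target.strip().replace("_", " ")
            if not title:
                continue
            if ":" in title and title.split(":", 1)[0].strip().lower() in SKIPPED_NAMESPACES:
                continue  # skip File:, Image: and Category: links
            title = title[0].upper() + title[1:]  # first letter is case-insensitive by default
            # Only existing pages (bluelinks) become topics; red links are ignored.
            if title in existing_titles and title not in section.topics:
                section.topics.append(title)
    return sections


if __name__ == "__main__":
    text = "Intro about [[Paris]].\n== History ==\nNear the [[Seine|river]]."
    for s in section_topics(text, {"Paris", "Seine"}):
        print(repr(s.title), s.level, s.topics)
```

This sketch splits sections flatly, so a "=== Subsection ===" heading also ends its parent "== Section ==". Nesting subsections under their parents, or reading sections from parsed HTML instead of raw wikitext, would be alternative design choices.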

Request Documentation

  • Related PHAB Tickets (required): <add link here>
  • Product One Pager (required): <add link here>
  • Product Requirements Document (PRD) (required): https://docs.google.com/document/d/1OP8Rja4Mjli9SN9XwJkNsEo3SU6g3kbCGeWTLWAnDVI/edit?usp=sharing
  • Product Roadmap (not required): <add link here>
  • Product Planning/Business Case (not required): <add link here>
  • Product Brief (not required): <add link here>
  • Other Links (not required): <add links here>

Event Timeline

Arslan set Due Date to May 2 2022, 12:00 AM. (May 2 2022, 3:16 AM)

Summary as of May 10th

Sections

  • Section work to start in July
  • Once the Product teams understand the requirements, Data Platform will engage with SD to work out high-level flows that determine the work needed to onboard the project
  • It is likely too early to consider a streaming approach, but we can discuss it (see the Shared Event Platform experiment)

Image Suggestions for Sections

  • Image suggestions for sections to start in October
  • Depends on the Sections implementation
  • Likely that this will follow a very similar pattern to article level image suggestions (we will need to have both pipelines running)
  • Data Platform work expected:
    • onboard Airflow job
    • new schema (similar to existing article image suggestions one but has section added to key)
    • new endpoint for data gateway
    • new feedback topic, or just add section to the existing one?

For related context, Section Translation focuses on supporting translation at the article-section level, and there has been some work on identifying equivalent sections of an article in two different languages (API doc and example result). Recent work by @diego and @MunizaA, described in T293511, expanded this approach.

I'm commenting on this because the Language team is interested in more reliable ways to identify sections that cover the same aspects across languages. Some of the work done so far may also be useful to other teams interested in this area.

@lbowmaker: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!

lbowmaker removed Due Date which was set to May 2 2022, 12:00 AM. (Jul 19 2022, 3:59 PM)

@lbowmaker can you please add the WMF Language team as a stakeholder for this ticket as well? It is of interest for the Section Translation project; see the comment from Pau for details.

@Arrbee, the Section-Topics data pipeline is currently in active development. Feel free to subscribe to T311745: [EPIC] Section topics data pipeline for more detailed progress updates. The pipeline will include extracted section titles.

Removing inactive task assignee who left WMF ages ago.
Please do {re/un}assign tasks as part of offboarding.