
Investigation: large scale testing of leads
Closed, Resolved | Public | 13 Estimated Story Points

Description

We want to be able to easily point to or extract leads from the Wiki dataset.

  • figure out whether structured pages and leads are the same effort or should be treated separately (see Modelling below)

To-Do

  • figure out the datasets for testing
  • figure out what to test against
  • figure out how to present test results (visual)
  • create ticket(s)

Event Timeline

Protsack.stephan raised the priority of this task from Medium to High. Nov 2 2022, 7:06 PM
AnnaMikla changed the task status from Open to In Progress. Nov 3 2022, 2:41 PM

Summary:

  • Which datasets can be used to test lead extraction?
    • give the ability to parse selected articles across projects (custom-provided dataset)
    • give the ability to parse an entire category (with the ability to set a limit, for example max 100 articles)
    • give the ability to parse an entire project if needed (with the ability to set a limit, for example max 100 articles)
  • Testing against existing endpoints?
    • testing against the summary looks like a good idea; to begin with, just figure out a way to put both results side by side for human review
  • Modelling: packaging: a separate endpoint, or fit it into the current APIs? Repeat this exercise for all APIs?
    • I think it should be a separate API endpoint and a separate workflow, something like /v2/*-article where we replace * with something; the main reason is that this will expand our payload a lot, and it is conceptually different enough
  • Modelling: schema: is schema.org enough or do we need to expand?
    • we will need to expand the schema in the future, but what schema.org currently provides is more than enough to start with
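
The dataset limit and the side-by-side review idea above could be sketched roughly as follows. This is only an illustration under assumptions: `ArticleRef`, `take_limited`, `side_by_side`, `get_lead` and `get_summary` are hypothetical names, not existing code or endpoints, and the two fetchers are passed in so the sketch does not assume how leads or summaries are actually retrieved.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ArticleRef:
    """Hypothetical reference to one article in one project."""
    project: str
    title: str


def take_limited(articles: Iterable[ArticleRef], max_articles: int = 100) -> List[ArticleRef]:
    """Cap any article source (custom list, category, whole project) at max_articles."""
    return list(itertools.islice(articles, max_articles))


def side_by_side(
    refs: Iterable[ArticleRef],
    get_lead: Callable[[ArticleRef], str],
    get_summary: Callable[[ArticleRef], str],
) -> List[Tuple[ArticleRef, str, str]]:
    """Pair the extracted lead with the existing summary for each article.

    Both fetchers are assumptions supplied by the caller; the rows are meant
    for human review, not for automatic scoring.
    """
    return [(ref, get_lead(ref), get_summary(ref)) for ref in refs]
```

The same `take_limited` call works for all three dataset options, since a category or a project can be exposed as a lazy iterable and only the first `max_articles` items are consumed.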
Daria_Kevana changed the task status from In Progress to Open. Nov 22 2022, 5:28 PM