Page MenuHomePhabricator

[RFC] Server side Design (API/Database schema)
Closed, ResolvedPublic

Description

This page is to discuss the server-side design of scribe tool with respect to database schema and API calls.

Event Timeline

Hi @Hadyelsahar. The current non-existing task description makes is hard for others to help or contribute, and for a triager/tester to figure out at some point in the future whether there is still a valid task. Please consider checking the recommendations on how to write a clear task. Thanks!

Hey @Aklapper, this is just a placeholder we will fill in the details later today.

The current schema we have has the following:

table_name: edit

FieldTypeNullKeyDefaultExtra
idint(11)NOPRINULLauto_increment
article_nametextNONULL
section_labelvarchar(200)YESNULL
contenttextYESNULL
urltextYESNULL
domainvarchar(100)YESNULL
lang_codevarchar(3)YESNULL
datedateNONULL

table_name: section

FieldTypeNullKeyDefaultExtra
idint(11)NOPRINULLauto_increment
labelvarchar(200)NOUNINULL
article_nametextYESNULL
lang_codevarchar(3)YESNULL

table_name: article

idint(11)NOPRINULLauto_increment
nametextNONULL
wd_q_idvarchar(20)YESUNINULL
lang_codevarchar(3)YESNULL

@Hadyelsahar @Lucie This is the rough structure used at the moment. Could we add Wd_QID column to the section(If it will help with getting translations)?

This comment was removed by Hadyelsahar.

I discussed with Hady today, and we would suggest the following changes:

  • General the language code can be longer than three characters, e.g. zh-hans.

Article

idint(11)NOPRINULLauto_inctement
nametextNONULL
wd_q_idvarchar(20)NOUNINULL
lang_codevarchar(6)NONULL
domaintextYESNULL
tagtextYESNULL
retrieved_datedateNONULL

Section

idint(11)NOPRINULLauto_inctement
article_idint(11)NONULL
nametextNONULL
order_numberintNONULL
content_selection_methodtextYESNULL
retrieved_datedateNONULL

Reference

idint(11)NOPRINULLauto_inctement
article_idint(11)NONULL
section_idint(11)YESNULL
urltextNONULL
publisher_nametextYESNULL
publication_titletextYESNULL
summarytextYESNULL
publication_datedateYESNULL
retrieved_datedateNONULL
content_selection_methodtextYESNULL

Statistics

idint(11)NOPRINULLauto_inctement
article_idint(11)NONULL
createdbooleanNONULL
sandboxbooleanNONULL
timestampdateNONULL
references_usedarray of reference_idsNONULL
sections_usedarray of section_idsNONULL
mobilebooleanNONULL

I would suggest the amendments for the following reasons:
Article

  • we will start with articles of domains, so it will be good to keep track of the domains we categorized the articles in. We also plan to create a whitelist of references, which makes it easier to match them.
  • the tag can be a place to keep track when and how we decided to add the article, or e.g. if we just used Wikidata to retrieve missing articles
  • retrieved_date lets us keep track of when we added this entry to the database (the same for the following tables)

Section

  • article_id instead of article_name makes sure we use ids as foreign keys across tables, rather than text which can be ambiguous (especially as Wikidata labels don't have to be unique)
  • order_number helps to decide which order the sections should be displayed in the article
  • content_selection method lets us keep track of how we created those section titles and be able to add new content selection types and e.g. test them

Reference

  • This table should collect all references we suggest for the article, therefore I changed the name from edit (if I understood your edit table correctly)
  • All other fields are based on the content we need for references (such as publication_title and its date), the summary field can be multiple sentences, summarizing the reference
  • The section field can be a number if the reference summary relates for a certain section, or NULL if the reference can be used across sections

Statistics

  • If possible, we should write to this table if someone decides to publish an article created using Scribe so we can keep track of what works well and what doesn't.
  • references_used and sections_used should be a list of all references and sections used from the ones suggested respectively (with that we can also track which creation methods work best)
  • we should also track whether people publish directly as an article or in their sandbox (given we provide them with those options)
  • mobile should track if the article was edited and created on mobile or desktop (given we can access that information)

@Lucie / @Frimelle: Hi! This task has been assigned to you a while ago. Could you maybe share an update? Do you still plan to work on this task?

If this task has been resolved in the meantime: Please update the task status (via Add Action...Change Status in the dropdown menu).
If this task is not resolved and only if you do not plan to work on this task anymore: Please consider removing yourself as assignee (via Add Action...Assign / Claim in the dropdown menu): That would allow others to work on this (in theory), as others won't think that someone is already working on this. Thanks! :)

Aklapper added a subscriber: Lucie.

Removing task assignee due to inactivity, as this open task has been assigned for more than two years. See the email sent to the task assignee on February 06th 2022 (and T295729).

Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.

If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".

Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.

Aklapper claimed this task.

Boldy closing as resolved as this task is in the Done workboard column