Page MenuHomePhabricator

[RFC] Server side Design (API/Database schema)
Open, Needs TriagePublic

Description

This page is to discuss the server-side design of scribe tool with respect to database schema and API calls.

Event Timeline

Hi @Hadyelsahar. The current non-existing task description makes is hard for others to help or contribute, and for a triager/tester to figure out at some point in the future whether there is still a valid task. Please consider checking the recommendations on how to write a clear task. Thanks!

Hey @Aklapper, this is just a placeholder we will fill in the details later today.

The current schema we have has the following:

table_name: edit

FieldTypeNullKeyDefaultExtra
idint(11)NOPRINULLauto_increment
article_nametextNONULL
section_labelvarchar(200)YESNULL
contenttextYESNULL
urltextYESNULL
domainvarchar(100)YESNULL
lang_codevarchar(3)YESNULL
datedateNONULL

table_name: section

FieldTypeNullKeyDefaultExtra
idint(11)NOPRINULLauto_increment
labelvarchar(200)NOUNINULL
article_nametextYESNULL
lang_codevarchar(3)YESNULL

table_name: article

idint(11)NOPRINULLauto_increment
nametextNONULL
wd_q_idvarchar(20)YESUNINULL
lang_codevarchar(3)YESNULL

@Hadyelsahar @Lucie This is the rough structure used at the moment. Could we add Wd_QID column to the section(If it will help with getting translations)?

This comment was removed by Hadyelsahar.

I discussed with Hady today, and we would suggest the following changes:

  • General the language code can be longer than three characters, e.g. zh-hans.

Article

idint(11)NOPRINULLauto_inctement
nametextNONULL
wd_q_idvarchar(20)NOUNINULL
lang_codevarchar(6)NONULL
domaintextYESNULL
tagtextYESNULL
retrieved_datedateNONULL

Section

idint(11)NOPRINULLauto_inctement
article_idint(11)NONULL
nametextNONULL
order_numberintNONULL
content_selection_methodtextYESNULL
retrieved_datedateNONULL

Reference

idint(11)NOPRINULLauto_inctement
article_idint(11)NONULL
section_idint(11)YESNULL
urltextNONULL
publisher_nametextYESNULL
publication_titletextYESNULL
summarytextYESNULL
publication_datedateYESNULL
retrieved_datedateNONULL
content_selection_methodtextYESNULL

Statistics

idint(11)NOPRINULLauto_inctement
article_idint(11)NONULL
createdbooleanNONULL
sandboxbooleanNONULL
timestampdateNONULL
references_usedarray of reference_idsNONULL
sections_usedarray of section_idsNONULL
mobilebooleanNONULL

I would suggest the amendments for the following reasons:
Article

  • we will start with articles of domains, so it will be good to keep track of the domains we categorized the articles in. We also plan to create a whitelist of references, which makes it easier to match them.
  • the tag can be a place to keep track when and how we decided to add the article, or e.g. if we just used Wikidata to retrieve missing articles
  • retrieved_date lets us keep track of when we added this entry to the database (the same for the following tables)

Section

  • article_id instead of article_name makes sure we use ids as foreign keys across tables, rather than text which can be ambiguous (especially as Wikidata labels don't have to be unique)
  • order_number helps to decide which order the sections should be displayed in the article
  • content_selection method lets us keep track of how we created those section titles and be able to add new content selection types and e.g. test them

Reference

  • This table should collect all references we suggest for the article, therefore I changed the name from edit (if I understood your edit table correctly)
  • All other fields are based on the content we need for references (such as publication_title and its date), the summary field can be multiple sentences, summarizing the reference
  • The section field can be a number if the reference summary relates for a certain section, or NULL if the reference can be used across sections

Statistics

  • If possible, we should write to this table if someone decides to publish an article created using Scribe so we can keep track of what works well and what doesn't.
  • references_used and sections_used should be a list of all references and sections used from the ones suggested respectively (with that we can also track which creation methods work best)
  • we should also track whether people publish directly as an article or in their sandbox (given we provide them with those options)
  • mobile should track if the article was edited and created on mobile or desktop (given we can access that information)