What's in a name? - AuthorBot: Process and Progress
Closed, Resolved (Public)

Authored By: PangolinMexico, Jun 24 2022, 11:25 AM

Description

This thread documents progress on AuthorBot: a bot that automatically adds and corrects author names (and eventually author pages?) on the Wikidata entries for scientific articles.

To see data regarding the most popular scientific databases on Wikidata, go here:
https://github.com/feliciss/wasian/blob/main/wasian/wikidata-database-analysis/graphs/
Some graphs of note:

[Figure: global_top15.png]

[Figure: spanish.png]

[Figure: german.png]

[Figure: chinese.png]

Progress on AuthorBot:

  • Bot queries scientific articles with missing author information:
    • Articles where P50 (author) statements are missing a P1932, P9687 or P9688 qualifier
    • Articles where P2093 (author name string) statements are missing a P9687 or P9688 qualifier
    • Articles with neither P2093 nor P50 statements
    • TO ATTEMPT LATER:
      • Articles with only P2093 statements, where the author name strings could be replaced with authors (P50)
      • Articles with P2093 statements that contain author initials, which could be replaced by full author names (articles with a PubMed ID are good candidates for this)
  • Bot is able to obtain and parse citations from a variety of academic databases:
    • PubMed ID (P698)
    • PubMed Central ID (P932)
    • Dimensions Publication ID (P6179)
    • CJFD Journal ID (P6769) - Very difficult. Attempt later.
    • ResearchGate ID (P5875)
    • ADS Bibcode (P819)
    • Bonus Databases:
      • DBLP Publication ID (P8978)
      • arXiv ID (P818)
      • OpenCitations Bibliographic Resource ID (P3181)
      • JSTOR Article ID (P888)
  • Bot is able to automatically add author names to articles, using data from a variety of academic databases:
    • PubMed ID (P698)
    • PubMed Central ID (P932)
    • Dimensions Publication ID (P6179)
    • CJFD Journal ID (P6769)
    • ResearchGate ID (P5875)
    • ADS Bibcode (P819)
    • Bonus Databases:
      • DBLP Publication ID (P8978)
      • arXiv ID (P818)
      • OpenCitations Bibliographic Resource ID (P3181)
      • JSTOR Article ID (P888)
  • Bot is able to automatically detect whether an author page exists on Wikidata and, if so, add it to the scientific article's page instead of an author name string
  • Bot is able to automatically create author pages from author name strings
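
The query criteria in the first bullet above could be written as SPARQL for the Wikidata Query Service. The sketch below is an illustration only, not AuthorBot's actual code. The function names are hypothetical, articles are assumed to be instances of scholarly article (Q13442814), and "missing a ... qualifier" is read as "lacks at least one of the listed qualifiers", which may not be the intended meaning.

```python
# Hypothetical sketch: build SPARQL strings for the query criteria listed above.
# Assumption: target items are instances (P31) of "scholarly article" (Q13442814).
SCHOLARLY_ARTICLE = "Q13442814"


def missing_qualifier_query(statement_prop, qualifiers, limit=100):
    """Articles with a statement of `statement_prop` that lacks at least one
    of the given qualifier properties (an assumed reading of the criteria)."""
    checks = " || ".join(
        f"NOT EXISTS {{ ?statement pq:{q} ?v{i} }}"
        for i, q in enumerate(qualifiers)
    )
    return (
        "SELECT DISTINCT ?article WHERE {\n"
        f"  ?article wdt:P31 wd:{SCHOLARLY_ARTICLE} ;\n"
        f"           p:{statement_prop} ?statement .\n"
        f"  FILTER({checks})\n"
        "}\n"
        f"LIMIT {limit}"
    )


def no_author_query(limit=100):
    """Articles with neither an author (P50) nor an author name string (P2093)."""
    return (
        "SELECT ?article WHERE {\n"
        f"  ?article wdt:P31 wd:{SCHOLARLY_ARTICLE} .\n"
        "  FILTER NOT EXISTS { ?article p:P50 ?a }\n"
        "  FILTER NOT EXISTS { ?article p:P2093 ?s }\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(missing_qualifier_query("P50", ["P1932", "P9687", "P9688"]))
    print(missing_qualifier_query("P2093", ["P9687", "P9688"]))
    print(no_author_query())
```

The `wd:`, `wdt:`, `p:` and `pq:` prefixes are predefined on the Wikidata Query Service. Queries over all scholarly articles are likely to be slow or time out there, so in practice they would probably need to be narrowed, for example to articles that have one of the database IDs listed above.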
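
As a rough illustration of the "obtain and parse citations" step for PubMed, the sketch below extracts authors from a PubMed-style XML record and maps them onto the properties mentioned above (P2093, P9687, P9688). It is not the bot's real parser. `SAMPLE_XML` is a made-up record that follows the `AuthorList` layout of PubMed XML, and `parse_authors` and the shape of its output are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Made-up record for illustration; real records would come from PubMed.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000000</PMID>
      <Article>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane A</ForeName><Initials>JA</Initials></Author>
          <Author><LastName>Roe</LastName><Initials>R</Initials></Author>
          <Author><CollectiveName>Example Consortium</CollectiveName></Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_authors(xml_text):
    """Return one dict per author, in citation order (hypothetical structure)."""
    root = ET.fromstring(xml_text)
    authors = []
    for ordinal, author in enumerate(root.iter("Author"), start=1):
        last = author.findtext("LastName")
        # Prefer the full forename; fall back to initials when it is absent.
        given = author.findtext("ForeName") or author.findtext("Initials")
        collective = author.findtext("CollectiveName")
        if collective:
            name = collective
        else:
            name = " ".join(part for part in (given, last) if part)
        authors.append({
            "ordinal": ordinal,   # position in the author list
            "P2093": name,        # author name string
            "P9687": given,       # given names qualifier (None if unknown)
            "P9688": last,        # last names qualifier (None if unknown)
        })
    return authors


if __name__ == "__main__":
    for entry in parse_authors(SAMPLE_XML):
        print(entry)
```

Keeping the given and last names separate at parse time means the P9687 and P9688 qualifiers can be filled directly, rather than being guessed later by splitting the name string.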
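
For the "TO ATTEMPT LATER" item about author initials, one possible first step is a heuristic that flags name strings that appear to contain initials. The sketch below is only that: `has_initials` is a hypothetical helper, it handles ASCII capitals only, and it will wrongly flag short names written in all capitals.

```python
import re

# A token counts as "initials" if it is one to three capital letters,
# each optionally followed by a period, e.g. "J.", "JA", "J.A.".
_INITIAL_TOKEN = re.compile(r"^(?:[A-Z]\.?){1,3}$")


def has_initials(name):
    """Heuristically report whether an author name string contains initials."""
    # Split on whitespace and hyphens so "J.-P." is checked as "J." and "P.".
    tokens = re.split(r"[\s\-]+", name.strip())
    return any(_INITIAL_TOKEN.match(token) for token in tokens if token)


if __name__ == "__main__":
    for candidate in ["J. Smith", "Smith JA", "Jane Smith", "J.-P. Dupont"]:
        print(candidate, has_initials(candidate))
```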

Some thoughts:

  • PubMed IDs and CJFD IDs should both be prioritised: PubMed IDs because of the sheer volume of articles that have them, and CJFD IDs because they make it possible to parse Chinese names accurately.
  • CJFD IDs are so horrible to deal with!

Event Timeline

I believe this task is related to your Outreachy project. Could you ensure any relevant information gets documented here https://www.mediawiki.org/wiki/Outreachy/Past_projects? Also, if there isn't anything else remaining in this task, please help close it and move any pending items to a separate one. TY!