This thread documents progress on AuthorBot: a bot that automatically adds missing author names to scientific articles on Wikidata and corrects existing ones (and possibly, eventually, adds author pages too).
To see data regarding the most popular scientific databases on Wikidata, go here:
https://github.com/feliciss/wasian/blob/main/wasian/wikidata-database-analysis/graphs/
Some graphs of note:
{F35271389}
{F35271388}
{F35271387}
{F35271386}
Progress on AuthorBot:
- [X] **Bot queries scientific articles with missing author information:**
-- [X] Articles where P50 (author) statements are missing a P9687 or P9688 qualifier
-- [X] Articles with neither P2093 (author name string) nor P50 (author) statements
-- **TO ATTEMPT LATER:**
--- [] Articles with only P2093 statements, whose author name strings //could// be replaced with P50 author items
--- [] Articles with P2093 statements that give only author initials, which could be replaced with full author names (articles with a PubMed ID are good candidates for this)
- [] **Bot is able to obtain and parse citations from a variety of academic databases:**
-- [] PubMed ID (P698)
-- [] PubMed Central ID (P932)
-- [] Dimensions Publication ID (P6179)
-- [] CJFD Journal ID (P6769)
-- [X] ResearchGate ID (P5875)
-- [X] ADS Bibcode (P819)
-- Bonus Databases:
--- [] DBLP Publication ID (P8978)
--- [] arXiv ID (P818)
--- [] OpenCitations Bibliographic Resource ID (P3181)
--- [] JSTOR Article ID (P888)
- [] **Bot is able to automatically add author names to articles from a variety of academic databases:**
-- [] PubMed ID (P698)
-- [] PubMed Central ID (P932)
-- [] Dimensions Publication ID (P6179)
-- [] CJFD Journal ID (P6769)
-- [X] ResearchGate ID (P5875)
-- [X] ADS Bibcode (P819)
-- Bonus Databases:
--- [] DBLP Publication ID (P8978)
--- [] arXiv ID (P818)
--- [] OpenCitations Bibliographic Resource ID (P3181)
--- [] JSTOR Article ID (P888)
- [] **Bot is able to automatically detect whether an author page already exists on Wikidata and, if so, add it as an author (P50) to scientific articles instead of an author name string (P2093)**
- [] **Bot is able to automatically create author pages from author name strings**
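The two completed query tasks (P50 statements missing a P9687/P9688 qualifier, and articles with no P50 or P2093 at all) can be expressed as SPARQL. The sketch below is an illustration, not the bot's actual queries: it assumes articles are restricted to instances of Q13442814 (scholarly article), and the function names and User-Agent string are hypothetical. Over the full set of scholarly articles these queries may well time out on the public endpoint, so in practice they would probably need narrowing, e.g. to articles that carry a particular ID property.

```python
import json
import urllib.parse
import urllib.request

# Assumption: "scientific article" is approximated by P31 = Q13442814 (scholarly article).
SCHOLARLY_ARTICLE = "wd:Q13442814"


def missing_name_qualifiers_query(limit: int = 100) -> str:
    """Articles with at least one P50 (author) statement lacking a P9687 or a P9688 qualifier."""
    return f"""
SELECT DISTINCT ?article WHERE {{
  ?article wdt:P31 {SCHOLARLY_ARTICLE} ;
           p:P50 ?authorStatement .
  FILTER (NOT EXISTS {{ ?authorStatement pq:P9687 [] }}
          || NOT EXISTS {{ ?authorStatement pq:P9688 [] }})
}}
LIMIT {limit}
"""


def no_author_query(limit: int = 100) -> str:
    """Articles with neither a P50 (author) nor a P2093 (author name string) statement."""
    return f"""
SELECT ?article WHERE {{
  ?article wdt:P31 {SCHOLARLY_ARTICLE} .
  FILTER NOT EXISTS {{ ?article p:P50 [] }}
  FILTER NOT EXISTS {{ ?article p:P2093 [] }}
}}
LIMIT {limit}
"""


def run_query(query: str, endpoint: str = "https://query.wikidata.org/sparql") -> list:
    """Send a query to a SPARQL endpoint and return the ?article URIs (network call)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; a real bot should identify itself with contact details.
    request = urllib.request.Request(url, headers={"User-Agent": "AuthorBot-sketch/0.1 (placeholder)"})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["article"]["value"] for row in data["results"]["bindings"]]
```

Using `p:` rather than `wdt:` for the P50/P2093 checks means statements of any rank count as "present", not just best-rank ones.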
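For the PubMed ID (P698) items, one way to obtain author names is NCBI's E-utilities `efetch` endpoint with `db=pubmed` and `retmode=xml`, whose records list authors under `AuthorList/Author` with `LastName`, `ForeName`, `Initials` or `CollectiveName` children. The sketch below is a minimal, hypothetical parser for that XML (function names are illustrative). NCBI asks clients to stay under three requests per second without an API key, which a bot would need to respect.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_pubmed_xml(pmid: str) -> str:
    """Download the PubMed XML record for one PMID (network call)."""
    url = EFETCH + "?" + urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_pubmed_authors(xml_text: str) -> list:
    """Return authors in listed order as dicts with 'name', 'given' and 'last'."""
    root = ET.fromstring(xml_text)
    authors = []
    for author in root.findall(".//AuthorList/Author"):
        collective = author.findtext("CollectiveName")
        if collective:
            # Group authors (consortia) have no personal name parts.
            authors.append({"name": collective, "given": None, "last": None})
            continue
        last = author.findtext("LastName")
        # Fall back to initials when no full forename is recorded.
        given = author.findtext("ForeName") or author.findtext("Initials")
        full = " ".join(part for part in (given, last) if part)
        authors.append({"name": full, "given": given, "last": last})
    return authors
```

Keeping the list order matters because it is what a series ordinal on each author statement would be derived from.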
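Filling the P9687/P9688 qualifiers requires splitting each name into given and last names. ADS returns authors as "Last, First M." strings, which makes the split straightforward. The sketch below assumes that P9687 holds the author's given names and P9688 the last names, and skips names without a comma (such as collaborations) rather than guessing. It also includes a rough, ASCII-only heuristic for spotting initials-only given names, i.e. candidates for the "replace initials with full names" task. All function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple


def split_ads_author(author: str) -> Tuple[str, Optional[str]]:
    """Split an ADS-style 'Last, First M.' string into (last, given)."""
    last, sep, given = author.partition(",")
    given = given.strip() if sep else None
    return last.strip(), (given or None)


def looks_like_initials(given: Optional[str]) -> bool:
    """Heuristic: True for 'A.', 'J. D.', 'JA', 'J.-P.' (ASCII capitals only)."""
    if not given:
        return False
    return re.fullmatch(r"(?:[A-Z]\.?[\s-]*)+", given.strip()) is not None


def name_qualifiers(author: str) -> Dict[str, str]:
    """Map an ADS author string to qualifier values keyed by property ID."""
    last, given = split_ads_author(author)
    if given is None:
        # No comma (e.g. a collaboration name): the split is unreliable, so skip it.
        return {}
    # Assumed mapping: P9688 = author last names, P9687 = author given names.
    return {"P9688": last, "P9687": given}
```

For example, `name_qualifiers("Hawking, Stephen W.")` gives `{"P9688": "Hawking", "P9687": "Stephen W."}`, while `looks_like_initials("J. D.")` is true and would flag that author for a later full-name lookup.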
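Detecting an existing author page by name alone is ambiguous, so one possible approach (an assumption, not a settled design) is to match on an identifier when the source database supplies one, such as an ORCID iD (P496). The sketch below uses the Wikidata search API with CirrusSearch's `haswbstatement` keyword and only accepts a match when exactly one item is found; otherwise the bot would fall back to an author name string (P2093). Function names and the User-Agent are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API = "https://www.wikidata.org/w/api.php"


def orcid_search_url(orcid: str) -> str:
    """Build a search API URL for items with an ORCID iD (P496) statement of this value."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:P496={orcid}",
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def pick_author_item(search_response: dict) -> Optional[str]:
    """Return the item ID only if exactly one item matched; otherwise None."""
    hits = search_response.get("query", {}).get("search", [])
    if len(hits) != 1:
        return None
    # On Wikidata, search result titles for items are their IDs, e.g. "Q42".
    return hits[0]["title"]


def find_author_by_orcid(orcid: str) -> Optional[str]:
    """Look up an author item by ORCID iD (network call)."""
    headers = {"User-Agent": "AuthorBot-sketch/0.1 (placeholder)"}
    request = urllib.request.Request(orcid_search_url(orcid), headers=headers)
    with urllib.request.urlopen(request) as response:
        return pick_author_item(json.load(response))
```

Refusing to choose between multiple hits keeps the bot conservative: a duplicated ORCID on Wikidata is better flagged for a human than silently resolved.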
Some thoughts:
- PubMed IDs and CJFD IDs should both be prioritised: PubMed IDs because of the sheer number of articles that have one, and CJFD IDs because they should allow Chinese names to be parsed accurately.
-- On this: unsure how to get CJFD IDs without manual parsing. Deal with this later.