Write Structured Data into Commons's filepages using upload2commons command
Closed, Resolved, Public

Description

Purpose

Currently, the upload2commons command "just" uploads the file to (Beta) Commons with wikitext, including templates and variables, but without Structured Data on Commons (SDC).

This command should create some SDC statements, and ideally do so in a way that will make adapting this command to the actual Wikimedia Commons easy.
Copilot will help.

Approach

Ground break:

  • Update upload2commons to push captured with (P4082) = Lingua Libre (Q60024037) into the file's SDC.
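
A minimal sketch of what this first statement could look like as Wikibase claim JSON, written in Python. The helper name `item_statement` is hypothetical and not part of upload2commons; the layout follows the standard Wikibase JSON format for item-valued statements.

```python
def item_statement(property_id, item_id):
    """Build one item-valued statement in Wikibase claim JSON.

    Hypothetical helper for illustration, not existing upload2commons code.
    """
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(item_id[1:]),  # "Q60024037" -> 60024037
                    "id": item_id,
                },
            },
        },
    }


# captured with (P4082) = Lingua Libre (Q60024037)
captured_with = item_statement("P4082", "Q60024037")
```

The resulting dictionary is meant to be one entry of the claims list sent to the API when the file's structured data is edited.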

Expand:

  • Define the relevant properties to push as structured data
  • Update upload2commons so that these properties are written ...
    • write wikipage: to a new Commons audio file
    • update: on an existing Commons audio file

List of Lingua Libre properties

For reference, this is the list of possible properties on Lingua Libre v2.

image.png (366×375 px, 41 KB)

Example:

Selection (ask Yug)

!IMPORTANT: Not completed. Ask Yug to review chat.wav above and other files (signed, etc.) more thoroughly.

Properties to push to SDC :

  • captured with (P4082) = Lingua Libre (Q60024037)
  • copyright status (P6216) = [ public domain (Q19652) , copyrighted (Q50423863) ]
  • copyright license (P275) = [ Creative Commons CC0 License (Q6938433) , Creative Commons Attribution 4.0 International (Q20007257) , Creative Commons Attribution-ShareAlike 4.0 International (Q18199165) ] Note: ccby (Q6905323); ccbysa (Q6905942).
  • transliteration or transcription (P2440) = recording writing (string pronounced or signed)
  • language of work (P407) = recording's Wikidata language
  • recording date (P10135) = YYYY-MM-DD
  • recordist (P10893) =
    • Qualifier: Wikimedia username (P4174) = <username>
  • Lingua Libre ID (P10369) : audio Qid from Lingualibre
  • spoken by (P10894) =
    • Qualifier: Lingua Libre ID (P10369) : locutor Qid from Lingualibre
    • Qualifier: author name string (P2093) : <username string>

If items generated with Wikidata or Lexeme generators :

  • Wikidata Q identifier (Q43649390) = recording wikidata or Lexeme Qid

If a SPOKEN language:

  • instance of (P31) = pronunciation file (Q108167708)
  • audio (P51) = pronunciation file (Q108167708)
  • audio transcription (P9533) = recording writing (string pronounced)

If a SIGNED language:

  • instance of (P31) = video recording (Q34508)
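
The spoken/signed split above could be sketched as a small selection function in Python. The function name `item_valued_pairs` and its parameters are hypothetical; only the item-valued properties from the list are covered here, and the language QID is supplied by the caller.

```python
LINGUA_LIBRE = "Q60024037"
PRONUNCIATION_FILE = "Q108167708"  # instance of, for spoken languages
VIDEO_RECORDING = "Q34508"         # instance of, for signed languages


def item_valued_pairs(language_qid, signed):
    """Return (property, item) pairs for one recording.

    Hypothetical helper: `language_qid` is the recording's Wikidata
    language item, `signed` is True for sign-language recordings.
    """
    instance_of = VIDEO_RECORDING if signed else PRONUNCIATION_FILE
    return [
        ("P4082", LINGUA_LIBRE),  # captured with
        ("P407", language_qid),   # language of work
        ("P31", instance_of),     # instance of
    ]


# Example: a spoken French (Q150) recording
pairs = item_valued_pairs("Q150", signed=False)
```

Each pair would still need to be turned into a full statement before being sent to Commons.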

Task update:

Please test directly on Commons (prod) with small amounts of files.

Event Timeline

Yug renamed this task from Create SDC statements with upload2commons command to Write Structured Data with upload2commons command. Sep 28 2024, 9:58 PM
Yug triaged this task as High priority.
Yug renamed this task from Write Structured Data with upload2commons command to Write Structured Data into Commons's filepages using upload2commons command. Apr 9 2025, 2:26 PM
Yug updated the task description.

Hello @Pushkar7077, according to @0x010C, for upload2commons.py it is best not to reuse his Pywikibot migration code. He encourages you to use the Commons API directly with your OAuth token instead.
As a general outline, he pointed out the following:

Using the requests module and your OAuth 2.0 token, send a Commons API request (action=query) to get the file's pageid, build the media ID (Mid = "M" + pageid), then send an API request with action=wbeditentity, passing a data object of the form { "claims": [claim, claim, claim] }.
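
The outline above can be sketched in Python as follows. This is an untested sketch under assumptions, not the code that was merged: the function names, the User-Agent string and the edit summary are hypothetical, the token is assumed to be an OAuth 2.0 bearer token, and `API_URL` points at production Commons (swap it for the Beta Commons endpoint when testing there).

```python
import json

API_URL = "https://commons.wikimedia.org/w/api.php"  # assumption: prod Commons


def make_session(oauth2_token):
    """Create a requests session that sends the OAuth 2.0 bearer token."""
    import requests  # imported here so the helpers below also accept a stub session

    session = requests.Session()
    session.headers.update({
        "Authorization": f"Bearer {oauth2_token}",
        "User-Agent": "upload2commons-sdc-sketch/0.1",  # hypothetical value
    })
    return session


def get_media_id(session, filename):
    """Resolve 'Example.wav' to its MediaInfo ID, i.e. 'M' + pageid."""
    params = {"action": "query", "titles": f"File:{filename}",
              "format": "json", "formatversion": "2"}
    response = session.get(API_URL, params=params)
    response.raise_for_status()
    page = response.json()["query"]["pages"][0]
    if page.get("missing"):
        raise LookupError(f"File:{filename} does not exist")
    return f"M{page['pageid']}"


def get_csrf_token(session):
    """Fetch the CSRF token that write actions require."""
    params = {"action": "query", "meta": "tokens", "type": "csrf",
              "format": "json"}
    response = session.get(API_URL, params=params)
    response.raise_for_status()
    return response.json()["query"]["tokens"]["csrftoken"]


def write_claims(session, media_id, claims, summary="Add structured data"):
    """Send the claims to the file's MediaInfo entity with wbeditentity."""
    payload = {
        "action": "wbeditentity",
        "id": media_id,
        "data": json.dumps({"claims": claims}),
        "token": get_csrf_token(session),
        "summary": summary,
        "format": "json",
    }
    response = session.post(API_URL, data=payload)
    response.raise_for_status()
    result = response.json()
    if "error" in result:
        raise RuntimeError(result["error"].get("info", "unknown API error"))
    return result
```

Typical use would be `session = make_session(token)`, then `write_claims(session, get_media_id(session, "Example.wav"), claims)`, where `claims` is a list of statement dictionaries in Wikibase JSON.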

For a representative starting point, you may ask Gemini or another assistant:

Using Python 3, `import requests`, the Wikimedia Commons API, the OAuth token and a filename, how to fetch the file's media ID, then edit the structured data claims? Occasionally, the claims may already exist.
Audio transcription (P9533) :  
	<string of the word>
	language of work: <language wikidata qid>
Recording date (P10135):
	27 August 2018
	Timestamp	+2018-08-27T00:00:00Z
	Timezone	+00:00
	Calendar	Gregorian
	Precision	1 day
	Before	0
	After	0
Recordist (P10893):
	Wikimedia username (P4174): Davidgrosclaude
Spoken by (P10894):
	Lingua Libre ID (P10369) = Q1976
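
The example values above could be expressed as claim JSON along these lines, including a simple guard for claims that already exist. This is a sketch under assumptions: the helper names are hypothetical, the recordist and speaker statements are assumed to use an "unknown value" (somevalue) main snak carrying the qualifier, and `existing` is assumed to be the `statements` field of the file's entity as returned by the API.

```python
from datetime import date

GREGORIAN_CALENDAR = "http://www.wikidata.org/entity/Q1985727"
DAY_PRECISION = 11  # Wikibase precision code for a single day


def recording_date_statement(iso_day):
    """recording date (P10135) from a 'YYYY-MM-DD' string."""
    day = date.fromisoformat(iso_day)  # raises ValueError on malformed input
    value = {
        "time": f"+{day.isoformat()}T00:00:00Z",
        "timezone": 0,
        "before": 0,
        "after": 0,
        "precision": DAY_PRECISION,
        "calendarmodel": GREGORIAN_CALENDAR,
    }
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {"snaktype": "value", "property": "P10135",
                     "datavalue": {"type": "time", "value": value}},
    }


def somevalue_statement(property_id, qualifier_id, text):
    """Statement with an unknown main value and one string qualifier.

    Assumption: this models entries such as recordist (P10893) that only
    carry a qualifier in the example above.
    """
    qualifier = {"snaktype": "value", "property": qualifier_id,
                 "datavalue": {"type": "string", "value": text}}
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {"snaktype": "somevalue", "property": property_id},
        "qualifiers": {qualifier_id: [qualifier]},
    }


def only_missing(existing, candidates):
    """Drop candidates whose property already has a statement on the file."""
    # An entity without statements may report [] rather than {}.
    present = set(existing) if isinstance(existing, dict) else set()
    return [c for c in candidates if c["mainsnak"]["property"] not in present]


candidates = [
    recording_date_statement("2018-08-27"),
    somevalue_statement("P10893", "P4174", "Davidgrosclaude"),  # recordist
    somevalue_statement("P10894", "P10369", "Q1976"),           # spoken by
]
```

Skipping by property is the simplest policy; comparing values as well would be needed if a property is allowed to carry several statements.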
Yug closed this task as Resolved. Jul 1 2025, 5:17 PM
Yug claimed this task.

Resolved on the local instance a few days back; see the structured data of this page.