Page MenuHomePhabricator

Gather data for Creative Commons annual report
Closed, ResolvedPublic

Description

Results

all as of 16 November 2016

  • Wikipedia: 42.5 million articles across all Wikipedia projects licensed as CC-BY-SA 3.0 (100% of Wikipedia articles)
  • Wikidata: 24.5 million structured data items licensed as CC-0 (100% of Wikidata items)
  • Wiktionary: 25.9 million dictionary entries across all Wiktionary projects licensed as CC-BY-SA 3.0 (100% of Wiktionary entries)

Wikimedia Commons

Wikimedia Commons has 29.4 million files with a Creative Commons license or public domain mark (84.8% of all files on Commons).

Breakdown by license

licensenumber of files
CC-BY4 168 445
CC-BY-NC5 052
CC-BY-NC-ND3
CC-BY-NC-SA232
CC-BY-ND11
CC-BY-SA20 919 205
CC-SA2 775
CC-01 261 209
CC-PD3 073 061
total29 429 993

Breakdown by file type

typenumber of files
(bitmap) image27 776 590
(vector) drawing712 458
audio668 761
video82 076
"scriptable multimedia"300
document189 786
total29 429 971

Queries

select database() as wiki, ss_good_articles as articles
from site_stats;
multiquery ~/queries/num_articles.sql -h analytics-store.eqiad.wmnet -d ~/dblists/wikipedias.tsv --defaults-file=/etc/mysql/conf.d/analytics-research-client.cnf > ~/data/wikipedia_articles.tsv
multiquery ~/queries/num_articles.sql -h analytics-store.eqiad.wmnet -d ~/dblists/wiktionaries.tsv --defaults-file=/etc/mysql/conf.d/analytics-research-client.cnf > ~/data/wiktionary_entries.tsv
select license, count(*) as files
from
(select
  case
    when cats like '%CC-BY-NC-ND%' then 'CC-BY-NC-ND'
    when cats like '%CC-BY-NC-SA%' then 'CC-BY-NC-SA'
    when cats like '%CC-BY-ND%' then 'CC-BY-ND'
    when cats like '%CC-BY-SA%' then 'CC-BY-SA'
    when cats like '%CC-BY-NC%' then 'CC-BY-NC'
    when cats like '%CC-BY%' then 'CC-BY'
    when cats like '%CC-SA%' then 'CC-SA'
    when cats like '%CC-Zero%' then 'CC-0'
    when cats like '%CC-PD%' then 'CC-PD'
    else 'Other CC'
  end as license
  from
  (select cl_from, group_concat(cl_to) as cats
    from commonswiki.categorylinks
    inner join
      (select cat_title from commonswiki.category where
        cat_title like 'CC-%' and
        cat_title not like '%aircraft%' and
        cat_title not regexp 'CC-[[:upper:][:digit:]]{3}'
      ) cc_cats
    on cat_title = cl_to
    where cl_type = "file"
    group by cl_from
  ) cc_files
) licenses
group by license;
select img_media_type, count(*) as files
from
(select distinct cl_from
  from commonswiki.categorylinks
  inner join
    (select cat_title from commonswiki.category where
      cat_title like 'CC-%' and
      cat_title not like '%aircraft%' and
      cat_title not regexp 'CC-[[:upper:][:digit:]]{3}'
    ) cc_cats
  on cat_title = cl_to
  where cl_type = "file"
) cc_files
inner join commonswiki.page on cl_from = page_id
inner join commonswiki.image on page_title = img_name
group by img_media_type;

Event Timeline

nshahquinn-wmf raised the priority of this task from Medium to Needs Triage.Nov 16 2016, 2:14 AM
nshahquinn-wmf moved this task from Next up to Neil's in progress on the Contributors-Analysis board.
nshahquinn-wmf updated the task description. (Show Details)
nshahquinn-wmf updated the task description. (Show Details)