Implement a SiteLookup based on a nested array structure.
Open, High, Public

Description

As per T113034 (RFC: Overhaul Interwiki map, unify with Sites and WikiMap), we want to move toward maintaining meta-information about other sites (aka interwiki info) in files, using the structure outlined in P3044. This structure would be stored in JSON or PHP files and represented internally as nested arrays. The new SiteLookup should:

  • provide access to the Site objects represented by the nested arrays (see T135147: Make the domain model implemented by Site/SiteLookup/SiteStore more flexible)
  • be able to load such data from JSON or PHP files (this could be in a separate class, if we want)
  • combine multiple such data structures (deep-merge)
  • build indexes for efficient access by id or group. If such an index is already included in the provided data, it should be used.
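A minimal sketch of the deep-merge step, written in Python for illustration only (the real implementation would be PHP). The names `deep_merge` and `merge_all` are hypothetical, and the merge semantics are an assumption: nested objects are merged key by key, while any other value from a later source, including a list, replaces the earlier one. Under these semantics, any precomputed `_by_ids`/`_by_groups` indexes would be stale after a merge and would need to be rebuilt.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base` (hypothetical helper).

    Nested dicts are merged key by key; any other value (string, list,
    bool, ...) from `override` replaces the one in `base`.
    """
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def merge_all(*sources: Dict[str, Any]) -> Dict[str, Any]:
    """Fold several site structures together; later sources win."""
    merged: Dict[str, Any] = {}
    for source in sources:
        merged = deep_merge(merged, source)
    return merged


# Usage: a hypothetical local file that adds an API path to enwiki.
shared = {"enwiki": {"type": "mediawiki", "paths": {"article": "//en.wikipedia.org/wiki/$1"}}}
local = {"enwiki": {"paths": {"api": "https://en.wikipedia.org/w/api.php"}}}
merged = merge_all(shared, local)
```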
NOTE: FileBasedSiteLookup already exists; perhaps it can be adopted. We would probably need compatibility code so it can continue to support the current/legacy JSON structure (see docs/sitescache.txt). However, the SiteLookup should perhaps not know about files at all, only about nested arrays.
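One way to keep file handling out of the lookup, sketched in Python with hypothetical names (`load_site_data`, `ArraySiteLookup`): a loader turns JSON into nested dicts, and the lookup only ever receives those dicts. Treating keys that start with `_` as reserved for indexes is an assumption based on the `_by_ids`/`_by_groups` keys in the proposed structure.

```python
import io
import json
from typing import IO, Any, Dict, Optional


def load_site_data(stream: IO[str]) -> Dict[str, Any]:
    """Parse a JSON site structure into nested dicts; all file handling lives here."""
    data = json.load(stream)
    if not isinstance(data, dict):
        raise ValueError("site data must be a JSON object")
    return data


class ArraySiteLookup:
    """Hypothetical lookup that only ever sees nested dicts, never files."""

    def __init__(self, data: Dict[str, Any]) -> None:
        # Assumption: keys starting with "_" hold indexes, not site entries.
        self._sites = {k: v for k, v in data.items() if not k.startswith("_")}

    def get_site(self, global_id: str) -> Optional[Dict[str, Any]]:
        return self._sites.get(global_id)


# Usage: any text stream works, so tests need no real files.
lookup = ArraySiteLookup(load_site_data(io.StringIO('{"bb": {"type": "unknown"}, "_by_ids": {}}')))
```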

The proposed structure, for reference:

  {
    "enwiki": {
      "type": "mediawiki",
      "ids": {
        "global": [ "enwiki", "some-old-alias" ],
        "interwiki": "en",
        "domain": "en.wikipedia.org"
      },
      "groups": {
        "language": "en",
        "size": "big",
        "family": "wikipedia",
        "db-cluster": "s1"
      },
      "paths": {
        "article": "//en.wikipedia.org/wiki/$1",
        "api": "https://en.wikipedia.org/w/api.php"
      },
      "props": {
        "database": "enwiki",
        "language": "en"
      }
    },
    "enwiktionary": {
      "type": "mediawiki",
      "ids": {
        "global": "enwiktionary",
        "interwiki": [ "wiktionary", "wikt" ],
        "domain": "en.wiktionary.org"
      },
      "groups": {
        "language": "en",
        "family": "wiktionary",
        "db-cluster": "s2"
      },
      "paths": {
        "article": "//en.wiktionary.org/wiki/$1",
        "api": "https://en.wiktionary.org/w/api.php"
      },
      "props": {
        "database": "enwiktionary",
        "language": "en",
        "capital-links": false
      }
    },
    "commonswiki": {
      "type": "mediawiki",
      "ids": {
        "global": [ "commonswiki", "commons" ],
        "interwiki": [ "commons", "c" ],
        "domain": "commons.wikimedia.org"
      },
      "groups": {
        "language": "en",
        "family": "commons",
        "db-cluster": "s2"
      },
      "paths": {
        "article": "//commons.wikimedia.org/wiki/$1",
        "api": "https://commons.wikimedia.org/w/api.php"
      },
      "props": {
        "database": "commonswiki",
        "multilingual": true,
        "transcludable": true
      }
    },
    "bb": {
      "type": "unknown",
      "ids": {
        "global": [ "bb", "boingboing" ],
        "interwiki": "bb",
        "domain": "boingboing.net"
      },
      "groups": {
        "language": "en"
      },
      "paths": {
        "article": "https://boingboing.net/$1.html"
      },
      "props": {
        "language": "en"
      }
    },
    "_by_ids": {
      "global": {
        "some-old-alias": "enwiki",
        "enwiki": "enwiki",
        "enwiktionary": "enwiktionary",
        "commonswiki": "commonswiki",
        "commons": "commonswiki",
        "bb": "bb",
        "boingboing": "bb"
      },
      "interwiki": {
        "en": "enwiki",
        "wiktionary": "enwiktionary",
        "wikt": "enwiktionary",
        "c": "commonswiki",
        "commons": "commonswiki",
        "bb": "bb"
      },
      "domain": {
        "en.wikipedia.org": "enwiki",
        "en.wiktionary.org": "enwiktionary",
        "commons.wikimedia.org": "commonswiki",
        "boingboing.net": "bb"
      }
    },
    "_by_groups": {
      "language": {
        "en": [ "enwiki", "enwiktionary", "bb" ],
        "mul": [ "commonswiki" ]
      },
      "family": {
        "wikipedia": [ "enwiki" ],
        "wiktionary": [ "enwiktionary" ],
        "commons": [ "commonswiki" ]
      },
      "db-cluster": {
        "s1": [ "enwiki" ],
        "s2": [ "enwiktionary", "commonswiki" ]
      }
    }
  }
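The index-building requirement could be sketched as follows, again in Python with hypothetical function names (`build_indexes`, `site_by_id`). It assumes, as in the structure above, that an id may be a single string or a list of strings and that each group value is a single string. An index already present in the data is returned unchanged rather than rebuilt.

```python
from typing import Any, Dict, List, Optional, Union


def _as_list(value: Union[str, List[str]]) -> List[str]:
    # In the proposed structure an id may be a single string or a list.
    return [value] if isinstance(value, str) else list(value)


def build_indexes(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return _by_ids and _by_groups, reusing any index already in `data`."""
    sites = {k: v for k, v in data.items() if not k.startswith("_")}

    by_ids = data.get("_by_ids")
    if by_ids is None:
        by_ids = {}
        for site_key, site in sites.items():
            for id_type, ids in site.get("ids", {}).items():
                for site_id in _as_list(ids):
                    by_ids.setdefault(id_type, {})[site_id] = site_key

    by_groups = data.get("_by_groups")
    if by_groups is None:
        by_groups = {}
        for site_key, site in sites.items():
            for group_type, group in site.get("groups", {}).items():
                by_groups.setdefault(group_type, {}).setdefault(group, []).append(site_key)

    return {"_by_ids": by_ids, "_by_groups": by_groups}


def site_by_id(data: Dict[str, Any], indexes: Dict[str, Any],
               id_type: str, site_id: str) -> Optional[Dict[str, Any]]:
    """Look up a site entry through the _by_ids index, e.g. by interwiki prefix."""
    key = indexes["_by_ids"].get(id_type, {}).get(site_id)
    return None if key is None else data[key]


# Usage with an excerpt of the structure above (indexes omitted, so they are built).
data = {
    "enwiki": {"ids": {"global": ["enwiki", "some-old-alias"], "interwiki": "en"},
               "groups": {"language": "en", "family": "wikipedia"}},
    "enwiktionary": {"ids": {"global": "enwiktionary", "interwiki": ["wiktionary", "wikt"]},
                     "groups": {"language": "en", "family": "wiktionary"}},
}
indexes = build_indexes(data)
```

Deriving the indexes in code also guards against drift: rebuilt from the full example, `_by_groups` would list commonswiki under language "en" (its `groups` entry), whereas the precomputed index lists it under "mul".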


Event Timeline

We should probably have some way to indicate whether a MediaWiki site is local or not.

Krenair added a comment (edited). May 12 2016, 5:02 PM

Or maybe that could just be based on the presence of the db-cluster group or the database prop?
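A sketch of that heuristic, using a hypothetical `is_local` helper. Requiring `type` to be `mediawiki` is an added assumption, following the earlier comment about MediaWiki sites. Note that every MediaWiki entry in the example structure carries both keys, so the heuristic only works if non-local wikis omit them.

```python
from typing import Any, Dict


def is_local(site: Dict[str, Any]) -> bool:
    """Guess whether a site entry describes a local MediaWiki wiki (hypothetical).

    Heuristic from the discussion: a db-cluster group or a database prop
    implies the wiki is local. The type check is an extra assumption.
    """
    has_cluster = "db-cluster" in site.get("groups", {})
    has_database = "database" in site.get("props", {})
    return site.get("type") == "mediawiki" and (has_cluster or has_database)


# Usage with trimmed entries from the structure above.
enwiki = {"type": "mediawiki", "groups": {"db-cluster": "s1"}, "props": {"database": "enwiki"}}
bb = {"type": "unknown", "groups": {"language": "en"}, "props": {"language": "en"}}
```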

Why is language in both groups and props?

Also, with the current format this would have to be per-wiki: en: is the interwiki prefix for enwiki on Wikipedias, but on (for example) Wikibooks it should point to enwikibooks.
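To illustrate per-wiki resolution, here is a hypothetical `resolve_language_prefix` helper that interprets a language prefix relative to the family of the linking wiki. It assumes language prefixes can be mapped through the `language` and `family` groups; the `dewiki`, `enwikibooks`, and `dewikibooks` entries in the usage example are illustrative and not taken from the structure above.

```python
from typing import Any, Dict, Optional


def resolve_language_prefix(
    sites: Dict[str, Dict[str, Any]], from_site: str, prefix: str
) -> Optional[str]:
    """Resolve a language interwiki prefix relative to the linking wiki.

    Picks the site in the same family as `from_site` whose language group
    equals `prefix`, so "en" means enwiki on a Wikipedia but enwikibooks
    on a Wikibooks wiki. Returns None if there is no match.
    """
    family = sites[from_site].get("groups", {}).get("family")
    if family is None:
        return None
    for key, site in sites.items():
        groups = site.get("groups", {})
        if groups.get("family") == family and groups.get("language") == prefix:
            return key
    return None


# Usage with illustrative entries (hypothetical, trimmed to their groups).
sites = {
    "enwiki": {"groups": {"language": "en", "family": "wikipedia"}},
    "dewiki": {"groups": {"language": "de", "family": "wikipedia"}},
    "enwikibooks": {"groups": {"language": "en", "family": "wikibooks"}},
    "dewikibooks": {"groups": {"language": "de", "family": "wikibooks"}},
}
```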

daniel updated the task description. May 12 2016, 5:08 PM
daniel removed daniel as the assignee of this task. Sep 19 2016, 2:00 PM
daniel claimed this task. Oct 31 2016, 4:22 PM
daniel added a project: Wikidata.