
Define configuration for ORES articletopic search
Closed, Resolved (Public)

Description

For the articletopic search, we'll need a table like

keyword | ORES name(s) | Human name
economics | History_And_Society.Business and economics | Economics
popular-culture | Culture.Internet culture, Culture.Language and literature | Popular culture

The keyword is what we use internally (e.g. in the API) and what you can type in the wiki's search field (in queries like dragon articletopic:film). The internal name and the search keyword could be two different columns if we feel the need from a UX point of view, but that seems unnecessary.
ORES names are the drafttopic keywords returned by ORES (which are basically wikiproject names) - those could be one-to-one or there could be multiple ORES categories for a keyword.
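
A minimal sketch of how such a table might be consumed, assuming only the two example rows above. The function name expand_articletopic and the parsing rules are hypothetical and are not taken from the actual search code; the sketch only illustrates that one keyword can expand to several ORES names.

```python
# Illustrative only: the two rows come from the example table above;
# the function name and the parsing rules are assumptions.
KEYWORD_TO_ORES = {
    "economics": ["History_And_Society.Business and economics"],
    "popular-culture": [
        "Culture.Internet culture",
        "Culture.Language and literature",
    ],
}

PREFIX = "articletopic:"


def expand_articletopic(query):
    """Split a query into plain search terms and the ORES names it selects."""
    terms = []
    ores_names = []
    for token in query.split():
        if token.startswith(PREFIX):
            keyword = token[len(PREFIX):]
            # Unknown keywords raise KeyError so typos are not silently ignored.
            ores_names.extend(KEYWORD_TO_ORES[keyword])
        else:
            terms.append(token)
    return terms, ores_names
```

Calling expand_articletopic("dragon articletopic:popular-culture") would separate the free-text term from the two ORES names behind the keyword.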

Event Timeline

ORES names are the drafttopic keywords returned by ORES (which are basically wikiproject names) - those could be one-to-one or there could be multiple ORES categories for a keyword.

@Halfak how will that work in other wikis? Will it use the same English topic names, or will those be wikiproject names in the local language? Or do you plan to switch to a different naming scheme on your side at some point, before deploying the non-English models?

The models use the exact same names in other wikis. There will need to be a localized mapping at the UI level.
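
One way a localized mapping at the UI level could look, as a rough sketch: the keywords stay language-independent and only the display labels vary per language. The LABELS table, the topic_label helper and the French label are illustrative assumptions, not data from any wiki.

```python
# Hypothetical per-language labels keyed by the language-independent keyword.
# The French entry is an illustrative translation, not taken from any wiki.
LABELS = {
    "en": {"economics": "Economics", "popular-culture": "Popular culture"},
    "fr": {"economics": "Économie"},
}


def topic_label(keyword, lang, fallback="en"):
    """Return the label for keyword in lang, then in the fallback language,
    and finally the raw keyword if no label exists at all."""
    for code in (lang, fallback):
        label = LABELS.get(code, {}).get(keyword)
        if label is not None:
            return label
    return keyword
```

Falling back to the English label, and then to the raw keyword, keeps the UI usable while translations are incomplete.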

From the spreadsheet in T244192#5850209:

{
    "africa": {
        "group": "geography",
        "oresTopics": [
            "africa",
            "central-africa",
            "eastern-africa",
            "northern-africa",
            "southern-africa",
            "western-africa"
        ]
    },
    "architecture": {
        "group": "culture",
        "oresTopics": [
            "architecture"
        ]
    },
    "art": {
        "group": "culture",
        "oresTopics": [
            "visual-arts"
        ]
    },
    "asia": {
        "group": "geography",
        "oresTopics": [
            "asia",
            "central-asia",
            "east-asia",
            "south-asia",
            "southeast-asia",
            "west-asia"
        ]
    },
    "biography": {
        "group": "history-and-society",
        "oresTopics": [
            "biography"
        ]
    },
    "biology": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "biology"
        ]
    },
    "business-and-economics": {
        "group": "history-and-society",
        "oresTopics": [
            "business-and-economics"
        ]
    },
    "central-america": {
        "group": "geography",
        "oresTopics": [
            "central-america"
        ]
    },
    "chemistry": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "chemistry"
        ]
    },
    "comics-and-anime": {
        "group": "culture",
        "oresTopics": [
            "comics-and-anime"
        ]
    },
    "computers-and-internet": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "internet-culture",
            "software",
            "computing"
        ]
    },
    "earth-and-environment": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "geographical",
            "earth-and-environment"
        ]
    },
    "education": {
        "group": "history-and-society",
        "oresTopics": [
            "education"
        ]
    },
    "engineering": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "engineering"
        ]
    },
    "entertainment": {
        "group": "culture",
        "oresTopics": [
            "entertainment",
            "radio"
        ]
    },
    "europe": {
        "group": "geography",
        "oresTopics": [
            "north-asia",
            "eastern-europe",
            "europe",
            "northern-europe",
            "southern-europe",
            "western-europe"
        ]
    },
    "fashion": {
        "group": "culture",
        "oresTopics": [
            "fashion"
        ]
    },
    "food-and-drink": {
        "group": "history-and-society",
        "oresTopics": [
            "food-and-drink"
        ]
    },
    "general-science": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "stem"
        ]
    },
    "history": {
        "group": "history-and-society",
        "oresTopics": [
            "history"
        ]
    },
    "literature": {
        "group": "culture",
        "oresTopics": [
            "literature",
            "books"
        ]
    },
    "mathematics": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "mathematics"
        ]
    },
    "medicine-and-health": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "medicine-and-health"
        ]
    },
    "military-and-warfare": {
        "group": "history-and-society",
        "oresTopics": [
            "military-and-warfare"
        ]
    },
    "music": {
        "group": "culture",
        "oresTopics": [
            "music"
        ]
    },
    "north-america": {
        "group": "geography",
        "oresTopics": [
            "north-america"
        ]
    },
    "oceania": {
        "group": "geography",
        "oresTopics": [
            "oceania"
        ]
    },
    "performing-arts": {
        "group": "culture",
        "oresTopics": [
            "performing-arts"
        ]
    },
    "philosophy-and-religion": {
        "group": "history-and-society",
        "oresTopics": [
            "philosophy-and-religion"
        ]
    },
    "physics": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "physics",
            "space"
        ]
    },
    "politics-and-government": {
        "group": "history-and-society",
        "oresTopics": [
            "politics-and-government"
        ]
    },
    "society": {
        "group": "history-and-society",
        "oresTopics": [
            "society"
        ]
    },
    "south-america": {
        "group": "geography",
        "oresTopics": [
            "south-america"
        ]
    },
    "sports": {
        "group": "culture",
        "oresTopics": [
            "sports"
        ]
    },
    "technology": {
        "group": "science-technology-and-math",
        "oresTopics": [
            "technology"
        ]
    },
    "transportation": {
        "group": "history-and-society",
        "oresTopics": [
            "transportation"
        ]
    },
    "tv-and-film": {
        "group": "culture",
        "oresTopics": [
            "films",
            "television"
        ]
    },
    "video-games": {
        "group": "culture",
        "oresTopics": [
            "video-games"
        ]
    },
    "women": {
        "group": "history-and-society",
        "oresTopics": [
            "women"
        ]
    }
}
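
A sketch of how a consumer might index this configuration, using a three-entry excerpt copied from the JSON above. The build_indexes helper and the rule that an ORES topic may appear under only one keyword are assumptions (the rule happens to hold for the data above); none of this is taken from the GrowthExperiments code.

```python
import json

# Excerpt of the configuration above; the full file has the same shape.
CONFIG_JSON = """
{
  "entertainment": {"group": "culture", "oresTopics": ["entertainment", "radio"]},
  "europe": {
    "group": "geography",
    "oresTopics": ["north-asia", "eastern-europe", "europe",
                   "northern-europe", "southern-europe", "western-europe"]
  },
  "physics": {"group": "science-technology-and-math", "oresTopics": ["physics", "space"]}
}
"""


def build_indexes(config):
    """Return (ORES topic -> keyword, group -> sorted list of keywords)."""
    topic_to_keyword = {}
    group_to_keywords = {}
    for keyword, entry in config.items():
        group_to_keywords.setdefault(entry["group"], []).append(keyword)
        for topic in entry["oresTopics"]:
            # Assumption: each ORES topic belongs to exactly one keyword.
            if topic in topic_to_keyword:
                raise ValueError(f"{topic!r} is listed under two keywords")
            topic_to_keyword[topic] = keyword
    for keywords in group_to_keywords.values():
        keywords.sort()
    return topic_to_keyword, group_to_keywords


config = json.loads(CONFIG_JSON)
topic_to_keyword, group_to_keywords = build_indexes(config)
```

The reverse index makes it cheap to go from a topic returned by ORES to the keyword shown to users; note that in the configuration above north-asia is filed under europe.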

{
    "Culture.Biography.Biography*": "biography",
    "Culture.Biography.Women": "women",
    "Culture.Food and drink": "food-and-drink",
    "Culture.Internet culture": "internet-culture",
    "Culture.Linguistics": "linguistics",
    "Culture.Literature": "literature",
    "Culture.Media.Books": "books",
    "Culture.Media.Entertainment": "entertainment",
    "Culture.Media.Films": "films",
    "Culture.Media.Media*": "media",
    "Culture.Media.Music": "music",
    "Culture.Media.Radio": "radio",
    "Culture.Media.Software": "software",
    "Culture.Media.Television": "television",
    "Culture.Media.Video games": "video-games",
    "Culture.Performing arts": "performing-arts",
    "Culture.Philosophy and religion": "philosophy-and-religion",
    "Culture.Sports": "sports",
    "Culture.Visual arts.Architecture": "architecture",
    "Culture.Visual arts.Comics and Anime": "comics-and-anime",
    "Culture.Visual arts.Fashion": "fashion",
    "Culture.Visual arts.Visual arts*": "visual-arts",
    "Geography.Geographical": "geographical",
    "Geography.Regions.Africa.Africa*": "africa",
    "Geography.Regions.Africa.Central Africa": "central-africa",
    "Geography.Regions.Africa.Eastern Africa": "eastern-africa",
    "Geography.Regions.Africa.Northern Africa": "northern-africa",
    "Geography.Regions.Africa.Southern Africa": "southern-africa",
    "Geography.Regions.Africa.Western Africa": "western-africa",
    "Geography.Regions.Americas.Central America": "central-america",
    "Geography.Regions.Americas.North America": "north-america",
    "Geography.Regions.Americas.South America": "south-america",
    "Geography.Regions.Asia.Asia*": "asia",
    "Geography.Regions.Asia.Central Asia": "central-asia",
    "Geography.Regions.Asia.East Asia": "east-asia",
    "Geography.Regions.Asia.North Asia": "north-asia",
    "Geography.Regions.Asia.South Asia": "south-asia",
    "Geography.Regions.Asia.Southeast Asia": "southeast-asia",
    "Geography.Regions.Asia.West Asia": "west-asia",
    "Geography.Regions.Europe.Eastern Europe": "eastern-europe",
    "Geography.Regions.Europe.Europe*": "europe",
    "Geography.Regions.Europe.Northern Europe": "northern-europe",
    "Geography.Regions.Europe.Southern Europe": "southern-europe",
    "Geography.Regions.Europe.Western Europe": "western-europe",
    "Geography.Regions.Oceania": "oceania",
    "History and Society.Business and economics": "business-and-economics",
    "History and Society.Education": "education",
    "History and Society.History": "history",
    "History and Society.Military and warfare": "military-and-warfare",
    "History and Society.Politics and government": "politics-and-government",
    "History and Society.Society": "society",
    "History and Society.Transportation": "transportation",
    "STEM.Biology": "biology",
    "STEM.Chemistry": "chemistry",
    "STEM.Computing": "computing",
    "STEM.Earth and environment": "earth-and-environment",
    "STEM.Engineering": "engineering",
    "STEM.Libraries & Information": "libraries-and-information",
    "STEM.Mathematics": "mathematics",
    "STEM.Medicine & Health": "medicine-and-health",
    "STEM.Physics": "physics",
    "STEM.STEM*": "stem",
    "STEM.Space": "space",
    "STEM.Technology": "technology"
}
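
The short ids in this mapping appear to follow a regular pattern: take the last dotted component, drop a trailing asterisk, spell out the ampersand, lowercase, and hyphenate. The sketch below reproduces that pattern as an observation about the data above; the function name ores_name_to_id is hypothetical, and real code would presumably use the explicit table rather than rely on the rule.

```python
def ores_name_to_id(ores_name):
    """Derive the short id from a dotted ORES name, following the pattern seen above."""
    leaf = ores_name.split(".")[-1]   # keep only the last path component
    leaf = leaf.rstrip("*")           # "Africa*" -> "Africa"
    leaf = leaf.replace("&", "and")   # "Medicine & Health" -> "Medicine and Health"
    # Lowercase and join the words with hyphens.
    return "-".join(leaf.lower().split())
```

Three of the ids in the mapping above (linguistics, media and libraries-and-information) are not listed under any keyword in the first configuration block.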

Tgr renamed this task from "Define configuration for ORES drafttopic search" to "Define configuration for ORES articletopic search". Feb 14 2020, 12:54 AM
Tgr updated the task description.

Change 572135 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] [WIP] Add backend support for ORES topics

https://gerrit.wikimedia.org/r/572135

kostajh subscribed.

Updating status and assignee per the patch ^

Change 572135 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Add backend support for ORES topics

https://gerrit.wikimedia.org/r/572135

Moved the topic configuration to mw:MediaWiki:NewcomerTopicsOres.json (unlike our other configs, this one is not wiki-specific, at least for now).

Change 574633 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[operations/mediawiki-config@master] Add ORES topics related config for GrowthExperiments

https://gerrit.wikimedia.org/r/574633

Change 574633 merged by jenkins-bot:
[operations/mediawiki-config@master] Add ORES topics related config for GrowthExperiments

https://gerrit.wikimedia.org/r/574633

Mentioned in SAL (#wikimedia-operations) [2020-02-25T12:10:50Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: cdde3a2: db90d22 (T245525, T243359) (duration: 00m 58s)