Run a full update of the monuments database
Closed, Resolved, Public

Description

Now that the bot is fixed, we need to make sure the update pipeline works and that the monuments database gets updated.

Event Timeline

JeanFred claimed this task.
JeanFred raised the priority of this task from to Needs Triage.
JeanFred updated the task description.
JeanFred subscribed.

I think that Spanish Wikipedia hasn't been updated. You can see data from pt, gl and ca, but not from es: https://tools.wmflabs.org/wlm-maps/#7/41.034/-4.872

Yes, the update is still in progress. Various countries have been processed for project es, but the monuments_all table only gets updated at the end of the full process. Will keep you posted!

@Emijrp Hmm, so on one hand the reprocessing mysteriously stalled; on the other hand it should have been fine. Can you check on your end?

Mainland Spain is still empty: https://tools.wmflabs.org/wlm-maps/#7/37.182/-4.982

What es: lists are you parsing? Perhaps the titles or the template schema changed?

The config is in monuments_config.py:

'project' : u'wikipedia',
'lang' : u'es',
'headerTemplate' : u'Cabecera BIC',
'rowTemplate' : u'Fila BIC',
'commonsTemplate' : u'BIC',
'namespaces' : [104],

I’m rerunning the full processing. Once this is done, I will focus on (es, es) to investigate further.

Ok, mystery solved:

370244:Working on countrycode "es" in language "es"
370245-ERROR: u'Namespace identifier(s) not recognised: 104'
370246-Unknown error occurred when processing country es in lang es

Looks like the same underlying error as T110420. Will have to investigate further.
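
To find every (country, lang) pair that failed this way, the log can be scanned for ERROR lines and each one attributed to the most recent "Working on" line. This is only a rough sketch: the function name `find_failed_pairs` is hypothetical, and the line format is assumed to match the three-line excerpt above.

```python
import re

# Matches the "Working on ..." line as it appears in the log excerpt above.
WORKING_RE = re.compile(
    r'Working on countrycode "(?P<country>[^"]+)" in language "(?P<lang>[^"]+)"'
)


def find_failed_pairs(log_lines):
    """Return (country, lang, message) for each ERROR line.

    Each error is attributed to the most recent "Working on" line.
    Hypothetical helper; assumes the log format shown in the excerpt.
    """
    failures = []
    current = None
    for line in log_lines:
        match = WORKING_RE.search(line)
        if match:
            current = (match.group("country"), match.group("lang"))
            continue
        if "ERROR:" in line and current is not None:
            message = line.split("ERROR:", 1)[1].strip()
            failures.append((current[0], current[1], message))
    return failures


# The three lines from the log excerpt, including the grep prefixes.
sample = [
    '370244:Working on countrycode "es" in language "es"',
    "370245-ERROR: u'Namespace identifier(s) not recognised: 104'",
    "370246-Unknown error occurred when processing country es in lang es",
]
failed = find_failed_pairs(sample)
```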

It fails with special namespaces such as 102 and 104. Perhaps they have to be added to the family file in the families directory?
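
One way to surface this earlier would be to compare the configured namespaces against the IDs the site is known to declare before processing starts. The sketch below is plain Python and does not call the bot framework: `unknown_namespaces` and the `known_ids` set are hypothetical, and in practice the known IDs would have to come from the family file or from the site itself.

```python
def unknown_namespaces(configured, known):
    """Return configured namespace IDs that are absent from `known`, in config order."""
    known_ids = set(known)
    return [ns for ns in configured if ns not in known_ids]


# Assumption: the standard namespaces 0-15 are declared, custom ones are not.
known_ids = set(range(16))

# Trimmed copy of the (es, es) entry from monuments_config.py quoted above.
config = {
    'project': u'wikipedia',
    'lang': u'es',
    'namespaces': [104],
}

missing = unknown_namespaces(config['namespaces'], known_ids)
if missing:
    print("Namespaces not recognised for %s: %s" % (config['lang'], missing))
```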

MariaDB [s51138__heritage_p]> SELECT COUNT(*) from monuments_all WHERE country='es' AND lang='es';
+----------+
| COUNT(*) |
+----------+
|      705 |
+----------+
1 row in set (0.00 sec)

So there *is* some data in the database for (es, es), albeit not much. These should show up. A cursory look shows a lot of errors because no primary key is available.
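
To spot other under-populated pairs, the same count can be grouped by country and language. The sketch below uses an in-memory SQLite table as a stand-in for the real MariaDB database; only the `monuments_all` table name and its `country` and `lang` columns come from the query above, while the `id` column and the sample rows are made up for illustration.

```python
import sqlite3


def counts_by_pair(conn):
    """Return (country, lang, row count) tuples, sorted by country then lang."""
    cursor = conn.execute(
        "SELECT country, lang, COUNT(*) FROM monuments_all "
        "GROUP BY country, lang ORDER BY country, lang"
    )
    return cursor.fetchall()


# Stand-in database with made-up rows; the `id` column is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monuments_all (country TEXT, lang TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO monuments_all VALUES (?, ?, ?)",
    [("es", "es", "a"), ("es", "es", "b"), ("es", "ca", "c"), ("pt", "pt", "d")],
)

counts = counts_by_pair(conn)
```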

Yay! Is the bot still running? I see some points on the map of Spain, but there are still many gaps. For example, this list has not been parsed yet: https://es.wikipedia.org/wiki/Anexo:Bienes_de_inter%C3%A9s_cultural_de_la_Comunidad_de_Madrid

Yes, those 705 are all that got parsed. I ran (es, es) manually and here is the dump: P1940.

MariaDB [s51138__heritage_p]> select count(*) from monuments_all where lang='es' and country='es';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

The full update, including all additional tasks (up to picture categorisation), has completed successfully. Yay! Closing as Resolved.