Wiktionary Cognate Dashboard not updated
Open, Medium, Public

Description

New occurrence of this issue, reported on July 3rd: the current timestamp on https://wiktionary-analytics.wmcloud.org/Wiktionary_CognateDashboard/ indicates "last updated on: 2022-06-16", which means that the dashboard has not been updated for more than two weeks.
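Since this task keeps being reopened whenever someone notices an old timestamp, a small staleness check could flag the problem earlier. The following is a minimal Python sketch, not part of the dashboard: the function names and the two-day threshold are hypothetical, and the year of the July 3rd report is assumed to be 2022.

```python
from datetime import datetime, timedelta, timezone


def parse_last_updated(stamp: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS UTC' or a bare 'YYYY-MM-DD' as UTC."""
    text = stamp.strip()
    if text.endswith("UTC"):
        text = text[:-3].strip()
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {stamp!r}")


def is_stale(stamp: str, now: datetime,
             max_age: timedelta = timedelta(days=2)) -> bool:
    """True when the stamp is older than max_age (hypothetical threshold)."""
    return now - parse_last_updated(stamp) > max_age


# The situation described above: stamp of 2022-06-16, checked on July 3rd
# (year assumed to be 2022).
checked_on = datetime(2022, 7, 3, tzinfo=timezone.utc)
age = checked_on - parse_last_updated("2022-06-16")
print(age.days, is_stale("2022-06-16", checked_on))  # prints: 17 True
```

The same parser accepts the longer form seen later in this thread, such as "2020-05-05 07:27:02 UTC".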

Event Timeline

GoranSMilovanovic renamed this task from Wiktionary Cognate Dashboard not update to Wiktionary Cognate Dashboard not updated. May 4 2020, 5:27 PM
GoranSMilovanovic moved this task from Wiktionary to Prioritized on the User-GoranSMilovanovic board.

Here it is:

Error in `row.names<-.data.frame`(`*tmp*`, value = value) : 
  duplicate 'row.names' are not allowed
Calls: rownames<- ... row.names<-.tbl_df -> NextMethod -> row.names<-.data.frame
In addition: Warning messages:
1: Setting row names on a tibble is deprecated. 
2: non-unique value when setting 'row.names': ‘character(0)’ 
Execution halted

Inspecting now.

Probable cause of the update failure: Wiktionary_CognateDashboard_UpdateProduction.R cannot find Wiktionaries encoded as

-210957740613563972

and

4182753792216835591

in the cgpa_title field of cognate_wiktionary.cognate_pages when it looks them up in the cognate_wiktionary.cognate_sites table.
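A guard against this kind of mismatch can be sketched as follows. This is a minimal Python illustration, not the actual R update script: the function name is hypothetical, the "known" keys 101 and 102 and their site names are made up, and only the two unknown keys are taken from this comment.

```python
from collections import Counter
from typing import Hashable, Iterable


def find_unknown_site_keys(page_site_keys: Iterable[Hashable],
                           known_site_keys: Iterable[Hashable]) -> Counter:
    """Count page rows whose site key has no match among the known sites."""
    known = set(known_site_keys)
    return Counter(key for key in page_site_keys if key not in known)


# Made-up lookup standing in for the sites table.
known_sites = {101: "enwiktionary", 102: "frwiktionary"}

# Keys as they might appear in page rows; the two long values are the
# ones reported in this task.
page_rows = [101, 102, -210957740613563972, 101,
             4182753792216835591, -210957740613563972]

unknown = find_unknown_site_keys(page_rows, known_sites)

# Keep only rows that can be resolved, instead of failing later on.
resolvable = [key for key in page_rows if key not in unknown]

print(dict(unknown))
print(resolvable)
```

Reporting the unknown keys before dropping them keeps the inconsistency visible in the logs while letting the rest of the update proceed.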

@Addshore You might be interested to take a look at this?

Intermediary fix applied; running the update cycle manually now; monitoring.

Update Mon May 4 19:52:42 UTC 2020:

  • Manual back-end update completed;
  • waiting for the Dashboard update to fetch new data;
  • monitoring.

Update Tue May 5 08:22:31 UTC 2020

  • dashboard slow to pick up the changes;
  • change public path to: /srv/published/datasets/...;
  • monitoring.

Update Tue May 5 15:07 CET 2020:

  • the dashboard update stamp is not updated yet: Last updated on: 2020-04-08 07:26:18 UTC;
  • the public datasets timestamp, however, is updated;
  • known problems with {curl} from R; inspecting now;
  • the update cycle is stable, checked.

Update Tue May 5 20:47 CET 2020:

  • current dashboard update timestamp is now matched correctly: Last updated on: 2020-05-05 07:27:02 UTC
  • Q: Why does it take so much time for our curl calls from CloudVPS to grab the updated datasets and the update timestamp?

Anyway, the updates are back. Thanks @Lea_Lacroix_WMDE and the anonymous volunteer for notifying me about this.

Status: monitoring the dashboard updates for a day or two, closing the ticket if no problems occur in the meantime.

GoranSMilovanovic lowered the priority of this task from High to Medium. May 5 2020, 6:50 PM
GoranSMilovanovic claimed this task.

Conclusion:

  • the dashboard update procedure is fixed to guard against the possible inconsistencies between cognate_wiktionary.cognate_pages and cognate_wiktionary.cognate_sites tables;
  • the data acquisition works and is delivered daily on a regular schedule from stat1007;
  • the dashboard itself, hosted in CloudVPS, is somewhat slow (i.e. a matter of hours) to pick up the latest daily update, but this is a separate problem and needs to be handled separately.

Closing the ticket.

Lea_Lacroix_WMDE updated the task description.
Lea_Lacroix_WMDE added a subscriber: Otourly.

@Lea_Lacroix_WMDE Status:

  • The Wiktionary Cognate Dashboard update was restarted manually from the CloudVPS instance.
  • The updated data should be available from the dashboard in an hour or so, maybe earlier.
  • Monitoring.

@Lea_Lacroix_WMDE

You are welcome!

The current update is now in place.

I will keep the ticket open until I am sure that the updating procedure is running smoothly.

The updates are now all in place.

Otourly updated the task description.

The issue is most probably related to some R internal memory allocation problems/constraints on the stat1007 analytics client.

  • Running a manual update now;
  • Monitoring.

Possible action: migrate the update engine to stat1005 or stat1008 (more resourceful than stat1007).

  • Migrating the update engine to stat1005 definitely now (only 64 GB of RAM on stat1007; processes killed).
  • Manual update from stat1008 completed;
  • The dashboard should be able to pick up the results in the following hour or so; monitoring;
  • next step: installing crontab from stat1008; removing the update engine from stat1007.

Unfortunately, the problem is not related merely to the update engine; the following was run from inside the Wiktionary Cognate Dashboard's running Docker container (entered via sudo docker-compose exec wiktionarycognate sh):

# cat nohup.out

Attaching package: ‘curl’

The following object is masked from ‘package:httr’:

    handle_reset

Error in curl_fetch_memory(URL, handle = h) : 
  Could not resolve host: analytics.wikimedia.org
Execution halted

So, for whatever reason, our https://analytics.wikimedia.org/published/datasets/wmde-analytics-engineering/Wiktionary/ does not seem to resolve? Strange.
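One way to tell a transient DNS hiccup from a persistent resolution failure is to retry the lookup a few times before giving up. The following is a minimal Python sketch under stated assumptions: the dashboard's own fetch is done with {curl} in R, so this is only an illustration, and the function name, attempt count, and delay are hypothetical.

```python
import socket
import time
from typing import Callable


def wait_for_dns(host: str, attempts: int = 3, delay_s: float = 2.0,
                 resolve: Callable = socket.getaddrinfo) -> bool:
    """Return True as soon as `host` resolves, False after `attempts` failures.

    `resolve` is injectable so the retry logic can be exercised offline.
    """
    for attempt in range(1, attempts + 1):
        try:
            resolve(host, 443)
            return True
        except socket.gaierror as err:
            print(f"attempt {attempt}/{attempts}: cannot resolve {host}: {err}")
            if attempt < attempts:
                time.sleep(delay_s)
    return False
```

Calling wait_for_dns("analytics.wikimedia.org") from inside the container before the hourly fetch would log each failed attempt, which makes it easier to see whether the resolver problem is intermittent.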

Inspecting the issue; contacting the CloudVPS team in case this proves to be too mysterious. The dashboard side update (an hourly run daemon from the dashboard's container) has been running smoothly for years before this unlikely occurrence.

Until this issue is resolved, the dashboard will not be updated and will continue to fall back to its test datasets.

@Otourly Thanks for checking this out and catching the issue in the first place.

I don't know what exactly to think of the curl/Docker related problem.
Let's please keep this ticket open for a while and I will monitor the dashboard to see if the problem reappears.

Unfortunately, the dashboard has not updated since Last updated on: 2021-07-26 21:48:24 UTC:

  • inspecting the issue now,
  • in case the curl-related problem from CloudVPS persists, I will get in touch with the relevant team (following my consultations with the WMDE Devops Guild on Mattermost, who said that it is simply strange).

Edit: however, the problem seems to be related to an erroneous crontab setup on stat1008. This is now fixed; let's wait and see what happens in the next daily update, which is scheduled for tomorrow at 06:00 UTC.

@Otourly

  • The dashboard is updated now;
  • I cannot see a reason why it should not update daily in the future, as expected;
  • please re-open this ticket if this happens again (it is quite a complex update, so I expect that something might go wrong in the future, but not too frequently),
  • and once again, thank you for catching this!
Otourly updated the task description.

@Tobi_WMDE_SW @Addshore @Manuel

Upon trying to connect to the wiktionary-cognate.wmde-dashboards Cloud VPS instance to fix this:

ssh wiktionary-cognate.wmde-dashboards.eqiad1.wikimedia.cloud

the following happened (verbose reduced):

Host key for secondary.bastion.wmflabs.org has changed and you have requested strict checking.
Host key verification failed.
kex_exchange_identification: Connection closed by remote host
Connection closed by UNKNOWN port 65535

Please advise. Thank you.

P.S. The same happens for ssh wikidata-analytics.wmde-dashboards.eqiad1.wikimedia.cloud - even more critical.

Hi @ItamarWMDE, could you please have a look at this?

Thank you @Manuel. In the meantime I have followed @Tobi_WMDE_SW's advice to clean up known hosts, and ran

ssh-keygen -R wiktionary-cognate.wmde-dashboards.eqiad1.wikimedia.cloud

on macOS, but nothing changed. This is what worries me:

Host key for secondary.bastion.wmflabs.org has changed and you have requested strict checking.
Host key verification failed.
kex_exchange_identification: Connection closed by remote host

The key for secondary.bastion.wmflabs.org has changed?
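Reading the error above, the stale known_hosts entry appears to belong to secondary.bastion.wmflabs.org rather than to the instance, which would explain why removing the instance's entry changed nothing. The real tool for this is ssh-keygen -R with the bastion's hostname; the Python sketch below only illustrates the idea on plain (unhashed) known_hosts lines. The function name and the sample keys are hypothetical, and hashed or [host]:port entries are not handled.

```python
def drop_host_entries(known_hosts_text: str, host: str) -> str:
    """Return known_hosts content without plain entries for `host`."""
    kept = []
    for line in known_hosts_text.splitlines():
        fields = line.split()
        # Blank lines and comments are kept untouched.
        if not fields or fields[0].startswith("#"):
            kept.append(line)
            continue
        # A leading @marker (e.g. @cert-authority) shifts the host field by one.
        if fields[0].startswith("@") and len(fields) > 1:
            hosts_field = fields[1]
        else:
            hosts_field = fields[0]
        # The host field is a comma-separated list of names.
        if host in hosts_field.split(","):
            continue
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up sample content; the key material is placeholder text.
sample = (
    "secondary.bastion.wmflabs.org ssh-ed25519 AAAAexampleOLD\n"
    "wiktionary-cognate.wmde-dashboards.eqiad1.wikimedia.cloud "
    "ssh-ed25519 AAAAexampleINSTANCE\n"
)

cleaned = drop_host_entries(sample, "secondary.bastion.wmflabs.org")
print(cleaned)
```

After the old bastion entry is gone, the next connection would prompt for the new fingerprint, which should be compared against an officially announced value before it is accepted.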

Hmm, the odd thing - I tried it, and I'm getting the same error. I'll try to ask around and see what gives.

@GoranSMilovanovic after asking around, apparently the keys for bastion did indeed change on March 21st. See announcement here: https://lists.wikimedia.org/hyperkitty/list/cloud-announce@lists.wikimedia.org/thread/3L2BOWRQP4VLADXMNFUL4B3WUOFKLCOY/

Let me know if you need additional help replacing the key / fingerprint.

@ItamarWMDE Thank you. I read some suggestions on how to replace the existing fingerprint, but I am really clumsy when it comes to things like this. Since we have a 1:1 scheduled for tomorrow 17:30 CET, I suggest you help me replace the fingerprint in my existing SSH settings, and then we proceed to install the Wiktionary Cognate Dashboard in a new Cloud VPS project, all steps: spinning up a new virtual instance, setting up its proxy, copying the code, building the images, and deploying.

@ItamarWMDE @Tobi_WMDE_SW @Manuel

Also in relation to T301380:

  • I have spawned one g3.cores4.ram8.disk20 instance in the wmdeanalytics CloudVPS project; the instance name is wiktionary-cognate-1 (to make it different from the existing wiktionary-cognate instance in the wmde-dashboards project); please see screenshot attached:

wiktionary-cognate-1-cloudvps.png (1×2 px, 335 KB)

  • following this, I have tried to
ssh wiktionary-cognate-1.wmdeanalytics.eqiad1.wikimedia.cloud

and the response was:

Enter passphrase for key '/Users/goransm/.ssh/leadwikilabs': 
goransm@wiktionary-cognate-1.wmdeanalytics.eqiad1.wikimedia.cloud: Permission denied (publickey)

following my (certainly correct) entry of the key passphrase.

I have replicated the same problem within ten minutes of my initial attempt to ssh to the instance.

I think we might need to troubleshoot your ssh configuration, or your bastion proxy setup. Let's use the first part of our meeting tomorrow to do just that. It's another delay, but I don't see how we can continue without this.

@ItamarWMDE You know what I like the most? Problems that solve themselves spontaneously in time without human intervention. Namely, now I can ssh into the new CloudVPS instance, so tomorrow we can focus on the deployment of the Wiktionary Cognate Dashboard.

I will install Docker and Docker Compose there.

I really don’t know what was done, but the update is still missing :(

We are working on it, don't worry. The whole Wiktionary Cognate Dashboard is undergoing a thorough change in the way it is being deployed. Next week, I predict, the system will be back online.

@Otourly

The Wiktionary Cognate dashboard is back and running from the following URL: https://wiktionary-analytics.wmcloud.org/

It still needs to update its datasets (the current timestamp is April 20); that will happen in the following hours.

Thank you for your patience.

@Tobi_WMDE_SW @Manuel Here is a new problem that has kept this dashboard from updating since April 20. It comes from the dashboard's ETL/processing engine running on the stat1008 Analytics Client (crontab, every day at 06:00):

[1] "Initiate Cognate Wiktionary Dashboard update on: 2022-04-25 06:00:01"
[1] "Export cognate_wiktionary.cognate_pages now."
ERROR 1044 (42000): Access denied for user 'research'@'10.%' to database 'cognate_wiktionary'
[1] "Export cognate_wiktionary.cognate_sites now."
ERROR 1044 (42000): Access denied for user 'research'@'10.%' to database 'cognate_wiktionary'
[1] "Export cognate_wiktionary.cognate_titles now."
ERROR 1044 (42000): Access denied for user 'research'@'10.%' to database 'cognate_wiktionary'

This means that some credentials for the SQL cognate_wiktionary database have changed in the meantime.

Please advise. Again, the dashboard's update runs from crontab, stat1008 (and since 2018, without a problem):

0 6 * * * export USER=goransm && nice -10 Rscript /home/goransm/Analytics/Wiktionary/Wiktionary_CognateDashboard/Wiktionary_CognateDashboard_UpdateLOG.log 2>&1

@Tobi_WMDE_SW @Manuel @Otourly

Whatever caused the problem in this dashboard's update engine described in T251792#7881692 has magically fixed itself.

I have checked the dashboard right now and the update timestamp says Last updated on: 2022-05-06 07:59:37 UTC.

I guess we close this ticket now.

Seems to work now! Thank you all! \o/

Otourly updated the task description.

@Otourly I am travelling and will be able to take a look at this after July 11.

@GoranSMilovanovic I hope your travel was great. Did you find time to check this issue?

Thank you @Otourly for reporting this! The new URL is https://wiktionary-analytics.wmcloud.org/app/Wiktionary_CognateDashboard and I have since been working on getting this redirected automatically. Still, the last update seems to be 2022-06-16, so we need to look into what the problem is here.

@ItamarWMDE is this something we are working on already or is this new?

Removing task assignee due to inactivity as this open task has been assigned for more than two years. See the email sent to the task assignee on August 22nd, 2022.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome!
If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!