Move wm-bot instance to Trusty
Closed, Resolved · Public

Description

Per T143349: Deprecate precise instances in Labs by 2017-03-31, wm-bot's current Precise instance will be removed by the end of March. We need to move wm-bot's instance to Trusty.

Restricted Application added a subscriber: Aklapper. · Feb 10 2017, 9:12 PM
Petrb added a comment. · Feb 10 2017, 9:24 PM

OK, can we maybe create a new project for wm-bot so that we can finally nuke the "bots" project? I know that some of you folks will again suggest moving it to Tool Labs, but that environment is just not flexible enough. It would mean massive refactoring of the bot and also the removal of many useful features.

So if possible, please create a new project for wm-bot and we can create a new server there.

I would support that change.

See T76375: New Labs project requests (tracking) for instructions to create a new project.

Paladox edited projects, added Tool-Labs; removed Labs.
Paladox added a subscriber: Paladox.
Restricted Application added a project: Labs. · Feb 10 2017, 10:20 PM
Petrb added a subscriber: Andrew. · Feb 16 2017, 2:03 PM

@Andrew is it possible to request some general purpose storage? We used to have /data/project for this in the past, but this new project doesn't have this mount point and instances are created incredibly small. We need somewhere to store channel logs and the Postgres data.

bd808 added a comment. · Feb 17 2017, 4:20 PM

You can enable role::labs::lvm::srv on your instance and force a puppet run via sudo -i puppet agent --test --verbose. This will create a partition that fills the remainder of your instance's disk quota and mount it at /srv on the instance.

If that's not enough storage, we can help you attach to NFS as well, which is what the old /data/project mounts were. Getting local instance disk beyond the default VM quotas requires @Andrew to create custom image sizes for you. That is possible, but we would need you to tell us how much storage you need and to make sure we have a labvirt host that is capable of providing that much space locally.
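
A quick way to confirm that the puppet run actually produced a separate /srv volume, and to see how much space it offers, is sketched below. This is only an illustration: the helper name `describe_mount` is hypothetical, and it uses nothing beyond the Python standard library.

```python
import os
import shutil


def describe_mount(path="/srv"):
    """Return (is_mount_point, free_gib) for `path`.

    Hypothetical helper: reports whether `path` is its own mount point
    and how much free space it has. A missing path yields (False, 0.0).
    """
    if not os.path.exists(path):
        return False, 0.0
    usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
    free_gib = usage.free / 2**30     # convert bytes to GiB
    return os.path.ismount(path), free_gib
```

Calling `describe_mount("/srv")` on the instance after the puppet run should report a mount point if the role was applied; if it reports `False`, /srv is still just a directory on the root filesystem.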

Petrb added a comment. · Feb 18 2017, 7:19 PM

OK, I think that for now it should be enough. In the end we may not need more than 20 GB for these logs; I am just not sure about Postgres.

The current real problem is that I can't get web proxy to work, I created in horizon: wm-bot2.wmflabs.org but it doesn't even resolve.

scfc added a subscriber: scfc. · Feb 18 2017, 7:46 PM

[…]
The current real problem is that I can't get web proxy to work, I created in horizon: wm-bot2.wmflabs.org but it doesn't even resolve.

Negative DNS caching? Works for me:

[tim@passepartout ~]$ host wm-bot2.wmflabs.org
wm-bot2.wmflabs.org has address 208.80.155.156
[tim@passepartout ~]$
Petrb added a comment. · Feb 19 2017, 6:37 PM

[…]
The current real problem is that I can't get web proxy to work, I created in horizon: wm-bot2.wmflabs.org but it doesn't even resolve.

Negative DNS caching? Works for me:

[tim@passepartout ~]$ host wm-bot2.wmflabs.org
wm-bot2.wmflabs.org has address 208.80.155.156
[tim@passepartout ~]$

It started resolving but still doesn't work; now I get 504 Gateway Time-out.

@Petrb hi, what port are you using with the web proxy and have you enabled the port through the firewall?

Also, why not move to Debian Jessie?

Petrb added a comment. · Feb 19 2017, 6:40 PM

@Paladox I used the image recommended by @Andrew. You were right, it was the firewall blocking it.
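
When the hostname resolves but the proxy answers 504, the proxy itself is usually unable to reach the backend port. The sketch below shows one way to test that from another host: it simply tries to open a TCP connection. The function name `port_is_open` is hypothetical, and the host and port to check are whatever the proxy was configured with (the thread does not state the port).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    A refused connection or a timeout (both raised as OSError) is
    treated as "not reachable", which is what a firewall rule that
    blocks or drops the port typically looks like from the outside.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns `False` from a neighbouring instance while the service is listening locally, the security group (firewall) rules are the likely culprit.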

Petrb added a comment. · Feb 19 2017, 6:41 PM

BTW I think 8 is Jessie

Ok thanks. Did enabling it in the firewall work?

Petrb added a comment. · Feb 19 2017, 7:31 PM

Yes, the bot is now up and running on Jessie. I need to test whether web hooks work, which is now the most important thing; then we need to migrate the logs and the SQL DB.

Please don't terminate the old wm-bot instance yet, there may be something I forgot to copy from it.

Petrb added a comment. · Feb 19 2017, 9:27 PM

There is one problem: I need to run identd, which requires a public IP address. Can we get one for the wm-bot project? The previous bot also had it.

This comment was removed by Andrew.

Granted, as per T158520. lmk if you have trouble assigning it -- the horizon interface should be straightforward.

Is this the reason for "The requested URL /browser/index.php was not found on this server." at http://wm-bot.wmflabs.org/browser/index.php?display=%23wikimedia-collaboration , or should I file a separate task?

I think this is expected but in case it's not, no new content for #wikimedia-mobile has been logged for 10+ days: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-mobile/

Petrb added a comment. · Wed, Mar 1, 10:57 AM

Hello, yes, this is related to both issues. wm-bot is still logging all channels, the data are just stored somewhere else now. I need to move the web services and merge the log files, which should be done soonish.

Petrb added a comment. · Wed, Mar 1, 11:09 AM

Hi, the browser is back. Also, please don't use the old "bots.wmflabs.org" domain, it was deprecated some time ago; the correct one is wm-bot.wmflabs.org

I can't guarantee there will be no more outages, 2 more instances need to be reinstalled

Thanks, I've asked for someone with channel permissions to update the URL to: https://wm-bot.wmflabs.org/logs/%23wikimedia-mobile/

Are there still pending tasks here, or is this resolved?

Hello,

wm-bot depends on the huggle-pg instance, which has not been migrated yet. It's not so easy: it's a Postgres database, and it's pretty huge. Hopefully I will be able to resolve this within this week, I am extremely busy these days

I mean, if huggle-pg is down, wm-bot itself will still be operational, but we lose access to the SQL-based IRC logs. We probably don't want that, the database is kinda useful and valuable. The problem is that migrating to another server is complicated: it's a live SQL database to which new data are added every second, so any outage, even a second long, is a problem here.

We can't reinstall it, nor migrate it completely online, Postgres isn't advanced enough to do that. Still, I would like to have as small a gap in the IRC logs as possible, so whichever approach we take, it should be the one that takes the smallest amount of time possible.

I would almost prefer doing a simple dist-upgrade. The problem is that I already tried this on the huggle instance: the ssh session timed out while it was in progress, and now I can't log back in to the instance because LDAP somehow broke on it. It's working, the service it provides is up and running, but I can't ssh back in to finish the upgrade. I would rather avoid this problem with the Postgres instance

@Petrb what about migrating the Postgres db to a MySQL one? Will that work? MySQL is advanced enough to support online migration, so you won't need to take the db down.

I am not aware of any such feature and highly doubt it.

We got rid of MySQL, which was originally used, because it lacks many features that enterprise-grade RDBMSs like Postgres or Oracle provide. Moving back to MySQL would be a massive step back.

Also, I don't see how you could migrate from Postgres to MySQL online, that isn't possible either, so this definitely would not work.

Probably the easiest way to do this would be to switch wm-bot to a newly created DB on wm-bot-pg and then start the migration of the existing data. I will have a look into this later, but I am too busy now. I am not sure whether this would break the sequences, though
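
On the sequence concern: when rows are imported with explicit ids, the sequence behind a serial column does not advance by itself, so later inserts can collide with imported ids. A common remedy is to move the sequence to the current maximum once the import is done, using PostgreSQL's `setval` and `pg_get_serial_sequence`. The sketch below only builds that statement as a string; the helper name and the table and column names in the usage note are hypothetical, and it assumes the column owns its sequence (serial or identity).

```python
def sequence_reset_sql(table, column="id"):
    """Build SQL that sets the sequence owned by table.column to MAX(column).

    Hypothetical helper. Names are restricted to plain identifiers so that
    they can be interpolated into the statement safely.
    """
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    return (
        f"SELECT setval(pg_get_serial_sequence('{table}', '{column}'), "
        f"COALESCE((SELECT MAX({column}) FROM {table}), 1));"
    )
```

For example, `sequence_reset_sql("logs")` yields a statement that could be run with psql against the new database after the old rows have been copied in. Note that this does not help if old and new rows already share ids; in that case the ids of one side would have to be shifted first.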

The easiest and safest solution would probably be using something like Bucardo (live migration tutorial).

Continuous archiving or hot standby would probably be the best solution if you want/need a solution in PostgreSQL itself.

What about MariaDB?

Petrb added a comment. · Mon, Mar 20, 1:58 PM

MariaDB is just as bad as MySQL,

anyway, I've decided to take the approach of a live update, it's the easiest atm

Petrb closed this task as "Resolved". · Mon, Mar 20, 2:35 PM

I just nuked the wm-bot instance in project bots. There is one more instance, "botbot", that I will look into; maybe there is something I would like to archive for the future. Then we can probably nuke the whole "bots" project, which is, by the way, probably the second oldest project of Wikimedia Labs :)

Petrb added a comment. · Mon, Mar 20, 2:35 PM

btw botbot is 14.04 so it doesn't block stuff