May 21 2020
2 months later and no response on this ticket. :(
Feb 13 2020
I'm using /data/project and the fact that I have shared home directories (and I'm accessing replica dbs). That's it (no dumps, no scratch).
Will do once I get to a computer later today.
Dec 25 2019
Ah, stop, it seems to have booted through and is back online.
Hm, I soft rebooted and I'm getting that dreaded "A stop job is running for ..." message :-(
I cannot ssh in.
Ugh, Merry Christmas :-)
Dec 19 2019
Just as a quick comment, I think multires tiles are absolutely necessary for the stitched panos that we have (>100 Megapixel). I would suggest not even thinking about a solution that does not permit this.
Nov 29 2019
Yes, sorry, that's the one. I could have sworn with over 40 comments the link would be on one of them :-)
The panorama viewer described in this ticket exists on tools cloud. I guess we can close this ticket and open new tickets for bugs and improvements.
Apr 1 2019
@Andrew, yes, I am planning to update my code to use non-dot subdomains. But I'm waiting until the dash issue is resolved.
And the dash? (anyhow, just delete all wma proxies containing a dot at your convenience, please)
@bd808 horizon does not let you use a dash either!
Ugh, this is borked. I cannot delete the proxy entries with the . in the name. I get "You have selected: . Please confirm your selection. This action cannot be undone." and after pressing Delete the entry is still there.
:-/ I need to delete/re-add proxies to point to the new instance. I have proxy URLs like 1.wma.wmflabs.org and label.wma.wmflabs.org. But the new horizon interface won't let me add a two-level subdomain like label.wma (tells me to specify a name without dots in it).
maps-wma1 can be deleted. There is a new maps-wma instance. I still have to switch over some webproxies. But I'll do that today. I ported my tile render code to mapnik 3, but still have to do some minor things like converting my upstart scripts to systemd. This will only affect rendering of new tiles - but most of the world is already cached by now.
Mar 24 2019
I'm using /data/projects for convenience.
Mar 20 2019
A recompile was necessary after the trusty to stretch migration. The tile serving for existing images does work again now. I'm not sure about tile generation for new images (I'm currently on spring break with my family and have limited time to debug this). It should work, provided VIPS is installed on the stretch grid instances as well. Let me know if you encounter a newly uploaded image where the zoomviewer does not work.
Feb 11 2019
Ok, will do that later. No access from work.
Dec 14 2018
I'm working on redoing the maps-wma1 instance as maps-wma. This involves a region change and as a consequence it seems the /mnt/nfs/labstore1003-maps directory, which contains my home directory on the old instance, is empty on the new instance. Same goes for the project directory. Will I have to copy everything over? Why is there no home dir?
Nov 25 2018
Yes, please go ahead. I'm on travel right now and cannot do it myself.
Nov 11 2018
Hm, my log now shows only successful DB queries. Now that I've made my code a bit more failure resistant I think we can just close the ticket for now. If this becomes a problem again I'll open a new ticket. It seems obvious to me that the query killer is not the issue here. The queries that succeed are all in the 25-40min range. Sorry for the disturbance.
Nov 10 2018
Ok, here is the log from the past 30h
Nov 8 2018
Dang, that test run succeeded (took 27min 3sec). Let me get back to this ticket tomorrow after a few updates ran. Let's see if I get any failures.
ok, let me add this and manually trigger a run...
Sep 25 2018
Alright, all instances rebuilt using Debian 9.5 (how long will that one be good for?)
Sep 20 2018
Let me take a look. I should rebuild these instances with a different memory size anyways. That service craps out once an hour :-/
Sep 18 2018
Fastcci is already running xenial. I updated it in place several months ago.
Sep 17 2018
See https://phabricator.wikimedia.org/T143349 (the instance was upgraded to xenial)
maps-wma1.maps.eqiad.wmflabs was upgraded in place long ago! Please remove it from your list.
Aug 6 2018
Hey all. Yes, WMA uses lat lon (plate carree) projection. Tile services are an important part, but people are painfully unaware that tiles are just a small part of the WMA. It had 3D buildings, article markers with text labels, article summaries on hover, area comparison by reprojection, entity highlighting (area shading for the article subject), and it showed all coordinates from the current article (not just the main coordinate - nice in list articles). WMA had support for globes besides Earth (Moon, Mars, Mercury, Io, etc.), it had client-side rendered tiles at high zoom levels, and it had user interface translations and article labels from several dozen projects (including thumbnails from commons).
May 18 2018
I'm looking into this. I can purge the cache, but there seems to be a fundamental problem that allows these files to get corrupted again and again.
Mar 2 2018
Looks like it was just cache corruption. This is weird; I don't know how it could happen short of actual filesystem corruption. I deleted the cache files and regenerated them. Looks fine now. I could add a "force purge" option, but I'm a little worried that could be abused as a DOS attack vector.
Yikes, that looks very messed up. I can rebuild the stack on labs and see if that fixes it.
Feb 3 2018
All righty! I deployed a new version that uses jsub to deploy the processing tasks on the grid. Unfortunately the -once parameter is still unreliable so I might have to add my own locking if it turns out to be a problem.
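A minimal sketch of what "my own locking" could look like, assuming a Linux host and a lock file on a local filesystem. The function name `single_instance` and the lock path are hypothetical and not taken from the deployed code; the behaviour of `flock` on NFS depends on the kernel and mount options, so it would need checking before relying on it on shared storage.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_instance(lock_path):
    """Yield True if this process got the lock, False if someone else holds it."""
    # Hypothetical helper: the lock file is created on first use and never deleted.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # Non-blocking exclusive lock; raises if another holder exists.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A job would wrap its work in `with single_instance("/some/path/job.lock") as got_it:` and simply exit when `got_it` is false. The kernel drops the lock when the process dies, so a crashed job does not leave a stale lock behind.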
Feb 1 2018
Hello @chasemp as a matter of fact I can. I had written code for the panoramic image viewer reprojection that utilizes the grid. I should be able to apply the same to the zoomviewer. I'll work on it over the weekend if that's fine.
Dec 28 2017
@Dispenser, backup is fine, but have you thought about making GHEL work with the new setup?
Dec 14 2017
Death blow for GHEL coordinate extraction and WikiMiniAtlas. 🙁
Sep 19 2017
I have written a script that checks the tif file integrity (using imagemagick). It has already weeded out dozens of broken files (including the Napoleon). I will put that into the crontab to run weekly.
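For illustration only, a rough sketch of such an integrity check, not the actual script. It assumes ImageMagick's `identify` command is on the PATH and treats a non-zero exit status as "broken", which may not catch every kind of corruption. The function names and the idea of passing in the cache directory are hypothetical.

```python
import subprocess
from pathlib import Path


def identify_ok(path):
    """Return True if ImageMagick's `identify` exits with status 0 for the file."""
    result = subprocess.run(
        ["identify", str(path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_broken_tifs(cache_dir, check=identify_ok):
    """Return a sorted list of .tif files under cache_dir that fail the check."""
    # `check` is injectable so the walk can be tried out without ImageMagick.
    return sorted(p for p in Path(cache_dir).rglob("*.tif") if not check(p))
```

A weekly cron job could call `find_broken_tifs` on the cache directory and unlink whatever it returns, so the affected images get regenerated on the next request.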
Sep 18 2017
@Shonagon, thanks for the heads up on the Napoleon image. Let me see if I can identify broken images and purge them automatically. I should probably use the method I developed for tiled 360 degree panoramics (image processing on the grid infrastructure) for the Zoomviewer, too. I think that would make it more robust.
I have deleted the cache file. Seems to work now.
Sep 14 2017
Which IE version are you using?
Sigh, zoomviewer used to be a _lot_ faster. I wonder what changed. I'll follow up on this. I see that somebody is trying to pull up the ordnance map which still needs to be preprocessed (the script should do that automatically). Let me know how that works.
Ok, running again. The webservice was hung:
Alright, I'll take a look. On mobile right now.
Please create a new task for this. A constantly open "Zoomviewer is down" task is misleading and counterproductive. You are welcome to reopen this when the zoomviewer is down again and needs attention from me (and has not been taken over by WMF yet).
I'm not away.
Sep 13 2017
Aug 29 2017
I merged and deployed the patch by @TheDJ, thanks!
Jul 7 2017
Lame. The webservice was down. A simple webservice start brought it back up. Do I really have to put in a cronjob that kicks the service once in a while?
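If it comes to that, the cron job could be a small watchdog along these lines. This is a sketch under assumptions: the URL to probe is a placeholder, the function names are made up, and the only command taken from the comment above is `webservice start`.

```python
import http.client
import subprocess
import urllib.request


def is_up(url, timeout=30):
    """Return True if the URL answers without an HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # Covers connection failures, timeouts and HTTP error statuses.
        return False


def restart_webservice():
    """Run the same command that brought the service back manually."""
    subprocess.run(["webservice", "start"], check=False)


def kick_if_down(url, probe=is_up, restart=restart_webservice):
    """Restart the service only when the probe fails; report whether it did."""
    if probe(url):
        return False
    restart()
    return True
```

Run from cron every few minutes, `kick_if_down("https://<tool-url>/")` would leave a healthy service alone and only issue the start command when the probe fails.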
Ok, noted. I'll investigate.
Jun 23 2017
but at least @dschwen can comment directly in this task now
Mar 20 2017
Andrew, I managed to get my existing VM running again. You can lower the quota again.
Mar 17 2017
Ok, up till now I had no pressure to get on horizon, but I need to rebuild an instance now, and being unable to log in is becoming a major showstopper for me now. I'd really appreciate some help on this.
Mar 16 2017
Yeah, no change
I'll try that, but I seriously doubt this is the issue here. I use GA for a whole bunch of services and horizon is the only one that gives me grief. (and compared to https://time.is/ my phone is within one second)
Yes, it is still happening.
Ok, I think I'm back in business! Testing a bit more now.
Ok, next issue is that suddenly the column the_geom does not exist anymore in the land_polygons and coastlines tables....
Looks like the OSM data uses SRID 3857 and I compare to a Bounding Box with SRID 900913
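Both SRIDs refer to the same spherical Mercator projection, but PostGIS compares the numeric SRID and rejects operations on geometries whose SRIDs differ. A sketch of building the bounding-box test so that the envelope carries the same SRID as the data follows. This is not the query from jsontile.php; the function name and the `geom` column name are placeholders.

```python
def bbox_query(table, geom_column, srid=3857):
    """Build a parameterized bounding-box query whose envelope carries `srid`.

    `table` and `geom_column` are interpolated directly, so they must be
    trusted constants; the four corner coordinates are left as %s placeholders
    for the database driver to fill in.
    """
    return (
        f"SELECT ST_AsGeoJSON({geom_column}) FROM {table} "
        f"WHERE {geom_column} && ST_MakeEnvelope(%s, %s, %s, %s, {int(srid)})"
    )
```

For example, `bbox_query("coastlines", "geom")` yields a statement that a driver such as psycopg2 could execute with `(xmin, ymin, xmax, ymax)` as parameters, with the envelope tagged as SRID 3857 to match the imported data.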
Sure the query is here: https://github.com/dschwen/wikiminiatlas/blob/master/tiles/jsontile.php#L115
After fixing those to ST_SetSRID my PostGIS query now fails with
Nope. My stuff fails now with:
Feb 23 2017
@chasemp yes a few days downtime should be OK. I have a cache layer that should serve most of the requests.
Yes, it is used by me! I'm pulling data from that server for the client-side rendered tiles and 3D buildings in WikiMiniAtlas.
Feb 20 2017
Yes, shut down now, delete later.
Looking closer at the apache2 config it looks like this is a debug/experimental server.
Hmmm, Yeah. I did /etc/init.d/renderd stop and apache2ctl stop and the OSM widget tile service on de.WP still functioned just as good/bad (lots of 404 at high zooms either way).
Ok, there is an apache2 with modtile, a renderd, and a (small, 34MB) postgres 9.1 database running on there.
modtile and renderd are installed from http://ppa.launchpad.net/kakrueger/osm-unstable/ubuntu/
For modtile and renderd there are trusty packages on that PPA.
I personally am not using this instance. It might be in use live for the OSM gadget on the german Wikipedia (does https://tiles.wmflabs.org/ point to that machine?)
If nobody else steps up I could try to log in and see if do-release-upgrade works there, too (it did for my instances). This will most likely require a rebuild of the map stack on that machine, though. But if the machine is lost otherwise it might just be worth the risk.
Feb 13 2017
Yeah, the pruning confused the system. I'll try to fix this.
Hey all! The _corruption_ i.e. missing tifs is probably a result of a requested cache purge that I performed a while ago. I'll take a look.
Feb 2 2017
Done. I put the find command into a small script and added it to the crontab (via jsub).
Jan 10 2017
@Andrew I upgraded the other two instances to Xenial. Given that the upgrade was rather painless (so far... I hope all the puppet stuff is still working as intended. Do I need to let puppet know I upgraded?!) I will just continue upgrading those instances when the time comes.
I have upgraded my remaining instances.
Jan 9 2017
Phew! Thanks, yes, much better :-)
NONONONO!!!!! CAN NOT BE DELETED. I upgraded it!!!!!!
fastcci-master was successfully upgraded to 14.04LTS, please take it off the list. I'll work on fastcci-worker1 next!
Yeah, well, still not working for me. Maybe somebody could take a look.
Jan 8 2017
I tried disabling and re-enabling 2fa. Still the same error.
Jan 6 2017
I'm upgrading them to trusty, will that work?
Jan 5 2017
Ok, stupid(?) question: Can't I just do a release upgrade (do-release-upgrade) and be in the clear?
Please do not remove the fastcci or maps-wma1 instances! They are being used.
Sep 2 2016
Jul 26 2016
Yes! Many thanks!
Jul 25 2016
Uuuuaaahhhh, now I'm getting ERROR: permission denied for relation coastlines
Jul 13 2016
We are currently importing OSM data without the -K|--keep-coastlines switch. I.e. the main tables do NOT contain any coastline data. Instead we are using postprocessed coastline data in a special table. HOWEVER this data is not automatically updated, and probably hasn't been updated in about two years at all!