
Eqiad Spare allocation: 1 hardware access request for OSM Maps project
Closed, DeclinedPublic

Description

Please temporarily allocate one of the existing spares, e.g. this one. Approximate usage: 3-6 months. This is needed to test various implementation scenarios with the full OpenStreetMap database.

Labs Project Tested: maps-team
Site/Location: Labs
Number of systems: 1
Service: Large scale map processing and rendering testing
Internal/External IP Address: internal only
VLAN:
Processor Requirements: unknown
Memory: unknown
Disks: 1+ TB, preferably SSD, but spins are ok
NIC(s):
Partitioning Scheme: 1 partition
Other Requirements:

Event Timeline

Yurik raised the priority of this task from to Needs Triage.
Yurik updated the task description.
Yurik added subscribers: Yurik, MaxSem, Tfinc.

@Yurik,

I don't see any kind of task history linked into this; has there been an operations team member working with you guys for this project?

Typically, I'll allocate machines based on both this request and the knowledge of the project and overhead required from someone within the operations team. (If we aren't certain who in the team that is, the safe bet is typically Mark or Faidon, as they are our primary architects.)

Also, the large unknowns for memory and processor use are typically a sign of needing someone in ops to work with your team on the performance requirements of the system. This is why I ask whether someone else in ops has worked on this with you previously.

Please advise,

ArielGlenn triaged this task as Medium priority.May 12 2015, 1:47 PM
ArielGlenn added a subscriber: ArielGlenn.

@Yurik, have you worked with anyone on ops in the past on the maps project? If so maybe we can get them to weigh in.

@ArielGlenn, thanks, we have discussed this issue with @mark & @akosiaris last week, so hopefully it will get resolved soon :)

@Yurik,

I still don't have the information on what exactly you plan to do with the system (in regards to how to assign the host hardware, networking, etc.) I'll list out some questions below, please review and answer to the best of your knowledge. (We'll get this figured out.)

  • Testing implementations of the database: does that require a public IP address for the host, or a private one? Will clients outside the datacenter need to communicate with it? (If so, it likely needs a public IP address.)
  • Will this system testing require any SSL certificates? If so, can it go behind a misc-web cluster for https termination, or does it need to terminate directly on the server?
  • If this was discussed with @mark and @akosiaris, perhaps one of them can address the above questions?
  • The disk requirements: "1+ TB, preferably SSD, but spins are ok" will result in spinning disks. Getting SSDs into that capacity requires ordering the disks, and unless you state otherwise, we tend to not do that for short term testing. (If testing supports use of SSD in future implementation, we can address that post-testing.)

I'm leaning towards allocating the following system:
Dell PowerEdge R420, single Intel Xeon E5-2450 v2 2.50GHz, 16GB Memory, (2) 500GB Disks

We have onsite spares of 1TB disks, so we can swap out the dual 500GB for dual 1TB. This is only a single cpu and 16GB of memory, so it is one of the lower end systems. (When folks have no idea what they need, I tend to err on the lower side, as we can simply scale upwards as needed to other spare misc systems.) Since this is short term testing, this seems like an acceptable solution.

Again though, if the above isn't good enough, please let me know! We want to get you guys the hardware needed, but just not be wasteful.

Please advise,

  • Testing implementations of the database: does that require a public IP address for the host, or a private one? Will clients outside the datacenter need to communicate with it? (If so, it likely needs a public IP address.)

No, it will be a DB/backend tile rendering machine only.

  • Will this system testing require any SSL certificates? If so, can it go behind a misc-web cluster for https termination, or does it need to terminate directly on the server?

No. Even if it were publicly visible, it's for pure testing stuff so wouldn't require TLS anyway.

  • The disk requirements: "1+ TB, preferably SSD, but spins are ok" will result in spinning disks. Getting SSDs into that capacity requires ordering the disks, and unless you state otherwise, we tend to not do that for short term testing. (If testing supports use of SSD in future implementation, we can address that post-testing.)

Ahem, Mark said we might have enough spare SSDs. If we don't, let's stick with HDDs, as getting something now is more important than getting something eventually.

I'm leaning towards allocating the following system:
Dell PowerEdge R420, single Intel Xeon E5-2450 v2 2.50GHz, 16GB Memory, (2) 500GB Disks

We have onsite spares of 1TB disks, so we can swap out the dual 500GB for dual 1TB. This is only a single cpu and 16GB of memory, so it is one of the lower end systems. (When folks have no idea what they need, I tend to err on the lower side, as we can simply scale upwards as needed to other spare misc systems.) Since this is short term testing, this seems like an acceptable solution.

A more powerful config resembling the machines we're planning for production deployment would have allowed performance investigation in absolute rather than relative terms, but this is moot without SSDs. @Yurik, do you think 16 GB would be enough? 8 cores probably are.

On the storage:

If it was SSDs, are the SSDs needed for OS and data, or only for the data partition?

We also tend to raid1 (or raid10) disks for data redundancy, so it would take a lot of spare SSDs to hit this. We can order them (as pointed out). Since this is an additional expense, I just want to make it clear, as it will require further approvals. (Basically @mark has to agree that we are going to purchase these on our budget, or determine which budget it gets pulled from.)

On the cpu/ram:

I have multiple options on the spares page, including more memory and cpu cores. If you guys have a rough minimum of your cpu core to memory ratio, it would help us narrow down a spec.

@RobH, we already had a meeting with @mark and a few more people from Ops. I think Mark had picked a system for us. IIRC, he also said that SSDs were not available right away, and since we need this system sooner rather than later, we opted to go for spinning disks until we have production-ready code (which might actually be relatively soon).

For this to be an accurate test, we would rather have a system that otherwise resembles the hardware we have budgeted: a high-CPU, medium-memory, very high disk IO (hence, ideally, SSD) machine. This will allow us to narrow down the bottlenecks more accurately, estimate vector generation time, and handle many related tasks. The SSDs are only needed for the data, not the OS (assuming Postgres can be configured to use SSD temp space if needed).
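On that last assumption: Postgres can indeed be pointed at a separate location for temporary files via a tablespace. A minimal sketch, assuming a hypothetical SSD mount at /srv/ssd (the path and tablespace name are illustrative, not anything configured on this host):

```sql
-- run as a superuser; /srv/ssd/pg_temp must exist and be owned by postgres
CREATE TABLESPACE ssd_temp LOCATION '/srv/ssd/pg_temp';

-- then direct temp files there, either per session:
SET temp_tablespaces = 'ssd_temp';
-- or cluster-wide in postgresql.conf:
--   temp_tablespaces = 'ssd_temp'
```

Sorts, hashes, and temporary tables would then spill to the SSD while the OS and main data stay on spinning disks.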

At this point we don't plan to expose the machine, instead using it as a backend sql server for the wmflabs maps instance(s).

@Yurik: I think we have the questions I sent to you covered now, in particular since I also chatted with Alex since my task update.

@akosiaris and @mark had indeed worked on this previously with @Yurik, so they will handle the actual allocation. (My understanding is labsdb1004 will be re-purposed for this, but this is NOT certain and Alex will confirm this!)

As such, I'm assigning this task to him for completion. (If we need to instead get someone else involved, or if I need to further work with @Yurik, just let me know.)

@Yurik, @MaxSem: The hardware is ready for use. The hostname is pgsql.eqiad.wmnet. The TCP port is the standard postgresql one; you should not need to change anything as far as the port is concerned. The synced planet OSM database is gis, and both of you have full rights on it. There is a .pgpass file in your home directory in the maps-team labs project that should allow you to connect easily with something like:

psql -U yurik -h pgsql.eqiad.wmnet gis

The database is updated once a day for now, but we should be able to move to minutely syncs in production. Let me know if it works, and about anything else you might need.
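For reference, a .pgpass file holds one colon-separated entry per line; a sketch of what the provided file presumably looks like (the password shown is a placeholder, not the real value):

```
# hostname:port:database:username:password
pgsql.eqiad.wmnet:5432:gis:yurik:PLACEHOLDER
```

With a matching entry in place (and the file chmod'ed to 0600, which libpq requires), psql connects without prompting for a password.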

Excellent news! Over the weekend, I was shown a database in labs, osmdb.eqiad.wmnet. That DB is supposedly running on the real hardware with SSDs, and all maps labs projects are using it. When I tried connecting to it with our new code, it turned out that it was using ASCII client encoding, thus crashing mapnik. Is the new machine's Postgres using UTF-8? Also, does it import data using hstorage?

Thanks!

Hello,

Yes, the postgresql cluster on pgsql.eqiad.wmnet has been initialized with UTF8 encoding. I am not sure what you mean by hstorage. If you are referring to hstore (an additional key/value column on postgresql tables), then the answer is yes.
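For completeness, a sketch of querying such an hstore column, assuming an osm2pgsql-style import with a planet_osm_point table and a tags hstore column (the table and column names are assumptions about the import, not confirmed in this task):

```sql
-- the hstore extension must be installed in the database:
--   CREATE EXTENSION IF NOT EXISTS hstore;

-- '->' extracts a value by key, '?' tests for key existence
SELECT osm_id, tags -> 'amenity' AS amenity
FROM planet_osm_point
WHERE tags ? 'amenity'
LIMIT 10;
```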

yurik confirmed access on IRC and at T100548. Resolving this.

@akosiaris - so, is this just giving the maps team access to the psql on this machine, or is the 'machine assigned to them'?

I'm reopening this task. It was implemented as a shared server, as opposed to dedicating a spare to the Maps team, and it has a very old version of PostGIS, which is unusable for us. This task could either be resolved with T101233, or as a new server allocation.

Postgis upgrade info: http://postgis.net/install/

Update: Thanks to @fgiunchedi, I got the result of select * from pg_stat_activity; -- P739

It seems some of the COPY, PREPARE, and transaction (idle) queries have been running since 2015-05-27 (two weeks ago, right after @akosiaris announced server availability). Are these the replication queries, or is something massively broken? Also, it turns out no indexes were created for the OSM data. Max is running create index on planet_osm_polygon using gist(way);, but now I am worried it may conflict with the other copying queries, making index creation an eternal process.
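A narrower version of that pg_stat_activity check, listing only statements that have been running for more than a day; this uses the 9.2+ column names (older releases expose procpid and current_query instead of pid, state, and query):

```sql
SELECT pid, usename, state, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND query_start < now() - interval '1 day'
ORDER BY query_start;
```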

Lastly, this is another reason why we need root: this Postgres does not allow regular users to cancel even their own stuck queries. Google "ERROR: must be superuser to signal other server processes".
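For the record, from a superuser session a stuck backend can be cancelled or terminated by pid; a sketch (the pid is an example value taken from a pg_stat_activity lookup, not a real one from this server):

```sql
-- polite cancel of the backend's current statement
SELECT pg_cancel_backend(12345);

-- forcible disconnect if the cancel is ignored
SELECT pg_terminate_backend(12345);
```

Newer Postgres releases let a non-superuser signal backends belonging to the same role, but on older versions, such as the one complained about here, both calls require superuser.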