Wednesday, January 19, 2011

Day 1

Well, it's day 1... just got my domain last night & already have it tied into google apps.  Hover.com... is kick-butt.  I highly recommend it to anyone from a home user to a mid-sized company who wants to set up a domain.  I only ran into one small bug when setting up some DNS records, but I suspect it will be resolved quickly.  (not really that important anyway)

My goal for creating this site... is to have a repository where I can dump random information that I may want to share with friends/family/random people in chat-channels... a reference for taking "enterprise-grade" technologies and making them more accessible to small businesses (technologies like clustering/HA of nearly all types, SANs, redundancies, virtual servers, etc...) as well as random tidbits that I've picked up along the way.  (I dabble in nearly everything "computing" related)

To start with... I am not a respecter of brands.  I won't ever buy something simply because it's the "next" in the series from that manufacturer.  There are brands of products I've tested and been very disappointed with, and that may influence future purchases... but who wouldn't be influenced?  I am not necessarily loyal to Microsoft or any flavor of *nix.  I feel that each has its place in the world... sometimes one is better for the task than the others.  Just because I don't prefer one solution or another does not invalidate the option... but like all people in this world... I do have opinions too.  I can also appreciate the value of the mighty dollar.  Not everyone in the world has a bottomless pit of money to draw from... wait... who besides Bill Gates does?

Anyhow... this is becoming a long & boring blog... so on that note... let's move on to something more interesting.

Terminal Services (revised)

... as promised... more TS info... this time with less fluff.  I started re-reading my previous post... and quickly decided... mmmkay, too much fluff.  It needs a rewrite... so... this time... I'm not going to be so "non-tech" friendly.  Forgive me if I gloss over several things & get right to the meat of this.

In short... here's the goal we're shooting for when it comes to getting redundancy going...


Although you see 5 servers in this model... we only have 2 to play with (in this particular environment)... but hey... the process is still similar.  Let's cut this down a bit more & see what we end up with.

Woohooo 3 servers... makin' progress...  of course... we're not even touching the gateway services yet... and the session broker ends up becoming the weak link.   What if it dies? ... evil phone-calls.

Lemme break down the steps in simple terms:

The client does a DNS lookup for our server's name... in this example up above, "Farm1".  In production... this will be a FQDN, publicly accessible.
DNS server responds with a list of multiple answers... one for each "session host" server.
The client randomly picks one of the servers from the list and connects to it.  If that server is unavailable... the client will pick another one from the list.
The randomly selected server... talks to the session broker behind the scenes & asks where the client needs to go.
The session broker responds with where the client needs to go.
The Session host tells the client where to connect...
The client connects to the specific server...
Finally, the session host updates the session broker with its new connection.
 Ideally... that's how the magic all works!  Woohoo!
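
If you want to sanity-check the round-robin DNS piece of this, a couple of quick lookups from any machine will show it.  (the hostname below is a made-up placeholder... use whatever your farm name actually is)

nslookup farm1.example.com     # expect one A record per session host in the answer
nslookup farm1.example.com     # run it again... if round robin is enabled on the DNS server, the answer order rotates, which is what spreads brand-new connections across the hosts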

Ok... let's add a few more pieces to this puzzle.
Those purple servers are Gateway Servers...  If you look at the previous diagrams... they use private IP addresses... the job of the gateway server... is to securely bridge the gap between the public internet... and the private network.  Why 2? ... and why doesn't one have any lines connected to it? ... well... round robin DNS is round robin DNS... in this particular example... the random server the client found was the top one.  The client's connection sticks with that server.  Had the client randomly picked the other server... the lines would all point to the other server.


In order for this to work... the client needs an additional config option set to use the gateway servers.  Not really a big deal, as you'll probably end up deploying a .rdp file or setting up a portal for your users.  When a gateway is thrown into the mix, the clients actually use an HTTPS connection rather than the standard RDP (tcp 3389) connection... which adds a *needed* additional layer of security.
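
For the curious, the gateway-related settings in an .rdp file look something like the lines below.  (the hostnames are placeholders and the rest of the file is left out... "gatewayusagemethod:i:1" is the bit that tells the client to actually use the gateway)

full address:s:farm1.example.com
gatewayhostname:s:gateway.example.com
gatewayusagemethod:i:1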

So... the process changes a bit.
Client does a DNS lookup for the gateway server.
Client gets multiple servers to connect to... randomly picks one...
Client connects to the random gateway server using basic HTTPS authentication... and starts an RPC over HTTP session.
The Gateway Server does another round-robin DNS lookup for the Session Host Server.
List of Session host servers is returned... and a random one is picked...
The Gateway Server passes the RPC session over to the Session Host server.
and from there it behaves like normal... juggling around between servers until the new session finds the correct server.  (Either resuming a disconnected session, or starting a new one on the least-used server.)  
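
If you ever need to poke at a gateway from the outside, remember it's just HTTPS on tcp 443... so something like this (hostname is a placeholder again) will at least confirm the listener and the SSL certificate are answering:

openssl s_client -connect gateway.example.com:443     # prints the certificate chain if the gateway's HTTPS side is up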

And now, for the final piece.  That connection broker represents a single point of failure.  We're trying to build a setup with minimal points of failure.  So... the "Microsoft Answer" ... is to throw yet another server into the mix as a failover.


So... there we have the "Microsoft" plan.  6 servers... just to go from a single session host to having a 2nd session host in the mix.  Admittedly, this is a great model for scaling up to a HUGE deployment.  Most small businesses won't really need that level of scalability.  A single server can easily host 20-50 sessions... or even more with the right hardware and moderate-to-lightweight applications.

In my situation and probably many others...  I can't justify that sort of expense when more than half of those servers will be sitting relatively idle... So... I set out to consolidate the roles each server performs... while still keeping the redundancy and all the benefits of each role.

12:54AM .... and tomorrow is another day.  I'll lay out how I consolidated each role into 2 servers tomorrow.

Terminal Server Fun.... (or not?)

I've been circling around & around with various problems trying to build a more redundant setup for Microsoft's "hot-new" feature "RemoteApp".

At first glance... and in simple environments, this is a really cool technology.  Imagine only paying a single license fee for (most) programs per server... avoiding the constant upgrading of workstations... and using your budget to invest in your servers instead.

In a single-server setup... this works pretty well... barring a few crazy things... which are outside the scope of this particular post... but I promise to bring them up sometime.  That information is pretty useful when considering Microsoft's Remote Desktop Services... (previously known as Terminal Services).

I'm sure there are many other admins out there playing the "Why not use Citrix" card.  Well... if we all had bottomless budgets to play with... I would have loved to consider the possibility.  Citrix is simply crazy expensive to implement.  The majority of Citrix's features... Microsoft offers in the base setup for Remote Desktop.  Everything that Citrix offers... requires that you *first* purchase Microsoft's licenses, not only for the OS, but additionally for each client.  Strange... isn't it?  Admittedly, Citrix has done a lot of additional work to make some things easier to set up, configure and deploy.

So... what is this whole terminal services thingie anyway?  Long story short... you've probably already seen how someone can remotely control your computer... or how you can connect to your home computer through the internet... etc... What if you had 1 computer that could host multiple "sessions" and basically act like multiple remote computers... on 1 server... configured once?  Terminal Services is exactly that.  Install Office... install company programs... etc... on one server... and have multiple users log into it & run whatever programs they need.

Now for the twist.  RemoteApp... (which is only available in Windows Server 2008 and above) does some spiffy "jedi-mind-tricks" and doesn't give you a remote desktop... but instead simply draws the individual programs on YOUR desktop.  The program is still running on the remote server... only being displayed locally.  Spiffy huh?

So... let's start with a pretty diagram of what's going on...



Seems simple?  Well, in a small office this might work... but once you get into a slightly larger office where you need to consider more than one server, you start running into problems.  On paper, you simply add another server... riight??? ... reality becomes quite a bit more complicated.

So... what happens behind the scenes? ... according to Microsoft... to do it "correctly" ... you actually need 6 servers... You need 2 gateway servers, 2 broker servers and finally 2 host servers.  So, your little project quickly skyrocketed to a HUGE undertaking... which gets rather expensive.  Hardware and software both.  Seriously?  2x the performance... for 6x the cost?  Does that really seem realistic?  Of course not.

Well, my goal is to simply expand the operation to 2 servers... to get roughly twice the scalability... and a bit of redundancy.  (there's some overhead, so it's not exactly 2x the growth... as the servers will be doing some additional tasks...)  We'll still make use of some of the Microsoft model... but put the work on the servers.

First... What does each role do?

1.  Gateways:
This is more of a layer of security than anything...  The stock security in the RDP protocol is almost non-existent.  It's enough to keep honest people out... but won't do much beyond that.  The gateway service wraps the RDP protocol up in HTTPS... which is significantly more secure (but can also add some additional headaches).  Plus, you can add a nice web portal for users to log in and run their applications without having to manually install a million .rdp files or shortcuts to everything.

2.  Connection Brokers:
Since we have multiple servers... if someone accidentally gets disconnected... what's the chance they'll get reconnected to the same server?  50/50 (with two hosts).  Which means you'll have a 99.99999% chance of getting phone calls...  I hate those kinds of phone calls.  So... we have connection brokers that help match a disconnected session back up with the correct user.  The connection broker in 2008 R2 can also act as a load-balancer... which helps spread the workload evenly between the session hosts.

3.  Session Hosts:
This is where the programs are actually run.  All the horsepower needs to be here.  Not much else to say.

Well... even if I had 6 servers... I probably still wouldn't divvy them up that way.  The gateway service & connection broker service rarely hit even 1% CPU usage & don't use enough memory to justify dedicating a server to them.  I've seen it suggested that the connection brokers should be thrown onto the domain controllers... but I kinda don't like to put a bunch of unrelated stuff on my domain controllers.  (yes, I know most DCs sit idle for the most part... it's still not my preference to do that.)

So... my solution... is to-be-continued.  It's 12:18AM... and I gotta get to work in the morning.

Thursday, January 13, 2011

Hyper-V Time Sync issues... FIXED!

I've been using Microsoft Hyper-V Server for a while now, and I've run into an issue in Linux operating systems where the time would skew VERY badly.  Installing NTPD didn't help... it's built to gently discipline a clock that drifts slowly, and it simply can't keep up with a clock that's off by minutes every hour... and running a cron job to update the time just wasn't accurate enough.
The time would skew more than a few minutes in the space of 1 hour.  This is VERY unacceptable.

But!  as the title suggests... there is a fix.  (for me it's a simple fix... for others... may not be so simple)  So, here we go!

The problem occurs because the clock in an OS isn't based on the "hardware clock" that is kept alive by a battery (and is only sorta-accurate anyway)... the OS's clock is instead based on counting a set number of cpu cycles.  In a physical machine, this provides a clock that is much finer-grained than the traditional hardware time... which only reports whole-second increments.  When you are doing highly time-sensitive things, you need a much more accurate clock.  For example, VoIP (ulaw RTP audio streams) traditionally breaks the audio up into 20ms chunks, puts each chunk into 1 packet of data, and sends it.  On the other end, those packets are put into a special sort of buffer that takes those 20ms bits and reassembles them into a continuous stream of audio.  If you only had a clock accurate to within 1 second... you'd have some SERIOUS delay in conversations.

Today, a VERY large number of things in computers require a highly accurate clock.  Rather than each application trying to keep its own clock, operating systems provide APIs that every application can rely on for an accurate time source.  I am not 100% sure about all operating systems, but I do know that Linux has one such kernel clock that is not based on the hardware clock.  There are kernel options that define how fine-grained this clock's ticks are... (1/100th of a second, 1/1000th of a second, etc...) but that's not really very relevant to this topic.  In short, during the startup process, the kernel starts the OS clock based on the hardware clock... uses some sort of algorithm to define a number of cpu cycles per "tick", and continues to count from there... and on shutdown it writes the OS's clock back to the hardware clock.  Typically, in the middle, services like NTPD can make the OS's clock much more precise with regards to the actual time as defined by NIST.
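
On most Linux guests you can actually peek at this yourself.  Something along these lines (paths can vary a bit between distros and kernels) shows which clock source the kernel is counting on, and the tick rate it was built with:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource     # e.g. "tsc", "acpi_pm", "hpet"
grep CONFIG_HZ /boot/config-$(uname -r)                                  # ticks-per-second the kernel was compiled for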

So, what goes wrong in a virtual environment?  (not just Hyper-V)  Well, cpu cycles are virtual.  There are several different things at play, all of which can make the number of cycles per tick a variable rather than a constant.  Most virtual server frameworks (if not all) provide some sort of compensation so the guest OSes *appear* to be getting a constant number of cycles, but this wreaks havoc if the guest OS doesn't quite understand what the host OS has done.  In the case of Hyper-V, extra cpu cycles are thrown at the guest OS to try & push the clock forward periodically when it thinks the guest OS might have missed some.  This *can* help, but most Linux OSes just steadily count the extra cpu cycles, and the OS clock skews forward.  The fix?  Well, this is where it gets a bit more tricky.
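
A quick way to see (and quantify) the skew is to ask an NTP server how far off you are without actually setting anything... run it again a while later on an unpatched guest and watch the offset grow.  (the pool hostname is just an example)

ntpdate -q pool.ntp.org     # query-only... prints the offset but doesn't touch the clock
date; hwclock --show        # compare the OS clock against the battery-backed hardware clock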

A Kernel Module Saves the Day!!  Actually, this idea isn't as strange as it sounds.  Other virtualization frameworks have "integration" tools that do exactly this... along with other functions which we really aren't worried about at this point.  We want Linux guests in Hyper-V to keep time!  Microsoft was sooo thoughtful to provide us with the tools we need.  The "Linux Integration Services v2.1 for Windows Server 2008 Hyper-V R2" was written specifically for this purpose!  We're saved!  ...or are we?  Well, if you read the fine print, it's only supported on a CRAZY-short list of Linux operating systems.

  • SUSE Linux Enterprise Server 10 SP3 x86 and x64 (up to 4 vCPU)
  • SUSE Linux Enterprise Server 11 x86 and x64 (up to 4 vCPU)
  • Red Hat Enterprise Linux 5.2, 5.3, 5.4, and 5.5 x86 and x64 (up to 4 vCPU)
Well... that's a start.  Microsoft appears to want to be friends with the Linux community... heck, they even went as far as to get several pieces of the Linux Integration Services into the kernel.  Wow!  Microsoft creating kernel drivers?  AMAZING!... wait... why doesn't my OS work then?  Well, the down side is that Microsoft managed to get the driver into the kernel, but failed to keep it there due to a GPL violation.  So, you can't expect to see the hyper-v bits in any mainstream linux repositories anytime soon...

But this is not the end!  The Linux Integration Services are still there and can still be useful.  There are a few *gotchyas*, but we are mainly focused on 1 feature... time sync.

So, without any further ado... here's what you need to do:

1) Download the Linux Integration Services package from Microsoft, and extract the files to someplace convenient.  We're only really interested in the LinuxIC v21.iso at this point.

2) Attach the .iso to your guest OS and mount the virtual cdrom someplace convenient.
mkdir /mnt/cdrom; mount /dev/cdrom /mnt/cdrom
3) make a copy of the cdrom-stuff on the local guest OS.  (the cdrom isn't writable <shock>)
mkdir /opt/linux_ic_v21_rtm; cp /mnt/cdrom/* /opt/linux_ic_v21_rtm
4) get your guest OS ready to build a kernel module.  I'm using Debian 5, but your OS should have something similar... (basically, you just need to install the build tools & kernel source)
apt-get install build-essential linux-source module-assistant
m-a update && m-a prepare 
5) Fix one line in the script/determine_os script.  Apparently, Microsoft didn't want to build the module for every kernel, just those at 2.6.27 or greater.  Unfortunately, I'm running 2.6.26.

This may be a bit iffy, but for my kernel, all I needed to do was change line 40 from:
if [ $KERNEL_VER -ge 27 ]
to:
if [ $KERNEL_VER -ge 26 ]
This may work for other kernel versions, and the entire script might be better modified to support other kernels, but I am not 100% sure of what is & is not supported.  I figured that 2.6.26 has very few (if any) differences in the system clock's functions.  (the entire package was designed to work with kernels 2.6.27 and kernel 2.6.9)
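
If you'd rather not open an editor, the same change can be made with a one-liner.  (this assumes the line number and the script path haven't moved in your copy of the package... double-check before running it)

sed -i '40s/-ge 27/-ge 26/' script/determine_os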

6)  Build *only* the hv_timesource module.  The other bits are very kernel-version specific.  On the other side of the coin... they do contain the other nifty paravirtual drivers.... but I am not a kernel developer (or any kind of programmer) and can't tell you how to fix the compile errors.
make hv_timesource
7)  If all goes well, the hv_timesource.ko module will be built!  Finally, we just need to load it.
insmod src/hv_timesource.ko
Final notes:  At this point, the module should be loaded, and the clock shouldn't drift anymore!  That being said, it may still be wrong, so it may be useful to set it.  You can use "ntpdate" or even pull the time from the hardware clock using "hwclock --hctosys".  This should at least get you started, but you'll still need to make this module auto-load on startup so it's there after reboots... and if you upgrade the kernel version, you may need to rebuild the module manually.
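
For what it's worth, here's roughly how I'd wire up the auto-load on a Debian-style guest... copy the module where the kernel can find it, rebuild the module dependency list, and add it to the boot-time module list.  (adjust paths & names if your setup differs)

mkdir -p /lib/modules/$(uname -r)/extra
cp src/hv_timesource.ko /lib/modules/$(uname -r)/extra/
depmod -a                            # rebuild module dependencies so modprobe can find it
echo hv_timesource >> /etc/modules   # Debian loads everything listed in /etc/modules at boot
modprobe hv_timesource               # load it by name right now to confirm it works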

I'd be a very happy person if Microsoft would split their "integration services" package up into pieces.  They do have closed-source bits (which is why they were in trouble for violating the GPL) that they can keep as an add-on package... but I honestly can't see any reason why this bit should be kept out of the mainstream kernel releases.  This is yet another example of Microsoft playing the "see, we integrate with linux" game... without actually integrating with linux.