Aces High Bulletin Board

General Forums => Aces High General Discussion => Topic started by: grmrpr on June 13, 2005, 01:53:31 PM

Title: Request for AH to be up and available!
Post by: grmrpr on June 13, 2005, 01:53:31 PM
AH-


After another weekend of outages due to Savvis, is there any hope of having a reasonable expectation that AH will be up and running at any given point?  I do not know what the SLA is with Savvis, but I do not think they are capable of delivering.  Isn't it reasonable to expect at least five 9's (99.999%) uptime out of this service we all pay for?
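
For a sense of scale, here is a rough sketch of how much downtime each "nines" level allows per year. The function name and the 365-day year are assumptions made for illustration only.

```python
def allowed_downtime_minutes(availability_pct, days=365):
    """Return the minutes of downtime permitted over `days` at a given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

# Compare a few common availability targets.
for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(pct)
    print(f"{pct}% uptime allows about {minutes:.1f} minutes down per year")
```

Five 9's works out to a little over five minutes of downtime in a whole year, so a single lost weekend blows that budget many times over.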

I personally do not have experience with Savvis, but I can tell you I have never heard anything positive about them.  I have used hosting facilities such as UUNET, Digex, GenuityNet, AT&T, and Inflow (now SunGard), and I have never seen as many outages as I have seen with AH since it moved to Savvis.

What’s the plan here? Or are we (the players) just expected to live with the shoddy service being provided by Savvis?

As a network engineer and Unix sysadmin, here are some other thoughts/suggestions.

Is it feasible to run backup bandwidth into your systems and use BGP to reroute the traffic when there is a failure?  I have heard statements that the amount of data the end users need is relatively small.  Could a single T1, or maybe even multiple T1s, be purchased through another bandwidth provider and run into your cabs at Savvis?  That way, when Savvis goes down you will have another route into the servers.
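
To show why a second provider helps so much, here is a back-of-the-envelope sketch. It assumes the links fail independently and that failover is instant, neither of which is guaranteed in practice, and the 99% figures are made-up placeholders rather than anyone's real numbers.

```python
def combined_availability(*link_availabilities):
    """Availability of redundant links, assuming independent failures.

    The service is only unreachable when every link is down at once.
    """
    all_down = 1.0
    for availability in link_availabilities:
        all_down *= (1.0 - availability)
    return 1.0 - all_down

# Two hypothetical links that are each up 99% of the time.
print(combined_availability(0.99, 0.99))
```

Under those assumptions two mediocre links together come out to roughly 99.99%. Links that share a cabinet, power feed, or upstream carrier are not really independent, so the real figure would be lower.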

Also, may I suggest fixing your DNS?  hitechcreations.com lists two IPs as DNS servers, and both IPs are on the same network.  When your network goes down, so does your DNS.  I would suggest listing two or three more.  What I typically do is have my service provider slave my zones from my server but list my provider's DNS servers as the primaries with InterNIC.  That way their servers take all the hits and I can update my records at will on my own server without going through them.  Of course, list your own servers as additional DNS servers...
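
A quick way to spot this kind of single point of failure is to check whether all the listed name servers sit in the same prefix. The sketch below uses a /24 as a crude stand-in for "same network" and uses placeholder addresses from the documentation ranges, not HTC's real servers.

```python
import ipaddress

def share_one_network(nameserver_ips, prefix_len=24):
    """Return True if every name server address falls inside the same prefix."""
    networks = {
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in nameserver_ips
    }
    return len(networks) == 1

# Both servers in 192.0.2.0/24: one outage takes out all DNS.
print(share_one_network(["192.0.2.10", "192.0.2.11"]))
# Servers on different networks: DNS survives the loss of either one.
print(share_one_network(["192.0.2.10", "198.51.100.53"]))
```

Two different prefixes can still hang off the same upstream, so passing this check is necessary rather than sufficient.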

Have you considered moving the website off that network and away from the game machines?  I personally would consider moving it to another data center (a data center close to the main office could be an option) and using someone like Mirror Image (http://www.mirrorimage.net/) or Akamai (http://www.akamai.com/index_flash.html) to deliver the downloads for you, so that bandwidth is kept away from the game servers.  I am sure the stats stuff plays a role in the decision process, but all that backend stuff could still be the way it is.  If for whatever reason that portion goes down, at least HTC's web presence would still exist and be able to provide notifications of any outages or issues....

Oh well, just some thoughts and rants about my disappointment with the availability of AH.  I hope no offense is taken, and if there is anything I could do to help out with a solution, please let me know.


GrmRpr
Title: Request for AH to be up and available!
Post by: Karash on June 13, 2005, 06:06:05 PM
Constructive Post!

Thumbs up!
Title: Request for AH to be up and available!
Post by: MOIL on June 13, 2005, 11:23:57 PM
OH!!!  this is real smart,  start talking to make sense and offer up solutions or ideas that might help!!!!

Are you crazy:confused:   what if other people start helping or offering such constructive insight into a problem...................... ....................IT WILL BE PURE PANDEMONIUM.................. .......It's insane I tell ya........Insane:eek:
Title: Request for AH to be up and available!
Post by: SuperDud on June 14, 2005, 12:19:45 AM
You mean instead of throwing a fit and saying I'll take my $15 elsewhere? You can actually try to be helpful? Who knew???
Title: Request for AH to be up and available!
Post by: Wolfala on June 14, 2005, 12:49:51 AM
What I'm curious about is this: how much data is being pushed during a 400-500 pilot login? For our ATC servers over at Westcoastatc.com, we purchase the OC3 line directly from AT&T, bypassing Savvis, Verio and the like completely, b/c AT&T owns the lines that the others rent.

Skuzzy,

What are a few solutions?

Wolf
Title: Request for AH to be up and available!
Post by: SFCHONDO on June 14, 2005, 01:36:29 AM
Nice post. Would love to see Skuzzy reply to this one.
Title: Request for AH to be up and available!
Post by: onions4u on June 14, 2005, 03:03:36 AM
Skuzzy, I live in Texas and have had a 31 to 32 ping lately. A squaddie lives in Washington state and lately he's had a 200 ping and is getting discoed a lot. We both have cable; I have Comcast and he has another company called (3d?).
Title: Request for AH to be up and available!
Post by: grmrpr on June 14, 2005, 08:25:50 AM
At my last job I managed a network that consisted of 7 warehouses across the country.  Each facility had private frame relay and T1s provided by AT&T.  Each facility also had a third connection, such as cable or DSL.  We load balanced all those lines across a device called FatPipe (http://www.fatpipeinc.com/).  AT&T owns a really solid network, and with our redundancy we had virtually 0% downtime.  We ran VoIP, which I would say is even more sensitive than the UDP game traffic.  There are relatively inexpensive ways to solve the problems HTC is having.

I wish HTC would respond with a clue as to what their network architecture is and what usage requirements are so the engineers in the community could offer free advice.  Also I have negotiated bandwidth deals with all the major carriers and have toured all the facilities I have listed.  If HTC would make available the architecture and some small details I am sure the techies in the community could offer viable solutions to improve the service.

But alas HTC seems to be silent on this issue…

GrmRpr
Title: Request for AH to be up and available!
Post by: hitech on June 14, 2005, 09:13:08 AM
grmrpr: All items you have listed have been considered previously and are being considered again.


HiTech
Title: Request for AH to be up and available!
Post by: grmrpr on June 14, 2005, 09:45:07 AM
HiTech-

If there is any assistance I can provide, please do not hesitate to contact me.  I would be more than willing to review any RFPs or support Skuzzy or HTC in any capacity.  I will send you and Skuzzy a copy of my resume so you can see if I have any skill sets I could volunteer that may be helpful to the cause.

Regards-

GrmRpr
Title: Request for AH to be up and available!
Post by: hitech on June 14, 2005, 10:44:05 AM
Looks like we are switching to the AT&T data center in Dallas. It has one nice advantage in that it has dual OC48s from different areas.

The only configuration issue I have not been able to solve is how to do source routing with Solaris. Doing it on the server would save me having to put in a router at the data center. I would then be able to offer user-selectable routes to the data center.

HiTech
Title: Request for AH to be up and available!
Post by: grmrpr on June 14, 2005, 11:15:50 AM
Am I right in assuming multiple interfaces in the Solaris box?  Can you explain source routing a little better?  I am assuming you mean: if source = x, go out interface y.  I had a similar dilemma that I ended up addressing with an Extreme switch.  I had hosts coming in from different networks and wanted to split the traffic across the multiple interfaces.

Basically this is what I did.

In the Extreme switch (Alpine 3804) I created a pseudo load-balanced IP and had the downstream clients talk to it.  That load-balanced IP used latency algorithms to determine which interface on the E450 Solaris 9 machine was the least used and communicated down that one.  It also provided redundancy, in that if an interface went down on the Sun box, traffic still flowed.

Only issue: in a failure of an interface, the traffic associated with that interface died due to limitations in the stack on the Sun box, which could not switch the traffic between interfaces.  But all traffic was able to reconnect without a problem.  Unless there is some sort of logic on the server side, I do not know how the server would deal with traffic switching interfaces on the machine.  The switch would keep the traffic persistent as long as an interface didn't fail, but if one did you may wind up losing the traffic that was associated with the failed interface.
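
The selection logic described above can be sketched roughly as follows. This is not Extreme's actual algorithm, just a minimal illustration of "lowest latency among the interfaces that are still up"; the interface names and latency figures are placeholders.

```python
def pick_interface(latencies_ms, up):
    """Pick the healthy interface with the lowest measured latency.

    latencies_ms: dict of interface name -> latest latency sample in ms
    up:           dict of interface name -> True if it passes health checks
    """
    healthy = {name: ms for name, ms in latencies_ms.items() if up.get(name, False)}
    if not healthy:
        raise RuntimeError("no healthy interface available")
    return min(healthy, key=healthy.get)

latencies = {"hme0": 1.8, "hme1": 0.9}

# Both interfaces up: the faster one is chosen.
print(pick_interface(latencies, {"hme0": True, "hme1": True}))
# hme1 fails: new connections fall back to hme0.
print(pick_interface(latencies, {"hme0": True, "hme1": False}))
```

Note this only decides where new connections go. Sessions already pinned to the failed interface still drop and have to reconnect, which is the limitation described above.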

GrmRpr

Side note-

The Extreme gear has all the old big IP stuff built into it.  I like Cisco for most needs, but when it comes to HA servers I've found the cheapest method is to put an Extreme Layer 3 switch in place.  They are so much easier to configure and manage.  And frankly, Cisco has never been strong in load balancing.
Title: Request for AH to be up and available!
Post by: grmrpr on June 14, 2005, 11:23:21 AM
PS: What is the firewall solution?

If you had a PIX 520 in place in front of this, you could add tools to the server side to send shuns to the PIX to block any IP deemed malicious.

Just for example, say user X gets really out of hand and starts going nuts.  Maybe this user has some basic script kiddie attack skills.  An admin could type a server command that would send his IP to the PIX to block it.
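
As a rough idea of what the server-side tool might do, the sketch below validates an address and builds the command text. It assumes the basic PIX `shun <ip>` form, which should be checked against the PIX OS version in use. How the command is delivered to the firewall is left out, and the function name and sample address are hypothetical.

```python
import ipaddress

def build_shun_command(address):
    """Validate an IPv4 address and return a PIX-style shun command line."""
    ip = ipaddress.IPv4Address(address)  # raises ValueError on malformed input
    return f"shun {ip}"

# Placeholder address from a documentation range.
print(build_shun_command("203.0.113.45"))
```

Validating the address first matters here, since whatever the admin types ends up as a command on the firewall.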
Title: Request for AH to be up and available!
Post by: Karash on June 14, 2005, 11:58:14 AM
Good to see this is getting worked on.  Thanks HT!
Title: Request for AH to be up and available!
Post by: grmrpr on June 14, 2005, 12:37:11 PM
Dale-

Something like this:

http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&category=51268&item=5782073294&rd=1

But with the port density you are talking about, prob about 1/2 that cost.  And you can get Extreme to recertify it and bring it under maintenance if you buy a support contract with them.

GrmRpr
Title: Request for AH to be up and available!
Post by: Furball on June 14, 2005, 01:59:26 PM
waaaaaaay over my head.

nice to see it's being worked on though. was rather annoying that the one time in a long time I get the opportunity for an extended AH session, I get dumped and am unable to play all night.
Title: Request for AH to be up and available!
Post by: Soulyss on June 14, 2005, 02:04:17 PM
Wow I think I understood part of that... so basically it boils down to HT saying

"we know there's a problem and we're working on it."

right?

:confused:



:)
Title: Request for AH to be up and available!
Post by: hitech on June 14, 2005, 02:15:39 PM
grmrpr: Basically, with multihoming you are allowing the user to pick which OC48 he is coming in on by assigning different IPs to each network card. No problem on the inbound side; this is automatic based on the IP and network number. Now, when the message is processed and sent back, the box needs to choose an outbound gateway. There is no problem in Solaris setting up the gateway choice based on destination, i.e. normal routing, but now I wish to pick the gateway based on the source IP. I have not seen any way in Solaris to do this other than making a new route entry every time a new connection is established.

Not sending it out the same place it came in gets really nasty due to what is called an asymmetric (async) route. Basically you get more points of failure.

The other option is just to put in a router to do the source routing.
But would much rather do it right on the computer.
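
The decision being described (pick the gateway from the reply's source address rather than its destination) boils down to a small lookup. The sketch below shows only that logic. It is not Solaris configuration, and the subnets, gateways, and names are hypothetical placeholders.

```python
import ipaddress

# Hypothetical addressing: one local subnet per uplink, each with its own gateway.
POLICY = [
    (ipaddress.ip_network("192.0.2.0/24"), "192.0.2.1"),        # NIC on uplink A
    (ipaddress.ip_network("198.51.100.0/24"), "198.51.100.1"),  # NIC on uplink B
]

def gateway_for_source(source_ip, default_gateway="192.0.2.1"):
    """Choose the outbound gateway from the reply's source address."""
    addr = ipaddress.ip_address(source_ip)
    for network, gateway in POLICY:
        if addr in network:
            return gateway
    return default_gateway

# A reply sent from the uplink-B address leaves via uplink B's gateway,
# regardless of the player's (destination) address.
print(gateway_for_source("198.51.100.20"))
```

Normal destination routing would key that lookup on the player's address instead, which is why the reply can leave on a different uplink than the one the request arrived on.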

HiTech
Title: Request for AH to be up and available!
Post by: Captain Virgil Hilts on June 14, 2005, 02:37:28 PM
Very interesting. I think I understand. I'm frightened. But it sounds like a solution is in the works.

Setting it up so that the processed data goes out on an outbound gateway specified to match the IP source and the inbound gateway eliminates extra points where data errors/failures can occur?

Or did I read that wrong?

Thanks. Both of you.
Title: Request for AH to be up and available!
Post by: grmrpr on June 14, 2005, 02:54:45 PM
Understood... Same core issue.  Creating a load-balanced virtual IP on the switch that load balanced to multiple interfaces on the Sun box got around that for me.  That way the layer 3 switch sent the responses down the correct paths and the E450 had a single default GW: the switch.  I was not able to find a way to do it otherwise. I went directly to Sun support for help, and they were unable to provide a solution that bypassed the default GW and sent the traffic down the desired return-path interface.  I would be worried about performance problems from injecting all those routes into the routing table, given the number of routes you are talking about.
Title: Request for AH to be up and available!
Post by: Dead Man Flying on June 14, 2005, 02:56:58 PM
This thread = :aok
Title: Request for AH to be up and available!
Post by: vorticon on June 14, 2005, 04:19:46 PM
Quote
Originally posted by Dead Man Flying
This thread = :aok

more like :confused:
Title: Request for AH to be up and available!
Post by: Morpheus on June 14, 2005, 05:48:06 PM
more like :confused:  for what reason???

Todd meant it was a good thread because Grm was offering to help HTC rather than bash them for their interruptions in service.

Hence the :aok.


It pretty much spoke for itself. I thought.
Title: Request for AH to be up and available!
Post by: Soulyss on June 14, 2005, 05:55:27 PM
I think the :confused:  was more for us network-ly challenged folk, for whom the details sail way over our heads.  I concur with Levi though, gotta appreciate a constructive thread, on the rare occasion they crop up.
Title: Request for AH to be up and available!
Post by: WMLute on June 14, 2005, 06:06:52 PM
Yippie!  From where I sit, Dallas is only a couple hops away.  If they move everything to Dallas, I would be lookin' at 30-40 ping times.

(drools)

(edit: kudos to HT for being involved in the discussion.  good to see )
Title: Request for AH to be up and available!
Post by: Morpheus on June 14, 2005, 06:10:45 PM
For what it's worth, I live in Connecticut and my ping is never above 45.
Title: Request for AH to be up and available!
Post by: hubsonfire on June 15, 2005, 03:28:42 AM
"Basically you get more points of failure" - Hitech

Assuming I understood any of this, you get more potential for failures, but considering Savvis is failing/struggling consistently (at least for many of us), this could very well be one of those 'lesser of 2 evils' situations. Anyhow, this is reassuring, plz continue.
Title: Request for AH to be up and available!
Post by: Skuzzy on June 15, 2005, 07:35:47 AM
The Internet has been a pretty ugly beast the last month or so.  Comcast and Road Runner have been having chronic problems all over the country since school has been let out.
Worst year I have ever seen.  It is always pretty bad during the summer months, but usually settles down by now.  It hasn't.

Moving the servers is not a panacea for all Internet-related problems.  Savvis is the second largest Tier 1 ISP in the US.  There is no way to circumvent their entire network, for all people.
There are things we will accomplish, such as getting back to an ISP who is responsive to problems, as opposed to being defensive.  Hopefully the random outages on the immediate network our servers are on will be reduced, if not eliminated.
And, as HT said, getting the servers back to the local area helps us when we need to get to the servers physically.


The asymmetric (async) route issue is a problem in that users cannot reliably trace/PingPlot to any given IP address, as the trace functionality is not capable of showing the return path a data packet will take.
Basically, traces/PingPlots only show the path taken to the destination.  When the return path is different, it can give false packet loss readings and inconsistent timings.  Makes it very difficult to troubleshoot a connection.
We hope to reduce that somewhat.  However, we are not in control of the path back once it leaves the local router.
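
A small sketch of why this skews the readings: a ping only counts as answered if the packet survives both directions, so loss on a return path the trace never shows still appears in the result. The loss figures are made up, and the two directions are assumed to drop packets independently.

```python
def observed_ping_loss(forward_loss, return_loss):
    """Loss a ping reports when each direction drops packets independently."""
    return 1.0 - (1.0 - forward_loss) * (1.0 - return_loss)

# The forward path (the one the trace displays) is clean,
# but a different return path drops 5% of packets.
print(observed_ping_loss(0.0, 0.05))
```

In that case the trace shows about 5% loss on a forward path that is actually delivering everything, which is the kind of false reading described above.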
Title: Request for AH to be up and available!
Post by: Captain Virgil Hilts on June 15, 2005, 08:12:31 AM
Thank you for the explanation Skuzzy. Perhaps it was because I was having difficulty reading the spelling errors:D I was understanding Hitech as saying the problem with different in/out gateways etc was that it increased the chance for data errors/loss/failures.
Title: Request for AH to be up and available!
Post by: doc1kelley on June 15, 2005, 10:43:55 AM
GRMPR I must salute you!

It's great to see some constructive assistance to a chronic problem in here as opposed to destructive criticism!  This is indeed a much needed and welcomed approach as compared to our usual complaints with no offer of assistance.

Please continue!

All the Best...
Jay
awDOC1:aok
Title: Request for AH to be up and available!
Post by: Skuzzy on June 15, 2005, 10:53:58 AM
That is also true, Virgil.  When a data packet has to traverse a different route back than the one it came in on, the chances of problems go up.

Look at it like this.  When you drive someplace and arrive on time and safe, you know you have a good path back to where you started (i.e. you go back the way you came).  If you decide to take a different route on the way back home, you could get lost or held up by construction or other problems.

Same with data packets traversing a different route back.  The unknown quantity is, "will it make it back on that different route?".
Title: Request for AH to be up and available!
Post by: tzr on June 15, 2005, 10:54:38 AM
Yea  what he said (awDoc1)...now I have to go wipe the blood from my ears after I tried to follow all that tech stuff
Title: Request for AH to be up and available!
Post by: 38ruk on June 15, 2005, 11:40:38 AM
Great thread. I've been waiting to see something like this, as I've had some issues these last couple of months with a weird variance issue.  WTG guys 38
http://www.hitechcreations.com/forums/showthread.php?s=&threadid=149455
Title: Request for AH to be up and available!
Post by: Edbert1 on June 15, 2005, 02:14:17 PM
I have spent a lot of time in the AT&T datacenter in Allen (Dallas suburb), I assume that is the one you are looking at?

My suggestion would be to use one of the more independent datacenters like Inflow (hands down the best ones I've ever used). They are not an ISP in that they do not provide circuits to your demarc and as a result have major pipes from all the tier-1 providers which they can then load-balance FOR you.

The DC in Allen was about 25% full last time I was there (previous job) so you should be able to get some good rates, but if AT&T has an issue like Savvis is having you may find that all your eggs being in their basket is not the best plan.
Title: Request for AH to be up and available!
Post by: grmrpr on June 15, 2005, 03:00:01 PM
Inflow was bought by SunGard.  I have used them in Phoenix and Raleigh.  By far the best hosting facility I have ever used.
Title: Request for AH to be up and available!
Post by: mauser on June 15, 2005, 04:36:36 PM
Players helping out HTC - now this is part of what "community" is about in my opinion.  

mauser