A Suburbs Boy Living a Country Life
November 18th, 2004
12:51 am

An e-mail I just sent--and copied to our entire systems team and [another clueless idiot at a large networking provider] (the e-mail prompting my rant is at the bottom)


Dear [a member of our systems team]:

I'm tired of pulling punches with these idiots. Since [a large networking provider] has proven repeatedly over the last year that they are dumber than bricks and incapable of troubleshooting network problems for themselves, let me break it down for them.


As packet size increases, packet loss increases dramatically.

ICMP ECHO/ECHO REPLY TEST (a.k.a. "ping") from 10.0.10.xx to 10.0.2.99
Packet size (bytes)   loss %   avg. latency (ms)
1400                  7
1500                  13
1600                  17
1700                  92       79

I isolated the LAN's share of the latency and loss by pinging from 10.0.10.xx to 10.0.10.254 (the [a large networking provider] router). At a packet size of 1700 there is 3% loss and 2 ms average latency on the LAN. The other 89% and 77 ms is between the [a large networking provider] managed end-points 10.0.2.254 and 10.0.2.99.
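As a rough cross-check of the subtraction above, here is a minimal Python sketch. It assumes (this is not stated in the e-mail) that loss on the LAN and loss on the managed link are independent, so delivery probabilities multiply, and that average latencies simply add. The function name `isolate_wan` is made up for illustration.

```python
def isolate_wan(total_loss, lan_loss, total_latency_ms, lan_latency_ms):
    """Estimate loss and latency beyond the LAN segment.

    Assumption: loss on each segment is independent, so
    (1 - total_loss) = (1 - lan_loss) * (1 - wan_loss),
    and average latencies are additive.
    """
    wan_loss = 1 - (1 - total_loss) / (1 - lan_loss)
    wan_latency_ms = total_latency_ms - lan_latency_ms
    return wan_loss, wan_latency_ms


# Figures from the 1700 byte test: 92% / 79 ms end to end, 3% / 2 ms on the LAN.
wan_loss, wan_latency = isolate_wan(0.92, 0.03, 79, 2)
print(f"loss beyond the LAN: {wan_loss:.1%}, latency beyond the LAN: {wan_latency} ms")
```

Plain subtraction gives the 89% quoted above; under the independence assumption the estimate comes out slightly higher, at roughly 92%. Either way, nearly all of the loss sits beyond the LAN.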

What this means in practice is that "default" pings will look perfectly normal--but as packets get bigger, the performance drops off dramatically.
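For anyone wanting to repeat the sweep, here is a small sketch of how the ping commands could be built. Stock ping clients send small payloads by default (32 bytes on Windows, 56 on Linux), which is why a default ping looks fine. The helper name `ping_command` and the count of 20 are assumptions for illustration; the flags are the standard ones (`-n`/`-l` on Windows, `-c`/`-s` on Linux and Mac OS X).

```python
import platform


def ping_command(host, payload_bytes, count=20, system=None):
    """Return the argument list for a ping with an explicit payload size."""
    system = system or platform.system()
    if system == "Windows":
        # -n: number of echo requests, -l: payload size in bytes
        return ["ping", "-n", str(count), "-l", str(payload_bytes), host]
    # -c: number of echo requests, -s: payload size in bytes
    return ["ping", "-c", str(count), "-s", str(payload_bytes), host]


# The sizes used in the test above.
commands = [ping_command("10.0.2.99", size) for size in (1400, 1500, 1600, 1700)]
```

Each list can be passed to `subprocess.run` to run the test and read the loss and latency summary that ping prints at the end.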

Now, there are a couple of things to be aware of. One, as we exceed ~1500 bytes we'll begin to have IP fragmentation on the LAN, since the packets no longer fit in a single Ethernet frame. That is already factored into the LAN latency (and is obviously not a significant factor).

Now, what does [the above] mean in terms of instantaneous network utilization, since I know that has been brought up as a strawman issue by [a large networking provider]? Nodes construct echo replies by reversing the source and destination addresses, setting the type to zero, and re-computing the checksum. Therefore a 1700 byte request and its reply are approximately 3400 bytes on the network. If those 3400 bytes were to be transmitted over the course of 79 ms we would see 344.3 kilobits in a second of non-stop echo requests and replies. That's about 22% of a T-1. (I'm excluding fragmentation overhead, network management, and other traffic from that figure--but I'm also excluding processing delays &c. that would make that true instantaneous figure lower.) In practice, the default ping client sends out one ping per second--putting us more at _1.8%_ of a T-1's total bandwidth from the 1700 byte ping test over time.
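The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a sketch: it takes a T-1 as 1.544 Mbit/s, ignores headers and fragmentation overhead just as the e-mail does, and the function name `ping_utilization` is made up for illustration.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T-1 line rate


def ping_utilization(packet_bytes, interval_s, link_bps=T1_BITS_PER_SECOND):
    """Fraction of the link used by one echo request plus its echo reply
    sent every `interval_s` seconds."""
    bits_per_exchange = 2 * packet_bytes * 8  # request + reply
    return bits_per_exchange / interval_s / link_bps


# Back-to-back exchanges, one every 79 ms (the measured average latency).
burst = ping_utilization(1700, 0.079)
# The usual ping cadence of one request per second.
steady = ping_utilization(1700, 1.0)
print(f"non-stop: {burst:.1%} of a T-1, one per second: {steady:.1%} of a T-1")
```

Both results round to the figures in the e-mail: about 22% for non-stop pings and about 1.8% at one ping per second.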

Can we please now put to rest FOREVER the ridiculous idea [a large networking provider] seems to have that nothing is wrong with their implementation of our network and get them down to finding and fixing their problem(s)? Thanks much.

Fed up and going to bed,

--Pete

p.s. [a large networking provider] also seems to have missed the "for a PPTP VPN" part of our static NAT request for the BFS-local PPTP VPN on [an IP address], because they are not passing GRE and therefore VPN connections fail after authentication.
--
Peter L. Thomas, Technical Director
High Performance Technologies, inc. (www.hpti.com)
Office: 973-442-6436 x246, Cell: 703-615-7806, Fax: 973-442-6402
E-mail: [my work e-mail address]

-----Original Message-----
From: [a member of our systems team]
Sent: Wednesday, November 17, 2004 3:24 PM
To: Thomas, Peter
Subject: FW: ([a trouble ticket number]) Latency Issue.

Can you provide these?

-----Original Message-----
From: ProductSupport-Dedicated@[a large networking provider].com [mailto:ProductSupport-Dedicated@[a large networking provider].com]
Sent: Wednesday, November 17, 2004 3:21 PM
To: [a member of our systems team]
Subject: ([a trouble ticket number]) Latency Issue.

[a member of our systems team],

I am not seeing any latency issues at this site at this time. Please provide more information so that I can troubleshoot this issue further; ping results with source and destination as well as traceroute results.

Thank you,

[another clueless idiot at a large networking provider]


Comments
 
From:philrancid
Date:November 18th, 2004 06:27 am (UTC)
I know that to you this is a real and annoying issue, but every time I read a part where you've replaced actual names with things in brackets, a "silky-smooth" announcer's voice steps in over the mental narrative and maketh me giggle.
From:happypete
Date:November 18th, 2004 07:19 pm (UTC)

* grin *

Well given that a woMan in Canada got sued for lIbel on the Internet recently, I didn't want to disclose who's been screwing us over.
From:philrancid
Date:November 18th, 2004 07:58 pm (UTC)

Re: * grin *

I suppose it really depends. If you have proof through emails and the like that all this has happened to you, it's not really "libel", and they can't say it is. But, I mean, this is from a guy who, as you might be able to tell, kinda just says whatever the hell he wants to and the rest of the planet be damned.
From:godfatherur
Date:November 18th, 2004 11:53 am (UTC)
You go P!!! :) One hell of a well written letter if I do say so meself :)
From:happypete
Date:November 18th, 2004 07:20 pm (UTC)

I got mildly yelled at...

...for being "unprofessional" to the provider in question.

Was it a bad idea to call them incompetent idiots to their faces?
From:christilyn
Date:November 18th, 2004 12:32 pm (UTC)

This morning at our house

C: Hey, Pete wrote a post that you might understand but I won't
S: (reads post). Huh!
C: What does all that mean?
S: (Brief explanation in English for the network challenged)
C: Would you have sent a similar email in the same situation?
S: No, I usually chew incompetent people out on the phone.
From:happypete
Date:November 18th, 2004 07:21 pm (UTC)

Re: This morning at our house

It just blew me away that they could write an e-mail saying "well, we don't see any problem," after all the documentation.

I had given them these figures over the phone before writing the e-mail and they went off to "investigate."
From:macthud
Date:November 19th, 2004 02:17 pm (UTC)

Re: This morning at our house

fer future consideration... you might dodge the `mildly yelled at` by saying `as I provided in my initial phone call raising this ticket...` before leading into the figures.

Checking pipe performance with larger packets is a good idea, for troubleshooting lots of issues, but unfortunately -- Lots and LOTS of elderly systems, including those in place at many of the LARGE providers have major issues with packets larger than about 1536 bytes...

As unix_vicky says below -- I'd also concentrate on the packet loss much more than the latency -- even to pretending that the latency doesn't bother you at all (even though I gather that is what started *you* on this investigation), and squawk loud and long about the loss.

What *should* happen when you send large packets is that they get fragmented, to the lowest common MTU of the total route traveled, which will increase latency -- but you should *not* see any increase in packet loss, if routers and wires and other elements are operating correctly. Thus, this loss should be the primary focus of complaint and troubleshooting....
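To make the fragmentation point concrete, here is a rough Python sketch of how many IP fragments a ping of a given size turns into. It assumes a 1500 byte MTU, a 20 byte IPv4 header with no options, an 8 byte ICMP header, and that the ping size is the ICMP payload; none of these are confirmed for this particular link, and the function name `fragment_count` is made up.

```python
import math


def fragment_count(icmp_payload, mtu=1500, ip_header=20, icmp_header=8):
    """Number of IPv4 fragments needed to carry one echo request."""
    ip_payload = icmp_payload + icmp_header
    if ip_payload + ip_header <= mtu:
        return 1  # fits in a single packet, no fragmentation
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


for size in (1400, 1500, 1600, 1700):
    print(size, fragment_count(size))
```

Under those assumptions a 1400 byte ping travels as one packet, while 1500, 1600 and 1700 byte pings each split into two fragments.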

oh, and also useful -- tcptraceroute, which gets around all-too-common UDP filtration rules, and also lets you check whether TCP port xxx packets are getting through. I haven't played with UDP port-specific tools, but I would hope they're out there...

Good luck! (As I go download MTR, and look to see whether it builds and runs on Mac OS X...)
From:(Anonymous)
Date:November 23rd, 2004 05:28 am (UTC)

Re: This morning at our house

Ted writes:

What *should* happen when you send large packets is that they get fragmented, to the lowest common MTU of the total route traveled, which will increase latency -- but you should *not* see any increase in packet loss, if routers and wires and other elements are operating correctly. Thus, this loss should be the primary focus of complaint and troubleshooting....

Not so, actually... the packet (or the GRE-encapsulated PPTP packet) will get fragmented into two packets. Both of these packets must arrive at the destination for the original packet to be reconstituted; thus, the packet loss will roughly double. And indeed we see that when the 1400 byte payload pings go out (1442 bytes total including all headers; < 1518 bytes maximum Ethernet frame size), the loss is 7%, while when the 1500 byte payload pings go out (1542 > 1518, therefore fragmentation), the loss is 13%, or almost double.
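The "almost double" figure can be sketched in Python. This assumes each fragment is lost independently with the same probability as an unfragmented packet, which is a simplification; the function name `loss_after_fragmentation` is made up for illustration.

```python
def loss_after_fragmentation(per_packet_loss, fragments):
    """Probability that a packet split into `fragments` pieces is lost,
    given that losing any single piece loses the whole packet."""
    return 1 - (1 - per_packet_loss) ** fragments


# 7% loss was measured for unfragmented 1400 byte pings.
two_fragments = loss_after_fragmentation(0.07, 2)
print(f"expected loss with two fragments: {two_fragments:.1%}")
```

With 7% loss per fragment the model predicts about 13.5% for a two-fragment packet, close to the 13% measured at 1500 bytes. The same model gives the same prediction for any size that splits into two fragments, so by itself it does not account for the 92% seen at 1700 bytes.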


Pinging *to* a router is not a useful metric. Pinging *through* a router is a useful metric. Be sure of what you're measuring.


If Pete is seeing 2% packet loss on his Ethernet, it's time to start checking for duplex mismatches, etc. Get one's own house in order so as to be able to better see the problem from the service provider.


My $0.02...


--RS

From:happypete
Date:November 23rd, 2004 03:05 pm (UTC)

Re: This morning at our house

Thanks, RS...That's a very cogent summary. Of course, it doesn't explain the quantum jump from 1500 to 1700...By the way, though MCI claims to have made no changes, we are now seeing ~0% loss with >= 1500 byte packets.

Thanks for the explanation of the doubling of the loss rate in addition to increased latency.

We are, of course pinging through several routers--but because it's a managed MCI VPN we don't even see them, e.g.:

H:\>tracert 10.0.2.100

Tracing route to 10.0.2.100 over a maximum of 30 hops

  1     1 ms     1 ms     1 ms  router.bfs.hpti.com [10.0.10.1]
  2     1 ms     1 ms     1 ms  router-mci.bfs.hpti.com [10.0.10.254]
  3    46 ms    46 ms    50 ms  65.207.28.250
  4    84 ms   101 ms   101 ms  10.0.2.100

1st hop == our local router (completely superfluous, but our network guy here keeps it "so we can set up a back-up dial-up path to the network if MCI goes down for a long time"--my thinking is that we could keep that in cold storage, not make it the default gateway, and crack it out for that purpose if such a disaster occurs...but that's just me); 2nd hop == our router; 3rd hop == 10.0.2.99's "public face"; 4th hop == node on the other side of the router.

Note that the 2nd hop is in Picatinny, the 3rd hop is in Arlington. MCI has hidden the path between from us. (This is by design--we're not supposed to care, so long as the packets get there, and our network traffic is supposed to be secure and partitioned from everyone else's).

Okay, back to work for me.
From:unix_vicky
Date:November 18th, 2004 03:37 pm (UTC)

Big packets?

I wouldn't be worried about the 79 ms latency so much as the 92% packet loss. Might as well be a big black hole for data. Then again, what do you have that would be sending 1700 byte IP packets? Most traffic starts out as LAN traffic, which means it will be 1500 bytes or smaller (unless you manually crank up your MTU, in which case you'll see the fragmentation you mentioned). Encapsulation, like PPTP or LAN Emulation (LANE), will increase that size, but it shouldn't hit 1600 bytes, let alone 1700...


That being said, even the 7-17% packet loss at lower packet sizes shouldn't be happening, and could be causing significant TCP retransmits, thereby causing the appearance of large latency. And UDP applications will get crappy performance.


A cool tool you may want to look at is Matt's Traceroute (MTR), available for both Win32 and Unix/Linux... Starts out as a normal traceroute, then continues to ping each hop along the way, showing latency, packet loss percentages, etc. so you get more than just the 3 tries at each hop.

From:happypete
Date:November 18th, 2004 07:22 pm (UTC)

Re: Big packets?

Thanks...I'm looking into it.