It's not unusual for us to get a trouble ticket that says not much more
than “My site is down! Fix it!” And nine times out of ten, we (that is,
the Tech Support staff here at Pick // Internet Services) are able to bring
the site up in our browsers, so we reply to the ticket with:
In order to help us more accurately identify the source of the
latency you are experiencing, please download the following program
to your desktop:
http://support.pickint.net/resources/winmtr.exe
There is no need to “install” the software; simply click on the
WinMTR.exe icon and a window will appear. In the “Host” box
please type the domain name or the ip address you are having
difficulty connecting to.
WinMTR will begin a diagnostic routine that continuously pings
each host between your computer and the server you entered in the
Host box.
Please let this run for several minutes and then click “Copy Text to
clipboard”.
Then paste the results into this ticket.
Thanks for helping us get the data we need to resolve the latency
issue you have reported.
We will get back to you shortly.
(that is the exact text we use, and it's not unusual for most Tech
Support departments to have pre-canned responses to common problems)
But it's not that hard to troubleshoot the exact problem and save all of
us from having to play Twenty Questions.
Now, given that all of our customers use Windows, the instructions
following assume that you too are using Windows (an 88% chance at the time
of this writing, but don't worry, if you are using a Mac I'll be telling you
what to do, and what you see should be similar enough to what the Windows
users will see; if you're using Linux, you probably know enough to
troubleshoot the problem anyway).
So, the next time you can't get to your website, before calling or
submitting a ticket saying “My site is down! Fix it!” take a few moments
and do the following. First, click the “Start” button, select “Run” and
type cmd, then click “OK”. A black window will pop up, with
the contents looking something like:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\Kids>
(and for this, I'm using The Kids' computer, since I don't have a Windows
system of my own. Mac users should run the Terminal program, found under
/Applications/Utilities). For this example, I'm using the Pick
// Internet Services website, at http://www.pickint.net/.
You'll use your own domain name for this, preceded by
www.:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\Kids>ping www.pickint.net
Pinging www.pickint.net [204.29.162.248] with 32 bytes of data:
Reply from 204.29.162.248: bytes=32 time=26ms TTL=59
Reply from 204.29.162.248: bytes=32 time=26ms TTL=59
Reply from 204.29.162.248: bytes=32 time=33ms TTL=59
Reply from 204.29.162.248: bytes=32 time=26ms TTL=59
Ping statistics for 204.29.162.248:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 26ms, Maximum = 33ms, Average = 27ms
C:\Documents and Settings\Kids>
(Mac users, the command is the same, although you have to press
Ctrl-C to stop it running. The output is slightly different, but
not enough to worry about.)
This shows that the server is accessible from your computer. The bit
that says time=26ms tells you how long it took for a packet of
data to make a round trip from your computer to the server and back again,
in milliseconds. A double digit number is very good, and low triple digits
is okay. If you still have problems pulling your website up in a browser,
at this point it's probably a problem on the server itself, so you can call
or submit a ticket with this information. But, if the times are above 400ms
or so, then it's likely to be a network problem somewhere along the way
(which we'll get to in a bit).
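If you'd rather not eyeball the numbers, here is a minimal Python sketch of the rule of thumb above. The function names and the exact cutoffs (under 100ms, up to 400ms, above 400ms) are my own reading of "double digit", "low triple digits" and "above 400ms or so", so treat them as assumptions rather than hard limits.

```python
import re


def round_trip_times(ping_output):
    """Return the round-trip times (in ms) found in the text printed by ping."""
    # Matches "time=26ms" and "time<1ms" (Windows) as well as "time=26.1 ms" (Mac).
    return [float(ms) for ms in re.findall(r"time[=<]([\d.]+)\s*ms", ping_output)]


def rate_latency(ms):
    """Apply the rough thresholds from the text (assumed cutoffs)."""
    if ms < 100:
        return "very good"
    if ms <= 400:
        return "okay"
    return "likely a network problem"


sample = """Reply from 204.29.162.248: bytes=32 time=26ms TTL=59
Reply from 204.29.162.248: bytes=32 time=26ms TTL=59
Reply from 204.29.162.248: bytes=32 time=33ms TTL=59
Reply from 204.29.162.248: bytes=32 time=26ms TTL=59"""

times = round_trip_times(sample)
print(rate_latency(max(times)))  # prints "very good" (the worst reply took 33 ms)
```

Rating the slowest reply rather than the average is a deliberate choice here: one bad reply is enough to make a page feel slow.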
Now, if you get the following:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\Kids>ping www.pickint.net
Ping request could not find host www.pickint.net. Please check the name and try again.
C:\Documents and Settings\Kids>
(Mac users: the message will be: ping: unknown host
www.pickint.net)
There are two possible problems. One, your ISP is having DNS issues, and we can't help you with that. Another symptom of this
problem is that you can't get to other websites, or to any website at all. The other
possibility is that your domain registration
has expired, so yes, your site is down, but what you see is
different than if the server is down.
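One way to tell those two cases apart is to look up a second, unrelated name. The following is only a sketch: the function names are hypothetical, and using www.example.com as the "known good" name is my assumption (any site you know is up will do). It relies on Python's standard socket.gethostbyname call.

```python
import socket


def can_resolve(hostname, resolve=socket.gethostbyname):
    """Return True if the name can be turned into an IP address."""
    try:
        resolve(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def diagnose_name_failure(my_site, control_site="www.example.com",
                          resolve=socket.gethostbyname):
    """Guess why a name lookup failed by comparing against a control name."""
    if can_resolve(my_site, resolve):
        return "name resolves; DNS is not the problem"
    if can_resolve(control_site, resolve):
        return "only your domain fails; check its registration and DNS records"
    return "no names resolve; likely an ISP or local DNS problem"
```

Calling diagnose_name_failure("www.pickint.net") on the affected computer gives a first guess. The resolve parameter exists only so the logic can be tried out without a network connection.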
The other result from running that command:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\Kids>ping www.pickint.net
Pinging www.pickint.net [204.29.162.248] with 32 bytes of data:
Reply from 10.0.1.1: Destination net unreachable.
Reply from 10.0.1.1: Destination net unreachable.
Reply from 10.0.1.1: Destination net unreachable.
Reply from 10.0.1.1: Destination net unreachable.
Ping statistics for 204.29.162.248:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms
C:\Documents and Settings\Kids>
(Mac users: there will be no output at all, so after a bit, just
press Ctrl-C)
This is a networking issue (and don't worry if the reply comes from some
other IP address—this is an example, remember?) and if you are getting the
impression that most of the time, the problems are due to networking issues,
that's because it's probably the case.
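The three outcomes shown so far can be told apart by looking for a few telltale phrases in what ping prints. Here is a rough Python sketch; the function name and the labels it returns are made up for illustration, and the phrases are taken from the sample output above, so other versions of ping may word things differently.

```python
def classify_ping(output):
    """Sort the text printed by ping into one of the cases described above."""
    text = output.lower()
    # Windows: "could not find host"; Mac: "unknown host".
    if "could not find host" in text or "unknown host" in text:
        return "name lookup failed"
    # A router along the way reported that it has no route to the server.
    if "unreachable" in text:
        return "network unreachable"
    # A real reply carries a round-trip time.
    if "time=" in text or "time<" in text:
        return "server reachable"
    # Nothing useful came back (what Mac users see in the unreachable case).
    return "no reply"
```

The order of the checks matters: an "unreachable" reply still counts as a received packet in the statistics, so it has to be caught before anything that looks at replies.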
Now, to troubleshoot that, you can download WinMTR and run
it, but Windows also comes with a similar program,
tracert (under just about everything else, including the Mac,
this is traceroute).
C:\Documents and Settings\Kids>tracert www.pickint.net
Tracing route to www.pickint.net [204.29.162.248]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms hobbes.hangar18.area51 [10.0.2.1]
2 4 ms 3 ms 4 ms janet.dreamland.area51 [10.0.1.1]
3 5 ms 4 ms 5 ms spc.bct.dsl.pickint.net [66.252.226.49]
4 25 ms 27 ms 70 ms core.bct.rt.pickint.net [66.252.227.33]
5 26 ms 25 ms 26 ms 204.29.162.248
Trace complete.
C:\Documents and Settings\Kids>
(for Mac users: the command is traceroute and the output is
reversed—the host or router is listed first, then the timing information.
Other than that, it's pretty much the same)
This command (much like WinMTR) will show each point on the Internet that data
from your computer passes through on the way to your website, and how long it takes to get
to each point.
Technobabble
traceroute shows each hop packets take from your
computer to the destination (and yes, “hop” is a technical term).
It does this by using a neat hack.
Each packet that a computer sends out has a “time-to-live”
field, which is the maximum number of hops it can take. At each
hop, the router will subtract one from this field and if it's equal
to zero, the packet is dropped and an error is sent back to the
originating computer that the packet “died” en route.
Typically, the operating system will set this field to a large
enough value to ensure that the packet makes it to the destination
before the “time-to-live” field hits zero (this value is typically
set to 60, although in practice, no two points on the Internet have
been more than 30 hops apart). But traceroute will
send the first packet with a “time-to-live” set to 1 (it actually
sends three such packets). The immediate next hop will decrement
the counter, see that it's zero, and send back an error. Then the
next packet is sent with a “time-to-live” set to 2, so the second hop
will return the error.
And so on until a packet reaches the destination, at which point a
different error is returned (since the packet is sent to a
non-existent program).
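To make the trick concrete, here is a toy simulation in Python. It sends no real packets: the "network" is just a list of names (borrowed from the example trace above), and the function names are hypothetical. It only illustrates the time-to-live countdown described here, not how any real traceroute is written.

```python
def send_packet(path, ttl):
    """Simulate one packet travelling along path (routers, then destination).

    Returns (who answered, what kind of answer came back).
    """
    for router in path[:-1]:       # every hop before the destination
        ttl -= 1                   # each router subtracts one
        if ttl == 0:               # the packet "died" at this router
            return router, "time exceeded"
    return path[-1], "reached destination"


def traceroute(path, max_hops=30):
    """Send packets with time-to-live 1, 2, 3, ... and record who answers."""
    trace = []
    for ttl in range(1, max_hops + 1):
        responder, kind = send_packet(path, ttl)
        trace.append((ttl, responder))
        if kind == "reached destination":
            break
    return trace


path = [
    "hobbes.hangar18.area51",
    "janet.dreamland.area51",
    "spc.bct.dsl.pickint.net",
    "core.bct.rt.pickint.net",
    "204.29.162.248",
]
for hop, name in traceroute(path):
    print(hop, name)
```

Each pass gets exactly one hop further than the last, which is why the real output lists the hops in order.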
Occasionally, you'll see something like:
C:\Documents and Settings\Kids>tracert www.pickint.net
Tracing route to www.pickint.net [204.29.162.248]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms hobbes.hangar18.area51 [10.0.2.1]
2 4 ms 3 ms 4 ms janet.dreamland.area51 [10.0.1.1]
3 5 ms 4 ms 5 ms spc.bct.dsl.pickint.net [66.252.226.49]
4 25 ms 27 ms 70 ms core.bct.rt.pickint.net [66.252.227.33]
5 * * * Request timed out
6 26 ms 25 ms 26 ms 204.29.162.248
Trace complete.
C:\Documents and Settings\Kids>
Here, one of the hops doesn't report anything. Sometimes a router is
programmed not to send back an error, or some other router on the way back
filters such error messages, or there's too much traffic at that instant on
that router (or host) for it to bother sending back an error. One or two
such lines now and then are fine and normal.
But when you start seeing three, four, five such lines in a row, there's
a problem. And depending upon how far along the problem is, it could be an
issue with your ISP, or
with some network provider between you and your website, or with your
webhosting company.
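Counting those lines by hand gets tedious on a long trace. As a rough sketch, the Python below finds the longest run of consecutive timed-out hops in saved tracert output. The function name is hypothetical, and it assumes a timed-out hop shows three asterisks as in the example above.

```python
def longest_timeout_run(tracert_output):
    """Return the largest number of consecutive hops that timed out."""
    longest = current = 0
    for line in tracert_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue                      # header, footer or blank line
        if fields[1:4] == ["*", "*", "*"]:
            current += 1                  # this hop reported nothing
            longest = max(longest, current)
        else:
            current = 0                   # a hop answered, so the run is over
    return longest


sample = """Tracing route to www.pickint.net [204.29.162.248]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms hobbes.hangar18.area51 [10.0.2.1]
4 25 ms 27 ms 70 ms core.bct.rt.pickint.net [66.252.227.33]
5 * * * Request timed out
6 26 ms 25 ms 26 ms 204.29.162.248
Trace complete."""

print(longest_timeout_run(sample))  # prints 1, which is fine and normal
```

Going by the rule of thumb above, a result of three or more is the point at which it's worth asking who owns the hops around the gap.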
But then sometimes you'll see something like:
C:\Documents and Settings\Kids>tracert www.pickint.net
Tracing route to www.pickint.net [204.29.162.248]
over a maximum of 30 hops:
1 <1 ms <1 ms <1 ms hobbes.hangar18.area51 [10.0.2.1]
2 4 ms 3 ms 4 ms janet.dreamland.area51 [10.0.1.1]
3 100 ms 104 ms 102 ms spc.bct.dsl.pickint.net [66.252.226.49]
4 125 ms 127 ms 170 ms core.bct.rt.pickint.net [66.252.227.33]
5 126 ms 125 ms 126 ms 204.29.162.248
Trace complete.
C:\Documents and Settings\Kids>
Note the rather large jump in times between hops 2 and 3? (in this case,
it's my DSL router).
This means the problem is with my ISP (which, in this case, is Pick // Internet Services
as a perk of working there).
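Spotting the jump can also be automated. This Python sketch (hypothetical function names) pulls the best of the three timings for each hop out of tracert output and reports where the biggest increase happens. Using the best time per hop is my own choice, to keep a single slow reply from skewing the result.

```python
import re


def hop_latencies(tracert_output):
    """Return a list of (hop number, best time in ms) for hops that answered."""
    hops = []
    for line in tracert_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue                                   # not a hop line
        times = [int(ms) for ms in re.findall(r"<?(\d+) ms", line)]
        if times:                                      # skip hops that timed out
            hops.append((int(fields[0]), min(times)))
    return hops


def largest_jump(hops):
    """Return (hop number, increase in ms) for the biggest hop-to-hop increase."""
    best = None
    for (_, previous_ms), (number, ms) in zip(hops, hops[1:]):
        jump = ms - previous_ms
        if best is None or jump > best[1]:
            best = (number, jump)
    return best


sample = """1 <1 ms <1 ms <1 ms hobbes.hangar18.area51 [10.0.2.1]
2 4 ms 3 ms 4 ms janet.dreamland.area51 [10.0.1.1]
3 100 ms 104 ms 102 ms spc.bct.dsl.pickint.net [66.252.226.49]
4 125 ms 127 ms 170 ms core.bct.rt.pickint.net [66.252.227.33]
5 126 ms 125 ms 126 ms 204.29.162.248"""

print(largest_jump(hop_latencies(sample)))  # prints (3, 97)
```

The answer matches what the eye picks out: the time goes up by about 97ms on the way to hop 3.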
For another example, here's the output from WinMTR (the actual output is
text, I converted it into a table format, and changed the host/hop names so
they were less cryptic and shorter!):
WinMTR statistics
| Host | Loss % | Sent | Recv | Best | Avrg | Wrst | Last | |
|---|---|---|---|---|---|---|---|---|
| customer.mi.comcast.net | 0 | 72 | 72 | 0 | 140 | 710 | 170 | |
| r-alpha.mi.comcast.net | 0 | 72 | 72 | 0 | 137 | 280 | 110 | * |
| r-bravo.mi.comcast.net | 0 | 72 | 72 | 0 | 143 | 1150 | 1150 | * |
| r-charlie.mi.comcast.net | 2 | 72 | 71 | 0 | 144 | 1150 | 160 | |
| 12.116.16.25 | 0 | 71 | 71 | 0 | 133 | 330 | 110 | |
| r-alpha.cgcil.att.net | 0 | 71 | 71 | 0 | 135 | 330 | 170 | |
| r-alpha.phlpa.att.net | 0 | 71 | 71 | 50 | 135 | 220 | 160 | |
| r-beta.phlpa.att.net | 0 | 71 | 71 | 50 | 136 | 330 | 110 | |
| 12.119.53.118 | 0 | 71 | 71 | 0 | 137 | 330 | 170 | |
| r-alpha.pitb.telcove.net | 0 | 71 | 71 | 0 | 146 | 330 | 160 | |
| r-alpha.atln.telcove.net | 0 | 71 | 71 | 50 | 140 | 330 | 170 | |
| 24.56.107.70 | 0 | 71 | 71 | 110 | 154 | 390 | 110 | |
| No response from host | 100 | 71 | 0 | 0 | 0 | 0 | 0 | |
| r-alpha.cm1.peak-10.net | 0 | 71 | 71 | 50 | 196 | 330 | 220 | |
| 66.129.112.148 | 0 | 71 | 71 | 50 | 140 | 220 | 110 | |
| www.example.com | 0 | 71 | 71 | 50 | 149 | 330 | 160 | |
This is an actual WinMTR report sent to us by a customer. WinMTR works
similarly to tracert but keeps sending packets until
stopped.
Now, notice the two marked lines. These are still within the network of
the customer's ISP so
the problem was not something we could handle—the customer has to call his
ISP to complain.
We have another customer, on the other side of the world, who will
complain about the site being down, or being slow (another indication of a
possible network issue). We always have to remind this customer to send
in a WinMTR trace, and the majority of the time, it's due to some
trans-Atlantic connection that is slow, and not us.
So the next time you can't get to your site, you may want to run
ping and tracert and see if it's our problem, your
ISP's problem, or
something going on between the two.
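If you end up doing this often, the two commands can be wrapped in a small script. The sketch below only builds the right command lines for the computer it runs on; the function name is hypothetical, and the count of four pings on the Mac side is my assumption, chosen to match what Windows does by default.

```python
import platform


def diagnostic_commands(host, system=None):
    """Return the ping and traceroute command lines for this kind of computer."""
    system = system or platform.system()   # "Windows", "Darwin" (Mac), "Linux"
    if system == "Windows":
        return [["ping", "-n", "4", host], ["tracert", host]]
    # Mac and Linux: ping runs until stopped unless given a count with -c.
    return [["ping", "-c", "4", host], ["traceroute", host]]


for command in diagnostic_commands("www.pickint.net"):
    print(" ".join(command))
```

Each list can be handed to subprocess.run(command, capture_output=True, text=True) to actually run it and collect the text to paste into a ticket.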