Troubleshooting Network Connections
When there are network connectivity problems, it can be challenging to figure out exactly where the problem lies.
A good high-level overview is available in Admin → Status. When all is well, the status screen will look something like this:
All checks show up as green. But if things are not OK and some checks show up as red, what is the cause? Is it a DNS or routing issue, a cable problem, or something else?
If you're interested in how these status checks work, please check the FAQ article that describes them in detail here: Testing Network Performance.
There are many ways to test connectivity issues. Here we'll cover the most methodical one: stepping through most of the layers of the OSI model for data communications, one layer at a time.
Layer 1 - Physical Connection
The physical layer is very difficult to test from within a virtual appliance. The best we can do is check that the network interface configuration lists the interface status as "UP". Please hit F2 on the console and type `ifconfig`. You will see something similar to the screenshot below:
This lists the connected network interfaces; in this case there's only an eth0 interface. If you don't see UP listed on the line where the blue arrow points, something is wrong on the physical level. Potential problems include:
- Faulty hardware - network card, motherboard, cables, switches or switch ports.
- In virtual environments, the network interface may simply have been disabled.
In short, if the interface is not listed as UP, the problem is on the physical side of things.
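If you want to check the same flag without reading `ifconfig` output by eye, Linux exposes each interface's flags as a hexadecimal number in /sys/class/net/<interface>/flags. The Python sketch below is a generic Linux example, not a LiquidFiles command, and it assumes a standard sysfs layout. It decodes the UP bit, which is what `ifconfig` prints as UP, and the RUNNING bit, which Linux sets when the link is operational. `parse_flags` and `interface_flags` are hypothetical helper names.

```python
from pathlib import Path

# Hypothetical helpers for illustration; assumes a Linux host with sysfs.
# Flag bit values as defined in the Linux header <linux/if.h>.
IFF_UP = 0x1        # interface administratively enabled (ifconfig shows "UP")
IFF_RUNNING = 0x40  # link reported as operational (ifconfig shows "RUNNING")

def parse_flags(text):
    """Decode a flag word such as "0x1043" into UP/RUNNING booleans."""
    flags = int(text.strip(), 16)  # int() accepts the "0x" prefix with base 16
    return {"UP": bool(flags & IFF_UP), "RUNNING": bool(flags & IFF_RUNNING)}

def interface_flags(iface, sysfs_root="/sys/class/net"):
    """Read /sys/class/net/<iface>/flags and decode it."""
    return parse_flags((Path(sysfs_root) / iface / "flags").read_text())
```

For example, a healthy interface whose flags file reads 0x1043 (shown by newer versions of `ifconfig` as flags=4163) decodes to {'UP': True, 'RUNNING': True}; `interface_flags("eth0")` performs the same check against the live system.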
Layer 2 - Link Layer
If the physical layer is OK and the system is physically connected to the network, the next thing to test is the Link Layer. On the link layer, we are concerned with transmitting data on the local link, and in the overwhelming majority of cases these days that means Ethernet, at least for servers like LiquidFiles. For LiquidFiles, this usually means testing that the system can reach the next network hop, typically a firewall. To test this, hit F2 on the console and type `arping 172.16.5.1`, replacing 172.16.5.1 with the IP address of your firewall or whatever your default gateway IP address is. It will look something like this:
Please note that arping should not be confused with ICMP ping, which is what we normally mean by pinging. In plain English, what's happening in the screenshot above is that the LiquidFiles server asks: "Which MAC address should I use to reach the IP address 172.16.5.1?" The device with the IP address 172.16.5.1 responds: "To reach 172.16.5.1, use the MAC address 00:50:56:C0:00:08."
Please note that it is not possible to "firewall" this. Many companies block ICMP echo and ICMP echo reply (ping), but that is not what we're dealing with here; we will look at ICMP further up the network stack. If you don't see ARP responses like this, there is a problem on the link layer.
Problems on the link layer include:
- It could still be a physical problem with things like cables and switches, especially if you only see one or two responses out of three. If the network link is severely congested, there can be collisions on the network, which cause a lot of retransmits.
- A much more common problem is that the system is connected to the wrong physical or logical network. If the cable is connected to the 172.16.100.0/24 network, 172.16.5.1 will never respond to ARP queries; it simply isn't there, but in another network somewhere. Please reconnect the system to the correct network and try again.
- Another potential problem is an IP address conflict. Each IP address needs to be unique, and if two systems have the same IP address, you will run into problems.
Layer 3 - Network Layer
Testing IP connectivity
The first thing we'll test on the network layer is IP connectivity. Yes, we finally get to ping stuff. First, a bit of background. Consider the following very typical network:
On the Network Layer, we want to verify that basic IP routing works. For the LiquidFiles server to work in this network, it needs the following IP configuration:
- IP Address: 172.16.5.251
- Netmask: 255.255.255.0
- Default Gateway: 172.16.5.1
IP routing is the function of transmitting packets from one network to another. One of the most unfortunate names in IP networking is "Default Gateway"; it really should have been called "Default Router", since a router is the network function that performs routing, transmitting packets between networks. So by definition, any firewall is also a router.
The only other setting we haven't covered is the Netmask, which tells the host (the LiquidFiles server) the size of its network. When you see a reference such as 172.16.5.0/24 in the network diagram above, the /24 means a netmask of 24 bits (8 bits + 8 bits + 8 bits), or 255.255.255.0 in decimal notation. Please see this Wikipedia Article for a list of all possible network sizes and their netmasks. For the LiquidFiles server, this netmask means that anything beginning with 172.16.5. is in its own network, and anything not beginning with 172.16.5. is outside it.
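To make the netmask arithmetic concrete, here is a small Python sketch. `prefix_to_netmask` builds the mask bit by bit (24 one-bits followed by 8 zero-bits gives 255.255.255.0), and `in_network` uses Python's standard `ipaddress` module to answer the "is this in my network?" question. Both function names are illustrative only, and the addresses come from the example network.

```python
import ipaddress

# Illustrative helpers only; addresses are from the example network diagram.

def prefix_to_netmask(prefix_len):
    """Convert a prefix length such as 24 into a dotted-decimal netmask."""
    # Set the top prefix_len bits of a 32-bit number to 1, the rest to 0.
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    # Split the 32 bits into four 8-bit octets, most significant first.
    return ".".join(str((mask >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def in_network(address, network):
    """Return True if address lies inside network, e.g. "172.16.5.0/24"."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(network)

print(prefix_to_netmask(24))                        # 255.255.255.0
print(in_network("172.16.5.1", "172.16.5.0/24"))    # True
print(in_network("172.16.0.10", "172.16.5.0/24"))   # False
```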
In plain English, from the LiquidFiles server's point of view: When I'm connecting to something in my network (172.16.5.0/24), I can connect directly to that system using the Link Layer network described above. When I'm connecting to something outside my network, I should connect via the default gateway 172.16.5.1.
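The rule above can be sketched as a tiny decision function. This is a deliberate simplification: a real host consults a full routing table and picks the most specific matching route, whereas this hypothetical `next_hop` helper only models the single local network plus default gateway from the example. It also refuses a gateway that lies outside the local network, because the host would have no way to reach it directly.

```python
import ipaddress

# Hypothetical, simplified model: one local network plus one default gateway.
# A real host uses a routing table with longest-prefix matching.

def next_hop(destination, my_network="172.16.5.0/24", default_gateway="172.16.5.1"):
    """Return the address the host hands the packet to on the local link."""
    net = ipaddress.ip_network(my_network)
    if ipaddress.ip_address(default_gateway) not in net:
        # The gateway itself must be reachable directly on the local link.
        raise ValueError("default gateway must be inside the local network")
    if ipaddress.ip_address(destination) in net:
        return destination      # same network: deliver directly
    return default_gateway      # other network: send via the router

print(next_hop("172.16.5.20"))  # 172.16.5.20
print(next_hop("172.16.0.10"))  # 172.16.5.1
print(next_hop("8.8.8.8"))      # 172.16.5.1
```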
One thing that can confuse people early in their study of IP networks is why the default gateway can't be in another network. Imagine yourself in a room in a house with four closed doors. It doesn't help you to be told that somewhere there's a door to the outside world. To get anywhere, you need to be told which door leads out of this room (the next hop, or default gateway). Each room in the house is like an IP network, and IP routing describes how to get from room to room.
Anyway, back to testing. What we want to test is IP routing, and that the LiquidFiles system can reach the networks it needs to reach. In the network diagram, there's only one internal network, 172.16.0.0/24, besides the network the LiquidFiles server is in. You would only need to test two points: 172.16.0.10, because it's on the internal network, and something like 8.8.8.8 (one of Google's public DNS servers), because it's somewhere on the Internet.
On the LiquidFiles server, you can test this by hitting F2 on the console and typing `ping 172.16.0.10` and `ping 8.8.8.8` respectively. It should look something like this:
If you get responses like those in the screenshot above, IP routing works properly to all the networks this LiquidFiles server needs to reach.
If something doesn't work, it can be because of firewall issues. Most companies have a firewall as their default gateway, as in the network diagram above, and depending on your security policy, ping (ICMP echo and ICMP echo reply) may or may not be allowed. For testing and troubleshooting network issues like this, it's always good to allow at least outgoing ICMP echo from anywhere in your network. If you don't allow ping, this test can't be performed and you will have to skip to the next test. In that case, please be aware that a failure in the next test could also be caused by any of the IP routing problems described here.
If ping is permitted in your network but you can't ping the external server, the likely causes are:
- Wrong IP address, although this is unlikely if the link layer test above succeeded.
- Wrong netmask or default gateway.
- Firewall blocking the ping (ICMP echo or ICMP echo reply) packets.
- A problem with Network Address Translation (NAT). This can't really be tested from the LiquidFiles system, but if you can ping the internal network but not the external one, and you allow ping everywhere, it could well be related to NAT.
Testing DNS/IP name resolution
Another important part of IP connectivity is name resolution (DNS). DNS is the function that translates names such as liquidfiles.company.com to an IP address like 126.96.36.199. The easiest way to test this from the LiquidFiles system is to hit F2 on the console and type `ping 1.dnstest.liquidfiles.com`. It should look something like this:
There is a wildcard *.dnstest.liquidfiles.com DNS record, so any name under dnstest.liquidfiles.com resolves. This means you can use a name that isn't used anywhere else and is therefore unlikely to be cached along the way. When you need to test multiple times, just increment the number: 2.dnstest.liquidfiles.com, 3.dnstest.liquidfiles.com and so on.
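If you'd like to script the same check, the sketch below resolves a fresh test name through the operating system's resolver using Python's standard `socket.getaddrinfo`, which is the same kind of lookup `ping` performs before sending anything (the system resolver may also consult /etc/hosts first). `dnstest_name` and `resolve` are hypothetical helpers; only the dnstest.liquidfiles.com zone is taken from this article.

```python
import socket

# Hypothetical helpers for illustration; only the zone name comes from the article.

def dnstest_name(attempt, zone="dnstest.liquidfiles.com"):
    """Build a fresh test name such as 3.dnstest.liquidfiles.com."""
    return f"{attempt}.{zone}"

def resolve(name):
    """Resolve name via the system resolver and return unique IPv4 addresses.

    Raises socket.gaierror if the lookup fails.
    """
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})
```

For example, `resolve(dnstest_name(1))` should return one or more addresses when DNS works; a `socket.gaierror` points at the DNS problems discussed in the next paragraph.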
The only potential problem here (assuming everything above has worked) is that the DNS server is incorrectly configured, or that a firewall or similar device is blocking the DNS traffic.
Layer 4 - Transport Layer
The Transport Layer (TCP) can be tested in LiquidFiles by hitting F2 on the console and typing `tcping license.liquidfiles.com 80` and `tcping license.liquidfiles.com 443`. It should look something like this:
tcping sends TCP SYN packets to the specified host and TCP port and expects a TCP SYN/ACK in response. If everything has worked up until this point, by far the most common problem on the Transport (TCP) layer is a firewall blocking the connection.
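A rough equivalent can be written with Python's standard `socket.create_connection`. There is one difference: the `tcping` check described above looks for the SYN/ACK, while `create_connection` completes the full three-way handshake (SYN, SYN/ACK, ACK) before returning, so this sketch measures slightly more than a bare SYN probe. `tcp_connect_time` is a hypothetical helper name.

```python
import socket
import time

# Hypothetical helper for illustration; completes a full TCP handshake.

def tcp_connect_time(host, port, timeout=3.0):
    """Return the time in milliseconds taken to open a TCP connection.

    Raises an OSError subclass if the connection is refused, times out,
    or the host name cannot be resolved.
    """
    start = time.monotonic()
    # create_connection resolves the name and finishes the handshake,
    # then the with-block closes the socket again immediately.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.monotonic() - start) * 1000.0
```

For example, `tcp_connect_time("license.liquidfiles.com", 443)` returns a time in milliseconds when the port is reachable; a `ConnectionRefusedError` or a timeout is consistent with the firewall problem described above.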
Layer 5-7 - Application Layer
Layer 5 (Session) and Layer 6 (Presentation) never really translated from the OSI model into the IP network model, but the last layer, Layer 7 (the Application Layer), corresponds to protocols like HTTP, HTTPS and SMTP in the IP world. This is the layer tested by the status page at Admin → Status.
To test this, please hit F2 on the console and type `httpping` or `httpsping`. This will attempt to download data over HTTP or HTTPS respectively, and you should see a "pong" response as in the screenshot below.
What the test actually does is feed the raw text "pong" back. One of the things this verifies is that there's no proxy in the way that tries to inspect the traffic as HTTP or HTTPS and could potentially block the connection. If everything has worked up until this point, practically the only remaining problem is some form of proxy, content-inspecting firewall, or content-inspecting IPS in the way that is blocking the connection.
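The idea behind `httpping` can be sketched with Python's standard `urllib.request`: fetch a URL and check that the body is exactly the text "pong". The URL that the LiquidFiles test uses isn't given in this article, so the sketch takes it as a parameter; `is_pong` and `http_pong` are hypothetical names. Be aware that `urllib` honours proxy environment variables such as `http_proxy`, so run it without a proxy configured if you want to test the direct path.

```python
import urllib.request

# Hypothetical helpers for illustration; the target URL is an assumption
# you must supply, since the article does not name it.

def is_pong(body):
    """True if the raw response body is the plain text "pong"."""
    # Anything else (an HTML block page, an injected banner, an empty body)
    # suggests that something in the path altered the response.
    return body.decode("utf-8", errors="replace").strip() == "pong"

def http_pong(url, timeout=5.0):
    """Fetch url over HTTP or HTTPS and check for a "pong" body.

    Raises urllib.error.URLError (or a timeout error) if the request fails.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return is_pong(response.read())
```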
Well, there you have it: a complete test of network connectivity, from cable connectivity up to the application level. One of the really good things about using the OSI model this way is that you can test layers 1 to 7 in order, and you don't have to proceed until the current step succeeds. For example, unless the Link Layer (arping) test works, there is zero chance that the Network Layer (ping) test works. This also means that if the Network Layer (ping) test works, there's no need to test any of the previous layers: the Physical and Link Layer tests must work as well. All this makes it fairly easy to go back and test again if you need to for whatever reason.
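The bottom-up procedure can be summarized as a short loop: run each layer's check in order and stop at the first one that fails, since every layer above it depends on it. The sketch below is generic; the check functions are stand-ins you would replace with real tests, and `first_failing_layer` is a hypothetical name.

```python
# Hypothetical, generic sketch of the bottom-up troubleshooting order.

def first_failing_layer(checks):
    """Run (layer_name, check_function) pairs in order, Layer 1 first.

    Each check_function returns True on success. Returns the name of the
    first layer that fails, or None if every check passes.
    """
    for layer_name, check in checks:
        if not check():
            # Higher layers depend on this one, so there is no point
            # running them until this layer is fixed.
            return layer_name
    return None

# Example with stand-in checks; in practice each lambda would wrap a real test.
checks = [
    ("Layer 1 - Physical", lambda: True),
    ("Layer 2 - Link", lambda: True),
    ("Layer 3 - Network", lambda: False),
    ("Layer 4 - Transport", lambda: True),
]
print(first_failing_layer(checks))  # Layer 3 - Network
```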