Day 14e – How to Debug Network Issues Like a DevOps Engineer

QA Engineer transitioning into DevOps with 13+ years of experience in software testing, automation, CI/CD, Docker, Kubernetes, and cloud technologies. Sharing real-world DevOps learning, hands-on projects, and career transformation experiences.

This is Part 5 of the Networking for DevOps series.

Previous: https://90-days-devops-with-shubham.hashnode.dev/day-14d-what-really-happens-when-you-hit-a-url-step-by-step

Hands-on Networking Checks

Target Used: google.com

1. Identity

hostname -I
ip addr

👉 These commands show all IP addresses and network interfaces on your machine

Think of your system like a house:

  • Each network interface = a door

  • Each IP address = an address assigned to that door

Breakdown of your output

1️⃣ hostname -I

172.71.56.167 172.67.5.9

What it means:

  • Your machine has multiple IPs

  • Most important one:

👉 172.71.56.167 → your main private IP (EC2 network), assigned to ens5
👉 172.67.5.9 → a second address configured on this machine. hostname -I lists only addresses assigned to local interfaces, so check ip addr to see which interface owns it

Observation:

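  • System has multiple IP addresses assigned

  • 172.71.x.x → treated here as the instance's private VPC address. Strictly, the RFC 1918 private block is 172.16.0.0 to 172.31.255.255, so a real EC2 private IP usually looks like 172.31.x.x (the exact range depends on the VPC CIDR)

  • Each address belongs to a network interface (physical, virtual, or bridge)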

2️⃣ ip addr (this is the real detailed one)

🔹 Interface 1: lo (Loopback)

1: lo:
inet 127.0.0.1/8

Meaning:

  • This is localhost

  • Used for internal communication inside the system

👉 Example:

curl localhost

Observation:

  • 127.0.0.1 is loopback → used for internal testing

  • Always present, never leaves the machine


🔹 Interface 2: ens5 (MAIN NETWORK)

2: ens5:
inet 172.71.56.167

Meaning:

  • This is your actual network interface (like WiFi/Ethernet)

  • Connected to AWS network

👉 This is the IP used when:

  • You SSH into server

  • Server connects to internet

Extra lines:

link/ether 06:cc:88:71:71:3f

👉 This is MAC address (Layer 2 identity)


state UP

👉 Interface is active

Observation:

  • ens5 is the primary active network interface

  • Has IP 172.71.56.167 → used for external communication

  • Interface is UP and functioning correctly


🔹 Interface 3: docker0

3: docker0:
inet 172.17.0.1
state DOWN

Meaning:

  • This is created by Docker

  • Used for container networking

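👉 172.17.0.1 = the host's address on the default Docker bridge network (it acts as the gateway for containers)


state DOWN

👉 No containers are attached to the bridge right now. docker0 normally switches to UP once a container starts on the default bridge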

Observation for docker0:

  • Docker bridge network exists (172.17.0.1)

  • Currently inactive (DOWN)

  • Will be used when containers run


Identity Observation for hostname -I and ip addr:

  • Multiple IP addresses observed using hostname -I

  • 172.71.56.167 is the primary private IP (EC2 network)

  • Loopback interface (127.0.0.1) is used for internal communication

  • ens5 is the main active network interface (UP)

  • Docker bridge network (172.17.0.1) exists but is currently DOWN


Super Simple Analogy

Think of it like this:

  • lo (127.0.0.1) → talking to yourself

  • ens5 (172.71...) → talking to internet

  • docker0 (172.17...) → talking to containers


Important Insight

  • Multiple IPs ≠ multiple machines

    • Shows the system has multiple IP assignments (can be due to interfaces, routing, or networking layers)
  • One machine can have:

    • internal IP

    • public IP

    • container network IP
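To double-check which of these addresses are really private, a small sketch like the one below can help. It assumes a POSIX shell with awk. The function name classify_ipv4 and the 172.31.0.10 address are made up for illustration, and the check only covers loopback (127.0.0.0/8) and the RFC 1918 private blocks.

```shell
#!/bin/sh
# classify_ipv4 is a hypothetical helper, not a standard tool.
# It labels an IPv4 address as loopback, private (RFC 1918), or not-private.
classify_ipv4() {
    printf '%s\n' "$1" | awk -F. '{
        if ($1 == 127)                              print "loopback"
        else if ($1 == 10)                          print "private"
        else if ($1 == 172 && $2 >= 16 && $2 <= 31) print "private"
        else if ($1 == 192 && $2 == 168)            print "private"
        else                                        print "not-private"
    }'
}

# Addresses from this walkthrough, plus 172.31.0.10 as an assumed
# default-VPC-style example that does not come from the real output.
for ip in 127.0.0.1 172.17.0.1 172.31.0.10 172.71.56.167 172.67.5.9; do
    printf '%-15s %s\n' "$ip" "$(classify_ipv4 "$ip")"
done
```

On a live machine you could loop over $(hostname -I) instead of the fixed list. Notice that 172.71.x.x and 172.67.x.x come back as not-private, because only 172.16 to 172.31 is reserved for private use. That suggests the addresses in this walkthrough are example values, and a real default-VPC address would look like 172.31.x.x.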


2. Reachability

ping -c 4 google.com

Why ping -c 4 google.com instead of ping google.com?

Key difference:

🔸ping google.com → runs forever (continuous)

🔸ping -c 4 google.com → sends only 4 packets and stops

👉 In tasks/labs, we use -c 4 because:

  • It’s controlled

  • Gives quick measurable output

  • Doesn’t require you to press Ctrl + C


What actually happened when you ran ping?

Think of it like this:

👉 Your server is asking:

“Hey Google, are you reachable?”

And Google replies 4 times:

“Yep, I’m here!”


Breakdown of your output

PING google.com (142.251.46.78)

👉 DNS worked → google.com converted to IP

👉 This means DNS is working fine

64 bytes from ... time=6.07 ms

👉 Each line = one reply

  • time=6 ms → very fast (excellent network)

  • ttl=117 → TTL indicates remaining hop limit (helps estimate distance)

4 packets transmitted, 4 received, 0% packet loss

👉 This is the MOST important line

  • Sent: 4

  • Received: 4

  • Loss: 0%

✅ Network is healthy


Latency = the time it takes for data to travel from your machine → server → back to you

So it’s actually:

⏱️ Round-trip time (RTT)

In your ping output

time=6.07 ms

👉 This means:

  • Your request went to Google

  • Google replied back

  • Total time taken = 6 milliseconds

Simple analogy

Think of it like:

You send a message → friend replies

  • If reply comes in 1 sec → low latency

  • If reply comes in 10 sec → high latency

In networking terms

| Latency | Meaning |
| --- | --- |
| Under 10 ms | Excellent |
| 10–50 ms | Good |
| 50–100 ms | Okay |
| Over 100 ms | Slow |

Your result: ~6 ms → Excellent
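If you want to apply these rule-of-thumb ranges in a script, here is a minimal sketch. It assumes a POSIX shell with awk, and rate_latency is a made-up helper name. The thresholds are simply the ones from the table, not an official standard.

```shell
#!/bin/sh
# rate_latency is a hypothetical helper: it maps a round-trip time in
# milliseconds to the rule-of-thumb labels used in this post.
rate_latency() {
    awk -v ms="$1" 'BEGIN {
        ms = ms + 0                      # force a numeric comparison
        if (ms < 10)       print "Excellent"
        else if (ms < 50)  print "Good"
        else if (ms < 100) print "Okay"
        else               print "Slow"
    }'
}

rate_latency 6.07    # the RTT from the ping output in this post
```

Boundary values go to the slower bucket here (exactly 10 ms counts as Good), which is just one reasonable choice.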


Important distinction

Latency ≠ Speed (bandwidth)

| Concept | Meaning |
| --- | --- |
| Latency | How long data takes to arrive (delay) |
| Bandwidth | How much data can move per second (capacity) |

Latency is the round-trip time taken for a packet to travel to the target and back, measured in milliseconds.

rtt min/avg/max = 6.07 / 6.11 / 6.15 ms

👉 Round Trip Time (RTT)

🔸 Min → fastest response

🔸 Avg → typical latency

🔸 Max → slowest response

👉 All ~6 ms → very stable


Reachability Observation

  • Successfully pinged google.com with 0% packet loss

  • Average latency ~6 ms indicating fast and stable network

  • DNS resolution worked correctly (domain resolved to IP)

  • Confirms the system has proper internet connectivity

Analogy

Ping is like:

📞 Calling someone 4 times

If they answer every time → network is good

If they miss calls → packet loss

Important Concepts

1. Ping uses:

👉 ICMP protocol (NOT TCP/UDP)

👉 Works at Network layer

2. What ping proves:

| Check | Result |
| --- | --- |
| Internet reachable | ✅ |
| DNS working | ✅ |
| Packet loss | None (0%) |
| Latency | ✅ Good |

3. What ping DOES NOT prove:

  • HTTP working

  • App working

  • Port open

👉 Only checks basic connectivity

Ping to google.com was successful with 0% packet loss and ~6 ms latency, confirming stable network connectivity and proper DNS resolution.
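The two numbers that matter most (packet loss and average RTT) can be pulled out of the ping summary automatically. The sketch below assumes a POSIX shell with awk and the Linux ping summary format. ping_summary is a made-up helper name, and the sample input is the summary as quoted in this post.

```shell
#!/bin/sh
# ping_summary is a hypothetical helper: it reads ping output on stdin
# and prints the packet loss and the average round-trip time.
ping_summary() {
    awk '
        /packet loss/ {
            # the loss figure is the field that ends with a percent sign
            for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i
        }
        /^rtt/ {
            split($0, halves, "=")         # numbers sit after the "="
            gsub(/[ ms]/, "", halves[2])   # drop spaces and the "ms" unit
            split(halves[2], rtt, "/")     # order is min/avg/max
            avg = rtt[2]
        }
        END { printf "loss=%s avg_ms=%s\n", loss, avg }
    '
}

# Summary lines as quoted in this post.
ping_summary <<'EOF'
4 packets transmitted, 4 received, 0% packet loss
rtt min/avg/max = 6.07 / 6.11 / 6.15 ms
EOF
```

Against a live target the same function could be used as ping -c 4 google.com | ping_summary.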


3. Network Path

traceroute google.com

or

tracepath google.com

What is traceroute doing?

You traced the path from your EC2 → Google

👉 And it showed each router (hop) your packet passed through.


Step-by-step breakdown

First line

traceroute to google.com (142.251.46.78)

👉 DNS worked again

👉 google.com → 142.251.46.78

Note: (IP ranges may vary depending on cloud/internal routing)

Hop 1

 1 242.16.82.235 ... ~6 ms

👉 First router after your machine
👉 Part of the cloud provider's internal routing: 242.x.x.x sits in the reserved 240.0.0.0/4 block, which is not routed on the public internet

  • ~6 ms → fast, normal

Hop 2

2  * * *

👉 No response

NOT an error
👉 The router did not send back an ICMP "time exceeded" reply (filtered or rate-limited), but it still forwarded the packet


Hop 3

3 99.83.117.221 ... ~6 ms

👉 Traffic moving through internet backbone


Hop 4

4  * * *

👉 Again hidden router (normal)


Hop 5 (Destination reached)

5  72.14.232.192
   pnseab-ad-in-f14.1e100.net (142.251.46.78)

👉 This is Google’s server

  • 1e100.net → owned by Google

  • Latency ~6–7 ms → excellent

Important things you should notice

1. Multiple IPs in same hop
242.16.82.235 ... 242.4.194.71 ...

👉 Means:

  • Different paths (load balancing)

  • Network is dynamic

2. * * * hops

👉 Means:

  • Router didn’t reply

  • But packet still moved forward

3. Only 5 hops

👉 Very short path → likely because:

  • You’re in a cloud network (AWS)

  • AWS and Google networks are probably connected directly (peering)

Simple analogy

Imagine:

Package traveling:

  • Warehouse → sorting center → highway → destination

  • Some checkpoints are visible

  • Some are hidden


Path Observation

  • Traceroute shows ~5 hops from EC2 instance to destination

  • Initial latency ~6 ms remains stable across hops

  • Some hops return * * * due to ICMP filtering (expected behavior)

  • Destination reached successfully (google.com)

  • Presence of multiple IPs in a single hop indicates load-balanced routing

Traceroute shows a short path (~5 hops) with stable ~6 ms latency; some hops are hidden due to ICMP filtering, and the destination is successfully reached.

Insight

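  • If traceroute never reaches the destination (every remaining hop is * * *) → possible network issue or filtering; confirm with ping or curl

  • If traceroute reaches destination but app fails → app issue

  • If latency jumps at a hop and stays high on every later hop → bottleneck at or near that hop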
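These questions (how many hops, how many silent ones, did it reach the target) can be answered by a small parser. The sketch below assumes a POSIX shell with awk. hop_stats is a made-up helper name, and the sample input is a simplified version of the output discussed above, not a verbatim capture.

```shell
#!/bin/sh
# hop_stats is a hypothetical helper.
# $1 = destination IP; traceroute output is read from stdin.
hop_stats() {
    awk -v dest="$1" '
        $1 ~ /^[0-9]+$/ {                    # hop lines start with a hop number
            hops = $1 + 0
            if ($2 == "*") silent++          # "* * *" means no reply from that hop
            if (index($0, dest) > 0) reached = "yes"
        }
        END {
            printf "hops=%d silent=%d reached=%s\n",
                   hops, silent, (reached == "" ? "no" : reached)
        }
    '
}

# Simplified sample modelled on the hops in this post (latencies rounded).
hop_stats 142.251.46.78 <<'EOF'
traceroute to google.com (142.251.46.78)
 1  242.16.82.235  6 ms
 2  * * *
 3  99.83.117.221  6 ms
 4  * * *
 5  142.251.46.78  6 ms
EOF
```

The header line is skipped on purpose, because it also contains the destination IP and would otherwise count as "reached". On a live system, traceroute -n google.com piped into hop_stats with the resolved IP should work the same way.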


What is tracepath doing?

👉 It shows how your request travels across the internet hop by hop

Think of it like:

A packet going from your server → Google
…and each stop in between is a router (hop)


Breakdown of your output

1?: [LOCALHOST] pmtu 9001

👉 Starting point (your machine)

  • pmtu 9001 → path MTU, the largest packet size that fits without fragmentation (AWS uses 9001-byte jumbo frames inside the VPC)

1: ip-172-71-56-167.us-west-2.compute.internal 0.075ms

👉 Still your own instance, not a router yet (shown here with an example AWS-style internal hostname)

  • The hostname encodes the instance's own private IP (172.71.56.167)

  • Sub-millisecond latency → the reply came from the local machine

Note: AWS internal hostnames follow the instance's private IP. Default VPCs use 172.31.x.x, but the actual range depends on the VPC CIDR configuration, not a fixed standard.


1: 242.16.82.237 6.813ms

👉 Now traffic is moving outside your local network

  • Latency increased → normal

2: no reply

👉 This is IMPORTANT:

  • Router exists but did not send an ICMP reply to the probe

This is NORMAL (not an error)


3: 99.83.117.221 6.222ms

👉 Another hop (likely ISP / backbone network)


4–7: no reply

👉 Multiple routers not replying


Important concept

👉 no reply DOES NOT mean failure

It means:

  • Router is configured to ignore trace requests

  • For security reasons


Simple analogy

Imagine:

You’re tracking a delivery truck

  • Some checkpoints show location

  • Some checkpoints are hidden

👉 But the truck is still moving!


Path Observation

  • Traffic passes through multiple network hops before reaching destination

  • Initial hops show low latency (~0.07 ms → 6 ms), indicating normal routing

  • Some hops show “no reply” due to ICMP filtering (expected behavior)

  • Indicates packets are traversing internal AWS network and external internet

Tracepath shows multiple network hops with increasing latency; some hops do not respond due to ICMP filtering, which is normal in real-world networks.


traceroute/tracepath is one of those things that looks complex but is actually simple once you see the pattern

Core idea

👉 Both traceroute and tracepath show the path your data takes to reach a destination

That’s it.

Simple analogy

Imagine you’re traveling from Mumbai → Goa:

  • You pass through tolls

  • Each toll = hop (router)

👉 These commands show:

“Which toll booths did my packet cross, and how long did each take?”


What is a “hop”?

👉 A hop = one router in the path

Example:

Your EC2 → Router 1 → Router 2 → Router 3 → Google

👉 That = 4 hops (3 routers + the destination, which traceroute lists as the final hop)

Now your actual output

1 → router (AWS)         ~6 ms
2 → * * *                (hidden)
3 → router               ~6 ms
4 → * * *                (hidden)
5 → google.com           ~6 ms ✅

Biggest confusion: * * *

You’re probably thinking:

“Something is broken?”

👉 NO.

It just means:

That router is not replying

But your packet still passed through


Real-world truth

👉 Many routers (including those used by Google and cloud providers) block traceroute replies for security

So:

  • * * * = hidden hop, not failure

traceroute vs tracepath (simple difference)

| Feature | traceroute | tracepath |
| --- | --- | --- |
| Needs install | Yes | No |
| Detail level | More | Less |
| Output style | Complex | Cleaner |
| Use case | Deep debugging | Quick check |

👉 For above task: both are fine


What you actually need to understand (not overthink)

You’re NOT expected to memorize routers.

You just need to answer:

  1. Did it reach destination?

👉 YES

  2. How many hops?

👉 ~5 hops

  3. Latency stable?

👉 Yes (~6 ms)

  4. Any failures?

👉 No (only ICMP filtering)


Path Observation

  • Network path consists of multiple hops (~5)

  • Latency remains stable (~6 ms) across hops

  • Some hops show * * * due to ICMP filtering (normal behavior)

  • Destination (google.com) is successfully reached

Traceroute shows how packets travel through multiple routers (hops) to reach the destination, with some hops hidden due to security filtering.

Final clarity

👉 You are NOT tracing the exact road

👉 You are just getting a rough idea of the journey


4. Listening Ports

ss -tulpn

What is ss -tulpn showing?

👉 It shows:

“Which ports are open and which services are listening on your system”

Think of it like:

Your server = building
Ports = doors
Services = people waiting at doors


Focus only on IMPORTANT parts (ignore noise)

You don’t need to understand every line. Just extract key ones

Key lines from output

1️⃣ SSH (MOST IMPORTANT)

tcp LISTEN 0.0.0.0:22

👉 Port 22 = SSH

  • This is how you connected to your EC2

  • 0.0.0.0 → Accessible from all network interfaces (may still be restricted by firewall/security groups)


2️⃣ DNS (local resolver)

127.0.0.53:53
127.0.0.54:53

👉 Port 53 = DNS

  • Used for resolving domains (like google.com → IP)

  • Running locally on your system


3️⃣ DHCP

172.71.56.167:68

👉 Port 68 (DHCP client) used for IP assignment


4️⃣ NTP (time sync)

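127.0.0.1:323

👉 Port 323 (UDP) = chrony's local command port

  • Used by chronyc to talk to chronyd, the daemon that keeps system time synced

  • NTP traffic itself uses UDP port 123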

5️⃣ Custom / unknown ports

4330
44321
44322
44323
46283

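👉 These are:

  • Custom services, not well-known ports

  • Could be:

    • apps

    • background services (4330 and 44321–44323 match the default ports of Performance Co-Pilot: pmlogger, pmcd, pmproxy)

    • ephemeral ports (46283 falls in Linux's default ephemeral range, 32768–60999)

  • Run sudo ss -tulpn to see which process owns each port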


Important concepts

🔹 LISTEN means:

👉 Service is waiting for connections


🔹 TCP vs UDP

  • tcp → reliable, connection-based (SSH, HTTP)

  • udp → connectionless, low overhead (DNS, DHCP)


🔹 0.0.0.0 vs 127.0.0.1

| Address | Meaning |
| --- | --- |
| 0.0.0.0 | Listening on all interfaces (access still depends on firewall/security groups) |
| 127.0.0.1 | Only inside machine |
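The bind address tells you how exposed a listening port is. Here is a minimal sketch that classifies a Local Address value from ss or netstat. It assumes a POSIX shell, and bind_scope is a made-up helper name.

```shell
#!/bin/sh
# bind_scope is a hypothetical helper: it takes an "address:port" value
# from the Local Address column and reports how widely it is bound.
bind_scope() {
    addr=${1%:*}                    # strip the trailing :port
    case "$addr" in
        0.0.0.0|'*'|'[::]') echo "all-interfaces" ;;
        127.*|'[::1]')      echo "loopback-only" ;;
        *)                  echo "single-address" ;;
    esac
}

# Local addresses taken from the output in this post.
for sock in 0.0.0.0:22 127.0.0.53:53 127.0.0.1:323 172.71.56.167:68; do
    printf '%-18s %s\n' "$sock" "$(bind_scope "$sock")"
done
```

Keep in mind that "all-interfaces" only describes the socket. Whether the port is reachable from outside still depends on the firewall and the EC2 security group.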

Ports & Services Observation

  • SSH service is running on port 22 and listening on all interfaces

  • DNS resolver is active on port 53 (localhost)

  • DHCP client is using port 68 for IP assignment

  • chrony (time sync daemon) is listening on its local command port 323; NTP itself uses port 123

  • Multiple custom ports are listening, indicating background or application services


Multiple services are listening on different ports including SSH (22), DNS (53), and DHCP (68), confirming active network services on the system.


Simple clarity

  • Port 22 → admin entry (SSH)

  • Port 53 → phonebook (DNS)

  • Port 68 → address assignment (DHCP)

  • Other ports → internal apps


About netstat

You saw:

ubuntu@ip-172-71-56-167:~$ netstat -tulpn
(No info could be read for "-p": geteuid()=1000 but you should be root.)

👉 Because you didn’t use sudo

If you run:

sudo netstat -tulpn

👉 You’ll see process names too


Observations:

Real ports from your system:

| Port | Service (Interpretation) | Notes |
| --- | --- | --- |
| 22 | SSH | Remote login to your EC2 |
| 53 | DNS resolver | Local DNS (systemd-resolved) |
| 68 | DHCP | IP assignment from network |
| 323 | chrony command port | Local control of the time sync daemon (NTP itself uses 123) |
| 4330 | Custom service | Some app running |
| 44321–44323 | Custom ports | Likely app / ephemeral services |
| 46283 | Local process | Internal app (localhost only) |

Ports & Services Observation:

  • Port 22 is open → SSH service is running (remote access)

  • Port 53 is open → local DNS resolver active

  • Port 68 (UDP) → DHCP client (IP assignment)

  • Port 323 (UDP) → chrony command port (time sync daemon; NTP itself uses UDP 123)

  • Ports 4330, 44321–44323 → custom/local services running

  • Port 46283 → local application bound to localhost

👉 No standard web ports (80/443) were observed, meaning no web server is listening on them right now.


5. DNS Resolution

dig google.com

What just happened

You asked:

dig google.com

👉 Your system asked a DNS server:

“What is the IP of google.com?”

👉 DNS replied:

“It is 142.250.69.174”


Understanding output

✅ 1. The important answer

google.com.   82   IN   A   142.250.69.174

👉 Meaning:

  • google.com. → domain name

  • 82 → TTL in seconds (how long the answer may be cached)

  • IN → Internet class

  • A record → IPv4 address

  • 142.250.69.174 → actual IP
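The answer line has a fixed column order (name, TTL, class, type, value), so it is easy to pick apart in a script. The sketch below assumes a POSIX shell with awk. parse_a_records is a made-up helper name, and the sample line is the one shown in this post.

```shell
#!/bin/sh
# parse_a_records is a hypothetical helper: it reads dig output on stdin
# and prints name, TTL, and address for every A record it finds.
parse_a_records() {
    awk '$3 == "IN" && $4 == "A" {
        printf "name=%s ttl=%s addr=%s\n", $1, $2, $5
    }'
}

# Answer line as shown in this post.
printf '%s\n' 'google.com.   82   IN   A   142.250.69.174' | parse_a_records
```

With a live query, dig +noall +answer google.com | parse_a_records keeps only the answer section before parsing. Lines that start with a semicolon (dig's comments and the question section) do not match the filter, so the full dig output works too.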


✅ 2. Query time

Query time: 2 msec

👉 DNS responded in 2 milliseconds

✔ Very fast

✔ DNS is healthy


✅ 3. Which DNS server answered?

SERVER: 127.0.0.53#53

👉 This is important:

  • 127.0.0.53 = local DNS resolver (systemd-resolved)

  • Your system didn’t directly ask Google DNS (8.8.8.8)

  • It asked a local service, which then resolved it


✅ 4. From nslookup

Address: 142.250.69.174   (IPv4)
Address: 2607:f8b0:400a:801::200e   (IPv6)

👉 This shows:

  • Google has multiple IPs

  • IPv4 + IPv6


Key concept

👉 DNS can return:

  • One IP ✅

  • Multiple IPs ✅ (load balancing)

Google often rotates IPs depending on location.


DNS Resolution Observation:

  • google.com resolved to IP 142.250.69.174

  • Query time was ~2 ms, indicating fast DNS response

  • Resolver used: 127.0.0.53 (local system DNS)

  • nslookup also returned an IPv6 address, showing dual-stack support

  • Confirms DNS resolution is working correctly


Analogy

Think of DNS like a phonebook:

  • You search: “google.com”

  • DNS gives: “📞 142.250.69.174”

👉 Your system can’t call names — it calls IP addresses


Insight

👉 “The system uses a local DNS resolver (127.0.0.53), which forwards queries instead of directly contacting external DNS servers.”


6. HTTP / HTTPS Check

curl -I https://google.com

What you did

curl -I https://google.com

👉 You asked:

“Hey server, just give me the headers, not the full page.”


Step-by-step understanding

✅ First response

HTTP/2 301
location: https://www.google.com/

👉 Meaning:

  • 301 = Redirect

  • Google is saying:

“Go to 👉 https://www.google.com instead”


✅ When you used -IL

curl -IL https://google.com

👉 This follows redirects automatically

🔹 First response (same as before)

HTTP/2 301
location: https://www.google.com/

🔹 Second response (final)

HTTP/2 200

👉 Meaning:

  • 200 = OK

  • The final URL responded successfully (headers only, because -I sends a HEAD request)
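When there are several redirects, it helps to see only the chain of status codes. The sketch below assumes a POSIX shell with awk. status_chain is a made-up helper name, and the sample input is the set of header lines shown in this post.

```shell
#!/bin/sh
# status_chain is a hypothetical helper: it reads curl -I / -IL header
# output on stdin and prints the status codes in the order they arrived.
status_chain() {
    awk '
        { gsub(/\r/, "") }                  # curl prints CRLF line endings
        /^HTTP\// { chain = chain (chain == "" ? "" : " -> ") $2 }
        END { print chain }
    '
}

# Header lines as shown in this post.
status_chain <<'EOF'
HTTP/2 301
location: https://www.google.com/

HTTP/2 200
EOF
```

On a live system, curl -sIL https://google.com | status_chain would feed it real headers (-s only hides the progress output).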

What this tells you

This single command confirms:

✅ DNS is working

✅ TCP connection is working

✅ TLS (HTTPS) is working

✅ Server is reachable

✅ Application is responding

Real-world analogy

Think of this like visiting a shop:

  1. You go to: google.com

  2. Shop says:
    👉 “We moved → go to www.google.com” (301)

  3. You go there

  4. Shop opens door
    👉 “Welcome” (200 OK)


Important headers (simple meaning)

You don’t need all, just key ones:

location

→ redirect URL

server: gws

→ Google Web Server

set-cookie

→ browser session/cookies being set

cache-control

→ how response should be cached


HTTP Check Observation:

  • Initial request returned 301 redirect to https://www.google.com

  • Final response returned 200 OK, confirming successful request

  • Response uses HTTP/2

  • Headers include cookies, cache-control, and security policies

  • Confirms application layer (HTTP/HTTPS) is working correctly


👉 “Using curl -IL helps trace full request flow including redirects and final response.”

👉 “curl -I quickly verifies application-layer health without downloading full content.”


7. Connections Snapshot

netstat -an | head

What this command shows

netstat -an | head
Active Internet connections (servers and established) 
Proto Recv-Q Send-Q Local Address Foreign Address State 
tcp 0 0 127.0.0.54:53 0.0.0.0:* LISTEN 
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 
tcp 0 0 0.0.0.0:4330 0.0.0.0:* LISTEN 
tcp 0 0 0.0.0.0:44322 0.0.0.0:* LISTEN 
tcp 0 0 0.0.0.0:44323 0.0.0.0:* LISTEN 
tcp 0 0 0.0.0.0:44321 0.0.0.0:* LISTEN 
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 
tcp 0 0 127.0.0.1:46283 0.0.0.0:* LISTEN 

👉 It gives a snapshot of network connections

  • LISTEN → waiting for connections

  • ESTABLISHED → active communication


Your output (what it actually means)

You got only:

LISTEN

👉 That means:

The first lines show services waiting for connections. Active connections, if any, appear further down in the full output


Example:

tcp  0  0  0.0.0.0:22  0.0.0.0:*  LISTEN

👉 Breakdown:

  • tcp → protocol

  • 0.0.0.0:22 → listening on all IPs, port 22

  • LISTEN → waiting for incoming SSH connections


What you should notice

From your output:

  • Multiple ports are in LISTEN state

  • No ESTABLISHED connections in first few lines

👉 Meaning:

✔ Server is ready

❌ No active traffic visible in these lines (your own SSH session would appear as ESTABLISHED further down)


Important detail

You used:

head

👉 So you only saw top lines, not full output

👉 No active connections visible in the shown output (full output may differ)

👉 There might be ESTABLISHED connections below


Connections Snapshot Observation:

  • Multiple services are in LISTEN state (ports 22, 53, 4330, 44321–44323)

  • No ESTABLISHED connections observed in the first few lines

  • Indicates system is ready to accept connections; active sessions are not visible in the truncated output

  • Output is truncated (head), so full connection state may include more entries
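Instead of eyeballing the first few lines, you can count sockets per state across the whole output. The sketch below assumes a POSIX shell with awk. count_states is a made-up helper name, and the sample input is the netstat output shown in this post.

```shell
#!/bin/sh
# count_states is a hypothetical helper: it reads netstat -an output on
# stdin and counts TCP sockets in the LISTEN and ESTABLISHED states.
count_states() {
    awk '
        $1 ~ /^tcp/ { n[$NF]++ }            # the state is the last column
        END { printf "LISTEN=%d ESTABLISHED=%d\n", n["LISTEN"], n["ESTABLISHED"] }
    '
}

# Output as shown in this post (first lines only, because of head).
count_states <<'EOF'
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.54:53 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:4330 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:44322 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:44323 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:44321 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:46283 0.0.0.0:* LISTEN
EOF
```

Running netstat -an | count_states without head covers every socket, so any ESTABLISHED connections further down get counted as well.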


Simple analogy

Think of this like:

  • LISTEN → shop is open, waiting for customer

  • ESTABLISHED → customers inside the shop

👉 Your shop is open, but currently empty


👉 “netstat -an provides a quick snapshot of system networking state, useful to detect active vs idle connections.”


What is nc -zv ?

Command:

nc -zv localhost 22

Meaning:

  • nc = Netcat (network testing tool)

  • -z = just check port (don’t send data)

  • -v = show result clearly (verbose)

👉 In simple words:

“Check if this port is open and reachable”


Your Result:

Connection to localhost (127.0.0.1) 22 port [tcp/ssh] succeeded!

Meaning:

✔ Port 22 is open

✔ Something is accepting TCP connections on it (here, the SSH daemon)

✔ You can connect to it


Next: https://90-days-devops-with-shubham.hashnode.dev/day-14f-mini-task-port-probe-interpretation

#90DaysOfDevOps #DevOpsKaJosh #TrainWithShubham