Recently, an e-commerce website struggled with poor application performance during peak hours. Customers complained about slow page loads and delays, causing frustration and potential business loss. The application had always run smoothly, but suddenly, without any significant changes, response times worsened and users experienced noticeable delays. We suspected CPU overload was behind the poor application performance, so we investigated with the top command.
This article describes how we used top to analyze CPU usage, identify the bottlenecks, and fix the poor application performance.
Step 1: Starting top and Initial Observations – Insight into Poor Application Performance
To begin troubleshooting the application performance, we launched top on the affected server:
top - 15:23:45 up 7 days, 4:01, 2 users, load average: 8.32, 6.55, 5.48
Tasks: 210 total, 1 running, 209 sleeping, 0 stopped, 0 zombie
%Cpu(s): 50.0 us, 30.0 sy, 0.0 ni, 10.0 id, 10.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16384900 total, 235200 free, 15230000 used, 650000 buff/cache
KiB Swap: 4194300 total, 1048576 free, 3145724 used. 32000 avail Mem
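For reference, a similar snapshot can be captured non-interactively with top's batch mode (a sketch using flags from the procps-ng version of top; the log path is only an example):
top -b -n 1 | head -n 15                  # print a single snapshot of the summary and top tasks
top -b -n 1 >> /tmp/top-snapshot.log      # append a snapshot to a log file for later review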
Key observations:
- CPU Usage Breakdown:
- 50% User Time (us): Half of the CPU is spent on user-level processes, which likely include application-related tasks like handling web requests.
- 30% System Time (sy): A significant portion of the CPU is used by the kernel for system tasks such as managing hardware and I/O operations.
- 10% I/O-Wait (wa): The CPU spends 10% of the time waiting for I/O operations (e.g., disk reads/writes), which could be a significant cause of bad application performance due to storage delays.
- High Load Average (8.32):
- The 1-minute load average is 8.32. On a server with 8 CPU cores, this would indicate full utilization. However, if fewer than 8 cores are present, the system is likely overloaded, contributing to the poor application performance (a quick way to check the core count is shown after this list).
- Task Distribution:
- Only one task is actively running, while 209 tasks are sleeping. Most processes are waiting for I/O or other events rather than executing; on Linux, tasks blocked in uninterruptible I/O wait also count toward the load average, so queued requests are delayed and application performance suffers.
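To put the 8.32 load average into context, the number of available CPU cores and the current load can be checked directly (a quick check, assuming a standard Linux userland):
nproc                               # number of CPU cores available to the system
grep -c ^processor /proc/cpuinfo    # alternative core count from /proc
cat /proc/loadavg                   # 1-, 5- and 15-minute load averages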
Step 2: Deep Dive into CPU Time
To identify the cause of the poor application performance, we analyzed the CPU time in more detail.
- User Time (us) – 50%:
- Half of the CPU is occupied by user processes, likely the web server and application logic. Possible causes for the high usage include:
- High traffic causing an overload of web requests.
- Inefficient application logic or resource-heavy operations, such as image processing or unoptimized database queries.
- System Time (sy) – 30%:
- A high percentage of CPU time is used by the kernel, indicating intensive system operations. Possible causes:
- High-volume network activity.
- Intensive disk operations, which might contribute to the bad application performance if the server is constantly reading from or writing to disk.
- I/O-Wait (wa) – 10%:
- The CPU spends 10% of its time waiting for I/O operations. This often signals a disk bottleneck, which is a common cause of lousy application performance. Slow disk speeds, especially on traditional HDDs, can significantly affect response times.
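To confirm whether the I/O-Wait really pointed at a disk bottleneck, top can be complemented with the sysstat tools (a sketch; the sysstat package may need to be installed first):
iostat -x 1 5    # extended per-device statistics; watch %util and await for saturated disks
vmstat 1 5       # the 'wa' column shows the share of CPU time spent waiting on I/O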
Step 3: Identifying CPU-Heavy Processes
Next, we needed to identify which processes were causing the bad application performance by consuming the most CPU resources. We sorted the processes in top by CPU usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9876 www-data 20 0 128576 65412 11020 S 30.0 0.4 100:23.55 apache2
11234 mysql 20 0 323432 243432 54320 S 25.0 1.5 240:12.12 mysqld
10345 www-data 20 0 156432 65432 11230 S 20.0 0.5 120:13.22 apache2
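In an interactive top session this ordering is obtained by pressing Shift+P; the same view can also be reproduced non-interactively (a sketch, assuming the procps-ng version of top and ps):
top -b -n 1 -o %CPU | head -n 20    # batch-mode snapshot sorted by CPU usage
ps aux --sort=-%cpu | head -n 10    # alternative listing of the top CPU consumers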
Observations:
- The apache2 process (web server) consumed 30% of the CPU, which could be due to either high traffic or inefficient request handling, causing the poor application performance.
- The mysqld process (MySQL database) consumed 25% of the CPU, indicating that complex database queries or poor indexing might be slowing down responses.
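To see what was keeping mysqld busy, the currently running statements and the slow query log settings can be inspected (a sketch, assuming console access to MySQL; credentials and paths vary by setup):
mysql -e "SHOW FULL PROCESSLIST;"                    # statements currently executing on the server
mysql -e "SHOW VARIABLES LIKE 'slow_query_log%';"    # whether and where slow queries are logged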
Step 4: Optimizing the System to Resolve Poor Application Performance
After analyzing the bad application performance, we took the following actions:
- Optimizing the Web Application: We reviewed the application code for inefficiencies, especially in resource-heavy processes like image processing or excessive database queries. Implementing caching mechanisms for frequently requested data and optimizing database queries helped reduce CPU load and improve performance.
- System Configuration Adjustments: The high system time indicated inefficient use of system resources. We optimized network settings and tuned the server’s I/O scheduler (see the sketch after this list), reducing CPU usage by the kernel and freeing up resources for the application.
- Addressing I/O Bottlenecks: Since the I/O-Wait was a significant factor in the bad application performance, we upgraded the storage from HDD to SSD. This improved the speed of disk read/write operations, reducing I/O-Wait time and enhancing the web application’s overall response time.
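As an illustration of the I/O scheduler tuning mentioned above, the active scheduler can be inspected and switched at runtime (a sketch, assuming the data disk is /dev/sda; the schedulers offered depend on the kernel, and the change is not persistent across reboots):
cat /sys/block/sda/queue/scheduler                            # the scheduler shown in brackets is active
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler    # switch the active scheduler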
Conclusion
This real-world scenario demonstrated how poor application performance can stem from multiple sources, including high CPU usage by user processes, high system CPU usage, and significant I/O-Wait time. By using the top command to identify these issues, we implemented targeted optimizations in the application and the system, which successfully resolved the poor application performance. For long-term stability, continuous monitoring of CPU usage and proactive system optimization are essential to prevent the problem from recurring. The top command remains a valuable tool for diagnosing and resolving performance issues in real time.
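As a minimal example of such continuous monitoring, a cron entry can record periodic CPU snapshots for later review (a sketch; the file path and interval are illustrative):
# /etc/cron.d/cpu-snapshot – log a top snapshot every 5 minutes
*/5 * * * * root top -b -n 1 | head -n 15 >> /var/log/cpu-snapshots.log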