Linux system administrators should be proficient in Linux performance monitoring and tuning. This article gives a high-level overview of how to approach performance monitoring and tuning in Linux, and of the various subsystems (and performance metrics) that need to be monitored.
To identify system bottlenecks and come up with solutions to fix them, you should understand how the various components of Linux work. For example: how the kernel gives preference to one process over others using nice values, how I/O interrupts are handled, how memory management works, how the Linux file system works, how the network layer is implemented in Linux, and so on.
Please note that understanding how the various components (or subsystems) work is not the same as knowing which command to execute to get a certain output. For example, you might know that the “uptime” or “top” command gives the “load average”. But if you don’t know what it means, and how the CPU (or process) subsystem works, you might not be able to interpret it properly. Understanding the subsystems is an ongoing task; you will keep learning about them over time.
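To make the load average example concrete, here is a minimal sketch that reads the same three numbers that “uptime” and “top” report. It assumes a Linux system that exposes /proc/loadavg. The helper names (parse_loadavg, load_per_cpu) are made up for this illustration, and dividing by the CPU count is only a rough rule of thumb, not a definitive health check.

```python
import os

LOADAVG_PATH = "/proc/loadavg"  # Linux-specific; assumed to exist on the target host


def parse_loadavg(text):
    """Parse one line in /proc/loadavg format, e.g. '0.52 0.41 0.36 2/613 12345'."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1m": float(fields[0]),
        "load_5m": float(fields[1]),
        "load_15m": float(fields[2]),
        "runnable_entities": int(runnable),  # tasks runnable right now
        "total_entities": int(total),        # all tasks that currently exist
    }


def load_per_cpu(load, cpu_count):
    """Normalize a load average by the number of CPUs (rough rule of thumb)."""
    return load / cpu_count


# Only touch /proc when run directly on a system that provides it.
if __name__ == "__main__" and os.path.exists(LOADAVG_PATH):
    with open(LOADAVG_PATH) as f:
        info = parse_loadavg(f.read())
    cpus = os.cpu_count() or 1
    print("1-minute load average:", info["load_1m"])
    print("per CPU:", round(load_per_cpu(info["load_1m"], cpus), 2))
```

For example, a 1-minute load average of 2.0 on a 4-CPU machine works out to 0.5 per CPU. Keep in mind that on Linux the load average also counts tasks in uninterruptible sleep (typically waiting on I/O), so a high value does not always point to the CPU.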
At a very high level, the following four subsystems need to be monitored:
- CPU
- Memory
- I/O
- Network
1. CPU
You should understand the four critical performance metrics for the CPU: context switches, run queue, CPU utilization, and load average.
Context Switch
- When the CPU switches from one process (or thread) to another, it is called a context switch.
- When a context switch happens, the kernel saves the current CPU state (such as registers and the program counter) of the outgoing process or thread in memory.
- The kernel then retrieves the previously saved state of the incoming process or thread from memory and loads it onto the CPU.
- Context switching is essential for multitasking.
- However, a high rate of context switching can cause performance issues, because time spent saving and restoring state is time not spent running processes.
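As a rough way to observe the context switch rate, the sketch below samples the cumulative "ctxt" counter in /proc/stat twice and divides the difference by the interval. It assumes a Linux system with /proc mounted. The function names (parse_ctxt, ctxt_rate, sample_ctxt_rate) are illustrative only, and what counts as a "high" rate depends on the workload and hardware.

```python
import os
import time

STAT_PATH = "/proc/stat"  # Linux-specific; assumed to exist on the target host


def parse_ctxt(stat_text):
    """Return the cumulative context-switch count from /proc/stat-style text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no 'ctxt' line found")


def ctxt_rate(before, after, seconds):
    """Context switches per second between two cumulative samples."""
    return (after - before) / seconds


def sample_ctxt_rate(interval=1.0, path=STAT_PATH):
    """Read the counter twice, `interval` seconds apart, and return the rate."""
    with open(path) as f:
        first = parse_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_ctxt(f.read())
    return ctxt_rate(first, second, interval)


# Only touch /proc when run directly on a system that provides it.
if __name__ == "__main__" and os.path.exists(STAT_PATH):
    print("context switches/sec:", sample_ctxt_rate())
```

The "cs" column of vmstat reports the same kind of per-second figure, so the two can be compared as a sanity check.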
Run Queue
- The run queue length indicates the total number of active (runnable) processes currently queued for the CPU.
- When the CPU is ready to execute a process, it picks one from the run queue based on process priority.