I believe this "CPU cycle stealing" terminology came into being when hypervisors like Xen were being developed and the programmers wanted a way to account for the CPU cycles that were allocated to another partition. I suspect the programmers were looking at it from the perspective of "my partition", where something devious and nefarious was daring to steal my CPU cycles. Thus the term "stolen CPU cycles". Just guessing though.
This "steal" term is a tad unfortunate. It's been suggested that a gentler term such as "sharing" would be preferable for customers. But digging around the source code, I found the term "steal" is fairly pervasive. And what's in the code tends to end up in the man pages. Ah well.
With Power hardware, there's a mode where the two hardware threads are juggled by the Linux scheduler. This is implemented via CPU pairs (for example, cpu0 and cpu1) which represent the individual schedulable hardware threads running on a single processor core. This is the SMT (simultaneous multithreading) mode on Power.
- The term "hardware thread" is with respect to the processor core. Each processor core can have two active hardware threads. Software threads and software processes are scheduled on the processor cores by the operating system via the schedulable CPUs which correspond to the two hardware threads.
From a performance perspective, this has tremendous advantages because the processor core can flip between the hardware threads as soon as one thread hits a short wait for things like memory accesses. Essentially, the processor core can fetch instructions and perform memory accesses for the two hardware threads simultaneously, which improves the efficiency of the core.
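To see which CPU numbers are paired on a given core, the kernel's sysfs topology files can be read directly. This is only a minimal sketch: it assumes the usual Linux layout under /sys/devices/system/cpu, and the function name smt_siblings is made up for this example.

```shell
#!/bin/sh
# Print each distinct group of SMT sibling CPUs, one group per line.
# The sysfs CPU directory can be passed as an optional argument; the
# default below is the standard Linux location (an assumption here).
smt_siblings() {
    cpu_root="${1:-/sys/devices/system/cpu}"
    for f in "$cpu_root"/cpu[0-9]*/topology/thread_siblings_list; do
        # Skip CPUs without topology information (or an unmatched glob).
        if [ -r "$f" ]; then
            cat "$f"
        fi
    done | sort -u    # siblings report the same list, so keep one copy
}

smt_siblings
```

With SMT on, each line of output should name the CPUs that share one core; with SMT off, each CPU would be expected to appear on a line by itself.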
In days of old, each CPU's metrics were generally based on the premise that a CPU could get to 100% user busy. Now, the new steal column can account for the processor cycles being shared by the two SMT sibling threads, not to mention additional CPU cycles being shared with other partitions. It's still possible for an individual CPU to go to 100% user busy, while the SMT sibling thread is idle.
For example, in the vmstat output below, the rightmost CPU column is the steal column. On an idle system, this value isn't very meaningful.
# vmstat 1
procs -----------memory------------ ---swap-- ----io----- -system-- --------cpu--------
 r  b   swpd     free   buff  cache   si   so    bi    bo   in   cs  us  sy  id  wa  st
 0  0      0 14578432 408768 943616    0    0     0     0    2    5   0   0 100   0   0
 0  0      0 14578368 408768 943616    0    0     0     0   25   44   0   0 100   0   0
 0  0      0 14578432 408768 943616    0    0     0    32   12   44   0   0 100   0   0
 0  0      0 14578432 408768 943616    0    0     0     0   21   45   0   0 100   0   0
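The "st" figure can also be worked out by hand from the counters in /proc/stat, where steal is the eighth value after the "cpu" label. Here is a rough sketch of that arithmetic; the function name steal_pct is invented for illustration, and the one-second sampling interval is an arbitrary choice.

```shell
#!/bin/sh
# steal_pct BEFORE AFTER
# Each argument is an aggregate "cpu ..." line taken from /proc/stat.
# Prints the share of elapsed CPU time that was accounted as steal.
steal_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9 are user, nice, system, idle, iowait, irq,
            # softirq and steal. Guest time is left out because the
            # kernel already counts it inside user and nice.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            steal = $9
        }
        NR == 1 { t0 = total; s0 = steal }
        NR == 2 {
            dt = total - t0
            if (dt > 0) printf "%.1f\n", 100 * (steal - s0) / dt
            else print "0.0"
        }'
}

# Take two samples one second apart, if /proc/stat is available.
if [ -r /proc/stat ]; then
    before=$(grep '^cpu ' /proc/stat)
    sleep 1
    after=$(grep '^cpu ' /proc/stat)
    echo "steal: $(steal_pct "$before" "$after")%"
fi
```

The same calculation works on the per-CPU lines (cpu0, cpu1, and so on) if those are passed in instead of the aggregate line.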
In the next example, we push do-nothing work onto every CPU (in this case a four-core system with SMT on, so 8 CPUs were available). The vmstat "st" column quickly gets to the point where the CPU cycles on average are 50% user and 50% steal.
- Try using "top", then press the "1" key to see more easily what's happening on a per-CPU basis.
For customers and technical people who were used to seeing their CPUs up to 100% user busy, this can be disconcerting, but it's now perfectly normal, even expected.
while : ; do : ; done &
while : ; do : ; done &
while : ; do : ; done &
while : ; do : ; done &
while : ; do : ; done &
while : ; do : ; done &
while : ; do : ; done &
while : ; do : ; done &
# vmstat 1
procs -----------memory------------ ---swap-- ----io----- -system-- --------cpu--------
 r  b   swpd     free   buff  cache   si   so    bi    bo   in   cs  us  sy  id  wa  st
 8  0      0 14574400 408704 943488    0    0     0     0   26   42  50   0   0   0  50
 8  0      0 14574400 408704 943488    0    0     0     0   11   34  50   0   0   0  50
 8  0      0 14574400 408704 943488    0    0     0     0   26   42  50   0   0   0  50
 8  0      0 14574656 408704 943488    0    0     0     0   10   34  50   0   0   0  50
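Typing one loop per CPU gets tedious, and the loops keep spinning until they are killed. A small sketch that starts a chosen number of do-nothing loops and cleans them up afterwards might look like the following; the names start_spinners, stop_spinners and SPINNER_PIDS are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# PIDs of the background busy loops started by start_spinners.
SPINNER_PIDS=""

# start_spinners N: launch N do-nothing loops in the background.
start_spinners() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        while : ; do : ; done &
        SPINNER_PIDS="$SPINNER_PIDS $!"
        i=$((i + 1))
    done
}

# stop_spinners: kill every loop started above and reap it.
stop_spinners() {
    if [ -n "$SPINNER_PIDS" ]; then
        # Word splitting of the PID list is intended here.
        kill $SPINNER_PIDS 2>/dev/null || true
        wait $SPINNER_PIDS 2>/dev/null || true
    fi
    SPINNER_PIDS=""
}

# Example session, left commented out so that sourcing this file
# does not load the machine:
#   start_spinners "$(nproc)"   # one loop per online CPU
#   vmstat 1 5                  # watch the us and st columns
#   stop_spinners
```

Keeping the PIDs in a variable rather than relying on job numbers means the cleanup also works when the functions are used from a script instead of an interactive shell.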
I just wish we could distinguish between the CPU cycles shared between SMT siblings and the CPU cycles shared with other partitions.
For more details on the process of sharing the CPU cycles, especially when the CPU cycles are being shared between partitions, check out this page where we dive into more (but not yet all) of the gory details...
1 comment:
Very helpful! Thanks for clearing this up. I couldn't make heads or tails of these CPU numbers in top. The box I'm looking at should be running at ~100% utilization at all times, but only seemed to be running at 50%... with 50% of every CPU "stolen."