You want to know whether your Linux (or Solaris, macOS) system is fully loaded. The "load averages" reported by "top" and "uptime" always look low, e.g.

Processes: 64 total, 3 running, 61 sleeping... 277 threads 14:09:35
Load Avg: 0.36, 0.37, 0.35 CPU usage: 11.48% user, 5.26% sys, 83.25% idle

If you search the internet, you will find several interpretations, some of them contradictory:

  • Load average denotes the number of processes running or waiting for resources, averaged over the last 1, 5 and 15 minutes
  • Load average measures CPU utilization trends (i.e. both CPU utilization in % and demand for CPU resources), calculated as the moving average of the number of Linux processes in the run queues
  • Load average = sum of the run queue length and the number of jobs running on the CPUs (Adrian Cockcroft)
  • You can consider load average 1 = 100% for a single CPU. Example: load average 3 denotes that 75% of a 4-CPU system is utilized
Gotchas:
  • Load averages are NOT CPU utilization but the total queue length; they are NOT really a trend!
  • Load averages are point samples of 3 different time series (1, 5 and 15 minutes); each is an exponentially-damped moving average.
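
To make "exponentially-damped moving average" concrete, here is a minimal Python sketch of the update step. It assumes a Linux-style scheme in which the number of active processes is sampled every 5 seconds; the function names and the workload are hypothetical and only for illustration.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumption: Linux-style 5 s tick)


def damped_load(previous, active, window_seconds, interval=SAMPLE_INTERVAL):
    """One update step of an exponentially damped moving average.

    previous: the load average before this sample
    active:   number of running/waiting processes seen in this sample
    """
    decay = math.exp(-interval / window_seconds)
    return previous * decay + active * (1.0 - decay)


def simulate(active_counts, window_seconds):
    """Feed a sequence of samples through the average, starting from 0."""
    load = 0.0
    for active in active_counts:
        load = damped_load(load, active, window_seconds)
    return load


# Hypothetical workload: 2 active processes for 60 s, then idle for 60 s.
samples = [2] * 12 + [0] * 12
for window in (60, 300, 900):  # the 1, 5 and 15 minute series
    print(window, round(simulate(samples, window), 2))
```

Under these assumptions, a constant queue length of 2 held for one minute brings the 1-minute figure only to 2 * (1 - 1/e), about 1.26, and the 5 and 15 minute figures react even more slowly. This is why the three numbers lag behind the real queue length and decay slowly after a spike.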
Interpreting the Load Average (Rule of Thumb)
  • If you have 4 CPUs (quad-core) and a load average of 0.2, your system has lots of computing capacity available; if the load average is > 3.9, the system is busy, with many threads waiting in the run queue, and you need to upgrade it.
  • If load average > number of CPUs * (1 + 15%), upgrade the system (e.g. vertical scaling)
  • In production, keep load average < number of CPUs * 2
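
The rules of thumb above can be sketched as a small Python check. It uses the standard os.getloadavg() and os.cpu_count() calls; the function name, the labels and the 15% default are illustrative assumptions taken from the list above, not a standard.

```python
import os


def assess(load_avg, cpu_count, headroom=0.15):
    """Apply the rule of thumb: compare load average with CPUs * (1 + headroom).

    The returned labels are hypothetical, chosen only for this sketch.
    """
    threshold = cpu_count * (1 + headroom)
    if load_avg > threshold:
        return "over threshold"
    return "ok"


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    try:
        one_min, five_min, fifteen_min = os.getloadavg()  # Unix only
    except (AttributeError, OSError):
        print("load average not available on this platform")
    else:
        for label, value in (("1 min", one_min), ("5 min", five_min), ("15 min", fifteen_min)):
            print(label, round(value, 2), "per CPU:", round(value / cpus, 2), assess(value, cpus))
```

Dividing by the CPU count makes numbers comparable across machines: 0.2 on a quad-core box is 0.05 per CPU, while 5.0 on the same box is above the 4.6 threshold.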

REFERENCES
http://www.teamquest.com/resources/gunther/display/5/index.htm
http://www.linuxjournal.com/article/9001
http://www.lifeaftercoffee.com/2006/03/13/unix-load-averages-explained/
http://luv.asn.au/overheads/NJG_LUV_2002/luvSlides.html
http://www.webhostingtalk.com/showthread.php?t=230076
I am running ESXi 3.5.1 on Sun x4450 hardware.

I have noticed that after booting VMs, the reported load average is skewed way high (like in the thousands). Since the load takes a long time to average back down to 0 after a spike, this is causing all of our VMs to be flagged with a problem by our monitoring system. Is there any patch to either Solaris or ESXi that addresses this? For what it's worth, I'm not running VMware Tools on the clients, but I did install it on one to test whether it helped with this issue, and it didn't seem to.

Thanks in advance!