Saturday, July 9

Wasting CPU cycles

A friend asked me to have a look at one of his Linux servers that has had some performance problems over the last few weeks. The machine collects data and does some simple processing, mostly in SQL, but there are also a few shell scripts launched from cron every few minutes. Looking at the ps output, there were a lot of those shell scripts running, continuously forking child processes over and over, which of course slowed down the machine. The shell scripts themselves were quite simple: open one file and output to another, with all the data processing done by the classic shell utilities (cut, sed, wc, tr etc.). What people seem to forget is that bash has a quite nice array of built-in features that require far fewer resources and don't fork any child processes. I changed perhaps 15-20 lines in a few of the scripts and the load average went down quite a bit.

The main features that can save CPU cycles are the ones for variable manipulation. Things like the classic cut -d, -f1 can be replaced by ${VARNAME//,*} (or the more idiomatic ${VARNAME%%,*}), and wc -c can be replaced with ${#VARNAME}. It is highly recommended to study the bash man page if you frequently use shell scripts on production systems.
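
For example, with a variable named line (a placeholder of my own) holding one CSV record, the forking and fork-free versions look like this:

line="alpha,beta,gamma"

# External tools: each of these forks at least one child process
first=$(echo "$line" | cut -d, -f1)
length=$(echo -n "$line" | wc -c)

# Bash built-ins: same results, no forks
first=${line%%,*}   # everything before the first comma
length=${#line}     # length of the string

Note the -n on echo; without it wc -c also counts the trailing newline, which ${#line} of course never includes.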

I wrote a small script to demonstrate two different ways to get fields from a CSV value.
Here are the results for 100,000 processed values (2 × 50,000):
[hali@halidell rep]$ ./fields.sh
Running 50000 tests...
Method one took 170 seconds
Method two took 5 seconds
[hali@halidell rep]$
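
The script itself isn't included here, but a minimal sketch of the idea might look like this (RUNS, CSV and the loop structure are my assumptions, not the original code; the timings above are from the actual run on that machine):

#!/bin/bash
# Compare forking cut against bash parameter expansion, 50,000 runs each.
RUNS=50000
CSV="one,two,three,four"

echo "Running $RUNS tests..."

# Method one: fork echo | cut for every single value
START=$SECONDS
for ((i = 0; i < RUNS; i++)); do
    FIELD=$(echo "$CSV" | cut -d, -f1)
done
echo "Method one took $((SECONDS - START)) seconds"

# Method two: pure parameter expansion, no child processes at all
START=$SECONDS
for ((i = 0; i < RUNS; i++)); do
    FIELD=${CSV%%,*}
done
echo "Method two took $((SECONDS - START)) seconds"

The exact numbers will vary with hardware, but the ratio is what matters: method one spawns several processes per iteration (the command-substitution subshell, the pipeline and cut itself), while method two never leaves the parent shell.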
