One of my Linux servers running RHEL3 has had a strange problem with processes being killed for the last few weeks. It turned out that the automount process was being killed by the Linux kernel's OOM killer when the system ran out of memory.
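To confirm which processes the OOM killer has been picking, the kernel log can be scanned for kill messages. This is a minimal sketch. It assumes that syslog writes kernel messages to /var/log/messages and that they look like "Out of Memory: Killed process 1234 (automount)." as on 2.4-era kernels. The exact wording varies between kernel versions. oom_victims is a hypothetical helper name:

```shell
# oom_victims LOGFILE
# Hypothetical helper: count how often each process was killed by the OOM
# killer, based on lines like "Out of Memory: Killed process 1234 (automount)."
# The message format is an assumption; adjust the patterns for your kernel.
oom_victims() {
    grep -i 'out of memory: killed process' "$1" |
        # Keep only the process name between the parentheses.
        sed -n 's/.*[Kk]illed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
        # Count per name, most frequently killed first.
        sort | uniq -c | sort -rn
}
```

Running "# oom_victims /var/log/messages" prints one "count name" line per victim.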
However, the system is an application server with 8GB of memory and 20GB of swap space, so it is unlikely that all memory was actually used up when the out-of-memory error occurred. More likely, some CPU- or I/O-intensive process hangs the system at the moment a fork (in automount) is requesting memory.
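One way to test the claim that memory was not really exhausted is to record free RAM plus free swap periodically, for example from cron, and look at the last sample before an OOM kill. This sketch assumes /proc/meminfo contains "MemFree:" and "SwapFree:" lines in kB. mem_headroom_kb is a hypothetical helper name:

```shell
# mem_headroom_kb FILE
# Hypothetical helper: print MemFree + SwapFree (in kB) from a file in
# /proc/meminfo format. Other lines are ignored.
mem_headroom_kb() {
    awk '$1 == "MemFree:"  { mem  = $2 }
         $1 == "SwapFree:" { swap = $2 }
         END               { print mem + swap }' "$1"
}
```

A cron entry could append the output of "date" and "mem_headroom_kb /proc/meminfo" to a log file every few minutes.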
We further noticed that dmp_errd_loop sometimes takes 13-20% CPU for a few hours. We could find nothing about this daemon on Google, so we only know that it has something to do with Veritas Dynamic Multi-Pathing (DMP). However, while this process is active, any I/O query to the multipath configuration (e.g. vxdmpadm) hangs.
In addition, our automount process is usually killed around midnight, when the bpbkar process starts backing up data over the network. We therefore suspect that in certain cases, when bpbkar starts while the evil dmp_errd_loop is busy, I/O access is blocked and the system stops responding to any request, including the fork from automount.
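To check the suspected link between dmp_errd_loop activity and the backup window, its CPU usage could be sampled periodically (e.g. from cron, together with a timestamp) and compared with the times of the OOM kills. This is a minimal sketch. It assumes input in the "%CPU COMMAND" format printed by procps "ps -eo pcpu,comm". busy_procs is a hypothetical helper name:

```shell
# busy_procs NAME THRESHOLD
# Hypothetical helper: read "pcpu comm" lines on stdin and print the %CPU
# of every process called NAME whose usage is at or above THRESHOLD.
# The ps header line never matches because its second field is "COMMAND".
busy_procs() {
    awk -v name="$1" -v limit="$2" \
        '$2 == name && $1 + 0 >= limit + 0 { print $1 }'
}
```

For example, "# ps -eo pcpu,comm | busy_procs dmp_errd_loop 10" prints nothing unless dmp_errd_loop is using at least 10% CPU.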
Consequently, there are a few possible solutions:
- Fix the dmp_errd_loop issue
- Delay the system OOM killer
- Disable the OOM killer
We are contacting Symantec support to understand dmp_errd_loop, and have changed the kernel's memory overcommit configuration as follows:
# echo 2 > /proc/sys/vm/overcommit_memory
# echo 50 > /proc/sys/vm/overcommit_ratio
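Under strict overcommit accounting (overcommit_memory=2), the Linux kernel documentation describes the commit limit as the swap size plus overcommit_ratio percent of physical RAM. The arithmetic for this server can be sketched with a hypothetical helper, commit_limit_gb, that takes whole-gigabyte inputs. Shell arithmetic is integer-only, so any fraction is truncated:

```shell
# commit_limit_gb SWAP_GB RAM_GB RATIO
# Hypothetical helper: commit limit under overcommit_memory=2,
# i.e. swap + RAM * overcommit_ratio / 100, using integer arithmetic.
commit_limit_gb() {
    echo $(( $1 + $2 * $3 / 100 ))
}
```

With 20GB of swap, 8GB of RAM and a ratio of 50, "# commit_limit_gb 20 8 50" prints 24. On kernels that report CommitLimit in /proc/meminfo, that line can be used to cross-check the value.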
With this setting, the system won't report out-of-memory as long as total committed memory stays below swap + memory*ratio/100 = 20GB + 8GB*50% = 24GB. These kernel settings can also be made persistent via the “/etc/sysctl.conf” file. We also disabled the OOM killer to prevent the automount process from being killed:
# echo 0 > /proc/sys/vm/oom-kill
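To keep these values across reboots, they can go into /etc/sysctl.conf under the sysctl names that mirror the /proc/sys/vm paths: vm.overcommit_memory, vm.overcommit_ratio and vm.oom-kill. The following is a minimal sketch of a hypothetical helper, set_sysctl, that writes one "key = value" line idempotently. It rewrites the file in place so its ownership and permissions are kept:

```shell
# set_sysctl FILE KEY VALUE
# Hypothetical helper: make "KEY = VALUE" appear exactly once in FILE
# (normally /etc/sysctl.conf), replacing any earlier assignment of KEY.
set_sysctl() {
    file=$1
    key=$2
    value=$3
    # Escape dots so a key like "vm.oom-kill" is matched literally by grep.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    tmp="$file.tmp.$$"
    # Keep every line except existing assignments of this key.
    if [ -f "$file" ]; then
        grep -v "^[[:space:]]*$pattern[[:space:]]*=" "$file" > "$tmp" || true
    else
        : > "$tmp"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Copy back instead of mv so FILE keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage, followed by reloading /etc/sysctl.conf:
# set_sysctl /etc/sysctl.conf vm.overcommit_memory 2
# set_sysctl /etc/sysctl.conf vm.overcommit_ratio 50
# set_sysctl /etc/sysctl.conf vm.oom-kill 0
# sysctl -p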