Where's My Memory Going?

(image: /content/07-blog/769-dtracing-hardware-cache-counters/ask-mr-bruning-logo.png)

While teaching a combined SDC, SmartOS Internals, ZFS Internals, and DTrace course over the last two weeks, a student noted that he thought he was seeing a memory leak while running imgadm import dataset_uuid. He was doing this on his laptop, running SmartOS on top of VirtualBox. The command eventually failed. So, the question he wanted to answer was "What is using up the memory?" We'll also take a look at how the command failed (it was not because it ran out of space).

I was able to reproduce the behavior, and if you like, you can follow along by doing the same steps on your SmartOS system; you can get the latest version here. Here are good instructions for getting SmartOS running on VMware Fusion, and a link for doing the same on VirtualBox. If you prefer to run it on bare metal, you can get the USB image from us-east.manta.joyent.com/Joyent_Dev/public/SmartOS/smartos-latest-USB.img.bz2, copy the image to a USB key, and boot your system off of the USB key. There are also links for running on VMware, VirtualBox, or from a USB key on the SmartOS download wiki page.

Assuming you've logged into the global zone as root using ssh, we'll do the following:

# imgadm avail
UUID                                  NAME           VERSION  OS       PUBLISHED
febaa412-6417-11e0-bc56-535d219f2590  smartos        1.3.12   smartos  2011-04-11T08:45:00Z
7456f2b0-67ac-11e0-b5ec-832e6cf079d5  nodejs         1.1.3    smartos  2011-04-15T22:04:12Z
41da9c2e-7175-11e0-bb9f-536983f41cd8  nodejs         1.1.4    smartos  2011-04-28T17:33:45Z
63ce06d8-7ae7-11e0-b0df-1fcf8f45c5d5  smartos        1.3.13   smartos  2011-05-10T09:25:00Z
184c9b38-ad3d-11e0-bad6-1b7240aaa5fc  smartos        1.3.15   smartos  2011-07-13T11:39:22Z
70856d1a-b860-11e0-9bbf-bf2a7494dd82  smartos        1.4.1    smartos  2011-07-27T15:17:41Z
7abad50e-b865-11e0-8462-730bcfdde020  smartos64      1.4.1    smartos  2011-07-27T15:33:26Z
a0f8cf30-f2ea-11e1-8a51-5793736be67c  standard64     1.0.7    smartos  2012-08-30T21:37:30Z
...
6344c3f4-64d9-11e2-af3d-af797e5b19ca  hadoop         1.0.0    smartos  2013-01-23T05:10:08Z
1327ed36-5130-11e2-95a8-9b5a153adf3e  ubuntu-12.04   2.3.1    linux    2013-01-24T21:31:37Z
46ecf60e-52c8-11e2-b212-9b51fc749547  debian-6.0.6   2.3.1    linux    2013-01-24T21:36:28Z
1d4fd806-6ef8-11e2-93ab-4f22d6d207c9  mongodb        1.4.5    smartos  2013-02-04T18:43:44Z
d2fc0a96-61d1-11e2-8701-c7c137935369  chefserver     1.0.0    smartos  2013-02-04T19:30:16Z
4bf9530a-7ae5-11e2-bb4e-3bad5fbc3de9  sngl           0.99.0   smartos  2013-02-19T19:43:00Z
03d515f2-872c-11e2-8314-8387b5044570  base           1.9.0    smartos  2013-03-07T15:11:00Z
bad2face-8738-11e2-ac72-0378d02f84de  base64         1.9.0    smartos  2013-03-07T15:14:30Z
#

The imgadm(1M) command is explained here, and in its manual page (currently available only in SmartOS).

There are currently 105 different images available. My student tried with an Ubuntu image, but we'll use a SmartOS image.

To be clear, the command does not always fail, but while it is running, memory usage always increases.

Here is the command to import the dataset.

# imgadm import a0f8cf30-f2ea-11e1-8a51-5793736be67c
Importing image a0f8cf30-f2ea-11e1-8a51-5793736be67c (standard64 1.0.7) from "https://images.joyent.com"
0% [                             ]  time 2.0s  eta 406.2s
...

While the above is running, in another window we'll run vmstat(1M).

# vmstat 2
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr lf rm s0 s1   in   sy   cs us sy id
 0 0 0 3452628 1400208 724 1763 397 0 40 0 2052 36 -280 47 142 585 6864 2946 14 10 77
 0 0 0 3183736 1080148 1 21  0  0  0  0  0  0  0  0  0  734 2807 1687 21  7 72
 0 0 0 3180548 1076972 0  3  0  0  0  0  0  0  0 109 0  974 2198 2868 16 11 73
 0 0 0 3177216 1073668 0  1  0  0  0  0  0  0  0  0  0  908 2887 1473 19  7 74
 0 0 0 3174452 1070916 0  1  0  0  0  0  0  0  0  0  0  921 2572 1296 17  7 76
...

From the above output, the free memory on the machine is steadily dropping. (As with most of the *stat commands, ignore the first sample, which reports averages since boot.)
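If all you care about is the trend, a small shell loop can log just the free-memory column over time. This is my own sketch, not part of the original session; the 10-second interval is an arbitrary choice:

    # Log a timestamp and the free-memory figure (the 5th vmstat column, in KB).
    # "vmstat 1 2" prints two samples; the last one reflects current activity.
    while true; do
        echo "$(date '+%H:%M:%S') free: $(vmstat 1 2 | awk 'END { print $5 }') KB"
        sleep 10
    done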

Let's see if prstat(1M) tells us anything. Basically, the memory is either being used by one or more processes, by files in a tmpfs file system (e.g., files in /tmp), or by the kernel itself (quite possibly by the ZFS ARC).

# prstat -s size -c
Please wait...
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  3553 root      102M   92M run     59    0   0:00:08 7.2% imgadm/4
  2125 root       35M   18M sleep    1    0   0:00:00 0.0% metadata/3
  2489 root       33M   20M sleep   59    0   0:00:04 0.0% vmadmd/3
  2164 root       19M   10M sleep   55    0   0:00:01 0.0% fmd/26
    10 root       10M 8312K sleep   29    0   0:00:07 0.0% svc.configd/33
     8 root     7800K 5088K sleep   55    0   0:00:01 0.0% svc.startd/12
  3335 root     6976K 4208K sleep   53    0   0:00:07 1.9% sshd/1
  2522 root     6868K 2256K sleep    1    0   0:00:01 0.0% sendmail/1
  2432 smmsp    6568K 1560K sleep    1    0   0:00:00 0.0% sendmail/1
  3370 root     6384K 3700K sleep   59    0   0:00:00 0.0% sshd/1
  3504 root     6384K 3704K run      1    0   0:00:00 0.0% sshd/1
  2450 root     6380K 5136K sleep    1    0   0:00:00 0.0% intrd/1
  2158 root     6344K 3876K sleep   55    0   0:00:01 0.0% nscd/24
    55 root     5988K 2696K sleep   29    0   0:00:00 0.0% syseventd/19
   290 root     5960K 2804K sleep   29    0   0:00:00 0.0% syseventd/19
Total: 60 processes, 365 lwps, load averages: 0.53, 0.18, 0.07
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  3553 root      102M   93M sleep   23    0   0:00:08 7.3% imgadm/4
  2125 root       35M   18M sleep    1    0   0:00:00 0.0% metadata/3
  2489 root       33M   20M sleep   59    0   0:00:04 0.0% vmadmd/3
  2164 root       19M   10M sleep   55    0   0:00:01 0.0% fmd/26
    10 root       10M 8312K sleep   29    0   0:00:07 0.0% svc.configd/33
     8 root     7800K 5088K sleep   55    0   0:00:01 0.0% svc.startd/12
  3335 root     6976K 4208K sleep   53    0   0:00:07 1.9% sshd/1
  2522 root     6868K 2256K sleep    1    0   0:00:01 0.0% sendmail/1
  2432 smmsp    6568K 1560K sleep    1    0   0:00:00 0.0% sendmail/1
  3370 root     6384K 3700K sleep   59    0   0:00:00 0.0% sshd/1
  3504 root     6384K 3704K sleep    1    0   0:00:00 0.0% sshd/1
  2450 root     6380K 5136K sleep    1    0   0:00:00 0.0% intrd/1
  2158 root     6344K 3876K sleep   55    0   0:00:01 0.0% nscd/24
    55 root     5988K 2696K sleep   29    0   0:00:00 0.0% syseventd/19
   290 root     5960K 2804K sleep   29    0   0:00:00 0.0% syseventd/19
Total: 60 processes, 365 lwps, load averages: 0.53, 0.18, 0.07
...

From the above output, process id 3553 (imgadm) is by far the largest process, and its RSS (Resident Set Size, i.e., the amount of physical memory in use) is growing. A closer look at this process shows that it is actually node.js running the imgadm script:
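To confirm that the RSS really is climbing, rather than our just catching two prstat samples at different points, one could sample it periodically. This is a sketch of my own, not from the original session, using the pid 3553 found above:

    # Print the resident set size (in KB) of pid 3553 every 10 seconds.
    while true; do
        ps -o rss= -p 3553
        sleep 10
    done

Running prstat -c -p 3553 10 would give much the same picture: one line every 10 seconds for just that process.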

# pgrep imgadm
3553
# pargs 3553
3553:        /usr/node/bin/node /usr/sbin/imgadm import a0f8cf30-f2ea-11e1-8a51-5793736be67c
argv[0]: /usr/node/bin/node
argv[1]: /usr/sbin/imgadm
argv[2]: import
argv[3]: a0f8cf30-f2ea-11e1-8a51-5793736be67c
# ptree 3553
2245  /usr/lib/ssh/sshd
  3334  /usr/lib/ssh/sshd
    3335  /usr/lib/ssh/sshd
      3338  -bash
        3553  /usr/node/bin/node /usr/sbin/imgadm import a0f8cf30-f2ea-11e1-8a5
          3557  /usr/bin/bzip2 -cdfq
          3558  /usr/sbin/zfs receive zones/a0f8cf30-f2ea-11e1-8a51-5793736be67
#

So, imgadm is a node.js script. Is there a problem with node? It turns out that node is linked with libumem.so, so we can take a core dump of the process and check it for possible memory leaks.

# pldd 3553
3553: /usr/node/bin/node /usr/sbin/imgadm import a0f8cf30-f2ea-11e1-8a51-579
/lib/libumem.so.1
/lib/libz.so.1
/lib/libssl.so.1.0.0
/lib/libcrypto.so.1.0.0
/lib/libkstat.so.1
/lib/libsocket.so.1
/lib/libnsl.so.1
/usr/lib/libstdc++.so.6.0.13
/lib/libm.so.2
/usr/lib/libgcc_s.so.1
/lib/libpthread.so.1
/usr/lib/libc/libc_hwcap1.so.1
/usr/node/0.8/node_modules/dtrace-provider.node
/lib/libumem_trampoline.so.1
# cd /var/tmp
# gcore 3553
gcore: core.3553 dumped
#

A quick look at the core file for possible memory leaks:

# mdb core.3553
Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
> ::findleaks -f -v
findleaks:                maximum buffers => 20613
findleaks:                 actual buffers => 19722
findleaks:
findleaks:             potential pointers => 9391153
findleaks:                     dismissals => 3760417       (40.0%)
findleaks:                         misses => 3942678       (41.9%)
findleaks:                           dups => 1668382       (17.7%)
findleaks:                        follows => 19676         ( 0.2%)
findleaks:
findleaks:              peak memory usage => 299 kB
findleaks:               elapsed CPU time => 0.5 seconds
findleaks:              elapsed wall time => 0.8 seconds
findleaks:
CACHE     LEAKED   BUFFER CALLER
08619a10      35 08624b10 ?
08619e10       4 0862fce0 ?
0861a010       3 08623a08 ?
0861a410       2 0868dee8 ?
0861ba10       2 08698c40 ?
------------------------------------------------------------------------
   Total      46 buffers, 1688 bytes
>
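Note that the CALLER column is all question marks because the core came from a process running with libumem's default, non-debugging settings. If we really did suspect a leak, one option, purely a sketch here (we did not rerun the import this way), would be to relaunch the command with umem auditing enabled (see umem_debug(3MALLOC)) so that ::findleaks can show where each leaked buffer was allocated:

    # Hypothetical re-run with allocation auditing enabled.
    UMEM_DEBUG=default UMEM_LOGGING=transaction \
        /usr/node/bin/node /usr/sbin/imgadm import a0f8cf30-f2ea-11e1-8a51-5793736be67c &

    # Later: take a core of the running process and look again. With auditing
    # on, ::findleaks fills in the CALLER column, and a leaked buffer's full
    # allocation stack can be printed with ::bufctl_audit.
    gcore $(pgrep -f 'imgadm import')
    echo ::findleaks | mdb core.<pid>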

So basically, node does not seem to be leaking memory (only 1688 bytes). If node were growing and we did suspect a leak, we could use DTrace to try to determine where the leak is occurring.

# dtrace -q -n 'syscall::brk:entry/pid == 3363/{printf("brk(%lx) = ", arg0); self->in=1;} syscall::brk:return/self->in/{printf("%lx\n", arg1); self->in=0;}tick-5m{exit(0);}'
brk(cf02000) = 0
brk(d003000) = 0
brk(d104000) = 0
brk(d205000) = 0
brk(d306000) = 0
brk(d407000) = 0
#
# dtrace -q -n 'syscall::brk:entry/pid == 3553/{printf("brk(%lx) = ", arg0); self->in=1;} syscall::brk:return/self->in/{printf("%lx\n", arg1); ustack(); self->in=0;}tick-5m{exit(0);}'
brk(d8df000) = 0
              libc.so.1`_brk_unlocked+0x15
              libumem.so.1`vmem_sbrk_alloc+0xa1
              libumem.so.1`vmem_xalloc+0x45e
              libumem.so.1`vmem_alloc+0x19f
              libumem.so.1`vmem_xalloc+0x45e
              libumem.so.1`vmem_alloc+0x19f
              libumem.so.1`umem_alloc+0xf0
              libumem.so.1`malloc+0x36
              libstdc++.so.6.0.13`_Znwj+0x28
              libstdc++.so.6.0.13`_Znaj+0x1e
              node`_ZN4node6Buffer7ReplaceEPcjPFvS1_PvES2_+0xed
              node`_ZN4node6BufferC1EN2v86HandleINS1_6ObjectEEEj+0x97
              node`_ZN4node6Buffer3NewERKN2v89ArgumentsE+0x172
              node`_ZN2v88internalL30Builtin_HandleApiCallConstructENS0_12_GLOBAL__N_116BuiltinArgumentsILNS0_21BuiltinExtraArgumentsE1EEEPNS0_7IsolateE+0x1d6
              0xba00a336
              0xba021aef
              0x9f20b009
              0xba00db41
              0xba015e82
              0x9f20f940
#
# mdb /usr/node/bin/node
> ::dem node`_ZN4node6Buffer7ReplaceEPcjPFvS1_PvES2_
node`_ZN4node6Buffer7ReplaceEPcjPFvS1_PvES2_ == node`node::Buffer::Replace
>

So the node::Buffer::Replace() method is calling malloc(), which in turn decided it needed more memory and called brk(2). The brk(2) system call grows the virtual size of the heap. But again, while node is large, it is most likely not leaking memory.
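brk(2) only tells us that the heap is growing. If we wanted to see which code paths are requesting the memory, another approach (a sketch, not something run in the session above) is to aggregate malloc() request sizes by user stack with DTrace's pid provider:

    # Sum the bytes requested from malloc(), keyed by user stack, for pid 3553;
    # stop after 30 seconds and keep only the ten biggest stacks.
    dtrace -n '
        pid$target::malloc:entry { @bytes[ustack()] = sum(arg0); }
        tick-30s { exit(0); }
        END { trunc(@bytes, 10); }' -p 3553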

And the tmpfs file system does not appear to be using the memory:

# df -h | grep swap
swap                   2.1G   884K       2.1G     1%    /etc/svc/volatile
swap                   2.1G    12K       2.1G     1%    /tmp
swap                   2.1G    32K       2.1G     1%    /var/run
# swap -lh
swapfile             dev    swaplo   blocks     free
/dev/zvol/dsk/zones/swap 90,1        4K     2.0G     2.0G
# swap -sh
total: 171M allocated + 37M reserved = 208M used, 2.1G available
#
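Had df shown a lot of space in use under /tmp or another tmpfs mount, a quick way to find the biggest consumers would be something like this (a sketch; the "top ten" cutoff is arbitrary):

    # List the ten largest files and directories under /tmp, sizes in KB.
    du -ak /tmp 2>/dev/null | sort -n | tail -10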

Let's take a look at the kernel. The following commands were run a few minutes apart.

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     180073               703   35%
ZFS File Data              181992               710   35%
Anon                        49035               191    9%
Exec and libs                1282                 5    0%
Page cache                  16457                64    3%
Free (cachelist)             7591                29    1%
Free (freelist)             85440               333   16%
Total                      521870              2038
Physical                   521869              2038
# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     181625               709   35%
ZFS File Data              186546               728   36%
Anon                        49028               191    9%
Exec and libs                1282                 5    0%
Page cache                  16456                64    3%
Free (cachelist)             7593                29    1%
Free (freelist)             79340               309   15%
Total                      521870              2038
Physical                   521869              2038
# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     187335               731   36%
ZFS File Data              177882               694   34%
Anon                        49040               191    9%
Exec and libs                1284                 5    0%
Page cache                  16459                64    3%
Free (cachelist)             7326                28    1%
Free (freelist)             82544               322   16%
Total                      521870              2038
Physical                   521869              2038
#
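A good chunk of the Kernel and ZFS File Data categories above is typically the ZFS ARC, and its size can be watched directly with kstat(1M). This is a sketch one could use to correlate ARC growth with the drop in free memory (the 10-second interval is arbitrary):

    # Print a timestamp, the current ARC size, and its target size (bytes)
    # every 10 seconds.
    kstat -p -T d zfs:0:arcstats:size zfs:0:arcstats:c 10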

The kernel is growing, but it is not using an excessive amount of space. So: applications are growing, but not enough to cause problems; tmpfs is barely used; and the kernel's growth is modest. The conclusion is that there is no memory leak.

My student thought a memory leak was causing the imgadm import to fail. Here is the failure message:

...
74% [=====================        ]  time 372.5s  eta 130.9s
{"name":"imgadm","hostname":"00-0c-29-4e-0c-94","pid":2891,"level":50,"cmd":"/usr/sbin/zfs destroy -r zones/1327ed36-5130-11e2-95a8-9b5a153adf3e-partial","error":{"killed":false,"code":1,"signal":null},"stdout":"","stderr":"cannot destroy 'zones/1327ed36-5130-11e2-95a8-9b5a153adf3e-partial': dataset is busy\ncannot destroy 'zones/1327ed36-5130-11e2-95a8-9b5a153adf3e-partial': dataset is busy\n","cleanDsName":"zones/1327ed36-5130-11e2-95a8-9b5a153adf3e-partial","msg":"error destroying tmp dataset while cleaning up","time":"2013-03-13T13:08:05.784Z","v":0}
imgadm: error (InternalError): uncompression error while importing: exit code 2
#

This occurred on a system where network access was slow and not very reliable. Most likely, the failure is due not to a memory shortage, but to corruption of the data received over the network. We may come back to this in a future blog.

To summarize what to look for when you suspect a memory leak (a shell sketch tying these checks together follows the list):

  • First, check whether vmstat(1M) shows free memory (and/or free swap) declining over time.
  • If so, use prstat(1M) to check for large, growing processes.
  • If no processes are large and growing, check for large files in tmpfs file systems (e.g., /tmp).
  • If there are no large files under tmpfs, check the kernel (for example, with ::memstat).
  • If the kernel looks fine, there are no large and growing tmpfs files, and there are no large and growing processes, you probably don't have a leak.
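Here is that rough shell sketch. The mount points, the "top five" process count, and the layout are my own choices, not anything prescribed by the tools:

    #!/bin/sh
    # Quick memory triage on SmartOS/illumos (rough sketch).

    echo "=== free memory and swap (KB) ==="
    vmstat 1 2 | tail -1 | awk '{ print "swap:", $4, " free:", $5 }'

    echo "=== five largest processes by RSS ==="
    prstat -s rss -c -n 5 1 1

    echo "=== tmpfs usage ==="
    df -h /tmp /var/run /etc/svc/volatile

    echo "=== kernel memory summary ==="
    echo ::memstat | mdb -k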

Note that files that have been unlinked but are still open can also use up memory if they live in a tmpfs file system and are still being written to. How to deal with this will be covered in a future blog.

We offer comprehensive training for Triton Developers, Operators and End Users.



Post written by Mr. Max Bruning