Need Mavericks troubleshooting ideas…

(The post below lays out a system crash that has been plaguing my home system for a few months now. The updates are at the top – think of it like an email thread. Start from the bottom if you want the whole saga.)

Update 5: OK, bug report submitted to Apple along with a consistent test case. The issue does really seem like a memory leak in the Spotlight plugin for emails, but I’ll leave it to the experts to sort out. It’s Radar rdar://problem/15695276 and I’ve submitted a copy to OpenRadar here (sans the email file, since it has real email addresses in there).

I went through and deleted every email larger than ~27MB, then turned on Spotlight indexing for mail. After that finished, I turned on Time Machine again. I haven’t seen the memory spike at all. So, this seems like the culprit.
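For anyone who wants to attempt the same cleanup, here is a minimal sketch of how the oversized messages could be listed before deleting them in Mail.app. The function name `find_large_emlx` is hypothetical, and the `~/Library/Mail` location in the usage comment is an assumption about where Mail keeps its files on your machine. The ~27MB threshold is just the figure mentioned above.

```shell
#!/bin/sh
# List Mail message files (.emlx, which also matches .partial.emlx) that are
# larger than a byte threshold, so they can be reviewed before deletion.
# find_large_emlx is a hypothetical helper name, not an Apple tool.
find_large_emlx() {
    dir=$1
    min_bytes=$2
    # -size +Nc matches files strictly larger than N bytes (BSD and GNU find).
    find "$dir" -type f -name '*.emlx' -size +"${min_bytes}c" -print
}

# Example (assumed Mail location; ~27MB threshold taken from the post):
# find_large_emlx "$HOME/Library/Mail" $((27 * 1024 * 1024))
```

This only prints paths. Deleting the messages from inside Mail.app (and from the server) is probably safer than removing the files by hand.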

On a related note, as I was submitting the bug report to Apple, I copied the email file to my MacBook. Immediately, it started feeling sluggish and stuttering. Looked at Activity Monitor and, sure enough, memory was going absolutely nuts. The MBP has twice as much RAM as the iMac, though, so I think it’s been able to recover when this happens (though it has locked up a few times that I can remember… probably because of this).

I’ll update if/when Apple confirms anything.

Update 4: no luck – crashed again after an hour or so. I found a bunch of other large emails still on the disk, so I think I need to clean them up. Or, I’m just wrong. Either way, more debugging later this week.

Update 3: Solved, maybe! So, I was able to narrow this down to files in the mail folder. I happened to inspect the mdworker processes in Activity Monitor and saw they were always in .emlx files around when the memory would spike. So, taking that as a clue, I told Spotlight to ignore that folder under the Spotlight Privacy tab and suddenly the machine stayed up. But… I also had to shut down Time Machine because that somehow uses mdworker or causes it to hit those folders, leading to another crash.

So, the next step was to try and log what files mdworker was accessing. There’s probably a more elegant way to do this, but I ended up using fs_usage and then opensnoop, which are both part of OS X. Both let you see which files a process is interacting with while it is running (opensnoop does this using DTrace hooks). The final command line was opensnoop -a -n mdworker | tee mdworker.log.
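Once the machine has crashed and rebooted, the tail end of that log is the interesting part. A minimal sketch for pulling out the last message files mentioned in it follows. The helper name `last_emlx_opens` is hypothetical, and the only assumption made about the log format is that each opened path appears on its own line.

```shell
#!/bin/sh
# Show the last N log lines that mention a Mail message file (.emlx).
# last_emlx_opens is a hypothetical helper; it does not parse opensnoop's
# columns, it only filters lines containing the literal string ".emlx".
last_emlx_opens() {
    log=$1
    count=${2:-20}   # default to the last 20 matching lines
    grep -F '.emlx' "$log" | tail -n "$count"
}

# Example, using the log file name from the command above:
# last_emlx_opens mdworker.log 20
```

Filtering on the literal `.emlx` string rather than splitting columns keeps this working even when mailbox paths contain spaces.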

I then unblocked the Mail folder from the privacy settings and let Spotlight go (left Time Machine off for the first run). I let the machine crash a few times. After a few restarts, it was clear that the largest mdworker processes last touched large emails that were in partial emlx (.partial.emlx) files. I manually ran mdimport against some representative files (mdimport -d4 /path/to/file) and was able to recreate the near 5GB kernel_task behavior. One email (~30MB on disk), in particular, added 7-8GB of swap space. It was crazy.
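To pick representative files to feed to mdimport, it can help to rank the candidate paths by size. This is a rough sketch, assuming you already have a list of paths, one per line (for example, pulled by hand from the log). The helper name `rank_by_size` is hypothetical.

```shell
#!/bin/sh
# Read file paths on stdin (one per line) and print "bytes<TAB>path",
# largest first. rank_by_size is a hypothetical helper name.
rank_by_size() {
    while IFS= read -r path; do
        # Skip paths that no longer exist (e.g. already deleted messages).
        [ -f "$path" ] || continue
        # wc -c gives the byte count on both OS X and Linux; the arithmetic
        # expansion below strips the padding that some wc versions add.
        bytes=$(wc -c < "$path")
        printf '%s\t%s\n' "$((bytes))" "$path"
    done | sort -rn
}

# Example (paths.txt is an assumed file holding one path per line):
# rank_by_size < paths.txt | head -n 5
```

Be warned that running mdimport on the top entries may reproduce the memory spike, so save your work first.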

So, I disabled TM & Spotlight again, went into Mail.app, deleted the files, and kicked off Spotlight again. It all worked. Time Machine just finished, too. I think this may be sorted out. Fingers crossed – will wait a few days before declaring victory.

So, the only bummer is that in my zeal to see if those emails were the issue, I deleted them before backing them up somewhere. So… no test case to send off to Apple. These sorts of emails show up now and again for me (they’re basically digests of an attachment-heavy PR mailing list), so I will probably have another sample case soon.

Update 2: crashed again. FML. I posted a screenshot of the Activity Monitor at time of death: https://twitter.com/sujal/status/410482157644423168

Update: So, this last reboot, mdworker was still running, but memory was fine. Then, Time Machine kicked on and started prepping a backup. That’s when memory usage spiked and memory pressure went red. I killed the TM backup, memory returned to normal, but then a few moments later, went crazy again. Hmm – it looks like at least one mdworker is indexing Mail right now… wonder if this is a variation of the Gmail thing in Mavericks?


I’m hoping the Mac mavens among you can help me find some ideas on how to debug an issue I’m seeing now with both of my Macs running Mavericks. I’m going to file a bug report with Apple soon, but based on history, that will take a while and I really can’t deal with this for much longer. I may just downgrade.

Short summary:

  • after some amount of time, anywhere from minutes to a few hours, my iMac 27″ (from 2010) will randomly freeze, hard. Tapping on the Magic Trackpad won’t do anything, hitting a key on the keyboard will sometimes get the backlight to go on, but no screensaver will be visible. Just a dark screen. Only way to recover is to reboot.

  • Disk Utility & DiskWarrior say everything is fine with the drives

  • memtest passed when I ran it off the recovery partition, but when I ran it in my normal logged-in state, the whole memory pressure red/swap going nuts thing happened and the system froze.

  • the most consistent symptoms I can see pre-crash, based on logs and live monitoring using Activity Monitor are the following. This is the situation just before it crashes:
    – Activity Monitor shows memory pressure is high
    – kernel_task memory usage is listed at 4.68GB (or more, but in that ballpark)
    – mdworker has 3-4 processes running, each listed at ~500MB

So, why would mdworker make kernel_task use so much RAM?

(Also, I’ve tried resetting my Spotlight cache, removing old unused Spotlight plugins… no luck)

Other observations:

  • I tried turning off Time Machine, which seemed to help. My current theory is that when this crash happens, my computer is in the high memory pressure state, caused by mdworker, and then Time Machine kicks in trying to back up and the world just stops.

I’m trying to catch that, but I’m busy enough that I doubt I will catch it happening…

Any ideas on where to look next or something to try?

(iMac has 8GB of RAM, but today my MacBook with 16GB of RAM just exhibited the same symptoms, and has been less stable than I’d like with Mavericks… I’m wondering if it’s just more stable because it has more RAM…)

  • Tim

    Hi Sujal,

    I think I have the same problem. My MacBook 8,1 also hangs occasionally (not always) during Time Machine backups, and the symptoms are similar: high memory pressure, kernel_task usage ~4.6GB. The system becomes unresponsive until it freezes, probably because processes get killed or something. What is surprising is that the 4.6GB memory footprint is very consistent; the variation is maybe 0.2GB.

    The difference is that it does not always happen, sometimes the pressure becomes high but it recovers. It seems there is a ‘difficult task’ that requires large amounts of memory which fails if I run too many other things, but works ok when I don’t. I think this also explains why more memory helps, in which case the problem is not always fatal.

    Did you solve your problem? It’s become somewhat of a nuisance.

    Thanks,

    Tim

  • http://www.fatmixx.com/ sujal

    Hi Tim,

    Yeah, I’ve solved my issue for now by deleting all of the large emails from the server and my Mac. If you try the steps above, you may be able to log what is causing the issue… That’s really the only way I could figure out how to work around the problem.

    Sujal

  • Tim

    The problem is also (probably) due to ‘large’ e-mails in my case, it seems. Whenever the memory pressure was high, `lsof | grep mdworker` showed `mdworker` was indexing ~15MB e-mails. I have removed these now, but I still have to check whether this solves the problem.

    In any case, not being able to properly index ~15MB e-mails without using 4+GB RAM and/or crashing is a severe bug that should be fixed asap. I also filed a report with Apple.

  • Tim

    Apple replied to my bug report and asked me to capture the `sudo tmdiagnose` output when the problem occurs. Since I no longer have this problem after deleting the large-ish e-mails, perhaps this is of use to somebody else experiencing it.