Ongoing frustrating MacBook Pro kernel panic issue

Buckle in, y’all, this is going to be a long one.

Short form summary: my current work machine is a 2019 16-inch MacBook Pro. And it has been driving me fucking spare because it keeps kernel panicking. If there are any MacBook maintenance gurus out there who could help me out with this, I’d appreciate it.

Background

This is actually my second machine from my current employer. I sent them back the original one they’d assigned me because that one was having random reboot issues, usually while I was in the middle of working in Xcode, issues which had not started happening until I upgraded it to run Catalina.

So they sent me a new one, which was also running Catalina. Only I very quickly found that that one is also prone to kernel panics.

I could send this machine back to my employer as well, but the things that complicate this matter are:

  1. My employer’s IT department is not actually local to me, they’re in Omaha, so swapping machines again would require them Fedexing me a new box, my going to the trouble to set up the new one, and then Fedexing them back the old one, and that’s a few days’ worth of work stoppage right there.
  2. I’m not a hundred percent convinced it’s purely a hardware issue, if nothing else because I have gone googling about this and I see a whole lot of cranky people posting on various blogs and forums about it, all of whom have been saying “hey this started happening to me after I installed this particular release of Catalina”.
  3. The covid-19 situation certainly isn’t helping matters either and I’d prefer to avoid having to do another Fedex swap of machines if I can avoid it on those grounds, too.

What the Machine is Actually Doing

Right now there are two situations in which I can trigger a kernel panic. Those are:

  1. If I try to put the machine to sleep, it will shut itself down about 30-60 seconds later and I’ll have to boot from scratch next time I power up.
  2. If I do not take additional steps to mitigate the problem, I can also trigger a panic at an apparently random interval after rebooting. Usually this will happen during or right after I finish booting.

The vast majority of the time, the kernel panics specifically involve the phrase “thunderbolt power on failed”.
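
If you want to check whether your own panics are the same flavor, here's a minimal sketch of how that tally could be done. It assumes the panic reports are plain-text files with a .panic extension sitting in /Library/Logs/DiagnosticReports, which is an assumption about where macOS saves them and may not hold on every release. The function name is made up for illustration.

```python
from pathlib import Path

# The phrase showing up in most of the panic reports described above.
PHRASE = "thunderbolt power on failed"


def find_matching_reports(report_dir, phrase=PHRASE):
    """Return the sorted names of .panic files in report_dir that mention phrase.

    Matching is case-insensitive. Files that can't be read are skipped.
    """
    needle = phrase.lower()
    matches = []
    for path in sorted(Path(report_dir).glob("*.panic")):
        try:
            # errors="replace" keeps odd bytes in a report from aborting the scan
            text = path.read_text(errors="replace")
        except OSError:
            continue
        if needle in text.lower():
            matches.append(path.name)
    return matches


if __name__ == "__main__":
    # ASSUMPTION: default location of saved panic reports; adjust if yours differs.
    report_dir = "/Library/Logs/DiagnosticReports"
    total = len(list(Path(report_dir).glob("*.panic")))
    hits = find_matching_reports(report_dir)
    print(f"{len(hits)} of {total} panic reports mention '{PHRASE}'")
    for name in hits:
        print("  " + name)
```

Point it at a different folder if your reports live somewhere else. All it does is read files, so it's safe to run on a work machine.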

What I Have Done to Investigate

As I am an SDET, QAing things on the computer is what I do professionally. So naturally I’ve been trying to investigate the problem, see if others are experiencing it, find out if there are steps I can take to address it myself, etc.

So here’s what I’ve done so far to try to address the problem:

  1. I have reset the SMC on the machine. Multiple times.
  2. I have also reset the NVRAM.
  3. I tried re-installing the previous release of Catalina, 10.15.5.
  4. I have booted into Diagnostic Mode to investigate at the hardware level. Diagnostic Mode thinks there isn’t anything wrong with the box.
  5. I have tried unsetting every single setting on the Energy Saver preferences panel, as per this post over on mrmacintosh.com, where a lot of other folks have reported having the same problem.

I have also attempted to verify if there are circumstances under which I can and cannot reproduce the problem.

I can consistently, one hundred percent of the time, reproduce the issue if I put the machine to sleep, either by specifically invoking the Sleep command on the Apple system menu, or else by closing the lid. What happens then is that 30-90 seconds after I do that, I hear a “whish” noise that signifies that the system has shut itself down. When I then power up again, I get an error dialog warning about the kernel panic.

I can also consistently reproduce the issue whether or not I have the machine plugged into power.

At that point, the additional mitigation steps become relevant. The only way I have been able to consistently use the machine on a daily basis is to do the following:

  1. It has to be plugged into the USB-powered fan tray Dara bought for me. Note that this fan tray’s connector is USB-A, while the machine, being a 2019 MacBook, has four USB-C ports. So I have to use an adapter to plug the tray into it. I don’t know if it’s actually relevant to the problem, mind you, but I’m mentioning it because so far it’s been the main thing I’ve had to do to make the machine actually usable.
  2. I also have to plug the machine’s power cord in on the right-side USB ports, while the tray is plugged in on the left. I’m doing this because I’m also seeing issues with the machine reliably accepting input from all four of the USB ports, which is a damn problem given that part of my daily duties involves plugging iPads into this machine and deploying builds of our app to them so that I can do my actual testing work. And right now plugging a test iPad in on the left-hand side is the only way I’ve been able to get the machine to go “oh you mean THIS iPad”.

Things I Can’t Actually Usefully Do

Here are the things I’m aware of as actions I can’t actually usefully take.

Take It to an Apple Store

Because fuck you, badly handled nationwide pandemic.

Contact Apple Support Online

I could do this, but I’m not convinced it’d be useful. From what I’ve seen on the mrmacintosh.com post I linked to above, a lot of people have already contacted Apple Support about the matter, and the best they’ve been able to accomplish is to get Apple to go “yes, we know, it’s a software issue, our engineers are working on it”.

Also, for one thing, I’ve already done all the Tier 1 things they’d be likely to walk me through and I don’t want to waste the time. (Again, hi, I’m a QA engineer.)

For another, this is a work machine, not my own machine, so I can’t exactly send it to Apple to tell them “hey, fix the logic board” or whatever.

For a third thing, as mentioned, not actually convinced it’s a problem with the machine itself as opposed to Catalina being a raging trashfire.

I’d have to see about wrestling my way through Apple’s Tier 1 tech support to see if I could get to a higher tier engineer to talk to. I’m not feeling patient enough to try that right now, and the folks on the mrmacintosh.com post have reported a lot of that being useless anyway.

Wipe the Machine and Reinstall the OS

Well, I could do that, but again, not my machine, it’s a work box. I would probably have to get permission to completely nuke the hard drive and do a fresh install.

Plus, I’d have to take all the time necessary to set the damn thing up again with all the work-related things I need. Which, at minimum, would involve making sure I have a local Time Machine backup so that I could try to restore my data and programs from that. But the risk there is that if this shit’s getting caused by something that isn’t the OS, if I just pull stuff back out of a backup, the problem might come back anyway.

So WTF Can I Do With This?

So far this issue has persisted across three releases of Catalina for me, 10.15.4, 10.15.5, and 10.15.6. And how irritating is this? Fucking irritating. This is a top of the line MacBook Pro, running the latest release of the operating system, and yet it’s doing this batshittery to me.

And as near as I can tell right now, given that I do at least have a way to make the machine usable so I can get my damn job done, “suck it up and cope” seems like my only option.

But if there’s anything else I can possibly do to fix this, I’d really like to try it, because it’s really annoying to me that I have to plug an already large laptop into a tray in order to be able to use it. The whole point of having a laptop, after all, is for the thing to be mobile!

So my question for you folks out there in Internet land is: is there anything else I’ve missed here that I might be able to try on my own to fix this problem? Or do I really just have to settle in for the long haul until Apple pulls its head out of its ass and fixes this, ideally before they deploy Big Sur this fall?

EDITING TO ADD

7/18/2020: New information, as of seeing additional new commentary coming in on the mrmacintosh.com post I’ve been following:

Someone on that post recommended trying a tactic of using the Sleep command on the system menu, then waiting a short amount of time before actually closing the lid, unplugging things, etc.

I can now report that by doing this, I can successfully put the machine to sleep and bring it back out of sleep again later. So it seems like there has been at least some mitigation of this issue in 10.15.6.

I cannot yet report on whether I’m out of the woods with regard to the kernel panics I periodically get using Xcode. Since we’re heading into the height of summer, I am very leery about abandoning my current strategy of using the fan tray even if I find it annoying. The machine does still spike up considerably in temperature if I’m doing heavy work in Xcode, especially if I’m doing an active build.