SteveX Compiled » Blog Archive » Apple Handling of Non-Reproducible Bugs

Apple Handling of Non-Reproducible Bugs

The most frustrating thing about developing in Apple’s ecosystem today, for me at least, is bugs that are difficult to reproduce.

I have two separate issues right now where customers write to me because they’re having a problem, and I can’t reproduce that problem. In both of these scenarios, asking the customer to reboot their phone fixes the problem. I’ve seen other companies give their customers the same advice.

This should never happen. An app should never be able to get the system into a state where OS-level functionality (in my case, iCloud document sync and email) stops working in such a way that the system needs to be rebooted to fix it. It’s an OS bug.

I’ve attempted to get these bugs through Apple’s Radar process, but it always seems to stop dead because I can’t give them a reproducible scenario. Occasionally they’ll ask for logs, and then the bug gets closed as a duplicate.

A lot of the problems we’re seeing with iOS 8 are not easily reproducible, and I wonder if this isn’t a sign of a bigger problem with the bug reporting system and its handling of problems that are difficult to reproduce.

The iMessage problem that plagued so many people for years is a perfect example. I haven’t seen this happen in Yosemite yet, but for at least two major OS releases, there’s been a problem where some users find message delivery unreliable. It wasn’t just me; it’s not hard to find people talking about iMessage delivery reliability issues.

How did this bug survive for so long? I don’t know Apple’s internal processes, but it seems like these difficult-to-reproduce problems fall through the cracks, and persist for far longer than they should.

It’s often not clear who should own these bugs. If iCloud sync stops working, whose bug is it? There are probably half a dozen subsystems involved here, and coordinating the work of reproducing the bug and fixing it is no easy task. The problem is that it’s probably a task with no explicit owner: the bug belongs to whoever it’s assigned to at the time, but once they get around to investigating it and conclude that the fault seems to lie somewhere else, the bug gets reassigned and the process starts over.

I’ve been thinking about how I’d solve this problem, and my proposal is that once an issue reaches a certain level of notoriety, it should be assigned to a person whose job it is to own that bug. Someone who is outside the various teams involved, and can follow the bug wherever it leads. This person would be the owner of maybe 5 bugs at a time, and that’s their full-time job: to contact customers who are having the problem, arrange for instrumented builds to capture information about when the problem happens, whatever it takes.

Apple is suffering a pretty severe reliability hit right now with iOS 8 and all the problems that are plaguing people. I’m sure the teams are busy enough just fixing the issues they can reproduce, but that’s what makes these other issues last so long. There’s always a bigger fire to put out than a bug that’s affecting a tiny percentage of users, but at Apple’s scale, that tiny percentage of users is still a lot of people.

This entry was posted on Saturday, October 4th, 2014 at 7:30 am.