For those who may not know, my job is to build solutions that not only give you the intelligence to make more knowledgeable decisions faster, but also identify problems in your current system(s) and tell you what is working and what isn’t. In the last few years, I keep hearing the same statement when I show people problems:
“That’s not really a problem. No one complains about it. If it ain’t broke…”
When I hear “that’s not really a problem”, what you are saying is “I don’t care about that”, but what you really mean is “it’s not important to me right now”. So, how do I make it important? Or the better question is, “how do YOU make it important?” I’ll give you an example. A company has a website where people can run searches. I found that 27% of this website’s responses are errors (HTTP 500 status code), but no one knows, because the site hides them in its programming and instead presents a page of similar searches that other users have run. If the error is hidden, does it really happen?
At this second, you won’t get any calls. Why? Because no one really knows an error happened. Your servers are still running, and everyone who uses your site just keeps using it. Your site developers worked hard to make sure the customer has a clean experience, as they should. So, why should you care about the errors?
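If you want to see those hidden errors for yourself, the server’s own access log is usually the quickest place to look. Here is a minimal sketch in Python that counts status codes and works out the share of 5xx responses. It assumes the log is in the Common Log Format (or the combined variant); the function names and the sample lines are made up for illustration and are not from any real site.

```python
import re
from collections import Counter

# Assumption: each line contains a quoted request followed by a 3-digit status,
# e.g. ... "GET /search?q=shoes HTTP/1.1" 500 2326
STATUS_RE = re.compile(r'"[A-Z]+ \S+ [^"]*" (\d{3})\b')


def status_counts(lines):
    """Count how often each HTTP status code appears in the log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # skip lines that do not look like a request entry
            counts[match.group(1)] += 1
    return counts


def server_error_rate(counts):
    """Return the fraction of counted responses whose status is 5xx."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for status, n in counts.items() if status.startswith("5"))
    return errors / total


if __name__ == "__main__":
    # Invented sample lines, only to show the shape of the input.
    sample = [
        '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /search?q=a HTTP/1.1" 200 512',
        '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /search?q=b HTTP/1.1" 500 128',
    ]
    rate = server_error_rate(status_counts(sample))
    print(f"5xx share: {rate:.0%}")
```

This only works if the server logs the real status code. If the application catches the error and returns a 200 with the “similar searches” page, you would have to look in the application log instead.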
Maybe the load on your servers has increased due to the errors. Maybe it’s an increase in I/O while it’s dumping stack traces. Maybe it’s CPU load while waiting on a timeout. Maybe other servers can’t handle the multi-threaded connections and are using up memory. Whatever the reason, your application isn’t running efficiently. “Again…”, you may ask, “…it’s working, why should I care?”
Consistency. Period. As your company grows, as your team brings on new talent, as your architects design a bigger, badder system, they all expect one thing. Consistency. “If we do X, then Y will happen”. A formulated design based on known process standards. But that can’t happen if your system is not working as it should. Now you are spending more hours in the day, and throwing more hardware at the problem, to try to work around this issue. Or worse… you upgrade your systems just to have them fail publicly. You want your systems to be fluid and flexible, but that can’t happen if you turn a blind eye.
This isn’t just about a web server error. Maybe you have a SQL issue complaining about indexes. Maybe you have a service account authenticating 50 times per second. Or maybe it’s a load balancer preferring three servers and leaving the others barely used. These are all “problems” that I have uncovered, and most of them are rarely looked at, since the application just keeps humming away.
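The load balancer case is a good one to check with a few lines of code, because the numbers tell you right away whether traffic is spread evenly. The sketch below is in Python and assumes you can pull, for each request, the name of the backend that served it (from the load balancer’s log, for example). The function names, the server names, and the 50% tolerance are my own assumptions for illustration, not a standard.

```python
from collections import Counter


def backend_shares(backends):
    """Return each backend's fraction of the requests it served."""
    counts = Counter(backends)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


def underused(shares, pool, tolerance=0.5):
    """List pool members receiving less than tolerance * their fair share.

    The fair share is 1 / len(pool). A server that is in the pool but never
    shows up in the traffic counts as a share of 0 and is always listed.
    """
    if not pool:
        return []
    fair = 1 / len(pool)
    return sorted(name for name in pool if shares.get(name, 0.0) < tolerance * fair)


if __name__ == "__main__":
    # Invented traffic: three servers take almost everything.
    served_by = ["web1"] * 30 + ["web2"] * 30 + ["web3"] * 30 + ["web4"] * 5 + ["web5"] * 5
    pool = ["web1", "web2", "web3", "web4", "web5"]
    print(underused(backend_shares(served_by), pool))
```

A skewed result isn’t automatically wrong. Weighted or sticky-session setups will look uneven on purpose, so compare what you see against what the configuration is supposed to do.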
A more accurate answer to “You have problems” would really be “We have more important problems to take care of.” Absolutely. I’ll buy that. You need to fix the problems that affect your company’s revenue, your company’s reliability, and your company’s reputation first and foremost, and then work on the other issues. But don’t turn a blind eye to a problem because “no one complains” or “it ain’t broke”. Build your priorities, but don’t be ignorant. Ignorance may be bliss, but it can also bite you. Knowledge is power. Be in the know.