Performance Bottlenecks and Software

Published Oct 29, 2019

Now that we've covered some of the ways to address application performance in hardware, let's review some ways that we can better design and implement software to avoid excessive load in the first place. Remembering that our biggest concern is exhaustion of a shared resource, it's clear that if we're not able to increase the amount of resources available (via hardware), then we need to consider making more efficient use of the resources that we do have.

Processor

Processor bottlenecks occur when your code is trying to perform too many operations at once. Given the relative speed of processors today, this is often not where problems arise. It can definitely happen, though, and when it does the resulting problems can be difficult to address.

For example, one common place you'll see bottlenecks is in applications performing complex mathematical modeling, or simulation. Of course, since the entire purpose of these applications is to handle these demanding tasks head-on, there's often no good way to work around such bottlenecks directly. Performance improvements here will often require substantial “domain knowledge” – that is, understanding of the specific problems being solved – and an intimate familiarity with how the code is designed and implemented. (Such improvements are outside the scope of this post.)

However, in the general case, it's worth considering whether any operations can be “parallelized.” Essentially, the goal with parallelization is to break down a complex task into a set of smaller tasks that can be performed independently, and to run them on as many threads or cores as are available at a time. This way, instead of performing a number of tasks “in order” – thus having to wait for each one to complete before beginning the next – all of the available hardware can be leveraged by starting tasks as soon as more resources become available.
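As a rough illustration of the idea, here is a minimal Python sketch that hands a set of independent tasks to a pool of workers. The `simulate` function and its arithmetic are made up purely for the example, and a thread pool is used only to keep the sketch short. For CPU-heavy pure-Python work, the standard library's `ProcessPoolExecutor` (which has the same interface) is usually the better fit, since threads in CPython share a single interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(seed):
    """Hypothetical stand-in for one independent unit of work."""
    total = 0
    for i in range(10_000):
        total += (seed * i) % 7
    return total


def run_all(seeds, max_workers=4):
    # Each seed is an independent task, so no task has to wait for
    # another to finish. A worker picks up the next task as soon as
    # it becomes free.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(simulate, seeds))


if __name__ == "__main__":
    print(run_all(range(8)))
```

The important design point is that `simulate` depends on nothing but its own input. Tasks that share state or depend on each other's results are much harder to split up this way.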

Disk

As we noted last time, bottlenecks at the I/O level can be very troublesome simply because these subsystems are, relatively speaking, orders of magnitude slower than your CPU. The simplest tasks running on the fastest processors can still take a very long time, if each of these tasks has to retrieve a substantial amount of data from disk.

One simple way to ensure high-performance code is to cache your data in-memory. If you know that a large block of data will be needed soon, or if a small block will be needed many times, then fetching it early in the process will help minimize the amount of time your code will be waiting.
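One way this might look in Python is to wrap the disk read in the standard library's `functools.lru_cache`, so that repeated requests for the same file are served from memory. The `load_text` helper is a hypothetical name used only for this sketch.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=128)
def load_text(path):
    """Read a file from disk once; later calls for the same path are
    answered from the in-memory cache (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")
```

Note that a cache like this will not notice if the file changes on disk. If that matters for your application, `load_text.cache_clear()` discards the cached copies so the next call reads from disk again.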

You might also consider “recalculating” data that would otherwise be read from disk. To understand this, imagine you have a complicated math problem to solve. You know that the answer is written down in a reference book down the hallway, but it will take you several minutes to walk to the storage room and find the book you're looking for. If the problem would only take you a few seconds to answer by plugging it into your calculator, then that would be a far more effective way to get the answer. Similarly, avoiding lengthy disk reads by reconstructing the necessary data in-memory can (in some cases) help keep performance up.

RAM

The bottleneck with RAM is often not a speed bottleneck directly, but rather a quantity bottleneck: when too much information is loaded into memory, some of that data will often be “paged” out to a disk, which introduces the delay inherent to disk I/O.

Thus, the goal here is simply to ensure that you're not using more memory than you need. This boils down to storing only the information you know you'll need, and only for as long as you know you'll need it. Avoid precalculating large blocks of data that you may only partially utilize, and ensure you're releasing that data as soon as you're done with it.

In fact (much like parallelizing tasks to reduce processor bottlenecks) even if you are going to be operating on all parts of a large dataset, it's often better to calculate small chunks that can be discarded quickly than to keep a large block allocated for a long period.
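A small Python sketch of this chunked approach follows. The names `chunks` and `total_of_squares` are invented for the example, and the sum of squares is just a placeholder for whatever calculation you actually need.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def total_of_squares(values, chunk_size=1000):
    # Only one chunk is held in memory at a time. Each chunk becomes
    # eligible for release as soon as the next one replaces it.
    total = 0
    for chunk in chunks(values, chunk_size):
        total += sum(v * v for v in chunk)
    return total


if __name__ == "__main__":
    print(total_of_squares(range(1_000_000)))
```

Because `values` can be any iterable (a file, a database cursor, a generator), the full dataset never has to exist in memory at once.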

Network

The nature of the network layer (inasmuch as it may be largely or even entirely outside of your control) means that it can be one of the hardest to address from a hardware perspective. As such, it's particularly important to recognize possible software approaches to minimizing these bottlenecks.

As we discussed last time, there are actually two separate issues you may encounter with network I/O. The first (usually lesser) question is one of bandwidth. After all, the more data you have to move, the longer things will take. Fortunately, simple problems have simple solutions: make sure you're not sending any more data over the wire than you actually need!

The second issue is latency – that is, how much time elapses between when the request for data is made, and when the corresponding response is received. When you're talking about a request that has to move across the public Internet, traversing multiple routers and hitting servers that may be hundreds or thousands of miles away, the milliseconds add up quickly – and your users will notice.

There are two primary strategies we can use to address network latency issues. First, sequential requests should be avoided like the plague! Any time you have a “back-and-forth” between your code and a remote service, you're multiplying the latency. What could be a small, manageable delay can quickly become unusable. This is where “batching” requests can come in handy: by combining multiple requests into a single “trip,” the amount of waiting can be substantially reduced.
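To make the difference concrete, here is a toy Python sketch. `FakeRemoteService` is entirely hypothetical: it stands in for a remote API and simply counts how many round trips it receives, on the assumption that the real service offers some kind of "fetch many" call.

```python
class FakeRemoteService:
    """Hypothetical stand-in for a remote service; counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        # One request, one response: a full round trip per key.
        self.round_trips += 1
        return self._data[key]

    def get_many(self, keys):
        # All keys travel in a single request: one round trip in total.
        self.round_trips += 1
        return {key: self._data[key] for key in keys}


def fetch_one_by_one(service, keys):
    return {key: service.get(key) for key in keys}


def fetch_batched(service, keys):
    return service.get_many(keys)


if __name__ == "__main__":
    users = {1: "ada", 2: "grace", 3: "edsger"}
    slow = FakeRemoteService(users)
    fast = FakeRemoteService(users)
    fetch_one_by_one(slow, [1, 2, 3])
    fetch_batched(fast, [1, 2, 3])
    print(slow.round_trips, fast.round_trips)
```

Both functions return the same data, but the sequential version pays the latency cost once per key, while the batched version pays it once overall.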

For requests that can't be batched, it may be worth investigating an “asynchronous” approach. That is, instead of sending a request and then waiting some indeterminate amount of time for the response to be received before continuing, it may be possible to continue immediately, and then come back to “fill in” the appropriate parts once the data is finally received.
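Here is one way that "continue now, fill in later" pattern might be sketched using Python's `asyncio`. The `fetch_remote` coroutine is a made-up placeholder that sleeps instead of making a real network call, and the page contents are invented for the example.

```python
import asyncio


async def fetch_remote(name, delay):
    """Hypothetical remote call; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"{name} loaded"


async def build_page():
    # Schedule the request, but do not wait for the response yet.
    pending = asyncio.create_task(fetch_remote("comments", 0.05))

    # Carry on with work that does not depend on the response.
    page = ["header", "article body"]

    # Come back and "fill in" the missing part once the data arrives.
    page.append(await pending)
    return page


if __name__ == "__main__":
    print(asyncio.run(build_page()))
```

The same idea applies in any language with futures, promises, or callbacks: the waiting still happens, but it overlaps with useful work instead of blocking it.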

Conclusion

Throwing more hardware at a performance problem may work in the short term, but in many cases it's far more beneficial to “go back to the code” to identify and fix performance issues at the source. Finding those bottlenecks can be a difficult task, but the teams at Smart Software Solutions have the experience and expertise necessary to help make sure you're getting the best possible performance for your application.
