Guest Post: SOASTA's Guide to Load Testing Production Apps in the Cloud

This is a guest post written and contributed by Fred Beringer, VP Business Development EMEA at SOASTA. SOASTA provides load and performance testing products as well as solutions available as on-demand cloud services.

Common wisdom used to hold that you can’t load test live applications, and that any testing during development wouldn’t accurately reflect real-world conditions. So it’s not surprising that more than 75% of companies today don’t adequately test and validate the scalability of their apps. Everyone knows that load testing is the best way to ensure production systems can handle live traffic and to identify bottlenecks before they cause catastrophic failure, but the costs and risks often outweighed the benefits... until now.

In partnership with Joyent, SOASTA is offering the SOASTA CloudTest Lite suite to all Joyent customers for free. Now you can make load testing part of the design process, from development through launch, and load test live apps in production to spot what will break under heavy load long before your customers get to that point.

Let's take a look at the typical rationales for not load testing. Believe me, we hear these all the time.

  • "We have no way to replicate realistic volume of traffic. But we're very smart! We test at low volume in our test lab that sort of replicates our production environment…and then we extrapolate our results to account for our production environment. We should be ok based on our smart math."
  • "We've invested in the latest and greatest Load Balancers. It's going to manage our traffic in the best possible way, pick and choose automatically the right web servers to send traffic to! We don't even need to care about round robin, weighted round robin, least connections, least response time etc. These algorithms are handled automatically by our shiny load balancer."
  • "We have Varnish servers. I mean, c'mon. We're talking massive HTTP acceleration here."
  • "We're using a bunch of CDNs to manage our static assets. No problem there, they're going to handle this for us."
  • "No way we're going to test on our production system! We're not crazy! We don't want to impact our live customers with additional traffic. Forget it! On top of it, we have no way to monitor the performance results of the test in real-time."
  • "Did we talk about our massive bandwidth? Well, it's massive. Enough to handle what we need."

Fair enough. Now let's test some of those assumptions against the real-world experience of a leading eCommerce website in Europe selling the discontinued, but HOT, HP TouchPad: 1,000 units at 99 euros in a 7 am sale. This is an Akamai snapshot during the peak. They topped 40k requests/sec (around 800k hits/sec) and had to handle 100,000 visitors in less than 10 minutes. Here are the results of the sale:

  • Their DNS servers were failing from 10 minutes before the sale until 20 minutes after it. Very unusual behavior for DNS servers, which are supposed to handle more than 1 million requests per second.
  • Their firewall burned out while trying to handle 100k concurrent connections.
  • Their Varnish servers were quickly brought to their knees.
  • Average bandwidth was 2 Gbps with peaks at 4 Gbps.
  • Traffic was 5 times the typical sales traffic.
  • Their ad servers were dying and slowing down the whole site: a page couldn't finish loading unless its ads were displayed first. Very poor design if you ask me...
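
To put those figures in perspective, here is a rough back-of-the-envelope sketch. It only uses the numbers quoted above, and it assumes that the 4 Gbps bandwidth peak and the 40k requests/sec peak happened at the same moment, which the snapshot doesn't confirm. The function names are just illustrative.

```python
# Back-of-the-envelope arithmetic on the figures quoted in this post.
VISITORS = 100_000              # visitors who showed up for the sale
WINDOW_SECONDS = 10 * 60        # "less than 10 minutes", taken as an upper bound
PEAK_REQUESTS_PER_SEC = 40_000  # peak request rate
PEAK_BANDWIDTH_BPS = 4e9        # 4 Gbps peak, in bits per second


def arrivals_per_second(visitors, window_seconds):
    """Average number of new visitors arriving each second."""
    return visitors / window_seconds


def avg_bytes_per_request(bandwidth_bps, requests_per_sec):
    """Average response size in bytes, ASSUMING both peaks coincide."""
    return bandwidth_bps / 8 / requests_per_sec


if __name__ == "__main__":
    rate = arrivals_per_second(VISITORS, WINDOW_SECONDS)
    size = avg_bytes_per_request(PEAK_BANDWIDTH_BPS, PEAK_REQUESTS_PER_SEC)
    print(f"at least {rate:.0f} new visitors per second")
    print(f"about {size / 1000:.1f} KB per request at peak")
```

Since the window was shorter than 10 minutes, the arrival rate is a lower bound. Numbers like these are exactly what a low-volume lab test can't reproduce.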

This is unfortunately a typical case of a customer doing only internal testing, at low volume, and extrapolating the results with a finger in the air. And they're sorry when they leave 99,000 visitors screaming at their door. Bad business. Bad reputation. There is no valid reason to be in that situation today. None. (If you want some additional details about this failure, you can read a more detailed article here.)

If you're scared at this stage, that's normal. After all, you thought load testing in a lab was sufficient. So, as application developers, what can you do to avoid such a nightmare?

Test the performance of your application early and as often as possible

I'm assuming you run unit tests at each code check-in? You should be able to run continuous performance testing as well (there is no valid reason why this should not be the case). This is where you'll ensure a solid foundation for your application:

  • Identify new memory leaks, unoptimized SQL queries, slow pages, etc.
  • Simplify your page design and minimize the number of HTTP requests, e.g. combine files, make use of CSS sprites and image maps, etc.
  • Identify bad design in your pages, e.g. CSS and script placement (CSS goes at the top, scripts at the bottom, remember?), inline CSS and JavaScript versus external loading for cache optimization, etc.
  • Understand the benefit of JavaScript and CSS minification.
  • Identify and fix any HTTP redirects that slow down the overall performance of your application. A typical redirect occurs when a trailing slash is missing at the end of a URL, for example. An easy fix would be to use Alias or mod_rewrite in Apache. But if you don't test, you don't know what it is you need to fix!
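
As an illustration of the redirect check in the last bullet, here is a minimal sketch using only Python's standard library. It is not part of CloudTest Lite; the function names are made up for this example, and it only looks at the simple case where the server answers with a redirect to the same path plus a trailing slash.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def is_trailing_slash_redirect(url, status, location):
    """True if the response just redirects `url` to the same path plus '/'."""
    if status not in REDIRECT_STATUSES or not location:
        return False
    src = urlsplit(url)
    dst = urlsplit(location)
    # A relative Location header carries no host; treat it as the same host.
    same_host = dst.netloc in ("", src.netloc)
    return same_host and dst.path == src.path + "/"


def fetch_status(url, timeout=5.0):
    """Send one GET (http.client never follows redirects by itself)
    and return (status code, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

You could run `fetch_status` over the internal links of your pages (for example `http://www.example.com/docs`, a placeholder URL) and flag every link for which `is_trailing_slash_redirect` returns true. Each hit is an extra round trip your users pay for.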

Test as often as possible on your production environment and don't be scared!

It's going to be ok, really. As you know, it is very easy to segment a portion of the live environment during a low-traffic period and allow for testing in this environment. Typically, a separate IP address is configured on a load balancer, and servers are moved out of the live pool and placed into the test pool. Sometimes, configuration changes need to be made to the application servers in this cluster to point them to new databases and take them off other shared components. This is a more costly approach because it requires extra hardware and the associated maintenance overhead. It's also less reliable because you start to deviate from the actual production configuration and you cannot test at true scale. It is still, however, a more realistic test than simply testing in the lab.

Code for testability

A lot of developers are scared to test on a production system, even at low volume, because they don't want to screw up some part of the system. A typical example would be order placement in an eCommerce application. You definitely don't want to place an actual order! One benefit of being able to control the entire payload of a request with a product such as CloudTest Lite is that an engineer can add anything to the message that is needed to support testability. They have the flexibility to create messages that wouldn't normally show up through typical use of the application. One can programmatically set cookie values, modify headers, and change query strings or post data in the HTTP request. Marking a request this way is called "transaction tagging". A tagged transaction will get as far as it needs to go to achieve adequate test coverage, possibly moving to alternate code paths. Tagging a transaction and handling it properly can ensure that payments don't get processed, orders don't get fulfilled, and that ultimately thousands of test orders don't end up getting shipped to an unfortunate engineer's doorstep.
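
Here is one way the application side of transaction tagging could look. This is a minimal sketch, not SOASTA's implementation: the header name `X-Test-Transaction`, the `place_order` function, and the `charge`/`fulfil` callbacks are all hypothetical names chosen for this example.

```python
TEST_TAG_HEADER = "X-Test-Transaction"  # hypothetical header set by the load test


def is_tagged(headers):
    """Case-insensitive check for the (hypothetical) test tag header."""
    return any(name.lower() == TEST_TAG_HEADER.lower() and value == "1"
               for name, value in headers.items())


def place_order(headers, order, charge, fulfil):
    """Validate the order, then charge and fulfil it only if it is untagged."""
    if order["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    if is_tagged(headers):
        # Alternate code path: same validation as a real order,
        # but no payment and no shipment.
        return {"status": "test-accepted", "charged": False}
    charge(order)
    fulfil(order)
    return {"status": "accepted", "charged": True}
```

The tagged request still exercises validation and everything upstream of it, which is the point. On a real site you would also want to make sure ordinary customers can't set the tag themselves, for instance by only honoring it together with a secret that your test tool knows.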

Test your application at expected and unexpected volumes in your production environment, with traffic coming from outside your firewall.

  • This is the ONLY way you're going to understand how your load balancer is going to handle real traffic.
  • You'll validate the performance of your CDN provider. Are you using the right geographical location? Are your files properly included in the CDN? (You'd be surprised to see how many customers are shocked that their static assets are delivered from their own web servers instead of their very expensive CDN…)
  • You'll ensure that your bandwidth matches your traffic prediction and you'll know, for certain, how much traffic you can handle. Bandwidth is often underestimated and overlooked.
  • You'll expose your whole application ecosystem: the ad server you don't necessarily control, the payment gateway, the media server if you serve music and video, etc. If you test at low volume and from your lab, you have no way to understand how they will behave when you go live.
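
The CDN question above is easy to check for a single page before you even start a load test. The sketch below is only an illustration with made-up names: it lists the scripts, stylesheets and images a page references whose hostname is not in your set of CDN hosts. It looks at the URLs in the HTML only, so it won't tell you whether the CDN actually answers from its cache.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class AssetCollector(HTMLParser):
    """Collect absolute URLs of scripts, stylesheets and images on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img") and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif (tag == "link" and attrs.get("href")
              and "stylesheet" in (attrs.get("rel") or "")):
            self.assets.append(urljoin(self.base_url, attrs["href"]))


def assets_not_on_cdn(html, page_url, cdn_hosts):
    """Return the asset URLs whose hostname is not one of `cdn_hosts`."""
    collector = AssetCollector(page_url)
    collector.feed(html)
    return [url for url in collector.assets
            if urlsplit(url).hostname not in cdn_hosts]
```

Feed it the HTML of a page, the page's URL, and the set of hostnames your CDN serves from (say `{"cdn.example.net"}`, a placeholder). Anything it returns is a static asset your own web servers will have to deliver under load.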

SOASTA CloudTest Lite has been released to help as many developers and testers as possible. It's an absolutely free product (same codeline as our enterprise-class CloudTest Platform, which leverages the cloud to generate load) and will help you optimize the performance of your application throughout its entire life-cycle. What do you get when you install the free download?

  • A patented web-based UI allowing you to create performance scenarios for HTTP/HTTPS, SOAP, REST, FLEX, FLASH, etc., which are used in your web or mobile application. Are you worried because your application is AJAX based? Please stop worrying, as most of our users leverage CloudTest Lite for their AJAX applications!
  • A real-time analytics engine. If you've been doing load and performance testing before, you probably know that there aren’t any tools able to deliver real-time analytics. With CloudTest Lite, under the hood you get an in-memory OLAP engine dedicated to performance analytics. It allows you to combine, aggregate, and correlate performance data coming from your application and infrastructure, placing them on the same timeline. Because you get this information in real time, you don't have to wait hours to get your performance analysis and your test cycle is reduced to a bare minimum. You can monitor the performance of your application as well as its underlying infrastructure.
  • A very easy and fast way to model and launch your performance scenarios regularly.
  • A scripting language that can accommodate the most complex scenarios. A scripting language you have all mastered, since it is JavaScript!
  • A product to test your web and mobile application, up to 100 concurrent users, free forever! It can easily integrate with your CI, and when you're ready to scale, we can take your performance scenarios and run them in our global cloud. This is when you'll truly understand how your application and infrastructure will behave under realistic load.

We've tried to make it as easy as possible to get you started. Follow these easy steps and you'll be up and running in less than 15 minutes!

If you need additional information about our product, don't hesitate to contact us at info[at]soasta.com. You can also interact with us via Twitter on our CloudTest account.



Post written by alexsalkever