The What.

JCrawler is an open-source (under the CPL) stress-testing tool for web applications. It comes with a crawling/exploratory feature: you give JCrawler a set of starting URLs and it begins crawling from that point onwards, going through any URLs it can find on its way and generating load on the web application. The load parameters (hits/sec) are configurable.

The Why.

But wait a second! Aren't there already a whole bunch of tools like that? Why would anybody write a new one? You could bet there are a number of such programs in open source, and there definitely have to be some kick-ass commercial ones.

Well, that's what we thought, too. Frankly, we had no desire to write a load-testing tool. We are writing a web-portal system (http://www.digijava.org), not load-testing tools. But then we had a problem with one of our portlets that would only occur on the production server, under high load, and none of the existing tools we tried was able to recreate it. Log-replay tools were not much help either, because the problem took several hours to occur, and we needed a tool that would really stress the application so it would crash in a more reasonable (i.e. shorter) time.

We spent a lot of time trying not to "reinvent the wheel" and to find an existing wheel that would help us. There was none. We tried both OSS and commercial tools. None of them gave us the kind of result we needed. So we ended up writing JCrawler.

JCrawler was invaluable in helping us identify and solve the problem we had. We have released it under an open-source license because we hope it may help somebody else, too, and that somebody will not have to go through what we went through. Also, it may be a good chance for JCrawler itself to get enhancements. We are very open to suggestions and, especially, to help :) We continue to use JCrawler for testing our applications and would, of course, like to see it become as good as it can be.

The How.

What features were missing from the similar tools, and what combination of them did none of those tools offer together? Why didn't any of them work for us?

  • Crawling - A lot of load-test tools let you indicate a set of URLs, which they then just hit repeatedly. For a complex web application this may not make much sense: there are too many URLs for testing any limited number of them to give much confidence. You may hope that indicating "typical" ones will help, but if you use caching and the like, the one "typical" URL will soon be cached, whereas in real life hitting 10 different URLs like that would have a surprisingly different effect.
     
  • "Human" pattern - Most of the existing tools let you indicate how many threads they start up for load testing. This isn't always an accurate way to replicate real load. People measure performance in terms of the number of hits per second on a website, not some geeky "number of threads". Having 200 threads hit your site does not mean you are generating 200 hits/second: with that setting, the load tester may be generating just 2 hits per second if your application's pages are slow. Threads have to wait for pages to load, and not all pages load in 1/200th of a second. We wish they did, of course, but that wish is often far from reality.

    JCrawler follows a hits-per-second pattern, guaranteeing the indicated load, and will fire up as many threads as needed to keep the load constant.
     

  • HTTP Redirects and Cookies - Some of the tools were not able to handle these properly. That can leave your application's authentication completely untested and give you another set of surprises in production. This is especially true if you are using a single-sign-on system of some kind, which usually relies on transparent HTTP redirects.
     
  • OSS and tested - JCrawler is open source and comes with thorough unit tests. So one has reasonable confidence that it is not buggy itself, as well as the ability to fix, customize, or enhance it. With some tools we tried (especially, of course, the proprietary ones), we could not even understand how exactly they worked, and it was not clear whether they were doing what the documentation claimed.
     
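The hits-per-second pacing described above can be sketched as follows. This is a minimal illustration of the technique, not JCrawler's actual code: a scheduler fires one request per tick at the target rate, and slow responses simply occupy extra worker threads instead of delaying the next hit, so the firing rate stays constant.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ConstantRateDemo {

    // Start "requests" at a fixed hits/sec rate for durationMs, where each
    // simulated page takes pageLatencyMs to respond; returns hits started.
    static int run(int hitsPerSecond, long durationMs, long pageLatencyMs)
            throws InterruptedException {
        AtomicInteger started = new AtomicInteger();
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newCachedThreadPool(); // grows as needed

        // One tick per hit: slow responses pile up on worker threads,
        // but the firing rate itself is unaffected by page latency.
        ticker.scheduleAtFixedRate(() -> workers.submit(() -> {
            started.incrementAndGet();
            try {
                Thread.sleep(pageLatencyMs); // simulated slow page
            } catch (InterruptedException ignored) {
            }
        }), 0, 1000L / hitsPerSecond, TimeUnit.MILLISECONDS);

        Thread.sleep(durationMs);
        ticker.shutdownNow();
        workers.shutdownNow();
        return started.get();
    }

    public static void main(String[] args) throws Exception {
        // 10 hits/sec for 2 seconds against 500 ms pages: a fixed pool of,
        // say, 2 threads could manage only ~4 hits over the same interval.
        System.out.println("hits started: " + run(10, 2000, 500));
    }
}
```

With a fixed thread count instead, throughput would be capped at threads / page-latency, which is exactly the "geeky number of threads" problem described above.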

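To illustrate the redirect-and-cookie point above (a standalone sketch using Java's built-in HttpClient and a throwaway local test server, not JCrawler's internals): the client follows a login-style 302 transparently and carries the session cookie to the redirect target, which is exactly what single-sign-on flows depend on.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.*;
import java.net.http.*;

public class RedirectCookieDemo {

    // Tiny local server: /login sets a cookie and redirects; /home echoes
    // the Cookie header it receives. Returns the client's final response.
    static HttpResponse<String> fetch() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/login", ex -> {
            ex.getResponseHeaders().add("Set-Cookie", "session=abc");
            ex.getResponseHeaders().add("Location", "/home");
            ex.sendResponseHeaders(302, -1); // redirect, no body
            ex.close();
        });
        server.createContext("/home", ex -> {
            String cookie = ex.getRequestHeaders().getFirst("Cookie");
            byte[] body = ("cookie=" + cookie).getBytes();
            ex.sendResponseHeaders(200, body.length);
            try (OutputStream os = ex.getResponseBody()) { os.write(body); }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder()
                    .followRedirects(HttpClient.Redirect.NORMAL) // follow the 302
                    .cookieHandler(new CookieManager())          // keep session cookies
                    .build();
            URI uri = URI.create("http://localhost:"
                    + server.getAddress().getPort() + "/login");
            return client.send(HttpRequest.newBuilder(uri).build(),
                    HttpResponse.BodyHandlers.ofString());
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<String> r = fetch();
        System.out.println(r.statusCode() + " " + r.body());
    }
}
```

A tool that drops either the redirect or the cookie would never reach the authenticated page at all, and so would never load-test it.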
What features may not be critical, but are ones we like having in JCrawler?

  • Console mode - All these graphs and pie charts look cool and can help in a presentation to management, but when you have a real problem to solve they are not what matters. JCrawler is easy to run and monitor remotely, using little bandwidth. This can be very useful if your testing point is secured, i.e. if you can access your testing environment only from a limited network, so your test tool has to be in that network too, and the only access you get is SSH.

    Also, GUI-based applications have a tendency to hang on you when the application gets really busy, which can quickly become quite annoying. And a load tester is definitely one very busy application.
     

  • Easy to configure - The entire configuration happens in a central XML file. You can keep different XML files in a handy place and have several configurations ready to go whenever you need them. It may be just us, but we find a neat XML file more convenient than jumping from tab to tab of an overloaded GUI configuration. (Also, by using an XML-formatted file from the start, it will be easier to create a GUI configuration tool if others disagree with us about the convenience of this.)
     
  • Platform independent - JCrawler is a tool for developers and QA. In our team, we found that people prefer different operating systems, so having a tool that runs on any of them was nice.
     
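A configuration file along these lines is what the "easy to configure" bullet describes. This is a purely hypothetical sketch: the element names below are illustrative guesses, not JCrawler's actual schema — see the sample XML file shipped with the distribution for the real format.

```xml
<!-- Hypothetical sketch only; element names are illustrative,
     not JCrawler's real configuration schema. -->
<crawler>
  <!-- target load, in hits per second -->
  <hitsPerSecond>20</hitsPerSecond>
  <!-- starting points; the crawler follows links it finds from here -->
  <startUrls>
    <url>http://localhost:8080/portal/home</url>
    <url>http://localhost:8080/portal/search</url>
  </startUrls>
</crawler>
```

Keeping one such file per scenario (smoke load, peak load, soak test) makes switching configurations a matter of pointing the tool at a different file.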
Copyright © The Development Gateway Foundation. 2004.