
Monday, February 29, 2016

Performance Testing Checklist

Questions to ask

  • Have performance test workflow(s) been identified?

  • Have the sanity (functional) tests passed? You would not want to run performance tests on an application that is functionally broken.

  • What does the target system (hardware) look like? Specify all server and network appliance configurations. Monitor load average, CPU, memory, etc. on all servers.
These can be easily monitored using the top command on a Unix system.
Don’t forget to monitor the health of the test agents. Load average, CPU utilization, I/O and memory usage are the least you should monitor on a test agent. Often you will find that the load test agent itself has hit a system limit, which degrades the test results.

  • Is the test environment the same as the live environment? If not, the test results will have to be extrapolated.

  • Has benchmarking been done? (The objective of benchmark tests is to determine the end-to-end timing of critical business processes and transactions while the system is under low load with a production-sized database.)

  • Have performance test requirements been identified? They could entail -

    • Response Time -
      • Is response time with respect to workload identified?
      • Is it browser render time or delivery time to browser?
      • Are excluded components defined, e.g. calls to third parties that are beyond the control of the system developer?
      • What is the acceptable error rate while response time is being measured?

    • Workload Definition -
      • What is the workload pattern? For example, begin with 1 user, add 1 new user every 5 seconds and extend to 100 users. Note that if the transaction completion time is too short and users do not repeat the transaction for the duration of the test, you may never have all users running at the same time, since the first users will have finished before the last user begins its transaction.
      • What is the duration of the test?

    • Transactions -
      • How many transactions should be completed per unit of time during the load test? This rate is known as throughput.
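
The ramp-up caveat in the workload definition above can be checked with a little arithmetic. This is a minimal sketch, assuming each user runs a single transaction of fixed duration and then stops (no looping); the function names are illustrative and not part of any load testing tool.

```python
import math


def last_user_start(users, interval_s):
    """Time (seconds) at which the final user starts; the first starts at t=0."""
    return (users - 1) * interval_s


def peak_concurrency(users, interval_s, transaction_s):
    """Peak number of simultaneously active users.

    Assumption: every user runs exactly one transaction lasting
    transaction_s seconds and then exits.
    """
    if transaction_s <= 0:
        return 0
    # At most ceil(transaction_s / interval_s) start times fit inside one
    # transaction-length window, capped by the total number of users.
    return min(users, math.ceil(transaction_s / interval_s))


if __name__ == "__main__":
    # The example from the checklist: 1 new user every 5 seconds up to 100 users.
    print(last_user_start(100, 5))        # 495
    print(peak_concurrency(100, 5, 30))   # 6
    print(peak_concurrency(100, 5, 600))  # 100
```

With a 30 second transaction only 6 of the 100 users ever overlap, so the test never sees the intended load unless users loop or the transaction outlasts the ramp-up.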

  • Have the types of tests been identified?
    • Stress test, targeted infrastructure test,
    • Soak test (endurance test),
    • Volume test (i.e. performance test with a large database size etc.),
    • Failover test (start failing components such as servers and routers, and observe how response times are affected during and after the failover and how long the system takes to transition back to steady state)
    • Network sensitivity tests (measure the impact of network traffic on an application that is bandwidth dependent)

  • Have the performance characteristics to measure been identified? For example -

      • Load average >
The load average represents the average system load over a period of time. It conventionally appears in the form of three numbers which represent the system load during the last one-, five-, and fifteen-minute periods.
      • CPU, broken out by process including I/O wait time, user/system time, idle time >
How does your total CPU usage compare to load average? If you’ve routinely got a load average of 4 but your CPU usage is always under 50% (aggregated across all cores), then you’ve got some disk or network bottlenecks that aren’t letting you take advantage of all your cores.
      • Memory usage, broken out by process, and used, cached, free >
What does your memory usage look like?  Is free + cached memory very close to zero?  Most apps, daemons, etc. will work much better with a sizeable disk cache.  You don’t want to completely exhaust system memory or you’ll start swapping to disk, and that’s very bad. How much free (unused, non-cached) memory do you have?  How does this vary over time?  Tune your processes to use that free memory.  But keep enough (a small margin, perhaps 10% of total) in reserve for sudden spikes.
      • Disk activity, including requests and bytes read/written per second
      • Network bytes read/written per second
      • Is your web server dumping nearly a MB/sec to disk during normal operations? That could be some poorly tuned logging from Apache or one of your applications. Turn that chattiness down to get more performance.
      • Server Response Time - This refers to the time taken for one system node to respond to the request of another. A simple example would be an HTTP 'GET' request from a browser client to a web server. In terms of response time, this is what all load testing tools actually measure. It may be relevant to set server response time goals between all nodes of the system.
Some performance data collection tools - for collecting performance data, I like collectd, RRDTool, dstat and iostat.
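
The load average versus CPU comparison described above can be turned into a rough check. This is a sketch only: the thresholds (load at or above the core count, CPU under 50%, CPU at or above 90%) and the function names are assumptions chosen for illustration, and the CPU busy fraction has to come from whatever monitoring tool you already use.

```python
import os


def bottleneck_hint(load_avg, cpu_busy_fraction, cores):
    """Classify one observation using the rule of thumb from the checklist.

    The thresholds used here are illustrative assumptions, not tuned values.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if load_avg >= cores and cpu_busy_fraction < 0.5:
        # Tasks are queued but the CPUs are mostly idle: look at disk/network.
        return "waiting-on-io"
    if cpu_busy_fraction >= 0.9:
        return "cpu-bound"
    return "no-obvious-bottleneck"


def current_load_and_cores():
    """Read the 1-minute load average and the core count (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return one_minute, os.cpu_count() or 1
```

For the example in the text, `bottleneck_hint(4.0, 0.45, 4)` returns `"waiting-on-io"`.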

You may like to learn more about Linux performance management commands.
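
The memory questions above (how much is free, how much is free plus cached) can be answered from /proc/meminfo on Linux. This is a minimal sketch; the sample values are made up for illustration, and the helper names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_headroom(meminfo):
    """Return free and free+cached memory as fractions of total memory."""
    total = meminfo["MemTotal"]
    free = meminfo["MemFree"]
    cached = meminfo.get("Cached", 0)
    return {
        "free_fraction": free / total,
        "free_plus_cached_fraction": (free + cached) / total,
    }


# Illustrative sample, not real measurements.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         1600000 kB
Cached:          4800000 kB
"""

if __name__ == "__main__":
    # On a real Linux host, read open("/proc/meminfo").read() instead of SAMPLE.
    print(memory_headroom(parse_meminfo(SAMPLE)))
```

In this sample about 10% of memory is free, which matches the margin suggested above, and free plus cached is 40%, so the disk cache still has room.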

    • client-side, perceived performance-
      • distribution of response times (or at least mean, median and 90th percentile). Load-testing tools have difficulty measuring render-response time, since they generally have no concept of what happens within a node apart from recognizing a period of time where there is no activity 'on the wire'. To measure render response time, it is generally necessary to include functional test scripts as part of the performance test scenario. Many load testing tools do not offer this feature.
      • counts of successful (probably 200 OK) and failed (anything else) responses
      • throughput in total time to run a certain number of reports
There are plenty of tools and packages out there.
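
As an illustration of the client-side measures listed above, here is a minimal sketch that summarizes a list of samples. It assumes each sample is an (elapsed milliseconds, HTTP status) pair and uses the nearest-rank definition of a percentile; other tools may interpolate and report slightly different values.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(samples):
    """samples: iterable of (elapsed_ms, http_status) tuples."""
    samples = list(samples)
    times = sorted(elapsed for elapsed, _status in samples)
    ok = sum(1 for _elapsed, status in samples if status == 200)
    return {
        "mean_ms": statistics.mean(times),
        "median_ms": statistics.median(times),
        "p90_ms": percentile(times, 90),
        "ok": ok,                      # 200 OK responses
        "failed": len(samples) - ok,   # anything else
    }
```

A single slow request barely moves the median but dominates the 90th percentile, which is why the distribution matters more than the mean.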

    • server-side errors and per-request details-
      • You’ll almost certainly uncover some errors under load. Make sure your application (and other server processes) has a reasonable amount of logging. Debug logging could result in lots of unnecessary disk writes, so be sure to turn it off. But it’s certainly okay to log errors for perf tests and in production. It’s also a good idea to have Apache request logging, including timing, turned on so you can see the responses the server gave out and the time taken to process them. This will back up what you’re recording at the client.
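
To show how Apache request logs with timing can back up the client-side numbers, here is a small parsing sketch. It assumes the access log uses the common log format with `%D` (time taken to serve the request, in microseconds) appended as the last field; that format choice and the sample line are assumptions for illustration, so adjust the pattern to your own LogFormat.

```python
import re

# Assumed LogFormat: %h %l %u %t "%r" %>s %b %D
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) (?P<micros>\d+)$'
)


def parse_line(line):
    """Return request, status and duration in seconds, or None if no match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "request": match.group("request"),
        "status": int(match.group("status")),
        "seconds": int(match.group("micros")) / 1_000_000,
    }


# Made-up sample line in the assumed format.
SAMPLE_LINE = (
    '127.0.0.1 - - [29/Feb/2016:10:00:00 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 15342'
)
```

Parsing `SAMPLE_LINE` gives a status of 200 and a server-side duration of about 0.015 seconds, which can be compared with the response time the load tool recorded for the same request.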

Monday, February 15, 2016

JMeter Backend Listener - 16th JMeter Training Video

This is the 16th JMeter training video session. You may like to watch all previous JMeter training video sessions before continuing with this video.

This JMeter Training session covers Live Test Run Reporting -

1. Set up JMeter Backend Listener

2. Test Plan
Deep dive into test plan

3. Enable View Results Tree listener and Run Test
Check that data is pushed to InfluxDB > jmeter database - http://localhost:8083/

4. Disable View Results Tree listener and Run Test from command line.

5. Grafana Dashboard
Analyze Grafana Dashboard - http://localhost:3000/
Edit board and view query

6. Offline Analysis
Once a test run is over, update the time stamp and view the results of the past run

7. Project files can be downloaded

You will find following files -

Test Plan
Grafana Dashboard json file > to import and create Grafana dashboard

16th JMeter training session video can be watched online.
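
Step 3 above checks in the InfluxDB web UI that data has arrived. The same check can be scripted against the InfluxDB HTTP query endpoint. This is a sketch under assumptions: it assumes the HTTP API listens on the default port 8086 of the same host and that the database is named jmeter as in the video; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, database, query):
    """Build a URL for InfluxDB's /query endpoint."""
    params = urllib.parse.urlencode({"db": database, "q": query})
    return f"{base_url.rstrip('/')}/query?{params}"


def show_measurements(base_url="http://localhost:8086", database="jmeter"):
    """Ask InfluxDB which measurements exist (needs a running server)."""
    url = build_query_url(base_url, database, "SHOW MEASUREMENTS")
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)
```

If the Backend Listener is pushing data, the JSON returned by `show_measurements()` should list at least one measurement.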

Saturday, February 6, 2016

Drop in application throughput on AWS (Mystery Unresolved)

I have posted a question on Stack Overflow about a sudden reduction in application throughput. I also followed up on this with AWS Support, and the following is the gist of that conversation -
tl;dr:
The AWS instances my team was using were not set up with enhanced networking, which is needed to get maximum network performance (i.e. 10 gigabits on a c4.8xlarge instance). For example, ixgbevf on test-aws-am was not set to 2.14.2.
Long Version:
The test in question requests a static HTML page: a GET request, no complicated logic, no EBS, DJ-CTS etc.
  • What network performance should we expect when an instance does not meet the requirements for enhanced networking?
    AWS Support: We really do not have specific numbers because that varies by a lot of factors (time of day, other instances sharing the network in the same location and several other factors). What we do is advise customers to do benchmarking tests to confirm that the instances meet the performance expectations for their applications.
  • During the test, 18430 kilobytes/sec of data is transferred, which is well under the 10 gigabit network limit.
    AWS Support has been insistent that the throughput throttling is probably an application issue.
    AWS Support: Through our testing we have eliminated ELB as a potential bottleneck for the drop in throughput and we know that the issue is occurring at the back-end instances. Looking at the back-end instances, the general performance metrics such as CPU utilization, Network In, Network Out etc for both the instances looks good. This indicates that the issue we are facing could be an application issue. could you please add another instance and try the test again?
    I cannot confirm this, since there have been zero errors across multiple test runs. Once a test hits the lower throughput limit, every subsequent test run stays in the lower range of throughput (about 8000 requests/sec). But if I wait a few hours (about 2 to 3) and run the test again, the same behavior returns: higher throughput for about half an hour and then back to 8000 requests/sec. I ruled out adding another instance, as testing without the ELB (described below) exhibits the same behavior.
  • ELB pre-warming did not help with the reduced throughput. Moreover, pre-warming is time bound and the ELB scales down after some time. If anything, the warm-up skewed the results further: throughput dropped within 5 minutes, sooner than without it - 

Before Warmup >>

After Warmup >>

  • Changing the test agent's instance type from m4.4xlarge to m4.10xlarge and c4.large showed the same throttling of application throughput. Test runs with a ramp-up period of increasing threads behave the same way, except that throughput reaches the higher limit after a greater amount of time, which is attributable to the ramp-up period, rather than after half an hour.
  • Testing without the ELB exhibits the same behavior, except that the high throughput is in the range of 8000 requests/sec and then drops to about 4000 requests/sec. Although the tests already use a custom DNS resolver, this further rules out any anomaly caused by possible DNS caching of ELB addresses.
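
To put the 18430 kilobytes/sec figure above in perspective against a 10 gigabit link, here is the arithmetic as a small sketch. It assumes "kilobyte" means 1024 bytes; with 1000 bytes the result changes only slightly. The function name is illustrative.

```python
def link_utilization(kilobytes_per_sec, link_gbits, bytes_per_kilobyte=1024):
    """Fraction of a link's capacity used by a given transfer rate."""
    bits_per_sec = kilobytes_per_sec * bytes_per_kilobyte * 8
    return bits_per_sec / (link_gbits * 1e9)


if __name__ == "__main__":
    used = link_utilization(18430, 10)
    print(f"{used:.1%} of a 10 gigabit link")  # 1.5% of a 10 gigabit link
```

At roughly 0.15 gigabits/sec, the test uses about 1.5% of the nominal 10 gigabit capacity, so raw bandwidth alone does not explain the drop.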

The mystery of the drop in application throughput was never solved.

Later on I learned from in-house network engineers that enhanced networking alone is not enough; a placement group is also required to get 10 gigabit network performance. Unfortunately, a placement group is limited to instances in the same availability zone. Isn't that risky?

Monday, February 1, 2016

Graph Generator Listener - 15th JMeter Training Video

This is the 15th JMeter training video session. You may like to watch all previous JMeter training video sessions before continuing with this video.

This JMeter Training session covers Graphs Generator Listener -

1. JMeterPlugins-Extras is needed for the Graphs Generator Listener (from the JMeterPlugins-ExtrasLibs jar)
2. The Graphs Generator Listener generates the following graphs at the end of the current test or from a previous test's results:
    Active Threads Over Time
    Response Times Over Time
    Transactions per Second
    Server Hits per Second
    Response Codes per Second
    Response Latencies Over Time
    Bytes Throughput Over Time
    Response Times vs Threads
    Transaction Throughput vs Threads
    Response Times Distribution
    Response Times Percentiles

3. Generate CSV / PNG for current test results
    Requires use of View Results Tree or Graphs Generator Listener
    Not recommended !

4. Generate CSV / PNG for existing/previous test results
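
To give a feel for what a graph such as Transactions per Second is built from, here is a minimal sketch that buckets samples from a JMeter CSV results file by second. It assumes the file was saved with a header row and a `timeStamp` column in epoch milliseconds; the sample rows are made up, and this is not how the plugin itself computes the graph.

```python
import csv
import io
from collections import Counter


def transactions_per_second(csv_text):
    """Count samples per whole second from CSV text with a timeStamp column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(int(row["timeStamp"]) // 1000 for row in reader)
    return dict(sorted(counts.items()))


# Made-up sample rows for illustration.
SAMPLE_CSV = """timeStamp,elapsed,label,responseCode,success
1454300000100,120,Home,200,true
1454300000900,95,Home,200,true
1454300001200,110,Home,200,true
"""

if __name__ == "__main__":
    print(transactions_per_second(SAMPLE_CSV))
```

Here two samples fall in the first second and one in the next.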

15th JMeter training session video can be watched online.