Friday, September 7, 2018

Hystrix Circuit Breaker and Execution Timeouts

Hystrix is a latency and fault tolerance library from Netflix OSS. Included in the library is an implementation of the very useful Circuit Breaker pattern that can be easily folded into Java applications.

Since Hystrix provides much more than just circuit breaker functionality, it can be easy to overlook the impact that execution timeouts can have on @HystrixCommand methods.

Let's take the following example:

public class Cashbox
{
  @Autowired
  private DepositService service;
  
  @HystrixCommand(
      fallbackMethod = "depositLater",
      commandProperties = { 
        @HystrixProperty(name="circuitBreaker.errorThresholdPercentage", value="1"),
        @HystrixProperty(name="circuitBreaker.requestVolumeThreshold", value="1"),
        @HystrixProperty(name="circuitBreaker.sleepWindowInMilliseconds", value="10000"),
      }
  )
  public boolean deposit(BigDecimal amount)
  {
    return service.makeDeposit(amount);
  }

  public boolean depositLater(BigDecimal amount)
  {
    return service.queueForDeposit(amount);
  }
}

While the circuit is closed, a call to deposit will attempt to make a deposit via the backing service. As you might expect, should the service throw an exception, the circuit will open. In this case the depositLater fallback method will be called and an attempt made to have the service queue the deposit for later. Since the @HystrixCommand method does not return an AsyncResult, all of this execution happens synchronously from the caller's point of view.

So, you may think that you can put on your single-threaded hat and ignore any potential concurrency problems. Think again! By default, every @HystrixCommand is executed on its own thread, despite the synchronous appearance of the call.
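One way to see this for yourself is to log the current thread name inside the command. With the default thread-pool isolation you should see a Hystrix worker thread rather than the caller's thread (a quick sketch against the Cashbox example above; the exact thread name depends on your thread pool key):

  @HystrixCommand(fallbackMethod = "depositLater")
  public boolean deposit(BigDecimal amount)
  {
    // With the default THREAD isolation strategy, this logs a Hystrix worker
    // thread (something like "hystrix-Cashbox-1"), not the caller's thread.
    System.out.println("deposit() running on " + Thread.currentThread().getName());
    return service.makeDeposit(amount);
  }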

What may not be immediately obvious here is that if the service's makeDeposit method takes longer than 1 second, the depositLater method will be called as well. This is because all @HystrixCommand methods have an execution timeout of 1000 millis enabled by default.

To disable this execution timeout, you can add the following Hystrix configuration property:

@HystrixProperty(name="execution.timeout.enabled", value="false")

Hystrix is a fantastic open source contribution and provides a wealth of functionality for handling latency and fault-tolerance. When adding a circuit breaker to a method that mutates a shared resource (e.g. account, file, record), don't ignore the complexities of concurrency.

Tuesday, June 28, 2016

Testing Webapps with Docker and Selenium

Functional testing of web applications within your continuous integration pipeline has huge advantages for streamlining delivery. Often considered the final vote before promotion of a build, functional test suites constantly flex the customer use-cases and protect the user experience from regression. However, testing every user workflow in an application sequentially takes time -- a lot of time -- and that time only continues to grow.

In order to scale a functional test suite, the work must be divided into parallel processes, with the full stack of each isolated from the rest. At the CI server, this amounts to a centralized Master Job that invokes parameterized Child Jobs in a fork-join pattern (using the Jenkins Multijob Plugin, for example) to install the application and run the tests. With each Child Job assigned a portion of the total tests, the eventual join lets the Master Job collect the test results and coverage metrics from all jobs for centralized reporting.

Docker provides a great platform for isolating the necessary components of each Child Job. If you are already using Docker images as your managed build artifacts, this process is a piece of cake. If your artifacts are in another form, you will need to install them into an isolated database/app server pair yourself.

Thanks to the fine contributions of the folks at SeleniumHQ, the docker-selenium project provides the Docker images needed to set up a Selenium Grid with ease. A centralized Selenium Hub container listens on a host port for Selenium API calls from your tests (over the JSON Wire Protocol). From there, the Hub forwards the commands to one of several Selenium Node containers that have registered with it. The Node is responsible for managing the Selenium session and browser profile, opening the browser, and handling any other browser interactions.

First, to create the Selenium Hub:
[root@localhost ~]# docker run -d -P --name selenium-hub selenium/hub
44274517aebdf04a6517e604275fced2b0db134375c631241da2644dcdb077b5
[root@localhost ~]# docker ps
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS              PORTS                     NAMES
44274517aebd        selenium/hub                  "/opt/bin/entry_point"   10 seconds ago      Up 5 seconds        0.0.0.0:32770->4444/tcp   selenium-hub

Next, add selenium nodes for the desired browser/version (here Firefox 32.0):
[root@localhost ~]# docker run -d -P -e FIREFOX_VERSION=32.0 --link selenium-hub:hub --name selenium-node-ff32 selenium/node-firefox
cdd369b95f1a5c631241da2644dcdb077b5f04a6517e604275fced2b0db13437
[root@localhost ~]# docker ps
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS              PORTS                     NAMES
cdd369b95f1a        selenium/node-firefox         "/opt/bin/entry_point"   10 seconds ago      Up 5 seconds                                  selenium-node-ff32
44274517aebd        selenium/hub                  "/opt/bin/entry_point"   1 minute ago        Up 1 minute         0.0.0.0:32770->4444/tcp   selenium-hub

Keep in mind that by default each selenium-node container will run only one browser session at a time. You will want to scale your number of nodes to match (or more likely, exceed) the number of parallel Child Jobs in your CI server.

Finally, point your tests to the Selenium Hub URL http://192.168.1.50:32770/wd/hub. The Hub also provides a GUI console where you can review the configuration and monitor browser use at http://192.168.1.50:32770/grid/console.
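For example, a test can obtain a browser from the grid through RemoteWebDriver (a minimal sketch, assuming the hub address above; the application URL is hypothetical):

import java.net.URL;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

public class GridSmokeTest
{
  public static void main(String[] args) throws Exception
  {
    // The hub forwards the new session request to a registered Firefox node.
    WebDriver driver = new RemoteWebDriver(
        new URL("http://192.168.1.50:32770/wd/hub"),
        DesiredCapabilities.firefox());
    try
    {
      driver.get("http://myapp.example.com/login");
      System.out.println("Page title: " + driver.getTitle());
    }
    finally
    {
      driver.quit();
    }
  }
}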

Using lightweight containers like this to divide and conquer massive functional test suites will have a profound impact on your delivery pipeline, and help rein in the developer feedback loop. Further, with Docker's ubiquitous nature, developers often run a large Selenium Grid in their local environment and farm out test groups to it before check-in.

Tuesday, June 30, 2015

Upgrading Spring Security from 3 to 4

There are a few changes to the configuration defaults in the Spring Security 4 release that may initially break your application.

Mostly this applies to applications using XML-based configuration. In the 4.0 release, there was an effort to align the XML-based configuration defaults with the more recent JavaConfig counterparts.

For starters, turning on DEBUG logging for org.springframework.security will give you some hints.

The first change is in the main Spring Security filter chain: the CSRF filter is now added to the chain and enabled by default. This may cause your login form's submit URL to fail the CSRF check, in which case you will see the following in the log:

06/22/2015 18:26:19 DEBUG org.springframework.security.web.FilterChainProxy (FilterChainProxy.java:324) - /doLogin at position 5 of 13 in additional filter chain; firing Filter: 'CsrfFilter'
06/22/2015 18:26:19 DEBUG org.springframework.security.web.csrf.CsrfFilter (CsrfFilter.java:106) - Invalid CSRF token found for http://localhost/unity/doUserLogin

The change is described in SEC-2347. To restore the previous behavior, you can turn off CSRF protection by adding a disabled <csrf> element to your <http> configuration. Keep in mind that if you don't have another mechanism for CSRF protection in your app, you really should consider leaving Spring's CSRF protection enabled.

  <http use-expressions="true">
    <csrf disabled="true"/>
    <intercept-url pattern="/app/login.do"  access="permitAll()" />
    <intercept-url pattern="/app/logout.do" access="permitAll()" />
    <intercept-url pattern="/app/**"        access="isAuthenticated()" /> 
    <intercept-url pattern="/**"            access="permitAll()" />
    <form-login login-processing-url="/doLogin"
      . . .
    />
  </http>
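For reference, a roughly equivalent setup in the JavaConfig style that these XML defaults were aligned with might look like this (a sketch; adjust the matchers and login URL to your own application):

import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

@Configuration
@EnableWebSecurity
public class SecurityConfig extends WebSecurityConfigurerAdapter
{
  @Override
  protected void configure(HttpSecurity http) throws Exception
  {
    http
      .csrf().disable()                 // mirrors <csrf disabled="true"/>
      .authorizeRequests()
        .antMatchers("/app/login.do", "/app/logout.do").permitAll()
        .antMatchers("/app/**").authenticated()
        .anyRequest().permitAll()
        .and()
      .formLogin()
        .loginProcessingUrl("/doLogin");
  }
}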

Wednesday, April 29, 2015

SSL/TLS Integration between Java and .NET

Through the years, I've done a lot of integration between Java and .NET platforms. Mostly things play well together, but every once in a while an interoperability issue is raised that I haven't encountered before.

Recently, we had a service written in C# on the .NET 4.5 platform that exposed a custom length-based framing protocol over TCP. Essentially, requests and responses are prefixed with length bytes that indicate the size of the message that follows, and the same is true for any variable-length fields within the message itself. We needed a Java 7 client library for this service, so I wrote one that exposed a contextual service API backed by a connection pool. The pool would eventually serve up reusable secure connections, with the SSL sessions reused as well to reduce overhead.
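To make the framing concrete, here is a hypothetical helper in the spirit of that protocol (the actual header size and encoding used by the service aren't shown here; this sketch assumes a 4-byte big-endian length prefix):

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating length-prefixed framing: each message is
// written as a 4-byte big-endian length followed by the payload bytes.
public final class Framing
{
  public static void writeFrame(OutputStream out, byte[] payload) throws IOException
  {
    DataOutputStream data = new DataOutputStream(out);
    data.writeInt(payload.length);   // length prefix
    data.write(payload);             // message body
    data.flush();
  }

  public static byte[] readFrame(InputStream in) throws IOException
  {
    DataInputStream data = new DataInputStream(in);
    int length = data.readInt();     // read the length prefix
    byte[] payload = new byte[length];
    data.readFully(payload);         // then the full message body
    return payload;
  }
}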

I had built in support for SSL throughout, but had turned it off in my local configuration for most of the development testing. At this stage, all of our concurrency and throughput tests were passing. However, when we flipped the switch to turn on secure connections, any tests that reused a connection were failing. Breaking the problem down to the smallest reproducible scope, we could recreate the issue with two sequential requests (sketched in code after the steps below):
  1. Create secure socket to the service and complete handshake
  2. Send request A to the service
  3. Receive response to A from the service, OK
  4. Reuse previous secure socket
  5. Send request B to the service
  6. Service never sends a response
  7. Timeout occurs
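In code, the reproduction looked roughly like the following (a sketch only, reusing the hypothetical Framing helper above; the host, port, timeout, and payloads are all made up):

import java.nio.charset.StandardCharsets;

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class ReuseRepro
{
  public static void main(String[] args) throws Exception
  {
    SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();

    // Step 1: create the secure socket and complete the TLS handshake up front.
    try (SSLSocket socket = (SSLSocket) factory.createSocket("service.example.com", 9000))
    {
      socket.setSoTimeout(10000);
      socket.startHandshake();

      // Steps 2-3: request A round-trips successfully.
      Framing.writeFrame(socket.getOutputStream(), "request-A".getBytes(StandardCharsets.UTF_8));
      Framing.readFrame(socket.getInputStream());

      // Steps 4-7: request B is sent on the same socket, but no response ever
      // arrives, so the read below eventually fails with a read timeout.
      Framing.writeFrame(socket.getOutputStream(), "request-B".getBytes(StandardCharsets.UTF_8));
      Framing.readFrame(socket.getInputStream());
    }
  }
}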

Out comes Wireshark on both sides to see what's going on...


Tuesday, October 21, 2014

Adopting DevOps

There are many lessons learned (originally outside of software) in lean manufacturing and single-piece flow that profoundly improve an organization's ability to rapidly produce software products and services.

In some companies, the culture maintains a notion that the development group has its own delivery pipeline, separate from the IT operations pipeline. This approach is more of a delivery conveyor belt anti-pattern: the software team is only ever committed to transporting product to the end of its conveyor. From there, it's off to a different belt and becomes another team's problem.

In contrast, a deployment pipeline delivering value directly to the customer implies a continuous and constant flow. At the very least, it starts with the source-commit (or sooner) and doesn't end until value is delivered to the customer. That pipeline is a common resource, owned and shared by all teams, to deliver products and services directly to the customer as a continuous stream of value. Just like water does not need hand-holding as it flows through a pipeline, the ideal road to production should be fully automated with sensors along the way ensuring quality is maintained.