Stateful code in Java

What is stateful code?

The stateful code is often not thread-safe

We usually have many production issues related to stateful code that didn’t have to be stateful. The more mutable state a piece of code stores, the harder it is to reason about that code. This SDE Habit will mostly talk about Java, but its advice should be applicable to other languages (feel free to add examples.)

Try to always use “final”

For member variables in particular, using “final” is almost always correct. It is not sufficient to make sure your code is thread-safe, but it is a good first step. Take a look at this piece of code as an example (note that this is a modified form of actual production code that caused real problems)

public class SQSProvider {
  private AmazonSQS client;
  private String queue;
  private String receiptHandle;
 
  public SQSProvider(AmazonSQS client, String queue) {
    this.client = client;
    this.queue = queue;
  }
 
  public Document getNextMessage() {
    // (this code is a bit simplified to illustrate the point)
    Message sqsMessage = client.receiveMessage(new ReceiveMessageRequest(this.queue));
    Document document = new Document(sqsMessage.getBody());
    this.receiptHandle = sqsMessage.getReceiptHandle();
    return document;
  }
 
  public void deleteSQSMessage() {
    client.deleteMessage(new DeleteMessageRequest(queue, receiptHandle));
  }
}

This code isn’t thread-safe. If multiple instances of it exist, they will clobber each other’s receipt handles, and start deleting each other’s messages.

It might be immediately obvious that this code isn’t thread-safe, and that if multiple threads call getNextMessage(), they will clobber each other’s receipt handle. But the fact of the matter is that a class can get complicated, and these sorts of bugs are very easy to introduce when a class starts storing state.

If the client, queue, and receiptHandle were all final, it would be immediately obvious that there was a bug in this code. The developer would realize that getNextMessage shouldn’t set the receiptHandle as a member variable, and should instead return it in Document (and deleteLatestSQSMessage() should take a Document as a parameter, or at least a receipt handle.)

Here is the fixed version of the above code:

public class SQSProvider {
  private final AmazonSQS client;
  private final String queue;
 
  public SQSProvider(final AmazonSQS client, final String queue) {
    this.client = client;
    this.queue = queue;
  }
 
  public Document getNextMessage() {
    // (this code is a bit simplified to illustrate the point)
    final Message sqsMessage = client.receiveMessage(new ReceiveMessageRequest(this.queue));
    return new Document(sqsMessage.getBody(), sqsMessage.getReceiptHandle());
  }
 
  public void deleteSQSMessage(final Document doc) {
    client.deleteMessage(new DeleteMessageRequest(queue, doc.getReceiptHandle()));
  }
}

Here are some other uses of non-final member variables that introduced real bugs in production Amazon code:

  • Having a member variable to store the content of an email. When that class started being called by multiple threads, they started overwriting each other’s contents
  • Having a member variable to store a connection to a database. This caused the connection not to be closed when running in a multi-threaded environment.

[edit][hide]Consider using immutable collections

Google Guava provides a number of utility classes that are immutable, notably ImmutableListImmutableMap, and others. If your class has to have a collection as a member variable, consider if it really has to ever change. If not, you should use an immutable collection.

It is a good habit to always to default your collection to these immutable options and only use a mutable collection if absolutely necessary.

[edit][hide]Use @NotThreadSafe if your class is not thread-safe

Using non-thread safe code as if it were thread-safe is often a source of defects down the line. If you must write a class which is not thread-safe, use the @NotThreadSafe java annotation to let consumers know.

@NotThreadSafe
public class SQSProvider {
}

If you know that your code is non-thread safe and “by design”, you know you will not make it thread-safe, then @NotThreadSafe is the appropriate annotation to use. Code which has no annotation should probably be assumed to be not thread-safe. If you can confirm that a given piece of code is thread-safe, you should add @ThreadSafe —Olivwong 19:53, 17 January 2014 (UTC)

Stateful code can be hard to maintain

Whenever expending engineering resources to develop a piece of technology, consider whether it is worth it for you to devote engineering resources to maintaining that piece of software.

At Amazon, it is often correct to let a dedicated team maintain your state. There are many different technologies available (see the Storage wiki for a full run-down on your options.) But before you go and create a HashMap (or *insert other homegrown state management* methodology) in your code to store some state, think about whether or not you’ve considered all the edge cases:

  • What if that HashMap grows infinitely?
  • What if a box dies and comes back online?
  • How do you handle deployments?

Sometimes it is OK to store state in your own code, but many teams have had great success offloading that state management to others. For example, AWS services do a great job of managing state, and can greatly simplify your application (if they’re a good fit.)

The stateful code is hard to test

Imagine the above example of the SQSProvider class. If you want to test the deleteSQSMessage() method, you would have to always call getNextMessage() first since the deleteSQSMessage() refers to the state that is only set in the getNextMessage() method call.

By introducing state into your code, it makes it harder to test individual components of your code. In addition, it can be difficult to make fully reproducible test cases, since each test case will need to make sure it stores the same state each time it runs.

Leave a Reply

%d bloggers like this: