Lambdas and Streams: Effective Java

Lambdas and Streams

Objective: Java 8 added functional interfaces, lambdas, and method references, along with the streams API, to help software engineers create function objects and process sequences of data elements more easily. This section explains best practices for using those facilities.

Key topics:

  1. Lambdas and Anonymous Classes
  2. Lambdas and Method References
  3. Lambdas and Functional Interfaces
  4. Streams
  5. Streams and Collections
  6. Parallel Streams

Estimated time: 15-30 minutes.

Lambdas and Anonymous Classes

Which of the following implementations would be more readable?

Use lambda expressions

Collections.sort(words, (w1, w2) -> Integer.compare(w1.length(), w2.length()));

Use anonymous classes

Collections.sort(words, new Comparator<String>() {
    public int compare(String w1, String w2) {
        return Integer.compare(w1.length(), w2.length());
    }
});

In general, lambdas are by far the best way to represent small function objects because they make the code less verbose and thus more readable. In the example above, the compiler can deduce the parameter types; in situations where it cannot and reports an error, you can add them explicitly. Few software engineers understand the rules for type inference in detail, and that is okay. Note, however, that lambdas lack names and documentation; if a computation isn't self-explanatory, or exceeds a few lines, don't put it in a lambda. Note also that lambdas share with anonymous classes the property that you can't reliably serialize and deserialize them across implementations; if you need a serializable function object, use an instance of a private static nested class instead.
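A minimal sketch of the inference point above (the word list and class name are invented for illustration): the compiler usually infers the lambda's parameter types from the target interface, but you can always spell them out when inference fails or when it helps the reader.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LambdaTypes {
    public static List<String> sortByLength(List<String> words) {
        // Usually the compiler infers the parameter types from Comparator<String>...
        Collections.sort(words, (w1, w2) -> Integer.compare(w1.length(), w2.length()));
        // ...but you can state them explicitly if inference fails or aids readability.
        Collections.sort(words, (String w1, String w2) -> Integer.compare(w1.length(), w2.length()));
        return words;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(List.of("giraffe", "pig", "tiger"));
        System.out.println(sortByLength(words)); // shortest word first
    }
}
```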

Lambdas and Method References

The goal of the following code snippets is to associate 1 with the key if it is not in the map; otherwise increment the associated value. Which snippet would be more readable?

Use lambda expressions

map.merge(key, 1, (count, incr) -> count + incr);

Use method references

map.merge(key, 1, Integer::sum);

Method references often provide a more succinct alternative to lambdas; therefore, use the former wherever they are shorter and clearer than the latter. Here are some common method references:

  •  Integer::parseInt : A static method reference for  str -> Integer.parseInt(str) .
  •  Instant.now()::isAfter : A bound method reference for  Instant then = Instant.now(); t -> then.isAfter(t) .
  •  String::toLowerCase : An unbound method reference for  str -> str.toLowerCase() .
  •  TreeMap<K,V>::new : A class constructor reference for  () -> new TreeMap<K,V>() .
  •  int[]::new : An array constructor reference for  len -> new int[len] .
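A short sketch exercising several of these forms (the class name and the sample values are our own); each reference is assigned to the standard functional interface it matches:

```java
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.IntFunction;
import java.util.function.Supplier;

public class MethodRefs {
    public static void main(String[] args) {
        // Static method reference
        Function<String, Integer> parse = Integer::parseInt;
        // Unbound method reference: the receiver becomes the first parameter
        Function<String, String> lower = String::toLowerCase;
        // Class constructor reference
        Supplier<TreeMap<String, Integer>> mapFactory = TreeMap::new;
        // Array constructor reference
        IntFunction<int[]> arrayFactory = int[]::new;

        System.out.println(parse.apply("42"));            // 42
        System.out.println(lower.apply("ABC"));           // abc
        System.out.println(mapFactory.get().size());      // 0
        System.out.println(arrayFactory.apply(3).length); // 3
    }
}
```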

Lambdas and Functional Interfaces

You can use the  LinkedHashMap  class as a cache. If you were the designer of this class, which of the following implementations would you prefer?

Use protected methods

public class LinkedHashMap<K,V> extends HashMap<K,V> implements Map<K,V> {

protected boolean removeEldestEntry(Map.Entry<K, V> eldest) { return false; }

}

// Example of how to use LinkedHashMap

LinkedHashMap<String, String> map = new LinkedHashMap<String, String>() {
    // This method is invoked by put and putAll after inserting a new entry into the map.
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > 120;
    }
};

Use standard interfaces

public class LinkedHashMap<K,V> extends HashMap<K,V> implements Map<K,V> {

// removeEldestEntry.test is invoked by put and putAll after inserting a new entry into the map.
public LinkedHashMap(BiPredicate<Map<K,V>, Map.Entry<K,V>> removeEldestEntry) { … }

}

// Example of how to use LinkedHashMap

LinkedHashMap<String, String> map = new LinkedHashMap<String, String>((m, e) -> m.size() > 120);

Use my functional interfaces

// Always annotate your functional interfaces with @FunctionalInterface
@FunctionalInterface interface EldestEntryRemovalFunction<K,V> {
boolean remove(Map<K,V> map, Map.Entry<K,V> eldest);
}

public class LinkedHashMap<K,V> extends HashMap<K,V> implements Map<K,V> {

// removeEldestFunction.remove is invoked by put and putAll after inserting a new entry into the map.
public LinkedHashMap(EldestEntryRemovalFunction<K,V> removeEldestFunction) { … }

}

// Example of how to use LinkedHashMap

LinkedHashMap<String, String> map = new LinkedHashMap<String, String>((m, e) -> m.size() > 120);

Design your APIs with lambdas in mind: accept functional interface types as parameters and return them from methods. In general, use the standard interfaces in  java.util.function . You don't have to memorize all of them, but you should know the fundamental ones:

Interface Function Signature Example
 UnaryOperator<T>   T apply(T t)   String::toLowerCase 
 BinaryOperator<T>   T apply(T t1, T t2)   BigInteger::add 
 Predicate<T>   boolean test(T t)   Collection::isEmpty 
 Function<T,R>   R apply(T t)   Arrays::asList 
 Supplier<T>   T get()   Instant::now 
 Consumer<T>   void accept(T t)   System.out::println 

Note that there are many variants of these interfaces that operate on the primitive types  int ,  long , and  double . Use them wherever possible, because the performance consequences of using boxed primitives in batch operations can be deadly. Note also that our old friend  Comparator<T> , which is structurally identical to the  ToIntBiFunction<T,T>  interface, is widely known and documents itself every time it appears in an API; so prefer  Comparator<T>  over  ToIntBiFunction<T,T>  whenever a comparison is what you mean.
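The point about primitive variants can be sketched as follows (the class name and the bound of one million are arbitrary choices for illustration): a  Predicate<Integer>  boxes every value it tests, while  IntPredicate  works on the primitive directly, so a pipeline built from primitive specializations allocates no wrapper objects.

```java
import java.util.function.IntPredicate;
import java.util.function.Predicate;
import java.util.stream.IntStream;

public class PrimitiveInterfaces {
    public static void main(String[] args) {
        // Predicate<Integer> boxes every int it tests...
        Predicate<Integer> boxedIsEven = i -> i % 2 == 0;
        // ...while IntPredicate operates on the primitive directly, with no allocation.
        IntPredicate isEven = i -> i % 2 == 0;

        long evens = IntStream.rangeClosed(1, 1_000_000)
                .filter(isEven) // no boxing anywhere in this pipeline
                .count();
        System.out.println(evens); // 500000
    }
}
```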

Streams

You are writing a program to read words from a dictionary and print all the anagram groups whose size meets a user-specified minimum. Which of the following implementations would be preferable?

Use streams and helper methods

public class Anagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);
        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(groupingBy(word -> alphabetize(word))).values().stream()
                    .filter(group -> group.size() >= minGroupSize)
                    .forEach(g -> System.out.println(g.size() + ": " + g));
        }
    }

    private static String alphabetize(String word) {
        char[] a = word.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }
}

Use streams only

public class Anagrams {
    public static void main(String[] args) throws IOException {
        Path dictionary = Paths.get(args[0]);
        int minGroupSize = Integer.parseInt(args[1]);
        try (Stream<String> words = Files.lines(dictionary)) {
            words.collect(groupingBy(word -> word.chars().sorted()
                            .collect(StringBuilder::new,
                                    (sb, c) -> sb.append((char) c),
                                    StringBuilder::append)
                            .toString()))
                    .values().stream()
                    .filter(group -> group.size() >= minGroupSize)
                    .map(group -> group.size() + ": " + group)
                    .forEach(System.out::println);
        }
    }
}

A stream pipeline consists of a source stream followed by zero or more intermediate operations and one terminal operation. Intermediate operations transform one stream into another, for example by mapping each element to a function of that element or filtering out elements that do not satisfy some condition. The terminal operation performs a final computation on the stream produced by the last intermediate operation. Here are some things you should know about streams:

  • Stream pipelines are evaluated lazily: evaluation starts only when the terminal operation is invoked, and data elements that are not required to compute the terminal operation are never evaluated. This property is what makes it possible to work with infinite streams. Note that a stream pipeline without a terminal operation is a silent no-op, so never forget the terminal operation.
  • The streams API is fluent: all the calls that make up a pipeline, or even multiple pipelines, can be chained into a single expression.
  • By default, stream pipelines run sequentially. Making a pipeline run in parallel is possible but rarely appropriate (see Parallel Streams below).
  • Use  forEach  only as a terminal operation to report the result of a computation, not to perform the computation.
  • The essence of programming stream pipelines is side-effect-free function objects. You should be familiar with the important collector factories such as  toList ,  toSet ,  toMap ,  groupingBy , and  joining .
  • Some tasks are best accomplished with streams, and others with iteration; in many cases, combining the two approaches is best. The first implementation above is preferable because it is more readable and maintainable than the second, which overuses streams.
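The collector factories mentioned above can be sketched in one short program (the word list and class name are invented for illustration):  groupingBy  builds a map keyed by a classifier function, and  joining  assembles a result string without any side effects in the pipeline.

```java
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.joining;

public class CollectorDemo {
    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "blueberry");

        // groupingBy: map from first letter to the words starting with it
        Map<Character, List<String>> byInitial = words.stream()
                .collect(groupingBy(w -> w.charAt(0)));
        System.out.println(byInitial.get('a')); // [apple, avocado]

        // joining: a side-effect-free way to build the report string
        String report = words.stream().collect(joining(", "));
        System.out.println(report); // apple, avocado, banana, blueberry
    }
}
```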

Streams and Collections

Which of the following implementations would be preferable if the size of the input list is large?

Return streams

// Returns a stream of all sublists of its input list
public class SubLists {
    public static <E> Stream<List<E>> of(List<E> list) {
        return Stream.concat(Stream.of(Collections.emptyList()),
                prefixes(list).flatMap(SubLists::suffixes));
    }

    private static <E> Stream<List<E>> prefixes(List<E> list) {
        return IntStream.rangeClosed(1, list.size()).mapToObj(end -> list.subList(0, end));
    }

    private static <E> Stream<List<E>> suffixes(List<E> list) {
        return IntStream.range(0, list.size()).mapToObj(start -> list.subList(start, list.size()));
    }
}

Return collections

// Returns a collection of all sublists of its input list
public class SubLists {
    public static <E> Collection<List<E>> of(List<E> list) {
        int n = list.size();
        List<List<E>> result = new ArrayList<>(n * (n + 1) / 2 + 1);
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j <= n; j++)
                result.add(list.subList(i, j));
        result.add(Collections.emptyList());
        return result;
    }
}

This problem is an exceptional case where returning a stream, as in the first implementation, is preferable: although the second implementation looks simpler, the memory required to hold the returned collection is quadratic in the size of the input list, which is unacceptable for large inputs.

Generally, when you write a method that returns a sequence of elements, remember that some clients may want to process them as a stream while others may want to iterate over them. Try to accommodate both groups by returning a collection when that is reasonable. If it is too costly to do so, as in the example above, return a  Stream  or an  Iterable , whichever seems more natural. If a future release of Java modifies the  Stream  interface to extend  Iterable , feel free to return a  Stream , since it will then allow clients to process the result either way.
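Until such a change happens, a tiny adapter can bridge the gap for clients who want to iterate. This is a sketch, and the name  iterableOf  is our own, not a standard API; it works because a  Stream 's  iterator()  method matches  Iterable 's single abstract method.

```java
import java.util.stream.Stream;

public class Adapters {
    // Adapter from Stream<E> to Iterable<E>: Stream::iterator
    // has the shape Iterable's single abstract method requires.
    public static <E> Iterable<E> iterableOf(Stream<E> stream) {
        return stream::iterator;
    }

    public static void main(String[] args) {
        // Clients can now use a for-each loop over a stream-returning method.
        for (String s : iterableOf(Stream.of("a", "b", "c"))) {
            System.out.println(s);
        }
    }
}
```

Note that the resulting  Iterable  is single-use, since the underlying stream can be traversed only once.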

Parallel Streams

Can you spot a problem in the following method?

// Generate the first 20 Mersenne primes, of the form 2^p - 1 where p is a prime number
Stream.iterate(TWO, BigInteger::nextProbablePrime)
    .parallel()
    .map(p -> TWO.pow(p.intValueExact()).subtract(ONE))
    .filter(mersenne -> mersenne.isProbablePrime(50))
    .limit(20)
    .forEach(System.out::println);

Yes. The  parallel()  call brings no speedup; the pipeline can run indefinitely while saturating every core, because the streams library does not know how to partition a source produced by  Stream.iterate  or a pipeline bounded by  limit .

Which of the following implementations would be faster?

Use parallel streams

static long numberOfPrimes(long n) {
    return LongStream.rangeClosed(2, n)
            .parallel()
            .mapToObj(BigInteger::valueOf)
            .filter(i -> i.isProbablePrime(50))
            .count();
}

Do not use parallel streams

static long numberOfPrimes(long n) {
    return LongStream.rangeClosed(2, n)
            .mapToObj(BigInteger::valueOf)
            .filter(i -> i.isProbablePrime(50))
            .count();
}

In general, do not even attempt to parallelize a stream pipeline unless you have good reason to believe it will preserve the correctness of the computation and improve its performance; done carelessly, parallelization can cause liveness or safety failures, or an outright performance disaster.

If you do choose to parallelize, ensure that your code remains correct when run in parallel, and measure performance under realistic conditions to justify the decision.

Under the right circumstances, it is possible to achieve near-linear speedup in the number of processor cores simply by adding a  parallel()  call to a stream pipeline. Here are things to consider before parallelizing:

  • Parallelizing is unlikely to improve the performance of a pipeline whose source is  Stream.iterate  or which uses the intermediate operation  limit . The first example above has both problems, and the streams library has no idea how to parallelize that pipeline.
  • As a rule, performance gains are best on streams over  ArrayList ,  HashMap ,  HashSet , and  ConcurrentHashMap  instances; arrays; and  int  and  long  ranges, because these can be accurately and cheaply split into subranges of any desired size, making it easy to divide the work among parallel threads. In the second example above, the parallel version can run 3 to 4 times faster than the sequential one on a quad-core machine.
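A sketch of a pipeline that parallelizes well (the class name and the bound of one million are arbitrary): the source is a  long  range, which splits cheaply into balanced subranges, and the terminal  sum()  is associative, so the order in which subranges are combined does not matter.

```java
import java.util.stream.LongStream;

public class ParallelSum {
    public static void main(String[] args) {
        long n = 1_000_000;
        // rangeClosed splits evenly, and summation is order-independent,
        // so parallel() is both safe and effective here.
        long total = LongStream.rangeClosed(1, n)
                .parallel()
                .map(i -> i * i)
                .sum();
        System.out.println(total); // n(n+1)(2n+1)/6 = 333333833333500000
    }
}
```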
