Sunday, June 7, 2015

Java 8 Features

Lambda expressions allow you to pass around a block of code, a function without a name. The -> separates the parameters from the body of the lambda expression. The parameters usually do not need explicit types because the compiler infers them from the context; the parameters are still statically typed. Some different ways of writing lambda expressions are shown below:

// 1) empty pair of parentheses, (), can be used to signify that there are no arguments.

Runnable noArguments = () -> System.out.println("Hello World");
// 2) when there is only one argument to the lambda expression, we can leave out the parentheses around the argument.

ActionListener oneArgument = event -> System.out.println("button clicked");
// 3) full block of code, bookended by curly braces ({})

Runnable multiStatement = () -> {
   System.out.print("Hello");
   System.out.println(" World");
};
// 4) This creates a function that adds together two numbers, where the variable called add 
//    isn’t the result of adding two numbers; it is code that adds together two numbers.

BinaryOperator<Long> add = (x, y) -> x + y;
// 5) providing explicit types for the arguments requires surrounding the arguments to the lambda expression with parentheses. 
//    The parentheses are also necessary if you’ve got multiple arguments.

BinaryOperator<Long> addExplicit = (Long x, Long y) -> x + y;

The target type of a lambda expression is the type of the context in which the lambda expression appears—for example, a local variable that it’s assigned to or a method parameter that it gets passed into.

Lambda expressions capture values, not variables, so any local variable they reference must be final or effectively final. Although we are not required to declare such variables final, we cannot reassign them if they are used in a lambda expression. If you assign to a variable multiple times and then try to use it in a lambda expression, you’ll get a compile error.
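A minimal sketch of the effectively final rule (the class and variable names are illustrative):

```java
public class CaptureDemo {
    public static String greet() {
        String name = "World";  // assigned exactly once, so effectively final
        // The lambda captures the *value* of name, not the variable itself.
        Runnable greeter = () -> System.out.println("Hello " + name);
        greeter.run();
        // name = "Universe";  // uncommenting this reassignment would turn the
        //                     // lambda above into a compile error
        return name;
    }

    public static void main(String[] args) {
        greet();
    }
}
```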

Lambda expressions are statically typed, and the types of lambda expressions are called functional interfaces. A functional interface is an interface with a single abstract method that is used as the type of a lambda expression.

Table 2-1. Important functional interfaces in Java
Interface name      Arguments  Returns  Example
Predicate<T>        T          boolean  Has this album been released yet?
Consumer<T>         T          void     Printing out a value
Function<T, R>      T          R        Get the name from an Artist object
Supplier<T>         None       T        A factory method
UnaryOperator<T>    T          T        Logical not (!)
BinaryOperator<T>   (T, T)     T        Multiplying two numbers (*)

Java 8 allows you to leave out the types for all parameters of lambda expressions. The Java compiler looks for information close to the lambda expression and uses it to figure out what the correct type should be. The code is still type checked and provides all the safety that you’re used to, but you don’t have to state the types explicitly. This is what we mean by type inference.

A Predicate is a functional interface that checks whether something is true or false. Unlike the previous ActionListener examples, the lambda expression below returns a value.
Predicate<Integer> atLeast5 = x -> x > 5;

Here the return value of the lambda expression is the value its body evaluates to.

BinaryOperator is a functional interface that takes two arguments and returns a value, all of the same type. It takes a single generic type parameter; if no generic argument is specified, the code doesn't compile.
BinaryOperator<Long> addLongs = (x, y) -> x + y;

Some of the Key Points on Lambda Expressions:
  • A lambda expression is a method without a name that is used to pass around behavior as if it were data.
  • Lambda expressions look like this: BinaryOperator<Integer> add = (x, y) -> x + y.
  • A functional interface is an interface with a single abstract method that is used as the type of a lambda expression.

Streams

In a for loop over a collection, the iteration proceeds by creating a new Iterator object and then explicitly calling the hasNext and next methods on the Iterator. This makes it hard to abstract away the different behavioral operations, and it is inherently serial in nature.

Streams allow you to write collections-processing code at a higher level of abstraction. A Stream is a tool for building up complex operations on collections using a functional approach. The Stream interface contains a series of functions, each of which corresponds to a common operation that we might perform on a Collection. The call to stream() returns a Stream, which plays the same role for internal iteration that an Iterator plays for external iteration.
long count = allArtists.stream().filter(artist -> artist.isFrom("London")).count();

The operations of the Streams API do not change the contents of the collection; they describe the contents of the Stream. The Stream object returned isn’t a new collection; it’s a recipe for creating a new collection. The filter method of the Stream above keeps only those objects that pass a test by returning either true or false. The call to filter builds up a Stream recipe, but there’s nothing to force this recipe to be used. Methods such as filter that build up the Stream recipe but don’t force a new value to be generated at the end are referred to as lazy. Methods such as count that generate a final value out of the Stream sequence are called eager. If an operation gives back a Stream, it’s lazy; if it gives back another value or void, it’s eager. The values in the Stream that are operated on are derived from the initial values and the recipe produced by the sequence of Stream calls.
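The lazy/eager split can be observed directly. In this sketch (class and counter names are illustrative), a counter inside the Predicate shows that filter does nothing until the eager count call forces the recipe to run:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyDemo {
    public static int filterCalls() {
        AtomicInteger calls = new AtomicInteger();
        // Building the recipe: filter is lazy, so the Predicate has not run yet.
        Stream<String> recipe = Stream.of("London", "Paris", "Leeds")
                .filter(city -> {
                    calls.incrementAndGet();
                    return city.startsWith("L");
                });
        // calls.get() == 0 here: nothing has been evaluated so far.
        long count = recipe.count();  // eager: forces the Predicate to run
        // count == 2, and the Predicate ran once per element.
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(filterCalls());
    }
}
```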

The collect(toList()) is an eager operation that generates a list from the values in a Stream.
List<String> collected = Stream.of("a", "b", "c").collect(Collectors.toList());

The map operation allows you to apply a function (say, converting a value of one type into another) to a stream of values, producing another stream of the new values. The function passed to map is an instance of Function.
List<String> collected = Stream.of("a", "b", "hello").map(string -> string.toUpperCase()).collect(toList());

The filter method on the stream allows you to check each element of the collection. The Stream after the filter contains only those elements of the preceding Stream for which the test evaluated to true. The test passed to filter is an instance of the Predicate interface.
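A small self-contained filter example, keeping only strings that start with a digit:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilterDemo {
    // The Predicate keeps only strings whose first character is a digit.
    public static List<String> beginningWithNumbers(Stream<String> values) {
        return values.filter(value -> Character.isDigit(value.charAt(0)))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(beginningWithNumbers(Stream.of("a", "1abc", "abc1")));
    }
}
```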

The flatMap method allows you to replace a value with a Stream and concatenates all the streams together. It is a variant of the map operation that produces a new Stream object as the replacement. Its associated functional interface is the same as map’s, Function, but its return type is restricted to streams rather than any value.
// Takes Stream of lists of numbers and returns all the numbers from in the sequences.

List<Integer> together = Stream.of(asList(1, 2), asList(3, 4)).flatMap(numbers -> numbers.stream()).collect(toList());

The max and min methods of the Stream API find the maximum or minimum element in a stream. A Comparator is passed to determine the ordering of the elements. The comparing method introduced in Java 8 allows you to build a Comparator using keys; the getter function passed to it extracts the same key from both elements being compared. The max and min methods return an Optional value, which represents a value that may exist, or may not (the Stream might be empty).
Track shortestTrack = tracks.stream().min(Comparator.comparing(track -> track.getLength())).get();

The reduce operation is used to generate a single result from a collection of values. It takes an initial value and a lambda expression that folds each element into an accumulator. Here the initial value is 0, the sum of an empty Stream. The type of the reducer is BinaryOperator.
int count = Stream.of(1, 2, 3).reduce(0, (acc, element) -> acc + element);

Higher-order functions: A higher-order function is a function that either takes another function as an argument or returns a function as its result. If a functional interface is used as a parameter or return type, then we have a higher-order function. Nearly all the functions that we’ve encountered on the Stream interface are higher-order functions. Comparator is also a functional interface as it has only a single abstract method.

Streams describe operations on data by saying what transformation is made rather than how the transformation occurs. They help move toward the idea of side effect–free functions. Functions with no side effects don’t change the state of anything else in the program or the outside world. Lambda expressions capture values rather than variables (local variables must be effectively final), which encourages writing code free from side effects. The exception to this is the forEach method, a terminal operation that exists only for its side effects.

Java 8 introduces default methods and static methods on interfaces, enabling interfaces to have methods with bodies containing code.

Boxed types, which wrap up the primitive types, are objects and have a memory overhead. A primitive int takes 4 bytes of memory; a boxed Integer takes 16 bytes. Furthermore, an Integer[] takes up nearly six times more memory than an int[] of the same size. There is also a computational overhead when converting from a primitive type to a boxed type (boxing) and vice versa (unboxing). The streams library therefore differentiates between the primitive and boxed versions of some library functions, e.g. the mapToLong higher-order function and the ToLongFunction interface.
If the return type is a primitive, the interface is prefixed with To and the primitive type, as in ToLongFunction. If the argument type is a primitive type, the name prefix is just the type name, as in LongFunction. If a higher-order function on Stream returns a primitive type, it is suffixed with To and the primitive type, as in mapToLong. Such methods return specialized streams such as IntStream, DoubleStream, and LongStream instead of Stream. The min, max, average, sum, and summaryStatistics methods are all available on all three primitive specialized Stream variants.
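A short sketch of the primitive specializations, using mapToInt to avoid boxing and summaryStatistics to compute several reductions in one pass:

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class PrimitiveStreamDemo {
    // mapToInt returns an IntStream, avoiding an Integer box per element.
    public static int totalLength(Stream<String> words) {
        return words.mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Stream.of("a", "bb", "ccc")));

        // summaryStatistics computes min, max, sum, and average in a single pass.
        IntSummaryStatistics stats = IntStream.rangeClosed(1, 4).summaryStatistics();
        System.out.printf("min %d, max %d, sum %d, average %.1f%n",
                stats.getMin(), stats.getMax(), stats.getSum(), stats.getAverage());
    }
}
```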

When methods are overloaded, Java infers the type of a lambda to be the most specific functional interface when calling them:
  • If there is a single possible target type, the lambda expression infers the type from the corresponding argument on the functional interface.
  • If there are several possible target types, the most specific type is inferred.
  • If there are several possible target types and there is no most specific type, you must manually provide a type.
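The third rule can be sketched with the Predicate<Integer>/IntPredicate pair (the overloaded method here is illustrative). Neither interface is a subtype of the other, so a bare lambda is ambiguous and a cast is needed:

```java
import java.util.function.IntPredicate;
import java.util.function.Predicate;

public class OverloadDemo {
    static String overloaded(Predicate<Integer> p) { return "Predicate"; }
    static String overloaded(IntPredicate p)       { return "IntPredicate"; }

    public static void main(String[] args) {
        // overloaded(x -> x > 5) would be a compile error: neither target type
        // is more specific, so the type must be provided manually with a cast.
        System.out.println(overloaded((IntPredicate) x -> x > 5));
        System.out.println(overloaded((Predicate<Integer>) x -> x > 5));
    }
}
```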

@FunctionalInterface is an annotation that should be applied to any interface that is intended to be used as a functional interface. Such interfaces provide Stream interoperability and are really there to bundle up blocks of code as data. Interfaces such as java.lang.Comparable and java.io.Closeable happen to have only a single abstract method (one that depends on the object's internal state) but aren’t normally meant to be implemented by lambda expressions, so they are not annotated with @FunctionalInterface.

Default methods allow an interface to supply an implementation that implementing classes inherit unless they provide their own, e.g. the stream method on Collection or the forEach method on Iterable. Unlike classes, interfaces don’t have instance fields, so default methods can modify their child classes only by calling methods on them. This helps avoid making assumptions about the implementation of their children. A default method is a virtual method, the opposite of a static method, and a concrete method in a class always takes precedence over a default method. For example, if a child class extends a parent class that implements an interface with a default method, and the child class also implements another interface containing a default method with the same signature, the method inherited from the parent class takes precedence over the default method from the interface.

When a class implements multiple interfaces with the same default method, the result is a compile error. With the enhanced super syntax, i.e. the InterfaceName.super variant, it’s possible to specify which inherited interface’s method to call.
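A minimal sketch of the diamond conflict and the InterfaceName.super syntax (interface and class names are illustrative):

```java
public class DiamondDemo {
    interface Left  { default String hello() { return "left"; } }
    interface Right { default String hello() { return "right"; } }

    // Implementing both interfaces without an override is a compile error,
    // so the class must pick one implementation via InterfaceName.super.
    static class Both implements Left, Right {
        @Override
        public String hello() { return Left.super.hello(); }
    }

    public static void main(String[] args) {
        System.out.println(new Both().hello());
    }
}
```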

Below are the rules for multiple inheritance for default methods.
  1. Any class wins over any interface. So if there’s a method with a body, or an abstract declaration, in the superclass chain, we can ignore the interfaces completely.
  2. Subtype wins over supertype. If two interfaces are competing to provide a default method and one interface extends the other, the subtype wins.
  3. No rule 3. If the previous two rules don’t give the answer, the implementing class must either implement the method or declare it abstract.
Default methods avoid multiple inheritance of state and instead allow inheritance of blocks of code. Java 8 also enables static methods on an interface, e.g. the of() method of the Stream interface.

Optional is a new core library data type that is designed to provide a better alternative to null. null is often used to represent the absence of a value, and this is the use case that Optional replaces. The problem with using null to represent absence is the dreaded NullPointerException. Optional encourages the coder to make appropriate checks as to whether a value is present in order to avoid bugs. It also documents values that are expected to be absent in a class’s API. The of factory method creates an Optional instance from a value, the empty factory method represents an absent value, and a nullable value can be converted into an Optional using the ofNullable method.
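The three factory methods in one small sketch:

```java
import java.util.Optional;

public class OptionalDemo {
    public static void main(String[] args) {
        Optional<String> present  = Optional.of("a");           // value must be non-null
        Optional<String> absent   = Optional.empty();           // represents no value
        Optional<String> fromNull = Optional.ofNullable(null);  // null becomes empty

        System.out.println(present.isPresent());   // true
        System.out.println(absent.isPresent());    // false
        System.out.println(fromNull.orElse("b"));  // falls back to the default "b"
        System.out.println(present.get());         // a
    }
}
```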

A common idiom you may have noticed is the creation of a lambda expression that calls a method on its parameter.
artist -> artist.getName()
Such a common idiom can be written using a method reference as below:
Artist::getName
The standard form of a method reference is ClassName::methodName. There are no parentheses, since we are not actually calling the method but providing the equivalent of a lambda expression that can be called in order to call the method. Constructors can also be referenced using the same abbreviated syntax as below:
(name, nationality) -> new Artist(name, nationality) // original
Artist::new // creating an Artist object
String[]::new // creating String array

Method references automatically support multiple parameters, as long as you have the right functional interface.
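A couple of multi-parameter method references, with the equivalent lambdas in comments:

```java
import java.util.Comparator;
import java.util.function.BinaryOperator;

public class MethodRefDemo {
    // Reference to a static method: equivalent to (x, y) -> Long.sum(x, y)
    static final BinaryOperator<Long> ADD = Long::sum;

    // Reference to an instance method of the first argument:
    // equivalent to (a, b) -> a.compareToIgnoreCase(b)
    static final Comparator<String> CMP = String::compareToIgnoreCase;

    public static void main(String[] args) {
        System.out.println(ADD.apply(1L, 2L));
        System.out.println(CMP.compare("Apple", "apple"));
    }
}
```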

Element Ordering
Whether an encounter order is defined depends on both the source of the data and the operations performed on the Stream. When you create a Stream from a collection with a defined order, e.g. a List, the Stream has a defined encounter order. When we map values and there’s a defined encounter order, that encounter order is preserved. When there’s no encounter order on the input Stream, there’s no encounter order on the output Stream. Most operations, such as filter, map, and reduce, can operate very efficiently on ordered streams, although ordering can be removed with the stream's unordered method. The forEachOrdered method provides an ordering guarantee that forEach does not, which matters especially when using parallel streams.
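A sketch showing order preservation: the range source has a defined encounter order, so even when mapping in parallel, an ordered collect reassembles the results in source order.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderingDemo {
    public static List<Integer> doubled() {
        // Elements may be mapped on different threads, but collect respects
        // the encounter order of the IntStream.range source.
        return IntStream.range(0, 10)
                .parallel()
                .mapToObj(i -> i * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubled());
    }
}
```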

A collector is a general-purpose construct for producing complex values from streams. Collectors can be used with any Stream by passing them into the collect method, and they can be statically imported from the java.util.stream.Collectors class.
The toList collector produces java.util.List instances, while the toSet and toCollection collectors produce instances of Set and Collection.
List<Integer> numbers = asList(1, 2, 3, 4);
List<Integer> stillOrdered = numbers.stream().map(x -> x + 1).collect(toList());
By calling toList or toSet we don’t get to specify the concrete implementation of the List or Set; the implementation is picked by the streams library.

A collector such as toCollection can take a function to build a specified type of collection as its argument, as below:
stream.collect(toCollection(TreeSet::new));

A single value can also be collected, using collectors such as maxBy and minBy.
public Optional<Artist> biggestGroup(Stream<Artist> artists) {
   // defines a lambda expression that can map an artist to the number of members
   Function<Artist,Long> getCount = artist -> artist.getMembers().count();
   return artists.collect(maxBy(comparing(getCount)));
}

The averagingInt collector takes a lambda expression in order to convert each element in the Stream into an int before averaging the values, as below.
public double averageNumberOfTracks(List<Album> albums) {
    return albums.stream().collect(averagingInt(album -> album.getTrackList().size()));
}

The partitioningBy collector takes a stream and partitions its contents into two groups. It uses a Predicate to determine whether an element belongs in the true group or the false group, and returns a Map from Boolean to a List of values.
public Map<Boolean, List<Artist>> bandsAndSolo(Stream<Artist> artists) {
 // The method reference Artist::isSolo can be also written as artist -> artist.isSolo()
 return artists.collect(partitioningBy(Artist::isSolo));
}

The groupingBy collector takes a classifier function in order to group the data, similar to the partitioningBy collector, which took a Predicate to split it into true and false values. Below is an example which groups a Stream of albums by the name of their main musician. The groupingBy form below divides elements into buckets. Each bucket gets associated with the key provided by the classifier function, here getMainMusician. The groupingBy operation then uses the downstream collector to collect each bucket and makes a map of the results.
public Map<Artist, Long> numberOfAlbums(Stream<Album> albums) {
    return albums.collect(groupingBy(album -> album.getMainMusician(),
    counting()));
}

The mapping collector allows you to perform a map-like operation over the collector’s container. It needs a collection in which to store the results, which can be provided using the toList collector.
public Map<Artist, List<String>> nameOfAlbums(Stream<Album> albums) {
    return albums.collect(groupingBy(Album::getMainMusician,
    mapping(Album::getName, toList())));
}

In both of the above cases, the second collector is used to collect a subpart of the final result, and is hence called a downstream collector.

A comma-separated string can be generated from a formatted list using Collectors.joining, which collects the Stream into a string, as below:
String result =
    artists.stream()
              .map(Artist::getName)
              .collect(Collectors.joining(", ", "[", "]"));

Data Parallelism

Amdahl’s Law is a simple rule that predicts the theoretical maximum speedup of a program on a machine with multiple cores. If we take a program that is entirely serial and parallelize only half of it, then the maximum speedup possible, regardless of how many cores we throw at the problem, is 2×. Given a large number of cores—and we’re already into that territory—the execution time of a problem is going to be dominated by the serial part of that problem.

When we have a Stream object, we can call its parallel method in order to make it parallel. If we’re creating a Stream from a Collection, we can call the parallelStream method in order to create a parallel stream from the get-go. Below is an example which calculates the total length of a sequence of albums in parallel.
public int parallelArraySum() {
    return albums.parallelStream()
                 .flatMap(Album::getTracks)
                 .mapToInt(Track::getLength)
                 .sum();
}

The kinds of problems that parallel stream libraries excel at are those that involve simple operations processing a lot of data, such as simulations. Below is an example of a parallel Monte Carlo simulation. We first use the IntStream range function to create a stream of size N, then call the parallel method in order to use the parallel version of the streams framework. The twoDiceThrows function simulates throwing two dice and returns the sum of their results; it is applied to the stream using the mapToObj method. All the simulation results are combined using the groupingBy collector, and the numbers are mapped to 1/N and added using the summingDouble function.
public Map<integer double=""> parallelDiceRolls() {
    double fraction = 1.0 / N;
    return IntStream.range(0, N)
            .parallel()
            .mapToObj(twoDiceThrows())
            .collect(groupingBy(side -> side,
                                summingDouble(n -> fraction)));
}

When calling reduce, the initial element could be any value, but for the same operation to work correctly in parallel, it needs to be the identity value of the combining function. The identity value leaves all other elements the same when combined with them. For example, if we’re summing elements with our reduce operation, the combining function is (acc, element) -> acc + element, and the initial element must be 0, because any number x added to 0 returns x. Another caveat specific to reduce is that the combining function must be associative, meaning the order in which the combining function is applied doesn’t matter as long as the values of the sequence aren’t changed.
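A sketch of a parallel reduce that satisfies both caveats: 0 is the identity of +, and + is associative, so the result is the same however the framework splits the range.

```java
import java.util.stream.IntStream;

public class ReduceDemo {
    public static int sumTo(int n) {
        // Identity 0 and an associative combiner make this safe in parallel:
        // the chunks can be summed in any grouping and then combined.
        return IntStream.rangeClosed(1, n)
                .boxed()
                .parallel()
                .reduce(0, (acc, element) -> acc + element);
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100));  // 5050
    }
}
```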

The streams framework deals with any necessary synchronization itself, so there’s no need to lock the data structures. If we tried to hold locks on any data structure that the streams library is using, such as the source collection of an operation, we would likely cause issues. Stream also has a sequential method in addition to the parallel method. When a stream pipeline is evaluated, there is no mixed mode: the orientation is either parallel or sequential. If a pipeline has calls to both parallel and sequential, the last call wins. Under the hood, parallel streams are built on the fork/join framework: the fork stage recursively splits up a problem, each chunk is operated upon in parallel, and the results are merged back together in the join stage.

The common data sources from the core library can be split up into three main groups by performance characteristics:

The good: An ArrayList, an array, or the IntStream.range constructor. These data sources all support random access, which means they can be split up arbitrarily with ease.
The okay: The HashSet and TreeSet. These cannot be decomposed easily with perfect amounts of balance, but most of the time it’s possible to do so.
The bad: Some data structures don’t split well; for example, they may take O(N) time to decompose. Examples here include a LinkedList, which is computationally hard to split in half. Also, Streams.iterate and BufferedReader.lines have unknown length at the beginning, so it’s pretty hard to estimate when to split these sources.

Ideally, once the streams framework has decomposed the problem into smaller chunks, we’ll be able to operate on each chunk in its own thread, with no further communication or contention between threads.

Java 8 includes a couple of other parallel array operations that utilize lambda expressions outside of the streams framework. These operations are all located on the utility class Arrays, which also contains a bunch of other useful array-related functionality from previous Java versions.

Name            Operation
parallelPrefix  Calculates running totals of the values of an array given an arbitrary function
parallelSetAll  Updates the values in an array using a lambda expression
parallelSort    Sorts elements in parallel

The parallelSetAll method is used to easily initialize an array in parallel instead of using a for loop. It is given an array to operate on and a lambda expression that calculates the value for each index. The array passed into the operation is altered, rather than a new copy being created.
public static double[] parallelInitialize(int size) {
      double[] values = new double[size];
      Arrays.parallelSetAll(values, i -> i);
      return values;
}

The parallelPrefix operation is useful for performing accumulation-type calculations over time series of data. It mutates an array, replacing each element with the sum (or the result of any BinaryOperator) of that element and its predecessors. The example below takes a rolling window over a time series and produces an average for each position of that window.
public static double[] simpleMovingAverage(double[] values, int n) {
      double[] sums = Arrays.copyOf(values, values.length);
      Arrays.parallelPrefix(sums, Double::sum);
      int start = n - 1;
      return IntStream.range(start, sums.length)
                  .mapToDouble(i -> {
                        double prefix = i == start ? 0 : sums[i - n];
                        return (sums[i] - prefix) / n;
                  })
            .toArray();
}