Saturday, October 12, 2013

Still using Externalizable to get faster Serialization with Java?


Update: Turning off cycle detection (FSTConfiguration.setUnshared(true)) improves performance even further. However, in this mode an object that is referenced twice is also written twice.
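To make the trade-off concrete, here is a small sketch that uses plain JDK serialization rather than FST (FST's unshared mode itself is not shown). ObjectOutputStream.writeObject() writes a repeated object once and emits a back-reference the second time, while writeUnshared() writes the full object again. The class and method names are made up for illustration.

```java
import java.io.*;

// JDK-only illustration (not FST): shared vs. unshared writes of the same object.
class SharedVsUnshared {

    // Serializes the same int[] instance twice into one stream.
    static byte[] writeTwice(int[] payload, boolean unshared) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            for (int i = 0; i < 2; i++) {
                if (unshared) {
                    out.writeUnshared(payload); // full copy every time
                } else {
                    out.writeObject(payload);   // second time: back-reference only
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    // Reads both objects back; true if they come back as the same instance.
    static boolean readsBackSameInstance(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Object first = in.readObject();
            Object second = in.readObject();
            return first == second;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a 100-element int[], the unshared stream is roughly 400 bytes larger (the second copy of the array), and reading it back yields two distinct arrays instead of one shared instance.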

According to popular belief, a handcrafted implementation of the Externalizable interface is the only way to get fast Java object serialization.

Manually coding the "readExternal" and "writeExternal" methods is an error-prone and boring task. Additionally, the Externalizable methods must be adapted with each change to a class's fields.

Contrary to popular belief, a good implementation of generic serialization can be faster than a handcrafted implementation of the Externalizable interface in 99%, if not 100%, of cases.

Fortunately, I managed to save the world with the fast-serialization library by addressing the disadvantages of Serialization compared to Externalizable.

The Benchmark

The following class will be benchmarked.
import java.io.Serializable;

public class BlogBench implements Serializable {
    public BlogBench(int index) {
        // use distinct String instances per index, so the benchmark measures
        // string encoding rather than the handling of identical (shared) references
        str = "Some Value "+index;
        str1 = "Very Other Value "+index;
        switch (index%3) {
            case 0: str2 = "Default Value"; break;
            case 1: str2 = "Other Default Value"; break;
            case 2: str2 = "Non-Default Value "+index; break;
        }
    }
    private String str;
    private String str1;
    private String str2;
    private boolean b0 = true;
    private boolean b1 = false;
    private boolean b2 = true;
    private int test1 = 123456;
    private int test2 = 234234;
    private int test3 = 456456;
    private int test4 = -234234344;
    private int test5 = -1;
    private int test6 = 0;
    private long l1 = -38457359987788345L;
    private long l2 = 0L;
    private double d = 122.33;
}
For the Externalizable benchmark, a copy of the class above implements the Externalizable interface by hand (source is here).
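The linked source is not reproduced here. As a rough idea of what such a handcrafted variant involves, this hypothetical class covers only a subset of BlogBench's fields. Note that writeExternal and readExternal must list the fields in exactly the same order, and both must be updated whenever a field is added.

```java
import java.io.*;
import java.util.Objects;

// Hypothetical, trimmed-down Externalizable counterpart of BlogBench (subset of fields only).
class BlogBenchExternalizableSketch implements Externalizable {
    private String str;
    private String str2;          // may be null in this sketch
    private boolean b0 = true;
    private int test1 = 123456;
    private long l1 = -38457359987788345L;
    private double d = 122.33;

    // Externalizable requires a public no-arg constructor; readExternal fills in the fields.
    public BlogBenchExternalizableSketch() {}

    public BlogBenchExternalizableSketch(int index) {
        str = "Some Value " + index;
        str2 = (index % 2 == 0) ? "Default Value" : null;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // The order of these calls is the wire format: readExternal must mirror it exactly.
        out.writeUTF(str);
        out.writeObject(str2);    // writeUTF would throw on null, so use writeObject
        out.writeBoolean(b0);
        out.writeInt(test1);
        out.writeLong(l1);
        out.writeDouble(d);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        str = in.readUTF();
        str2 = (String) in.readObject();
        b0 = in.readBoolean();
        test1 = in.readInt();
        l1 = in.readLong();
        d = in.readDouble();
    }

    // Serializes and deserializes an instance with JDK streams.
    static BlogBenchExternalizableSketch roundTrip(BlogBenchExternalizableSketch obj) {
        try {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bout.toByteArray()))) {
                return (BlogBenchExternalizableSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BlogBenchExternalizableSketch)) return false;
        BlogBenchExternalizableSketch other = (BlogBenchExternalizableSketch) o;
        return Objects.equals(str, other.str) && Objects.equals(str2, other.str2)
                && b0 == other.b0 && test1 == other.test1 && l1 == other.l1
                && Double.compare(d, other.d) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(str, str2, b0, test1, l1, d);
    }
}
```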
The main loop for fast-serialization is identical: I just replace "ObjectOutputStream" with "FSTObjectOutput" and "ObjectInputStream" with "FSTObjectInput".
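For readers who want to reproduce the setup, here is a minimal JDK-only timing loop in the same spirit. It is not the author's harness, the Sample class is a hypothetical stand-in, and a real benchmark needs JIT warm-up and many repetitions. Per the text, the FST variant differs only in the stream classes (FSTObjectOutput and FSTObjectInput in place of ObjectOutputStream and ObjectInputStream).

```java
import java.io.*;

// Minimal JDK round-trip timing loop (illustrative; not the original benchmark harness).
class RoundTripLoop {

    // Hypothetical stand-in for the benchmarked class.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        String str;
        int test1 = 123456;
        Sample(int index) { str = "Some Value " + index; }
    }

    // Writes one object to a fresh stream and reads it back.
    static Sample roundTrip(Sample s) {
        try {
            ByteArrayOutputStream bout = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
                out.writeObject(s);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bout.toByteArray()))) {
                return (Sample) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Average nanoseconds per write+read over `iterations` distinct objects.
    static long averageRoundTripNanos(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Sample copy = roundTrip(new Sample(i));
            if (!copy.str.equals("Some Value " + i)) {
                throw new IllegalStateException("mismatch at " + i);
            }
        }
        return (System.nanoTime() - start) / iterations;
    }
}
```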
Result:

  • With JDK serialization, Externalizable performs much better than Serializable
  • FST serializes faster than the manually written "read/writeExternal" implementation

The serialized size is 319 bytes for JDK Serializable, 205 bytes for JDK Externalizable and 160 bytes for FST serialization. Pretty big gain for a search/replace operation vs. handcrafted coding ;-). BTW, if the "Externalizable" class is serialized with FST, it is still slightly slower than letting FST do generic serialization.
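Sizes like these can be measured by writing a single object into a ByteArrayOutputStream and checking its length. The sketch below does this for a hypothetical pair of small classes using JDK serialization only. It does not reproduce the numbers above, but it shows the same effect: the Serializable stream carries a description of every field (name and type), while the Externalizable stream carries only the bytes that writeExternal emits.

```java
import java.io.*;

// Hypothetical classes with identical fields, serialized two ways to compare stream sizes.
class SizeComparison {

    static class SerPoint implements Serializable {
        private static final long serialVersionUID = 1L;
        String label = "Some Value 1";
        int x = 123456, y = 234234;
        long stamp = -38457359987788345L;
        double weight = 122.33;
    }

    static class ExtPoint implements Externalizable {
        private static final long serialVersionUID = 1L;
        String label = "Some Value 1";
        int x = 123456, y = 234234;
        long stamp = -38457359987788345L;
        double weight = 122.33;

        public ExtPoint() {}  // required for deserialization of Externalizable classes

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(label);
            out.writeInt(x);
            out.writeInt(y);
            out.writeLong(stamp);
            out.writeDouble(weight);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            label = in.readUTF();
            x = in.readInt();
            y = in.readInt();
            stamp = in.readLong();
            weight = in.readDouble();
        }
    }

    // Number of bytes JDK serialization produces for one object (stream header included).
    static int serializedSize(Object o) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.size();
    }
}
```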

There is still room for improvement ..

The test class is rather small, so setting up and allocating the input and output streams accounts for a significant part of the measured times. Fortunately, FST provides mechanisms to reuse both FSTObjectInput and FSTObjectOutput. This yields ~200 ns better read and write times.

So "new FSTObjectOutput(outputStream)" is replaced with
FSTConfiguration fstConf = FSTConfiguration.getDefaultConfiguration();
...
fstConf.getObjectOutput(bout); // bout: the target OutputStream
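The text does not describe how FST reuses its streams internally. As a generic illustration of the idea only, this sketch keeps one ByteArrayOutputStream and DataOutputStream per thread and calls reset() between messages instead of allocating new stream objects every time. ReusableEncoder and its (label, value) format are hypothetical.

```java
import java.io.*;

// Generic reuse sketch (not FST code): per-thread buffer and stream, reset between messages.
class ReusableEncoder {

    private static final class State {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream(256);
        final DataOutputStream out = new DataOutputStream(buffer);
    }

    private static final ThreadLocal<State> STATE = ThreadLocal.withInitial(State::new);

    // Encodes a (label, value) pair, reusing this thread's buffer and stream wrapper.
    static byte[] encode(String label, int value) {
        State s = STATE.get();
        s.buffer.reset();                 // drop previous contents, keep the allocated array
        try {
            s.out.writeUTF(label);
            s.out.writeInt(value);
            s.out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.buffer.toByteArray();    // copy out; the buffer is reused by the next call
    }

    // Exposed only to show that the same buffer instance is reused.
    static ByteArrayOutputStream currentBuffer() {
        return STATE.get().buffer;
    }
}
```

The design choice is the usual one for small messages: allocation and wrapper setup are paid once per thread, and each message only pays for the bytes it writes plus one copy.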
There is even more improvement ..

Since Externalizable does not track references, and reference tracking is not required for the test class, we turn it off for our sample using the @Flat annotation. We can also exploit the fact that "str2" most likely contains a default value ..

@Flat
public class BlogBenchAnnotated implements Serializable {
    public BlogBenchAnnotated(int index) {
        // use distinct String instances per index, so the benchmark measures
        // string encoding rather than the handling of identical (shared) references
        str = "Some Value "+index;
        str1 = "Very Other Value "+index;
        switch (index%3) {
            case 0: str2 = "Default Value"; break;
            case 1: str2 = "Other Default Value"; break;
            case 2: str2 = "Non-Default Value "+index; break;
        }
    }
    @Flat private String str;
    @Flat private String str1;
    @OneOf({"Default Value","Other Default Value"})
    @Flat private String str2;
    // ... (remaining fields omitted)
}
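To illustrate why @OneOf helps, here is a sketch of the general technique, using a hypothetical format rather than FST's actual encoding: if a string equals one of a few values both sides know in advance, a one-byte index is written; otherwise a marker byte is followed by the full string. For the benchmark class, two out of three str2 values would then cost a single byte.

```java
import java.io.*;

// Hypothetical encoding for a String field with a few expected values (the idea behind @OneOf).
class OneOfCodec {
    private static final String[] KNOWN = {"Default Value", "Other Default Value"};
    private static final int NOT_KNOWN = 0xFF; // marker: full string follows

    static void write(DataOutput out, String value) throws IOException {
        for (int i = 0; i < KNOWN.length; i++) {
            if (KNOWN[i].equals(value)) {
                out.writeByte(i);       // 1 byte instead of the whole string
                return;
            }
        }
        out.writeByte(NOT_KNOWN);
        out.writeUTF(value);            // sketch: null values are not supported
    }

    static String read(DataInput in) throws IOException {
        int tag = in.readUnsignedByte();
        return tag == NOT_KNOWN ? in.readUTF() : KNOWN[tag];
    }

    static byte[] encode(String value) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try {
            write(new DataOutputStream(bout), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    static String decode(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```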


and another one ..

To instantiate the correct class at read time, the class name must be transmitted. However, in many cases both reader and writer know (at least most of) the serialized classes at compile time. FST can register classes in advance, so only a number instead of the full class name is transmitted.
FSTConfiguration.getDefaultConfiguration().registerClass(BlogBenchAnnotated.class);
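A sketch of the registration idea (hypothetical, not FST's implementation): both sides register the same classes in the same order, so a registered class costs a 2-byte id, while an unregistered one falls back to writing its full name.

```java
import java.io.*;
import java.util.*;

// Hypothetical class registry: registered classes are written as a short id,
// unknown classes as a 0 marker followed by the fully qualified class name.
class ClassRegistry {
    private final List<Class<?>> byId = new ArrayList<>();
    private final Map<Class<?>, Integer> ids = new HashMap<>();

    // Writer and reader must register the same classes in the same order.
    void register(Class<?> c) {
        if (!ids.containsKey(c)) {
            ids.put(c, byId.size());
            byId.add(c);
        }
    }

    // Encodes the class tag that would precede an object's data.
    byte[] tag(Class<?> c) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bout);
        try {
            Integer id = ids.get(c);
            if (id != null) {
                out.writeShort(id + 1);     // ids start at 1; 0 means "name follows"
            } else {
                out.writeShort(0);
                out.writeUTF(c.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    // Resolves a tag back to the class.
    Class<?> untag(byte[] bytes) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        try {
            int tag = in.readUnsignedShort();
            return tag == 0 ? Class.forName(in.readUTF()) : byId.get(tag - 1);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```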



What's "Bulk"?

Setting up or reusing streams still costs some nanoseconds, so when benchmarking the read/write of a single tiny object, stream initialization makes up a good part of the per-object time. If an array of 10 benchmark objects is written instead, the per-object time drops below 300 ns for both read and write.
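The effect is a simple amortization. Writing t_init for the per-stream setup cost, t_obj for the cost of reading or writing one object, and n for the number of objects per stream (symbols introduced here for illustration):

```latex
t_{\text{per object}}(n) = t_{\text{obj}} + \frac{t_{\text{init}}}{n}
```

With n = 10, only a tenth of the setup cost is charged to each object, which is why the bulk numbers approach the pure per-object cost.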
Frequently, an application writes more than one object into a single stream. RPC encoding applications that use some kind of "batching", or that simply write into the same stream and call "flush" after each object, can actually get <300 ns times in the real world. Of course, object reference sharing must then be turned off (FSTConfiguration.setShared(false)).
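For comparison, here is a JDK-only sketch of writing several objects into one stream and flushing after each one (not FST code; the class name is made up). A JDK stream keeps its back-reference table for the whole lifetime of the stream; ObjectOutputStream.reset() discards it, so an object written again after a reset is sent in full rather than as a reference to an earlier message.

```java
import java.io.*;
import java.util.*;

// JDK-only sketch: many objects through one stream, flushed per message.
class BatchedStream {

    static byte[] writeAll(List<? extends Serializable> messages) {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bout)) {
            for (Serializable m : messages) {
                out.writeObject(m);
                out.reset();  // forget previously written objects (no cross-message references)
                out.flush();  // push this message to the underlying stream
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray();
    }

    static List<Object> readAll(byte[] bytes, int count) {
        List<Object> result = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            for (int i = 0; i < count; i++) {
                result.add(in.readObject());  // reset markers are handled transparently
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

Only one stream header is written for the whole batch, which is the setup cost the bulk numbers avoid paying per object.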

For completeness: JDK (with manual Externalizable) Bulk yields 1197 ns read and 378 ns write, so it also profits from less initialization. Unfortunately, reusing ObjectInputStream/ObjectOutputStream is not that easy, mainly because ObjectOutputStream already writes some bytes into the underlying stream when it is instantiated.
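This is easy to observe: the ObjectOutputStream constructor writes the 4-byte stream header (magic 0xACED, version 5) to the underlying stream before any object is written, and every ObjectInputStream expects to read such a header when it is constructed. StreamHeaderDemo is a made-up name.

```java
import java.io.*;

// Shows the bytes an ObjectOutputStream writes just by being constructed.
class StreamHeaderDemo {
    static byte[] bytesWrittenByConstructor() {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try {
            ObjectOutputStream out = new ObjectOutputStream(bout); // no writeObject call
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bout.toByteArray(); // stream header: AC ED 00 05
    }
}
```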

Note that if the (constant) initialization time is taken out of the benchmarks, FST's relative performance gains are even higher (see the benchmarks on the fast-serialization site).

Links:

Source of this Benchmark
FST Serialization Library (recently moved to GitHub from Google Code)