Test Driving Base64

Test Driving Base64

Base64 is a workhorse on the internet. It allows binary data to be
encoded in an ASCII string. This way it can be stored and transported
without danger of data being corrupted.

Because the encoding/decoding scheme is so ubiquitous one wonders why
you should implement it yourself instead of using a library.

1 Reinventing the Wheel

Numerous cases have been made against reinventing the
wheel in production. Arguments usually fall in the following
categories.

  1. Libraries are better tuned for memory usage and speed.
  2. Libraries are better tested because of the greater usage base.

These are valid objections. But reinventing the wheel is not about
building software that already is widely available. The benefits of
reinventing the wheel are numerous.

  1. One could invent a different wheel that is better suited in some
    situations.
  2. One gains the experience of making various design decision under
    the given constraints.
  3. One becomes familiar with the details of a technology.
  4. The time spend counts for the 10000 hour rule.

Just don’t use it in production.

2 Specification

Base64 encodes a multiple of three bytes into four characters via the
following mechanism.

byte #0 #1 #2
bit pattern 0 0 0 0 1 0 0 1 1 0 1 0 1 0 1 1 0 1 1 0 1 1 0 0
index 3 26 46 45
encoding C a t s

Three bytes are placed after each other. Each group of six bits are
encoded via the following table.

Index Character Index Character Index Character Index Character
0 A 16 Q 32 g 48 w
1 B 17 R 33 h 49 x
2 C 18 S 34 i 50 y
3 D 19 T 35 j 51 z
4 E 20 U 36 k 52 0
5 F 21 V 37 l 53 1
6 G 22 W 38 m 54 2
7 H 23 X 39 n 55 3
8 I 24 Y 40 o 56 4
9 J 25 Z 41 p 57 5
10 K 26 a 42 q 58 6
11 L 27 b 43 r 59 7
12 M 28 c 44 s 60 8
13 N 29 d 45 t 61 9
14 O 30 e 46 u 62 +
15 P 31 f 47 v 63 /

If the total number of bytes is not a multiple of three pad with zero
bytes until it is. If there was one byte left over take the first two
characters and append ==. If there were two bytes left over take
the first three characters and append =.

This will result in a string with a number of characters that is
divisible by four.

3 Test Driven Development

Test Driven Development (TDD) is the a process where a developer sets
up a very short feedback loop by writing test and only then writing
code to make the test pass. When all the tests pass one cleans up the
codebase before new tests are introduced.

So we will focus on the tests that are used to drive our
implementation. The test is a number of test cases created with the
following code.

verifyThat(0b0, 0b0, 0b0).encodesAs("AAAA");

The above code is used to create a test case that, together with a
Parameterized test runner, actually performs the verification
described in the snippit. If you want to know how this is achieved,
you can look at the implementation.

4 Test Cases

What test cases should we consider? The simplest input would be three
zero bytes. Performing the encoding by hand produces the following
test case.

verifyThat(0b0, 0b0, 0b0).encodesAs("AAAA");

The following few test cases increase the number of zero bytes to
multiples of three.

verifyThat(0b0, 0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAAA");
verifyThat(0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAAAAAAA");

For the next test cases we take a few steps back. When the number of
available bytes is not divisible by three we need to introduce = in
the output. The simplest case we can think of is one and two zero
bytes.

verifyThat(0b0, 0b0).encodesAs("AAA=");
verifyThat(0b0).encodesAs("AA==");

In order to make sure that longer sequences of zero bytes are still
being encoded correctly we introduce the following test cases.

verifyThat(0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAA=");
verifyThat(0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAAAAAA=");
verifyThat(0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAA==");
verifyThat(0b0, 0b0, 0b0, 0b0, 0b0, 0b0, 0b0).encodesAs("AAAAAAAAAA==");

Now we have our work cut out for us, because the next thing to
consider is encoding something different than zero bytes. The
simplest thing that springs to mind is

verifyThat(0b00000000, 0b00000000, 0b00000001).encodesAs("AAAB");

We switch to writing our bytes with zeros prepend to see the
structure of the bits. The next few test cases are a bit dull because
they go through every output symbol. The end result is

verifyThat(0b00000000, 0b00000000, 0b00000010).encodesAs("AAAC");
<<testcase:AAAD-AAA+>>
verifyThat(0b00000000, 0b00000000, 0b00111111).encodesAs("AAA/");

Now that we confidently encoded the fourth character, we can expand
our territory by taking on the third character. Because we have
working code that can be reused we only focus on the edge cases.

verifyThat(0b00000000, 0b00000000, 0b01000000).encodesAs("AABA");
verifyThat(0b00000000, 0b00001111, 0b11000000).encodesAs("AA/A");

Similar for the second and first character

verifyThat(0b00000000, 0b00010000, 0b00000000).encodesAs("ABAA");
verifyThat(0b00000011, 0b11110000, 0b00000000).encodesAs("A/AA");
verifyThat(0b00000100, 0b00000000, 0b00000000).encodesAs("BAAA");
verifyThat(0b11111100, 0b00000000, 0b00000000).encodesAs("/AAA");

Now that we have a proper implementation for all the four characters
lets check our encoding on smaller sizes.

data.add(verifyThat(0b00000001).encodesAs("AQ=="));
data.add(verifyThat(0b11111111).encodesAs("/w=="));
data.add(verifyThat(0b00000000, 0b000000001).encodesAs("AAE="));
data.add(verifyThat(0b11111111, 0b111111111).encodesAs("//8="));

With a proper set of test cases we can confidently refactor our
implementation without losing our solution. A similar procedure can
be used to implement the decode method.

5 Theories

Now that we have a proper understanding of how the Base64 encoding is
performed, it is time to reflect on its properties. Although the test
cases that were provided during TDD phase have merit, they offer a
myopic view on the process. JUnit Theories can cast a wider net and
provide a way to check theories about code.

For example, the whole purpose of the encode and decode methods
is to be inverses of each other. This can be expressed with the
following test that uses the Theories test runner.

@RunWith(Theories.class)
public class Base64Theories {
    private Kata kata;

    @Before
    public void createKata() {
        kata = new Kata();
    }

    @DataPoints
    public static byte[][] dataPoints = new byte[][]{
            new byte[]{0b0},
            new byte[]{0b0, 0b0},
            new byte[]{0b0, 0b0, 0b0},
            new byte[]{0b0, 0b0, 0b0, 0b0},
            new byte[]{0b0, 0b0, 0b0, 0b0, 0b0},
            new byte[]{0b0, 0b0, 0b0, 0b0, 0b0, 0b0},
            new byte[]{0b111111},
            new byte[]{0b111111, 0b111111},
            new byte[]{0b111111, 0b111111, 0b111111},
            new byte[]{0b111111, 0b111111, 0b111111, 0b111111},
            new byte[]{0b111111, 0b111111, 0b111111, 0b111111, 0b111111},
            new byte[]{0b111111, 0b111111, 0b111111, 0b111111, 0b111111, 0b111111}
    };

    @Theory
    public void decodeIsTheInverseOfEncode(byte[] input) {
        assumeThat(input.length, is(greaterThan(0)));

        assertThat(kata.decode(kata.encode(input)), is(input));
    }
}

Most notable is the @Theory annotation. It is used to validate the
assertions for all data points that adhere to the assumptions. The
data points are provided by an array annotated with the @DataPoints
annotation.

For other theories about encoding and decoding Base64 see the full
source of the Base64Theories.

6 Conclusions

Base64 encoding/decoding is an ideal problem to use Test Driven
Development on. It has the benefits of being widely used but
succintly expressed. It offers wide range of decision moments, both
in the the selection of test cases and in implementation choice.

Deel dit artikel