I was changing jobs recently and in the process had quite a few technical interviews. In addition to the interview problems asked on the spot, several companies also asked to solve test problems at home. These problems required more time to complete, but were much closer to what programmers actually do at work, in contrast to the typical implement-the-quick-sort type of interview problems.

Here I want to share one of such problems, together with a solution. It took me about 3 hours to complete, and I enjoyed solving it.

The problem

You need to write an implementation of the following interface:

struct IReceiver {
    virtual void Receive(const char* data, unsigned int size) = 0;
    virtual ~Receive() = default;
};

The Receive member function is called continuously with two types of data: binary and text.

  1. The binary data always starts with byte 0x24, followed by a 4-byte integer, indicating the size of the packet. Then the packet body follows.

  2. The text data never starts with byte 0x24. No length is provided. The packet body always ends with two consecutive empty lines \r\n\r\n.

The important caveat is that a packet can span several Receive invocations. Also a single Receive invocation might be provided with several packets at once.

Each time a complete packet is received, the IReceiver descendant needs to invoke the right member function of the following callback class:

struct ICallback {
    virtual void BinaryPacket(const char* data, unsigned int size) = 0;
    virtual void TextPacket(const char* data, unsigned int size) = 0;
    virtual ~ICallback() = default;
};

The BinaryPacket and TextPacket functions should both receive a complete packet body, without the binary header or the two trailing empty lines.

The solution is judged by its performance, simplicity and readability. The language to use is C++.

Tools

No special requirements were made of what tools to use. I decided to use the following:

  • The catch library for unit tests.
  • GCC as the C++ compiler.
  • cmake as the build system.
  • QtCreator as the IDE.

Analysis

There are several special cases that needed to be properly handled:

  • Once a binary packet starts, the sequence r\n\r\n doesn’t have any special meaning. The same is true for text packets and the $ symbol (byte 0x24).
  • The binary header can be split among several Receive invocations.
  • A Receive invocation might contain several packets of different types.

Also, to guarantee acceptable performance, I decided to constrain the implementation:

  • Only one packet of any type can be stored in memory at the same time. As soon as a packet ends, the allocated memory should be reused by the next packet.
  • The Receive function shouldn’t take more than O(n) time to complete, where n is the size.

You might notice that IReceiver looks a lot like a simple regular expression parser, except that it doesn’t ask for input itself. Instead it is being fed with data via the Receive function. Also, the binary header contains the size of the following body. This can’t be encoded with a regular expression.

An approximate formal grammar for such a parser would look like this:

stream: packet*
packet: binary | text
binary: '$' SIZE BODY
text: BODY '\r\n\r\n'

This realization makes it obvious that IReceiver is actually a kind of state machine, with three states:

  • Idle - no packet is being parsed.
  • Binary - currently a binary packet is being parsed (including the header).
  • Text - currently a text packet is being parsed.

state machine

It is difficult to implement IReceiver as a classical recursive descend parser, due to the data being fed to it externally. But the idea of states is still applicable.

Implementation

Before implementing the IReceiver, I first covered simple cases with tests. Then I coded a simple implementation that made the tests pass. After that I added more test cases and improved the implementation. Once I had all the functionality covered, I made some refactoring to simplify the code. This is the classical TDD workflow, which fit the problem neatly.

The tests are pretty self-explanatory and can be found here. The implementation: Receiver.h and Receiver.cpp. The complete solution is here.

The Receive function is the dispatcher. It is responsible for state transitions from idle to either text or binary. The functions receiveBinary and receiveText handle the corresponding stream types and also transition back to idle. They update the size argument, based on which the Receive function decides if it should return and wait for more data, or continue to the next packet.

There is some bookkeeping involved in parsing the binary header and searching for the \r\n\r\n sequence. Other than that the code should is not very complicated.

Conclusion

Of all the problems I was asked to solve (including on-site problems) I liked this one the most. It is small enough to be completed in a reasonable amount of time. It didn’t force me to use a particular library of framework. And it wasn’t overly pedagogical in nature. It was the old plain engineering.