A Story of MQTT 5.0

The MQTT protocol has been around since the late 90s when it was created to enable the monitoring of a long distance oil pipeline. It went through several iterations before landing on version 3.1, published by IBM.

The next step was standardisation, at the OASIS standards body. As anyone who has taken part in a standardisation committee will know, this process is necessarily bureaucratic and slow. To speed up adoption, the main imperative was minimising disruption to existing implementations, as set out in the Technical Committee (TC) charter.

As a result, wholesale changes to the MQTT 3.1 specification were not allowed in the 3.1.1 standard. This meant that many irritating flaws could not be fixed nor widely sought enhancements included. This is where MQTT 5.0 comes in. While we still wanted to minimise disruption (no-one wanted to repeat the experiences of say AMQP 0.9 to 1.0), we also wanted to address the MQTT wish-list as far as possible so that major changes would not be needed for a long time to come. Whether we succeeded in that aim, time will tell.

To help us make sense of the multitude of items on that wish list, as I wrote in September 2016, they were grouped into four Big Ideas:

  • Improved error reporting
  • Extensible metadata
  • Scalability and large scale systems
  • Resource Constrained Clients and Performance Improvements

At that time, many of the solutions were not decided upon, but now with the availability of Committee Specification 01, I can write about the details. We are in the final stages of the standardisation process for 5.0. We hope to complete the process of rubber stamping in the next few months, and expect no substantive changes during that time.

Improved Error Reporting

Negative responses, or nacks, were the biggest omission from earlier versions of MQTT. If the client or server had a problem with the request or packet from the other end, the only recourse in many circumstances was to close the TCP connection. Connect packets were the exception to this: the connack always had a return code. MQTT 3.1.1 added a negative response code to subscribe requests because there was space available in the “granted QoS” field of the suback packet. Publish requests, however, were still not catered for. This is now remedied.

Reason Codes

As all ack packets now have reason codes, they have been consolidated into one set, which starts like this:

Reason codes are one byte. Values from 0 to 0x127 inclusive indicate successful outcomes, those from 0x128 to 0xFF unsuccessful. So as the subscribe response can have an error code:

msc {
    arcgradient="4";

    c [label="client"], b [label="broker"];

    c => b [label="connect"];
    b => c [label="connack(rc=0)"];

    c => b [label="subscribe"];
    b => c [label="suback(rc=0x80)"];

    ...;
}

so can the publish, when the QoS is greater than 0:

msc {
    arcgradient="4";

    c [label="client"], b [label="broker"];

    c => b [label="connect"];
    b => c [label="connack(rc=0)"];
    ...;
    c => b [label="publish, qos=1"];
    b => c [label="puback(rc=0x80)"];
    ...;
    c => b [label="publish, qos=2"];
    b => c [label="pubrec(rc=0x80)"];
    ...;
    c => b [label="publish, qos=2"];
    b => c [label="pubrec(rc=0x0)"];
    c => b [label="pubrel(rc=0x80)"];
    ...;
    c => b [label="publish, qos=2"];
    b => c [label="pubrec(rc=0x0)"];
    c => b [label="pubrel(rc=0x0)"];
    b => c [label="pubcomp(rc=0x80)"];
    ...;
}

For the QoS 2 exchange, it stops if any of the reason codes are 0x80 or above. This is a major improvement on previous versions of MQTT, where continuing with the exchange or terminating the connection were the only options.

Server initiated disconnect

In MQTT versions prior to 5.0, only the client could send a disconnect packet. This meant that in any case where the server wanted to end the conversation with a client, there was no option but to just terminate the TCP connection. A common case is when the server shuts down – there is no error in the interaction between broker and client, but the client has no idea what’s happening. In MQTT 5.0, the server can send a disconnect packet with a “Server shutting down” reason code:

msc {
    arcgradient="4";

    c [label="client"], b [label="broker"];

    c => b [label="connect"];
    b => c [label="connack(rc=0)"];
    ...;
    c => b [label="publish, qos=1"];
    b => c [label="disconnect(rc=139)"];
}

In this case, the client might wait for a while before attempting a reconnect, knowing that the server might not be available for a while.

Extensible metadata

The big change here is in addition to reason codes, each packet (apart from pings) can have properties. This is an extract of the full list:

Properties can be used to add extra information to responses, such as a reason string, or extra parameters to requests. A lot of the rest of the changes rely on properties because now we had a mechanism for adding that extra information to packets, we had to use it!

Request/response

The new request/response capability makes good use of properties. The requester subscribes to the topic it expects to receive responses on, then sets the value of the “Response Topic” property to that topic name. The responder simply uses that property to set the topic name for its response.

msc {
    arcgradient="4";

    c1 [label="client1"], b [label="broker"], c2 [label="client2"];

    c1 => b [label="subscribe, topic=resp"];
    b => c1 [label="suback"];

    c1 => b [label="publish, response_topic=resp"];
    b => c2 [label="publish, response_topic=resp"];

    ...;

    c2 => b [label="publish, topic=resp"];
    b => c1 [label="publish, topic=resp"];
}

The “Correlation Data” property can be used to set an id for each request, so that replies can be matched to requests by the requester.

Payload format indicator

There were fairly contentious discussions about how much flexibility there should be in payload format settings. Some were in favour of user definable payload formats. Others felt that if people could define their own formats it was no better than the current position, unless some body kept an approved list of format indicators and their meanings. That seemed a step too far for MQTT. MIME types were discussed, but the final approach is minimalistic – just two values: binary, as 3.1.1, or UTF-8 data.

Enhancements for Scalability

Improved error reporting helps scalability because exchanges between servers and clients become more efficient. Properties are again crucial to the following functions.

Simplified session state

One of the other big irritations with 3.1.1, along with the lack of nacks for publish commands, is the behaviour of the “clean session” flag. In earlier versions of MQTT, this started out as the “clean start” flag, where the session state was only cleaned up at the start of a session, not at the end. This was good for clients, because it meant you could ensure a clean starting point, and leave the session around in case you needed to reconnect. Not so good for servers, because clients would tend to leave the state lying around for ever.

Later on, this flag was changed to “clean session”, cleaning the session state both at the start and end of the session. Good for servers. For clients, if they want to ensure a clean slate to start with, but then want to have session state saved, they have to connect twice:

msc {
    arcgradient="4";

    c [label="client"], b [label="broker"];

    c => b [label="connect cleansession=true"];
    b => c [label="connack"];
    c => b [label="disconnect"];
    ...;
    c => b [label="connect cleansession=false"];
}

We knew we should fix this situation once and for all. The “clean session” flag becomes “clean start” once more – session state is only cleaned up at the start of the session. Then there is the “session expiry interval” property, a four-byte integer value in seconds which defaults to zero if omitted. If it is set to 0xFFFFFFFF (UINT_MAX), the session does not expire. To accomplish the above scenario:

msc {
    arcgradient="4", wordwraparcs=on;

    c [label="client"], b [label="broker"];

    c => b [label="connect cleanstart=true, expiry_interval=0xFFFFFFFF"];
    ...;
}

The MQTT-SN “offline keep alive” scenario is also catered for. By setting the expiry interval to a suitable non-zero value, the client can ensure that the session state is saved as long as it reconnects regularly. If the client disappears entirely, the session state will be cleaned up at some point. Both clients and servers are happy.

Shared subscriptions

To allow load balancing of high throughput topics, the concept of shared subscriptions is introduced to MQTT. Messages on these topics are sent to one of a group of subscribers rather than to them all. The subscriber indicates that the subscription is shared simply by subscribing to a special topic pattern:


$share/{ShareName}/{filter}

where ShareName is the name of the shared subscription group, and filter is the usual topic filter used in the subscribe request.

Optional server capabilities

Some server functionality is expensive to implement at large scale. In MQTT 5.0, the server can advertise the limitations on the functionality it provides in the connack properties. Some examples:

Retain Available
are retained messages supported?
Maximum QoS
the maximum publish QoS the server will accept
Maximum Packet Size
the maximum packet size the server will accept
Receive maximum
the maximum number of concurrent QoS 1 and 2 message the server will handle

Resource Constrained Clients and Performance Improvements

Various features fall into this category, including some already described. Some further examples follow.

Nolocal subscriptions

Up until MQTT 5.0, the publisher of a message will receive that message back if it is subscribed to the same topic. People often find this out in their first experience of writing an MQTT application, when they implement a shared chat room. There is now a subscribe option noLocal which when set, indicates that the publishing application should not receive its own messages.

msc {
    arcgradient="4", wordwraparcs=on;

    c [label="client"], b [label="broker"];

    c => b [label="subscribe topic=a"];
    b => c [label="suback"];
    c => b [label="publish topic=a"];
    b => c [label="publish topic=a"];
    ...;
    c => b [label="subscribe topic=b, noLocal"];
    b => c [label="suback"];
    c => b [label="publish topic=b"];
    ...;
}

Retained message control

Options on the subscribe request have been added to:

  • 0 = Send retained messages at the time of the subscribe
  • 1 = Send retained messages at subscribe only if the subscription does not currently exist
  • 2 = Do not send retained messages at the time of the subscribe

This could help particularly with the implementation of MQTT bridges from one broker to another.

Topic aliases

This capability exists in MQTT-SN, to reduce the size of the publish packet when long topic names are used. The publish request allows a numeric topic alias to be specified, which can be used in subsequent publish packets. Topic aliases on the client and the server are independent of each other, in much the same way as packet ids are.

msc {
    arcgradient="4", wordwraparcs=on;

    c [label="client"], b [label="broker"];

    c => b [label="publish topic=long_name,alias=1"];
    c => b [label="publish alias=1"];
    ...;
    b => c [label="publish topic=server_long_name, alias=1"];
    b => c [label="publish alias=1"];
    ...;
}

Topic aliases only exist for the lifetime of a TCP connection.

Specifying client limitations

To help protect implementations on small devices, the client can specify its limitations using properties on the connect packet. Some examples:

Maximum Packet Size
the maximum packet size the client can accept
Receive maximum
the maximum number of concurrent QoS 1 and 2 message the client can handle

It is an administrative action or decision on the part of the server to decide what to do with messages that it receives bound for a client for which that message exceeds the constraints. This is not particularly different from 3.1.1 where the message would be sent anyway, and then the client might be forced to disconnect as its only recourse. At a minimum, the server should probably emit a warning log message.

Eclipse Paho Progress

A release of the Eclipse Paho project is planned for June 2018 with its first implementations of MQTT 5.0. I first implemented a broker to test against in the Paho test project. It combines 3.1.1 and 5.0 implementations, and has been used by James Sutton to implement the Java MQTT 5.0 support. It is used in Travis and AppVeyor continuous integration tests for the MQTT 5.0 branches. Example output when you start it up is shown below.

My own C clients, embedded and main are planned to have a June release. The MQTT 5.0 implementations continue in the embedded mqttv5 and mqttv5 branches. Please do give your feedback or thoughts on these implementations as they progress via the GitHub issues: