Annotated RTMP

I've spent some time recently working on the RTMP protocol implementation of our Objective-C iOS LCDS Messaging and Remoting client. (You can see a video of it in action here.)

The RTMP protocol was designed for streaming multimedia with the ability to interleave other kinds of content. It is a low-level binary protocol, so I've spent a lot of time over the last few months looking at dumps of the bytes sent over the wire and decoding them to verify that we're sending the right bytes and that they match what our other clients send and expect.

For anyone else implementing RTMP, or just interested in it, I've cleaned up and annotated some parts of a typical RTMP session below for your reading pleasure.

This RTMP stream represents a client connecting to the server and sending a request to subscribe to a messaging destination. It's how a client would start a data feed from the Market Data LCDS sample. The subscribe message is an instance of "flex.messaging.messages.CommandMessage", and looks like this:

[cc]
operation = multi_subscribe
clientId = null
correlationId = null
destination = market-data-feed
messageId = 3300FE5C-F850-451F-847C-B7D257E6D224
timestamp = 1296478297471
timeToLive = 0
body = null
hdr(DSMaxFrequency) = 1000
hdr(DSAddSub) =
[
ADBE_;_
]
[/cc]

And on to the protocol decoding. Bytes being sent to the server from the client are denoted with > (greater than) and bytes coming the other direction are denoted < (less than).

The first thing that happens is the client opens a socket connection to the RTMP server, and sends:

[cc]
> 03
[/cc]

This is the client telling the server "I support protocol version 3".

[cc]
> (1536 bytes of random data)
[/cc]

The client sends 1536 bytes of random data to the server. This is part of a sequence of exchanges that the server and client use to make a guess as to their available bandwidth.
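The published RTMP spec is a little more specific than "random data" here: C1 starts with a 4-byte timestamp and 4 zero bytes, followed by 1528 random bytes. A minimal C sketch of building C0 and C1 under that layout (plain C drops straight into an Objective-C client). The function name is hypothetical, and rand() is only a placeholder for a proper random source.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RTMP_HANDSHAKE_SIZE 1536

/* Hypothetical helper: out[0] receives C0 (the version byte) and
 * out[1..1536] receives C1, laid out as in the published spec:
 * 4-byte big-endian time, 4 zero bytes, 1528 random bytes.
 * rand() is a placeholder, not a cryptographic random source. */
void rtmp_build_c0_c1(uint8_t out[1 + RTMP_HANDSHAKE_SIZE], uint32_t epoch_ms)
{
    uint8_t *c1 = out + 1;

    out[0] = 0x03;                        /* C0: protocol version 3 */

    c1[0] = (uint8_t)(epoch_ms >> 24);    /* time field, big-endian */
    c1[1] = (uint8_t)(epoch_ms >> 16);
    c1[2] = (uint8_t)(epoch_ms >> 8);
    c1[3] = (uint8_t)(epoch_ms);
    memset(c1 + 4, 0, 4);                 /* zero field */

    for (size_t i = 8; i < RTMP_HANDSHAKE_SIZE; i++)
        c1[i] = (uint8_t)(rand() & 0xff); /* 1528 "random" bytes */
}
```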

[cc]
< 03
[/cc]

This is the server responding with the protocol version that it is using. All is well; both report version 3.

[cc]
< (1536 bytes of random data)
[/cc]

The server also sends 1536 bytes of random data.

[cc]
< (Echo of the 1536 bytes that the client sent to the server)
[/cc]

The server sends back the data that the client sent it. This gives the client a way of guessing not only at bandwidth but also at latency, and a pretty good measure of it.

[cc]
> (Echo of the 1536 bytes that the server sent to the client)
[/cc]

The client does the same.
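A sketch of how a client might check that the server's echo matches what it sent, assuming the layout in the published spec: the echo repeats the peer's time field and its 1528 random bytes, while bytes 4 to 7 (the spec's "Time2" field) carry the time at which the peer read the packet and can legitimately differ. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RTMP_HANDSHAKE_SIZE 1536

/* Hypothetical helper: returns true if the echo s2 matches what we sent
 * in c1. Bytes 4-7 (Time2) are skipped; the time field and the 1528-byte
 * random block must match exactly. */
bool rtmp_echo_matches(const uint8_t c1[RTMP_HANDSHAKE_SIZE],
                       const uint8_t s2[RTMP_HANDSHAKE_SIZE])
{
    return memcmp(c1, s2, 4) == 0 &&
           memcmp(c1 + 8, s2 + 8, RTMP_HANDSHAKE_SIZE - 8) == 0;
}
```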

Now the initial handshake is done and we can start talking RTMP.

Our goal is to request a subscription to an LCDS messaging destination, so we need to send the message to do this. This starts out with a chunk header (Type 0, section 6.1.2.1 of the RTMP spec):

[cc]
> 00 00 00 00 00 3D 11 00 00 00 00
[/cc]

This decodes as a timestamp of 0 (the first 3 bytes), a message length of 61 bytes (the next 3 bytes), a message type ID of 17 (0x11), and a message stream ID of 0 (the last 4 bytes). Message type 17 means that an AMF3-encoded command message follows.
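Decoding those 11 bytes in C might look like the sketch below. Two details from the spec are worth knowing: the message stream ID is the one little-endian field in the header, and on the wire each chunk begins with a 1 to 3 byte basic header (format and chunk stream ID), which this sketch does not parse. It also ignores the extended-timestamp case (a timestamp field of 0xFFFFFF). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a Type 0 chunk message header (11 bytes, basic header excluded). */
typedef struct {
    uint32_t timestamp;      /* 3 bytes, big-endian */
    uint32_t message_length; /* 3 bytes, big-endian */
    uint8_t  type_id;        /* 1 byte: 17 (0x11) = AMF3 command message */
    uint32_t stream_id;      /* 4 bytes, little-endian */
} rtmp_type0_header;

rtmp_type0_header rtmp_parse_type0(const uint8_t b[11])
{
    rtmp_type0_header h;
    h.timestamp      = ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 8) | b[2];
    h.message_length = ((uint32_t)b[3] << 16) | ((uint32_t)b[4] << 8) | b[5];
    h.type_id        = b[6];
    h.stream_id      = (uint32_t)b[7] | ((uint32_t)b[8] << 8) |
                       ((uint32_t)b[9] << 16) | ((uint32_t)b[10] << 24);
    return h;
}
```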

Here's what the subscribe message looks like:

[cc]
> 0x00 (msg format amf3)
> 0x02 0x00 0x07 5f726573756c74 (string "_result")
> 0x05 (trxID = null - 0x05 is null in AMF0 and we're still in AMF0 mode)
> 0x05 (cmdData = null)
> 0x11 (AMF0 avmplus-object marker, 17: the next value is AMF3-encoded)
[/cc]
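The same envelope as a hypothetical C writer, for anyone building it by hand. It assumes the command name is shorter than 65536 bytes (AMF0 string lengths are 16-bit) and that the caller's buffer has room for 7 + strlen(name) bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the envelope above into out and returns
 * the number of bytes written (7 + strlen(name)). */
size_t rtmp_write_amf3_cmd_prefix(uint8_t *out, const char *name)
{
    size_t n = 0, len = strlen(name);

    out[n++] = 0x00;                /* format byte of a type 17 message */
    out[n++] = 0x02;                /* AMF0 string marker */
    out[n++] = (uint8_t)(len >> 8); /* 16-bit big-endian length */
    out[n++] = (uint8_t)(len);
    memcpy(out + n, name, len);     /* string bytes, no terminator */
    n += len;
    out[n++] = 0x05;                /* AMF0 null: transaction ID */
    out[n++] = 0x05;                /* AMF0 null: command data */
    out[n++] = 0x11;                /* avmplus marker: AMF3 value follows */
    return n;
}
```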

Writing the CommandMessage object:

[cc]
> 0x0a // AMF3 object marker: an object follows (see section 3.12 of the AMF3 spec)
[/cc]

Next, internally we have a function that walks the object's properties and serializes them. The traits (the class name and the property names) are written first, then the property values. Here's the traits header on the wire:

[cc]
> 0x81 0x13
[/cc]

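This is the variable-length unsigned 29-bit integer (U29) encoding (see section 1.3.1 of the AMF3 spec) of the value 147 (0x93). As a U29O-traits value it breaks down as 0x03 (inline object with inline traits) | (9 << 4), where 9 is the number of sealed member names that follow. The dynamic flag (0x08) is not set: 0x93 = 0x90 | 0x03.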
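A sketch of the U29 encoder from section 1.3.1, plus a helper that packs the traits flags described above. Both names are hypothetical; the bit layout (bit 0 inline object, bit 1 inline traits, bit 2 externalizable, bit 3 dynamic, remaining bits the sealed member count) follows the AMF3 spec.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes v as an AMF3 U29 into out and returns the byte count (1-4).
 * Values of 2^29 or more are truncated to 29 bits. */
size_t amf3_write_u29(uint8_t *out, uint32_t v)
{
    v &= 0x1fffffff;
    if (v < 0x80) {                 /* 7 bits: one byte */
        out[0] = (uint8_t)v;
        return 1;
    }
    if (v < 0x4000) {               /* 14 bits: two bytes */
        out[0] = (uint8_t)(((v >> 7) & 0x7f) | 0x80);
        out[1] = (uint8_t)(v & 0x7f);
        return 2;
    }
    if (v < 0x200000) {             /* 21 bits: three bytes */
        out[0] = (uint8_t)(((v >> 14) & 0x7f) | 0x80);
        out[1] = (uint8_t)(((v >> 7) & 0x7f) | 0x80);
        out[2] = (uint8_t)(v & 0x7f);
        return 3;
    }
    /* 29 bits: four bytes, and the last one carries a full 8 bits */
    out[0] = (uint8_t)(((v >> 22) & 0x7f) | 0x80);
    out[1] = (uint8_t)(((v >> 15) & 0x7f) | 0x80);
    out[2] = (uint8_t)(((v >> 8) & 0x7f) | 0x80);
    out[3] = (uint8_t)(v & 0xff);
    return 4;
}

/* Packs a U29O-traits value for an inline object with inline traits. */
uint32_t amf3_traits_flags(uint32_t sealed_count, int dynamic, int externalizable)
{
    return (sealed_count << 4) | (dynamic ? 0x08u : 0u) |
           (externalizable ? 0x04u : 0u) | 0x03u;
}
```

For the CommandMessage above, amf3_traits_flags(9, 0, 0) gives 147, and amf3_write_u29 turns that into the two bytes 0x81 0x13.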

Next we have the class name:

[cc]
> 0x4d // ((utflen << 1) | 1), utflen=38
> 66 6c 65 78 2e 6d 65 73 73 61 67 69 6e 67
> 2e 6d 65 73 73 61 67 65 73 2e 43 6f 6d 6d
> 61 6e 64 4d 65 73 73 61 67 65
[/cc]

This is a string serialization of the 38 byte class name, "flex.messaging.messages.CommandMessage".
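Writing an inline string is just a U29 header followed by the UTF-8 bytes. A hypothetical minimal C writer, restricted to strings of at most 63 bytes so the header fits in one byte (the 38-byte class name qualifies; longer strings need the full U29 encoding). It writes the string body only; a string written as a standalone value would be preceded by the 0x06 string marker.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes s as an inline AMF3 string body and returns
 * the bytes written, or 0 if s is too long for a one-byte header. */
size_t amf3_write_short_string(uint8_t *out, const char *s)
{
    size_t len = strlen(s);              /* byte length of the UTF-8 data */
    if (len > 63)
        return 0;                        /* header would need 2+ bytes */
    out[0] = (uint8_t)((len << 1) | 1);  /* bit 0 set: inline, not a reference */
    memcpy(out + 1, s, len);
    return len + 1;
}
```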

Next we have the 9 property names, each represented as a UTF-8 variable length string:

[cc]
> 0x13 74 69 6d 65 73 74 61 6d 70 "timestamp"
> 0x0f 68 65 61 64 65 72 73 "headers"
> 0x13 6f 70 65 72 61 74 69 6f 6e "operation"
> 0x09 62 6f 64 79 "body"
> 0x1b 63 6f 72 72 65 6c 61 74 69 6f 6e 49 64 "correlationId"
> 0x13 6d 65 73 73 61 67 65 49 64 "messageId"
> 0x15 74 69 6d 65 54 6f 4c 69 76 65 "timeToLive"
> 0x11 63 6c 69 65 6e 74 49 64 "clientId"
> 0x17 64 65 73 74 69 6e 61 74 69 6f 6e "destination"
[/cc]

Now that the 9 names have been written, the 9 values are written (in the same order).

[cc]
> 0x05 42 72 de 1c d6 76 00 00
[/cc]

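This is the 'timestamp' value. 0x05 is the AMF3 double marker, and a double is always written as 8 bytes (an IEEE 754 value in big-endian order).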
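A sketch of writing an AMF3 double in C, assuming the platform's double is IEEE 754 binary64, as it is on iOS. The helper name is hypothetical; memcpy is used to get at the bit pattern without violating strict aliasing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the 0x05 marker followed by the 8 bytes of
 * d in big-endian (network) order. Always returns 9. */
size_t amf3_write_double(uint8_t *out, double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret the double's bits */
    out[0] = 0x05;
    for (int i = 0; i < 8; i++)
        out[1 + i] = (uint8_t)(bits >> (56 - 8 * i)); /* most significant first */
    return 9;
}
```

The timeToLive of zero later in this message comes out as 0x05 followed by eight zero bytes.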

[cc]
> 0x0a object marker for 'headers'
[/cc]

Next up is 'headers', a map of name/value pairs that's written as a custom object. This takes almost the same form as the object we're currently in the middle of serializing, except that it's an anonymous class.

[cc]
> 0x43 // (4 << 4) | 3 = inline object with inline traits (not a reference), 4 sealed properties
> 0x01 // class name: the empty string ((0 << 1) | 1), i.e. an anonymous object
[/cc]

Next the traits for the headers:

[cc]
> 0x1d 44 53 4d 61 78 46 72 65 71 75 65 6e 63 79 "DSMaxFrequency"
> 0x09 44 53 49 64 "DSId"
> 0x11 44 53 41 64 64 53 75 62 "DSAddSub"
> 0x15 44 53 45 6e 64 70 6f 69 6e 74 "DSEndpoint"
[/cc]

Next, the values for the headers:

[cc]
> 0x04 87 68
[/cc]

DSMaxFrequency header. 0x04 is an integer, and the next 2 bytes are a UInt29 representation of the integer value 1000. 1000 is ((7<<7) | 0x68).
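The reverse direction, decoding a U29, as a hypothetical C helper: it takes 7 bits from each of the first three bytes while the high bit says more bytes follow, uses all 8 bits of a fourth byte, and reports how many bytes it consumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reads a U29 from in (avail bytes available), stores
 * it in *out and returns the bytes consumed (1-4), or 0 if truncated. */
size_t amf3_read_u29(const uint8_t *in, size_t avail, uint32_t *out)
{
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++) {
        if (i >= avail)
            return 0;                    /* ran out of input */
        if (i == 3) {                    /* 4th byte: all 8 bits are data */
            *out = (v << 8) | in[3];
            return 4;
        }
        v = (v << 7) | (in[i] & 0x7f);
        if (!(in[i] & 0x80)) {           /* high bit clear: last byte */
            *out = v;
            return i + 1;
        }
    }
    return 0; /* not reached */
}
```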

[cc]
> 0x06 0x49 34 34 36 41 31 32 38 37 2d 42 31 41
> 46 2d 30 44 33 38 2d 43 35 31 41 2d
> 38 41 45 32 36 31 43 46 44 41 39 42
[/cc]

DSId header. 0x06 is a string; 0x49 is the length of the string shifted left one, with bit 0 set to mark an inline string: (36 << 1) | 1, since the value is a 36-byte GUID. If bit 0 were not set, the remaining bits would instead be an index into the string reference table.
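A tiny hypothetical helper that captures that rule for any decoded string header: bit 0 chooses between an inline string and a string-table reference, and the remaining bits hold either the byte length or the table index.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     is_reference; /* true: value is an index into the string table */
    uint32_t value;        /* byte length if inline, table index if a reference */
} amf3_string_header;

/* Hypothetical helper: interprets an already-decoded U29 string header. */
amf3_string_header amf3_classify_string(uint32_t u29)
{
    amf3_string_header h;
    h.is_reference = (u29 & 1) == 0;   /* bit 0 clear means reference */
    h.value = u29 >> 1;
    return h;
}
```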

[cc]
> 0x09 0x03 0x01 0x06 0x0f 41 44 42 45 5f 3b 5f
[/cc]

DSAddSub header. This is an array (0x09) of 1 element: 0x03 is the dense element count shifted left one, with bit 0 set because the array is inline rather than a reference. The 0x01 that follows is the empty string, which ends the (here empty) associative, named portion of the array. The one element is a string (0x06) of length 7 (0x0f = 7 << 1 | 1), containing the bytes "ADBE_;_". This is what we're subscribing to.
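A hypothetical C writer for exactly this shape of array: a dense list of short strings with no associative part. It keeps every count and length below 64 so each U29 header fits in a single byte, and returns 0 if a limit is exceeded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a dense AMF3 array of short strings. */
size_t amf3_write_string_array(uint8_t *out, const char *const *items, size_t count)
{
    size_t n = 0;
    if (count > 63)
        return 0;
    out[n++] = 0x09;                        /* array marker */
    out[n++] = (uint8_t)((count << 1) | 1); /* dense count, bit 0 set: inline */
    out[n++] = 0x01;                        /* empty key: no associative part */
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(items[i]);
        if (len > 63)
            return 0;
        out[n++] = 0x06;                    /* string marker */
        out[n++] = (uint8_t)((len << 1) | 1);
        memcpy(out + n, items[i], len);
        n += len;
    }
    return n;
}
```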

[cc]
> 0x06 0x0f 6d 79 2d 72 74 6d 70 "my-rtmp"
[/cc]

DSEndpoint header. The endpoint name, "my-rtmp".

(At this point we're done with the embedded headers object and we're back to serializing the property values of the CommandMessage, continuing with the 'operation' property.)

[cc]
> 0x04 0x0b
[/cc]

The 'operation' property is an integer with value 11, which is the multi-subscribe operation. (The operation codes are listed in the BlazeDS documentation for CommandMessage: http://livedocs.adobe.com/blazeds/1/javadoc/constant-values.html#flex.messaging.messages.CommandMessage.MULTI_SUBSCRIBE_OPERATION)

[cc]
> 0x01 (body = null)
> 0x01 (correlationId = null)
> 0x06 0x49 37 35 37 35 44 42 45 34 2d 42 30 31
> 32 2d 34 41 45 46 2d 42 43 37 41 2d
> 41 37 34 44 30 36 43 46 37 42 46 44
[/cc]

The messageId is a GUID, sent similarly to the DSId header above.

[cc]
> 0x05 00 00 00 00 00 00 00 00
[/cc]

timeToLive is of type double (0x05) value zero (doubles are always 8 bytes).

[cc]
> 0x01 (clientId = null)
> 0x06 0x21 6d 61 72 6b 65 74 2d 64 61 74 61 2d 66 65 65 64 (market-data-feed)
[/cc]

Finally, the name of the feed that we're subscribing to, "market-data-feed".

And that's that.

As you can tell, RTMP and AMF3 are incredibly efficient binary protocols. The decoded streams above show that efficiency in general, but they don't show one of the other big gains: the reference tables that the client and server maintain, so that if a string or object has already been sent once during a session, all that needs to go over the wire is a reference to it, not the string itself. This shaves many more bytes off the session.
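To make the reference-table idea concrete, a toy C sketch of a string table: the first write of a string goes out inline and is remembered, and later writes of the same string cost one byte (the table index shifted left one, with bit 0 clear). All names and limits are hypothetical. Following the AMF3 spec, the empty string is never added to the table; a real encoder also keeps separate tables for objects and traits, and how long a table lives is left to the caller here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_STRINGS 64

/* Toy string table. Limits: fewer than 64 entries and strings under
 * 64 bytes, so every U29 header fits in one byte. Pointers are stored,
 * so the strings must outlive the table. */
typedef struct {
    const char *entries[MAX_STRINGS];
    size_t      count;
} amf3_string_table;

/* Hypothetical helper: writes s as a reference if it was seen before,
 * otherwise inline (recording it). Returns bytes written, 0 if too long. */
size_t amf3_write_string_ref(amf3_string_table *t, uint8_t *out, const char *s)
{
    size_t len = strlen(s);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i], s) == 0) {
            out[0] = (uint8_t)(i << 1);        /* reference: bit 0 clear */
            return 1;
        }
    }
    if (len > 63)
        return 0;
    out[0] = (uint8_t)((len << 1) | 1);        /* inline: bit 0 set */
    memcpy(out + 1, s, len);
    if (len > 0 && t->count < MAX_STRINGS)     /* never table the empty string */
        t->entries[t->count++] = s;
    return len + 1;
}
```

Writing "DSId" twice, for example, costs 5 bytes the first time and a single 0x00 byte the second.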

And why is that important? Well, especially in mobile apps, users are paying for bytes, and there is typically less bandwidth available. And even if your clients don't mind the traffic, think of the bytes you're saving on the server side when you've got a few thousand clients connected and receiving streaming updates. It can be significant.

Looking for more information about RTMP? The spec is here. The AMF3 spec is here, and it builds on the AMF0 spec which is here.