Date

MessagePack specification

MessagePack 是一个类似于 JSON 的对象序列化规范。

MessagePack 中有两个概念:类型系统格式

序列化是指通过 MessagePack 类型系统将应用对象转换成 MessagePack 格式数据的过程。

反序列化是指通过 MessagePack 类型系统将 MessagePack 格式数据转换为应用对象的过程。

Serialization:
    Application objects
    -->  MessagePack type system
    -->  MessagePack formats (byte array)

Deserialization:
    MessagePack formats (byte array)
    -->  MessagePack type system
    -->  Application objects

本文档描述了 MessagePack 类型系统,MessagePack 格式及它们之间的转换。

Type system

  • 类型
    • Integer 表示整数
    • Nil 表示 nil
    • Boolean 表示 true 或 false
    • Float 表示 IEEE 754 双精度浮点数包含 NaN Infinity
    • Raw
      • String 扩展 Raw 类型,表示 UTF-8 字符串
      • Binary 扩展 Raw 类型,表示字节数组
    • Array 表示一个对象序列
    • Map 表示 Key/Value 结构的对象
    • Extension 表示类型信息元组和字节数组,其中类型信息是一个整数,其含义由应用程序或 MessagePack 规范定义
    • Timestamp 与时区无关的时间点,最大精度为纳秒。

Limitation

  • Integer 对象的范围是 -(2^63)(2^64)-1
  • Binary 对象的最大长度是 (2^32)-1
  • String 对象的最大字节大小是 (2^32)-1
  • String 对象可以包含无效的字节序列,并且反序列化的行为取决于实际实现
    • Deserializer 应提供获取原始字节数组的功能,以便可以让应用程序决定如何处理该对象
  • Array 对象的最大元素数目是 (2^32)-1
  • Map 对象的最大元素数量是 (2^32)-1

Extension types

MessagePack 允许应用使用 Extension 类型去定义自定义类型。Extension 类型由整数和字节数组构成,其中整数表示类型,字节数组用于表示数据。

应用可以分配 0127 以存储自定义类型信息。一个示例用法是应用定义 type = 0 作为应用唯一的类型系统,并且在 payload 中存储类型的名称和值。(Applications can assign 0 to 127 to store application-specific type information. An example usage is that application defines type = 0 as the application's unique type system, and stores name of a type and values of the type at the payload.)

MessagePack 保留 -1-128 用于在未来添加预定义类型。添加这些类型来换取更多类型,且不需要在不同的编程环境中使用预共享的静态类型模式。

[0, 127]: application-specific types
[-128, -1]: reserved for predefined types

由于 Extension 类型的加入,旧的应用可能无法实现全部。但是它们仍能够将其作为一种 Extension Type 进行处理。因此应用可以自主决定是否拒绝未知的 Extension types,接受不透明的数据,或者将其转移到另一个应用中进行处理。

预定义的 Extension types 如下表,对应的格式定义在 Formats 一节。

Name Type
Timestamp -1

Formats

Overview

format name first byte (in binary) first byte (in hex)
positive fixint 0xxxxxxx 0x00 - 0x7f
fixmap 1000xxxx 0x80 - 0x8f
fixarray 1001xxxx 0x90 - 0x9f
fixstr 101xxxxx 0xa0 - 0xbf
nil 11000000 0xc0
(never used) 11000001 0xc1
false 11000010 0xc2
true 11000011 0xc3
bin 8 11000100 0xc4
bin 16 11000101 0xc5
bin 32 11000110 0xc6
ext 8 11000111 0xc7
ext 16 11001000 0xc8
ext 32 11001001 0xc9
float 32 11001010 0xca
float 64 11001011 0xcb
uint 8 11001100 0xcc
uint 16 11001101 0xcd
uint 32 11001110 0xce
uint 64 11001111 0xcf
int 8 11010000 0xd0
int 16 11010001 0xd1
int 32 11010010 0xd2
int 64 11010011 0xd3
fixext 1 11010100 0xd4
fixext 2 11010101 0xd5
fixext 4 11010110 0xd6
fixext 8 11010111 0xd7
fixext 16 11011000 0xd8
str 8 11011001 0xd9
str 16 11011010 0xda
str 32 11011011 0xdb
array 16 11011100 0xdc
array 32 11011101 0xdd
map 16 11011110 0xde
map 32 11011111 0xdf
negative fixint 111xxxxx 0xe0 - 0xff

Notation in diagrams

one byte:
+--------+
|        |
+--------+

a variable number of bytes:
+========+
|        |
+========+

variable number of objects stored in MessagePack format:
+~~~~~~~~~~~~~~~~~+
|                 |
+~~~~~~~~~~~~~~~~~+

XYZA 将会被实际的 bit 所替换。

nil format

Nil 使用 1 字节存储。

nil:
+--------+
|  0xc0  |
+--------+

bool format family

Bool 使用 1 字节存储 true 或者 false。

false:
+--------+
|  0xc2  |
+--------+

true:
+--------+
|  0xc3  |
+--------+

int format family

Int 格式使用 1, 2, 3, 5, or 9 个字节来存储整数。

positive fixnum stores 7-bit positive integer
+--------+
|0XXXXXXX|
+--------+

negative fixnum stores 5-bit negative integer
+--------+
|111YYYYY|
+--------+

* 0XXXXXXX is 8-bit unsigned integer
* 111YYYYY is 8-bit signed integer

uint 8 stores a 8-bit unsigned integer
+--------+--------+
|  0xcc  |ZZZZZZZZ|
+--------+--------+

uint 16 stores a 16-bit big-endian unsigned integer
+--------+--------+--------+
|  0xcd  |ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+

uint 32 stores a 32-bit big-endian unsigned integer
+--------+--------+--------+--------+--------+
|  0xce  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+

uint 64 stores a 64-bit big-endian unsigned integer
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0xcf  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+

int 8 stores a 8-bit signed integer
+--------+--------+
|  0xd0  |ZZZZZZZZ|
+--------+--------+

int 16 stores a 16-bit big-endian signed integer
+--------+--------+--------+
|  0xd1  |ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+

int 32 stores a 32-bit big-endian signed integer
+--------+--------+--------+--------+--------+
|  0xd2  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+

int 64 stores a 64-bit big-endian signed integer
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0xd3  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+

float format family

Float 格式使用 5 或者 9 个字节进行存储。

float 32 stores a floating point number in IEEE 754 single precision floating point number format:
+--------+--------+--------+--------+--------+
|  0xca  |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+

float 64 stores a floating point number in IEEE 754 double precision floating point number format:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0xcb  |YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+

where
* XXXXXXXX_XXXXXXXX_XXXXXXXX_XXXXXXXX is a big-endian IEEE 754 single precision floating point number.
  Extension of precision from single-precision to double-precision does not lose precision.
* YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY is a big-endian
  IEEE 754 double precision floating point number

str format family

Str 格式使用额外的 1, 2, 3, 或 5 个字节存储字节数组的大小,数据排列在长度的后面

fixstr stores a byte array whose length is upto 31 bytes:
+--------+========+
|101XXXXX|  data  |
+--------+========+

str 8 stores a byte array whose length is upto (2^8)-1 bytes:
+--------+--------+========+
|  0xd9  |YYYYYYYY|  data  |
+--------+--------+========+

str 16 stores a byte array whose length is upto (2^16)-1 bytes:
+--------+--------+--------+========+
|  0xda  |ZZZZZZZZ|ZZZZZZZZ|  data  |
+--------+--------+--------+========+

str 32 stores a byte array whose length is upto (2^32)-1 bytes:
+--------+--------+--------+--------+--------+========+
|  0xdb  |AAAAAAAA|AAAAAAAA|AAAAAAAA|AAAAAAAA|  data  |
+--------+--------+--------+--------+--------+========+

where
* XXXXX is a 5-bit unsigned integer which represents N
* YYYYYYYY is a 8-bit unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ is a 16-bit big-endian unsigned integer which represents N
* AAAAAAAA_AAAAAAAA_AAAAAAAA_AAAAAAAA is a 32-bit big-endian unsigned integer which represents N
* N is the length of data

bin format family

Bin 格式使用额外的 2, 3, 或 5 个字节存储字节数组的大小,数据排列在后面

bin 8 stores a byte array whose length is upto (2^8)-1 bytes:
+--------+--------+========+
|  0xc4  |XXXXXXXX|  data  |
+--------+--------+========+

bin 16 stores a byte array whose length is upto (2^16)-1 bytes:
+--------+--------+--------+========+
|  0xc5  |YYYYYYYY|YYYYYYYY|  data  |
+--------+--------+--------+========+

bin 32 stores a byte array whose length is upto (2^32)-1 bytes:
+--------+--------+--------+--------+--------+========+
|  0xc6  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|  data  |
+--------+--------+--------+--------+--------+========+

where
* XXXXXXXX is a 8-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the length of data

array format family

数组格式使用额外的 1, 3, 或 5 个字节存储数组长度,元素排列在后面

fixarray stores an array whose length is upto 15 elements:
+--------+~~~~~~~~~~~~~~~~~+
|1001XXXX|    N objects    |
+--------+~~~~~~~~~~~~~~~~~+

array 16 stores an array whose length is upto (2^16)-1 elements:
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
|  0xdc  |YYYYYYYY|YYYYYYYY|    N objects    |
+--------+--------+--------+~~~~~~~~~~~~~~~~~+

array 32 stores an array whose length is upto (2^32)-1 elements:
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
|  0xdd  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|    N objects    |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+

where
* XXXX is a 4-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the size of an array

map format family

Map 格式使用额外的 1, 3, 或 5 个字节存储哈希表的长度,Key/Value 对排列在后面

fixmap stores a map whose length is upto 15 elements
+--------+~~~~~~~~~~~~~~~~~+
|1000XXXX|   N*2 objects   |
+--------+~~~~~~~~~~~~~~~~~+

map 16 stores a map whose length is upto (2^16)-1 elements
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
|  0xde  |YYYYYYYY|YYYYYYYY|   N*2 objects   |
+--------+--------+--------+~~~~~~~~~~~~~~~~~+

map 32 stores a map whose length is upto (2^32)-1 elements
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
|  0xdf  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|   N*2 objects   |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+

where
* XXXX is a 4-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the size of a map
* odd elements in objects are keys of a map(奇数元素是 map 的 key)
* the next element of a key is its associated value(key 的下一个相邻的元素是其对应的 value)

ext format family

Ext 格式存储一个整数元组和一个字节数组。

fixext 1 stores an integer and a byte array whose length is 1 byte
+--------+--------+--------+
|  0xd4  |  type  |  data  |
+--------+--------+--------+

fixext 2 stores an integer and a byte array whose length is 2 bytes
+--------+--------+--------+--------+
|  0xd5  |  type  |       data      |
+--------+--------+--------+--------+

fixext 4 stores an integer and a byte array whose length is 4 bytes
+--------+--------+--------+--------+--------+--------+
|  0xd6  |  type  |                data               |
+--------+--------+--------+--------+--------+--------+

fixext 8 stores an integer and a byte array whose length is 8 bytes
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0xd7  |  type  |                                  data                                 |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+

fixext 16 stores an integer and a byte array whose length is 16 bytes
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0xd8  |  type  |                                  data                                  
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
+--------+--------+--------+--------+--------+--------+--------+--------+
                              data (cont.)                              |
+--------+--------+--------+--------+--------+--------+--------+--------+

ext 8 stores an integer and a byte array whose length is upto (2^8)-1 bytes:
+--------+--------+--------+========+
|  0xc7  |XXXXXXXX|  type  |  data  |
+--------+--------+--------+========+

ext 16 stores an integer and a byte array whose length is upto (2^16)-1 bytes:
+--------+--------+--------+--------+========+
|  0xc8  |YYYYYYYY|YYYYYYYY|  type  |  data  |
+--------+--------+--------+--------+========+

ext 32 stores an integer and a byte array whose length is upto (2^32)-1 bytes:
+--------+--------+--------+--------+--------+--------+========+
|  0xc9  |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|  type  |  data  |
+--------+--------+--------+--------+--------+--------+========+

where
* XXXXXXXX is a 8-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a big-endian 32-bit unsigned integer which represents N
* N is a length of data
* type is a signed 8-bit signed integer
* type < 0 is reserved for future extension including 2-byte type information

Timestamp extension type

-1 被分配到到扩展类型 Timestamp。它定义了3种格式:32位格式,64位格式和96位格式。

timestamp 32 stores the number of seconds that have elapsed since 1970-01-01 00:00:00 UTC
in an 32-bit unsigned integer:
+--------+--------+--------+--------+--------+--------+
|  0xd6  |   -1   |   seconds in 32-bit unsigned int  |
+--------+--------+--------+--------+--------+--------+

timestamp 64 stores the number of seconds and nanoseconds that have elapsed since 1970-01-01 00:00:00 UTC
in 32-bit unsigned integers:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
|  0xd7  |   -1   |nanoseconds in 30-bit unsigned int |  seconds in 34-bit unsigned int   |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+

timestamp 96 stores the number of seconds and nanoseconds that have elapsed since 1970-01-01 00:00:00 UTC
in 64-bit signed integer and 32-bit unsigned integer:
+--------+--------+--------+--------+--------+--------+--------+
|  0xc7  |   12   |   -1   |nanoseconds in 32-bit unsigned int |
+--------+--------+--------+--------+--------+--------+--------+
+--------+--------+--------+--------+--------+--------+--------+--------+
                    seconds in 64-bit signed int                        |
+--------+--------+--------+--------+--------+--------+--------+--------+
  • Timestamp 32 格式可以表示 [1970-01-01 00:00:00 UTC, 2106-02-07 06:28:16 UTC) 范围内的时间,精确到秒。
  • Timestamp 64 格式可以表示 [1970-01-01 00:00:00.000000000 UTC, 2514-05-30 01:53:04.000000000 UTC) 范围内的时间,精确到纳秒。
  • Timestamp 96 格式可以表示 [-584554047284-02-23 16:59:44 UTC, 584554051223-11-09 07:00:16.000000000 UTC) 范围内的时间,精确到纳秒。
  • Timestamp 64 和 Timestamp 96 的纳秒部分不超过 999999999

序列化的伪代码:

struct timespec {
    long tv_sec;  // seconds
    long tv_nsec; // nanoseconds
} time;
if ((time.tv_sec >> 34) == 0) {
    uint64_t data64 = (time.tv_nsec << 34) | time.tv_sec;
    if (data64 & 0xffffffff00000000L == 0) {
        // timestamp 32
        uint32_t data32 = data64;
        serialize(0xd6, -1, data32)
    }
    else {
        // timestamp 64
        serialize(0xd7, -1, data64)
    }
}
else {
    // timestamp 96
    serialize(0xc7, 12, -1, time.tv_nsec, time.tv_sec)
}

反序列化的伪代码:

 ExtensionValue value = deserialize_ext_type();
 struct timespec result;
 switch(value.length) {
 case 4:
     uint32_t data32 = value.payload;
     result.tv_nsec = 0;
     result.tv_sec = data32;
 case 8:
     uint64_t data64 = value.payload;
     result.tv_nsec = data64 >> 34;
     result.tv_sec = data64 & 0x00000003ffffffffL;
 case 12:
     uint32_t data32 = value.payload;
     uint64_t data64 = value.payload + 4;
     result.tv_nsec = data32;
     result.tv_sec = data64;
 default:
     // error
 }

Serialization: type to format conversion

MessagePack serializers 将 MessagePack 类型转换为以下格式:

source types output format
Integer int format family (positive fixint, negative fixint, int 8/16/32/64 or uint 8/16/32/64)
Nil nil
Boolean bool format family (false or true)
Float float format family (float 32/64)
String str format family (fixstr or str 8/16/32)
Binary bin format family (bin 8/16/32)
Array array format family (fixarray or array 16/32)
Map map format family (fixmap or map 16/32)
Extension ext format family (fixext or ext 8/16/32)

如果一个对象可以用多种格式进行表示,那么 serializers 应当使用占用字节数目最少的数据格式。

Deserialization: format to type conversion

MessagePack deserializers 将 MessagePack 格式数据转换为以下类型:

source formats output type
positive fixint, negative fixint, int 8/16/32/64 and uint 8/16/32/64 Integer
nil Nil
false and true Boolean
float 32/64 Float
fixstr and str 8/16/32 String
bin 8/16/32 Binary
fixarray and array 16/32 Array
fixmap map 16/32 Map
fixext and ext 8/16/32 Extension