MessagePack specification
MessagePack 是一个类似于 JSON 的对象序列化规范。
MessagePack 中有两个概念:类型系统 和 格式
序列化是指通过 MessagePack 类型系统将应用对象转换成 MessagePack 格式数据的过程。
反序列化是指通过 MessagePack 类型系统将 MessagePack 格式数据转换为应用对象的过程。
Serialization:
Application objects
--> MessagePack type system
--> MessagePack formats (byte array)
Deserialization:
MessagePack formats (byte array)
--> MessagePack type system
--> Application objects
本文档描述了 MessagePack 类型系统,MessagePack 格式及它们之间的转换。
Type system
- 类型
- Integer 表示整数
- Nil 表示 nil
- Boolean 表示 true 或 false
- Float 表示 IEEE 754 双精度浮点数包含 NaN Infinity
- Raw
- String 扩展 Raw 类型,表示 UTF-8 字符串
- Binary 扩展 Raw 类型,表示字节数组
- Array 表示一个对象序列
- Map 表示 Key/Value 结构的对象
- Extension 表示类型信息元组和字节数组,其中类型信息是一个整数,其含义由应用程序或 MessagePack 规范定义
- Timestamp 与时区无关的时间点,最大精度为纳秒。
Limitation
- Integer 对象的范围是
-(2^63)
到(2^64)-1
- Binary 对象的最大长度是
(2^32)-1
- String 对象的最大字节大小是
(2^32)-1
- String 对象可以包含无效的字节序列,并且反序列化的行为取决于实际实现
- Deserializer 应提供获取原始字节数组的功能,以便可以让应用程序决定如何处理该对象
- Array 对象的最大元素数目是
(2^32)-1
- Map 对象的最大元素数量是
(2^32)-1
Extension types
MessagePack 允许应用使用 Extension 类型去定义自定义类型。Extension 类型由整数和字节数组构成,其中整数表示类型,字节数组用于表示数据。
应用可以分配 0
到 127
以存储自定义类型信息。一个示例用法是应用定义 type = 0
作为应用唯一的类型系统,并且在 payload 中存储类型的名称和值。(Applications can assign 0
to 127
to store application-specific type information. An example usage is that application defines type = 0
as the application's unique type system, and stores name of a type and values of the type at the payload.)
MessagePack 保留 -1
到 -128
用于在未来添加预定义类型。添加这些类型来换取更多类型,且不需要在不同的编程环境中使用预共享的静态类型模式。
[0, 127]: application-specific types
[-128, -1]: reserved for predefined types
由于 Extension 类型的加入,旧的应用可能无法实现全部。但是它们仍能够将其作为一种 Extension Type 进行处理。因此应用可以自主决定是否拒绝未知的 Extension types,接受不透明的数据,或者将其转移到另一个应用中进行处理。
预定义的 Extension types 如下表,对应的格式定义在 Formats 一节。
Name | Type |
---|---|
Timestamp | -1 |
Formats
Overview
format name | first byte (in binary) | first byte (in hex) |
---|---|---|
positive fixint | 0xxxxxxx | 0x00 - 0x7f |
fixmap | 1000xxxx | 0x80 - 0x8f |
fixarray | 1001xxxx | 0x90 - 0x9f |
fixstr | 101xxxxx | 0xa0 - 0xbf |
nil | 11000000 | 0xc0 |
(never used) | 11000001 | 0xc1 |
false | 11000010 | 0xc2 |
true | 11000011 | 0xc3 |
bin 8 | 11000100 | 0xc4 |
bin 16 | 11000101 | 0xc5 |
bin 32 | 11000110 | 0xc6 |
ext 8 | 11000111 | 0xc7 |
ext 16 | 11001000 | 0xc8 |
ext 32 | 11001001 | 0xc9 |
float 32 | 11001010 | 0xca |
float 64 | 11001011 | 0xcb |
uint 8 | 11001100 | 0xcc |
uint 16 | 11001101 | 0xcd |
uint 32 | 11001110 | 0xce |
uint 64 | 11001111 | 0xcf |
int 8 | 11010000 | 0xd0 |
int 16 | 11010001 | 0xd1 |
int 32 | 11010010 | 0xd2 |
int 64 | 11010011 | 0xd3 |
fixext 1 | 11010100 | 0xd4 |
fixext 2 | 11010101 | 0xd5 |
fixext 4 | 11010110 | 0xd6 |
fixext 8 | 11010111 | 0xd7 |
fixext 16 | 11011000 | 0xd8 |
str 8 | 11011001 | 0xd9 |
str 16 | 11011010 | 0xda |
str 32 | 11011011 | 0xdb |
array 16 | 11011100 | 0xdc |
array 32 | 11011101 | 0xdd |
map 16 | 11011110 | 0xde |
map 32 | 11011111 | 0xdf |
negative fixint | 111xxxxx | 0xe0 - 0xff |
Notation in diagrams
one byte:
+--------+
| |
+--------+
a variable number of bytes:
+========+
| |
+========+
variable number of objects stored in MessagePack format:
+~~~~~~~~~~~~~~~~~+
| |
+~~~~~~~~~~~~~~~~~+
X
,Y
,Z
和 A
将会被实际的 bit 所替换。
nil format
Nil 使用 1 字节存储。
nil:
+--------+
| 0xc0 |
+--------+
bool format family
Bool 使用 1 字节存储 true 或者 false。
false:
+--------+
| 0xc2 |
+--------+
true:
+--------+
| 0xc3 |
+--------+
int format family
Int 格式使用 1, 2, 3, 5, or 9 个字节来存储整数。
positive fixnum stores 7-bit positive integer
+--------+
|0XXXXXXX|
+--------+
negative fixnum stores 5-bit negative integer
+--------+
|111YYYYY|
+--------+
* 0XXXXXXX is 8-bit unsigned integer
* 111YYYYY is 8-bit signed integer
uint 8 stores a 8-bit unsigned integer
+--------+--------+
| 0xcc |ZZZZZZZZ|
+--------+--------+
uint 16 stores a 16-bit big-endian unsigned integer
+--------+--------+--------+
| 0xcd |ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+
uint 32 stores a 32-bit big-endian unsigned integer
+--------+--------+--------+--------+--------+
| 0xce |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+
uint 64 stores a 64-bit big-endian unsigned integer
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xcf |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
int 8 stores a 8-bit signed integer
+--------+--------+
| 0xd0 |ZZZZZZZZ|
+--------+--------+
int 16 stores a 16-bit big-endian signed integer
+--------+--------+--------+
| 0xd1 |ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+
int 32 stores a 32-bit big-endian signed integer
+--------+--------+--------+--------+--------+
| 0xd2 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+
int 64 stores a 64-bit big-endian signed integer
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd3 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
float format family
Float 格式使用 5 或者 9 个字节进行存储。
float 32 stores a floating point number in IEEE 754 single precision floating point number format:
+--------+--------+--------+--------+--------+
| 0xca |XXXXXXXX|XXXXXXXX|XXXXXXXX|XXXXXXXX|
+--------+--------+--------+--------+--------+
float 64 stores a floating point number in IEEE 754 double precision floating point number format:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xcb |YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|YYYYYYYY|
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
where
* XXXXXXXX_XXXXXXXX_XXXXXXXX_XXXXXXXX is a big-endian IEEE 754 single precision floating point number.
Extension of precision from single-precision to double-precision does not lose precision.
* YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY_YYYYYYYY is a big-endian
IEEE 754 double precision floating point number
str format family
Str 格式使用额外的 1, 2, 3, 或 5 个字节存储字节数组的大小,数据排列在长度的后面
fixstr stores a byte array whose length is upto 31 bytes:
+--------+========+
|101XXXXX| data |
+--------+========+
str 8 stores a byte array whose length is upto (2^8)-1 bytes:
+--------+--------+========+
| 0xd9 |YYYYYYYY| data |
+--------+--------+========+
str 16 stores a byte array whose length is upto (2^16)-1 bytes:
+--------+--------+--------+========+
| 0xda |ZZZZZZZZ|ZZZZZZZZ| data |
+--------+--------+--------+========+
str 32 stores a byte array whose length is upto (2^32)-1 bytes:
+--------+--------+--------+--------+--------+========+
| 0xdb |AAAAAAAA|AAAAAAAA|AAAAAAAA|AAAAAAAA| data |
+--------+--------+--------+--------+--------+========+
where
* XXXXX is a 5-bit unsigned integer which represents N
* YYYYYYYY is a 8-bit unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ is a 16-bit big-endian unsigned integer which represents N
* AAAAAAAA_AAAAAAAA_AAAAAAAA_AAAAAAAA is a 32-bit big-endian unsigned integer which represents N
* N is the length of data
bin format family
Bin 格式使用额外的 2, 3, 或 5 个字节存储字节数组的大小,数据排列在后面
bin 8 stores a byte array whose length is upto (2^8)-1 bytes:
+--------+--------+========+
| 0xc4 |XXXXXXXX| data |
+--------+--------+========+
bin 16 stores a byte array whose length is upto (2^16)-1 bytes:
+--------+--------+--------+========+
| 0xc5 |YYYYYYYY|YYYYYYYY| data |
+--------+--------+--------+========+
bin 32 stores a byte array whose length is upto (2^32)-1 bytes:
+--------+--------+--------+--------+--------+========+
| 0xc6 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| data |
+--------+--------+--------+--------+--------+========+
where
* XXXXXXXX is a 8-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the length of data
array format family
数组格式使用额外的 1, 3, 或 5 个字节存储数组长度,元素排列在后面
fixarray stores an array whose length is upto 15 elements:
+--------+~~~~~~~~~~~~~~~~~+
|1001XXXX| N objects |
+--------+~~~~~~~~~~~~~~~~~+
array 16 stores an array whose length is upto (2^16)-1 elements:
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
| 0xdc |YYYYYYYY|YYYYYYYY| N objects |
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
array 32 stores an array whose length is upto (2^32)-1 elements:
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
| 0xdd |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| N objects |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
where
* XXXX is a 4-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the size of an array
map format family
Map 格式使用额外的 1, 3, 或 5 个字节存储哈希表的长度,Key/Value 对排列在后面
fixmap stores a map whose length is upto 15 elements
+--------+~~~~~~~~~~~~~~~~~+
|1000XXXX| N*2 objects |
+--------+~~~~~~~~~~~~~~~~~+
map 16 stores a map whose length is upto (2^16)-1 elements
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
| 0xde |YYYYYYYY|YYYYYYYY| N*2 objects |
+--------+--------+--------+~~~~~~~~~~~~~~~~~+
map 32 stores a map whose length is upto (2^32)-1 elements
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
| 0xdf |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| N*2 objects |
+--------+--------+--------+--------+--------+~~~~~~~~~~~~~~~~~+
where
* XXXX is a 4-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a 32-bit big-endian unsigned integer which represents N
* N is the size of a map
* odd elements in objects are keys of a map(奇数元素是 map 的 key)
* the next element of a key is its associated value(key 的下一个相邻的元素是其对应的 value)
ext format family
Ext 格式存储一个整数元组和一个字节数组。
fixext 1 stores an integer and a byte array whose length is 1 byte
+--------+--------+--------+
| 0xd4 | type | data |
+--------+--------+--------+
fixext 2 stores an integer and a byte array whose length is 2 bytes
+--------+--------+--------+--------+
| 0xd5 | type | data |
+--------+--------+--------+--------+
fixext 4 stores an integer and a byte array whose length is 4 bytes
+--------+--------+--------+--------+--------+--------+
| 0xd6 | type | data |
+--------+--------+--------+--------+--------+--------+
fixext 8 stores an integer and a byte array whose length is 8 bytes
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd7 | type | data |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
fixext 16 stores an integer and a byte array whose length is 16 bytes
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd8 | type | data
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
+--------+--------+--------+--------+--------+--------+--------+--------+
data (cont.) |
+--------+--------+--------+--------+--------+--------+--------+--------+
ext 8 stores an integer and a byte array whose length is upto (2^8)-1 bytes:
+--------+--------+--------+========+
| 0xc7 |XXXXXXXX| type | data |
+--------+--------+--------+========+
ext 16 stores an integer and a byte array whose length is upto (2^16)-1 bytes:
+--------+--------+--------+--------+========+
| 0xc8 |YYYYYYYY|YYYYYYYY| type | data |
+--------+--------+--------+--------+========+
ext 32 stores an integer and a byte array whose length is upto (2^32)-1 bytes:
+--------+--------+--------+--------+--------+--------+========+
| 0xc9 |ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ|ZZZZZZZZ| type | data |
+--------+--------+--------+--------+--------+--------+========+
where
* XXXXXXXX is a 8-bit unsigned integer which represents N
* YYYYYYYY_YYYYYYYY is a 16-bit big-endian unsigned integer which represents N
* ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ_ZZZZZZZZ is a big-endian 32-bit unsigned integer which represents N
* N is a length of data
* type is a signed 8-bit signed integer
* type < 0 is reserved for future extension including 2-byte type information
Timestamp extension type
-1
被分配到到扩展类型 Timestamp。它定义了3种格式:32位格式,64位格式和96位格式。
timestamp 32 stores the number of seconds that have elapsed since 1970-01-01 00:00:00 UTC
in an 32-bit unsigned integer:
+--------+--------+--------+--------+--------+--------+
| 0xd6 | -1 | seconds in 32-bit unsigned int |
+--------+--------+--------+--------+--------+--------+
timestamp 64 stores the number of seconds and nanoseconds that have elapsed since 1970-01-01 00:00:00 UTC
in 32-bit unsigned integers:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| 0xd7 | -1 |nanoseconds in 30-bit unsigned int | seconds in 34-bit unsigned int |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
timestamp 96 stores the number of seconds and nanoseconds that have elapsed since 1970-01-01 00:00:00 UTC
in 64-bit signed integer and 32-bit unsigned integer:
+--------+--------+--------+--------+--------+--------+--------+
| 0xc7 | 12 | -1 |nanoseconds in 32-bit unsigned int |
+--------+--------+--------+--------+--------+--------+--------+
+--------+--------+--------+--------+--------+--------+--------+--------+
seconds in 64-bit signed int |
+--------+--------+--------+--------+--------+--------+--------+--------+
- Timestamp 32 格式可以表示 [1970-01-01 00:00:00 UTC, 2106-02-07 06:28:16 UTC) 范围内的时间,精确到秒。
- Timestamp 64 格式可以表示 [1970-01-01 00:00:00.000000000 UTC, 2514-05-30 01:53:04.000000000 UTC) 范围内的时间,精确到纳秒。
- Timestamp 96 格式可以表示 [-584554047284-02-23 16:59:44 UTC, 584554051223-11-09 07:00:16.000000000 UTC) 范围内的时间,精确到纳秒。
- Timestamp 64 和 Timestamp 96 的纳秒部分不超过 999999999
序列化的伪代码:
struct timespec {
long tv_sec; // seconds
long tv_nsec; // nanoseconds
} time;
if ((time.tv_sec >> 34) == 0) {
uint64_t data64 = (time.tv_nsec << 34) | time.tv_sec;
if (data64 & 0xffffffff00000000L == 0) {
// timestamp 32
uint32_t data32 = data64;
serialize(0xd6, -1, data32)
}
else {
// timestamp 64
serialize(0xd7, -1, data64)
}
}
else {
// timestamp 96
serialize(0xc7, 12, -1, time.tv_nsec, time.tv_sec)
}
反序列化的伪代码:
ExtensionValue value = deserialize_ext_type();
struct timespec result;
switch(value.length) {
case 4:
uint32_t data32 = value.payload;
result.tv_nsec = 0;
result.tv_sec = data32;
case 8:
uint64_t data64 = value.payload;
result.tv_nsec = data64 >> 34;
result.tv_sec = data64 & 0x00000003ffffffffL;
case 12:
uint32_t data32 = value.payload;
uint64_t data64 = value.payload + 4;
result.tv_nsec = data32;
result.tv_sec = data64;
default:
// error
}
Serialization: type to format conversion
MessagePack serializers 将 MessagePack 类型转换为以下格式:
source types | output format |
---|---|
Integer | int format family (positive fixint, negative fixint, int 8/16/32/64 or uint 8/16/32/64) |
Nil | nil |
Boolean | bool format family (false or true) |
Float | float format family (float 32/64) |
String | str format family (fixstr or str 8/16/32) |
Binary | bin format family (bin 8/16/32) |
Array | array format family (fixarray or array 16/32) |
Map | map format family (fixmap or map 16/32) |
Extension | ext format family (fixext or ext 8/16/32) |
如果一个对象可以用多种格式进行表示,那么 serializers 应当使用占用字节数目最少的数据格式。
Deserialization: format to type conversion
MessagePack deserializers 将 MessagePack 格式数据转换为以下类型:
source formats | output type |
---|---|
positive fixint, negative fixint, int 8/16/32/64 and uint 8/16/32/64 | Integer |
nil | Nil |
false and true | Boolean |
float 32/64 | Float |
fixstr and str 8/16/32 | String |
bin 8/16/32 | Binary |
fixarray and array 16/32 | Array |
fixmap map 16/32 | Map |
fixext and ext 8/16/32 | Extension |