001 - weekly reading

asyncio: A dumpster fire of bad design

本篇文章就是在喷 asyncio，尤其是在写作口吻上。关于 asyncio 还有一篇比较经典的 I don't understand Python's Asyncio。之所以放这篇文章，是因为作者所抱怨的地方也可能会是我以后踩到的坑。

Sin 1) 忽视了 generator API

Python 的 coroutine 是借助 generator 实现的。所以 asyncio 应当自然而然地使用 generator API 去和 coroutine 进行交互。(作者这里可能是基于 Python 3.4 来说的，PEP 492 引入了 async/await)。

利用 generator API，涉及事件循环的异步操作可以通过一系列的 yield 操作。事件循环通过 coro.send 使 coroutine 得到调度，yield 可以再次陷入事件循环中。这便是 coroutine 和事件循环交互的方式。而 asyncio 中则需要在每处需要用到事件循环的地方显式的传入此对象。比如 sleep 的实现

# asyncio/tasks.py
@coroutine
def sleep(delay, result=None, *, loop=None):
    """Coroutine that completes after a given time (in seconds)."""
    if delay == 0:
        yield
        return result

    if loop is None:
        loop = events.get_event_loop()
    future = loop.create_future()
    h = future._loop.call_later(delay,
                                futures._set_result_unless_cancelled,
                                future, result)
    try:
        return (yield from future)
    finally:
        h.cancel()

如果不传入则会通过 asyncio.get_event_loop 来获取当前的 loop 对象。它是 thread-local 的，所以当你想要在其他线程中去执行时会比较尴尬。

这在其他语言中可能还好，但是 generator API 的存在正是为了解决这种问题，它可以作为一种通信通道。有一个示例：PEP 342: 3. A simple co-routine scheduler or trampoline that lets coroutines call other coroutines by yielding the coroutine they wish to invoke.

Sin 2) 糟糕的网络 I/O

有三种网络处理方式：

Streams
Protocols
Raw loop functions

Streams

import asyncio

@asyncio.coroutine
def tcp_echo_client(message, loop):  
    reader, writer = yield from asyncio.open_connection('127.0.0.1', 8888,
                                                        loop=loop)
    print('Send: %r' % message)
    writer.write(message.encode())

    data = yield from reader.read(100)
    print('Received: %r' % data.decode())

    print('Close the socket')
    writer.close()

message = 'Hello World!'  
loop = asyncio.get_event_loop()  
loop.run_until_complete(tcp_echo_client(message, loop))  
loop.close()

作者的观点大致为：读操作是异步的，写操作是同步的。而且写入失败，asyncio 会丢弃错误，而不是等待写入。通知循环在一段时间后处理它，然后回到代码中。这可能意味着当你读取时，服务器并没有收到你的数据。解决方法是使用 writer.drain()，但为什么要分开成两个 API。

Protocols

它使用 callback，而且是同步的

import asyncio

class EchoClientProtocol(asyncio.Protocol):  
    def __init__(self, message, loop):
        self.message = message
        self.loop = loop

    def connection_made(self, transport):
        transport.write(self.message.encode())
        print('Data sent: {!r}'.format(self.message))

    def data_received(self, data):
        print('Data received: {!r}'.format(data.decode()))

    def connection_lost(self, exc):
        print('The server closed the connection')
        print('Stop the event loop')
        self.loop.stop()

loop = asyncio.get_event_loop()  
message = 'Hello World!'  
coro = loop.create_connection(lambda: EchoClientProtocol(message, loop),  
                              '127.0.0.1', 8888)
loop.run_until_complete(coro)  
loop.run_forever()  
loop.close()

作者观点大致为：同步的函数中无法 await coro。而且这里并不能指定读取多少字节，不够灵活。

本人认为：如果这里是支持 await coro 的 async def 本身好像就是一个错误的做法。因为无法保证顺序

同步 callback 下
- 数据1到来 -> callback1 - 数据2到来 -> callback2

异步 callback 下
- 数据1到来 -> callback1 执行到一半发生 yield - 数据2到来 -> callback2 执行到一半发生 yield - callback2 条件达成被再次调度，并执行完成 - callback1 条件达成被再次调度，并执行完成

这样可能需要进行同步处理

无法指定读取多少字节的问题，可以通过自己创建 buffer 来解决

Sin 3) 线程兼容性

一旦你在线程那侧(thread-land)，如果你想与 async 那侧(async-land)进行交互，那就 GG 了
loop.run_in_executor(executor, callable) 中，None 经常作为第一个参数。将它作为一个关键字参数，而不是位置参数的第一个貌似更好
需要使用 _threadsafe API

对比 asyncio 的线程支持和 curio 的异步线程。异步线程可以通过 AWAIT 函数和 async-land 进行通信。asyncio 则不提供这种兼容

Sin 4) 没有异步文件 I/O

aiofiles 对此进行了扩充

Sin 5) API 设计缺陷

asyncio 的一些 utils，不需要和 loop 进行绑定，它们应当被放到模块层级中。有一些函数和事件循环进行交互，比如 asyncio.open_connection，它们不应当被放到模块层级。
asyncio.ensure_future，可能会返回一个 Task，并在当前的事件循环上调度
缺少一个恰当的管理器。gather 和 wait 被设计成了独立的两个
不能在 as_completed 上使用 async for

目前 Python 异步框架有下面三个阵营

asyncio
curio
trio

就 GitHub 上的 repo 来说，asyncio 的生态环境还是最好的