Date

本周依旧围绕 PyCon 2018 的视频

Clearer Code at Scale: Static Types at Zulip and Dropbox

本演讲的主题是通过 static types 使 code 变得 clearer

演讲者为 Greg Price,曾在 Dropbox 和 Zulip 工作

视频的 4:16 中解释了为什么需要 static types

Work incrementally

添加 static types 是一个渐进的过程,从新写的文件/函数开始做起

Define a clean build

CI 中添加 mypy,使得 static types 的正确性得到保证

  • Have a no-arguments script
  • Run on a(growing) subset of files
  • Write down options mypy.ini
  • Pin mypy version, because mypy is under development
  • Run in CI

Get started in week

1) check toplevel code

在没有任何 annotations 的情况下直接运行 mypy,它会给出一系列的 error

比如 gunicorn 的

gunicorn/six.py:129: error: Need type annotation for '_moved_attributes'
gunicorn/six.py:222: error: Need type annotation for '__path__'
gunicorn/six.py:450: error: Need type annotation for '__path__'

减少检查的文件数量(从一开始),逐步修复

2) check inside functions

更改配置 mypy.ini

[mypy]
follow_imports = silent
check_untyped_defs = True

Robots do te boring work

运行时追踪类型的工具

静态推测类型的工具

Static types in Python, oh my(py)!

Zulip 的技术博客,写于 2016 年

During 2016, we have annotated 100% (!) of our backend with static types using mypy, and thanks to mypy, we are on the verge of switching to Python 3. Zulip is now the largest open source Python project that has fully adopted static types, though I doubt we’ll hold that title for long :).

About mypy

When mypy has complete type annotations for a module as well as its dependencies, then it can provide a very strong consistency check, similar to what the compiler provides in statically typed languages. Mypy uses the typeshed repository of type “stubs” (type definitions for a module in the style of a header file) to provide type data for both the Python standard library and dozens of popular libraries like requests, six, and sqlalchemy. Importantly, mypy is designed for gradually adding types; if type data for an import isn’t available, it just treats that import as being consistent with anything.

达到 100% 覆盖率来源于

  • 1-week hackathon
  • a GSOC project
  • a party at the PyCon sprints

Benefits of using mypy

  • Static type annotations improve readability.
  • We can refactor with confidence.
  • We upgraded to Python 3.
  • mypy frequently catches bugs.
  • Static types highlight bad interfaces.

有一些代码是毫无意义的,它的目的是避免出错,但事实上那种情况根本不会发生 Simplify get_stream_backend.

Pain points

因为 typing 需要类型,所以会产生 Import cycles,因为本文时间久远。现在可以参考 mypy 的文档

Finding bugs in your first few days with mypy

  • Phase 1: Annotate the core library.
  • Phase 2: Annotate the bulk of the codebase.
  • Phase 3: Get to 100%.
  • Phase 4: Celebrate and write a blog post!

Recommendations for annotating code

  • Make sure to handle str vs. Text correctly. 方便从 2 过渡到 3
  • Remember to annotate class variables!
  • Avoid using a guess-and-check approach when adding annotations.
  • Use precise types where possible, avoid use Any
  • When using type: ignore to work around a potential mypy or typeshed bug, I recommend using the following style to record the original GitHub issue:
bad_code # type: ignore # https://github.com/python/typeshed/issues/372

Type-checked Python in the real world

本视频同样为 Python 类型检查的实践,演讲者为 Instagram 的工程师 Carl Meyer

  • Instagram 在生产环境中接了小部分流量到使用 Monkeytype 追踪的代码中
  • Instagram 最近使用 Pyre 替换了 mypy。因为它非常快,mypy 跑 5 分半,Pyre 只用 45 秒

P.S. 这个 PPT 做的真心不错,除了圆括号不够圆

Why type

如下代码

def process(self, items):
    for item in items:
        self.append(item.value.id)

items 是什么?

items 应当是一个 iterable 的,它的每一个项都具有 value 属性,且 value 有一个 id 属性

这有利于重用,我们可以传入任何符合该契约的类型的参数

但是我们需要阅读代码中的每一行来提炼出此契约,每当我记不清时,我就需要读一遍。而且当我们去修改此函数的时候,我们又难以保证原先符合契约的参数能够符合新的契约。尤其是代码量巨大的情况下,我们要一层一层去寻找此函数的调用方,确保它能够继续 work。更何况是有些小角落里传入了根本不符合契约的参数,它可能在未来的某个日子抛出一个 AttributeError。这种改动成本实在太高,不利于长时间的维护。所以我们需要一种更加显式的方式

from typing import Sequence
from .models import Item

def process(self, items: Sequence[Item]) -> None:
    for item in items:
        self.append(item.value.id)

其实这不是一件新事物,它只是将以往在 docstring 中做的抽离出作为注解的新形式存在。docstring 的缺点是难以保证有人修改了接口但是没有修改 docstring 中的描述,缺乏约束力。其实这种约束力来源于 mypy 的检查

How to even type

本小节主要讲解了 mypy 的使用 6:42,大部分都是很基础的东西。这里放几个我比较感兴趣的点

1)

在 Python 中允许容器类型是异构的(heterogeneous),但 mypy 希望是同构的(homogeneous)

l = [
    (2, 3),
    (4, 5),
]

l.append('hi')

# mypy will report
# t.py:6: error: Argument 1 to "append" of "list" has incompatible type "str"; expected "Tuple[int, int]"

可以通过显示的注解绕开

l: List[Union[str, Tuple[int, int]]] = [
    (2, 3),
    (4, 5),
]

但个人认为应当去思考为什么需要这么做。今天能 append 一个 str,说不定后天又要 append 一个 set,最后这个说不定变成了 List[Any]

2)
from typing import Optional

class Foo:
    def __init__(self, id):
        self.id = id

def get_foo(foo_id: Optional[int]) -> Optional[Foo]:
    if foo_id is None:
        return None
    return Foo(foo_id)

my_foo = get_foo(3)
my_foo.id

mypy 0.550 中不会给出任何错误,但在 mypy-0.600

t.py:15: error: Item "None" of "Optional[Foo]" has no attribute "id"

这是因为 Optional[Foo] 可以是 None 也可以是 Foomypy 在这里无法断定传入一个 int 后到底会返回何种类型

但我们知道,如果传入 int 则会返回 Foo;传入 None 则会返回 None。所以我们需要给 mypy 一点点提示

from typing import Optional, overload

@overload
def get_foo(foo_id: None) -> None:
    """overload"""

@overload
def get_foo(foo_id: int) -> Foo:
    """overload"""

def get_foo(foo_id: Optional[int]) -> Optional[Foo]:
    if foo_id is None:
        return None
    return Foo(foo_id)

my_foo = get_foo(3)
my_foo.id

遗憾的是目前 flake8 会报错 F811 redefinition of unused 'get_foo' from line 9
目前可以写在 pyi 中,比如 typeshed

3)

创建 Protocol 以解决 duck typing

from typing_extensions import Protocol

class Renderable(Protocol):
    def render(self) -> str:
        pass

def render(obj: Renderable) -> str:
    return obj.render()

class Foo:
    def render(self) -> str:
        return "Foo!"
render(Foo())
4)

但是我们有时候也需要充分发挥 Python 动态类型的优势,不再想被静态类型束缚。Python 提供了一些 escape hatch

  • Any
  • cast
  • # type: ignore
  • Stub 文件(.pyi) 可以帮助 cython 和 C 扩展

GRADUAL typing

23:46

  • annotated functions only
  • network effect
  • start with most-used
  • use CI to defend progress

关于 PyRe 的工作原理可以参考 Facebook 工程师 Pieter Hooimeijer 所带来的演讲 Types, Deeper Static Analysis, and you