本周依旧围绕 PyCon 2018 的视频
Clearer Code at Scale: Static Types at Zulip and Dropbox
本演讲的主题是通过 static types 使 code 变得 clearer
演讲者为 Greg Price,曾在 Dropbox 和 Zulip 工作
视频的 4:16 中解释了为什么需要 static types
Work incrementally
添加 static types 是一个渐进的过程,从新写的文件/函数开始做起
Define a clean build
CI 中添加 mypy,使得 static types 的正确性得到保证
- Have a no-arguments script
- Run on a(growing) subset of files
- Write down options mypy.ini
- Pin mypy version, because mypy is under development
- Run in CI
Get started in week
1) check toplevel code
在没有任何 annotations 的情况下直接运行 mypy,它会给出一系列的 error
比如 gunicorn 的
gunicorn/six.py:129: error: Need type annotation for '_moved_attributes'
gunicorn/six.py:222: error: Need type annotation for '__path__'
gunicorn/six.py:450: error: Need type annotation for '__path__'
减少检查的文件数量(从一开始),逐步修复
2) check inside functions
更改配置 mypy.ini
[mypy]
follow_imports = silent
check_untyped_defs = True
Robots do te boring work
运行时追踪类型的工具
- MonkeyType 需要Python 3.6+
- PyAnnotate 目前只能生成 Python 2 下的注释风格
静态推测类型的工具
Static types in Python, oh my(py)!
Zulip 的技术博客,写于 2016 年
During 2016, we have annotated 100% (!) of our backend with static types using mypy, and thanks to mypy, we are on the verge of switching to Python 3. Zulip is now the largest open source Python project that has fully adopted static types, though I doubt we’ll hold that title for long :).
About mypy
When mypy has complete type annotations for a module as well as its dependencies, then it can provide a very strong consistency check, similar to what the compiler provides in statically typed languages. Mypy uses the typeshed repository of type “stubs” (type definitions for a module in the style of a header file) to provide type data for both the Python standard library and dozens of popular libraries like requests, six, and sqlalchemy. Importantly, mypy is designed for gradually adding types; if type data for an import isn’t available, it just treats that import as being consistent with anything.
达到 100% 覆盖率来源于
- 1-week hackathon
- a GSOC project
- a party at the PyCon sprints
Benefits of using mypy
- Static type annotations improve readability.
- We can refactor with confidence.
- We upgraded to Python 3.
- mypy frequently catches bugs.
- Static types highlight bad interfaces.
有一些代码是毫无意义的,它的目的是避免出错,但事实上那种情况根本不会发生 Simplify get_stream_backend.
Pain points
因为 typing 需要类型,所以会产生 Import cycles,因为本文时间久远。现在可以参考 mypy 的文档
Finding bugs in your first few days with mypy
- Phase 1: Annotate the core library.
- Phase 2: Annotate the bulk of the codebase.
- Phase 3: Get to 100%.
- Phase 4: Celebrate and write a blog post!
Recommendations for annotating code
- Make sure to handle str vs. Text correctly. 方便从 2 过渡到 3
- Remember to annotate class variables!
- Avoid using a guess-and-check approach when adding annotations.
- Use precise types where possible, avoid use
Any
- When using type: ignore to work around a potential mypy or typeshed bug, I recommend using the following style to record the original GitHub issue:
bad_code # type: ignore # https://github.com/python/typeshed/issues/372
Type-checked Python in the real world
本视频同样为 Python 类型检查的实践,演讲者为 Instagram 的工程师 Carl Meyer
- Instagram 在生产环境中接了小部分流量到使用 Monkeytype 追踪的代码中
- Instagram 最近使用 Pyre 替换了 mypy。因为它非常快,mypy 跑 5 分半,Pyre 只用 45 秒
P.S. 这个 PPT 做的真心不错,除了圆括号不够圆
Why type
如下代码
def process(self, items):
for item in items:
self.append(item.value.id)
items
是什么?
items
应当是一个 iterable 的,它的每一个项都具有 value
属性,且 value
有一个 id
属性
这有利于重用,我们可以传入任何符合该契约的类型的参数
但是我们需要阅读代码中的每一行来提炼出此契约,每当我记不清时,我就需要读一遍。而且当我们去修改此函数的时候,我们又难以保证原先符合契约的参数能够符合新的契约。尤其是代码量巨大的情况下,我们要一层一层去寻找此函数的调用方,确保它能够继续 work。更何况是有些小角落里传入了根本不符合契约的参数,它可能在未来的某个日子抛出一个 AttributeError
。这种改动成本实在太高,不利于长时间的维护。所以我们需要一种更加显式的方式
from typing import Sequence
from .models import Item
def process(self, items: Sequence[Item]) -> None:
for item in items:
self.append(item.value.id)
其实这不是一件新事物,它只是将以往在 docstring 中做的抽离出作为注解的新形式存在。docstring 的缺点是难以保证有人修改了接口但是没有修改 docstring 中的描述,缺乏约束力。其实这种约束力来源于 mypy
的检查
How to even type
本小节主要讲解了 mypy
的使用 6:42,大部分都是很基础的东西。这里放几个我比较感兴趣的点
1)
在 Python 中允许容器类型是异构的(heterogeneous),但 mypy
希望是同构的(homogeneous)
l = [
(2, 3),
(4, 5),
]
l.append('hi')
# mypy will report
# t.py:6: error: Argument 1 to "append" of "list" has incompatible type "str"; expected "Tuple[int, int]"
可以通过显示的注解绕开
l: List[Union[str, Tuple[int, int]]] = [
(2, 3),
(4, 5),
]
但个人认为应当去思考为什么需要这么做。今天能 append
一个 str
,说不定后天又要 append
一个 set
,最后这个说不定变成了 List[Any]
2)
from typing import Optional
class Foo:
def __init__(self, id):
self.id = id
def get_foo(foo_id: Optional[int]) -> Optional[Foo]:
if foo_id is None:
return None
return Foo(foo_id)
my_foo = get_foo(3)
my_foo.id
mypy 0.550
中不会给出任何错误,但在 mypy-0.600
中
t.py:15: error: Item "None" of "Optional[Foo]" has no attribute "id"
这是因为 Optional[Foo]
可以是 None
也可以是 Foo
。mypy
在这里无法断定传入一个 int
后到底会返回何种类型
但我们知道,如果传入 int
则会返回 Foo
;传入 None
则会返回 None
。所以我们需要给 mypy
一点点提示
from typing import Optional, overload
@overload
def get_foo(foo_id: None) -> None:
"""overload"""
@overload
def get_foo(foo_id: int) -> Foo:
"""overload"""
def get_foo(foo_id: Optional[int]) -> Optional[Foo]:
if foo_id is None:
return None
return Foo(foo_id)
my_foo = get_foo(3)
my_foo.id
遗憾的是目前 flake8
会报错 F811 redefinition of unused 'get_foo' from line 9
目前可以写在 pyi
中,比如 typeshed
3)
创建 Protocol
以解决 duck typing
from typing_extensions import Protocol
class Renderable(Protocol):
def render(self) -> str:
pass
def render(obj: Renderable) -> str:
return obj.render()
class Foo:
def render(self) -> str:
return "Foo!"
render(Foo())
4)
但是我们有时候也需要充分发挥 Python 动态类型的优势,不再想被静态类型束缚。Python 提供了一些 escape hatch
Any
cast
# type: ignore
- Stub 文件(
.pyi
) 可以帮助 cython 和 C 扩展
GRADUAL typing
- annotated functions only
- network effect
- start with most-used
- use CI to defend progress
关于 PyRe 的工作原理可以参考 Facebook 工程师 Pieter Hooimeijer 所带来的演讲 Types, Deeper Static Analysis, and you