Luckylau's Blog

A Brief Look at Python's yield

What is yield?

1. A simple Fibonacci sequence, version 1:

def fab(max):
    n, a, b = 0, 0, 1
    while n < max:
        print(b)
        a, b = b, a + b
        n = n + 1

if __name__ == '__main__':
    fab(5)

Drawback: printing directly inside fab makes the function hard to reuse. fab returns None, so other code cannot obtain the sequence it generates.

2. A simple Fibonacci sequence, version 2:

def fab(max):
    n, a, b = 0, 0, 1
    L = []
    while n < max:
        L.append(b)
        a, b = b, a + b
        n = n + 1
    return L

if __name__ == '__main__':
    for n in fab(5):
        print(n)

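Drawback: the memory this function uses grows with the max argument. To keep memory use under control, it is better not to build a list.

3. A simple Fibonacci sequence, version 3:

Designed along the lines of range versus xrange (in Python 2, range builds a full list while xrange produces values one at a time):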
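To make the memory drawback concrete, here is a minimal sketch that measures the list returned by version 2 for a few sizes. The helper name list_footprint is hypothetical, and sys.getsizeof counts only the list's own slot array, not the integers stored in it, so the real footprint is larger still.

```python
import sys

def fab(max):
    # Version 2 from above: collects every value in a list.
    n, a, b = 0, 0, 1
    L = []
    while n < max:
        L.append(b)
        a, b = b, a + b
        n = n + 1
    return L

def list_footprint(count):
    # Hypothetical helper: size in bytes of the list object itself.
    return sys.getsizeof(fab(count))

for count in (10, 100, 1000):
    print(count, list_footprint(count))
```

The printed sizes depend on the interpreter build, but they grow with count.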

class Fab(object):
    def __init__(self, max):
        self.max = max
        self.n, self.a, self.b = 0, 0, 1

    def __iter__(self):
        return self

    def next(self):
        if self.n < self.max:
            r = self.b
            self.a, self.b = self.b, self.a + self.b
            self.n = self.n + 1
            return r
        raise StopIteration

    __next__ = next  # Python 3 looks up __next__ instead of next

if __name__ == '__main__':
    for n in Fab(5):
        print(n)

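Drawback: the code is nowhere near as concise as the version 1 fab function.

If we want to keep the conciseness of version 1 and still get an iterable, yield is the tool for the job.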

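def fab(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1

if __name__ == '__main__':
    for n in fab(5):
        print(n)

What yield does is turn a function into a generator. A function containing yield is no longer an ordinary function; the Python interpreter treats it as a generator. Calling fab(5) does not execute the body of fab; it returns an iterable object. Each pass of the for loop runs the code inside fab until it reaches yield b, at which point fab hands back one value. On the next iteration, execution resumes at the statement after yield b with the local variables exactly as they were when execution was suspended, and continues until it hits yield again.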

def fab(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1

if __name__ == '__main__':
    f = fab(5)
    # next(f) works on Python 2.6+ and Python 3; f.next() exists only on Python 2
    print(next(f))
    print(next(f))
    print(next(f))
    print(next(f))
    print(next(f))
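The five manual calls above are roughly what a for loop does on your behalf: it keeps calling next() and stops quietly when the generator raises StopIteration. A minimal sketch of that loop, where drain is a hypothetical helper name and not part of any library:

```python
def fab(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1

def drain(gen):
    # Hypothetical helper: collect values the way a for loop would,
    # calling next() until the generator signals StopIteration.
    values = []
    while True:
        try:
            values.append(next(gen))
        except StopIteration:
            break
    return values

print(drain(fab(5)))  # [1, 1, 2, 3, 5]
```

A sixth bare next(f) in the listing above would raise StopIteration, because the while loop inside fab has ended.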

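A function containing yield is a generator function, and it behaves differently from an ordinary function. Creating a generator looks like a function call, but no code in the function body runs until next() is called on it (a for loop calls next() automatically). Execution still follows the function's normal flow, but it is suspended at every yield statement and hands back one value; the next call resumes from the statement after that yield. It looks as if an ordinary function were interrupted several times by yield, each interruption returning the current value. This is handy, for example, when reading a file: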

def read_file(fpath):
    BLOCK_SIZE = 1024
    with open(fpath, 'rb') as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if block:
                yield block
            else:
                # an empty read means end of file; returning ends the generator
                return
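As a usage sketch, the snippet below writes a 2500-byte temporary file and reads it back in blocks. The block_size parameter is an assumption added here so the block size is visible at the call site; the original hard-codes 1024 inside the function.

```python
import os
import tempfile

def read_file(fpath, block_size=1024):
    # block_size is an added, hypothetical parameter (default matches the original).
    with open(fpath, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:
                return
            yield block

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 2500)
    path = tmp.name

try:
    sizes = [len(block) for block in read_file(path)]
finally:
    os.remove(path)

print(sizes)  # [1024, 1024, 452]
```

Only one block is held in memory at a time, no matter how large the file is.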

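Iterables, Generators, Yield?

When you create a list, you can read its items one by one; this is called iteration. Anything you can use in a for...in... statement is an iterable: lists, strings, files and so on. These iterables are convenient because you can read them as many times as you like, but all their values are stored in memory, which costs too much when there are a lot of values.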

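if __name__ == '__main__':
    mylist = [x * x for x in range(3)]
    for i in mylist:
        print(i)

Generators are iterators too, but you can iterate over them only once. The reason is simple: they do not keep all their values in memory; each value is produced only when it is asked for.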

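if __name__ == '__main__':
    mygenerator = (x * x for x in range(3))
    for i in mygenerator:
        print(i)

The only syntactic difference from the list comprehension is that you write () instead of []. However, you cannot run for i in mygenerator a second time: the generator computes 0, then discards it and computes 1, and is finished once it has computed 4.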
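A short sketch of the one-pass behaviour, comparing a list comprehension with the equivalent generator expression (the variable names are illustrative only):

```python
squares_list = [x * x for x in range(3)]
squares_gen = (x * x for x in range(3))

# A list can be walked as many times as you like.
first_list = list(squares_list)
second_list = list(squares_list)

# A generator is used up by the first pass; the second pass finds nothing.
first_gen = list(squares_gen)
second_gen = list(squares_gen)

print(first_list, second_list)  # [0, 1, 4] [0, 1, 4]
print(first_gen, second_gen)    # [0, 1, 4] []
```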

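def createGenerator():
    mylist = range(3)
    for i in mylist:
        yield i * i

if __name__ == '__main__':
    mygenerator = createGenerator()
    print(mygenerator)
    for i in mygenerator:
        print(i)
# output
<generator object createGenerator at 0x7f5930639730>
0
1
4

This example is not very useful in itself, but it becomes very handy when your function has to return a huge set of values that you only need to read once. To understand yield, you must first understand that when you call the function, the code in its body does not run. The function only returns a generator object, and that is the subtle part :-) Your code then runs each time the for statement asks the generator for a value. Once the function runs on without hitting another yield statement, the generator is considered empty. That may be because the loop has ended or because an if/else condition is no longer satisfied.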
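The claim that nothing runs at call time can be checked with a small trace. This is a sketch with a hypothetical module-level log list; it is not taken from the original example.

```python
log = []  # hypothetical trace of what the generator body has done so far

def create_generator():
    log.append("body started")
    for i in range(3):
        yield i * i
    log.append("body finished")

gen = create_generator()
log_after_call = list(log)     # still empty: the body has not started

first = next(gen)
log_after_first = list(log)    # the body ran up to the first yield

rest = list(gen)               # consume the remaining values
log_after_all = list(log)

print(log_after_call, first, log_after_first, rest, log_after_all)
```

The "body finished" entry appears only when iteration runs past the last yield, which is also the moment the generator becomes empty.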

class Bank():  # let's create a bank that builds ATMs
    crisis = False

    def create_atm(self):
        while not self.crisis:
            yield "$100"

def withdraw(atm):
    # next() raises StopIteration once the generator is exhausted;
    # report that instead of letting the script crash
    try:
        return next(atm)
    except StopIteration:
        return "StopIteration"

if __name__ == '__main__':
    hsbc = Bank()  # when everything is fine, the ATM gives you as much as you want
    corner_street_atm = hsbc.create_atm()
    print(withdraw(corner_street_atm))
    print(withdraw(corner_street_atm))
    print([withdraw(corner_street_atm) for cash in range(5)])
    hsbc.crisis = True  # the crisis hits, no more money!
    print(withdraw(corner_street_atm))
    wall_street_atm = hsbc.create_atm()  # the same is true for newly created ATMs
    print(withdraw(wall_street_atm))
    hsbc.crisis = False  # the trouble is, even after the crisis the old ATM stays empty
    print(withdraw(corner_street_atm))
    brand_new_atm = hsbc.create_atm()  # the only way out is to build a new ATM
    for cash in brand_new_atm:  # never ends by itself; interrupt with Ctrl+C
        print(cash)
# output
$100
$100
['$100', '$100', '$100', '$100', '$100']
StopIteration
StopIteration
StopIteration
$100
$100
.
.
.

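How is yield implemented in the source?

Before explaining generators, we need to look at how the Python virtual machine handles calls. Every call executes in a frame object; the listing below is the frame struct from the CPython 2.x source.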

typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
       Frame evaluation usually NULLs it, but a frame that yields sets it
       to the current stack top. */
    PyObject **f_stacktop;
    PyObject *f_trace;          /* Trace function */

    /* If an exception is raised in this frame, the next three are used to
     * record the exception info (if any) originally in the thread state.  See
     * comments before set_exc_info() -- it's not obvious.
     * Invariant:  if _type is NULL, then so are _value and _traceback.
     * Desired invariant:  all three are NULL, or all three are non-NULL.  That
     * one isn't currently true, but "should be".
     */
    PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;

    PyThreadState *f_tstate;
    int f_lasti;                /* Last instruction if called */
    /* Call PyFrame_GetLineNumber() instead of reading this field
       directly.  As of 2.3 f_lineno is only valid when tracing is
       active (i.e. when f_trace is set).  At other times we use
       PyCode_Addr2Line to calculate the line from the current
       bytecode index. */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;

The generator implementation lives in Objects/genobject.c. PyGen_New wraps a frame in a new generator object:

PyObject *
PyGen_New(PyFrameObject *f)
{
    PyGenObject *gen = PyObject_GC_New(PyGenObject, &PyGen_Type);  /* create the generator object */
    if (gen == NULL) {
        Py_DECREF(f);
        return NULL;
    }
    gen->gi_frame = f;          /* attach the frame that holds the code to run */
    Py_INCREF(f->f_code);       /* reference count + 1 */
    gen->gi_code = (PyObject *)(f->f_code);
    gen->gi_running = 0;        /* 0 means not executing: the generator's initial state */
    gen->gi_weakreflist = NULL;
    _PyObject_GC_TRACK(gen);    /* register with the garbage collector */
    return (PyObject *)gen;
}
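The state that PyGen_New sets up can be observed from Python code. The sketch below assumes Python 3 and its standard inspect module; the struct layout in current CPython differs from the 2.x source quoted here, so treat it as an illustration of the idea (a generator owns a suspended frame that keeps its locals) and not as a view of these exact fields.

```python
import inspect

def fab(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1

gen = fab(2)
states = [inspect.getgeneratorstate(gen)]        # created, body not started

next(gen)
states.append(inspect.getgeneratorstate(gen))    # suspended at the first yield
# The suspended frame still holds the local variables.
locals_while_suspended = dict(inspect.getgeneratorlocals(gen))

list(gen)                                        # run the generator to the end
states.append(inspect.getgeneratorstate(gen))    # closed
frame_after_exhaustion = gen.gi_frame            # the frame has been released

print(states)
print(locals_while_suspended)
print(frame_after_exhaustion)
```

The released frame at the end corresponds to the gen->gi_frame = NULL step that the C code performs once a generator cannot be resumed.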

send and next

static PyObject *
gen_iternext(PyGenObject *gen)
{
    return gen_send_ex(gen, NULL, 0);
}

static PyObject *
gen_send(PyGenObject *gen, PyObject *arg)
{
    return gen_send_ex(gen, arg, 0);
}

As the code above shows, send and next both call the same function, gen_send_ex; the only difference is whether an argument is passed.
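At the Python level this means that next(gen) behaves like gen.send(None) on a generator that has already started. A minimal sketch with a hypothetical echo generator that yields back whatever it receives:

```python
def echo():
    # Hypothetical generator: the value of the yield expression is
    # whatever the caller passed to send(), or None for a plain next().
    received = None
    while True:
        received = yield received

gen = echo()
first = next(gen)                 # run to the first yield
via_next = next(gen)              # resumes with None
via_send_none = gen.send(None)    # same effect as next(gen)
via_send = gen.send("hello")      # resumes with "hello"

print(first, via_next, via_send_none, via_send)  # None None None hello
```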

static PyObject *
gen_send_ex(PyGenObject *gen, PyObject *arg, int exc)
{
    PyThreadState *tstate = PyThreadState_GET();
    PyFrameObject *f = gen->gi_frame;
    PyObject *result;

    if (gen->gi_running) {  /* is the generator already executing? */
        PyErr_SetString(PyExc_ValueError,
                        "generator already executing");
        return NULL;
    }
    if (f==NULL || f->f_stacktop == NULL) {  /* no frame, or no saved stack top: the generator is finished */
        /* Only set exception if called from send() */
        if (arg && !exc)
            PyErr_SetNone(PyExc_StopIteration);
        return NULL;
    }

    if (f->f_lasti == -1) {  /* f_lasti == -1 means this is the first run */
        if (arg && arg != Py_None) {  /* a non-None argument is not allowed on the first run */
            PyErr_SetString(PyExc_TypeError,
                            "can't send non-None value to a "
                            "just-started generator");
            return NULL;
        }
    } else {
        /* Push arg onto the frame's value stack */
        result = arg ? arg : Py_None;
        Py_INCREF(result);              /* reference count + 1 for the argument */
        *(f->f_stacktop++) = result;    /* push the argument onto the stack */
    }

    /* Generators always return to their most recent caller, not
     * necessarily their creator. */
    f->f_tstate = tstate;
    Py_XINCREF(tstate->frame);
    assert(f->f_back == NULL);
    f->f_back = tstate->frame;

    gen->gi_running = 1;                    /* mark the generator as executing */
    result = PyEval_EvalFrameEx(f, exc);    /* execute the bytecode */
    gen->gi_running = 0;                    /* back to the not-executing state */

    /* Don't keep the reference to f_back any longer than necessary.  It
     * may keep a chain of frames alive or it could create a reference
     * cycle. */
    assert(f->f_back == tstate->frame);
    Py_CLEAR(f->f_back);
    /* Clear the borrowed reference to the thread state */
    f->f_tstate = NULL;

    /* If the generator just returned (as opposed to yielding), signal
     * that the generator is exhausted. */
    if (result == Py_None && f->f_stacktop == NULL) {
        Py_DECREF(result);
        result = NULL;
        /* Set exception if not called by gen_iternext() */
        if (arg)
            PyErr_SetNone(PyExc_StopIteration);
    }

    if (!result || f->f_stacktop == NULL) {
        /* generator can't be rerun, so release the frame */
        Py_DECREF(f);
        gen->gi_frame = NULL;
    }

    return result;
}
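The three early checks in gen_send_ex each have a visible counterpart in Python code. The sketch below triggers them one by one; the generator names are hypothetical, and the exception types are the ones the C code sets (the exact messages may vary between versions, so only the types are recorded).

```python
def simple():
    yield 1

def reentrant():
    # Resuming a generator from inside its own body trips the gi_running check.
    yield next(me)

observed = []

# 1. Sending a non-None value to a generator that has not started yet.
fresh = simple()
try:
    fresh.send("too early")
except TypeError:
    observed.append("TypeError")

# 2. Resuming a generator that is currently executing.
me = reentrant()
try:
    next(me)
except ValueError:
    observed.append("ValueError")

# 3. Resuming a generator whose body has already finished.
done = simple()
next(done)
try:
    next(done)
except StopIteration:
    observed.append("StopIteration")

print(observed)  # ['TypeError', 'ValueError', 'StopIteration']
```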

References:

http://www.ibm.com/developerworks/cn/opensource/os-cn-python-yield/

http://www.cnblogs.com/coder2012/p/4990834.html
