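Understanding WSGI in Python

Background for reading the Neutron source code: WSGI

WSGI stands for Web Server Gateway Interface. More precisely, WSGI is a specification that defines how a web server interacts with a Python application, so that web applications written in Python can be plugged into a web server. WSGI was first defined in PEP 333; the current version is defined in PEP 3333.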

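Why do we need the WSGI specification?

One web deployment approach is by far the most widely used today:

First, deploy a web server dedicated to everything at the HTTP protocol level, such as serving several different web sites from one physical machine (multiple domains on one IP, multiple ports on one IP, and so on).

Then, deploy an application written in some language (Java, PHP, Python, Ruby, etc.). The application receives client requests from the web server, processes them, and returns a response to the web server, which finally sends it back to the client.

For this approach to work, the web server and the application must agree on how to interact. Many specifications have emerged to define that interaction: FastCGI, which improves on the performance of CGI; the Servlet specification for Java; and WSGI for Python. All of them aim to define a common standard and make programs more portable. PEP 333, the original WSGI specification, opens by explaining why WSGI is needed.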

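WSGI exists for two purposes:

To let the web server know how to call a Python application and how to pass the user's request to it.

To let the Python application know what exactly the user requested and how to return the result to the web server.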

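What are the roles in WSGI?

WSGI defines two roles. The web server side is called the server or gateway, and the application side is called the application or framework (because the application side of the specification is usually implemented by a framework). The rest of this article uses the terms server and application.

The server receives the user's request first and then calls the application as the specification requires, as shown in the figure below:

The result of the call is wrapped in an HTTP response and sent to the client.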
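The exchange between the two roles can be sketched without a real network server. This is a minimal illustration assuming Python 3: `call_application` and `demo_app` are hypothetical names introduced here, and `wsgiref.util.setup_testing_defaults` is used only to fill in a plausible `environ`.

```python
from wsgiref.util import setup_testing_defaults


def demo_app(environ, start_response):
    # Application role: report which path was requested.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("You asked for " + environ["PATH_INFO"]).encode("utf-8")]


def call_application(app, path="/"):
    """Play the server role once (hypothetical helper, not part of wsgiref)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}
    written = []

    def start_response(status, headers, exc_info=None):
        # A real server would hold these back until the first body chunk.
        captured["status"] = status
        captured["headers"] = headers
        return written.append  # the legacy write() callable

    result = app(environ, start_response)
    try:
        body = b"".join(written) + b"".join(result)
    finally:
        # PEP 3333 asks the server to call close() if the iterable has one.
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body


status, headers, body = call_application(demo_app, "/tags")
print(status, body)
```

A real server does the same three things: it builds `environ` from the HTTP request, passes a `start_response` callable, and turns the returned iterable into the response body.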

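What is WSGI middleware?

WSGI middleware is also part of the WSGI specification. The previous section described the two WSGI roles, server and application. Middleware is an application (usually a Python application) that runs between the server and the application. It plays both roles at once: to the server it is an application, and to the application it is a server. Middleware does not change the specification for either side; it simply implements both roles.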

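1. After the server receives the client's HTTP request, it builds environ_s and defines start_response_s.

2. The server calls the middleware's application object, passing environ_s and start_response_s.

3. The middleware runs its own logic based on environ_s, builds environ_m, and defines start_response_m.

4. The middleware decides to call the application's application object, passing environ_m and start_response_m. When the application finishes, it calls start_response_m and returns its result to the middleware, which stores it in result_m.

5. The middleware processes result_m to produce result_s, calls start_response_s, and returns result_s to the server. Once the server has result_s, it can send the response to the client.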

The flow above shows several characteristics of middleware:

The server sees the middleware as an application.

The application sees the middleware as a server.

Middleware can be stacked in multiple layers.
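The five steps can be sketched as a small wrapper class. This is an illustration assuming Python 3: `UppercaseMiddleware`, `inner_app`, `run_once`, the `demo.user` key and the `X-Demo-Middleware` header are all made up for this example. One simplification: the sketch forwards to start_response_s as soon as the application calls start_response_m, rather than waiting until result_m has been processed.

```python
from wsgiref.util import setup_testing_defaults


def inner_app(environ, start_response):
    # Application role: sees only what the middleware passes down.
    start_response("200 OK", [("Content-Type", "text/plain")])
    greeting = "hello, " + environ.get("demo.user", "anonymous")
    return [greeting.encode("utf-8")]


class UppercaseMiddleware:
    """Hypothetical middleware: an application to the server, a server to the app."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ_s, start_response_s):
        # Step 3: derive environ_m and define start_response_m.
        environ_m = dict(environ_s)
        environ_m["demo.user"] = "middleware-user"

        def start_response_m(status, headers, exc_info=None):
            # Add one header, then hand over to the server's callable.
            headers = list(headers) + [("X-Demo-Middleware", "uppercase")]
            return start_response_s(status, headers, exc_info)

        # Step 4: call the wrapped application.
        result_m = self.app(environ_m, start_response_m)
        try:
            # Step 5: turn result_m into result_s.
            result_s = [chunk.upper() for chunk in result_m]
        finally:
            if hasattr(result_m, "close"):
                result_m.close()
        return result_s


def run_once(app):
    """Stand-in for the server (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, headers
        return lambda data: None

    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body


stacked = UppercaseMiddleware(inner_app)
print(run_once(stacked))
```

Because the wrapper is itself a valid application, it can be wrapped again, for example `UppercaseMiddleware(UppercaseMiddleware(inner_app))`, which is what "multiple layers" means in practice.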

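What does WSGI code look like?

Before the examples, a word about wsgiref. It is the reference implementation of the WSGI standard that ships in the Python standard library, intended for demonstration. It provides a simple WSGI server and a simple WSGI application (both in the simple_server module) and consists mainly of five modules: simple_server, util, headers, handlers, and validate.

Note: simple_server handles requests in a single thread and is meant only for testing.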

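WSGI places the following requirements on an application:

The application must be a callable object. It can therefore be a function, a class, or an instance of a class that defines __call__.

The application must accept two positional arguments, in order: environ (a dict of environment variables) and start_response (a function that records the response status code and headers in a buffer without sending them to the client yet).

The application must return an iterable object that yields bytestrings.

From simple to complex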
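The three rules can be exercised by calling an application directly with a hand-built `environ`. This is a sketch assuming Python 3; `greet_app` and `call_directly` are hypothetical names, and the `name` query parameter is invented for the example.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def greet_app(environ, start_response):
    # Rule 2: two positional arguments, environ first.
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    body = "{} {} -> hello, {}".format(method, path, name).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # Rule 3: return an iterable of bytes.
    return [body]


def call_directly(query_string):
    """Call the application the way a server would (hypothetical helper)."""
    environ = {"QUERY_STRING": query_string, "PATH_INFO": "/greet"}
    setup_testing_defaults(environ)  # supplies REQUEST_METHOD and the rest
    recorded = []

    def start_response(status, headers, exc_info=None):
        recorded.append((status, headers))

    chunks = greet_app(environ, start_response)
    return recorded[0][0], b"".join(chunks)


print(call_directly("name=neutron"))
```

Nothing here depends on a particular server, which is the portability the specification is after.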

from wsgiref.simple_server import make_server


def simple_app(environ, start_response):
    # The simplest form of application: a plain function.
    status = "200 OK"
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    # The body is an iterable of bytes.
    return [u"This is simple app demo".encode('utf-8')]


http = make_server('', 8080, simple_app)
print("Server on port 8080, listening ...")
http.serve_forever()

from wsgiref.simple_server import make_server
class App():
    def __call__(self, environ, start_response):
        status = "200 OK"
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)
        return [u"This is App".encode('utf-8')]
simple_app = App()
http = make_server('', 8080, simple_app)  # any instance that implements __call__ works too
print("Server on port 8080, listening ...")
http.serve_forever()

from wsgiref.simple_server import make_server


class class_app:
    # Here the class itself is the application: calling it creates an
    # instance, and the server then iterates over that instance.
    def __init__(self, environ, start_response):
        self.env = environ
        self.start = start_response

    def __iter__(self):
        status = "200 OK"
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        # Yield bytes so the example also runs on Python 3.
        yield b"Class : My Own Hello World!"


app = class_app
httpd = make_server('', 8000, app)
print("Serving on port 8000...")
httpd.serve_forever()


from wsgiref.simple_server import make_server

# Maps the first path segment to the name of the application that handles it.
URL_PATTERNS = (
    ('tags', 'tag_app'),
    ('about', 'about_app')
)


class Dispatcher(object):
    def _match(self, path):
        path = path.split("/")[1]
        for url, app in URL_PATTERNS:
            print("path:%s url:%s" % (path, url))
            if path == url:
                return app

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO') or '/'
        app = self._match(path)
        if app:
            # Look up the application function by name and delegate to it.
            app = globals()[app]
            return app(environ, start_response)
        else:
            start_response("404 Not Found", [('Content-type', 'text/plain')])
            return [b"Page does not exist!"]


def tag_app(environ, start_response):
    start_response("200 OK", [('Content-type', 'text/html')])
    return [b"This is tag page!"]


def about_app(environ, start_response):
    start_response("200 OK", [('Content-type', 'text/html')])
    return [b"This is about me page!"]


app = Dispatcher()
httpd = make_server('', 8000, app)
print("Serving on port 8000...")
httpd.serve_forever()

How does wsgiref work under the hood?

The make_server function in wsgiref.simple_server

# wsgiref/simple_server.py
def make_server(
    host, port, app, server_class=WSGIServer, handler_class=WSGIRequestHandler
):
    """Create a new WSGI server listening on `host` and `port` for `app`"""
    server = server_class((host, port), handler_class)
    server.set_app(app)
    return server

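By default, make_server uses WSGIServer as the server class and calls its constructor (but in which layer of the server hierarchy is that constructor actually defined?). Correspondingly, it uses WSGIRequestHandler as the request handler class; both classes are defined in the wsgiref.simple_server module. After instantiating the WSGIServer, it sets the server's application and returns the instance.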
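What make_server hands back can be checked directly. The snippet below is a small sketch assuming Python 3; `noop_app` is a hypothetical application, and port 0 is used so that the operating system picks a free port.

```python
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


def noop_app(environ, start_response):
    start_response("204 No Content", [])
    return []


# Port 0 asks the operating system for any free port.
server = make_server("127.0.0.1", 0, noop_app)
try:
    is_default_server = isinstance(server, WSGIServer)
    handler_class = server.RequestHandlerClass  # stored by the constructor
    stored_app = server.get_app()               # stored by set_app()
    bound_port = server.server_address[1]
finally:
    server.server_close()

print(is_default_server, handler_class.__name__, stored_app.__name__, bound_port)
```

The server is only constructed and bound here; no request is served until `handle_request()` or `serve_forever()` is called.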

server_class=WSGIServer

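As a server, WSGIServer inevitably has to use sockets to accept TCP connections, so it is built on Python's built-in networking modules BaseHTTPServer.py and SocketServer.py (these are the Python 2 names; in Python 3 the modules are http.server and socketserver).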

WSGIServer inherits from HTTPServer, HTTPServer inherits from TCPServer, and TCPServer inherits from BaseServer. BaseServer defines the handle_request function:
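The inheritance chain, and the class that actually defines a given method, can be read from the method resolution order. This is a sketch assuming Python 3; `defining_class` is a hypothetical helper written for this example.

```python
from wsgiref.simple_server import WSGIServer


def defining_class(cls, attribute):
    """Return the name of the first class in cls's MRO that defines attribute."""
    for klass in cls.__mro__:
        if attribute in vars(klass):
            return klass.__name__
    return None


chain = [klass.__name__ for klass in WSGIServer.__mro__]
print(" -> ".join(chain))
print("handle_request is defined in", defining_class(WSGIServer, "handle_request"))
print("__init__ is defined in", defining_class(WSGIServer, "__init__"))
```

The last line is one way to find out which layer the constructor called by make_server comes from.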

#SocketServer.py
def handle_request(self):
    """Handle one request, possibly blocking.
    Respects self.timeout.
    """
    # Support people who used socket.settimeout() to escape
    # handle_request before self.timeout was available.
    timeout = self.socket.gettimeout()
    # self.timeout is an attribute of the BaseServer class; it defaults to None.
    if timeout is None:
        timeout = self.timeout
    elif self.timeout is not None:
        timeout = min(timeout, self.timeout)
    # _eintr_retry handles EINTR: when a signal is caught and its handler
    # returns, the interrupted system call fails with errno set to EINTR.
    fd_sets = _eintr_retry(select.select, [self], [], [], timeout)
    if not fd_sets[0]:
        self.handle_timeout()
        return
    self._handle_request_noblock()

def _eintr_retry(func, *args):
    """restart a system call interrupted by EINTR"""
    while True:
        try:
            return func(*args)
        except (OSError, select.error) as e:
            if e.args[0] != errno.EINTR:
                raise

#SocketServer.py
def _handle_request_noblock(self):
    """Handle one request, without blocking.
    I assume that select.select has returned that the socket is
    readable before this function was called, so there should be
    no risk of blocking in get_request().
    """
    try:
        request, client_address = self.get_request()
    except socket.error:
        return
    if self.verify_request(request, client_address):
        try:
            self.process_request(request, client_address)
        except:
            self.handle_error(request, client_address)
            self.shutdown_request(request)
    else:
        self.shutdown_request(request)

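For background on retrying select after an EINTR error, see PEP 475, "Retry system calls failing with EINTR".

Because timeout is left as None, select.select never times out, so the server blocks in select for as long as no client connects. If the call is interrupted and fails with EINTR, _eintr_retry simply calls select again.

Once select reports that a client connection request has arrived, so that calling accept will not block, _handle_request_noblock is called. It in turn calls verify_request, process_request, and finish_request.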
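The effect of the timeout can be observed by giving a server a finite value and letting nobody connect. This is a sketch assuming Python 3, where socketserver uses the selectors module rather than the select.select call shown in the Python 2 listing; `TimeoutAwareServer` is a hypothetical subclass written for this example.

```python
import time
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer


class TimeoutAwareServer(WSGIServer):
    """Hypothetical subclass that records when handle_timeout() fires."""

    timeout = 0.2  # seconds; BaseServer.timeout defaults to None
    timed_out = False

    def handle_timeout(self):
        self.timed_out = True


def wait_for_one_request():
    server = TimeoutAwareServer(("127.0.0.1", 0), WSGIRequestHandler)
    started = time.monotonic()
    try:
        # Nobody connects, so this returns once the timeout expires.
        server.handle_request()
    finally:
        server.server_close()
    return server.timed_out, time.monotonic() - started


timed_out, elapsed = wait_for_one_request()
print(timed_out, round(elapsed, 2))
```

With `timeout = None`, the same `handle_request()` call would block until a client connected.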

#SocketServer.py 
def get_request(self):
    """Get the request and client address from the socket.
    May be overridden.
    """
    return self.socket.accept()  # defined in TCPServer
def verify_request(self, request, client_address):
    """Verify the request.  May be overridden.
    Return True if we should proceed with this request.
    """
    return True
def process_request(self, request, client_address):
    """Call finish_request.
    Overridden by ForkingMixIn and ThreadingMixIn.
    """
    self.finish_request(request, client_address)
    self.shutdown_request(request)
def finish_request(self, request, client_address):
    """Finish one request by instantiating RequestHandlerClass."""
    self.RequestHandlerClass(request, client_address, self)
def shutdown_request(self, request):
    """Called to shutdown and close an individual request."""
    self.close_request(request)
def close_request(self, request):
    """Called to clean up an individual request."""
    pass

handle_request -> _handle_request_noblock -> get_request -> verify_request -> process_request -> finish_request -> RequestHandlerClass
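The call chain can be confirmed by overriding each hook and serving exactly one request over the loopback interface. This is a sketch assuming Python 3; `TracingServer`, `QuietHandler`, `traced_app` and `trace_one_request` are hypothetical names created for this example.

```python
import http.client
import threading
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # keep the demo output clean


class TracingServer(WSGIServer):
    """Hypothetical subclass that records the order of the hook calls."""

    def __init__(self, *args, **kwargs):
        self.calls = []
        super().__init__(*args, **kwargs)

    def get_request(self):
        self.calls.append("get_request")
        return super().get_request()

    def verify_request(self, request, client_address):
        self.calls.append("verify_request")
        return super().verify_request(request, client_address)

    def process_request(self, request, client_address):
        self.calls.append("process_request")
        super().process_request(request, client_address)

    def finish_request(self, request, client_address):
        self.calls.append("finish_request")
        super().finish_request(request, client_address)

    def shutdown_request(self, request):
        self.calls.append("shutdown_request")
        super().shutdown_request(request)


def traced_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"traced"]


def trace_one_request():
    server = make_server("127.0.0.1", 0, traced_app,
                         server_class=TracingServer, handler_class=QuietHandler)
    # handle_request() serves exactly one request and then returns.
    worker = threading.Thread(target=server.handle_request, daemon=True)
    worker.start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", "/")
        body = conn.getresponse().read()
        conn.close()
        worker.join(timeout=5)
    finally:
        server.server_close()
    return server.calls, body
```

Calling `trace_one_request()` returns the recorded hook names together with the response body, so the order can be compared with the chain above.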

The RequestHandlerClass that simple_server passes in is WSGIRequestHandler.

handler_class=WSGIRequestHandler

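RequestHandlerClass is mainly responsible for handling the request: it builds the necessary environment variables and then hands them to ServerHandler, which is responsible for sending the response.

The call made in handle() of WSGIRequestHandler can be traced up the inheritance chain to BaseHandler in wsgiref/handlers.py: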

def run(self, application):
    """Invoke the application"""
    # Note to self: don't move the close()!  Asynchronous servers shouldn't
    # call close() from finish_response(), so if you close() anywhere but
    # the double-error branch here, you'll break asynchronous servers by
    # prematurely closing.  Async servers must return from 'run()' without
    # closing if there might still be output to iterate over.
    try:
        self.setup_environ()
        self.result = application(self.environ, self.start_response)
        self.finish_response()
    except:
        try:
            self.handle_error()
        except:
            # If we get an error handling an error, just give up already!
            self.close()
            raise   # ...and let the actual server figure it out.

def finish_response(self):
    """Send any iterable data, then close self and the iterable
    Subclasses intended for use in asynchronous servers will
    want to redefine this method, such that it sets up callbacks
    in the event loop to iterate over the data, and to call
    'self.close()' once the response is finished.
    """
    try:
        if not self.result_is_file() or not self.sendfile():
            for data in self.result:
                self.write(data)
            self.finish_content()
    finally:
        self.close()
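The main work of the ServerHandler class is concentrated in its run method. start_response is defined in the same file, and since it is called from the application, it must also follow the definition in PEP 333.

All data is finally written back to the client in finish_response(). finish_response calls write, and on every call write checks whether the headers have already been sent; if not, it sends the headers first and then the data.

Source of the start_response function (this listing is the Python 2 version)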
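The headers-then-data order can be seen by running an application through a handler whose output goes to an in-memory buffer instead of a socket. This is a sketch assuming Python 3, using `wsgiref.handlers.SimpleHandler`, which shares run() and finish_response() with ServerHandler through BaseHandler; `hello_app` and `render_response` are hypothetical names.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def render_response(app):
    environ = {}
    setup_testing_defaults(environ)
    stdout = io.BytesIO()  # stands in for the client socket
    handler = SimpleHandler(io.BytesIO(), stdout, io.StringIO(), environ)
    handler.run(app)  # setup_environ -> application -> finish_response
    return stdout.getvalue()


raw = render_response(hello_app)
head, _, body = raw.partition(b"\r\n\r\n")
print(head.decode("iso-8859-1"))
print(body)
```

The status line and headers come out before the first body chunk even though the application called start_response before returning any data, because nothing is sent until write() runs for the first time.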

def start_response(self, status, headers,exc_info=None):
    """'start_response()' callable as specified by PEP 333"""
    if exc_info:
        try:
            if self.headers_sent:
                # Re-raise original exception if headers sent
                raise exc_info[0], exc_info[1], exc_info[2]
        finally:
            exc_info = None        # avoid dangling circular ref
    elif self.headers is not None:
        raise AssertionError("Headers already set!")
    assert type(status) is StringType,"Status must be a string"
    assert len(status)>=4,"Status must be at least 4 characters"
    assert int(status[:3]),"Status message must begin w/3-digit code"
    assert status[3]==" ", "Status message must have a space after code"
    if __debug__:
        for name,val in headers:
            assert type(name) is StringType,"Header names must be strings"
            assert type(val) is StringType,"Header values must be strings"
            assert not is_hop_by_hop(name),"Hop-by-hop headers not allowed"
    self.status = status
    self.headers = self.headers_class(headers)
    return self.write
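Two of the rules enforced by start_response can be checked against the Python 3 version of the same method. This is a sketch under that assumption; `new_handler`, `second_call_is_rejected` and `returns_write_callable` are hypothetical helpers, and the handler is built on in-memory streams so that nothing is actually sent.

```python
import io
from wsgiref.handlers import SimpleHandler
from wsgiref.util import is_hop_by_hop, setup_testing_defaults


def new_handler():
    environ = {}
    setup_testing_defaults(environ)
    return SimpleHandler(io.BytesIO(), io.BytesIO(), io.StringIO(), environ)


def second_call_is_rejected():
    handler = new_handler()
    handler.start_response("200 OK", [("Content-Type", "text/plain")])
    try:
        # No exc_info and headers already set: this must fail.
        handler.start_response("500 Internal Server Error", [])
    except AssertionError as error:
        return str(error)
    return None


def returns_write_callable():
    handler = new_handler()
    write = handler.start_response("200 OK", [("Content-Type", "text/plain")])
    return write == handler.write


print(second_call_is_rejected())
print(returns_write_callable())
# Header names such as Connection are hop-by-hop and are rejected by the
# assertion in start_response when assertions are enabled.
print(is_hop_by_hop("Connection"), is_hop_by_hop("Content-Type"))
```

A second call is only legal when exc_info is passed, which is how an application replaces its headers after an error, provided they have not yet been sent.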