Understanding HTTP 1.0 and HTTP 1.1 with Python sockets

Building a simple HTTP server with socket

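# -*- coding: utf-8 -*-
# Written for Python 3.
from socket import socket, AF_INET, SOCK_STREAM

s = socket(AF_INET, SOCK_STREAM)
s.bind(('127.0.0.1', 8888))
s.listen(2)

while True:
    client, addr = s.accept()
    print('got a connection from:', addr)
    # Read at most 1024 bytes of the request and show it.
    print(client.recv(1024).decode('utf-8', 'replace'))
    # HTTP header lines end with CRLF; an empty line separates headers and body.
    msg = (
        'HTTP/1.0 200 OK\r\n'
        'Server: python\r\n'
        'Date: Fri, 01 Aug 2014 06:44:11 GMT\r\n'
        'Content-Type: text/html;charset=UTF-8\r\n'
        '\r\n'
        '<h1>hello</h1>\n'
    )
    print(msg)
    client.sendall(msg.encode('utf-8'))

    # Closing the connection tells an HTTP/1.0 client that the body is complete.
    client.close()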

Visiting http://127.0.0.1:8888 with a browser or curl gives:

$ curl http://127.0.0.1:8888
<h1>hello</h1>

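Handling large amounts of data

Take an HTTP server sending a large amount of data to a client as an example. If the data is large, or is generated dynamically, the server can call sendall() several times, that is: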

client.sendall(msg1)
client.sendall(msg2)
#...
client.close()
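As a rough illustration of the idea above, here is a minimal Python 3 sketch that sends the headers first and then each body part with its own sendall() call. The helper names generate_body and send_response are made up for this example, and socket.socketpair() stands in for a real client connection so the demo needs no port.

```python
import socket

def generate_body(n_parts):
    """Yield the body piece by piece, as a dynamically built page might."""
    for i in range(n_parts):
        yield ('<p>part %d</p>\n' % i).encode('utf-8')

def send_response(conn, parts):
    """Send the headers, then each body part with a separate sendall() call."""
    conn.sendall(b'HTTP/1.0 200 OK\r\n'
                 b'Content-Type: text/html;charset=UTF-8\r\n'
                 b'\r\n')
    for part in parts:
        conn.sendall(part)
    conn.close()  # in HTTP/1.0, closing the connection ends the body

# Demo over a connected pair of local sockets instead of a real client.
server_side, client_side = socket.socketpair()
send_response(server_side, generate_body(3))

received = b''
while True:
    data = client_side.recv(1024)
    if not data:  # empty result: the sender closed the connection
        break
    received += data
client_side.close()
print(received.decode('utf-8'))
```

The receiving side cannot see how many sendall() calls were made; it only sees one stream of bytes that ends when the connection is closed.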

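The TCP header does not say how much data the sender intends to transfer. TCP is a byte stream without message boundaries, so the receiver cannot tell from TCP alone where a message ends. For the segment layout, see "TCP报文格式" (TCP segment format).

So how does the client receive a large amount of data? The article "socket.recv – three ways to turn it into recvall (Python)" gives several approaches. The code: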
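A small Python 3 experiment can make this concrete. It is only a sketch: socket.socketpair() is used as a stand-in for a TCP connection (it is also a stream socket), and the variable names are invented for the demo.

```python
import socket

a, b = socket.socketpair()
a.sendall(b'first message')
a.sendall(b'second message')
a.close()

# Read until EOF. How the bytes are split across recv() calls is not
# guaranteed, so only the joined result is meaningful.
chunks = []
while True:
    data = b.recv(4096)
    if not data:
        break
    chunks.append(data)
b.close()

stream = b''.join(chunks)
print(stream)
```

The two sendall() calls arrive as one run of bytes with nothing marking where the first message stopped, which is why the receiver needs some extra convention (closing the connection, an end marker, or a length) to find the end.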

# Adapted to Python 3 from the recipe cited above: data is handled as bytes,
# and every receive loop also stops when the peer closes the connection.
import socket, struct, sys, time

Port = 2222

# Assume a socket disconnect (recv() returns empty bytes) means that all
# data has been sent.
def recv_basic(the_socket):
    total_data = []
    while True:
        data = the_socket.recv(8192)
        if not data:
            break
        total_data.append(data)
    return b''.join(total_data)

def recv_timeout(the_socket, timeout=2):
    the_socket.setblocking(False)
    total_data = []
    begin = time.time()
    while True:
        # if you got some data, then break after `timeout` seconds of silence
        if total_data and time.time() - begin > timeout:
            break
        # if you got no data at all, wait a little longer
        elif time.time() - begin > timeout * 2:
            break
        try:
            data = the_socket.recv(8192)
            if data:
                total_data.append(data)
                begin = time.time()
            else:
                break  # peer closed the connection
        except BlockingIOError:
            time.sleep(0.1)  # nothing to read yet
    return b''.join(total_data)

End = b'something useable as an end marker'

def recv_end(the_socket):
    total_data = []
    while True:
        data = the_socket.recv(8192)
        if not data:
            break  # peer closed before sending the end marker
        if End in data:
            total_data.append(data[:data.find(End)])
            break
        total_data.append(data)
        if len(total_data) > 1:
            # check if the end marker was split across two recv() calls
            last_pair = total_data[-2] + total_data[-1]
            if End in last_pair:
                total_data[-2] = last_pair[:last_pair.find(End)]
                total_data.pop()
                break
    return b''.join(total_data)

def recv_size(the_socket):
    # the data length is packed into the first 4 bytes
    total_len = 0
    total_data = []
    size = sys.maxsize
    size_data = b''
    bufsize = 8192
    while total_len < size:
        sock_data = the_socket.recv(bufsize)
        if not sock_data:
            break  # peer closed early
        if not total_data:
            # still collecting the 4-byte length prefix
            size_data += sock_data
            if len(size_data) >= 4:
                size = struct.unpack('>i', size_data[:4])[0]
                bufsize = min(size, 524288)
                total_data.append(size_data[4:])
        else:
            total_data.append(sock_data)
        total_len = sum(len(i) for i in total_data)
    return b''.join(total_data)


##############
def start_server(recv_type=''):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('', Port))
    sock.listen(5)
    print('started on', Port)
    while True:
        newsock, address = sock.accept()
        print('connected')
        if recv_type == 'size': result = recv_size(newsock)
        elif recv_type == 'end': result = recv_end(newsock)
        elif recv_type == 'timeout': result = recv_timeout(newsock)
        else: result = newsock.recv(8192)
        print('got', result)
        newsock.close()

# Client side: call these from a second process while the server is running.
def send_size(data):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('localhost', Port))
    sock.sendall(struct.pack('>i', len(data)) + data)
    sock.close()

def send_end(data):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('localhost', Port))
    sock.sendall(data + End)
    sock.close()


if __name__ == '__main__':
    #start_server()
    #start_server(recv_type='size')
    #start_server(recv_type='timeout')
    start_server(recv_type='end')
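The length-prefix approach is the one closest to what HTTP does with Content-Length, so here is a compact Python 3 sketch of it. This is not the recipe's code: recv_exact, send_msg and recv_msg are hypothetical helper names, the prefix is assumed to be an unsigned 4-byte big-endian integer ('>I'), and socket.socketpair() replaces a real connection.

```python
import socket
import struct

def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError(
                'connection closed after %d of %d bytes' % (len(buf), n))
        buf.extend(chunk)
    return bytes(buf)

def send_msg(sock, payload):
    """Send a 4-byte length prefix followed by the payload."""
    sock.sendall(struct.pack('>I', len(payload)) + payload)

def recv_msg(sock):
    """Read the length prefix, then exactly that many payload bytes."""
    (size,) = struct.unpack('>I', recv_exact(sock, 4))
    return recv_exact(sock, size)

# Two messages on one connection are separated correctly by their prefixes.
a, b = socket.socketpair()
send_msg(a, b'hello')
send_msg(a, b'world!')
first = recv_msg(b)
second = recv_msg(b)
a.close()
b.close()
print(first, second)
```

Because each message announces its own length, the connection can stay open and carry further messages, which is not possible with the "read until the peer disconnects" approach.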

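HTTP 1.0 and HTTP 1.1

HTTP 1.0 is rarely used today. One of its characteristics is that, by default, a new TCP connection is established for every resource requested. Suppose the page at http://127.0.0.1:80 also contains two images; for this URL the web browser then opens three TCP connections. With HTTP 1.1, in contrast, a single TCP connection can fetch all three resources (the HTML page and the two images). So how are these two ways of fetching resources implemented?

HTTP 1.0 is simple: the approach above is enough. The server sends the response and closes the connection, and the client reads until the connection is closed.

What about HTTP 1.1? While browsing the Python 3 documentation for the http module, "21.22. http.server — HTTP servers", I found the following explanation of the protocol_version attribute: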
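To show the HTTP 1.0 pattern from the client's point of view, here is a minimal Python 3 sketch. The function names serve_once and http10_get are invented for the example, and the server binds to port 0 so that the operating system picks a free port; it is not meant as a complete HTTP client.

```python
import socket
import threading

def serve_once(listener):
    """Answer a single request HTTP/1.0 style, then close the connection."""
    conn, _ = listener.accept()
    conn.recv(1024)  # read (and ignore) the request
    conn.sendall(b'HTTP/1.0 200 OK\r\n'
                 b'Content-Type: text/html\r\n'
                 b'\r\n'
                 b'<h1>hello</h1>\n')
    conn.close()

def http10_get(host, port, path='/'):
    """Send one GET request and read until the server closes the connection."""
    sock = socket.create_connection((host, port))
    request = 'GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (path, host)
    sock.sendall(request.encode('ascii'))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # connection closed: the response is complete
            break
        chunks.append(data)
    sock.close()
    head, _, body = b''.join(chunks).partition(b'\r\n\r\n')
    return head, body

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
t = threading.Thread(target=serve_once, args=(listener,))
t.start()

head, body = http10_get('127.0.0.1', port)
t.join()
listener.close()
print(body)
```

Fetching a page with two images this way would mean calling http10_get three times, each call opening and closing its own TCP connection.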

protocol_version
This specifies the HTTP protocol version used in responses. If set to 'HTTP/1.1', the server will permit HTTP persistent connections; however, your server must then include an accurate Content-Length header (using send_header()) in all of its responses to clients. For backwards compatibility, the setting defaults to 'HTTP/1.0'.

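In other words, the server states the Content-Length value in the response headers, and the client (the web browser) uses it to tell where one response ends and the next begins on the same connection.

Specifying Content-Length suits static content. If the response is generated dynamically and its length is not known in advance, Transfer-Encoding: chunked can be used instead of Content-Length.

For more on chunked encoding, see: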
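The quoted documentation can be tried out with a short Python 3 sketch. The Handler class, the seen_clients set and the three request paths are assumptions made for this demo; the server binds to port 0 so any free port is used, and http.client plays the role of the browser.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

seen_clients = set()  # (ip, port) of every client connection the server saw

class Handler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'  # permit persistent connections

    def do_GET(self):
        seen_clients.add(self.client_address)
        body = ('<h1>%s</h1>' % self.path).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/html;charset=UTF-8')
        # Required with HTTP/1.1: tells the client where this response ends.
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

server = HTTPServer(('127.0.0.1', 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', port)
bodies = []
for path in ('/index.html', '/a.png', '/b.png'):
    conn.request('GET', path)
    bodies.append(conn.getresponse().read())
conn.close()
server.shutdown()
server.server_close()

print(bodies)
print('connections used:', len(seen_clients))
```

All three requests come from the same client address, so they travelled over one TCP connection. If the Content-Length header is left out, the client has no way of knowing where the first body ends, and the following request on that connection stalls.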
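The wire format of chunked encoding is simple enough to sketch in a few lines of Python 3: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a chunk of size 0 ends the body. The function names encode_chunked and decode_chunked are invented here, and the decoder deliberately ignores chunk extensions and trailers.

```python
def encode_chunked(parts):
    """Yield the wire bytes for each body part, then the terminating chunk."""
    for part in parts:
        if part:  # a zero-length chunk would end the body early, so skip it
            yield b'%x\r\n' % len(part) + part + b'\r\n'
    yield b'0\r\n\r\n'

def decode_chunked(data):
    """Decode a complete chunked body (no chunk extensions or trailers)."""
    body = b''
    pos = 0
    while True:
        line_end = data.index(b'\r\n', pos)
        size = int(data[pos:line_end], 16)  # chunk size is hexadecimal
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data

parts = [b'<h1>hello</h1>', b'', b'<p>generated later</p>']
wire = b''.join(encode_chunked(parts))
print(wire)
print(decode_chunked(wire))
```

A server would send the header Transfer-Encoding: chunked instead of Content-Length and then write each encoded chunk with sendall() as soon as that part of the page is ready.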

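Chunked transfer encoding
分块传输编码 (Chunked transfer encoding, in Chinese)
transfer-encoding:chunked的含义 (The meaning of transfer-encoding: chunked)

In addition, keep-alive is an important part of HTTP 1.1; see "HTTP协议头部与Keep-Alive模式详解" (HTTP headers and Keep-Alive mode explained).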
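To tie this back to raw sockets, the following Python 3 sketch keeps one connection open for several requests and uses Content-Length to separate the responses. Everything in it is illustrative: read_headers, serve_connection and fetch_all are made-up names, the server accepts exactly one connection, and only the small subset of HTTP needed for the demo is handled.

```python
import socket
import threading

def read_headers(sock, buf):
    """Read until the blank line; return (header_bytes, leftover_bytes)."""
    while b'\r\n\r\n' not in buf:
        data = sock.recv(4096)
        if not data:
            return None, b''  # peer closed the connection
        buf += data
    head, _, rest = buf.partition(b'\r\n\r\n')
    return head, rest

def serve_connection(listener):
    """Answer every request on ONE connection until the client closes it."""
    conn, _ = listener.accept()
    buf = b''
    while True:
        head, buf = read_headers(conn, buf)
        if head is None:
            break
        path = head.split(b' ')[1]
        body = b'you asked for ' + path
        conn.sendall(b'HTTP/1.1 200 OK\r\n'
                     b'Content-Length: %d\r\n'
                     b'\r\n' % len(body) + body)
    conn.close()

def fetch_all(host, port, paths):
    """Fetch several paths over a single TCP connection."""
    sock = socket.create_connection((host, port))
    buf = b''
    bodies = []
    for path in paths:
        sock.sendall(b'GET ' + path + b' HTTP/1.1\r\nHost: '
                     + host.encode('ascii') + b'\r\n\r\n')
        head, buf = read_headers(sock, buf)
        if head is None:
            raise ConnectionError('server closed the connection')
        length = 0
        for line in head.split(b'\r\n')[1:]:
            name, _, value = line.partition(b':')
            if name.strip().lower() == b'content-length':
                length = int(value.strip().decode('ascii'))
        while len(buf) < length:  # read the rest of this body, no more
            data = sock.recv(4096)
            if not data:
                raise ConnectionError('body truncated')
            buf += data
        bodies.append(buf[:length])
        buf = buf[length:]
    sock.close()
    return bodies

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
t = threading.Thread(target=serve_connection, args=(listener,))
t.start()

bodies = fetch_all('127.0.0.1', port, [b'/', b'/a.png', b'/b.png'])
t.join()
listener.close()
print(bodies)
```

Since the server only ever calls accept() once, getting three bodies back shows that all three requests shared the same TCP connection.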
