Two ways to forge HTTP headers with urllib2 in Python

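    This article shows two ways to forge HTTP headers with urllib2 in Python, that is, to send your own HTTP header information with a request. Use it as a reference if you need it.
    When scraping web pages, you often have to forge the request headers before the server will answer the scraping script properly.
    Below, we use the header support in urllib2 to forge the headers and fetch the page.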
    Method 1:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # filename: urllib2-header.py
    # Python 2 only: urllib2 does not exist in Python 3.
    import urllib2
    import sys
    # Fetch the page content, sending custom headers (method 1)
    url = 'http://www.xxx.net'
    # Pass all headers at once as a dict
    send_headers = {
     'Host':'www.xxx.net',
     'User-Agent':'Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0',
     'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
     'Connection':'keep-alive'
    }
    req = urllib2.Request(url,headers=send_headers)
    r = urllib2.urlopen(req)
    html = r.read() # page body
    receive_header = r.info() # response headers
    # sys.getfilesystemencoding()
    html = html.decode('utf-8','replace').encode(sys.getfilesystemencoding()) # transcode to avoid garbled output
    print receive_header
    # print '####################################'
    print html
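
For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same idea can be sketched as below. This is a minimal sketch rather than a drop-in replacement: the helper names build_request and fetch are made up for illustration, www.xxx.net is the article's placeholder host, and nothing is sent until fetch is called.

```python
import urllib.request

# Placeholder host taken from the article; replace with a real site.
URL = 'http://www.xxx.net'

# Headers to forge, passed in one go as a dict (same idea as method 1).
SEND_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Connection': 'keep-alive',
}


def build_request(url, headers):
    """Hypothetical helper: build a Request with the given headers (no network)."""
    return urllib.request.Request(url, headers=headers)


def fetch(req, timeout=10):
    """Hypothetical helper: send the request, return (response headers, raw bytes)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers, resp.read()


req = build_request(URL, SEND_HEADERS)
# Request stores each header name via str.capitalize(),
# so 'User-Agent' is kept internally as 'User-agent'.
print(req.header_items())
```

Calling fetch(req) performs the actual request. One caveat: the standard handler sets Connection: close itself when it sends the request, so the keep-alive value above is overridden on the wire.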
    Method 2:
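    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # filename: urllib2-header.py
    # Python 2 only: urllib2 does not exist in Python 3.
    import urllib2
    import sys
    # Fetch the page content, sending custom headers (method 2)
    url = 'http://www.xxx.net'
    req = urllib2.Request(url)
    # Add headers one at a time to an existing Request
    req.add_header('Referer','http://www.xxx.net/')
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0')
    r = urllib2.urlopen(req)
    html = r.read() # page body
    receive_header = r.info() # response headers
    # 'replace' keeps a page that is not valid UTF-8 from raising UnicodeDecodeError
    html = html.decode('utf-8','replace').encode(sys.getfilesystemencoding())
    print receive_header
    print '#####################################'
    print html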
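
Both scripts assume the page is UTF-8. A rough Python 3 sketch of the add_header approach is shown below, with a decoding step that reads the charset from the response's Content-Type header instead. The helper name decode_body and the fake header object are made up for illustration, and www.xxx.net is still the article's placeholder, so the example runs without touching the network.

```python
import email.message
import urllib.request

# Build the request and add forged headers one by one (same idea as method 2).
req = urllib.request.Request('http://www.xxx.net')  # placeholder host
req.add_header('Referer', 'http://www.xxx.net/')
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 6.2; rv:16.0) Gecko/20100101 Firefox/16.0')


def decode_body(raw, headers, default='utf-8'):
    """Hypothetical helper: decode bytes using the charset named in the headers.

    `headers` is any email.message.Message; the `headers` attribute of a
    urlopen() response is a subclass of it.
    """
    charset = headers.get_content_charset() or default
    try:
        return raw.decode(charset, 'replace')
    except LookupError:
        # The server named a codec Python does not know; fall back.
        return raw.decode(default, 'replace')


# Offline demonstration with a made-up response header.
fake_headers = email.message.Message()
fake_headers['Content-Type'] = 'text/html; charset=gbk'
text = decode_body('中文'.encode('gbk'), fake_headers)
print(text)
```

With a real response this would be used as resp = urllib.request.urlopen(req), followed by decode_body(resp.read(), resp.headers).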
