Python Crawlers: Using Scrapy (Part 2): Middleware

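This is the project's middleware configuration. It has quite a few parameters, but to set up a proxy we only need to care about a few of them.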
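Find the `…re` class. This is our project's downloader middleware, and the proxy configuration goes mainly in its process_request method.
Change the method to: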
```python
def process_request(self, request, spider):
    # Called for each request that goes through the downloader
    # middleware.

    # Must either:
    # - return None: continue processing this request
    # - or return a Response object
    # - or return a Request object
    # - or raise IgnoreRequest: process_exception() methods of
    #   installed downloader middleware will be called

    # Proxy IP courtesy of Kuaidaili (快代理)
    proxy = '112.192.158.65:20823'

    # Handle http and https URLs separately
    if request.url.startswith("http://"):
        request.meta['proxy'] = "http://%s" % proxy
    elif request.url.startswith("https://"):
        request.meta['proxy'] = "https://%s" % proxy

    # Returning None lets the request continue through the remaining middlewares
    return None
```
The modified code is shown in the screenshot.
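The http/https branch in process_request can be lifted into a plain function and checked without starting Scrapy. Below is a minimal sketch: `build_proxy_url` is a hypothetical helper name, not part of Scrapy, and it reproduces the original branching unchanged.

```python
from typing import Optional


def build_proxy_url(request_url: str, proxy: str) -> Optional[str]:
    """Return the value process_request would store in request.meta['proxy'].

    Hypothetical helper mirroring the middleware above; None means no proxy is set.
    """
    if request_url.startswith("http://"):
        return "http://%s" % proxy
    elif request_url.startswith("https://"):
        return "https://%s" % proxy
    # Other schemes (e.g. ftp://) get no proxy, just like the original code
    return None


PROXY = "112.192.158.65:20823"
print(build_proxy_url("http://example.com/", PROXY))  # prints http://112.192.158.65:20823
print(build_proxy_url("https://www.baidu.com/s?wd=ip", PROXY))  # prints https://112.192.158.65:20823
```

Whether an HTTPS target should get an `https://` proxy URL depends on the proxy and on your Scrapy version; many HTTP proxies expect `http://` in both cases, so it is worth checking your provider's documentation.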
The .py file also needs two changes:
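1. The UA setting. Remember that a crawler should mimic a real user's request. Find the UA setting and change it to your own browser's User-Agent (UA). If you don't know how to find your UA, one of my earlier tutorials explains it.
After the change: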
2. Find the `…ES` setting (search for it with Ctrl + F), uncomment it to enable it, and save.
After uncommenting:
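Assuming step 1 is Scrapy's `USER_AGENT` setting and the truncated `…ES` in step 2 is `DOWNLOADER_MIDDLEWARES` (both ship commented out in the `settings.py` that `scrapy startproject` generates), the edited lines might look roughly like this sketch. The UA string is only an example, and the module path and class name are hypothetical placeholders for your own project's.

```python
# settings.py (excerpt): a sketch of the two changes, assuming they are
# Scrapy's USER_AGENT and DOWNLOADER_MIDDLEWARES settings.

# 1. Example UA only; replace it with the User-Agent your own browser sends.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# 2. Uncommented so the custom downloader middleware is enabled.
# "myproject" and "MyprojectDownloaderMiddleware" are hypothetical names;
# keep the path your generated settings.py already contains.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.MyprojectDownloaderMiddleware": 543,
}
```

The number is the middleware's order. `process_request` methods run in ascending order, so 543 (the value the project template uses) places this middleware ahead of Scrapy's built-in `HttpProxyMiddleware` at 750, and `request.meta['proxy']` is already set by the time that one runs.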
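Next, let's modify our spider code, the baidu.py file. Since we are using a proxy, the spider will visit an IP-lookup page to check whether the proxy is actually being used.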
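```python
# -*- coding: utf-8 -*-
import scrapy


class BaiduSpider(scrapy.Spider):
    name = 'baidu'
    allowed_domains = ['baidu.com']
    # A Baidu search for "ip" displays the IP address the request came from
    start_urls = ['https://www.baidu.com/s?ie=UTF-8&wd=ip']

    def parse(self, response):
        # Save the raw page so we can check which IP Baidu saw
        filename = 'baidu.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
```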
Now run the spider from the project directory and see whether the file is created.
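There it is. Open the file and take a look: the IP shown on the page is the same one we used in the code sample, which means the proxy is working.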
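Instead of checking the saved page by eye, a few lines of standard-library Python can search `baidu.html` for the proxy's IP. This is a rough sketch: `page_mentions_ip` is a hypothetical helper, and it does a plain substring search rather than parsing the page.

```python
from pathlib import Path


def page_mentions_ip(html_path: str, proxy: str) -> bool:
    """Return True if the saved page contains the proxy's host IP.

    `proxy` uses the "host:port" form from the middleware example. Only the
    host part is searched for, since an IP-lookup page shows the IP, not the port.
    """
    host = proxy.partition(":")[0]
    # The spider wrote raw bytes; decode leniently so stray bytes don't raise
    text = Path(html_path).read_text(encoding="utf-8", errors="ignore")
    # Plain substring check: a quick confirmation, not a precise parser
    return host in text
```

For example, `page_mentions_ip("baidu.html", "112.192.158.65:20823")` should return `True` if Baidu saw the proxy's address.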
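To sum up, here is what needs to be changed and watched out for when using a proxy.
These two points are the main things to keep in mind.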
There is actually another way to set up a proxy. I won't cover it here; I'll leave it for you to explore.
Further learning: