
Scrapy – Web Crawling with a Proxy Network

October 29, 2013

I had been using Scrapy for a couple of weeks without any errors. The day I changed my system proxy, it started failing with an error like this:

proxy error

So when an error like this shows up, you know it's caused by a manual proxy setting.
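Scrapy's built-in HttpProxyMiddleware seeds its proxy table from the standard library's `urllib.request.getproxies()`, which on Linux reads the `*_proxy` environment variables. So a system proxy that your desktop exports as `http_proxy`/`https_proxy` is picked up by every crawl. A minimal sketch for checking what the crawler will see, assuming the system proxy is exported as these environment variables (not every desktop does this); the function name `show_env_proxies` is hypothetical:

```python
from urllib.request import getproxies_environment


def show_env_proxies():
    """Return the scheme -> proxy URL mapping built from *_proxy env variables."""
    # Lowercase variables (http_proxy) take precedence over uppercase ones.
    # A no_proxy variable shows up under the key "no".
    return getproxies_environment()


if __name__ == "__main__":
    proxies = show_env_proxies()
    if proxies:
        for scheme, url in sorted(proxies.items()):
            print(scheme + ": " + url)
    else:
        print("No proxy environment variables are set.")
```

If this prints a proxy you did not expect Scrapy to use, that environment setting is the likely source of the error.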
Scrapy provides a simple solution to this, which takes two changes:

  • A new Python file containing a small downloader middleware.
  • An edit to the project's settings.py file.

1. Go into your project directory (let's say /home/you/Documents/Project/sample).

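2. Create a file named middlewares.py in the sample package, next to settings.py (step 3 refers to it as sample.middlewares), and add the following code: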

# base64 is needed ONLY if the proxy you are going to use requires authentication
import base64

# Start your middleware class
class ProxyMiddleware(object):
    # Override process_request, which Scrapy calls for every outgoing request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy. b64encode, unlike the
        # old encodestring (removed in Python 3.9), adds no trailing newline
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
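Which base64 function builds the header matters. The legacy encoder (`encodestring`, kept in Python 3 as `base64.encodebytes`) appends a newline, which would end up inside the Proxy-Authorization header, while `base64.b64encode` does not. A minimal standalone sketch to compare the two; the helper names `basic_proxy_auth` and `legacy_token` are hypothetical:

```python
import base64


def basic_proxy_auth(user_pass):
    """Build a Proxy-Authorization value for HTTP Basic auth from 'user:password'."""
    # b64encode returns the encoded bytes with no line breaks.
    token = base64.b64encode(user_pass.encode("utf-8")).decode("ascii")
    return "Basic " + token


def legacy_token(user_pass):
    """Encode the way the old encodestring did (its Python 3 name is encodebytes)."""
    # encodebytes appends a trailing newline to its output.
    return base64.encodebytes(user_pass.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(repr(basic_proxy_auth("USERNAME:PASSWORD")))
    print(repr(legacy_token("USERNAME:PASSWORD")))
```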

3. Add the following to your project's settings.py:

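DOWNLOADER_MIDDLEWARES = {
    # Scrapy versions before 1.0 used the path
    # 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware'
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'sample.middlewares.ProxyMiddleware': 100,
}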

Then crawl with the same spider again. You'll find that it works now 😀.
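The two numbers in the middleware settings are orders: Scrapy calls `process_request()` on downloader middlewares in increasing order, so the custom middleware (100) sets `request.meta['proxy']` before HttpProxyMiddleware (110) acts on it. The sketch below only simulates that sequence with plain Python stand-ins (the classes `FakeRequest`, `SetProxy` and `ReadProxy` are hypothetical; no Scrapy import) to make the ordering visible:

```python
class FakeRequest:
    """Stand-in for scrapy.Request: only the meta dict is modelled."""

    def __init__(self):
        self.meta = {}


class SetProxy:
    """Plays the role of sample.middlewares.ProxyMiddleware."""

    def process_request(self, request, spider):
        request.meta["proxy"] = "http://YOUR_PROXY:PORT"
        request.meta.setdefault("trace", []).append("SetProxy")
        return None


class ReadProxy:
    """Plays the role of HttpProxyMiddleware, which acts on meta['proxy']."""

    def process_request(self, request, spider):
        seen = request.meta.get("proxy", "no proxy")
        request.meta.setdefault("trace", []).append("ReadProxy saw " + seen)
        return None


CLASSES = {"SetProxy": SetProxy, "ReadProxy": ReadProxy}


def run_chain(orders):
    """Call process_request in increasing order value and return the trace."""
    request = FakeRequest()
    for name, _order in sorted(orders.items(), key=lambda item: item[1]):
        CLASSES[name]().process_request(request, spider=None)
    return request.meta["trace"]


if __name__ == "__main__":
    # Same relative order as the settings: custom 100, HttpProxyMiddleware 110
    print(run_chain({"ReadProxy": 110, "SetProxy": 100}))
    # ['SetProxy', 'ReadProxy saw http://YOUR_PROXY:PORT']

    # Reversed: the reader runs before the proxy is set
    print(run_chain({"ReadProxy": 100, "SetProxy": 110}))
    # ['ReadProxy saw no proxy', 'SetProxy']
```

What matters is only that the custom middleware's number is lower than HttpProxyMiddleware's; since the built-in default for HttpProxyMiddleware is 750, any smaller value for the custom middleware gives the same sequence.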

Note: adapt the code above to your own project name and proxy details.
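Instead of editing the proxy address and credentials inside middlewares.py, one option is to keep them in settings.py and read them when Scrapy builds the middleware. A sketch under stated assumptions: `PROXY_URL` and `PROXY_USER_PASS` are hypothetical setting names you would add yourself (they are not built-in Scrapy settings), and `SettingsProxyMiddleware` is a hypothetical class name; `from_crawler` and `crawler.settings.get` are the standard Scrapy hooks for reading settings. The class itself needs no Scrapy import:

```python
import base64


class SettingsProxyMiddleware(object):
    """Downloader middleware that takes its proxy from project settings."""

    def __init__(self, proxy_url, user_pass=None):
        self.proxy_url = proxy_url
        # Precompute the Basic auth header once, if credentials were given
        self.auth_header = None
        if user_pass:
            token = base64.b64encode(user_pass.encode("utf-8")).decode("ascii")
            self.auth_header = "Basic " + token

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL and PROXY_USER_PASS are hypothetical names set in settings.py
        return cls(
            crawler.settings.get("PROXY_URL"),
            crawler.settings.get("PROXY_USER_PASS"),
        )

    def process_request(self, request, spider):
        # Only route through a proxy when one is configured
        if self.proxy_url:
            request.meta["proxy"] = self.proxy_url
            if self.auth_header:
                request.headers["Proxy-Authorization"] = self.auth_header
        # Returning None lets Scrapy continue with the next middleware
        return None
```

With this variant, settings.py would define PROXY_URL = "http://YOUR_PROXY:PORT" (and PROXY_USER_PASS = "USERNAME:PASSWORD" if needed), and register 'sample.middlewares.SettingsProxyMiddleware' at 100 in place of ProxyMiddleware.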



  1. andress

    thanks for this!

  2. @andress: glad that it helped you.
