Full Version: Simple Python Proxy Scraper
Proxy Scraper Source Code
Have fun guys, this will give you a simple idea on how Python proxy scrapers work.
Hey man, looking pretty good.

Where do you get your website proxy lists from?
(06-03-2016, 05:23 PM)insidious15 Wrote:
Hey man, looking pretty good.

Where do you get your website proxy lists from?
You can Google: "Proxy Lists".
You can also search for: "Proxy Sources".
Then you can use my link checker to check whether the proxy sources are still working.
You can find it on my GitHub page:
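The linked tool isn't shown here, so as an illustration only, here is a minimal sketch of what such a link checker might do (the function name check_source and its logic are my assumptions, written for Python 3's urllib.request, not the actual GitHub code):

```python
import urllib.request

def check_source(url, timeout=5):
    """Return True if the proxy-source URL answers with HTTP 200.
    Hypothetical helper, not the author's actual link checker."""
    try:
        response = urllib.request.urlopen(url, timeout=timeout)
        return response.getcode() == 200
    except Exception:
        # Bad URLs, network errors and timeouts all count as a dead source
        return False
```

You would loop over your list of proxy-source URLs and keep only the ones for which this returns True.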
You can improve the for loop at the end: instead of for x in range(len(somelist)), use for item in somelist: and refer to the current item simply as item. Here is a Python talk that could help you get better at looping. You could also keep a config file with the list of proxies, which would make it easier to add new ones.
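To make the suggestion concrete, here is a small illustrative snippet (the proxies list is made-up data, not from the original script) showing both loop styles:

```python
proxies = ['1.2.3.4:80', '5.6.7.8:8080']

# Index-based loop, as in the original code
for x in range(len(proxies)):
    print(proxies[x])

# Direct iteration: the suggested, more Pythonic style
for proxy in proxies:
    print(proxy)
```

Both print the same items; the second version avoids the needless indexing.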
My multithreading proxy scraper:
# -*- coding: utf-8 -*-
from multiprocessing.dummy import Pool as ThreadPool
import re, os
from urllib2 import urlopen

Path = os.path.dirname(os.path.realpath(__file__))

with open(Path + '\\url.txt', 'r') as f:
    urls = f.readlines()

def parseproxy(url):
    try:
        source = urlopen(url).read()
    except Exception:
        # Source could not be fetched; skip it
        return None

    # Match IP:port pairs such as 127.0.0.1:8080
    proxies = re.findall(r'[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\:[\d]{1,6}', source[5:], re.M | re.I)

    with open(Path + '\\proxy.txt', 'a') as f:
        for proxy in proxies:
            f.write(proxy + '\n')

    print '[PARSED] - ', url.strip(), '[' + str(len(proxies)) + ']'

# 100 worker threads fetch the proxy-list URLs in parallel
pool = ThreadPool(100)
results = pool.map(parseproxy, urls)
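The core of the scraper is that regular expression. A quick standalone check of the pattern (the sample text is made up for illustration):

```python
import re

# Same pattern the scraper uses: dotted IPv4 address plus a port
pattern = r'[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}\:[\d]{1,6}'

sample = 'header 192.168.0.1:8080 some text 10.0.0.5:3128 trailer'
print(re.findall(pattern, sample))
# -> ['192.168.0.1:8080', '10.0.0.5:3128']
```

Note the pattern is loose: it would also accept impossible addresses like 999.999.999.999:123456, which is usually fine for scraping but worth knowing.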


example file url.txt:
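The script reads one proxy-list URL per line from url.txt. An illustrative layout (placeholder addresses, not real proxy sources):

```
http://example.com/proxy-list.txt
http://example.org/free-proxies.html
```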