Simple Python Proxy Scraper
Proxy Scraper Source Code
Have fun, guys. This should give you a simple idea of how Python proxy scrapers work.
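The attached source itself isn't reproduced in this archived view, so as an illustration only (not the original code), here is a minimal sketch of the core idea: fetch a proxy-list page and regex out ip:port pairs. The URL is a placeholder.
Code:
# Illustrative sketch only -- not the original attachment, which isn't
# shown in this archived view. Fetch one proxy-list page and pull out
# ip:port pairs with a regex.
import re
from urllib2 import urlopen

url = 'http://example.com/proxy-list'  # placeholder source URL
source = urlopen(url).read()
proxies = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}', source)

for proxy in proxies:
    print proxy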
Hey man, looking pretty good.

Where do you get your website proxy lists from?
(06-03-2016, 05:23 PM)insidious15 Wrote:
Hey man, looking pretty good.

Where do you get your website proxy lists from?
You can Google "proxy lists", or search for "proxy sources".
Then you can use my link checker to test whether the proxy sources are still working.
You can find it on my GitHub page: https://www.github.com/Undercore/
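Roughly, such a checker just requests each source URL and keeps the ones that respond. A minimal sketch of the idea (not the actual tool from the GitHub page above), with a placeholder source URL:
Code:
# Minimal sketch of a proxy-source link checker (illustrative,
# not the actual checker from the GitHub page above).
from urllib2 import urlopen

sources = ['http://example.com/proxy-list']  # placeholder source URLs

for source in sources:
    try:
        urlopen(source, timeout=10)
        print '[ALIVE] -', source
    except IOError:  # covers URLError, HTTPError and socket errors
        print '[DEAD]  -', source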
You can improve the for loop at the end. Instead of for x in range(len(somelist)), use
for item in somelist: and then refer to the current item simply as item. Here is a Python talk that could help you get better at looping: https://www.youtube.com/watch?v=EnSu9hHGq5o. Also, you could keep a config file with your list of proxy sources, which would make it easier to add new ones.
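To make the suggestion concrete, here is the same loop written both ways (somelist is just a placeholder name):
Code:
somelist = ['proxy1', 'proxy2', 'proxy3']

# Index-based loop -- works, but noisier:
for x in range(len(somelist)):
    print somelist[x]

# Direct iteration -- the idiomatic form:
for item in somelist:
    print item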
My multithreaded proxy scraper:
Code:
# -*- coding: utf-8 -*-
from multiprocessing.dummy import Pool as ThreadPool
import os
import re
from urllib2 import urlopen

Path = os.path.dirname(os.path.realpath(__file__))

# One proxy-source URL per line.
with open(os.path.join(Path, 'url.txt'), 'r') as f:
    urls = f.readlines()

def parseproxy(url):
    url = url.strip()
    try:
        source = urlopen(url).read()
    except Exception:
        return None

    # Match ip:port pairs anywhere in the page source.
    proxies = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}', source)

    with open(os.path.join(Path, 'proxy.txt'), 'a') as f:
        for proxy in proxies:
            f.write(proxy + '\n')

    print '[PARSED] - ', url, '[' + str(len(proxies)) + ']'

# 100 worker threads, one source URL per task.
pool = ThreadPool(100)
pool.map(parseproxy, urls)
pool.close()
pool.join()
Example url.txt:
Code:
http://best-proxy-list-ips.blogspot.com/feeds/posts/default?alt=rss
http://bestpremiumproxylist.blogspot.ru/feeds/posts/default?alt=rss
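One caveat about the script above: with 100 threads all appending to proxy.txt, lines written by different threads can interleave. A small sketch of one fix is to guard the append with a lock; save() here is a hypothetical helper you would call from parseproxy instead of opening the file there.
Code:
# Sketch: serialise appends from many threads with a lock.
# save() is a hypothetical helper, not part of the script above.
import threading

write_lock = threading.Lock()

def save(proxies, path):
    # Only one thread writes at a time, so lines never interleave.
    with write_lock:
        with open(path, 'a') as f:
            for proxy in proxies:
                f.write(proxy + '\n')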