How does google know my bot is a bot?
How does google know my bot is a bot? #1
So I've been working on a bot in Python using mechanize. The main problem I'm having is that a certain action, which doesn't trigger a "mobile phone verification screen" when I do it myself, does trigger one when the bot does it.

I've set up random user agents for each request and I can't think of any other cause. Proxies shouldn't matter, since the same requests from my browser on my own IP work fine. Anyone got any ideas?


RE: How does google know my bot is a bot? #2
It could have something to do with mechanize not executing JavaScript. As far as I know, mechanize can't parse JS.
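To illustrate the point, here's a minimal sketch using only Python's standard library rather than mechanize itself. The page content and the `DivTextCollector` class are made up for the example. It shows that a plain HTML parser sees the script's source but never runs it, so anything the script would have filled in stays empty:

```python
from html.parser import HTMLParser

# Hypothetical page: the <div> is empty until a browser runs the script.
PAGE = """<html><body>
<div id="out"></div>
<script>document.getElementById("out").textContent = "filled in by JS";</script>
</body></html>"""


class DivTextCollector(HTMLParser):
    """Collects text found inside <div> elements and notes any <script> tag."""

    def __init__(self):
        super().__init__()
        self.in_div = False
        self.div_text = []
        self.saw_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.in_div = True
        elif tag == "script":
            self.saw_script = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_div = False

    def handle_data(self, data):
        # Script source arrives here as plain text; it is never executed.
        if self.in_div:
            self.div_text.append(data)


collector = DivTextCollector()
collector.feed(PAGE)
print("script tag present:", collector.saw_script)
print("div text seen by parser:", repr("".join(collector.div_text)))
```

A site that relies on a script to set a cookie or fill in a hidden form field would notice that step never happened, which may be one of the signals involved here.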


RE: How does google know my bot is a bot? #3
(06-12-2016, 01:14 PM)TechSaavy Wrote: It could have something to do with mechanize not executing JavaScript. As far as I know, mechanize can't parse JS.

Very interesting idea, I'll check that out. Maybe I'll try urllib, if that has JS parsing.


RE: How does google know my bot is a bot? #4
Google checks more than the user agent alone. However, if it's something such as a search, just use Google's API. APIs exist for most Google services, so if what you're doing can go through an API, it should.

As far as what else it does, I'm not sure. I myself have never been able to quite "crack the code" on what Google does. Perhaps it's worth a Google.

(06-12-2016, 01:23 PM)superMAUS Wrote:
(06-12-2016, 01:14 PM)TechSaavy Wrote: It could have something to do with mechanize not executing JavaScript. As far as I know, mechanize can't parse JS.

Very interesting idea, I'll check that out. Maybe I'll try urllib, if that has JS parsing.

Also, no, urllib doesn't have JavaScript parsing. However, I doubt that's what is causing Google to flag your bot. If you want to try it regardless, there are several solutions. There's an old module I found a while ago that enabled JS parsing, but it required Node.js to be available on the system. You could also use the Qt Python bindings for WebKit and call the evaluateJavaScript function, but I'm not sure how much "coverage" that would offer, i.e. whether it would actually behave the same as a real browser executing the JavaScript. WebKit is the renderer, so I think it would.
(This post was last modified: 06-12-2016, 01:30 PM by Stocking.)


RE: How does google know my bot is a bot? #5
Selenium and mechanize are the real deal. I was programming a web bot and I went through the old-fashioned HTTP modules (urllib...), but I kept getting blocked by anti-bot stuff until I used Selenium webdriver (and some mechanize).


RE: How does google know my bot is a bot? #6
(06-25-2016, 02:40 PM)m0dem Wrote: Selenium and mechanize are the real deal.  I was programming a web bot and I went through the old-fashioned HTTP modules (urllib...), but I kept getting blocked by anti-bot stuff until I used Selenium webdriver (and some mechanize).

Exactly this. Selenium is definitely the route you want to take, as you don't need to spoof anything like user agents; it runs the supplied browser and uses hooks to interact with whatever page you feed it. It also has a super simple and robust API, so I'd jump right into it if I were you.


RE: How does google know my bot is a bot? #7
Most (if not all) major search engines have IP blacklists, so your proxies could be the problem. Try doing whatever you're doing without a proxy and see if it still gives you that mobile verification page. If it still does it, keep reading.

Try adding some extra headers to make it more believable that you're an actual browser. I conducted a test where I sent a request to a website with my normal browser and then sent a request with mechanize (the only header I changed was the user-agent). The top is the mechanize request, the bottom is my browser's request.

[Image: 85ab308ae4f44726852798caf48e635d.png]

Also, does the browser object accept and store cookies and send them with every request? You might want to implement that if you haven't already, unless it's automatic. I wouldn't know since I use the requests module for all of my shit.
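For reference, here's a rough sketch of both suggestions (extra headers plus a cookie jar) using only the standard library instead of mechanize or requests. The header values and the `build_browser_like_opener` name are placeholders for illustration; copy the real values from what your own browser sends.

```python
import http.cookiejar
import urllib.request

# Placeholder values for illustration; take the real ones from your browser's request.
BROWSER_HEADERS = [
    ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"),
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Language", "en-US,en;q=0.5"),
]


def build_browser_like_opener():
    """Return (opener, cookie_jar): an opener that sends the headers above
    and stores and resends cookies across requests made through it."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Replaces the default "Python-urllib/x.y" User-agent header.
    opener.addheaders = list(BROWSER_HEADERS)
    return opener, cookie_jar


opener, cookie_jar = build_browser_like_opener()
# Every opener.open(url) call would now send these headers and reuse stored cookies.
print(dict(opener.addheaders)["User-Agent"])
```

Matching headers only closes one gap. It doesn't make the client behave like a browser in any other respect.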


RE: How does google know my bot is a bot? #8
(06-25-2016, 02:40 PM)m0dem Wrote: Selenium and mechanize are the real deal.  I was programming a web bot and I went through the old-fashioned HTTP modules (urllib...), but I kept getting blocked by anti-bot stuff until I used Selenium webdriver (and some mechanize).
Thanks, this sounds like a really good suggestion.

(06-25-2016, 06:01 PM)meow Wrote: Most (if not all) major search engines have IP blacklists, so your proxies could be the problem. Try doing whatever you're doing without a proxy and see if it still gives you that mobile verification page. If it still does it, keep reading.

Try adding some extra headers to make it more believable that you're an actual browser. I conducted a test where I sent a request to a website with my normal browser and then sent a request with mechanize (the only header I changed was the user-agent). The top is the mechanize request, the bottom is my browser's request.

[Image: 85ab308ae4f44726852798caf48e635d.png]

Also, does the browser object accept and store cookies and send them with every request? You might want to implement that if you haven't already, unless it's automatic. I wouldn't know since I use the requests module for all of my shit.

Cheers meow. I'll try out your suggestions. In terms of cookies, I think mechanize already stores them, so I doubt that's the issue.
(This post was last modified: 06-26-2016, 02:35 AM by superMAUS.)


RE: How does google know my bot is a bot? #9
(06-25-2016, 04:58 PM)Shinoa Wrote:
(06-25-2016, 02:40 PM)m0dem Wrote: Selenium and mechanize are the real deal.  I was programming a web bot and I went through the old-fashioned HTTP modules (urllib...), but I kept getting blocked by anti-bot stuff until I used Selenium webdriver (and some mechanize).

Exactly this. Selenium is definitely the route you want to take, as you don't need to spoof anything like user agents; it runs the supplied browser and uses hooks to interact with whatever page you feed it. It also has a super simple and robust API, so I'd jump right into it if I were you.

You can also make it so the browser doesn't actually pop up on your screen (if you don't want that). Google "headless" Selenium WebDriver.
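A minimal sketch of what that might look like, assuming a recent Selenium 4 install with Firefox available (neither is confirmed in this thread; setups from 2016 typically used other headless options). `fetch_title_headless` is a made-up helper name.

```python
def fetch_title_headless(url):
    """Load url in a headless Firefox window and return the page title."""
    # Imported here so this file still loads where Selenium isn't installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # run Firefox without opening a window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always shut the browser down, even if the page load fails.
        driver.quit()
```

You'd call it as `fetch_title_headless("https://example.com")`. The same pattern works with Chrome via `webdriver.ChromeOptions()` and `webdriver.Chrome(options=...)`.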


RE: How does google know my bot is a bot? #10
There's no way you can stop Google from detecting your bot. It looks for "bot-like" actions and serves those users a captcha. You could even use your real browser, search a dork, and keep clicking next page, and you'd be served a captcha after 20 or so pages. Bing doesn't rate limit a direct search where you parse the HTML, but it has a limit on its API. Yandex limits both API and direct searches, but the API has decent limits.
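Since these limits are triggered by how many requests you make and how fast, the one thing a client can control is its own pace. Here's a small sketch, assuming made-up delay bounds (the real thresholds aren't known) and a hypothetical `paced` helper:

```python
import random
import time


def paced(items, min_delay=2.0, max_delay=6.0, sleep=time.sleep, rng=random.uniform):
    """Yield items one at a time, pausing a random interval between them.

    The delay bounds are illustrative guesses, not documented limits.
    """
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item, one pause before each later item.
            sleep(rng(min_delay, max_delay))
        yield item


# Demo with a recording stand-in for sleep so nothing actually waits.
waits = []
pages = list(paced(["page1", "page2", "page3"], sleep=waits.append))
print(pages, len(waits))
```

This only slows the client down. It won't prevent a captcha if the service decides the overall pattern still looks automated.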

(06-12-2016, 01:26 PM)Stocking Wrote: Google checks more than the user agent alone. However, if it's something such as a search, just use Google's API. APIs exist for most Google services, so if what you're doing can go through an API, it should.

As far as what else it does, I'm not sure. I myself have never been able to quite "crack the code" on what Google does. Perhaps it's worth a Google.

Google's search API is deprecated and it was rate limited anyway.
