Sinisterly
How does google know my bot is a bot? - Printable Version



How does google know my bot is a bot? - superMAUS - 06-12-2016

So I've been working on a bot in Python using mechanize, and the main problem I'm having is that a certain action that normally doesn't trigger a "mobile phone verification screen" when I do it myself does trigger one when the bot does it.

I've set up random user agents for each request and I can't think of anything else to try, given that proxies shouldn't matter if requests from my IP work fine in my browser. Anyone got any ideas?


RE: How does google know my bot is a bot? - TechSaavy - 06-12-2016

It could have something to do with JavaScript not being executed by mechanize. As far as I know, mechanize can't run JS.
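
To illustrate the point about JavaScript, here is a minimal sketch using only Python's standard `html.parser` as a stand-in for any non-browser HTTP client (it is not mechanize itself). The page, the `status` element and the class name `TextAndScripts` are made up for the example. A parser like this sees the script's source as plain text but never runs it, so anything the script would have changed on the page stays unchanged:

```python
from html.parser import HTMLParser

# Hypothetical page: a real browser would run the script and change the
# text of the "status" element from "no-js" to "js-ok".
PAGE = """<html><body>
<div id="status">no-js</div>
<script>document.getElementById("status").textContent = "js-ok";</script>
</body></html>"""


class TextAndScripts(HTMLParser):
    """Collects visible text and raw script source separately."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.visible_text = []
        self.script_sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            # The script body arrives as plain text; nothing executes it.
            self.script_sources.append(data)
        elif data.strip():
            self.visible_text.append(data.strip())


def parse(page):
    parser = TextAndScripts()
    parser.feed(page)
    parser.close()
    return parser.visible_text, parser.script_sources


if __name__ == "__main__":
    text, scripts = parse(PAGE)
    print("visible text:", text)
    print("script source seen but not run:", scripts)
```

A server that expects some side effect of that script (a changed field, a cookie set from JS, an extra request) can tell that it never happened.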


RE: How does google know my bot is a bot? - superMAUS - 06-12-2016

(06-12-2016, 01:14 PM)TechSaavy Wrote: It could have something to do with JavaScript not being executed by mechanize, as far as I know, mechanize can't parse JS.

Very interesting idea, I'll check that out. Maybe I'll try with urllib if that has JS parsing.


RE: How does google know my bot is a bot? - Stocking - 06-12-2016

Google does more user checking than a user agent alone. However, if it's something such as a search, just use Google's API. APIs exist for most Google services, so if what you're doing can be done through an API, it should be.

As far as what else it does, I'm not sure. I myself have never been able to quite "crack the code" on what Google does. Perhaps it's worth a Google.

(06-12-2016, 01:23 PM)superMAUS Wrote:
(06-12-2016, 01:14 PM)TechSaavy Wrote: It could have something to do with JavaScript not being executed by mechanize, as far as I know, mechanize can't parse JS.

Very interesting idea, I'll check that out. Maybe I'll try with urllib if that has JS parsing.

Also, no, urllib doesn't have JavaScript parsing either. However, I doubt this is what is causing Google to raise flags on your bot. If you want to handle it regardless, there are several solutions. There's an old module I found a while ago that enabled JavaScript execution, but it required Node.js to be available on the system. You could also use the Qt Python bindings for WebKit and call the evaluateJavaScript function, but I'm not sure how much "coverage" that would offer, i.e. whether it would behave the same as a real browser executing the JavaScript rather than a bot. WebKit is the rendering engine, so I think it would.


RE: How does google know my bot is a bot? - m0dem - 06-25-2016

Selenium and mechanize are the real deal. I was programming a web bot and I went through the old-fashioned HTTP modules (urllib...), but I kept getting blocked by anti-bot stuff until I used Selenium webdriver (and some mechanize).


RE: How does google know my bot is a bot? - Inori - 06-25-2016

(06-25-2016, 02:40 PM)m0dem Wrote: Selenium and mechanize are the real deal.  I was programming a web bot and I went through the old-fashioned HTTP modules (urllib...), but I kept getting blocked by anti-bot stuff until I used Selenium webdriver (and some mechanize).

Exactly this. Selenium is definitely the route you want to take, as you don't need to spoof anything like user agents; it runs the supplied browser and uses hooks to interact with whatever page you feed it. It also has a super simple and robust API, so I'd jump right into it if I were you.


RE: How does google know my bot is a bot? - meow - 06-25-2016

Most (if not all) major search engines have IP blacklists, so your proxies could be the problem. Try doing whatever you're doing without a proxy and see if it still gives you that mobile verification page. If it still does it, keep reading.

Try adding some extra headers to make it more believable that you're an actual browser. I conducted a test where I sent a request to a website with my normal browser and then sent a request with mechanize (the only header I changed was the user-agent). The top is the mechanize request, the bottom is my browser's request.

[Image: 85ab308ae4f44726852798caf48e635d.png]
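
Since the screenshot is not reproduced here, the comparison it describes can be sketched in code. The header sets below are hypothetical illustrations, not the values from the image, and `missing_headers` is a helper name invented for this example. It lists which header names a browser sent that the script did not:

```python
# Hypothetical header sets for illustration only; they are NOT the values
# from the screenshot in the post above.
EXAMPLE_BOT_HEADERS = {
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0 (spoofed)",
    "Accept-Encoding": "identity",
    "Connection": "close",
}

EXAMPLE_BROWSER_HEADERS = {
    "Host": "example.com",
    "User-Agent": "Mozilla/5.0 (real browser)",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}


def missing_headers(bot_headers, browser_headers):
    """Return the browser header names the bot did not send (case-insensitive)."""
    bot_names = {name.lower() for name in bot_headers}
    return sorted(
        name for name in browser_headers if name.lower() not in bot_names
    )


if __name__ == "__main__":
    for name in missing_headers(EXAMPLE_BOT_HEADERS, EXAMPLE_BROWSER_HEADERS):
        print("browser sends, bot does not:", name)
```

Capturing both requests against a server you control and diffing them this way shows what changes beyond the user agent.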

Also, does the browser object accept and store cookies and send them with every request? You might want to implement that if you haven't already, unless it's automatic. I wouldn't know since I use the requests module for all of my shit.
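
To show what "accept, store and send cookies" looks like in practice, here is a minimal sketch using the standard library's `http.cookiejar` and `urllib.request` as a stand-in (it is not mechanize or requests). The local test server, the handler name `CookieEchoHandler` and the cookie value `session=abc123` are all made up for the example. The server sets a cookie on the first request and echoes back whatever `Cookie` header it receives:

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Sets a cookie if none was sent, and echoes the received Cookie header."""

    def do_GET(self):
        received = self.headers.get("Cookie", "")
        self.send_response(200)
        if not received:
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(received.encode())

    def log_message(self, *args):
        pass  # keep the example output quiet


def fetch_twice_with_jar():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), CookieEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port

    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),          # ignore any system proxy
        urllib.request.HTTPCookieProcessor(jar),  # store and resend cookies
    )
    try:
        first = opener.open(url).read().decode()
        second = opener.open(url).read().decode()
    finally:
        server.shutdown()
        server.server_close()
    return first, second


if __name__ == "__main__":
    first, second = fetch_twice_with_jar()
    print("cookie sent on first request: ", repr(first))
    print("cookie sent on second request:", repr(second))
```

Without the `HTTPCookieProcessor`, the second request would arrive with no `Cookie` header, which is one more way a script can differ from a browser.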


RE: How does google know my bot is a bot? - superMAUS - 06-26-2016

(06-25-2016, 02:40 PM)m0dem Wrote: Selenium and mechanize are the real deal.  I was programming a web bot and I went through the old-fashioned HTTP modules (urllib...), but I kept getting blocked by anti-bot stuff until I used Selenium webdriver (and some mechanize).
Thanks, this sounds like a really good suggestion.

(06-25-2016, 06:01 PM)meow Wrote: Most (if not all) major search engines have IP blacklists, so your proxies could be the problem. Try doing whatever you're doing without a proxy and see if it still gives you that mobile verification page. If it still does it, keep reading.

Try adding some extra headers to make it more believable that you're an actual browser. I conducted a test where I sent a request to a website with my normal browser and then sent a request with mechanize (the only header I changed was the user-agent). The top is the mechanize request, the bottom is my browser's request.

[Image: 85ab308ae4f44726852798caf48e635d.png]

Also, does the browser object accept and store cookies and send them with every request? You might want to implement that if you haven't already, unless it's automatic. I wouldn't know since I use the requests module for all of my shit.

Cheers meow. I'll try out your suggestions. In terms of cookies, I think mechanize already stores them, so I doubt that's the issue.


RE: How does google know my bot is a bot? - m0dem - 07-03-2016

(06-25-2016, 04:58 PM)Shinoa Wrote:
(06-25-2016, 02:40 PM)m0dem Wrote: Selenium and mechanize are the real deal.  I was programming a web bot and I went through the old-fashioned HTTP modules (urllib...), but I kept getting blocked by anti-bot stuff until I used Selenium webdriver (and some mechanize).

Exactly this. Selenium is definitely the route you want to take, as you don't need to spoof anything like user agents; it runs the supplied browser and uses hooks to interact with whatever page you feed it. It also has a super simple and robust API, so I'd jump right into it if I were you.

You can also make it so the browser doesn't actually pop up on your screen (if you don't want that). Google "headless Selenium webdriver".


RE: How does google know my bot is a bot? - Eclipse - 07-03-2016

There's no way you can stop Google from detecting your bot. It looks for "bot-like" actions and serves those users a captcha. You could even use your real browser, search a dork, and keep clicking next page and you'd be served a captcha after 20 or so pages. Bing doesn't rate limit a direct search where you parse html, but it has a limit on its API. Yandex limits both API and direct but the API has decent limits.

(06-12-2016, 01:26 PM)Stocking Wrote: Google does more user checking than a user agent alone. However, if it's something such as a search, just use Google's API. APIs exist for more than one Google service, in fact most of them, so if it's something that can use an API, in this case, it should.

As far as what else it does, I'm not sure. I myself have never been able to quite "crack the code" on what Google does. Perhaps it's worth a Google.

Google's search API is deprecated and it was rate limited anyway.
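
Since both the APIs and the direct searches mentioned above come with limits, one way to stay under a known limit is to space requests out on the client side. This is a minimal sketch, assuming a fixed minimum interval between calls. The class name `Throttle` and the interval values are invented for the example and do not reflect any real service's limits:

```python
import time


class Throttle:
    """Enforces a minimum interval (in seconds) between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the behaviour can be tested
        self._sleep = sleep
        self._last = None     # time at which the previous call was released

    def wait(self):
        """Block until the next call is allowed; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=0.5)  # hypothetical limit: 2 calls/second
    for i in range(3):
        slept = throttle.wait()
        print("request", i, "released after sleeping", round(slept, 2), "s")
```

Calling `wait()` before each request keeps the script under the chosen rate. It does not change how a site classifies "bot-like" behaviour, as the post above points out.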