Lunar Data Mining Twitter with Tweepy
Data Mining Twitter with Tweepy #1
In between genetic algorithms and neural networks, I'm starting to do more crazy shit at work that I thought I'd never be capable of. One of these topics is data mining, which is way easier than I thought it would be. Here I'll just go through the basics of mining Twitter.

Getting API access

Our first step is registering our application. Log in to Twitter and head here to register and get your keys, secrets and tokens. After creating your app, head to the "keys and access tokens" tab to get what you need to authenticate your app's usage.

First, get the Consumer Key and Consumer Secret and store them in a text file; we'll structure key storage later.
[Image: 20f46e4fca8d488fa250265917a0ef79.png]

Second, click the "Create Access Token" button at the bottom of the page and store your token and secret with the consumer key and secret.
[Image: ffcc65e050024dbe8aeae3976697c3e1.png]

We'll store the keys in a YAML file (auth.yml in the script further down) like so, since its syntax is simple.
[Image: f59cf43c3978407cb58822ad2a0187b1.png]
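
The screenshot isn't reproduced here, but the key-fetching script further down reads consumer/key, consumer/secret, access/token and access/secret, so a layout along these lines should work. This is only a sketch: the placeholder values and the write_auth_template helper are made up for illustration, and the file name auth.yml is taken from that script.

```python
# Hypothetical helper: writes a starter auth.yml in the layout that
# get_keys() reads. Replace the placeholder values with your own keys.
AUTH_YML_TEMPLATE = """\
consumer:
  key: YOUR_CONSUMER_KEY
  secret: YOUR_CONSUMER_SECRET
access:
  token: YOUR_ACCESS_TOKEN
  secret: YOUR_ACCESS_SECRET
"""

def write_auth_template(path="auth.yml"):
    # write the template to disk and hand back the path for convenience
    with open(path, "w") as handle:
        handle.write(AUTH_YML_TEMPLATE)
    return path
```

Since the file holds secrets, it's worth keeping it out of version control.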

Before anything else, let's fetch our keys from that YAML file. Copying the below script should do the trick.
Code:
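from yaml import safe_load

def get_keys():
    # parse auth.yml into a nested dictionary; the with block closes the file for us
    with open('auth.yml','r') as f:
        _dict=safe_load(f)
    # order: consumer key, consumer secret, access token, access secret
    return [_dict["consumer"]["key"],_dict["consumer"]["secret"],_dict["access"]["token"],_dict["access"]["secret"]]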
This will read and parse the file into a dictionary, and we return its values as a list.

On to the API. First, we need to install Tweepy. Python being convenient as ever, just run pip install tweepy (the yaml module used above comes from pip install pyyaml, if you don't have it yet). After Tweepy finishes installing, we'll get started with authenticating and fetching tweets.

Authenticating with Tweepy

OAuth and caching your keys is a piece of cake with this module. Just do the following, making sure to include the above key-fetching code in the file:
Code:
# we'll be using the cursor later, so you don't
# need to import it yet
from tweepy import API,Cursor,OAuthHandler

# wrap setting with method for later adaptability
def set_auth(con_key,con_secret,acc_token,acc_secret):
   # declare auth and api as global variables
   # so they can be reached elsewhere
   global auth,api

   # create the authentication handler and
   # set the access tokens
   auth=OAuthHandler(con_key,con_secret)
   auth.set_access_token(acc_token,acc_secret)

   # initialize an API endpoint using the
   # authentication object
   api=API(auth)
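
Before moving on, it can help to confirm that the credentials actually work. A minimal sketch, assuming Tweepy's API.verify_credentials() call; the whoami helper name is made up here, and it takes the api object as an argument so it doesn't depend on the globals.

```python
def whoami(api):
    # ask Twitter who the authenticated user is; older Tweepy versions
    # return False on bad credentials, newer ones raise an exception
    user = api.verify_credentials()
    if not user:
        raise RuntimeError("authentication failed, check auth.yml")
    return user.screen_name
```

After calling set_auth(*get_keys()), something like print(whoami(api)) should print your own handle.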

Getting Tweets (i)

After running the previous two snippets together (shown below), we can point the Tweepy Cursor to our account's timeline and get our 10 latest tweets like so.

Code:
from tweepy import API,Cursor,OAuthHandler
from yaml import safe_load

# define methods here
...

# use a star operator on the return of get_keys() to unpack the list
# elements as separate arguments when authenticating
set_auth(*get_keys())

# for the last 10 tweets in the authenticated account's timeline...
for tweet in Cursor(api.home_timeline).items(10):
   # print out the author and content
   print("@%s (%s):\n%s\n"%(tweet.user.screen_name,tweet.user.name,tweet.text))
(we could put this in a method, but it's not totally necessary at this point)
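
If you do want it in a method, one way is to split the formatting from the fetching. A rough sketch: format_tweet and print_timeline are names invented here, and the tweets are passed in rather than fetched through the global api.

```python
def format_tweet(tweet):
    # same layout as the print call above: handle, display name, text
    return "@%s (%s):\n%s\n" % (tweet.user.screen_name, tweet.user.name, tweet.text)

def print_timeline(tweets):
    # tweets can be any iterable of status objects, for example
    # Cursor(api.home_timeline).items(10)
    for tweet in tweets:
        print(format_tweet(tweet))
```

Keeping format_tweet free of any API calls also makes it easy to try out with stand-in objects.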

Quick note: we might run into problems with Unicode (printing to a console that can't display a character raises a UnicodeEncodeError), so you may want to add the following to sanitize anything you're going to print:
Code:
# unicode sanitization
uni=lambda obj: str(obj).encode('iso-8859-1',errors='backslashreplace').decode('iso-8859-1')

# example
print(uni(unicode_string_variable))
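
To see what that lambda does: characters that fit in ISO-8859-1 pass through untouched, and anything else is replaced by its backslash escape, which is why the output further down shows \u2019 instead of a curly apostrophe. A small check, with sample strings picked purely for illustration:

```python
# same sanitizer as above, repeated so this snippet runs on its own
uni = lambda obj: str(obj).encode('iso-8859-1', errors='backslashreplace').decode('iso-8859-1')

# e-acute exists in ISO-8859-1, so it is left alone
plain = uni("caf\u00e9")

# the curly apostrophe (U+2019) does not, so it turns into a literal \u2019
escaped = uni("Pantera\u2019s")

# the escaped version is plain ASCII and safe to print anywhere
print(escaped)
```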

Et voilà! Here's my output:
Code:
@Hai (Hai Lam):
I'm crying... https://t.co/WchBTuDq3B

@MetalHammer (Metal Hammer):
Joe Giron was @Pantera\u2019s official photographer. He opens up his photo archive for us: https://t.co/Slqe4EyOZg https://t.co/NRJqsE0XY3

@StatsBritain (Stats Britain):
RT @SBWritersRoom: When ur whole squad gets stuck in nets https://t.co/n5k8jbaXym

@VICE (VICE):
This newspaper is written by refugees, for refugees https://t.co/KEXYLyVHr7

@GWR (GuinnessWorldRecords):
Watch X Games BMX veteran Kevin Robinson smash Bicycle Backflip record https://t.co/hpSPa6PZv9 @xgames @KRobBMX https://t.co/7viMYtE9gk

@NASA (NASA):
RT @MarsCuriosity: Join me on #Mars amid Murray Buttes. #360video best on PC or @YouTube app https://t.co/yhP65Vl23u #JourneyToMars https:/\u2026

@MetalHammer (Metal Hammer):
.@MotleyCrue frontman Vince Neil on drugs, rehab and death https://t.co/a9DybJ5yum https://t.co/3fHbO0tkGF

@VICE (VICE):
Your next Uber driver might be a robot: https://t.co/ePZ0YK5mqy

@ThisDayInMETAL (THIS DAY IN METAL):
RT @litaford: Just a typical week at @TheWhiskyAGoGo in 1977. https://t.co/yyV9bn2N4e

@BlackSabbath (BlackSabbath):
Final bow tonight in Bristow
Video: @KellyOsbourne https://t.co/fbZP2LwHvf

Getting Tweets (ii)

One of my favorite things about this module is its capacity to keep the connection open as a stream and keep the tweets coming. Plus, we can track pretty much whatever we want.

The first step is to create our stream listener class, so Tweepy knows what to do with incoming data.
Code:
from tweepy.streaming import StreamListener   # base listener class (Tweepy 3.x; 4.x merged it into Stream)
from tweepy import API,OAuthHandler,Stream    # we can get rid of the Cursor class, so don't import it
from json import loads

class MyStream(StreamListener):
    def on_data(self,data):
        # parse the data from JSON to a dictionary
        tweet=loads(data)

        # print out what we did before, reusing uni() from the earlier snippet
        print("@%s (%s):\n%s\n"%(tweet["user"]["screen_name"],uni(tweet["user"]["name"]),uni(tweet["text"])))

        # return True to keep the stream running
        return True

    def on_error(self,status):
        # status is an integer HTTP status code, so format it instead of concatenating
        print("Error on stream: %s"%status)
        return True
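
One thing to watch for: the stream doesn't only deliver tweets. Twitter's streaming endpoint also sends things like delete and limit notices, which have no "user" or "text" key and would make the on_data above raise a KeyError. A defensive sketch (describe_message is a made-up helper name, not part of Tweepy):

```python
from json import loads

def describe_message(data):
    # parse the raw JSON string handed to on_data
    message = loads(data)

    # non-tweet messages (delete notices, limit notices, ...) lack these keys
    if "user" not in message or "text" not in message:
        return None

    return "@%s (%s):\n%s\n" % (
        message["user"]["screen_name"],
        message["user"]["name"],
        message["text"],
    )
```

Inside on_data you could then call describe_message(data), print the result if it isn't None, and return True either way.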

And now we can run the stream!
Code:
# start the stream with our credentials
stream=Stream(auth,MyStream())
# filter by @users, #hashtags, or topics
stream.filter(track=['#python','Programming','@GolangBestLang'])

And there you have it! I'll do another tutorial (hopefully before I head out of town) on storing your data then analyzing it with Vega and Vincent.

[+] 1 user Likes Inori's post

RE: Data Mining Twitter with Tweepy #2
No Pseudo code, literally unplayable



RE: Data Mining Twitter with Tweepy #3
(08-22-2016, 05:08 AM)Killpot Wrote: No Pseudo code, literally unplayable

I was thinking about adding pseudocode, but it's hard to figure out when learning to use an API. Obviously, when teaching this, I'll be able to go one-on-one with the students and explain everything in more detail.
