I wanted to play around with some tweets, but I quickly discovered that getting my hands on a corpus is not that easy because of Twitter's terms of service. It is up to everyone to create their own corpus.
Register a Twitter application on https://dev.twitter.com/terms/api-terms. The application name is not important; you only need its credentials.
Download the latest version of twitter-sampler.
Download credentials.clj and fill in the blanks with the credentials of your application.
Run the following command:
java -jar twitter-sampler-1.0.0-SNAPSHOT-standalone.jar -c credentials.clj -n 1000 tweets.json
credentials.clj is the file containing your credentials,
1000 is the number of tweets you want to download and
tweets.json is the file where the tweets should be saved.
You should now have a corpus of tweets to play with.
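As a starting point for playing with the corpus, here is a minimal Python sketch for loading the saved file. The exact layout of tweets.json is not described above, so the sketch makes two assumptions: the file is either a single JSON array or one JSON object per line, and each tweet carries its message in a "text" field, as tweet objects from the Twitter API usually do. The names load_tweets and tweet_texts are hypothetical helpers, not part of twitter-sampler.

```python
import json
from pathlib import Path


def load_tweets(path):
    """Load tweets from a JSON file.

    Assumption: the file holds either one JSON document (an array of
    tweets or a single tweet) or one JSON object per line.
    """
    raw = Path(path).read_text(encoding="utf-8").strip()
    if not raw:
        return []
    try:
        # First try to parse the whole file as a single JSON document.
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        # Otherwise fall back to line-delimited JSON, skipping blank lines.
        return [json.loads(line) for line in raw.splitlines() if line.strip()]


def tweet_texts(tweets):
    """Return the message of every tweet that has one.

    Assumption: each tweet is a JSON object with a "text" field.
    """
    return [t["text"] for t in tweets if isinstance(t, dict) and "text" in t]
```

Calling tweet_texts(load_tweets("tweets.json")) should then give you a plain list of strings to experiment with. If the file turns out to use a different layout, only load_tweets needs to change.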