Levenshtein distance function for Pig and Hadoop

Need to do a whole slew of fuzzy string comparisons?  Have a Hadoop cluster at your disposal?  Use the above gist to give you a Levenshtein distance function that you can use within Pig.
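
The gist itself is not reproduced in this text.  As a rough illustration of what such a function computes, here is a minimal sketch of Levenshtein distance in plain Python.  It is not the Pig function from the gist, and the name levenshtein is just a placeholder for this example.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Only two rows of the dynamic-programming table are kept at a time, which matters when you are comparing a whole slew of strings.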

 


Measuring @twitterapi; per-tweet "delivery time" is approx. 16 ms

Assuming a model of

delivery time = constant time + (number tweets returned * time per tweet)

and based on data gathered from polling statuses/home_timeline from "similar" users at 15-second intervals, both with and without since_id, the computed median per-tweet "delivery time" of the Twitter API (excluding constant processing done on every request) is 16.425 ms.  The interquartile range (the second and third quartiles) is 15.400-17.638 ms.  I'm defining "per-tweet delivery time" as the time to do any per-tweet processing plus the time to send the tweet down the wire to the client.
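
One way to pull the two terms of that model apart is an ordinary least-squares line fit over (tweets returned, elapsed time) pairs.  The sketch below is only an illustration of the model: it is not necessarily how the median above was computed, and the sample values are made up (generated from a 120 ms constant and 16 ms per tweet), not measurements.

```python
def fit_delivery_model(samples):
    """Fit elapsed_ms = constant_ms + tweet_count * per_tweet_ms.

    samples is a list of (tweet_count, elapsed_ms) pairs.
    Returns (constant_ms, per_tweet_ms) from a least-squares fit.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    per_tweet_ms = sxy / sxx
    constant_ms = mean_y - per_tweet_ms * mean_x
    return constant_ms, per_tweet_ms


# Made-up data that follows the model exactly; real polls would be noisy.
samples = [(count, 120 + 16 * count) for count in (1, 5, 20, 50)]
constant_ms, per_tweet_ms = fit_delivery_model(samples)
```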

Read the rest of this post »


One year on @twitterapi

One year ago today I posted this tweet.


To add to my list of things to build in my copious spare time

Racer = RC car + wireless video and actuation control + lots of cardboard + video game cabinet.  Final product is an awesome recreation of a lo-fi video game.


Marin Century 2010

Rode the Marin Century this weekend: 6,800-7,000 feet of hills in damp, cold weather.  Loved it, except that I don't like climbing in massive groups.  Next year, the Mount Tam version is on my calendar.


iOS4 jailbreaking

I jailbroke my iPhone 4 running iOS 4.0.1 using jailbreakme last night.  And then I finally installed the apps to make my phone useful again -- from the photo:

  • Intelliscreen - gives me a lock screen that is actually useful!  It displays my calendar and the current weather right when I turn it on, and gives me some useful icons in the top bar;
  • MyWi - Turns on the ability to do tethering on your iPhone. I don't like doing tethering over WiFi to my phone (as some apps like PDANet will do), but MyWi will allow you to tether the "correct" way over the USB cable;
  • Five Icon Dock - to get one more app down there; and
  • Notified - logs all my notifications to give me historical access to those pop ups instead of the ephemeral way they are displayed today.

New cycling goal: Berkeley Hills Death Ride

I tried to ride the Berkeley Hills Death Ride last weekend - in a word: painful.  I only got through the first four of six hills (although, hill six requires some fire trailing - probably not the best on skinny tires).  It's now my "once a month" ride until I finish it.


Getting out and about for @twitterapi

I've been speaking at a few places recently about the @twitterapi, and I figure it's time to update some presentations.  Last week @themattharris and I spoke at The Hacker Dojo, covering some of the latest features that we've put out, as well as answering a bunch of questions around basic-auth shutdown.  It was a blast!  Great energy down there.  @themattharris put all his slides up so you can read them too:

Read the rest of this post »


OAuth 1.0a test strings for the taking - get them while they're hot

Say, hypothetically, you were re-factoring a lot of code that verifies OAuth 1.0a signatures.  You would probably want a comprehensive list of strings to test against, no?  @episod and @danadanger created the OAuth UTF-8 character map, and I adapted that into a YML file of just the test strings.  Hope you find it useful.  I'm bashing some code against it right now.  You get some stuff that looks like the following:

--- 
tests: 
  inputs: 
  - 23456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abc
  - !binary |
    ZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXp7fH1+f8KAwoHCgsKDwoTChcKGwofC
    iMKJworCi8KMwo3CjsKPwpDCkcKSwpPClMKV

  - !binary |
    wpbCl8KYwpnCmsKbwpzCncKewp/CoMKhwqLCo8KkwqXCpsKnwqjCqcKqwqvC
    rMKtwq7Cr8KwwrHCssKzwrTCtcK2wrfCuMK5wrrCu8K8wr3CvsK/w4DDgcOC
    w4PDhMOFw4bDhw==
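
If you want to bash your own code against these, the check is roughly: decode each string, percent-encode it the way OAuth 1.0a requires, and compare with what your library produces.  Here is a minimal sketch in Python using only the standard library.  The function name oauth_percent_encode is just a placeholder, and the base64 value is the last line of the final !binary entry above.

```python
import base64
from urllib.parse import quote


def oauth_percent_encode(text):
    """Percent-encode text as OAuth 1.0a expects: UTF-8 bytes, with
    everything except letters, digits, '-', '.', '_' and '~' escaped."""
    return quote(text.encode("utf-8"), safe="~")


# YAML stores the !binary entries as base64; decode one back to text.
raw = base64.b64decode("w4PDhMOFw4bDhw==").decode("utf-8")
encoded = oauth_percent_encode(raw)
```

Note that a space must come out as %20 rather than +, which is where a lot of general-purpose URL encoders trip up.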

 


Delegated identity verification for uploadAndPost (OAuth Echo for uploadAndPost)

OAuth Echo was born out of the need to delegate identity verification: it answers the question "how do you prove your identity to a third party service?"  This is becoming really important given that @twitterapi is shutting down basic auth pretty soon.  However, we focused only on getting the upload use case working (based on the TwitPic API).  We, unfortunately, didn't focus on uploadAndPost at all.  The nice thing about uploadAndPost is that it is a "fire and forget" API.  Processing the uploaded item may take some time, and with this API it can be done asynchronously from the Twitter client application.  So, it's time to specify how uploadAndPost will work.
 
(Note: this post is going to be specifically for Twitter; however, I am generalizing this enough so that it can be used by other services easily.  I'll post a more generic specification at a future date.)
 
As per my previous posts on OAuth Echo, there are four parties involved:
  • the User who is using Twitter through a particular Twitter application (and, presumably, has already OAuthorized Twitter through that application);
  • the Consumer, or the Twitter application that is attempting to interact with the 3rd party media provider (e.g. the photo sharing site);
  • the Delegator, or the 3rd party media provider; and
  • the Service Provider a.k.a. Twitter.
The challenge with uploadAndPost is that it requires that the text being posted from the Consumer be mutated on its way to the Service Provider:
  • the user wants to send a photo up to a media provider along with a "caption" for it;
  • the Delegator will store the photo, then generate a URL with it; and
  • the Delegator will send up to the Service Provider the caption with the URL appended to it.
 
However, the Consumer has a problem because it is charged with generating the signature, but it doesn't know the final caption for the signature because it doesn't know the URL.  This means Twitter is going to need to do some work.
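
To make the problem concrete, here is a minimal sketch of an OAuth 1.0a HMAC-SHA1 signature in Python.  Every key, secret, nonce, caption, and the media URL below is a placeholder invented for illustration.  The point is only that the status text is part of what gets signed, so a signature computed over the caption alone does not match one computed over the caption with a URL appended.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def enc(value):
    """Percent-encode as OAuth 1.0a requires (only unreserved chars stay)."""
    return quote(value.encode("utf-8"), safe="~")


def hmac_sha1_signature(method, url, params, consumer_secret, token_secret):
    """Sign a request; params holds the OAuth and body parameters."""
    encoded = sorted((enc(k), enc(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    base_string = "&".join([method.upper(), enc(url), enc(normalized)])
    key = f"{enc(consumer_secret)}&{enc(token_secret)}"
    digest = hmac.new(key.encode("ascii"), base_string.encode("ascii"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


# Placeholder values, for illustration only.
oauth = {
    "oauth_consumer_key": "consumer-key",
    "oauth_token": "user-token",
    "oauth_signature_method": "HMAC-SHA1",
    "oauth_timestamp": "1280000000",
    "oauth_nonce": "nonce123",
    "oauth_version": "1.0",
}
url = "http://api.twitter.com/1/status/update.json"
caption = "Sunset over the bay"
final_text = caption + " http://media.example/abc"  # URL the Consumer cannot know

sig_caption = hmac_sha1_signature(
    "POST", url, {**oauth, "status": caption}, "cs", "ts")
sig_final = hmac_sha1_signature(
    "POST", url, {**oauth, "status": final_text}, "cs", "ts")
```

The two signatures differ, so the Service Provider has to verify against the text the Consumer actually signed and treat the appended URL separately.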
 
So, here it goes: the User wants to upload a photo.  The Consumer is going to call uploadAndPost on the Delegator with a POST.  The POST should contain the image, but it should also contain two or three additional items as headers or as POST parameters (I apologise for the multiple names for some of the parameters; I'm attempting to clean up, in retrospect, the OAuth Echo parameters for upload):
  • X-OAuth-Endpoint or X-Auth-Service-Provider (if using POST parameters, then x_oauth_endpoint or x_auth_service_provider): effectively, this is the endpoint to call on Twitter.  For example, this could be http://api.twitter.com/1/status/update.json;
  • X-OAuth-Payload (x_oauth_payload): optional, but needed if parameters need to be passed to the X-OAuth-Endpoint.  These parameters should be form-URL-encoded; and
  • X-OAuth-Authorization or X-Verify-Credentials-Authorization (x_oauth_authorization or x_verify_credentials_authorization): the Consumer should create all the OAuth parameters necessary to make an OAuth request to the X-OAuth-Endpoint with the X-OAuth-Payload.  Effectively, if the Consumer were calling Twitter directly, this would be the OAuth header that it would send along (e.g. it should look like 'OAuth oauth_consumer_key="...", oauth_token="...", oauth_signature_method="...", oauth_signature="...", oauth_timestamp="...", oauth_nonce="...", oauth_version="..."').
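
As a rough sketch of the Consumer side, the headers listed above could be assembled like this in Python.  The helper names (oauth_header, build_echo_headers) and all parameter values are placeholders for illustration, and the OAuth parameters are assumed to have been computed already, including the signature.

```python
from urllib.parse import quote, urlencode


def oauth_header(oauth_params):
    """Format OAuth parameters the way an Authorization header carries them."""
    def enc(value):
        return quote(str(value), safe="~")
    pairs = ", ".join(
        f'{enc(name)}="{enc(value)}"'
        for name, value in sorted(oauth_params.items())
    )
    return "OAuth " + pairs


def build_echo_headers(endpoint, oauth_params, payload=None):
    """Build the extra headers sent to the Delegator along with the image."""
    headers = {
        "X-OAuth-Endpoint": endpoint,
        "X-OAuth-Authorization": oauth_header(oauth_params),
    }
    if payload:
        # Optional: form-URL-encoded parameters for the endpoint.
        headers["X-OAuth-Payload"] = urlencode(payload)
    return headers


# Placeholder values, for illustration only.
headers = build_echo_headers(
    "http://api.twitter.com/1/status/update.json",
    {"oauth_nonce": "abc", "oauth_consumer_key": "key"},
    payload={"status": "My caption"},
)
```

The image itself would go in the POST body as usual; only the three extra items are shown here.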
 
The Delegator, at this point, needs to generate a URL for the media that has been uploaded (whether or not it stores the media at this point is left up to the implementor).  Unlike the upload case, the Delegator will presumably decide whether the media should be stored based on whether calling the X-OAuth-Endpoint succeeds (if the call is denied, due to bad credentials or for any other reason, the Delegator will presumably drop the image).
 
To continue the call chain, the Delegator should call what was specified in X-OAuth-Endpoint (presumably, it should verify that it is a Twitter.com endpoint and that it is not simply being asked to call arbitrary URLs on the web), and it should use the value passed in X-OAuth-Authorization as its Authorization header.  To specify the URL to append to the message, it should also include an X-OAuth-Append-Payload header, and Twitter will append that value to the end of the status text in the case of calling http://api.twitter.com/1/status/update.json.
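
A minimal sketch of the Delegator side, in Python, might look like the following.  It only builds a description of the forwarded request and does not send anything.  The allowed-host set, the function name build_forward_request, and the choice to forward X-OAuth-Payload as the POST body are all assumptions made for this example, since none of this is implemented yet.

```python
from urllib.parse import urlsplit

# Assumption: the only host the Delegator is willing to call.
ALLOWED_HOSTS = {"api.twitter.com"}


def build_forward_request(echo_headers, media_url):
    """Describe the request the Delegator would send to the Service Provider.

    Raises ValueError if the requested endpoint is not on an allowed host,
    so the Delegator cannot be used to call arbitrary URLs.
    """
    endpoint = echo_headers["X-OAuth-Endpoint"]
    parts = urlsplit(endpoint)
    if parts.scheme not in ("http", "https") or parts.hostname not in ALLOWED_HOSTS:
        raise ValueError("refusing to call endpoint: " + endpoint)
    return {
        "method": "POST",
        "url": endpoint,
        "headers": {
            # Pass the Consumer's OAuth header through untouched.
            "Authorization": echo_headers["X-OAuth-Authorization"],
            # The media URL for Twitter to append to the status text.
            "X-OAuth-Append-Payload": media_url,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        # Assumption: the payload travels as the form-encoded POST body.
        "body": echo_headers.get("X-OAuth-Payload", ""),
    }


# Placeholder values, for illustration only.
request = build_forward_request(
    {
        "X-OAuth-Endpoint": "http://api.twitter.com/1/status/update.json",
        "X-OAuth-Authorization": 'OAuth oauth_consumer_key="key"',
        "X-OAuth-Payload": "status=My+caption",
    },
    "http://media.example/abc",
)
```

Checking the parsed hostname, rather than doing a substring match on the URL, keeps look-alike hosts such as api.twitter.com.evil.example from slipping through.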
 
This is not yet implemented on the Twitter side, but it will hopefully be soon.  I'm definitely soliciting feedback.