Levenshtein distance function for Pig and Hadoop
Need to do a whole slew of fuzzy string comparisons? Have a Hadoop cluster at your disposal? Use the above gist to give you a Levenshtein distance function that you can use within Pig.
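For readers who have not seen the algorithm, here is a minimal Python sketch of Levenshtein distance. It is not the gist's code, and the function name `levenshtein` is my own choice for illustration; it only shows the standard dynamic-programming approach such a function usually takes.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # Keep the shorter string as the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]
```

Only two rows of the table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.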
Comments [0]
Assuming a model of
delivery time = constant time + (number tweets returned * time per tweet)
and based on data gathered by polling statuses/home_timeline for "similar" users at 15-second intervals, both with and without since_id, the computed median per-tweet "delivery time" for the Twitter API (excluding the constant processing done on every request) is 16.425 ms. The middle two quartiles span 15.400 to 17.638 ms. I'm defining "per-tweet delivery time" as the time to do any per-tweet processing plus the time to send the tweet down the wire to the client.
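One way to get a per-tweet figure out of that model is sketched below in Python. This is an assumption about how such an estimate could be computed, not the analysis actually used for the numbers above; the function names `per_tweet_ms` and `summarize` are hypothetical. The idea is that subtracting two requests with different tweet counts cancels the constant term.

```python
import statistics

def per_tweet_ms(small, large):
    """Estimate time per tweet from two (tweet_count, elapsed_ms) samples.

    Under delivery_time = constant + count * per_tweet, the constant
    cancels when one request is subtracted from the other.
    """
    (count_1, elapsed_1), (count_2, elapsed_2) = small, large
    if count_1 == count_2:
        raise ValueError("samples need different tweet counts")
    return (elapsed_2 - elapsed_1) / (count_2 - count_1)

def summarize(estimates):
    """Return (median, (first_quartile, third_quartile)) of the estimates."""
    first_quartile, median, third_quartile = statistics.quantiles(estimates, n=4)
    return median, (first_quartile, third_quartile)
```

Here a request made with since_id (few tweets returned) could be paired with one made without it (many tweets returned), and the estimates from many such pairs summarized.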
Comments [0]
Racer = RC car + wireless video and actuation control + lots of cardboard + video game cabinet. Final product is an awesome recreation of a lo-fi video game.
Comments [0]
I jailbroke my iPhone 4 running iOS 4.0.1 using jailbreakme last night. And then I finally installed the apps to make my phone useful again -- from the photo:
Comments [0]
I've been speaking at a few places recently about the @twitterapi, and I figure it's time to update some presentations. Last week @themattharris and I spoke at The Hacker Dojo, covering some of the latest features that we've put out and answering a bunch of questions about the basic-auth shutdown. It was a blast! Great energy down there. @themattharris put all his slides up so you can read them too:
Comments [3]
Say, hypothetically, you were refactoring a lot of code that verifies OAuth 1.0a signatures. You would probably want a comprehensive list of strings to test against, no? @episod and @danadanger created the OAuth UTF-8 character map, and I adapted that into a YAML file of just the test strings. Hope you find it useful. I'm bashing some code against it right now. You get some stuff that looks like the following:
---
tests:
  inputs:
  - 23456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abc
  - !binary |
    ZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXp7fH1+f8KAwoHCgsKDwoTChcKGwofC
    iMKJworCi8KMwo3CjsKPwpDCkcKSwpPClMKV
  - !binary |
    wpbCl8KYwpnCmsKbwpzCncKewp/CoMKhwqLCo8KkwqXCpsKnwqjCqcKqwqvC
    rMKtwq7Cr8KwwrHCssKzwrTCtcK2wrfCuMK5wrrCu8K8wr3CvsK/w4DDgcOC
    w4PDhMOFw4bDhw==
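Strings like these are mostly useful for exercising percent-encoding, which is where OAuth 1.0a signature code tends to break. Below is a minimal Python sketch of the encoding rule as I understand it from the OAuth 1.0a spec (encode as UTF-8, leave only letters, digits, "-", ".", "_" and "~" unescaped). The helper name `oauth_percent_encode` is hypothetical, and this is not the code I'm testing.

```python
import base64
from urllib.parse import quote

def oauth_percent_encode(value) -> str:
    """Percent-encode a str or bytes value the way OAuth 1.0a expects.

    Everything except unreserved characters is escaped as uppercase %XX
    over the UTF-8 bytes. safe="~" keeps the tilde unescaped explicitly.
    """
    return quote(value, safe="~")

# The !binary entries are base64, so decode them before encoding.
sample = base64.b64decode("ZGVmZ2hp").decode("utf-8")
encoded = oauth_percent_encode(sample)
```

A test harness could loop over every entry in the file and compare the output with what the code under test produces.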
Comments [0]