Archive for May, 2010

Hey Mark, Stop Acting Like An Asshole

Tuesday, May 25th, 2010

There are a lot of aspects of the recent Facebook privacy debacle that evoke hostility towards the website. A shift in privacy context and the assumed consent of users likely top the list for most. However, I believe that Mark Zuckerberg and company are doing a lot to take what could have been a heated public debate about the nature of privacy online and turn it into a national flamewar. What are they doing exactly? They are acting like condescending assholes.

(more…)

Body Text Extraction

Thursday, May 6th, 2010

A while back, when I was a Berkman intern, fellow Berktern Brian Young and I spent an afternoon modifying a Python script called BTE (Body Text Extraction). The script is an automated way to pull out the principal portion of text (the body) of an HTML document, and works by finding the portion of the document that has the highest ratio of text to tags. At the time I was interested in using BTE in a web application to do real-time body extraction, which meant I needed something that was fast. BTE wasn’t quite fast enough, so Brian and I made it faster (for the nerdy among you, we improved BTE from O(n³) to O(n²)).
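For the curious, here is a rough sketch of the general idea. It is not BTE's actual code. The tokenizer, the function names, and the exact scoring rule (count the words inside a span plus the tags outside it, rather than a literal ratio) are assumptions made for illustration.

```python
import re

# Hypothetical tokenizer: splits HTML into tag tokens and word tokens.
# This is an illustrative assumption, not BTE's real parser.
TOKEN_RE = re.compile(r"<[^>]*>|[^<\s]+")


def tokenize(html):
    return TOKEN_RE.findall(html)


def extract_body(html):
    """Return the words of the token span that is richest in text.

    Assumed scoring rule: a span [i, j) scores one point for every word
    inside it and one point for every tag left outside it.
    """
    tokens = tokenize(html)
    n = len(tokens)

    # tags_before[k] = number of tag tokens among tokens[:k] (prefix sums).
    tags_before = [0] * (n + 1)
    for k, token in enumerate(tokens):
        tags_before[k + 1] = tags_before[k] + (1 if token.startswith("<") else 0)
    total_tags = tags_before[n]

    best_score, best_span = -1, (0, 0)
    for i in range(n + 1):
        for j in range(i, n + 1):
            tags_inside = tags_before[j] - tags_before[i]
            words_inside = (j - i) - tags_inside
            score = words_inside + (total_tags - tags_inside)
            if score > best_score:
                best_score, best_span = score, (i, j)

    i, j = best_span
    return " ".join(t for t in tokens[i:j] if not t.startswith("<"))
```

Because the prefix sums make each candidate span a constant-time lookup, the two nested loops cost O(n²) overall. Recounting the tokens for every span would cost O(n³). Whether this is the same trick used in the real BTE speedup is an assumption here, not something taken from the BTE source.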

So why am I posting this now? Well, Brian and I contacted BTE’s author, Aidan Finn, regarding the changes, and Aidan has recently incorporated them into the official code.

Big thanks go to Brian. I couldn’t have done the coding myself (I didn’t and still don’t know Python), and while at the end we weren’t sure who did what, I’m sure Brian’s 1337 computer hacking skills were far greater than mine.

More information on BTE can be found here, and the code can be found at GitHub.