Posts Tagged ‘Berkman Center’

Body Text Extraction

Thursday, May 6th, 2010

A while back, when I was a Berkman intern, fellow Berktern Brian Young and I spent an afternoon modifying a Python script called BTE (Body Text Extraction). The script automatically pulls out the principal portion of text (the body) of an HTML document. It works by finding the portion of the document that has the highest ratio of text to tags. At the time I was interested in using BTE in a web application to do real-time body extraction, which meant I needed something fast. BTE wasn’t quite fast enough, so Brian and I made it faster (for the nerdy among you, we improved BTE from O(n³) to O(n²)).
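For the curious, here is a minimal Python sketch of the kind of search BTE performs. It is not the BTE source, and it is not necessarily how our patch was written. The tokenizer (`TOKEN_RE`, `tokenize`) and the name `extract_body` are hypothetical. The scoring follows my understanding of the BTE rule: choose the window of tokens that maximizes the number of tags outside it plus the number of words inside it. Precomputing running tag counts turns each window's score into an O(1) lookup, so trying every window costs O(n²) rather than O(n³).

```python
import re

# Hypothetical tokenizer: each HTML tag is one token, and each run of
# non-space, non-"<" characters is one word token.
TOKEN_RE = re.compile(r"<[^>]*>|[^<\s]+")


def tokenize(html):
    """Return a list of (token, is_tag) pairs."""
    return [(t, t.startswith("<")) for t in TOKEN_RE.findall(html)]


def extract_body(html):
    """Sketch of a BTE-style search: keep the text of the best token window."""
    tokens = tokenize(html)
    n = len(tokens)
    if n == 0:
        return ""

    # prefix[k] = number of tag tokens among tokens[0:k]
    prefix = [0] * (n + 1)
    for k, (_, is_tag) in enumerate(tokens):
        prefix[k + 1] = prefix[k] + (1 if is_tag else 0)
    total_tags = prefix[n]

    best_score, best_i, best_j = -1, 0, 0
    # Try every window [i, j). Thanks to the prefix sums each score is O(1),
    # so this double loop is O(n^2) overall.
    for i in range(n):
        for j in range(i + 1, n + 1):
            tags_inside = prefix[j] - prefix[i]
            words_inside = (j - i) - tags_inside
            tags_outside = total_tags - tags_inside
            score = tags_outside + words_inside
            if score > best_score:
                best_score, best_i, best_j = score, i, j

    # Return only the word tokens inside the winning window.
    return " ".join(t for t, is_tag in tokens[best_i:best_j] if not is_tag)
```

On a small page where navigation and footer links each sit inside several tags, the sketch keeps only the paragraph text, because pulling in a lone link word would also pull in the tags around it and lower the score. Recomputing the tag count for every window from scratch is what makes the naive version O(n³).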

So why am I posting this now? Well, Brian and I contacted BTE’s author, Aidan Finn, about the changes, and Aidan has recently incorporated them into the official code.

Big thanks go to Brian. I couldn’t have done the coding myself (I didn’t, and still don’t, know Python), and while by the end we weren’t sure who did what, I’m sure Brian’s 1337 computer hacking skills were far greater than mine.

More information on BTE can be found here, and the code can be found at GitHub.

If All The World Is A Stage Then I Have Front Row Seats

Thursday, July 2nd, 2009

Of all the changes that have occurred since I started interning at Berkman, the most profound has to be my outlook on media and the news. Having lived in rural Upstate New York for most of my life, with a brief stint in rural Virginia, I am used to the news reporting on a distinct and distant world. I would watch the news and be interested, but rarely would I find that my personal life or the lives of those around me were directly affected by what I had watched. If effects were felt, they came weeks or months after the fact, which led to the feeling that there was no connection between the world I lived in and the world the news reported on. It was as if the world of the news was contained entirely in the glass tube of my television.

Now I have moved to Boston, and everything has changed. Berkman is a place where one is enveloped in news. Some of the world’s brightest minds are here researching the spread of news on Internet services such as Twitter, cataloging news articles from all over the world, and figuring out how to make as much quality and free (as in speech) news accessible to as many people as possible. All of this makes Berkman a great place to be when news hits. Discovering the news becomes more than just watching CNN or cruising Google News; it becomes observing the environment around me. The center develops a controlled frenzy: academic conversations start just minutes after a story breaks, and planning begins on blog posts, reports, and research projects. In short, tangible changes occur around me. Hell, even my day-to-day life has been affected by the news.

The changes go beyond how the news affects my world, as I have also seen my world affect the news. I have seen an Internet expert speak on national television just hours after I attended a presentation he gave. I’ve had the privilege of witnessing the release of a report (along with its on-the-fly media campaign) responding to the media’s obsession with Twitter and current events. And pretty much any time there is a breaking story, I can expect to see articles from Berkman fellows and staff analyzing the story or the way it spread.

All in all, I feel like I have gone from the nosebleeds to the front row of this production known as “The News.” I don’t think I paid for front-row tickets; I must have snuck up here. I’ll have to keep my eyes open for the ushers.

The First Month

Monday, June 29th, 2009

It’s the end of June, which means it is the end of my first month of working on TermsWatch. So what has happened in this first month?

Enter the EFF

The first few days of the month were spent settling in at Berkman and getting my new(ish) laptop ready for work. By Thursday I was ready to get down to business. Well, it just so happened that Thursday was launch day for the EFF’s latest project, TOSBack. I spent Thursday afternoon playing around with TOSBack and finding out as much as I could about it. I then spent the next few days running around like a chicken with its head cut off, because TOSBack already does half of what I was planning to do. I thought to myself, “Well, what do I do now?”

Fortunately there are cooler heads than I at Berkman, and they decided it was best to give the EFF a call. We had a few phone calls with Tim Jones, Activism & Technology Manager at the EFF and the man behind TOSBack. Turns out Berkman and the EFF have a lot of similar hopes and dreams regarding a service such as TOSBack. It also turns out that Tim was about to go on vacation for the summer, so nobody was going to be working on the project for a couple of months. All in all, this turned out to be a great opportunity for everybody involved and it was agreed that I would spend the summer working on the TOSBack code.

Symfony & Text Extraction

With the TOSBack code in hand I went to work. The first order of business was to port TOSBack over to Symfony, a web application framework. A framework such as Symfony has several advantages, including taking care of some tedious aspects of creating an application, such as checking input for security issues and generating administration pages. All in all this was a fairly painless process.

The latest, and current, problem I have been tackling is how to extract the important information from a web page. Fortunately, as is becoming a common occurrence this summer, it turns out there are quite a few bright people in our small area who have done work like this, and they are all easy to talk to. I’ve spent nearly a week talking to them, gaining insight into various approaches and understanding exactly what it is I need to do. I now have a pretty good idea of what I am going to do (for those interested in the technical details, check out this summary of the extractor), and with a new week on the horizon, I hope to get this thing working quickly.

My First (Half) Week at Berkman

Wednesday, June 3rd, 2009

It’s Wednesday night and I have just finished my third day as a Berkman intern. The last three days have been a whirlwind compared to my previous ten months of lounging around in New Paltz. I’m not sure where to begin, so I’ll ask the random number generator to give me a number (4). Now I’ll feed that number into my where-to-begin function and wait a bit (it is a surprisingly inefficient function), aaaaaannnnddd, ahh, my project.

I think I am in the strangest position of all the interns. While most (all?) of the other interns are working for existing projects, I am basically starting my own. This means I am pretty independent and self-directed. A few of the interns have asked me the question ‘Where will you be working this summer?’ To be honest, I can basically work wherever I want. I’ve already been scoping out my potential work areas, weighing their pros and cons, and I think I like my options.

As far as actual progress on the project goes, today was the first day that I got real work done. You might be able to consider yesterday work; I spent the day setting up my new laptop to be my development environment. But today was really the first day I did work on the project itself. That work included creating a rough draft of a design document and looking through some code I wrote back in February when I first thought of this program. I look forward to showing both to a number of people at the center, getting feedback, and refining both.

I’ll wrap this post up the same way I wrapped up my day: with my fellow interns. At the end of the day we had our first ‘intern hour,’ a time when all of the interns get together to talk, to present, or to be presented to. I’m not going to lie: in the past I have found these sorts of “intern activities” to be boring and forced. This activity, however, was surprisingly engaging. We were given control over the discussion, and we took that control and ran. We went all over the place: planning a discussion on Twitter, talking about music, movies, books, and future Google killers, and asking what else we can do to benefit both our academic community and our physical community. It was a great discussion, and I think it was a great reflection on the group. While I have only known the other interns for a few days (which basically means I don’t know them at all), this discussion reminded me that this is a special group of very bright, very motivated, and very moral people.

I’m glad I’m here. I am going to enjoy this summer.

Interning at Harvard

Tuesday, May 19th, 2009

Yup, via a chain of events I have landed an internship at The Berkman Center, a center within the Harvard Law School. Starting June 1st I will be developing TermsWatch, a web service that will provide notification of updates to, and plain English explanations of, those Terms of Use and Terms of Service agreements (Terms) that every website and piece of software makes you consent to.

The whole thing started back in February when Facebook updated its Terms of Use. The update occurred on February 4th, but nobody noticed the changes until the 15th (keep in mind that Facebook has around 175 million active users).

Furthermore, Facebook’s Terms of Use included an implied consent clause regarding changes. As many as 175 million users consented to the February 4th changes completely unaware that they were consenting to anything or that any change had occurred. This lack of notice presented an obvious problem, so I began to think about a program that would monitor Facebook’s Terms of Use and alert individuals when a change was detected.

Before I could begin working on the program, Facebook implemented its new, democratic process for updating its Terms (now called the Statement of Rights and Responsibilities). Satisfied that notice was now being given, I joined many other Facebook users in commenting on the proposed Statement of Rights and Responsibilities. While it was great that users were given a voice in the process, it also became clear that most of us (myself included) have no idea what much of the language in the Statement means, or why it needs to be included. One particular clause angered and disturbed a lot of users by granting Facebook the right to transfer and sublicense its ability to reproduce and modify users’ content. Fortunately, a number of people were able to explain why such a clause is required (so Facebook can allow third-party applications to access and use its users’ data). Still, it became obvious that the dense legalese of the Statement of Rights and Responsibilities is too difficult for most people (aside from legal experts and professionals) to read and understand.

It was about this time that Google opened the application process for its Summer of Code. At first I scanned through the list of project ideas related to technology and society (my emerging area of interest), but after a while I thought it would be neat to work on a generalized version of the program I had thought of in February, one that would monitor a site’s (or software provider’s) Terms for changes. I also thought about how hard Terms are to read and decided that the program would be much more useful if it included a way for legal experts to attach plain English explanations to the Terms. With all this in mind, I wrote an application for the Summer of Code.

All Summer of Code projects need to be written for one of the participating mentoring organizations. Since the project seemed a perfect fit for the Berkman Center, I applied with Berkman as my mentoring organization. Because I had been part of last year’s Summer of Code, I figured I would easily be part of this year’s, so I sat back and waited for the good word. Less than four days after the submission deadline, I got the not-so-good word that my application was considered ineligible. But no sooner had I found out about the ineligibility than I received an email from the Berkman Center: they loved the idea, and they asked me to apply to their internship program and spend the summer in Cambridge. Well, I couldn’t say no to that, so I applied. The process went smoothly, I was accepted, and now I am starting to pack, because I have to move in less than two weeks.