Retrotag Your Weblog - Tagging with the Yahoo! and Tagyu APIs (part 1)

Published in Application Programming Interfaces on Wednesday, November 23rd, 2005

Many people had blogs before tagging content caught on. Like many others, I will be implementing tags and would like to 1. retrotag old posts and 2. come up with a decent set of tags to use in the process. Here, in a two-part series, we look at two API-driven solutions for building a set of tags and getting past content up to speed.

Part two will be delayed until tomorrow (the 25th), as there seems to be some weird error cropping up for only that post. Strange but true; perhaps in the light of day tomorrow the problem will resolve itself :-)

Preamble

A little twist for this API review. After the last three API posts ran a little long, I thought I'd break this up into two: today the general ideas, and tomorrow the code and the breakdown. Enjoy!

The tag generation

It has been proclaimed that Everyone Must Have Tags! It took me a while before I really felt that tags would be more useful than, say, multiple categories. However, when I finally thought of tags from a meta-keyword standpoint, and the idea of people using tags for discovery, it became clear to me that they could be useful on Fiftyfoureleven, by providing more meta-data and greater flexibility than simply having the categories.

The hurdle to implementing tags became the old content: over 250 resources and 110 posts. Though that is not many by some standards, retrofitting tags to my old posts was still going to take some time. Fortunately there are two API-powered solutions out there that can help me get things done a bit quicker:

  1. The Yahoo! term extraction API
  2. Tagyu - Tag suggestions for content

The same, but different

The APIs being offered by Yahoo! and Tagyu are somewhat the same, but the services are fundamentally different.

The same

From the API viewpoint, this instalment branches out from the comfort of sending GET-based requests to using cURL to send POST requests. This in itself is not too difficult, but many folks who use PHP for building dynamic websites may not have needed to dive into cURL.
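To make the GET versus POST distinction concrete, here is a minimal sketch of building a form-encoded POST request. It is written in Python rather than PHP purely for illustration, and the endpoint URL and the `appid` and `context` parameter names are assumptions made up for the example, not the documented parameters of either service. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/term-extraction"


def build_post_request(text, app_id="my-app-id"):
    """Build (but do not send) a form-encoded POST request carrying the post text."""
    # With POST the content travels in the request body, not in the URL.
    # Parameter names here are assumptions, not a real service's API.
    body = urlencode({"appid": app_id, "context": text}).encode("utf-8")
    return Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_post_request("Retrotagging old weblog posts with tag suggestion APIs")
print(request.get_method())  # POST
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. POST is a natural fit here because a whole blog post is usually too long to squeeze into a query string.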

The difference

The two services that we will be using to build our tagset and tag up our posts are not the same. Yahoo! provides a term extraction service, looking at the content and extracting what it deems to be key terms.

Tagyu, on the other hand, is a purpose-built tag suggestion tool. The downside of Tagyu is that it's not quite ready for free public usage. That is no reflection on its tag suggestions, which are quite good, but at the time of writing you can only ping the Tagyu server once a minute from the same IP address. Developer accounts, available sometime in the future, will eventually allow up to 1000 queries a day with no rate restrictions.

Making it happen

The idea is quite simple. Start by retrieving the data for an entry from your website's database, send that data as a POST request to the API service, receive the XML response, unserialize it and store the result in a database along with the respective post id.
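As a rough sketch of the "receive the XML, unserialize it, pair it with the post id" part, the following Python parses a response and produces rows ready for storage. The XML shape (`ResultSet` and `Result` elements) is invented for the example; a real service defines its own element names, and the function names are hypothetical too.

```python
import xml.etree.ElementTree as ET

# Made-up response shape, for illustration only.
SAMPLE_RESPONSE = """<ResultSet>
  <Result>php</Result>
  <Result>curl</Result>
  <Result> PHP </Result>
</ResultSet>"""


def extract_tags(xml_text):
    """Return the suggested tags, lower-cased and de-duplicated, in response order."""
    root = ET.fromstring(xml_text)
    tags = []
    for node in root.iter("Result"):
        tag = (node.text or "").strip().lower()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def rows_for_post(post_id, xml_text):
    """Pair each suggested tag with the post id, ready to insert into a table."""
    return [(post_id, tag) for tag in extract_tags(xml_text)]


print(rows_for_post(42, SAMPLE_RESPONSE))
```

Normalising case and whitespace at this point is a design choice, not something either API requires; it simply keeps near-duplicate suggestions from piling up later.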

While this was "good on paper", I realized that some serious thinning of the suggested tags would be necessary in order to get the job done right. This was more so in the case of Yahoo!, which suggested upwards of 2400 terms, compared to Tagyu, which responded with about 240.

So in the end, the process involved the following:

  1. Select from my DB all of the content that I want tagged.
  2. Build a request and send it to the API server for each piece of content.
  3. Take the returned data, unserialize it, and store the suggested tag values along with the relevant post id in a table - this will eventually be our lookup table for post and tag ids.
  4. Query the data from the lookup table for distinct tag values, and store these in another table - this will be our main tag table.
  5. Work through the tag table, deleting suggestions that you feel aren't necessary. This hand edit is quite useful but can be taxing: there were upwards of 2400 entries in the Yahoo! suggestions!
  6. Once you have settled on your tagset, run it against your lookup table, deleting tags that you no longer want to use, and then in the end replace the tag values with their respective tag ids.
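Steps 3 to 6 above can be sketched with an in-memory SQLite database. This is only an illustration in Python under assumed names: the `post_tags` and `tags` tables and their columns are hypothetical, and the hand edit from step 5 is stood in for by a plain list of rejected tags. Rather than overwriting the tag values, the sketch adds a `tag_id` column next to them.

```python
import sqlite3


def build_tag_tables(suggestions, rejected):
    """suggestions: (post_id, tag) pairs from the API; rejected: tags thinned out by hand."""
    db = sqlite3.connect(":memory:")
    # Step 3: the lookup table, holding raw tag values to begin with.
    db.execute("CREATE TABLE post_tags (post_id INTEGER, tag TEXT)")
    db.executemany("INSERT INTO post_tags VALUES (?, ?)", suggestions)
    # Step 4: the main tag table, built from the distinct suggested values.
    db.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    db.execute("INSERT INTO tags (name) SELECT DISTINCT tag FROM post_tags")
    # Step 5: the hand edit, represented here by a list of rejected tags.
    db.executemany("DELETE FROM tags WHERE name = ?", [(t,) for t in rejected])
    # Step 6: drop lookup rows whose tag did not survive, then attach tag ids.
    db.execute("DELETE FROM post_tags WHERE tag NOT IN (SELECT name FROM tags)")
    db.execute("ALTER TABLE post_tags ADD COLUMN tag_id INTEGER")
    db.execute(
        "UPDATE post_tags SET tag_id = "
        "(SELECT id FROM tags WHERE name = post_tags.tag)"
    )
    return db


db = build_tag_tables(
    [(1, "php"), (1, "website"), (2, "php"), (2, "curl")],
    rejected=["website"],
)
print(db.execute(
    "SELECT p.post_id, t.name FROM post_tags p "
    "JOIN tags t ON t.id = p.tag_id ORDER BY p.post_id, t.name"
).fetchall())
```

Keeping the raw suggestions in the lookup table until the tagset is settled means the thinning in step 5 can be repeated without going back to the API.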

The results

Having run the above process for both APIs, it's clear that the Tagyu data is much better suited to auto-tagging of posts. The Yahoo! term extraction did offer many of the same suggestions, but unfortunately it also offered many, many more. In the end it was too much noise.

At the moment tags haven't been enabled on the site. Posts are tagged; however, there remain other updates to finish before they go live.

The code

Stay tuned tomorrow for the rundown on working with the APIs outlined above. In the meantime (because we all have soo much spare time), you can get caught up on cURL here (10 minute read) and here (15 minute read). Skimming one of those articles will definitely give you a head start on the second part of this article.

Comments and Feedback

Money! Been looking for an automated service to retrotag for a long time. Hurry up with that tutorial, will ya?

Haha, it's on the way. One more sleep :-)

Great stuff Mike- you test out these developer gadgets so we don't have to endure the sucky ones :)

Would comparing results from both sources and only including matches produce a more concise tag set or would it merely blow out response/processing time? Hmmm...

Andrew, interesting idea.

I had to repeat it to myself three or four times as my flu-foggy brain took a while to catch on :-)

I'll suggest that in the next article, as I don't have time to try it out myself. In addition, I think some people will be more liberal with tags than others, so in the end it takes some playing around with to get it right anyways...

Hey all, part two will be a day late, as it seems that there is a weird 500 error cropping up for only that post. Not quite sure why, works on my local copy of the site, but not up here... Sorry for the delay!

A day late? I want my pesetas back.

Great info. Thanks!

Tom
