Punting on Feedzirra Dependencies

June 29, 2009 — Code

After watching Ryan Bates’ video on feed parsing I decided to give Feedzirra a try. Though I did eventually get the Feedzirra gem working, rounding up and installing the dependencies was painful enough that I decided not to use it.

If you’re having trouble installing Feedzirra I encourage you to evaluate whether you really need it. In one application I needed to parse and display a Twitter feed and did the following to accomplish most of what was covered in Railscast #168.

1. Install libxml-ruby

The libxml-ruby parser is quite fast. If you’re using REXML or hpricot to parse XML you should consider switching as the difference in speed is significant. To install, add the following to config/environment.rb in your Rails app:


config.gem "libxml-ruby", :lib => "xml"

and then run:


sudo rake gems:install

or if you wish to install manually:


sudo gem install libxml-ruby

The only dependency that might not be found in a standard *nix setup is the libxml development files, so you may need to install a package called libxml2-dev (on Ubuntu) or libxml-devel (on CentOS).

2. Create a Model For Your Feed

You need to create a model to store your feeds in the database, so use the Rails generator script:


script/generate model Tweet \
  guid:string url:string body:text published_at:datetime

and then add this method to the skeleton ActiveRecord model:


def self.update_from_feed

  # fetch XML document
  uri = URI.parse("http://twitter.com/...")
  res = Net::HTTP.start(uri.host, uri.port){ |http| http.get(uri.path) }
  
  # quit if non-2xx HTTP response
  return unless res.code[0...1] == "2"
  
  # parse and save
  doc = XML::Parser.string(res.body).parse
  doc.find("/rss/channel/item").each do |item|
    guid = item.find_first("guid").content
    unless exists? :guid => guid
      create!(
        :body         => item.find_first("description").content,
        :url          => item.find_first("link").content,
        :published_at => item.find_first("pubDate").content,
        :guid         => guid
      )
    end
  end
end

3. Try it out

Make sure to run the migration to create the tweets table:


rake db:migrate

then start a console session:


script/console

and fetch/parse/save your tweets:


Tweet.count             # should return 0
Tweet.update_from_feed  # fetches tweets
Tweet.count             # should return n > 0

Your tweets table should now be populated with the latest data.

Note that although libxml-ruby is fast, you should be aware of the payload you’re receiving on each update and think about how often you can afford to fetch and parse a multi-megabyte feed.


comments powered by Disqus