Saturday, January 29, 2011

Free PHP Rotten Tomatoes Scraping API

Following my previous posts on PHP and ASP.NET IMDb scraping APIs, I have written one more movie information scraping API, this time for RottenTomatoes.com. Rotten Tomatoes is one of the most popular movie review websites, and it provides a comprehensive movie rating system involving critics from the popular news networks.

This API scrapes many attributes from a Rotten Tomatoes movie page and can also scrape the list of newly released DVDs. You can extend the PHP code yourself to add more functionality, or drop a comment here and I’ll try to help you as much as I can.

Here is a list of all the attributes it scrapes from a Rotten Tomatoes page:

  1. TITLE
  2. YEAR
  3. POSTER
  4. ALL_CRITICS_PERCENTAGE
  5. ALL_CRITICS_AVERAGE_RATING
  6. ALL_CRITICS_COUNT
  7. USER_PERCENTAGE
  8. USER_AVERAGE_RATING
  9. USER_COUNT
  10. GENRES
  11. SYNOPSIS
  12. MPAA_RATING
  13. RUNTIME
  14. RELEASE_DATE
  15. BOX_OFFICE
  16. DIRECTORS
  17. WRITERS
  18. CAST
  19. REVIEWS

How to use this API:
Include the class file on your PHP page:
include("rottentomatoes.php");
Instantiate the class and get the results in an array:
$rottentomatoes = new RottenTomatoes();
$movieArray = $rottentomatoes->getMovieInfo("Inception");
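
As a rough sketch of what you might do with the result, the snippet below builds a one-line summary from an array shaped like the attribute list above. The key names (TITLE, YEAR, ALL_CRITICS_PERCENTAGE, USER_PERCENTAGE) are assumed to match that list, and summarizeMovie() is a hypothetical helper, not part of the class; run print_r($movieArray) to check the keys your copy actually returns. A hard-coded array with made-up values stands in for the scraper output so the snippet runs on its own (PHP 7 or later).

```php
<?php
// Hypothetical helper: builds a one-line summary from a movie array.
// The key names are assumed to match the attribute list in this post.
function summarizeMovie(array $movie): string
{
    // Fall back to "N/A" when a key is missing, since a single field
    // can come back empty if the page layout changes.
    $title   = $movie['TITLE'] ?? 'N/A';
    $year    = $movie['YEAR'] ?? 'N/A';
    $critics = $movie['ALL_CRITICS_PERCENTAGE'] ?? 'N/A';
    $users   = $movie['USER_PERCENTAGE'] ?? 'N/A';

    return sprintf('%s (%s) - critics: %s%%, users: %s%%', $title, $year, $critics, $users);
}

// Made-up placeholder data standing in for
// $rottentomatoes->getMovieInfo("Inception");
$movieArray = array(
    'TITLE'                  => 'Example Movie',
    'YEAR'                   => '2011',
    'ALL_CRITICS_PERCENTAGE' => '85',
    'USER_PERCENTAGE'        => '90',
);

echo summarizeMovie($movieArray), "\n";
```

Reading every key through a fallback keeps the page from throwing notices when one attribute could not be scraped.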

How to get the New DVD Release list:
$rottentomatoes = new RottenTomatoes();
$dvdArray = $rottentomatoes->getNewDvdReleases();
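
The shape of the returned DVD list is not described in this post, so the following is only a sketch under an assumption: each entry is treated either as a plain title string or as an array with a TITLE key. dvdTitles() is a hypothetical helper, and the sample data is made up so the snippet runs without the class (PHP 7 or later).

```php
<?php
// Hypothetical helper: collects printable titles from a DVD release list.
// Assumption: each entry is either a title string or an array with a
// 'TITLE' key. Entries of any other shape are skipped, not guessed at.
function dvdTitles(array $dvdArray): array
{
    $titles = array();
    foreach ($dvdArray as $entry) {
        if (is_array($entry) && isset($entry['TITLE'])) {
            $titles[] = $entry['TITLE'];
        } elseif (is_string($entry) && $entry !== '') {
            $titles[] = $entry;
        }
    }
    return $titles;
}

// Made-up placeholder data standing in for
// $rottentomatoes->getNewDvdReleases();
$dvdArray = array(
    array('TITLE' => 'Example Movie A'),
    'Example Movie B',
);

foreach (dvdTitles($dvdArray) as $i => $title) {
    echo ($i + 1) . '. ' . $title . "\n";
}
```

If the real list has a different structure, print_r($dvdArray) will show what to adapt.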

You can try this scraper on my lab page: http://lab.abhinayrathore.com/rottentomatoes/

Download PHP Source Code: http://lab.abhinayrathore.com/rottentomatoes/rottentomatoes2.htm

Example usage: http://lab.abhinayrathore.com/rottentomatoes/usage.txt

If you need additions or corrections to this API, please drop a comment here and I’ll try to include them as soon as I can.

16 comments:

  1. This comment has been removed by the author.

  2. Abhinay, you are a genius! I was thinking about asking you if there were any plans on covering Rotten Tomatoes as well, but I decided that it was asking for too much after the wonderful job you did on the IMDb Scraper. Will there be a compiled dll for this release as well?

    Sincerely,

    Martien Oranje

  3. I think I found a way to scrape big posters as well (400x600), I don't know if you find that interesting. Anyway here's how:

    Go to the movies' picture page:
    http://www.rottentomatoes.com/m//pictures/

    Search for this line (only the number after "content" is variable). This should give you the first poster shown when browsing RT for posters.
    <img src="http://content[0-9].flixster.com/rtmovie/

    It might seem like I'm spamming, but I'm actively developing my application right now.

    Cheers,

    Martien

  4. Do you think you could add the short snippets from critics that shows on each rt page?

  5. Update:
    Added short critic reviews scraping feature.

  6. Hi, you might have corrupted your source code last time you saved it; it has HTML entities in place of the real characters. Specifically, '&gt;' for '>' and '&lt;' for '<'.

    Unfortunately, even after I removed/replaced these, it still failed, silently this time (no error, though I do have them enabled on my server).

  7. I have a RSS feed of movie titles and show times at my local cinema. I need help using this feed as an input to your PHP script then calling it as a shell command from Geektool. Can you help?

  8. Thanks! This could be very handy.

    One addition I might make is to handle searches that might return multiple results. The first result would be used by default, but the user could choose from other results.

    I would combine this with a refined google query. For example:

    site:RottenTomatoes.com "movie info" search_terms

    By searching for movie info in quotes, Google will return only the movie pages. In your current version, a search for Kevin Bacon doesn't work. With the refined search terms, Google returns a list of Kevin Bacon movies. This could allow the user to search for an actor, or perhaps movies with the word love in the title, and then choose from the list returned by Google.

  9. thanx abhinay, is it possible to create asp.net c# version of this?

  10. can u provide Movie image not url

  11. Great library, thanks for this source, it helps to learn how to write scrapers for any other website. Is it possible to do a search and scrape all the results, or just one movie only?

  12. Hi Abhinay,
    I've seen lots of implementations of the Rotten Tomatoes API with WordPress. Is it possible for Blogspot?

  13. Hi Great stuff Abhinay, but how can i scrape the poster and photos?

  14. Is there a way to get full reviews instead of the default quotes ?
    Would love to know if there is.

  15. Bummer, they changed the formatting so that the image url is no longer provided. Awesome work though!

  16. how to get POSTER image and video

