Saturday, October 30, 2010

FREE! PHP IMDb Scraper/API for new IMDb Template


IMDb is undoubtedly the leading source of media information and a top web-scraping target for movie lovers around the world. Unfortunately, IMDb does not provide an API to access its database, so web scraping is our only resort. PHP, one of the most commonly used and powerful web development languages, makes web scraping easy with the power of PCRE (Perl Compatible Regular Expressions).

For my recent project, a Movie Catalog (http://movies.abhinayrathore.com), I needed an IMDb scraper and found one built by Tyler Hall. His version was not robust enough to scrape all kinds of movie pages, so I extended it to support different types of titles. Recently, however, IMDb changed its page template, and most of the old scrapers stopped working, including mine. So I modified my scraper to accommodate the new template and considered it my moral responsibility to contribute back to the developer community.

This new scraper is robust enough to handle a wide variety of template modifications. Apart from the regular information, it also goes deeper to scan extra media images and release dates.

Click here for a Demo

Last Updated: Feb 1, 2014

Major changes in the Feb 20, 2013 version:

  1. The scraper now uses the combined information page to scrape the data. This page doesn't change very often, and it gives a complete list of individual departments.
  2. Added a few more entities: producers, musicians, cinematographers, editors etc. Removed metascore information. Removed the small poster URL.
  3. You can now pass a second boolean parameter to the getMovieInfo() and getMovieInfoById() functions to disable the extra information. By default it is set to true, which may slow down the scraping. If you don't need all the extra info like Storyline, Release Dates, Recommendations or Media Images, just pass false as the second parameter to these methods. Example: $movieArray = $imdb->getMovieInfo("The Godfather", false);
  4. Information for individuals in the lists of directors, cast, writers etc. is now in an associative array keyed by the IMDb id of the individual.
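
A minimal sketch of how points 3 and 4 might look in calling code. The sample array below is hand-written for illustration (it is not scraper output), the lower-case 'directors' key is an assumption based on the keys used elsewhere on this page, and formatPeople() is a hypothetical helper, not part of the class.

```php
<?php
// With the real class you would write something like:
//   $imdb = new Imdb();
//   $movieArray = $imdb->getMovieInfo("The Godfather", false); // false = skip the extra info
// Hypothetical sample standing in for the returned array, so this runs on its own.
$movieArray = array(
    'title'     => 'The Godfather',
    'directors' => array('nm0000338' => 'Francis Ford Coppola'),
);

// People lists are keyed by IMDb id, so iterate with key => value.
function formatPeople(array $people)
{
    $lines = array();
    foreach ($people as $imdbId => $name) {
        $lines[] = $name . ' (' . $imdbId . ')';
    }
    return $lines;
}

echo implode("\n", formatPeople($movieArray['directors'])), "\n";
```

Keeping the id as the array key means you can build a link to the person's IMDb page without a second lookup.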

UPDATE:
As some of you might have noticed, Google is preventing automated script access to its search result pages. I have created 2 search functions for Google and Bing so you can use whichever one works best for you. I have converted the code to use Bing as of now and will look for other alternatives if we run into some hurdles. Keep me updated if you have any better ideas :)

Here is a list of all the attributes it scrapes from the IMDb page:

  1. TITLE_ID
  2. TITLE
  3. YEAR
  4. RATING
  5. GENRES
  6. STARS
  7. DIRECTORS
  8. WRITERS
  9. CAST
  10. PRODUCERS
  11. MUSICIANS
  12. CINEMATOGRAPHERS
  13. EDITORS
  14. ALSO_KNOWN_AS
  15. RELEASE_DATE
  16. RELEASE_DATES
  17. PLOT
  18. POSTER
  19. POSTER_LARGE
  20. RUNTIME
  21. TOP_250
  22. OSCARS
  23. AWARDS
  24. NOMINATIONS
  25. STORYLINE
  26. TAGLINE
  27. MEDIA_IMAGES
  28. MPAA_RATING
  29. VOTES
  30. RECOMMENDED_TITLES
  31. VIDEOS

How to use this PHP Scraper?
Include the class file in your PHP page:
include("imdb.php");
Instantiate the class and get the results in an array:
$imdb = new Imdb();
$movieArray = $imdb->getMovieInfo("The Godfather");

You can try this scraper on my lab page: http://lab.abhinayrathore.com/imdb/

To download the PHP Source Code directly use this link: http://lab.abhinayrathore.com/imdb/imdb_php.htm

Fork it on GitHub: https://github.com/abhinayrathore/PHP-IMDb-Scraper

Example usage: http://lab.abhinayrathore.com/imdb/usage.htm

Proxy script for downloading or displaying Media images on your website: http://lab.abhinayrathore.com/imdb/imdbImage.txt

To implement your own IMDb Web Service API to return data in XML, JSON or JSONP format, use this script along with the API: http://lab.abhinayrathore.com/imdb/imdbWebService.htm

To implement IMDb.com's search suggestions on your website, please follow this post: http://web3o.blogspot.com/2011/10/imdb-search-suggestions-with-jquery.html

If you find any part of this scraper broken or incorrect, please drop a comment here and I’ll try to fix it as soon as possible.

IMDb has a leechers policy in place for media images. You may not be able to use the URL for some of the images to display on your website. As a workaround you can use a PHP Proxy to display or download those images. I’ve written a small proxy script to grab the images: http://lab.abhinayrathore.com/imdb/imdbImage.txt. To use this script you just need to pass the image URL as a request parameter:
<img src="imdbImage.php?url=<?= urlencode($url) ?>" />
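
Image URLs can contain characters such as "&" or "?", so it is safer to encode the URL before putting it in the request parameter. A small sketch, assuming the proxy is saved as imdbImage.php next to your page and reads the "url" parameter as described above; proxiedImageTag() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds an <img> tag that routes an image through the proxy script.
function proxiedImageTag($imageUrl, $alt = '')
{
    // urlencode() keeps characters such as "&" and "?" inside the single "url" parameter.
    $src = 'imdbImage.php?url=' . urlencode($imageUrl);
    // htmlspecialchars() makes both values safe inside HTML attributes.
    return '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" alt="'
        . htmlspecialchars($alt, ENT_QUOTES) . '" />';
}

echo proxiedImageTag('http://example.com/poster.jpg?size=large&v=1', 'Poster'), "\n";
```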

NOTE: For users outside the USA
IMDb will automatically redirect you to titles listed in the language used for release in your country.
To see films listed under their original titles regardless of your country, you will have to modify this script to scrape the titles from http://akas.imdb.com, because http://www.imdb.com will automatically redirect you to your country-specific title page.
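
One way to make that change is to rewrite the host of the title URL just before it is fetched. This is only a sketch: toAkasUrl() is a hypothetical helper, and where exactly you would call it inside the scraper is left open.

```php
<?php
// Hypothetical helper: point an IMDb URL at akas.imdb.com instead of www.imdb.com.
function toAkasUrl($url)
{
    return str_replace('://www.imdb.com/', '://akas.imdb.com/', $url);
}

echo toAkasUrl('http://www.imdb.com/title/tt0068646/'), "\n";
```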

Happy Scraping :)

325 comments:

  1. Thanks a lot for this script!
    I have zero knowledge of php but got this script running on windows using xampp.

    If you use xampp you might see this error:
    Call to undefined function: curl_init()

    Open php.ini file and uncomment this line:
    extension=php_curl.dll

    Then restart the server and its fixed!

  2. Could you add the MPAA rating?

  3. I am pretty new to PHP. I got xampp installed and working now. let me know, how to test this scraper ?

    Asmaka.

  4. MPAA Rating included: http://lab.abhinayrathore.com/imdb/imdb.txt

  5. Is it possible to get the full plot, instead of a cut off one?

    Replies
    1. replace this line

      $arr['plot'] = trim(strip_tags($this->match('/<p itemprop="description">(.*?)(<\/p>|<a)/ms', $html, 1)));

      with this

      $arr['plot'] = trim(strip_tags($this->match('/(Storyline)(.*?)(Written by|<a)/ms', $html, 2)));

  6. My server is not located in the USA, can you get it to scrape the USA title, instead of the one from the country it is in?

  7. It is independent of what country you are in. Internet is same everywhere so it can scrap international movie pages as well.

  8. $arr['votes'] = $this->match('/>(([0-9]+),([0-9]+)).*?votes<\/a>/ms', $html, 1);

    Is this true?

  9. serhatyolacan,
    Try this:
    $arr['votes'] = $this->match('/href="ratings".*?>([0-9]+,?[0-9]*) votes<\/a>\)/ms', $html, 1);
    This will match even if there are 10 votes or 500,000 votes.

    I've also added votes to the scraping list. Get the latest imdb.txt file from the link above
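
    A quick way to check the pattern above against sample input. The HTML fragments are hypothetical imitations of the markup the pattern targets, and the second pattern is an assumed generalisation (any run of digits and commas) for counts of a million or more, which the one-comma version does not cover.

    ```php
    <?php
    // Hypothetical fragments imitating the markup the comment's pattern targets.
    $samples = array(
        '(<a href="ratings">10 votes</a>)',
        '(<a href="ratings">500,000 votes</a>)',
        '(<a href="ratings">1,250,000 votes</a>)',
    );

    // Pattern from the comment above: allows at most one comma.
    $original = '/href="ratings".*?>([0-9]+,?[0-9]*) votes<\/a>\)/ms';
    // Assumed generalisation: a digit followed by any run of digits and commas.
    $relaxed  = '/href="ratings".*?>([0-9][0-9,]*) votes<\/a>\)/ms';

    foreach ($samples as $html) {
        $a = preg_match($original, $html, $m1) ? $m1[1] : 'no match';
        $b = preg_match($relaxed, $html, $m2) ? $m2[1] : 'no match';
        echo $a, ' | ', $b, "\n";
    }
    ```

    To store the count as a number, strip the commas first, for example with str_replace(',', '', $votes).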

  10. Here is the another question :)

    Is it possible to scrap actor pictures?

    And cache media + actor images to our servers?

    And another question...

    Is it possible to call strings manually? For example:

    <div class="title">
    <?php echo $title; ?>
    </div>

    <div class="actorslist">
    <?php echo $cast; ?>
    </div>

    ...

  11. Actually i know lots of wordpress functions. And i'm thinking wordpress plugin with some options. Or theme functions. I don't like imdbphp2 script. So i found your imdb class.

    Plugin will work like this: (You can see in top of sidebar)

    http://www.odfi.tv/yabanci-filmler/mutant-gunlukleri-chronicles-2008-turkce-divx-hd-online-izle/

    You will use custom field with "movie name" or "movie id" (prefer to use id).

    All informations will be saved to sql. So they will be cached with post. If you want to upgrade imdb infos. You just only need to update this post.

    This is great idea but i have crap php knowledge. So if possible help please :)

  12. It is not independent of what country you are in, because for the new harry potter movie, i get "Haris Poteris ir mirties relikvijos - 1 dalis" as the title.

  13. Anonymous, I am using Google to search for the titles on IMDb as it is more accurate than IMDb search, and I believe Google is automatically detecting your country locale and redirecting your request to the locale-specific IMDb page.

    For example: The Spanish site for the Harry Potter movie is http://www.imdb.es/title/tt0926084/

    For a workaround, you might have to modify this parser a little bit. In the first run, let it give you the movie info in your locale. Then take the movie id (which is the same for all locales) and reformat a new url like http://www.imdb.com/title/tt0926084/ (note imdb.com in place of imdb.es), then use this new .com url directly to parse the movie info in English.
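
    The second step of that workaround could be sketched like this. toDotComUrl() is a hypothetical helper, not part of the scraper; it only assumes that title URLs contain "/title/tt" followed by digits, as in the examples above.

    ```php
    <?php
    // Hypothetical helper: pull the title id out of an IMDb title URL
    // (any locale) and rebuild the URL on www.imdb.com.
    function toDotComUrl($localeUrl)
    {
        if (preg_match('/\/title\/(tt[0-9]+)/', $localeUrl, $m) !== 1) {
            return false; // no title id found
        }
        return 'http://www.imdb.com/title/' . $m[1] . '/';
    }

    echo toDotComUrl('http://www.imdb.es/title/tt0926084/'), "\n";
    ```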

    Hope this helps :)

  14. serhatyolacan, your idea of including the actors' images is pretty good, but we cannot include them in the media images because that would make it difficult to distinguish actors' images from the other ones. What I am planning to do is to return an associative array with each actor's name and his/her image. But for that I'll have to include some extra functions, which I plan to do in the next version.

    As for the wordpress plugin, I've never worked on those... but you've given a new idea to explore :)

  15. This comment has been removed by the author.

  16. It is not being converted into a different domain, but the title is being translated.

    Here is an example:
    Red Eye on IMDb is: http://www.imdb.com/title/tt0421239/.

    If scrape this url, I get a a title of "Naktinis reisas"

    If i visit the url from my server, I see that a new line has been added "Original title: Red Eye"

    Here is a paste of the source when viewed on my server(look at line 419 - 432): http://pastebin.com/Xr87t7ny

  17. serhatyolacan: you can print individual array elements:
    <div class="title">
    <?php echo $movieArray['title']; ?>
    </div>

  18. Anonymous, I guess you'll have to add another field in the parser to parse Original Title.
    Let me know if you want me to send you the regular expression for that.

  19. I have a question about the regular expressions. I am trying to write this code in c# and I couldn't understand the lines like

    $arr['title_id'] = $this->match('/id="(tt[0-9]+)\|imdb/ms', $html, 1);

    what does the "ms" do in the end of the regex. And also can you tell me what match function do exactly. As I understand it searces for the expression in the html string but I couldn't understand what it returns.

    Thanks..

  20. Anonymous,
    I am already working on a C#/ASP.net based IMDb Scraper and it'll be ready in coming weeks... So if you can hold that long, I'll post the new library on this blog :)

    To learn more about PHP regular expressions you can search on Google and you'll get all kind of tutorials.
    As for the "ms" in the pattern, read more about PCRE pattern modifiers: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
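
    For readers with the same question, here is a rough sketch of what a helper like match() presumably does and what the "s" modifier changes. matchGroup() is a hypothetical stand-in written for this example, not the scraper's actual method, and the HTML string is made up.

    ```php
    <?php
    // Hypothetical stand-in for the class's match() method, as the calls on this page use it:
    // run the pattern and return capture group $group, or false when nothing matches.
    function matchGroup($pattern, $subject, $group = 0)
    {
        if (preg_match($pattern, $subject, $matches) === 1 && isset($matches[$group])) {
            return $matches[$group];
        }
        return false;
    }

    // "s" lets "." match newlines; "m" makes ^ and $ match at line boundaries.
    $html = "<h1>\nRed Eye\n</h1>";
    var_dump(matchGroup('/<h1>(.*?)<\/h1>/', $html, 1));   // without "s": "." cannot cross the newline
    var_dump(matchGroup('/<h1>(.*?)<\/h1>/ms', $html, 1)); // with "s": the group spans the lines
    ```

    In C#, the closest equivalents would be RegexOptions.Singleline and RegexOptions.Multiline.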

  21. I get the general idea about scrapping but I am stuck in the "genre" part. I am putting the regex '/Genre.?:(.*?)(<\/div>|See more)/ms' into the regular expression software and it found nothing. Did the template of imdb changed or what??

  22. Hello.
    Can you provide your index.php usage? I have zero experience with php.
    Thanks.

  23. To get the original title I added the following line:

    $arr['orig_title'] = trim($this->match('/<span class="title-extra">\\n(.*?) \\n<i>/ms', $html, 1));

  24. This is a great script. Thanks for it.
    What I would like to do is this:
    (wordpress)
    1. During a new post, I type in a movie title into a custom field named, 'Title'...
    2. When I publish the post, call the imdb script using the title I entered in the custom field.
    3. Populate fields for that post ID with the scrape results.

    Any help from you or anyone is greatly

  25. This is incredible, I was going to make a one, probably a crappy one, but no, you did, you are the man. Thanks!

  26. On my server say:
    "Fatal error: Call to undefined function: str_ireplace() in xxxxxxxx/imdb.php on line 15)"
    on imdb.php file.

    Line invoked is:
    $title = str_ireplace('the ', '', $title);

    (tested with provided usage.php).

    Any ideas?

  27. You are using an older version of PHP.
    str_ireplace was introduced in PHP 5: http://php.net/manual/en/function.str-ireplace.php

    You can even remove/comment out this line and the scraper should work fine without it :)
    BUT, its better if you upgrade to the latest PHP version.

  28. Speedy, serious, precisely.
    I love this man!

    I can't upgrade webserver provided, meanwhile my phpinfo() reply with:
    PHP Version 5.2.14

    If I comment str_ireplace, another error accorred:

    Fatal error: Call to undefined function: stripos() in xxxxx/imdb.php on line 18

    Of course line 18 is:
    if(stripos($html, "302 Moved") !== false)



    .... you'll expect a gift for Christmas!

  29. how can i use this to be stored in a database in mysql?

  30. How can i add the search box? thanks

  31. Victor, you can search on google for storing data to MySql using PHP. It is out of scope for this project.

  32. Anonymous, you can look at the html code of the test page (http://lab.abhinayrathore.com/imdb/) on how to add the search box.

  33. Here is the code to get the actors out
    http://pastebin.com/pJtY064h

  34. This is awesome, Abhinay!
    I have one issue, though. The $arr['genres'] seems too complicated for me.
    I wanted to insert the genres into a SQL DB, but i cant manage to do it, because the only result $movieArray[genres] is giving me is the word 'Array'.
    I was planing to do at the end 'INSERT INTO table_name [...] VALUES $movieArrays[genres] ]...]
    Is there anything I can do about it?
    Thanks in advance

  35. Carlos,
    You can convert a PHP Array into a comma separated string using implode function:
    $value = is_array($value)?implode(",",$value):$value;
    (First check is it is an array, if it is then convert it into a comma separated string)
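
    Wrapped up as a small function, the same idea might look like this. fieldToString() is a hypothetical helper name, and the sample values are hand-written rather than scraper output.

    ```php
    <?php
    // Hypothetical helper: turn an array field (genres, cast, ...) into one string;
    // scalar fields are returned unchanged (cast to string).
    function fieldToString($value, $separator = ', ')
    {
        return is_array($value) ? implode($separator, $value) : (string) $value;
    }

    // Hand-written sample values, not scraper output.
    echo fieldToString(array('Crime', 'Drama')), "\n";
    echo fieldToString('1972'), "\n";
    ```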

  36. Great script! How can use it to scrape imdbTV?

  37. Hi

    For some reason sometimes the movie poster doesn't appear. There's no error and the path is correct but it just stays blank. IMDB anti-leech maybe?
    Anyway I solved it by storing the image in my disc and only then showing it.

    Any ideas on how to retrieve the new "Stars" field on IMDB? The cast shows unknown actors most of the time.

    Cheers,
    Pipanni

    Replies
    1. you can display the movie image (poster)

      <?php
      include("imdb.php");
      $imdb = new Imdb();
      $movieArray = $imdb->getMovieInfo("The Godfather");
      $poster = $movieArray['poster'];
      $title = $movieArray['title'];
      ?>
      <img src="<?php echo $poster; ?>" alt="<?php echo $title; ?>" />

  38. Hey Pedro,
    Yes IMDb does have an anti-leech policy.
    Also, I've added the code to scrap "Stars" field from IMDb page.

  39. Hi,

    Really it is a nice blog, I would like to tell you that you have given me much knowledge about it. Thanks for everything.

    Extract Web

  40. How is it that when I search for 'House of Flying Daggers' my script returns 'Shi mian mai fu' and your own hosted scraper returns 'House of Flying Daggers'.

  41. Wiethoofd,
    What country are you located in?
    It's because IMDb is redirecting you to your country specific locale page. Example Italian Page: http://www.imdb.it/title/tt0385004/. You can try replacing the locale code with "com" in these url's and try if you can get to the English Page.

  42. I'm located in the Netherlands.

    Even when I search directly for the IMDb-ID or request the .com/title/ page it returns the Chinese title.

    I managed to preg_match the 'Also Known As:' title but in most cases the English/International title is required.

    When using the 'IMDb API' it returns the correct title: http://imdbapi.com/?i=tt0385004

  43. IMPORTANT: For all the users who are outside of USA...
    You might see titles listed in the language used for release in your country.

    To see films listed under their original titles regardless of your country region you will have to modify this script to scrap the titles from http://akas.imdb.com because "http://www.imdb.com" will automatically redirect you to your country specific title.

    Additionally, I have modified the script to scan all the AKA Titles as well and try to extract USA Title from that list. The USA_Title may not be the correct one all the time, so you can modify the script to extract the exact titles according to your needs.

    Please go ahead and test the new version of this script to see if it works for you.

  44. Thanks for the attempted fix, but using the http://akas.imdb.com instead of the http://www.imdb.com doesn't work either. Not in the scraper nor my browsers. The USA Title on the other hand does work.

    Howcome the scraper isn't using the http://akas.imdb.com/title/tt0385004/combined page (note the 'combined' part) to scrape all the info, this should contain much more information than the regular movie description page.

    Changing the Google I'm Feeling lucky search link to search for 'site:imdb.com' instead of 'imdb' should always result in an imdb-page to scrape.

  45. I noticed the IMDb scraper sometimes scrapes a banner for a poster image when there is no poster available (the matching fails?)

    I added the next line after $arr['poster'] to get rid of banner-images, if any.
    if(preg_match('/^http:\/\/ad\.doubleclick\.net\//', $arr['poster'])) $arr['poster'] = "";

    Try it yourself with the movie 'Kooky'

  46. hi,
    i would want to give in the IMDB link.
    lets say $url.
    and i would want to store the imdb info to my database, how do i do this?

    i haven't been doing php for a while so this is kind of hard for me

  47. Thanks very much for this, it's great!

    I modified/created two functions for those that don't want to use the entire file and only need the imdb url and thumbnail (well that's what I needed).

    http://php.pastebin.com/7xWuZui6

  48. Hi

    I've been using your scraper on a project of mine and today I came across a bug that's puzzling me. Any movie directed by Roland Emmerich won't show its director. It stays blank. I would try to correct this myself but my regex skills are basic.
    You can replicate this bug on any movie by Roland Emmerich: "Stargate", "Independence Day", "Godzilla", "The Day After Tomorrow". :P

    Cheers,
    Pipanni (Pedro)

  49. Pedro,
    Thanks a bunch for pointing out this bug.
    It was a bug in the regex where it was looking for a closing div "</div>" or "and " for filtering out the directors div container.
    And because "Roland Emmerich" contains an "and " in between, it was never stripping out the complete container.

    I've fixed the bug and it should be working fine for both directors and writers :)
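
    To illustrate the kind of bug described above: the markup and both patterns below are hypothetical reconstructions, not copied from the scraper. The "fixed" variant borrows the ">.?and " form that appears in other snippets on this page, so that "and " only counts as a separator right after a tag.

    ```php
    <?php
    // Hypothetical markup; the real IMDb page looked different.
    $html = '<div>Director: <a href="#">Roland Emmerich</a></div>';

    // Stops at the first "and ", which also occurs inside "Roland ".
    $buggy = '/Director.?:(.*?)(<\/div>|and )/ms';
    // Only treats "and " as a separator when it follows a closing ">".
    $fixed = '/Director.?:(.*?)(<\/div>|>.?and )/ms';

    preg_match($buggy, $html, $m1);
    preg_match($fixed, $html, $m2);

    echo trim(strip_tags($m1[1])), "\n"; // truncated name
    echo trim(strip_tags($m2[1])), "\n"; // full name
    ```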

  50. is this possible with Tv shows as well? i have been looking over the code and don't see that info... any help would be appreciated

  51. I love this - but I can't get it to function right out of the gate. I've tested it and the URL pulls up the correct page, but when used in the function getMovieInfo the variable $html always pops a 302 moved error. Are there any compatibility issues with using godaddy that anyone knows about? here's my test link:
    http://www.movielint.com/db/imdb_test.php

  52. How to get the "Filming Locations" and "Company" (without the links/a-tags)?

  53. "is this possible with Tv shows as well? i have been looking over the code and don't see that info... any help would be appreciated"

    Yes, it would be great to customize this to only focus on TV shows and TV movies.

  54. Hello Abhinay

    Is there any sure way to tell the difference between a movie and a tv show episode?
    I could check the running time and stop a movie insertion in the DB if that value is under 60 minutes but that would be a bit lame.

    As soon as my movies site is done I'll let you know, as you're definitely in my "Thanks" list. :)

  55. It appears Google is now blocking scripting attempts to use their service, probably thanks to Bing and the like.

  56. It doesnt work anymore :s
    Can we have any news from the creator ?
    It was an awesome api :/

  57. Anonymous,
    The scraper seems to be working on my side (USA)... what country are u located in? It might be some local problem.

  58. The scraper is also fine here in Portugal.
    Abhinay, anything about my previous question? (how to distinguish a tv show from a movie)

  59. I am in france, and it's not working for now. I made some change to the google search url, and it worked fine until now, so i will try to change it back tomorrow to see what happen.

  60. I couldn't wait to finish my work to work on it =)

    And with the proper url, it's working !
    Might be some trouble with the google.fr search maybe... Don't know :s

    Thanks anyway for your API, Very usefull one !

  61. Big thanks for your library, good work !

  62. I've been using this API for a while and it works great, but I am running across an error for a few new movies, and you can test it inside the demo here on the site.

    But movies like 'Season of the Witch' and 'The Chronicles of Narnia: Voyage of the Dawn Treader' continually reproduce an error 'title not on IMDB' but they are.

    Not sure what the glitch is.

  63. Seems like imdb changes something because i didn't get results for all movies i searched for. All movies are not on imdb...

  64. Well the API is using Google's 'Im Feeling Lucky' search so I just replaced that with the actual IMDB url and it seems to have cleared a few errors.

  65. Hey. Nice script. I thinking: maybe it is possible to make js what will get info from imdb.php without reloading the page and fill some html code with specific entries from info have got? It would be interesting to make such thing.

  66. For the people wanting to scrape TV Shows, here's how to do it:

    You'll need:
    - episode & series nr.
    - to add a line to the search before it is coded to URL

    Add the following to your search query: Moviename + " (#" + season + "." + episode + ") (TV Series)"

    It should look like this: "House (#1.4) (TV Series)"
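
    In PHP, building that query could be sketched as follows. buildEpisodeQuery() is a hypothetical helper, and where the result gets plugged into the scraper's search URL is left to you.

    ```php
    <?php
    // Hypothetical helper implementing the query format described above.
    function buildEpisodeQuery($showName, $season, $episode)
    {
        return sprintf('%s (#%d.%d) (TV Series)', $showName, $season, $episode);
    }

    $query = buildEpisodeQuery('House', 1, 4);
    echo $query, "\n";
    // URL-encode the query before placing it in the search URL.
    echo urlencode($query), "\n";
    ```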

    Good luck!

  67. I've been looking for a way to catalog a collection of films. A simple catalog at that; just a title, year, director and genre.

    As opposed to manually building this database, I considered an semi-automated route.

    I decided to build, (we'll call it an "application"), where it takes user input (a movie title), feeds that to your scraper then writes the relevant data to a database. Easy.

    Everything was going fine at first. However, while working on this application it appears my server's IP may have been banned from accessing IMDb as I continue to receive "No Title found on IMDb!".

    I've tested the scraper without any of my modifications and I still continue to receive that error. Have you seen something like this in the past and how do / did you avoid the same fate with your labs demo?

    Thanks.

  68. Anonymous,
    I guess you are still using the old version of this scraper.
    Try the latest version from top link and see if you get the same error.

  69. Hi there Abhinay! Thanks for your reply. I am indeed using the latest version of your scraper. Shortly after I posted my first message, it started to work again. It worked for about 30 minutes I'd say and has now just begun throwing the "No Title found on IMDb!" errors.

    Perhaps I am trying to make too many requests within a short amount of time so either Google or IMDb's firewall is temporarily blocking my server's IP?

  70. HI Abhinay Rathore, thanx for great script.
    I`m using your lab page because I don`t have any knowledge about PHP and I can`t use your source php on my server. If possibl, I will be so happy if you help me to use this php script in miy own server.
    thanx

  71. Excellent script Abhinay,

    I wonder if it would be possible to get the full cast list for a movie. That would mean another scrap as the full cast now resides on a separate page, where it used to be all on one page.

    The "new" IMDB layout is a totally different discussion ;-)

    It always points to /fullcredits#cast

    Ciao
    Stef,

  72. Hi... great blog!

    I'm experiencing problems with it...
    I first could run your script + test php without any problem...
    I didn't modify anything and now I'm getting:
    No Title found on IMDb!
    any idea?

    Thanks!!!

  73. Not working the following error:

    Notice: Undefined offset: 0 in C:\wamp\www\IMDB\imdb.php on line 34
    ERROR No Title found on IMDb!

  74. Script works perfectly.. Stop posting stupid comments.. If it doesn't work, its due you! stop crying and learn PHP & HTML.

  75. Hello. Is it possible to add Plot keywords?

  76. And is it possible to add reward and nomination counts behind oscars? Thanks for great work...

  77. // Awards
    $arr['awards'] = trim($this->match('/([0-9]+) wins/ms',$html, 1));

    // Nominations
    $arr['nominations'] = trim($this->match('/([0-9]+) nominations/ms',$html, 1));

  78. If you need posters names for something:

    $arr['poster_name'] = strtolower(preg_replace("/^(http:\/\/.*\/)?/i",'',$arr['poster']));
    $arr['poster_small_name'] = strtolower(preg_replace("/^(http:\/\/.*\/)?/i",'',$arr['poster_small']));
    $arr['poster_large_name'] = strtolower(preg_replace("/^(http:\/\/.*\/)?/i",'',$arr['poster_large']));

  79. Full size posters and poster names:

    $arr['poster_full'] = substr($arr['poster'], 0, strrpos($arr['poster'], "_V1.")) . "_V1._SY0.jpg";
    $arr['poster_full_name'] = strtolower(preg_replace("/^(http:\/\/.*\/)?/i",'',$arr['poster_full']));

  80. You can set a condition to find a way if you search according to a series or movie?

  81. how print the url of the movie imdb?

  82. <?php echo $movieArray['imdb_url']; ?>

  83. thanks master! :D

    I can not solutions "I'm feeling lucky" feature. Use the new script but still not working :S

  84. Awesome script!, thx for sharing!.

  85. If someone add Cinematography and Original Music it will be perfect.

  86. I'm working on a new way to find the movie imdb url to avoid the "I'm feeling lucky " google

  87. You can change the language to Spanish from the results?

  88. I tried to add Cinematography and Original Music and something else too in full credits. But i can't write their functions. Because they are in "/$title_id/fullcredits" page and they are in same table elements. Its not looking like cast tables (so can't copy foreach loop for cast). It seems to write their functions impossible or hard :/

  89. Movie language:

    $arr['language'] = trim(strip_tags($this->match('/Language.?:<\/h4>(.*?)<\/div>/ms', $html, 1)));

  90. Now trying to add filming location, production in movie page. And cinematography, original music, producer, make up... in /fullcredits page.

    It seems like hard for me but i will try :)

  91. $arr['language'] = trim(strip_tags($this->match('/Language.?:<\/h4>(.*?)<\/div>/ms', $html, 1)));

    That is not to learn the language of the movie?

    I want to know how to get me back for information in Spanish

  92. Dear Abhinay,

    Can you write example code or function for $/fullcredits page (with foreach loop for multiple names and for array) please.

    Thank you!

  93. // Countries

    $arr['country'] = array();
    foreach($this->match_all('/(.*?)<\/a>/ms', $this->match('/Country.?:(.*?)(<\/div>|>.?and )/ms', $html, 1), 1) as $m)
    {
    array_push($arr['country'], $m);
    }

  94. Does this script still work? I tried the example url and it doesn't return anything. I want to use this but also don't want to waste my time on something that doesn't work anymore.

  95. Anonymous, this script does work on my side. Try using the Bing search function instead of the Google one.

  96. Hello Abhinay. Cast and media images not working. Please check.

  97. How hard would it be to add in the link for the movie trailer? I've been fiddling with it but can't quite get it.

  98. Updated to 2.3 and started to work again. Sorry for bothering.

  99. Hello,

    since an Update (most likely yesterday) the IMDB Rating is not grabbed correctly anymore. Can you look after it?

    Thanks!

  100. Thanks for the tip Anonymous, I've fixed the rating issue.

  101. Hey Abhinay, had the same RegEx, but couldn't post it in here for the tags. Will you maintain this scraper the next month? I'd like to rely on it for a project :-)

    If so, please please keep the API for $scraper->scrapMovieInfo consistent. I grab the HTML with my own code.

    BTW: Nice work. It's certainly no fun to write that many regexs.

  102. $arr['rating'] = $this->match('/itemprop="ratingValue">([0-9].[0-9])<\/span>/ms', $html, 1);

  103. Can you help me to generate the same XML file as you generated. I needed that code for my project www.flickonline.com. Thanks a bunch.

  104. Hey, wow script works perfekt only 2 things i have problems.
    If i type this i get only "ARRAY" in my php file.
    < div class="cast">

    < div class="genres">


    All others like $movieArray['rating'] oder votes or title,... works perfekt.

    Please help me with this problem :)

    Greets Daniel

  105. Daniel, for converting arrays to comma delimited string, use this statement:
    implode(",", $movieArray['cast'])

  106. Hey Abhinay Rathore

    WOW works perfect now wiht the implode
    Big thx for the help and the script

    Greets Daniel

  107. Hi Abhinay

    I've been using your script for a while and everything was fine, but one or two months ago the movie poster stopped downloading. I've tried using your php proxy but to no avail. Could IMDB be blocking my website? Everything works fine from localhost. :(

    Over

  108. Pedro, it might be possible that IMDb is monitoring leechers pretty heavily.
    What you can try is, in your CURL, try to spoof the user agent. You can search on Google for ways of doing it.

  109. I haven't read all the comments, but one way to get the orginal title from non-us users it to:

    $arr['orginal_title'] = trim($this->match('/class="title-extra">(.*?)</ms', $html, 1));

    Works for me.

    Great class btw!

  110. fyi 'Votes' don't work anymore.

    Tanx for a good script!

  111. I made some improvements. Here is my edited scraper:

    http://pastebin.com/4Bef6m1R

    If i made a mistake please poke :)

  112. Thanks a lot serhatyolacan,
    I've included a couple of additions from your script, but I did not include the full credits as it contains some unwanted information and fetching one more URL would slow down the scraping a bit. But anyone who needs all that info can definitely profit from it :)

  113. Hi Abhinay,
    Great script. Got it working right out of the box. Thanks a lot. I do however have 1 problem. There are some movies that your script won't scrape. They also won't work on your demo page.

    I only have this issue with 2 movies:
    tt0458339 : Captain America: The First Avenger
    tt1201607 : Harry Potter: Deadly Hallows Part 2

    Perhaps I'm overlooking something, but I can't find what is going wrong. Tried both Google & Bing search. PHP is up-to-date. All other movies work fine, only these 2 won't.

    Can you help me please.

    Regards,
    Wouter

  114. Wouter, I don't see any problem with these titles. They are working just fine here.

  115. Hi Abhinay,
    I have the same problem with these two titles
    tt0458339 : Captain America: The First Avenger
    tt1201607 : Harry Potter: Deadly Hallows Part 2

    Only if I search by the IMDb number, e.g. tt0458339

    Regards,
    Gajan

  116. Gajan and Wouter,
    The problem with searching using certain title IDs is fixed now. Instead of searching for a complete URL match, it now only searches for IMDb title IDs, so it can capture all alternate URLs for a title ID search.
    Also, added a new function "getMovieInfoById" to directly get results from IMDb if you know the id.

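The idea of matching on the title ID rather than the full URL can be sketched roughly as below. This is not the scraper's actual code; extractTitleId() is a hypothetical helper, and the pattern assumes title IDs look like "tt" followed by at least seven digits, as in the IDs quoted in this thread.

```php
<?php
// Return the first IMDb title ID (e.g. "tt0458339") found in a string, or null.
// Assumption: an ID is "tt" followed by seven or more digits.
function extractTitleId(string $text): ?string
{
    return preg_match('/tt\d{7,}/', $text, $m) === 1 ? $m[0] : null;
}

var_dump(extractTitleId('http://www.imdb.com/title/tt0458339/'));
var_dump(extractTitleId('http://akas.imdb.com/title/tt1201607/combined'));
var_dump(extractTitleId('no id here'));
```

Because only the ID is matched, the host name and any trailing path such as /combined no longer matter.
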
  117. @Abhinay, hello sir,

    I have a beginner question.

    I want to print out the Genres, but when I use
    echo $movieArray['genres'].'
    ;

    It give me a text result: 'Array' and this is all I get.

    I saw your previous replies about using implode, but I don't know how to use it. Can you please give me a PHP line which will echo the genres?

    Thank you.

  118. Anonymous,
    Check the usage example file link above.
    For your particular problem:
    echo implode(", ", $movieArray['genres']);

  119. Thank you Abhinay Rathore, echo implode(", ", $movieArray['genres']); worked brilliantly.

    I will post here when I finish the project, maybe you will like the result.

  120. Hello Ahmed. This function allows full movie url or title id. You will like it :)

    http://pastebin.com/sYmiPayy

  121. Hello Abhinay.
    I am student currently pursuing BTECH-CSE.
    I am currently working on a "web crawling and scraping in PHP" project and would like to have your valuable help and suggestions on the same. I would be grateful if you mail me at
    sumitdeb1001@gmail.com
    thanks in advance...

  122. Small fix:
    $url= "http://www.google.com/search?q=on+site:imdb.com+" . rawurlencode($title);

  123. Hello Abhinay ,

    I need your help :( Can you help me? I have 2 pages.
    The first page has a text field and a submit button, and the second page has a Movie Title, Movie Rate, Movie Plot and so on.

    Now how can I use your code so that when I type an IMDb link on the first page and click the button, the Title, Movie Rate and so on are shown on the second page?

    PS: I started learning PHP a few months ago.

  124. Dear Abhinay, please accept my previous comment and help me :( Thanks

  125. AmirReza, please go ahead and learn form handling in PHP. That'll help you with this and similar problems in future.
    Some examples: http://www.w3schools.com/php/php_forms.asp and http://www.tizag.com/phpT/examples/formex.php

  126. fyi 'Votes' don't work anymore.

  127. Here is my result partially based on your script, http://plusimdb.com
    Thank you a lot.

  128. Just a suggestion, you could use in your post a tag or something called:

    Last update: Day-Month-Year

    This way, we will know when you update anything, since you keep the same version number with every modification.

  129. great work man .. thanks for the latest (sept 22 2011) fix....

  130. Hi. how can i grab the trailer link with this?

    Its working great anyways :D

  131. If someone need the Trivia from movies, here it is:

    $arr['trivia'] = trim(strip_tags($this->match('/Trivia<\/h4>(.*?)(|<span)/ms', $html, 1)));

  132. here's my version

    https://github.com/Islander/Isy_IMDB

    cheers abhinay :)

  133. Hi Abhi..I would like to tell you that you have given me much knowledge about it. That was really a great script.

  134. Hi
    Can i search by title-id ? How ?
    please 1 sample code
    search by title name :
    http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=Titanic&o=xml

    search by title-ID name : ???????
    please help thanks
    Regards

  135. M2H, this is a pretty versatile api so you can put title-id in place of the movie name:
    http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=tt1234567&o=xml

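For anyone calling the web service from PHP, here is a small sketch of building the request URL. The m and o parameters and the endpoint are the ones shown in the URLs above; buildServiceUrl() is a hypothetical helper name.

```php
<?php
// Build a request URL for the web service shown above.
// $query may be a movie name or a title ID; $format is the "o" parameter ("xml" in this thread).
function buildServiceUrl(string $query, string $format = 'xml'): string
{
    $base = 'http://lab.abhinayrathore.com/imdb/imdbWebService.php';
    return $base . '?' . http_build_query(['m' => $query, 'o' => $format]);
}

echo buildServiceUrl('Titanic'), "\n";
echo buildServiceUrl('tt1234567'), "\n";
```

Using http_build_query() also takes care of URL-encoding titles that contain spaces or punctuation.
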
  136. This comment has been removed by the author.

  137. @Abhinay Rathore :
    Very very good
    Thanks a lot
    Kind regards.
    again thanks :)

  138. i saved ur given file as imdb.php,and then i created test.php:
    getMovieInfo("The Godfather");
    echo $imdb->arr['title'];
    ?>

    please helpme anyone its not working....

  139. It's a great piece of code. One question: where do you get the autocomplete results from? Can you put up a suggest.php sample code?

  140. i've been able to pick up the results from google suggest but not particularly from imdb results.

  141. Anonymous, I am working on an easy to implement code for pulling search suggestions from IMDb. I'll post the code on this blog pretty soon :) Stay tuned!

  142. Can you please provide the regular expression for locations

  143. Hey Abhinay, this is a really great script! It really helps me out :)

    Is there a way to input multiple imdb urls and then get the data spit out into columns rather than rows?

    Also, how can I get it to separate cast members by a comma rather than a new line (
    )?

    Thanks!

  144. Is it possible to use the scraper to get the top 250, the top action genre, or the top 1990s movies?

  145. Hello,
    Thx for this job.
    I use this scraper in my free software : xbne (http://passion-xbmc.org/downloads/?sa=view;id=23)
    - Is it possible to add trailer ?
    - Is it possible to add a tag in :
    1) Event images for advertising picture. Ex :(<a title="Mickey Rourke at event...)
    2) Thumbs format (Height > Width)
    3) Fanart format (Width > Height)

    Thanks..
    Vincent

  146. Hi, how do I display the top 10 releases from IMDb? By the way, nice and useful script; the only problem is that you have to specify an IMDb ID for it to work :) I would like to be able to get the top 10 box office movies in the UK. Can somebody explain how it can be done, please? I am no good with object-oriented programming :) Thanks

  147. My request to google is being blocked.

  148. 1000 times thankyou. This is awesome and works great.

    All hail Rathore for he has given the gift of awesomeness!

  149. Here is what I have achieved http://demo.plusimdb.com

  150. As of this morning it seems that the MPAA is broken again. Any help would be appreciated, I can't seem to figure it out.

    Oh and this script is amazing.

  151. MPAA is still broken in the script but I was able to get it working. Thanks again for the great tool!

  152. Abhinay, thanks for making this freely available. Not only has your API been very reliably effective for my project, but I'm also just learning PHP, so have benefitted greatly from being able to look through how your API works. Thanks!

  153. Dear Abhinay

    Is your movie catalog script open source? Can I have this script?
    I have many movies and I use different software to manage the list, but your script is really good and useful.

    Thanks

    Replies
    1. Mostafa, my movie catalog script is not open source as of now! BUT time permitting, I am planning to launch this as open source :)

  154. Just noticed there is an updated script, but using either version I get "no movie found" with google.com, google.co.uk or bing.com.

    I am only scraping the IMDb number and the rating (to compare against my own).

    any ideas on a fix?

    thanks

  155. Hey, can I include this script in a commercial project? Donation to you will be given of course :)

    Replies
    1. Yes you can definitely use this script for any project you wish to. Just beware of IMDb's screen scraping policies: http://www.imdb.com/help/show_article?conditions :)

    2. Thank you, you are great! Where can I donate?

    3. Absolutely Awesome. Many thanks for keeping this updated.

    4. Fantastic script. I seem to be having a php coding issue. Apostrophes are coded as ' in plot and tagline strings. I've tried several php functions to decode the string, but none work. Can you suggest a solution?

    5. That should have been Ampersand-Pound-x27; but it was converted by your page.

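A possible approach to the apostrophe problem above, as a sketch only. By default PHP's html_entity_decode() has historically left single-quote entities alone, so passing ENT_QUOTES may be the missing piece; it is an assumption that the strings contain the hex entity described in the previous reply. decodeEntities() is a hypothetical helper name.

```php
<?php
// Decode HTML entities, including quote entities such as &#x27; and &#039;.
function decodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decodeEntities('The Godfather&#x27;s son &amp; heir'), "\n";
```

Running plot and tagline strings through a helper like this before output should turn the entity back into a plain apostrophe.
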
  156. Script is very good. How do i additionally get the User Reviews too.

  157. Hi!
    Is there a way to integrate it in WORDPRESS?
    Would be great.

    Thanks!

  158. Hi, could you please provide a full cast and crew information.
    And also modified scrapper for persons?

  159. Hi,

    I think IMDB have changed something on their site. If there is no poster available it shows the wrong output. For example 'Pepsi Smash' - if you enter this on your demo page it shows facebook images.

    Replies
    1. http://lab.abhinayrathore.com/imdb/?m=pepsi+smash&submit=Search

  160. Yep, IMDb changed something, because some of the posters can't be retrieved anymore. Any updates on this? Thank you.

    Replies
    1. It looks fine to me on this side. Can you send some example titles that are not working?

    2. Look at the previous comment: http://lab.abhinayrathore.com/imdb/?m=pepsi+smash&submit=Search

    3. So Pepsi Smash & don't be tardy for the wedding

    4. Somehow searching for "Pepsi Smash" returns me "Basic Instinct" here in USA. What country are you located in? It's possible that IMDb is forwarding to your locale site.

    5. I'm using http://akas.imdb.com

      But the problem is with tv series and documentaries. Last week it pulled a poster from the TV series, now it grabs /images/widgets/facebook_share.png

      I'm from The Netherlands - so it may be possible that it searches on the UK site. But if you use akas.imdb.com it grabs from US site? But there is a change on the IMDB site.

  161. The Release Date is apparently taken from the main page, which may display the release date in another country instead of the original release date.
    Example: http://akas.imdb.com/title/tt0219400/ shows the Turkey release date on the main page.

    I guess the Release Date would have to be calculated as the earliest date from the releaseinfo page.

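The "earliest date" idea above could be sketched as follows. This assumes the release dates have already been extracted from the releaseinfo page as strings that PHP can parse (for example "8 December 2000"); the sample values are made up and earliestReleaseDate() is a hypothetical helper, not part of the scraper.

```php
<?php
// Pick the earliest date from a list of date strings.
// Returns the date as Y-m-d, or null if nothing could be parsed.
function earliestReleaseDate(array $dates): ?string
{
    $earliest = null;
    foreach ($dates as $raw) {
        $date = date_create($raw);
        if ($date === false) {
            continue; // skip anything unparseable
        }
        if ($earliest === null || $date < $earliest) {
            $earliest = $date;
        }
    }
    return $earliest === null ? null : $earliest->format('Y-m-d');
}

// Made-up sample values for illustration.
echo earliestReleaseDate(['21 September 2001', '8 December 2000', '14 February 2001']), "\n";
```

Note that the earliest entry on that page may be a festival screening rather than a general release, so some filtering might still be needed.
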
  162. I have also encountered another 'small' thing. It now also grabs the line 'See full cast and crew' with the stars. This is also new. So you have the poster problem and this thing.

  163. Hi!

    Pls help...

    I only need the media images. How can I retrieve them? Could imdbimage.php be used for that?

    thanks

    Best regards

    Gergő

  164. How can I use it if I have a database of 15,000 movies, each with its IMDb link, and I want to copy the data into my own database?

    And if I want to add a new movie, how can I enter just the movie's URL and have this script scrape all the data and save it into the database?

  165. Hi. I am using the imdb.php file and the test file you have for demonstration, but it returns an error no matter what movie I search for. It seems like the error is happening in the "geturl" function. Can anyone help me figure this out? I am only an intermediate at best with PHP, so please be patient with me! :)

  166. Awesome script, thank you!!

    Why not use the "combined" page (http://akas.imdb.com/title/tt0120338/combined)?

    There is important information like:

    Original Music
    Cinematography
    Film Editing
    Production Companies

    Regards!

  167. It was working perfectly, but a few days ago the poster image URLs stopped working.

    Is there a fix for this?

  168. Dear Abhinay Rathore,
    first of all, thank you so much for your script. It works & it is flexible.

    I need your help. Can you please add info for whether the title is a movie or documentary or tv series?

    thanks in advance, BR

    CAGRI

  169. Dear Abhinay Rathore and other friends,

    Why can't I get a "-" with the code below when a movie doesn't have an original name on IMDb? What am I doing wrong? Thanks in advance, BR
    CAGRI

    if (!is_null(trim($movieArray['original_title']))) {echo $movieArray['original_title'];} else {echo '-';}

    Replies
    1. Dear Abhinay Rathore
      I also tried if(!empty(...)). Result is same. I can't get "-" for the movies without any original name.

      do you have any other recommendation?

      thanks in advance, BR
      CAGRI

    2. sorry for my false post, it is working with empty(...).
      thanks so much

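A short sketch of why the is_null() check in the question never prints "-": trim() always returns a string, so is_null(trim(...)) is never true, while an emptiness check does catch a blank title. valueOrDash() is a hypothetical helper and the sample values are only for illustration.

```php
<?php
// Return the trimmed value, or a dash when it is missing or blank.
// trim() always returns a string, so is_null(trim(...)) is never true;
// comparing the trimmed string against '' is what catches blank values.
function valueOrDash($value): string
{
    $text = trim((string) $value);
    return $text === '' ? '-' : $text;
}

echo valueOrDash("L'amour en fuite"), "\n"; // L'amour en fuite
echo valueOrDash(''), "\n";                  // -
echo valueOrDash(null), "\n";                // -
```

Unlike empty(), the explicit comparison keeps a title that is literally "0" instead of replacing it with a dash.
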
  170. Dear Abhinay Rathore and other friends,

    For the movie tt0078771, the IMDb title is "Love on the Run". It is a French movie. The original name is "L'amour en fuite".

    I use getMovieInfoById function.

    The strange thing I noticed is:
    I get tt0078771's IMDb title as the original name and the original name as the IMDb title. I also checked the IMDb page (everything seems OK). What could cause the two titles to be swapped for this movie?

    thanks in advance, BR
    CAGRI

    Replies
    1. please ignore my previous question.
      for the new situation, my question has changed:

      I use getMovieInfoById function in imdb class. I live in Turkey. I don't use akas.imdb.com. for the movie tt0078771, I can not get IMDB title that is Love on the Run.
      what I get is:
      IMDB ID | TITLE | ORIGINAL TITLE | USA TITLE
      tt0078771 | L'amour en fuite | - | Love on the Run

      but it should be as below:
      IMDB ID | TITLE | ORIGINAL TITLE | USA TITLE
      tt0078771 | Love on the Run | L'amour en fuite | Love on the Run

      in demo page, if I input tt0078771 then I get title & original title ok but there exists no USA title in this case.

      Can you help? I am very confused.

      thanks in advance, BR
      CAGRI

  171. UPDATE Note: in order to echo also 1 win and/or 1 nomination

    I changed these and it works:
    $arr['awards'] = trim($this->match('/(\d+) win/ms',$html, 1));

    $arr['nominations'] = trim($this->match('/(\d+) nomination/ms',$html, 1));

    what i've changed:
    wins to win
    nominations to nomination

    try with tt0115561. it has 1 win only.

    BR
    CAGRI

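A standalone sketch of the same idea, using an optional "s" so both singular and plural forms match. It runs against a plain string rather than the scraper's $html and match() method; parseAwardCounts() is a hypothetical helper and the sample strings are made up.

```php
<?php
// Extract award counts from a summary line, accepting singular and plural forms.
// Missing counts are reported as 0.
function parseAwardCounts(string $text): array
{
    $wins = preg_match('/(\d+) wins?\b/', $text, $m) === 1 ? (int) $m[1] : 0;
    $nominations = preg_match('/(\d+) nominations?\b/', $text, $m) === 1 ? (int) $m[1] : 0;
    return ['wins' => $wins, 'nominations' => $nominations];
}

print_r(parseAwardCounts('1 win & 2 nominations'));
print_r(parseAwardCounts('5 wins & 1 nomination'));
```

The trailing \b keeps the pattern from matching inside longer words.
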
  172. Dear Abhinay Rathore and other friends,

    update requirement:
    If the movie won only 1 Oscar, the script echoes an empty value. Changing "oscars" to "oscar" has no effect, because IMDb writes "Won Oscar." if there is only 1 Oscar. Since it doesn't use the numeral 1, the result comes back empty.

    can you please update this issue

    thanks in advance, BR
    CAGRI

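One way the single-Oscar case could be handled, as a sketch under stated assumptions: the singular wording "Won Oscar." is from the comment above, while the plural wording "Won 3 Oscars." is assumed. parseOscarCount() is a hypothetical helper working on a plain string, not the scraper's own code.

```php
<?php
// Count Oscars from a summary sentence.
// "Won Oscar." (no number) is treated as 1; "Won 3 Oscars." yields 3; anything else yields 0.
function parseOscarCount(string $text): int
{
    if (preg_match('/Won (\d+) Oscars?/', $text, $m) === 1) {
        return (int) $m[1];
    }
    return preg_match('/Won Oscar\b/', $text) === 1 ? 1 : 0;
}

var_dump(parseOscarCount('Won Oscar. Another 1 win'));
var_dump(parseOscarCount('Won 3 Oscars. Another 20 wins'));
var_dump(parseOscarCount('Nominated for 2 Oscars.'));
```

Checking the numbered form first and falling back to the bare "Won Oscar" wording keeps the two cases from interfering with each other.
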
  173. Hi Abhinay,

    If you update the script this year, please consider the "combined" option (http://www.imdb.com/title/tt0088247/combined)

    I want to put some important information like:

    Original Music
    Cinematography
    Film Editing
    Production Companies

    I tried to implement it myself with no luck; I'm very bad at scraping.

    Regards and happy new year!

  174. There is still something wrong with retrieving posters. It's not a mistake in the script, but something on the IMDb website. For example, if you search for Men Are Such Fools (tt0030433) you will get this URL: http://lab.abhinayrathore.com/imdb/imdbImage.php?url=http://b.scorecardresearch.com/p?c1=2&c2=6034961&c3=&c4=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0030433%2F&c5=c6=&15=&cj=1


Thanks a lot for your valuable comments :)