IMDb is the leading source of movie and TV information and a top web-scraping target for movie lovers around the world. Unfortunately, IMDb does not provide an API to access its database, so web scraping is the only resort for us. PHP, one of the most widely used web development languages, makes scraping easy with the power of PCRE (Perl Compatible Regular Expressions).
For my recent project on a Movie Catalog (http://movies.abhinayrathore.com), I needed an IMDb scraper and found one built by Tyler Hall. His version was not robust enough to scrape every kind of movie page, so I extended it to support different types of titles. Recently, however, IMDb changed its page template and most of the old scrapers stopped working, including mine. So I modified my scraper to accommodate the new template and considered it my moral responsibility to contribute it back to the developer community.
This new scraper is robust enough to handle a wide variety of template modifications. Apart from the regular information, it also digs deeper to collect extra media images and release dates.
Last Updated: Feb 1, 2014
Major changes in Feb 20, 2013 version:
- We now use the combined information page to scrape the data. This page doesn't change very often and gives us the complete list of individual departments.
- Added a few more entities: producers, musicians, cinematographers, editors etc. Removed metascore information. Removed small poster URL.
- You can now pass a second boolean parameter to the getMovieInfo() and getMovieInfoById() functions to disable the extra information. By default it is set to true and may slow down the scraping. If you don't need all the extra info like Storyline, Release Dates, Recommendations or Media Images, just pass false as the second parameter to these methods. Example: $movieArray = $imdb->getMovieInfo("The Godfather", false);
- Information for individuals in the list of directors, cast, writers etc. is now in an associative array with key being the IMDb id of the individual.
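A minimal sketch of consuming that structure, assuming the result array uses the upper-case keys from the attribute list and that each person entry maps an IMDb id to a name. The sample data (including the ids) and the helper listPeople() are hypothetical stand-ins; a real array would come from $imdb->getMovieInfo("The Godfather", false).

```php
<?php
// Hypothetical sample shaped like the result described above; a real array
// would come from $imdb->getMovieInfo("The Godfather", false).
$movieArray = array(
    'TITLE'     => 'The Godfather',
    'DIRECTORS' => array('nm0000338' => 'Francis Ford Coppola'),
    'WRITERS'   => array('nm0701374' => 'Mario Puzo', 'nm0000338' => 'Francis Ford Coppola'),
);

// Turn an "IMDb id => name" map into a readable "Name (id)" list.
function listPeople(array $people)
{
    $out = array();
    foreach ($people as $imdbId => $name) {
        $out[] = $name . ' (' . $imdbId . ')';
    }
    return implode(', ', $out);
}

echo $movieArray['TITLE'] . ' directed by ' . listPeople($movieArray['DIRECTORS']) . "\n";
```

Keying by IMDb id keeps people with identical names apart and lets you build profile links from the key.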
Here is a list of all the attributes it scrapes from the IMDb page:
- TITLE_ID
- TITLE
- YEAR
- RATING
- GENRES
- STARS
- DIRECTORS
- WRITERS
- CAST
- PRODUCERS
- MUSICIANS
- CINEMATOGRAPHERS
- EDITORS
- ALSO_KNOWN_AS
- RELEASE_DATE
- RELEASE_DATES
- PLOT
- POSTER
- POSTER_LARGE
- RUNTIME
- TOP_250
- OSCARS
- AWARDS
- NOMINATIONS
- STORYLINE
- TAGLINE
- MEDIA_IMAGES
- MPAA_RATING
- VOTES
- RECOMMENDED_TITLES
- VIDEOS
How to use this PHP Scraper?
Include the class file in your PHP page:
include("imdb.php");
Instantiate the class and get the results in an array:
$imdb = new Imdb();
$movieArray = $imdb->getMovieInfo("The Godfather");
You can try this scraper on my lab page: http://lab.abhinayrathore.com/imdb/
To download the PHP Source Code directly use this link: http://lab.abhinayrathore.com/imdb/imdb_php.htm
Fork it on GitHub: https://github.com/abhinayrathore/PHP-IMDb-Scraper
Example usage: http://lab.abhinayrathore.com/imdb/usage.htm
Proxy script for downloading or displaying Media images on your website: http://lab.abhinayrathore.com/imdb/imdbImage.txt
To implement your own IMDb Web Service API to return data in XML, JSON or JSONP format, use this script along with the API: http://lab.abhinayrathore.com/imdb/imdbWebService.htm
To implement IMDb.com's search suggestions on your website, please follow this post: http://web3o.blogspot.com/2011/10/imdb-search-suggestions-with-jquery.html
If you find any part of this scraper broken or incorrect, please drop a comment here and I’ll try to fix it as soon as possible.
IMDb has a leechers policy in place for media images. You may not be able to use the URL for some of the images to display on your website. As a workaround you can use a PHP Proxy to display or download those images. I’ve written a small proxy script to grab the images: http://lab.abhinayrathore.com/imdb/imdbImage.txt. To use this script you just need to pass the image URL as a request parameter:
<img src="imdbImage.php?url=<?= urlencode($url) ?>" />
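For readers who want to see the idea rather than download the script, here is a rough sketch of what such a proxy could look like. It is not the script linked above: the function names and the allowed-host list are assumptions made for illustration, and you should adjust the hosts to whatever your poster URLs actually use.

```php
<?php
// Sketch of an image proxy. The allowed-host list passed in is an assumption;
// restricting hosts stops the proxy from fetching arbitrary URLs.
function isAllowedImageUrl($url, array $allowedHosts)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return false;
    }
    return in_array(strtolower($parts['host']), $allowedHosts, true);
}

// Fetch the image server-side and stream it back to the browser.
function proxyImage($url, array $allowedHosts)
{
    if (!isAllowedImageUrl($url, $allowedHosts)) {
        header('HTTP/1.1 400 Bad Request');
        return;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $data = curl_exec($ch);
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    if ($data === false) {
        header('HTTP/1.1 502 Bad Gateway');
        return;
    }
    header('Content-Type: ' . ($type ? $type : 'image/jpeg'));
    echo $data;
}

// In a proxy page one might call, for example:
// proxyImage(isset($_GET['url']) ? $_GET['url'] : '', array('ia.media-imdb.com'));
```

Because the browser only ever talks to your own server, the image host never sees your page as the referrer.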
NOTE: For users outside of the USA
IMDb will automatically redirect you to titles listed in the language used for release in your country.
To see films listed under their original titles regardless of your country or region, you will have to modify this script to scrape the titles from http://akas.imdb.com, because http://www.imdb.com will automatically redirect you to your country-specific title page.
Happy Scraping :)
Thanks a lot for this script!
I have zero knowledge of php but got this script running on windows using xampp.
If you use xampp you might see this error:
Call to undefined function: curl_init()
Open php.ini file and uncomment this line:
extension=php_curl.dll
Then restart the server and it's fixed!
Could you add the MPAA rating?
I am pretty new to PHP. I got xampp installed and working now. Let me know how to test this scraper?
Asmaka.
MPAA Rating included: http://lab.abhinayrathore.com/imdb/imdb.txt
Is it possible to get the full plot, instead of a cut off one?
replace this line
$arr['plot'] = trim(strip_tags($this->match('/<p itemprop="description">(.*?)(<\/p>|<a)/ms', $html, 1)));
with this
$arr['plot'] = trim(strip_tags($this->match('/(Storyline)(.*?)(Written by|<a)/ms', $html, 2)));
My server is not located in the USA, can you get it to scrape the USA title, instead of the one from the country it is in?
It is independent of what country you are in. The Internet is the same everywhere, so it can scrape international movie pages as well.
$arr['votes'] = $this->match('/>(([0-9]+),([0-9]+)).*?votes<\/a>/ms', $html, 1);
Is this true?
serhatyolacan,
Try this:
$arr['votes'] = $this->match('/href="ratings".*?>([0-9]+,?[0-9]*) votes<\/a>\)/ms', $html, 1);
This will match even if there are 10 votes or 500,000 votes.
I've also added votes to the scraping list. Get the latest imdb.txt file from the link above
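To see how the votes pattern above behaves, here is a small standalone check. The helper matchFirst() is a stand-in for the class's match() method (its exact behaviour is an assumption), and the HTML fragments are made up to imitate the markup the pattern expects.

```php
<?php
// Standalone stand-in for the class's match() helper (an assumption about
// its behaviour): return capture group $group, or null when nothing matches.
function matchFirst($regex, $html, $group)
{
    return preg_match($regex, $html, $m) === 1 ? $m[$group] : null;
}

$votesRegex = '/href="ratings".*?>([0-9]+,?[0-9]*) votes<\/a>\)/ms';

// Hypothetical fragments imitating the markup the pattern expects.
$few  = '(<a href="ratings" title="10 IMDb users">10 votes</a>)';
$many = '(<a href="ratings" title="500,000 IMDb users">500,000 votes</a>)';

echo matchFirst($votesRegex, $few, 1) . "\n";
echo matchFirst($votesRegex, $many, 1) . "\n";
```

Note that the pattern allows at most one thousands separator, so a count such as 1,234,567 would not match; changing the group to ([0-9,]+) is one way to cover that case.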
Here is another question :)
Is it possible to scrape actor pictures?
And cache media + actor images to our servers?
And another question...
Is it possible to call strings manually? For example:
<div class="title">
<?php echo $title; ?>
</div>
<div class="actorslist">
<?php echo $cast; ?>
</div>
...
Actually i know lots of wordpress functions. And i'm thinking wordpress plugin with some options. Or theme functions. I don't like imdbphp2 script. So i found your imdb class.
Plugin will work like this: (You can see in top of sidebar)
http://www.odfi.tv/yabanci-filmler/mutant-gunlukleri-chronicles-2008-turkce-divx-hd-online-izle/
You will use custom field with "movie name" or "movie id" (prefer to use id).
All informations will be saved to sql. So they will be cached with post. If you want to upgrade imdb infos. You just only need to update this post.
This is great idea but i have crap php knowledge. So if possible help please :)
It is not independent of what country you are in, because for the new harry potter movie, i get "Haris Poteris ir mirties relikvijos - 1 dalis" as the title.
Anonymous, I am using Google to search for the titles on IMDb as it is more accurate than IMDb search, and I believe Google is automatically detecting your country locale and redirecting your request to the locale-specific IMDb page.
For example: The Spanish site for the Harry Potter movie is http://www.imdb.es/title/tt0926084/
For a workaround, you might have to modify this parser a little bit. In the first run, let it give you the movie info in your locale. Then take the movie id (which is the same for all locales) and build a new url like http://www.imdb.com/title/tt0926084/ (note imdb.com in place of imdb.es), then use this new .com url directly to parse the movie info in English.
Hope this helps :)
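The workaround above can be sketched in a few lines. The function names extractTitleId() and buildTitleUrl() are hypothetical helpers, not part of the scraper class.

```php
<?php
// Extract the title id (e.g. tt0926084) from any IMDb title URL.
function extractTitleId($url)
{
    return preg_match('/\/title\/(tt[0-9]+)/', $url, $m) === 1 ? $m[1] : null;
}

// Rebuild the title URL on a chosen host; www.imdb.com is the default.
function buildTitleUrl($titleId, $host = 'www.imdb.com')
{
    return 'http://' . $host . '/title/' . $titleId . '/';
}

$localUrl = 'http://www.imdb.es/title/tt0926084/';
$id = extractTitleId($localUrl);
echo buildTitleUrl($id) . "\n";
```

The same helper can target another host by passing it as the second argument, for example buildTitleUrl($id, 'akas.imdb.com').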
serhatyolacan, your idea of including the actors' images is pretty good, but we cannot include them in the media images because that would make it difficult to distinguish actors' images from the other ones. What I am planning to do is return an associative array with each actor's name and his/her image. But for that I'll have to include some extra functions, which I plan to do in the next version.
As for the wordpress plugin, I've never worked on those... but you've given me a new idea to explore :)
It is not being converted into a different domain, but the title is being translated.
Here is an example:
Red Eye on IMDb is: http://www.imdb.com/title/tt0421239/.
If I scrape this url, I get a title of "Naktinis reisas"
If I visit the url from my server, I see that a new line has been added "Original title: Red Eye"
Here is a paste of the source when viewed on my server(look at line 419 - 432): http://pastebin.com/Xr87t7ny
serhatyolacan: you can print individual array elements:
<div class="title">
<?php echo $movieArray['title']; ?>
</div>
Anonymous, I guess you'll have to add another field in the parser to parse Original Title.
Let me know if you want me to send you the regular expression for that.
I have a question about the regular expressions. I am trying to write this code in c# and I couldn't understand the lines like
$arr['title_id'] = $this->match('/id="(tt[0-9]+)\|imdb/ms', $html, 1);
What does the "ms" do at the end of the regex? And can you also tell me what the match function does exactly? As I understand it, it searches for the expression in the html string, but I couldn't understand what it returns.
Thanks..
Anonymous,
I am already working on a C#/ASP.net based IMDb Scraper and it'll be ready in the coming weeks... So if you can hold on that long, I'll post the new library on this blog :)
To learn more about PHP regular expressions you can search on Google and you'll get all kind of tutorials.
As for the "ms" in the pattern, read more about PCRE pattern modifiers: http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
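To make the two modifiers concrete: "s" lets "." match newlines and "m" makes "^" and "$" match at every line boundary. The helper below is only a guess at what the scraper's match() method does (run the pattern, return the requested capture group); the name matchGroup() and the sample string are hypothetical.

```php
<?php
// Hypothetical equivalent of the scraper's match() helper: run the pattern
// and return the requested capture group, or false when nothing matches.
function matchGroup($regex, $str, $i = 0)
{
    return preg_match($regex, $str, $m) === 1 ? $m[$i] : false;
}

$html = "<h1>\nThe Godfather\n</h1>";

// Without "s", "." stops at newlines, so the tag pair is never bridged.
var_dump(matchGroup('/<h1>(.*?)<\/h1>/', $html, 1));
// With "s", "." also matches newlines and the group spans several lines.
var_dump(trim(matchGroup('/<h1>(.*?)<\/h1>/s', $html, 1)));
// With "m", "^" and "$" match at the start and end of every line.
var_dump(matchGroup('/^The (\w+)$/m', $html, 1));
```

When porting to C#, the closest counterparts are RegexOptions.Singleline for "s" and RegexOptions.Multiline for "m".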
I get the general idea about scraping but I am stuck on the "genre" part. I am putting the regex '/Genre.?:(.*?)(<\/div>|See more)/ms' into the regular expression software and it finds nothing. Did the IMDb template change or what??
Hello.
Can you provide your index.php usage? I have zero experience with php.
Thanks.
To get the original title I added the following line:
$arr['orig_title'] = trim($this->match('/<span class="title-extra">\\n(.*?) \\n<i>/ms', $html, 1));
This is incredible, I was going to make a one, probably a crappy one, but no, you did, you are the man. Thanks!
On my server it says:
"Fatal error: Call to undefined function: str_ireplace() in xxxxxxxx/imdb.php on line 15)"
on imdb.php file.
Line invoked is:
$title = str_ireplace('the ', '', $title);
(tested with provided usage.php).
Any ideas?
You are using an older version of PHP.
str_ireplace was introduced in PHP 5: http://php.net/manual/en/function.str-ireplace.php
You can even remove/comment out this line and the scraper should work fine without it :)
BUT, it's better if you upgrade to the latest PHP version.
Speedy, serious, precisely.
I love this man!
I can't upgrade the webserver provided; meanwhile my phpinfo() replies with:
PHP Version 5.2.14
If I comment out str_ireplace, another error occurred:
Fatal error: Call to undefined function: stripos() in xxxxx/imdb.php on line 18
Of course line 18 is:
if(stripos($html, "302 Moved") !== false)
.... you'll expect a gift for Christmas!
how can i use this to be stored in a database in mysql?
How can i add the search box? thanks
Victor, you can search on Google for storing data to MySQL using PHP. It is out of scope for this project.
Anonymous, you can look at the html code of the test page (http://lab.abhinayrathore.com/imdb/) on how to add the search box.
Here is the code to get the actors out
http://pastebin.com/pJtY064h
This is awesome, Abhinay!
I have one issue, though. The $arr['genres'] seems too complicated for me.
I wanted to insert the genres into a SQL DB, but I can't manage to do it, because the only result $movieArray[genres] is giving me is the word 'Array'.
I was planning to do at the end 'INSERT INTO table_name [...] VALUES $movieArray[genres] [...]'
Is there anything I can do about it?
Thanks in advance
Carlos,
You can convert a PHP Array into a comma separated string using the implode function:
$value = is_array($value)?implode(",",$value):$value;
(First check if it is an array; if it is, then convert it into a comma separated string)
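Applied to a whole result array, the implode approach above might look like this. The function name flattenFields() and the sample data are hypothetical; real data would come from getMovieInfo().

```php
<?php
// Flatten array-valued fields into comma-separated strings so that every
// value is a plain string that can be stored in a single column.
function flattenFields(array $movie)
{
    $flat = array();
    foreach ($movie as $key => $value) {
        $flat[$key] = is_array($value) ? implode(',', $value) : $value;
    }
    return $flat;
}

// Hypothetical sample; real data would come from getMovieInfo().
$movieArray = array(
    'title'  => 'Stargate',
    'genres' => array('Action', 'Adventure', 'Sci-Fi'),
);
$row = flattenFields($movieArray);
echo $row['genres'] . "\n";
```

When writing $row to the database, bind the values through prepared statements (for example with PDO) rather than concatenating them into the SQL string.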
Great script! How can use it to scrape imdbTV?
Hi
For some reason sometimes the movie poster doesn't appear. There's no error and the path is correct but it just stays blank. IMDB anti-leech maybe?
Anyway I solved it by storing the image in my disc and only then showing it.
Any ideas on how to retrieve the new "Stars" field on IMDB? The cast shows unknown actors most of the time.
Cheers,
Pipanni
you can display the movie image (poster)
<?php
include("imdb.php");
$imdb = new Imdb();
$movieArray = $imdb->getMovieInfo("The Godfather");
$poster = $movieArray['poster'];
$title = $movieArray['title'];
?>
<img src="<?php echo $poster; ?>" alt="<?php echo $title; ?>" />
Hey Pedro,
Yes, IMDb does have an anti-leech policy.
Also, I've added the code to scrape the "Stars" field from the IMDb page.
Hi,
Really it is a nice blog, I would like to tell you that you have given me much knowledge about it. Thanks for everything.
Extract Web
How is it that when I search for 'House of Flying Daggers' my script returns 'Shi mian mai fu' and your own hosted scraper returns 'House of Flying Daggers'.
Wiethoofd,
What country are you located in?
It's because IMDb is redirecting you to your country specific locale page. Example Italian Page: http://www.imdb.it/title/tt0385004/. You can try replacing the locale code with "com" in these url's and try if you can get to the English Page.
I'm located in the Netherlands.
Even when I search directly for the IMDb-ID or request the .com/title/ page it returns the Chinese title.
I managed to preg_match the 'Also Known As:' title but in most cases the English/International title is required.
When using the 'IMDb API' it returns the correct title: http://imdbapi.com/?i=tt0385004
IMPORTANT: For all the users who are outside of USA...
You might see titles listed in the language used for release in your country.
To see films listed under their original titles regardless of your country or region, you will have to modify this script to scrape the titles from http://akas.imdb.com, because "http://www.imdb.com" will automatically redirect you to your country-specific title.
Additionally, I have modified the script to scan all the AKA Titles as well and try to extract USA Title from that list. The USA_Title may not be the correct one all the time, so you can modify the script to extract the exact titles according to your needs.
Please go ahead and test the new version of this script to see if it works for you.
Thanks for the attempted fix, but using the http://akas.imdb.com instead of the http://www.imdb.com doesn't work either. Not in the scraper nor my browsers. The USA Title on the other hand does work.
How come the scraper isn't using the http://akas.imdb.com/title/tt0385004/combined page (note the 'combined' part) to scrape all the info? This should contain much more information than the regular movie description page.
Changing the Google I'm Feeling lucky search link to search for 'site:imdb.com' instead of 'imdb' should always result in an imdb-page to scrape.
I noticed the IMDb scraper sometimes scrapes a banner for a poster image when there is no poster available (the matching fails?)
I added the next line after $arr['poster'] to get rid of banner-images, if any.
if(preg_match('/^http:\/\/ad\.doubleclick\.net\//', $arr['poster'])) $arr['poster'] = "";
Try it yourself with the movie 'Kooky'
hi,
i would want to give in the IMDB link.
lets say $url.
and i would want to store the imdb info to my database, how do i do this?
i haven't been doing php for a while so this is kind of hard for me
Thanks very much for this, it's great!
I modified/created two functions for those that don't want to use the entire file and only need the imdb url and thumbnail (well that's what I needed).
http://php.pastebin.com/7xWuZui6
Hi
I've been using your scraper on a project of mine and today I came across a bug that's puzzling me. Any movie directed by Roland Emmerich won't show its director. It stays blank. I would try to correct this myself but my regex skills are basic.
You can replicate this bug on any movie by Roland Emmerich: "Stargate", "Independence Day", "Godzilla", "The Day After Tomorrow". :P
Cheers,
Pipanni (Pedro)
Pedro,
Thanks a bunch for pointing out this bug.
It was a bug in the regex where it was looking for a closing div "</div>" or "and " for filtering out the directors div container.
And because "Roland Emmerich" contains an "and " in between, it was never stripping out the complete container.
I've fixed the bug and it should be working fine for both directors and writers :)
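The pitfall is easy to reproduce in isolation. In this sketch the markup, the patterns and the helper firstGroupText() are illustrative assumptions, not the scraper's actual code; the tightened terminator only accepts "and " right after a closing ">", similar to the pattern used in the Countries snippet further down the thread.

```php
<?php
// Hypothetical markup for a director credit.
$html = '<div>Director: <a href="/name/nm0000386/">Roland Emmerich</a></div>';

// A bare "and " terminator also fires inside "Roland ".
$buggy = '/Director.?:(.*?)(<\/div>|and )/ms';
// One possible tightening: only accept "and " right after a closing ">".
$fixed = '/Director.?:(.*?)(<\/div>|>.?and )/ms';

// Return the first capture group as plain text, or null when nothing matches.
function firstGroupText($regex, $html)
{
    return preg_match($regex, $html, $m) === 1 ? trim(strip_tags($m[1])) : null;
}

echo firstGroupText($buggy, $html) . "\n";
echo firstGroupText($fixed, $html) . "\n";
```

The first pattern cuts the capture short in the middle of the name, while the second one runs on to the closing div and returns the full name.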
is this possible with Tv shows as well? i have been looking over the code and don't see that info... any help would be appreciated
I love this - but I can't get it to function right out of the gate. I've tested it and the URL pulls up the correct page, but when used in the function getMovieInfo the variable $html always pops a 302 moved error. Are there any compatibility issues with using godaddy that anyone knows about? here's my test link:
http://www.movielint.com/db/imdb_test.php
How to get the "Filming Locations" and "Company" (without the links/a-tags)?
"is this possible with Tv shows as well? i have been looking over the code and don't see that info... any help would be appreciated"
Yes, it would be great to customize this to only focus on TV shows and TV movies.
Hello Abhinay
Is there any sure way to tell the difference between a movie and a tv show episode?
I could check the running time and stop a movie insertion in the DB if that value is under 60 minutes but that would be a bit lame.
As soon as my movies site is done I'll let you know, as you're definitely in my "Thanks" list. :)
It appears Google is now blocking scripting attempts to use their service, probably thanks to Bing and the like.
It doesn't work anymore :s
Can we have any news from the creator ?
It was an awesome api :/
Anonymous,
The scraper seems to be working on my side (USA)... what country are you located in? It might be some local problem.
The scraper is also fine here in Portugal.
Abhinay, anything about my previous question? (how to distinguish a tv show from a movie)
I am in france, and it's not working for now. I made some change to the google search url, and it worked fine until now, so i will try to change it back tomorrow to see what happen.
I couldn't wait to finish my work to work on it =)
And with the proper url, it's working !
Might be some trouble with the google.fr search maybe... Don't know :s
Thanks anyway for your API, very useful one !
Big thanks for your library, good work !
I've been using this API for a while and it works great, but I am running across an error for a few new movies, and you can test it inside the demo here on the site.
But movies like 'Season of the Witch' and 'The Chronicles of Narnia: Voyage of the Dawn Treader' continually reproduce an error 'title not on IMDB' but they are.
Not sure what the glitch is.
Seems like imdb changes something because i didn't get results for all movies i searched for. All movies are not on imdb...
Well the API is using Google's 'I'm Feeling Lucky' search so I just replaced that with the actual IMDB url and it seems to have cleared a few errors.
Hey. Nice script. I'm thinking: maybe it is possible to make some JS that gets the info from imdb.php without reloading the page and fills some html with specific entries from that info? It would be interesting to make such a thing.
For the people wanting to scrape TV Shows, here's how to do it:
You'll need:
- episode & series nr.
- to add a line to the search before it is coded to URL
Add the following to your search query: Moviename + " (#" + season + "." + episode + ") (TV Series)"
It should look like this: "House (#1.4) (TV Series)"
Good luck!
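In PHP, the query format described above could be built like this. buildEpisodeQuery() is a hypothetical helper name; whether the search engine resolves the query to the right episode page is not guaranteed.

```php
<?php
// Build the search string in the "Title (#season.episode) (TV Series)" form.
function buildEpisodeQuery($title, $season, $episode)
{
    return sprintf('%s (#%d.%d) (TV Series)', $title, $season, $episode);
}

$query = buildEpisodeQuery('House', 1, 4);
echo $query . "\n";
// URL-encode the query before appending it to a search URL.
echo rawurlencode($query) . "\n";
```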
I've been looking for a way to catalog a collection of films. A simple catalog at that; just a title, year, director and genre.
As opposed to manually building this database, I considered a semi-automated route.
I decided to build, (we'll call it an "application"), where it takes user input (a movie title), feeds that to your scraper then writes the relevant data to a database. Easy.
Everything was going fine at first. However, while working on this application it appears my server's IP may have been banned from accessing IMDb as I continue to receive "No Title found on IMDb!".
I've tested the scraper without any of my modifications and I still continue to receive that error. Have you seen something like this in the past and how do / did you avoid the same fate with your labs demo?
Thanks.
Anonymous,
I guess you are still using the old version of this scraper.
Try the latest version from top link and see if you get the same error.
Hi there Abhinay! Thanks for your reply. I am indeed using the latest version of your scraper. Shortly after I posted my first message, it started to work again. It worked for about 30 minutes I'd say and has now just begun throwing the "No Title found on IMDb!" errors.
Perhaps I am trying to make too many requests within a short amount of time so either Google or IMDb's firewall is temporarily blocking my server's IP?
HI Abhinay Rathore, thanx for great script.
I'm using your lab page because I don't have any knowledge of PHP and I can't use your PHP source on my server. If possible, I would be so happy if you could help me use this PHP script on my own server.
thanx
Excellent script Abhinay,
I wonder if it would be possible to get the full cast list for a movie. That would mean another scrape, as the full cast now resides on a separate page, where it used to be all on one page.
The "new" IMDB layout is a totally different discussion ;-)
It always points to /fullcredits#cast
Ciao
Stef,
Hi... great blog!
I'm experiencing problems with it...
I first could run your script + test php without any problem...
I didn't modify anything and now I'm getting:
No Title found on IMDb!
any idea?
Thanks!!!
Not working the following error:
Notice: Undefined offset: 0 in C:\wamp\www\IMDB\imdb.php on line 34
ERROR No Title found on IMDb!
Script works perfectly.. Stop posting stupid comments.. If it doesn't work, its due you! stop crying and learn PHP & HTML.
Hello. Is it possible to add Plot keywords?
And is it possible to add award and nomination counts behind oscars? Thanks for great work...
// Awards
$arr['awards'] = trim($this->match('/([0-9]+) wins/ms',$html, 1));
// Nominations
$arr['nominations'] = trim($this->match('/([0-9]+) nominations/ms',$html, 1));
If you need posters names for something:
$arr['poster_name'] = strtolower(preg_replace("/^(http:\/\/.*\/)?/i",'',$arr['poster']));
$arr['poster_small_name'] = strtolower(preg_replace("/^(http:\/\/.*\/)?/i",'',$arr['poster_small']));
$arr['poster_large_name'] = strtolower(preg_replace("/^(http:\/\/.*\/)?/i",'',$arr['poster_large']));
Full size posters and poster names:
$arr['poster_full'] = substr($arr['poster'], 0, strrpos($arr['poster'], "_V1.")) . "_V1._SY0.jpg";
$arr['poster_full_name'] = strtolower(preg_replace("/^(http:\/\/.*\/)?/i",'',$arr['poster_full']));
You can set a condition to find a way if you search according to a series or movie?
how to print the imdb url of the movie?
<?php echo $movieArray['imdb_url']; ?>
thanks master! :D
I can't find a solution for the "I'm feeling lucky" feature. I use the new script but it's still not working :S
Awesome script!, thx for sharing!.
If someone adds Cinematography and Original Music it will be perfect.
I'm working on a new way to find the movie imdb url to avoid the Google "I'm feeling lucky" search
Can you change the language of the results to Spanish?
I tried to add Cinematography and Original Music and something else too from full credits. But I can't write their functions, because they are on the "/$title_id/fullcredits" page and they are in the same table elements. It doesn't look like the cast tables (so I can't copy the foreach loop for cast). Writing their functions seems impossible or hard :/
Movie language:
$arr['language'] = trim(strip_tags($this->match('/Language.?:<\/h4>(.*?)<\/div>/ms', $html, 1)));
Now trying to add filming location, production in movie page. And cinematography, original music, producer, make up... in /fullcredits page.
It seems hard for me but i will try :)
$arr['language'] = trim(strip_tags($this->match('/Language.?:<\/h4>(.*?)<\/div>/ms', $html, 1)));
Isn't that only to learn the language of the movie?
I want to know how to get me back for information in Spanish
Dear Abhinay,
Can you write example code or a function for the /fullcredits page (with a foreach loop for multiple names and for the array) please.
Thank you!
// Countries
$arr['country'] = array();
foreach($this->match_all('/<a.*?>(.*?)<\/a>/ms', $this->match('/Country.?:(.*?)(<\/div>|>.?and )/ms', $html, 1), 1) as $m)
{
array_push($arr['country'], $m);
}
Does this script still work? I tried the example url and it doesn't return anything. I want to use this but also don't want to waste my time on something that doesn't work anymore.
Anonymous, this script does work on my side. Try using the Bing search function instead of the Google one.
Hello Abhinay. Cast and media images not working. Please check.
How hard would it be to add in the link for the movie trailer? I've been fiddling with it but can't quite get it.
Updated to 2.3 and started to work again. Sorry for bothering.
Hello,
since an update (most likely yesterday) the IMDB Rating is not grabbed correctly anymore. Can you look into it?
Thanks!
Thanks for the tip Anonymous, I've fixed the rating issue.
Hey Abhinay, had the same RegEx, but couldn't post it in here because of the tags. Will you maintain this scraper over the next months? I'd like to rely on it for a project :-)
If so, please please keep the API for $scraper->scrapMovieInfo consistent. I grab the HTML with my own code.
BTW: Nice work. It's certainly no fun to write that many regexs.
$arr['rating'] = $this->match('/itemprop="ratingValue">([0-9].[0-9])<\/span>/ms', $html, 1);
Can you help me to generate the same XML file as you generated. I needed that code for my project www.flickonline.com. Thanks a bunch.
Anonymous, you can get the IMDb Web Service API code at: http://lab.abhinayrathore.com/imdb/imdbWebService.htm
Hey, wow, the script works perfectly; I only have problems with 2 things.
If i type this i get only "ARRAY" in my php file.
< div class="cast">
< div class="genres">
All others like $movieArray['rating'] or votes or title,... work perfectly.
Please help me with this problem :)
Greets Daniel
Daniel, for converting arrays to comma delimited string, use this statement:
implode(",", $movieArray['cast'])
Hey Abhinay Rathore
WOW works perfect now with the implode
Big thx for the help and the script
Greets Daniel
Hi Abhinay
I've been using your script for a while and everything was fine, but one or two months ago the movie poster stopped downloading. I've tried using your php proxy but to no avail. Could IMDB be blocking my website? Everything works fine from localhost. :(
Over
Pedro, it might be possible that IMDb is monitoring leechers pretty heavily.
What you can try is to spoof the user agent in your cURL request. You can search on Google for ways of doing it.
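As a rough sketch of that suggestion, cURL lets you set the User-Agent header with the CURLOPT_USERAGENT option. The function name fetchWithUserAgent() and the default User-Agent string are example values chosen for illustration, and there is no guarantee this gets around any particular block.

```php
<?php
// Fetch a URL while sending a browser-like User-Agent header.
// The default User-Agent string is only an example value.
function fetchWithUserAgent($url, $userAgent = 'Mozilla/5.0 (compatible; MyMovieCatalog/1.0)')
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body; // false on failure
}

// Example (performs a network request, so it is left commented out):
// $image = fetchWithUserAgent($movieArray['poster']);
```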
I haven't read all the comments, but one way to get the original title for non-US users is:
$arr['orginal_title'] = trim($this->match('/class="title-extra">(.*?)</ms', $html, 1));
Works for me.
Great class btw!
fyi 'Votes' don't work anymore.
Thanks for a good script!
Votes issue fixed!
I got some improvements. Here is my edited scraper:
http://pastebin.com/4Bef6m1R
If i made a mistake please poke :)
Thanks a lot serhatyolacan,
I've included a couple of additions from your script, but I did not include the full credits as it contains some unwanted information and fetching one more URL would slow down the scraping a bit. But anyone who needs all that info can definitely profit from it :)
Hi Abhinay,
Great script. Got it working right out of the box. Thanks a lot. I do however have 1 problem. There are some movies that your script won't scrape. They also won't work on your demo page.
I only have this issue with 2 movies:
tt0458339 : Captain America: The First Avenger
tt1201607 : Harry Potter: Deadly Hallows Part 2
Perhaps I'm overlooking something, but I can't find what is going wrong. Tried both Google & Bing search. PHP is up-to-date. All other movies work fine, only these 2 won't.
Can you help me please.
Regards,
Wouter
Wouter, I don't see any problem with these titles. They are working just fine here.
Hi Abhinay,
I have the same problem with these two titles
tt0458339 : Captain America: The First Avenger
tt1201607 : Harry Potter: Deadly Hallows Part 2
Only if i search over the imdb number eg. tt0458339
Regards,
Gajan
Gajan and Wouter,
The problem with searching using certain title ids is fixed now. Instead of searching for a complete url match, it's now only searching for IMDb title ids, so it can capture all alternate urls for a title id search.
Also, added a new function "getMovieInfoById" to directly get results from IMDb if you know the id.
@Abhinay, hello sir,
I have a beginner question.
I want to print out the Genres, but when I use
echo $movieArray['genres'].'<br />';
It gives me a text result: 'Array' and this is all I get.
I saw your previous replies, like the one about implode, but I don't know how to use it. Can you please give me a php line which will echo the specific genres?
Thank you.
Anonymous,
Check the usage example file link above.
For your particular problem:
echo implode(", ", $movieArray['genres']);
Thank you Abhinay Rathore, echo implode(", ", $movieArray['genres']); worked brilliantly.
I will post here when I finish the project, maybe you will like the result.
Hello Ahmed. This function allows full movie url or title id. You will like it :)
http://pastebin.com/sYmiPayy
Hello Abhinay.
I am a student currently pursuing BTECH-CSE.
I am currently working on a "web crawling and scraping in PHP" project and would like to have your valuable help and suggestions on the same. I would be grateful if you mail me at
sumitdeb1001@gmail.com
thanks in advance...
Small fix:
$url= "http://www.google.com/search?q=on+site:imdb.com+" . rawurlencode($title);
Hello Abhinay ,
i need your help :( can you help me? i have 2 page ,
in a first page i have a text filed + submit button , and in second page i have a Movie Title , Movie Rate , Movie Plot and ,,,
now how can i use your code , in first page , when i type imdb link and click in button , show information Title , Movie rate , and ,,, in second page
ps : i started to learn php in few past month
dear Abhinay plz accept my previews comment , and help me :( tnX
ReplyDeleteAmirReza, please go ahead and learn form handling in PHP. That'll help you with this and similar problems in future.
ReplyDeleteSome examples: http://www.w3schools.com/php/php_forms.asp and http://www.tizag.com/phpT/examples/formex.php
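For readers with the same two-page question, a minimal sketch of the second page might look like the following. The field name imdb_link and the helper readSubmittedLink() are assumptions for illustration, and whether getMovieInfo() accepts a full link depends on the version of the script in use.

```php
<?php
// Hypothetical second page. Assumes the first page has a form such as:
//   <form method="post" action="result.php">
//     <input type="text" name="imdb_link"> <input type="submit">
//   </form>
// The field name "imdb_link" is an assumption, not part of the script.
function readSubmittedLink(array $post): ?string
{
    $link = trim($post['imdb_link'] ?? '');
    return $link === '' ? null : $link;
}

$link = readSubmittedLink($_POST);
if ($link === null) {
    echo "Please enter an IMDb link or title id.\n";
} else {
    // Escape user input before echoing it back into the page.
    echo 'Looking up: ' . htmlspecialchars($link) . "\n";
    // With the scraper included, the lookup would go here, e.g.
    // $movieArray = $imdb->getMovieInfo($link);
}
```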
fyi 'Votes' don't work anymore.
Fixed Votes and Plot issue!
Here is my result partially based on your script, http://plusimdb.com
Thank you a lot.
Just a suggestion, you could use in your post a tag or something called:
Last update: Day-Month-Year
This way, we will know when you update anything, since you keep the same version number with every modification.
great work man .. thanks for the latest (sept 22 2011) fix....
Hi. How can I grab the trailer link with this?
It's working great anyways :D
If someone need the Trivia from movies, here it is:
$arr['trivia'] = trim(strip_tags($this->match('/Trivia<\/h4>(.*?)(|<span)/ms', $html, 1)));
here's my version
https://github.com/Islander/Isy_IMDB
cheers abhinay :)
Hi Abhi..I would like to tell you that you have given me much knowledge about it. That was really a great script.
Hi
Can I search by title-id? How?
please 1 sample code
search by title name :
http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=Titanic&o=xml
search by title-ID name : ???????
please help thanks
Regards
M2H, this is a pretty versatile api so you can put title-id in place of the movie name:
http://lab.abhinayrathore.com/imdb/imdbWebService.php?m=tt1234567&o=xml
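To make the two request forms concrete, here is a small sketch that builds the web service URL from either a title or a title id. The parameter names m and o come from the URLs quoted in this thread. The helper name buildWebServiceUrl() is hypothetical.

```php
<?php
// Hypothetical helper: build a request URL for the web service endpoint
// quoted above. "m" carries the title or title id, "o" the output format.
function buildWebServiceUrl(string $query, string $format = 'xml'): string
{
    $base = 'http://lab.abhinayrathore.com/imdb/imdbWebService.php';
    $params = ['m' => $query, 'o' => $format];
    // http_build_query URL-encodes the values (spaces become "+").
    return $base . '?' . http_build_query($params, '', '&');
}

echo buildWebServiceUrl('Titanic'), "\n";    // search by title name
echo buildWebServiceUrl('tt1234567'), "\n";  // search by title id
```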
@Abhinay Rathore :
Very very good
Thanks a lot
Kind Regards.
again thanks :)
i saved ur given file as imdb.php,and then i created test.php:
getMovieInfo("The Godfather");
echo $imdb->arr['title'];
?>
please helpme anyone its not working....
sj, please refer this: http://lab.abhinayrathore.com/imdb/usage.htm
It's a great piece of code. One question: where do you get the autocomplete results from? Can you put up a suggest.php sample code?
I've been able to pick up the results from Google Suggest but not from IMDb results.
Anonymous, I am working on an easy to implement code for pulling search suggestions from IMDb. I'll post the code on this blog pretty soon :) Stay tuned!
Anonymous, you can find the IMDb search suggestions API here: http://web3o.blogspot.com/2011/10/imdb-search-suggestions-with-jquery.html
Can you please provide the regular expression for locations
Hey Abhinay, this is a really great script! It really helps me out :)
Is there a way to input multiple imdb urls and then get the data spit out into columns rather than rows?
Also, how can I get it to separate cast members by a comma rather than a new line (<br>)?
Thanks!
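One way to get a comma-separated cast line is implode(), the same call suggested earlier for genres. This is a minimal sketch: the sample array only mimics the shape described in the post (names keyed by IMDb person id), its ids and names are illustrative values, and joinNames() is a hypothetical helper.

```php
<?php
// Sample data in the shape the post describes: an associative array
// keyed by IMDb person id. These values are for illustration only.
$movieArray = [
    'cast' => [
        'nm0000008' => 'Marlon Brando',
        'nm0000199' => 'Al Pacino',
    ],
];

// Hypothetical helper: implode() joins the array values and ignores
// the keys, so it works for associative arrays as well.
function joinNames(array $people): string
{
    return implode(', ', $people);
}

echo joinNames($movieArray['cast']), "\n";
```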
Is it possible to use the scraper to get the top 250, the top action genre or the top 1990s movies?
Hello,
Thx for this job.
I use this scraper in my free software : xbne (http://passion-xbmc.org/downloads/?sa=view;id=23)
- Is it possible to add trailer ?
- Is it possible to add a tag in :
1) Event images for advertising picture. Ex :(<a title="Mickey Rourke at event...)
2) Thumbs format (Height > Width)
3) Fanart format (Width > Height)
Thanks..
Vincent
Hi, how do I display the top 10 releases from IMDb? By the way, nice and useful script; the only problem is you have to specify an IMDb id for it to work :) I would like to be able to get the top 10 box office movies in the UK. Can somebody explain how it can be done please, because I am no good with object oriented programming :) Thanks
My request to google is being blocked.
Then use bing.
1000 times thank you. This is awesome and works great.
All hail Rathore for he has given the gift of awesomeness!
Here is what I have achieved http://demo.plusimdb.com
As of this morning it seems that the MPAA is broken again. Any help would be appreciated, I can't seem to figure it out.
Oh and this script is amazing.
MPAA is still broken in the script but I was able to get it working. Thanks again for the great tool!
Fixed MPAA rating issue.
Abhinay, thanks for making this freely available. Not only has your API been very reliably effective for my project, but I'm also just learning PHP, so have benefitted greatly from being able to look through how your API works. Thanks!
Dear Abhinay
Is your movie catalog script open source? Can I have this script?
I have many movies and I use some different software to manage the list, but your script is perfect and useful.
Thanks
Mostafa, my movie catalog script is not open source as of now! BUT time permitting, I am planning to launch this as open source :)
Just noticed there is an updated script, but using either, I get no movie found using google.com, google.co.uk or bing.com.
I am only scraping the IMDb number and the rating (to compare vs my own).
any ideas on a fix?
thanks
Hey, can I include this script in a commercial project? Donation to you will be given of course :)
Yes you can definitely use this script for any project you wish to. Just beware of IMDb's screen scraping policies: http://www.imdb.com/help/show_article?conditions :)
Thank you, you are great! Where can I donate?
You can donate on the demo page (http://lab.abhinayrathore.com/imdb/) using PayPal.
Absolutely Awesome. Many thanks for keeping this updated.
Fantastic script. I seem to be having a php coding issue. Apostrophes are coded as ' in plot and tagline strings. I've tried several php functions to decode the string, but none work. Can you suggest a solution?
that should have been Ampersand-Pound-x27; but it was converted by your page.
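A possible fix for the encoded apostrophes is PHP's html_entity_decode() with the ENT_QUOTES flag. Before PHP 8.1 the default flags left single-quote entities untouched, which may be why earlier attempts did not work. The sample string and the decodeEntities() helper are illustrative assumptions, not part of the script.

```php
<?php
// Illustrative sample: a plot string with the apostrophe encoded as the
// hexadecimal numeric entity described in the comment above.
$plot = 'It&#x27;s an offer he can&#x27;t refuse.';

// Hypothetical helper. ENT_QUOTES makes html_entity_decode() convert
// single-quote entities as well as double-quote ones.
function decodeEntities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decodeEntities($plot), "\n";
```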
Script is very good. How do I additionally get the User Reviews too?
Hi!
Is there a way to integrate it in WORDPRESS?
Would be great.
Thanks!
Hi, could you please provide a full cast and crew information.
And also a modified scraper for persons?
Hi,
I think IMDB have changed something on their site. If there is no poster available it shows the wrong output. For example 'Pepsi Smash' - if you enter this on your demo page it shows facebook images.
http://lab.abhinayrathore.com/imdb/?m=pepsi+smash&submit=Search
Yeap, IMDb changed something because some of the posters can't be retrieved anymore. Any updates on this? Thank you.
It looks fine to me on this side. Can you send some example titles that are not working?
Look at the previous comment: http://lab.abhinayrathore.com/imdb/?m=pepsi+smash&submit=Search
So Pepsi Smash & don't be tardy for the wedding
Somehow searching for "Pepsi Smash" returns me "Basic Instinct" here in USA. What country are you located in? It's possible that IMDb is forwarding to your locale site.
I'm using http://akas.imdb.com
But the problem is with tv series and documentaries. Last week it pulled a poster from the TV series, now it grabs /images/widgets/facebook_share.png
I'm from The Netherlands - so it may be possible that it searches on the UK site. But if you use akas.imdb.com it grabs from US site? But there is a change on the IMDB site.
The Release Date is apparently taken from the main page, which may display the release date in another country instead of the original release date.
Example: http://akas.imdb.com/title/tt0219400/ shows the Turkey release date on the main page.
I guess the Release Date would have to be calculated as the earliest date from the releaseinfo page.
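The "earliest date" idea could be sketched as below. It assumes the release dates have already been scraped into a list of strings that strtotime() can parse. The function name earliestDate() and the sample dates are hypothetical and do not reflect any real title.

```php
<?php
// Hypothetical helper: given date strings (assumed to be parseable by
// strtotime), return the earliest one as YYYY-MM-DD, or null if none parse.
function earliestDate(array $dates): ?string
{
    $earliest = null;
    foreach ($dates as $date) {
        $timestamp = strtotime($date);
        if ($timestamp === false) {
            continue; // skip entries that cannot be parsed
        }
        if ($earliest === null || $timestamp < $earliest) {
            $earliest = $timestamp;
        }
    }
    return $earliest === null ? null : date('Y-m-d', $earliest);
}

// Illustrative dates only.
echo earliestDate(['14 July 2000', '2 June 2000', '9 September 2000']), "\n";
```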
I have also encountered another 'small' thing. It now also grabs the line 'See full cast and crew' with the stars. This is also new. So you have the poster problem and this thing.
Hi!
Pls help...
I only need the media images. How can I retrieve them? Could imdbimage.php be used for this?
thanks
Best regards
Gergő
How can I use it if I have a database with 15,000 movies with their IMDb links and I want to copy the data into my own database?
And if I want to add a new movie, how can I just enter the url of the movie and have this script scrape all the data and save it into the database?
Hi. I am using the imdb.php file and the test file you have for demonstration but it returns an error no matter what movie I search for. It seems like the error is happening in the "geturl" function. Can anyone help me figure this out? I am only an intermediate at best with php so please be patient with me! :)
Awesome script, thank you!!
Why not use the "combined" page (http://akas.imdb.com/title/tt0120338/combined)?
There is important information like:
Original Music
Cinematography
Film Editing
Production Companies
Regards!
It was working perfectly but a few days ago the image posters url's stopped working.
Is there a fix for this?
Dear Abhinay Rathore,
first of all, thank you so much for your script. It works & it is flexible.
I need your help. Can you please add info for whether the title is a movie or documentary or tv series?
thanks in advance, BR
CAGRI
Dear Abhinay Rathore and other friends,
Why can't I get a "-" with the code below even if a movie doesn't have an original name in IMDB? What am I doing wrong? thanks in advance, BR
CAGRI
if (!is_null(trim($movieArray['original_title']))) {echo $movieArray['original_title'];} else {echo '-';}
Try if(!empty(...))
Dear Abhinay Rathore
I also tried if(!empty(...)). The result is the same. I can't get "-" for the movies without any original name.
do you have any other recommendation?
thanks in advance, BR
CAGRI
sorry for my false post, it is working with empty(...).
thanks so much
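The reason the is_null() check never printed "-" is that trim() always returns a string, so is_null(trim(...)) is always false. A minimal sketch of a fallback that compares against the empty string is shown below. displayTitle() is a hypothetical helper, not part of the script.

```php
<?php
// trim() returns a string (possibly empty), never null, so this is false:
var_dump(is_null(trim('')));

// Hypothetical helper: print the title, or "-" when it is missing or blank.
// An explicit comparison with '' is used instead of empty(), because
// empty('0') is true and would hide a title that is literally "0".
function displayTitle(?string $title): string
{
    $title = trim((string) $title);
    return $title !== '' ? $title : '-';
}

echo displayTitle(''), "\n";
echo displayTitle("L'amour en fuite"), "\n";
```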
Dear Abhinay Rathore and other friends,
for the movie tt0078771 - IMDB Title: Love on the Run. It is a French movie. Original name is: L'amour en fuite
I use getMovieInfoById function.
My strange detection is:
I get tt0078771's IMDB title as original name & I get original name as IMDB title. I also checked IMDB page (everything seems OK). What can be the reason for this movie's titles place replacement case?
thanks in advance, BR
CAGRI
please ignore my previous question.
for the new situation, my question has changed:
I use getMovieInfoById function in imdb class. I live in Turkey. I don't use akas.imdb.com. for the movie tt0078771, I can not get IMDB title that is Love on the Run.
what I get is:
IMDB ID   | TITLE            | ORIGINAL TITLE   | USA TITLE
tt0078771 | L'amour en fuite | -                | Love on the Run
but it should be as below:
IMDB ID   | TITLE            | ORIGINAL TITLE   | USA TITLE
tt0078771 | Love on the Run  | L'amour en fuite | Love on the Run
in demo page, if I input tt0078771 then I get title & original title ok but there exists no USA title in this case.
Can you help? I am very confused.
thanks in advance, BR
CAGRI
UPDATE Note: in order to echo also 1 win and/or 1 nomination
I changed these and it works:
$arr['awards'] = trim($this->match('/(\d+) win/ms',$html, 1));
$arr['nominations'] = trim($this->match('/(\d+) nomination/ms',$html, 1));
what i've changed:
wins to win
nominations to nomination
try with tt0115561. it has 1 win only.
BR
CAGRI
Dear Abhinay Rathore and other friends,
update requirement:
if the movie won only 1 Oscar, the script echoes empty. Changing oscars to oscar has no effect because IMDB writes "Won Oscar." if there is only 1 Oscar. Since it doesn't use the numerical value 1, the result comes out empty.
can you please update this issue
thanks in advance, BR
CAGRI
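One way to cover the single-Oscar case is to make the number optional in the pattern and treat a missing number as 1. This is only a sketch: the "Won Oscar." wording is taken from the comment above, the plural form "Won 3 Oscars." is an assumption, and countOscars() is a hypothetical helper, not part of the script.

```php
<?php
// Hypothetical helper: count Oscars from the awards text.
// Assumed wording: "Won Oscar." for one, "Won N Oscars." for several.
function countOscars(string $html): int
{
    if (preg_match('/Won (?:(\d+) )?Oscars?\./', $html, $matches)) {
        // When no number is present the capture group is missing or
        // empty, which is read as a single Oscar.
        return isset($matches[1]) && $matches[1] !== '' ? (int) $matches[1] : 1;
    }
    return 0;
}

echo countOscars('Won Oscar. Another 1 win'), "\n";
echo countOscars('Won 3 Oscars. Another 23 wins'), "\n";
```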
Hi Abhinay,
If you update the script this year please consider the "combined" option (http://www.imdb.com/title/tt0088247/combined)
I want to put some important information like:
Original Music
Cinematography
Film Editing
Production Companies
I tried to implement myself with no luck, im very bad at scraping.
Regards and happy new year!
There is still something wrong with retrieving posters. It's not a mistake in the script, but something on the IMDB website. For example if you search for Men Are Such Fools (tt0030433) you will get this URL: http://lab.abhinayrathore.com/imdb/imdbImage.php?url=http://b.scorecardresearch.com/p?c1=2&c2=6034961&c3=&c4=http%3A%2F%2Fwww.imdb.com%2Ftitle%2Ftt0030433%2F&c5=c6=&15=&cj=1
looks like the script got broken by updates to imdb, any plans to update?