Thursday, November 25, 2010

FREE! ASP.net/C# IMDb Scraping API


As an extension of my previous IMDb Scraping API in PHP, I have converted the code to an ASP.net/C# based IMDb scraper. In my usual development style, I have kept the code simple and concise. For more details on the movie information it scrapes, please refer to my previous post. I don’t have an IIS server running to demonstrate the code, but it is functionally similar to my PHP Scraper API, so you can test that one at the link below. And above all... it’s FREE!!!

Test the Scraper API: http://lab.abhinayrathore.com/imdb/

Download C# Class file: http://lab.abhinayrathore.com/imdb/imdb_asp_csharp.htm
(Add this class file to your project and rename the namespace accordingly)
To convert this C# code to VB.net use this tool:
http://www.developerfusion.com/tools/convert/csharp-to-vb/

Fork it on GitHub: https://github.com/abhinayrathore/ASP.NET-IMDb-Scraper

Download DLL file: http://lab.abhinayrathore.com/imdb/IMDb.dll
(Copy this DLL file to your project folder and add it to the project references)

How to use this class:

  1. Include the class on your ASP page.
  2. Instantiate the class: IMDb imdb = new IMDb("The Godfather", true);
    (Second parameter is an optional Boolean value for scraping extra movie information.)
  3. Access the movie information using public variables: imdb.Id, imdb.Title etc.

It’s been a while since I worked on ASP.net, so if you have any improvements or suggestions, do let me know :)

116 comments:

  1. Nice stuff, cool thing that you're constantly improving your projects

  2. This is cool man ..

  3. Hi there, I have downloaded your C# class and converted it to VB.NET. When I load the class it errors on 'HttpContext.Current.Server.UrlEncode(MovieName)':

    HttpContext not defined.

    I was wondering what I'm supposed to define it as?

    Thanks

  4. Try adding a reference to System.Web in your project or use System.Web.HttpContext.Current.Server.UrlEncode(MovieName);
    I guess you are getting this error because your project is not a "web application" project in Visual Studio.

    Replies
    1. please send me csharp class and dll at my mail id kapil.soni99@gmail.com

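The HttpContext errors reported in this thread come up because HttpContext.Current is only available inside a running ASP.NET request. A minimal sketch of encoding the title without it, assuming a desktop or console project: Uri.EscapeDataString lives in the core System namespace, so no System.Web reference is needed. The SearchUrl class and its Build method are hypothetical names for illustration, not part of the scraper.

```csharp
using System;

public static class SearchUrl
{
    // Hypothetical helper: appends a percent-encoded movie title to a URL prefix.
    // Uri.EscapeDataString needs neither System.Web nor an HttpContext.
    public static string Build(string prefix, string movieName)
    {
        if (movieName == null) throw new ArgumentNullException("movieName");
        return prefix + Uri.EscapeDataString(movieName);
    }
}
```

For example, SearchUrl.Build("http://example.com/?q=", "Cowboys & Aliens") gives http://example.com/?q=Cowboys%20%26%20Aliens, so the ampersand no longer splits the query string.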
  5. thanks i will try today.

  6. Found some code that converts strings into URL form, replicating what System.Web.HttpContext does in web apps.

    Thanks for your help and the source code, much appreciated.

  7. Wow!!! Amazing!!!
    It works fast!
    Say.... Do You have something like this but on the cast screen(cast, director, etc.)?

  8. Very nice :) If only I would have found this sooner as I made my own from scratch!

  9. I'm having a little trouble using your IMDb class/DLL in a VS2010 WinForms project (I'm working on a personal movie database). I was using a previous IMDb service (http://imdb.codeplex.com/), but with the recent changes to the IMDb website it doesn't work anymore. At least the scraping part doesn't; it still shows me the matches. I adapted your class to work with the first part of the old code, but now I'm getting a problem.

    Imagine that we have multiple matches in the treeview. On the first run your code looks up every one of them, BUT when I search again for another movie (repopulating my treeview object) your code doesn't work with the new IMDb IDs, and I wonder why. Stepping through the code, I get the timeout at the call datastream = client.OpenRead(url). Code:

    private string getUrlData(string url)
    {
        WebClient client = new WebClient();
        Stream datastream = client.OpenRead(url);
        StreamReader reader = new StreamReader(datastream);
        StringBuilder sb = new StringBuilder();
        while (!reader.EndOfStream)
            sb.Append(reader.ReadLine());

        client = null;
        datastream.Flush();
        reader.Close();
        return sb.ToString();
    }



    As you can see, I tried to close/reassign all the objects, but that doesn't change anything.

    I need help, please.

  10. Mendor,
    I didn't understand which new IMDb IDs you are talking about. Can you give me some examples? As far as I know the IDs are the same.

    This app is only designed to scrape the movie information page and not the IMDb search result page. For a quick search, I am using Google's I'm Feeling Lucky feature. Using Google, you only get one result back (the best match).
    The only important function you need in this library is parseIMDbPage, so you can modify the constructor according to your needs and link it directly to the correct IMDb ID URL.

    As for the getUrlData function, you can try using HttpWebRequest and HttpWebResponse classes: http://www.west-wind.com/presentations/dotnetwebrequest/dotnetwebrequest.htm

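Following the suggestion to use HttpWebRequest and HttpWebResponse, here is a minimal sketch of a download helper with an explicit timeout and using blocks, so the response is released even when an exception is thrown. PageFetcher, ReadAll and the timeoutMs parameter are names made up for this example, it assumes the .NET Framework used in this thread, and whether unreleased connections caused the timeout reported above is only a guess.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageFetcher
{
    // Reads a stream to the end as UTF-8 text.
    // Disposing the reader also disposes the underlying stream.
    public static string ReadAll(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads a page, giving up after timeoutMs milliseconds.
    public static string GetUrlData(string url, int timeoutMs)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Timeout = timeoutMs;           // waiting for the response
        request.ReadWriteTimeout = timeoutMs;  // reading the body
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            return ReadAll(response.GetResponseStream());
        }
    }
}
```

Unlike a ReadLine loop, ReadToEnd keeps the line breaks of the page, which matters if any pattern relies on them.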
  11. Thx, but when I create a class (VS2010) and paste your code it says: "The name 'HttpContext' does not exist in the current context". I have a reference to System.Web. Do you know how to fix this?

  12. Okay, never mind, I fixed it with: HttpWebRequest httpreq = (HttpWebRequest)WebRequest.Create(MovieName);

  13. Sorry to bother you again, but how do I get imdb.Cast as a string? (Yeah, I'm still learning.)

  14. Robin,
    You can convert a C# ArrayList to a comma delimited string like this...
    String.Join(",", imdb.Cast.ToArray())

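The String.Join call above passes an object[] to Join, an overload that only exists from .NET Framework 4 onwards. A small sketch that also compiles on .NET 3.5 by converting the entries to strings first; CastFormatter is a hypothetical helper name, and it assumes the list holds plain strings such as actor names.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class CastFormatter
{
    // Joins the entries of a non-generic ArrayList into one delimited string.
    // Cast<object>() makes the ArrayList usable with LINQ.
    public static string Join(ArrayList items, string separator)
    {
        string[] parts = items.Cast<object>()
                              .Select(x => Convert.ToString(x))
                              .ToArray();
        return string.Join(separator, parts);
    }
}
```

With a list holding "Marlon Brando" and "Al Pacino" and the separator ", ", the result is "Marlon Brando, Al Pacino".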
  15. Hi,

    I'm using VB.NET with VS2010. I have a reference set to System.Web. I converted the code to VB using the link you suggested. The program compiles just fine, but when I run it Visual Studio complains that HttpContext.Current.Server.UrlEncode(MovieName) threw a NullReferenceException: Object reference not set to an instance of an object. Please let me know your thoughts as I'm stumped.

    Thanks!

  16. Hi, you did a great job here!
    Thanks!

    I'm having a little problem: parseIMDbPage
    is hanging. It just tries to parse and stops responding altogether.

    Any help would be great!

  17. Ian,
    You might be getting the UrlEncode error because your project is not a web application project. I guess you'll have to instantiate the HttpContext class in your code and then use it. You can even skip URL encoding if you want to; it won't affect the results.

  18. Enzima,
    I am having no issues with this function on my side. I guess you'll have to use the debugger on your side and detect what part of the code is hanging.
    If you find any bugs/flaws, do let me know, I'll update my code as well.

  19. It is really hard to tell where it is, even in debug.
    I wrote a console app and I have a list of movies I want to search for.
    I start the application and everything is good until it stops responding.
    No errors, nothing.
    I pause in debug mode and nothing; I can't find where it pauses. It's really strange.

    I will continue debugging; if I find the issue I will let you know.
    Any help would be great.

  20. OK, I found where it hangs.

    Here
    Year = match(@....

    and this movie name: Puffo the Clown
    http://www.imdb.com/character/ch0131222/

  21. Hi,

    I'm trying to get the scraper to work on series as well, but I'm getting mixed results.

    I'm scraping:
    "Avatar the Last Airbender (#1.12) (TV)"
    in both the compiled DLL and the PHP scraper. PHP gives me the episode and the DLL returns the series main page (and not the episode information).

    What am I doing wrong?

  22. Hi.

    First of all...great work on the api. :)

    Now to my problem...

    I have a webform with a single textbox in which I want to display the title of the movie. I put the .cs file and the .dll file in the bin folder. When I start debugging the page opens and starts loading, but never finishes.

    Could you, please, tell me if I overlooked something?

  23. Hi again.

    I think I have figured out what is wrong with the DLL:

    This piece is wrong:
    //constructor
    public IMDb(string MovieName, bool GetExtraInfo = true)
    {
        string url = "http://www.google.com/search?hl=en&btnI=I%27m+Feeling+Lucky&q=imdb+";
        url += HttpContext.Current.Server.UrlEncode(MovieName);
        string html = getUrlData(url);
        parseIMDbPage(html, GetExtraInfo);
    }

    And needs to be like this:
    //constructor
    public IMDb(string MovieName, bool GetExtraInfo = true)
    {
        string url = "http://www.google.com/search?hl=en&q=imdb+";
        url += HttpContext.Current.Server.UrlEncode(MovieName);
        url += "&btnI=I%27m+Feeling+Lucky";
        string html = getUrlData(url);
        parseIMDbPage(html, GetExtraInfo);
    }

    This leads to the following problem :). Series titles are scraped now, but they are still in HTML format.

    So:
    "&#x22;Avatar: The Last Airbender&#x22; The Storm"
    Instead of:
    ""Avatar: The Last Airbender" The Storm"

    I'm sure I can figure out how to deal with that but I could use your advice.

    Sincerely,

    Martien

    PS. for those that have trouble with the "url += HttpUtility.UrlEncode(MovieName)" line read this:
    http://stackoverflow.com/questions/4967051/why-cant-i-find-or-use-urlencode-in-visual-studio-2010

    And if that doesn't work add "System.Web" to the project in the Solution Explorer, like this:
    http://img716.imageshack.us/i/addreference.png/

  24. Hi,

    Thank you so much for your great work on the api :)

    I have a few problems here. Can I use this api in Windows Phone application? Because I cannot add a reference to System.Web since it's not a Windows Phone library.

    Sincerely,

    Veri

  25. Veri,
    Sorry to say that I have never worked on a Windows Phone app, so I don't know much about it.
    I would say try to find some way to get the HTML content from the IMDb page. I think the only other major dependency in this API is the regular expressions, which I think should be available on the phone platform.

  26. Thanks for This Bit of precious Code
    It will inspire me for a scraper for allmusic.com !
    and a scraper for everything ....
    need to scrap scrap scrap !!

    Thanks man !

  27. I am glad I found this before I dug into doing it myself.
    Got it implemented fine.
    For some reason, the MPAA Rating is often not scraped correctly.
    Any chance this is easy to fix?

  28. Ian, can you please give me some sample titles where the MPAA rating is not scraped successfully. I'll try to look into it :)

  29. Here are several where the MPAArating is wrong
    Super
    A Better Life
    Legend of the Fist: The Return of Chen Zhen
    The Pruitt-Igoe Myth
    Blue Crush 2

    These have extra characters in the plot

    Blue Crush 2
    Black Cat Run

    The approach to getting to the IMDb entry is OK, but sometimes the selected movie is wrong (I didn't expect perfection).

    For instance searching for Cat Run always returns the older movie Black Cat Run and a search for Arthur (to be released) always returns Arthur (from 1995)

  30. Ian,
    All the movies that you have listed above don't have MPAA rating on the page (small image below the movie title). Compare these pages with other movie pages with MPAA rating to notice the difference.

    Also, the title search is done using Google's "I'm feeling lucky" feature. So if it does not return the exact title, try searching with as much info as possible like adding movie year or actors in the search.

  31. Is there a way to know that there is no Mpaa rating? Right now I get a string from somewhere else on the page.

    I would love to add the year to the search but I don't have it.

  32. Ian, thanks for pointing out the bug. I've fixed it and now you should get an empty string if there is no Mpaa rating on the page.
    (Get the latest version from the link above)

  33. Thanks. I also improved my search for the correct movie by adding (currentyear OR lastyear) to the search. Since I am looking at trailers, those dates are good guesses.

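The (currentyear OR lastyear) trick from the comment above can be sketched as a tiny helper. QueryBuilder and WithRecentYears are hypothetical names, and since the commenter's exact query is not shown, the output format here is an assumption.

```csharp
using System;

public static class QueryBuilder
{
    // Hypothetical helper: narrows a title search to the given year or the
    // one before it, which suits trailers of new or upcoming releases.
    public static string WithRecentYears(string title, int year)
    {
        return title + " (" + year + " OR " + (year - 1) + ")";
    }
}
```

Calling QueryBuilder.WithRecentYears("Cat Run", 2011) produces "Cat Run (2011 OR 2010)"; in practice the year would come from DateTime.Now.Year.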
  34. Tried downloading the update but the date in the file hasn't changed. Where was the change in the code?

  35. Go to the script page and hit refresh button.

  36. Got it! Thanks
    I added a couple of small changes
    1. a do nothing constructor
    2. a MovieLookup function that accomplishes what your constructor does, and changed your constructor to call MovieLookup

  37. To get around the error with

    HttpContext.Current.Server.UrlEncode(MovieName)

    I used

    url += HttpUtility.UrlEncode(MovieName);

  38. I was wondering if you would consider scraping for the ReleaseDate?
    Also I noticed some extra text in the plot field (example: Arthur 2011). On the web page the plot comes up empty, but the query returns "A drunken playboy stands to lose a wealthy inheritance when he falls for a woman his family doesn't like."

  39. Sorry, just realized that it is already there. For some reason I am not scraping it successfully. I will run a more comprehensive set of tests and report back. I am scraping trailers, so the IMDb page may not be complete. Thanks again for a very concise API.

  40. I ran into an interesting movie that the scraper can't seem to locate

    Square Grouper The Godfathers of Ganja

    The web scraper returns "not found", but the C# scraper seems to hang.

  41. The scraper hangs when trying to get the Title. Since the ID is "", I just put a if ID == "" to exit the scraping before it gets into trouble.

  42. New Bing search doesn't find the correct entry for "Cooper and the Castle Hills Gang". Surprisingly, IMDB search returns only one choice and it is the correct one

  43. I am not really satisfied with the search results from Bing. What about using a Google search with site:imdb.com?

  44. Ian,
    I am working on converting the script to scrape the Google Search results page to get the IMDb URLs. Somehow Bing fails on some of the less popular titles, but Google search is definitely better at this job.
    We'll keep scraping the Google Search result page until they stop automated script access to that as well.
    Also, scraping the results from the search result page is more efficient, as there is a higher probability of finding the URL than with the "I'm feeling lucky" redirect.

    P.S.: I've already converted the PHP scraper to scrape Google, so you can give it a try :)

  45. I noticed how you were scraping for the IMDB pages. Great Idea and should be robust.

  46. Hi Abhinay,

    Like you, I am looking into alternatives. DuckDuckGo has a nice feature, "!bang" searches, which lets you run a site-specific search by adding !imdb or !google.

    Now I know the former isn't much help, since IMDb's own default search is less accurate than its advanced search, which can't be "!banged" (it's actually a pretty solid search, but I can't come up with a way to verify that the first search result is the right one...).

    But using it to search Google could perhaps give you an advantage...

    If you come up with interesting results, do share. I shall do the same.

    Martien

  47. Modified the scraper to get IMDb url from Google Search Result page.

  48. Hey there,

    I got this to work, and made it find the ID of a hard coded movie. But it takes about a minute.

    Im in South Africa, line speed is 384kb, could that be an issue?

    How long does it normally take?

    Tx

  49. Edit:
    It happens when it gets the info from IMDb, e.g.:
    IMDb imdb = new IMDb("The Godfather");
    Once that is retrieved, using the data is instant.
    Ideas?

  50. Is it possible to also scrape the country information? Like for "The Dark Knight" -> USA/GB? That would be nice!

    Cheerz, Michael

  51. Ah, never mind, I have it ;) I just copied the genre variables and patched the code to match the country ;)

  52. Hey, thank you so much for this! It's a wonderful complement to my personal project!
    I have one huge problem that I'm trying to solve, though. The IMDb title always comes in my mother language, Portuguese (Portugal), and I cannot seem to make it give me the titles in a language of my choice. I've tried to change the language in all my browsers and it still doesn't work. Is it getting the language from the computer? How can I change this? I tried looking at the code for any routine that I could change, but I can't seem to find it.
    Thanks!

  53. Inquisitor,
    Here's your answer: http://www.imdb.com/help/show_leaf?akas
    IMDb has a separate site for Portugal: http://www.imdb.pt

    In the next version I'll try to include a feature to scrape AKA titles as well.

  54. This comment has been removed by the author.

  55. Hello Abhinay Rathore,

    Thanks for the quick reply! I'm sorry I didn't search enough; I thought it was not directly connected to IMDb. You're a genius for creating this, it works almost perfectly! I'm getting "trash" in some words, mainly the ones translated to Portuguese, because of certain characters (cão, for example, which means dog!). I also just realised that quotation marks come with trash too (for example, The Godfather's storyline: "...as "Don" Vito..."). I must be missing something! I'll Sherlock Holmes it out!
    Big thank you and best of luck!

  56. How do I remove the HTML character codes correctly?

    imdb.Plot = imdb.Plot.Replace("&#x27;", "'");

    I'm using this method.

    It isn't very reliable, is there a better way?

  57. This comment has been removed by the author.

  58. What I did was collect the character codes that show up and build a pseudo-library class that replaces them in my strings. For example, an é arrives as a specific code; this is the pattern for accented letters:
    &[char]acute;
    [char] is replaced by the base character, so for the é mentioned above it is e.

  59. Just noticed a small issue with the scraping of MPAA ratings. On IMDb, the ratings PG-13 and NC-17 actually show up in the scraper as PG_13 and NC_17.

  60. Doesn't Google ban us if we make loads of calls using an automated script?

  61. the_mr_hb@hotmail.comMay 16, 2011 at 8:07 PM

    Hi. Nicely done, sir :)
    But for some reason it only grabs the thumbnails of the "MediaImages".
    That would be the 100x100 images that are shown on IMDb. =/

  62. the_mr_hb@hotmail.comMay 16, 2011 at 8:34 PM

    Problem solved :)
    I just changed: list.Add(m.Groups(1).Value) to
    Dim s As String = m.Groups(1).Value.Substring(0, m.Groups(1).Value.IndexOf("_")) & "_V1._SX640_SY427_.jpg"
    list.Add(s)

    I've also added a progress report and a status report,
    so you can get info about what is being done and how far through
    the total process it is (in %) :)

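A C# take on the VB.NET thumbnail fix above, as a rough sketch: it keeps everything before the first underscore and appends a new size suffix. ImageUrl and Enlarge are hypothetical names, the URL layout is assumed from the snippet rather than documented, and unlike the VB line it returns the input unchanged when there is no underscore instead of throwing from Substring.

```csharp
using System;

public static class ImageUrl
{
    // Replaces the size suffix of a thumbnail URL with a larger one.
    // Assumption: the size information starts at the first underscore.
    public static string Enlarge(string thumbnailUrl, string suffix)
    {
        int cut = thumbnailUrl.IndexOf('_');
        return cut < 0 ? thumbnailUrl : thumbnailUrl.Substring(0, cut) + suffix;
    }
}
```

With the made-up address http://example.com/abc._V1._SX100_SY100_.jpg and the suffix _V1._SX640_SY427_.jpg from the comment, the result is http://example.com/abc._V1._SX640_SY427_.jpg.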
  63. Is there any way to automatically fetch the latest releases or upcoming movies instead of this searching stuff?

  64. Gitesh, this scraper is only meant to search and fetch movie information. Your request is out of scope for this scraper. You can write a similar scraper to get a list of the latest or upcoming movies.

  65. Hi, I can't get it to work :(
    Can someone send me a C# 2008 sample using this class?
    Thanks

  66. I get the following error at this line
    Stream datastream = client.OpenRead(url);
    ----------------------------------------------
    {"No connection could be made because the target machine actively refused it 74.125.93.103:80"}

    What am I doing wrong?

  67. Hi, IMDB Rating is no longer grabbed. Any chance you can fix it?

  68. hi, i can not get the votes, can u check it ???

  69. Fixed: Plot and Votes issue. Added random IP Address and User-Agent to Webclient request.

  70. hi,

    Thanks for the fix.

    But where did you fix it? In this URL:

    _http://lab.abhinayrathore.com/imdb/imdb_asp_csharp_2.htm

    I tried this class but it's still missing votes and plot.

    How about Languages, do you want to add it to this class?

    Any solution for the original title? I'm living in Finland and the title is always returned in Finnish (I don't want that). How about converting all titles to English?

    Some specific titles cannot be displayed, like WALL·E. Can you fix that, too?

  71. I'm the first to admit I'm not a sophisticated code designer or coder, so please forgive me if the answer to this is "obvious", but when I try to compile the VB code, I get the following errors:

    Warning 1 Resolved file has a bad image, no metadata, or is otherwise inaccessible. Could not load file or assembly 'C:\Users\Alan\Documents\Visual Studio 2008\Projects\imdbtest\imdbtest\IMDb.dll' or one of its dependencies. This assembly is built by a runtime newer than the currently loaded runtime and cannot be loaded.

    Warning 2 The referenced component 'IMDb' could not be found.

    Error 3 Name 'imdb' is not declared. C:\Users\Alan\Documents\Visual Studio 2008\Projects\imdbtest\imdbtest\Form1.vb

    Error 4 Type 'IMDb' is not defined. C:\Users\Alan\Documents\Visual Studio 2008\Projects\imdbtest\imdbtest\Form1.vb


    Can someone tell me how to resolve this?

    TIA...

  72. Hi again Abhinay, I want to add a label to translate the plot summary from English to another language (for example Persian). How can I do that, if possible?

  73. Hello, I am having trouble making this work in VS 2008 as a C# Windows Forms application. I've saved the DLL to my project folder and added it as a project reference. The DLL appears in my Solution Explorer but with a yellow caution sign, and I get the error "The referenced 'IMDb' component could not be found." Any help on how I can make this work as a Windows application instead of an ASP.NET web page would be greatly appreciated. Thanks

    Jon

  74. Abhinay. I have tracked down an issue with getUrlData.

    The error I am getting is "Exception: Unable to read data from the transport connection: The connection was closed."

    I believe the problem is caused by
    Random r = new Random();
    //Random IP Address
    client.Headers["X-Forwarded-For"] = r.Next(0, 255) + "." + r.Next(0, 255) + "." + r.Next(0, 255) + "." + r.Next(0, 255);
    //Random User-Agent
    client.Headers["User-Agent"] = "Mozilla/" + r.Next(3, 5) + ".0 (Windows NT " + r.Next(3, 5) + "." + r.Next(0, 2) + "; rv:2.0.1) Gecko/20100101 Firefox/" + r.Next(3, 5) + "." + r.Next(0, 5) + "." + r.Next(0, 5);

  75. Abhinay
    Just one more small gotcha.

    The MpaaRating for PG-13 and NC-17 come back as PG_13 and NC_17.

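Until the scraper itself handles this, the PG_13 / NC_17 issue can be worked around after scraping. A minimal sketch; RatingFixer is a hypothetical helper, and it assumes the underscore is the only difference from the rating shown on IMDb.

```csharp
public static class RatingFixer
{
    // Turns "PG_13" into "PG-13" and "NC_17" into "NC-17".
    // Ratings without an underscore pass through unchanged; null becomes "".
    public static string Normalize(string rating)
    {
        return string.IsNullOrEmpty(rating) ? "" : rating.Replace('_', '-');
    }
}
```

It would be applied once after the lookup, for example imdb.MpaaRating = RatingFixer.Normalize(imdb.MpaaRating);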
  76. Thank you for this scraper, I would have spent many days getting this to work!!

    I have a fix that looks to be applied in the online test but not in the code:
    Plot = match(@"<_p itemprop=""description"">(.*?)" --> (|<a href)", html);

    I have also tried a small change to see why it sometimes fails to get the movie from the search engines or the IMDb site. I think, but could be completely wrong, that it might have to do with invalid IP addresses. I have made the following change and will do some more testing.
    Changes to the first octet in the IP address:

    int a1 = r.Next(1,254);
    if (a1 == 10 || a1 == 127 || a1 == 169 || a1 == 172 || a1 == 192 || a1 == 224 || a1 == 240) a1++;


    I also had to add a wrapper to fix HTML-escaped characters in the fields Title, OriginalTitle, Plot, Storyline, and Tagline. Code below; I am not a good RegEx programmer, so it probably could do with some of your finesse.

    // [0-9A-Fa-f] so that codes containing hex letters, such as &#x2F;, are matched too
    Regex regex = new Regex(@"&#x[0-9A-Fa-f]{2};");
    MatchCollection matches = regex.Matches(s);
    foreach (Match match in matches)
    {
        s = s.Replace(match.Value, ((char)Convert.ToByte(match.Value.Replace(@"&#x", "").Replace(";", ""), 16)).ToString());
    }

    Once again, Many many thanks for this code.

    Arvid

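The first-octet change above can be written so that a skipped value is re-drawn rather than incremented (incrementing 224 gives 225, which is still in the multicast range). A rough sketch, not the scraper's actual code: FakeHeaders and RandomIp are hypothetical names, the skip list is the commenter's, and X-Forwarded-For is only a hint that a server may ignore; it does not change the address the request really comes from.

```csharp
using System;
using System.Linq;

public static class FakeHeaders
{
    // First octets skipped in the comment above (private, loopback,
    // link-local and similar ranges). Skipping the whole /8 is conservative.
    static readonly int[] Skipped = { 10, 127, 169, 172, 192 };

    // Builds a dotted-quad string for an X-Forwarded-For header.
    // Random.Next(min, max) excludes max, so Next(0, 256) is needed to reach 255.
    public static string RandomIp(Random r)
    {
        int first;
        do { first = r.Next(1, 224); }          // 1..223, below multicast
        while (Skipped.Contains(first));
        return first + "." + r.Next(0, 256) + "." + r.Next(0, 256) + "." + r.Next(1, 255);
    }
}
```

Usage would look like client.Headers["X-Forwarded-For"] = FakeHeaders.RandomIp(new Random()); with one Random instance reused across calls.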
  77. I have found a better way to fix up any encoded characters in the data fields, which works for all the other encodings as well:
    System.Net.WebUtility.HtmlDecode();

    Arvid

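A short sketch of how System.Net.WebUtility.HtmlDecode could be wrapped for the scraped fields. It handles named and numeric entities in one call, so the hand-written Replace and regex approaches from earlier comments are not needed. TextCleaner is a hypothetical name, and WebUtility requires .NET Framework 4 or later.

```csharp
using System.Net;

public static class TextCleaner
{
    // Decodes HTML entities such as &quot;, &amp;, &#39; and &#x27;.
    // Null or empty input yields an empty string.
    public static string Decode(string scraped)
    {
        return string.IsNullOrEmpty(scraped) ? "" : WebUtility.HtmlDecode(scraped);
    }
}
```

It can be applied to each text field after parsing, for example imdb.Plot = TextCleaner.Decode(imdb.Plot);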
  78. I noticed inconsistent data being returned and tracked it down to the ip address used in the header for the download from imdb. I found some movies, like 'All She Can (2011)' would come down as 'Benavides Born' in Australia using a browser. So I changed the random picking of the first octet to one of the /8 ranges for USA and this looks to have fixed the issue.


    int[] ipRangesUS = { 3, 6, 11, 13, 15, 16, 18, 20, 22, 26, 30, 33, 40, 48, 52, 56, 73 };
    int a1 = ipRangesUS[r.Next(0, ipRangesUS.Length)];

  79. Nice library, thanks.
    How to get italian image path?
    imdb.Languages = ..

  80. Where are you hosting your website for your FREE! ASP.net/C# IMDb Scraper? I can't run this code on my host. Thanks for the information...

  81. Hello Abhinay,

    Great scraper. My only question is why there is a need to have random IPs in a header?
    Is it because IMDb needs this header?

  82. Abhinay
    Long-time user of your IMDb scraper class. Recently, the class as published in the C# listing doesn't seem to return the correct values, although your demo page returns the correct data.

    For instance, for the movie Hammer of the Gods (2013) your website demo returns the correct data, but the C# code returns a release date of 2009 (instead of 2013) and a rating of 3.0 instead of ""

    Take Care

  83. It looks like IMDB changed their page layout in late November so the scraper may be partially broken.
    For instance the title field used to be IMDB - title. Now it is title - IMDB.

  84. Abhinay
    I ran into a unique issue this past week using the C# scraper. The movie involved is Ditch (2013).
    The Google search returns a page where Ditch (2013) is listed at the top. Unfortunately the scraper doesn't find a URL on that page, and I am not sure I understand why (first problem). The scraper drops down through the other search engines and finally returns a search result page where The Ditch (2010) is listed first. Returning the wrong movie is a problem, but the main issue here is that the regex that tries to locate the year hangs hard (second issue):

    Year = match...

    Take Care.

  85. Abhinay
    I saw you posted a new version. I am still crashing hard in the match subroutine called from this line
    Year = match(@"<titl ....
    The movie involved here is Top Gun

  86. Hey Abhinay

    Props to you for this awesome tool! It works perfectly!
    Also thanks for keeping this up to date and for making improvements...with this last update everything runs 10x as fast!

    Keep up the good work!

    Grtz

  87. Thanks for your class... It's great.

    It's the first time I've heard of "Regex"... I managed to get IMDb search results in my program by dirty string manipulation... Now I want to do it the "Regex" way, but I can't figure out a working Regex string. Maybe you or someone else can help me?

    Grz.
    Lone



  88. Hi,

    I'm having some trouble with your class...
    Once in a while my program gets into a 'not responding' state.
    I found out that the match function called from Rating = match("ratingValue"">(\d.\d)<", html) was giving me the problem.
    I changed it to:
    indexA = html.IndexOf("ratingValue"">", html.IndexOf("ratingValue"">") + 13) + 13 'second occurence
    IndexB = html.IndexOf("<", indexA)
    Rating = html.Substring(indexA, IndexB - indexA)
    and the problem never occurred again...

    Do you have any clue?

    Grz.
    Lone

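Several comments in this thread report the program freezing inside a regex match. One way to guard against that, sketched here under assumptions: on .NET Framework 4.5 or later, Regex.Match accepts a timeout and throws RegexMatchTimeoutException when it is exceeded. SafeRegex and FirstGroup are hypothetical names, the real match function of the class may use different options, and this only limits how long a slow pattern runs; it does not explain why it is slow.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeRegex
{
    // Returns the first capture group, or "" when nothing matches or the
    // pattern takes longer than timeoutMs (instead of hanging the program).
    public static string FirstGroup(string pattern, string html, int timeoutMs)
    {
        try
        {
            var m = Regex.Match(html, pattern, RegexOptions.Singleline,
                                TimeSpan.FromMilliseconds(timeoutMs));
            return m.Success ? m.Groups[1].Value.Trim() : "";
        }
        catch (RegexMatchTimeoutException)
        {
            return "";
        }
    }
}
```

For the rating, a call might look like SafeRegex.FirstGroup(@"ratingValue"">(\d\.\d)<", html, 2000). Note the escaped dot: in the original pattern the plain . matches any character, not just a decimal point.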
  89. Cast, writers, directors and stars are not working properly. I'd appreciate it if you could update it.

  90. I am working on fixing all the issues right now. Stay tuned for some nice enhancements coming soon!

  91. THX for the fix!! I'm going to update now...

  92. Thanks Abhinay. I haven't tried out the Feb 20 version yet but will do so later tonight. I was able to get around my problem with getting the year by first trimming the HTML to the text that contained the year and then finding the year with your regex. This technique was used a few posts up.

  93. Seems that posters are not being retrieved and the Starring gives html chunks back...

    Replies
    1. This comment has been removed by the author.

  94. I got it working in WPF with vb.net but it's super slow. Sometimes it takes so long that it times out and crashes the application. I'm new to programming so I'm not sure what's causing it.

  95. :( I am not able to use this API.

    I am using ASP.NET 3.5 and the DLL fails to load, giving this error:

    Error 1 Could not load file or assembly 'IMDb' or one of its dependencies. This assembly is built by a runtime newer than the currently loaded runtime and cannot be loaded.

  96. Hi,

    Whenever I search for a movie using
    var imdb = new IMDb("Cowboys & Aliens");
    it returns the wrong movie info.

  97. Hello
    I have noticed a small issue with the scraper. With the movie Monsters University (2013), the MPAA value from the scraper is "" but the movie actually has an MPAA rating of "G".

    I looked at the imdb web page and the problem is that the parental rating is just listed as Certificate G. Perhaps because the movie is from the UK.

  98. Abhinay,

    Outstanding work! Thank you so much for the information.

    I am currently using the IMDB class in C#. I saw a post up above on why the class returns the wrong movie sometimes. Here is a quick fix to that:

    Add a text box for the movie and one for the year, and then call the IMDb class this way:

    IMDb imdb = new IMDb("'"+ txt_Name.Text +"' " + txt_Year.Text, true);

    This wraps the name in single quotes and adds the year at the end.

    I digress; my original question is about the movie posters. How, in C#, can I pull in the movie poster?

    I have the variable imdb.Poster populated with a web address, but I am unable to display the picture. Is there a way to capture this picture and either 1) display it from the web page or 2) save it to the hard drive and display it from there?

    Thanks in advance,

    -Jeff

    Replies
    1. Never mind, found it. For future reference, if you are using C#:

      var request = WebRequest.Create(imdb.Poster);

      using (var response = request.GetResponse())
      using (var stream = response.GetResponseStream())
      {
          pic_Cover.Image = Bitmap.FromStream(stream);
      }

      Add in:

      using System.Net;

      pic_Cover is the name of the picture box I am using.

      -J

  99. Very nice API! I have been using it for about 2 years. Why did you remove imdb.Stars? I was using it... :/

  100. Is there any way of adding a progress bar while searching for a movie?

  101. hi all,

    I am pretty new to the .NET platform but interested in working on this scraper.
    Can somebody save a working project file (VB.NET or ASP.NET) and post a link for download?

    I have used the PHP scraper and heavily modified it for my website: http://www.clickcinema.in/

    I would love to start with the .NET platform and move to a MySQL DB at the backend.

    Regards,
    Arun Kumar

  102. Some movies (Monsters University for instance) use certification instead of MPAA rating.
    I can't post the code because it is interpreted as html but you can model it after the MPAA line.
    if (MpaaRating.Length == 0)
    {
        // If MPAA rating is not set check the Certification
        MpaaRating = match(@"Certification: ***** certificates=us:g"">USA:(G|PG|PG-13|PG-14|R|NC-17|X)", html);
    }

    Replies
    1. just replace the ****** with the html string to look for. Use Monsters University as an example

  103. I have noticed an interesting problem. If you look for the movie Shepard & Dark, your test page returns the correct movie. When I use the C# version, Google returns the correct page and the movie is the first one listed, but it doesn't locate the movie URL. It ends up going on to Bing, which doesn't return the correct movie.

    ReplyDelete
  104. It is just what I was looking for, but my problem is that I work with Delphi. So, does anybody know how to convert the class file to Pascal/Delphi? I think the DLL file will work fine.
    Thanks a lot.

  105. Hi Abhinay Rathore,
    I am not able to download class and dll . can you send me both things at my mail . this is my mail id : kapil.soni99@gmail.com. Please send me asap.
    I am waiting..
    Thanks

  106. some movies don't work - example - tt3549656

  107. For those trying MPAA to get certifications...

    Replace (in the class):

    Public Property MpaaRating() As String
        Get
            Return m_MpaaRating
        End Get
        Set(Value As String)
            m_MpaaRating = Value
        End Set
    End Property
    Private m_MpaaRating As String

    To:

    Public Property MpaaRating() As ArrayList
        Get
            Return m_MpaaRating
        End Get
        Set(Value As ArrayList)
            m_MpaaRating = Value
        End Set
    End Property
    Private m_MpaaRating As ArrayList

    And:
    MpaaRating = match("contentRating*.?*.? (G|PG|PG-13|PG-14|R|NC-17|X) ", html)

    To:

    MpaaRating = matchAll("<0a.*?>(.*?)", match("Certification:.*?<0div class=""info-content"">(.*?)</0div", html))

    (remove all the "0" from the line, was refused posting the comment)
    Now a list of all certifications per country will be grabbed, use code to filter the one wanted.

  108. The poster image displays well on localhost, but no image is displayed after hosting on the server.

    Am I making a mistake?


Thanks a lot for your valuable comments :)