Discussion Board

  1. #1
    Registered User
    Join Date
    Feb 2008
    Posts
    12

    parsing results page

    Hi All,

    I want to save the top 10 links from a Yahoo search results page into folders. I can extract all the links from the page, but that gives far more than ten (up to 70, because of all the other links on the page). Does anyone know if there is anything that distinguishes these top 10 links from the rest?

    For example, on this page:

    http://uk.search.yahoo.com/search;_y...ei=UTF-8&rd=r1

    I would want the HTML content of these links:

    1 www.python.org
    2 www.pythonline.com
    3
    .
    .
    .
    10 www.onlamp.com/python

    saved in ten folders, rather than every link on the page.

    Any help appreciated.

    Thank You
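
    One way to tell result links apart from the rest is to check whether the result anchors carry a marker in the page source, such as a CSS class. The sketch below assumes they do. It is written for Python 3 (the code later in this thread is Python 2), and the class name "result-title" and the sample markup are made up for illustration; the real marker, if there is one, has to be read from the actual page source.

```python
from html.parser import HTMLParser


class ResultLinkLister(HTMLParser):
    """Collect hrefs only from <a> tags that carry a given CSS class."""

    def __init__(self, result_class, limit=10):
        super().__init__()
        self.result_class = result_class  # hypothetical marker class
        self.limit = limit                # stop collecting after this many links
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or len(self.urls) >= self.limit:
            return
        attributes = dict(attrs)
        # A valueless attribute is reported as None, hence the "or".
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if href and self.result_class in classes:
            self.urls.append(href)


# Made-up markup standing in for a results page.
SAMPLE_HTML = """
<a href="/preferences">Preferences</a>
<a class="result-title" href="http://www.python.org/">Python</a>
<a class="ad" href="http://ads.example.com/">Sponsored</a>
<a class="result-title extra" href="http://www.onlamp.com/python">ONLamp</a>
"""

parser = ResultLinkLister("result-title")
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.urls)
# ['http://www.python.org/', 'http://www.onlamp.com/python']
```

    If the page has no such marker, this approach does not apply and a heuristic on the URLs themselves is needed instead.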

  2. #2
    Registered User
    Join Date
    Feb 2008
    Posts
    12

    Re: parsing results page

    And here's the main chunk of the code that gets the URLs.
    Thanks

    Code:
    if __name__ == "__main__":
        import urllib
        # URLLister is defined elsewhere in the script (not shown here);
        # after feed() it holds the extracted links in parser.urls.
        usock = urllib.urlopen("http://uk.search.yahoo.com/search?p=cinemas+in+dublin&fr=yfp-t-501&ei=UTF-8&meta=vc%3D")
        parser = URLLister()
        parser.feed(usock.read())
        parser.close()
        usock.close()
        path = u"c:\\Users\\Neil\\Desktop\\"
        # enumerate() supplies the index; the original loop never
        # incremented i, so it fetched parser.urls[0] on every pass.
        for i, url in enumerate(parser.urls):
            print i
            print url
            page = urllib.urlopen(url).read()
            # One output file per link: test0.txt, test1.txt, ...
            f = open(path + u"test" + str(i) + u".txt", "w")
            f.write(page)
            f.close()
            print "Html file successfully printed to file!"
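
    The code above writes every page into one directory, while the question asks for ten folders. A minimal sketch of that part, in Python 3 rather than the Python 2 used above: the function name save_in_folders, the file name page.html and the temporary base directory are all hypothetical choices, and the pages are hard-coded strings so that nothing is downloaded.

```python
import os
import tempfile


def save_in_folders(pages, base_dir):
    """Write each page's HTML to base_dir/<rank>/page.html; return the paths."""
    paths = []
    for rank, html in enumerate(pages, start=1):
        folder = os.path.join(base_dir, str(rank))  # folders "1", "2", ...
        os.makedirs(folder, exist_ok=True)
        path = os.path.join(folder, "page.html")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(html)
        paths.append(path)
    return paths


# Stand-in content; in the real script these would be the downloaded pages.
base = tempfile.mkdtemp()
saved = save_in_folders(["<html>first</html>", "<html>second</html>"], base)
for path in saved:
    print(path)
```

    Numbering the folders from 1 keeps them in the same order as the search results.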

  3. #3
    Regular Contributor
    Join Date
    Jan 2004
    Location
    Helsinki
    Posts
    376

    Re: parsing results page

    Quote Originally Posted by IreStep
    Hi All,

    I want to save the top 10 links from a Yahoo search results page into folders. I can extract all the links from the page, but that gives far more than ten (up to 70, because of all the other links on the page). Does anyone know if there is anything that distinguishes these top 10 links from the rest?

    For example, on this page:

    http://uk.search.yahoo.com/search;_y...ei=UTF-8&rd=r1

    I would want the HTML content of these links:

    1 www.python.org
    2 www.pythonline.com
    3
    .
    .
    .
    10 www.onlamp.com/python

    saved in ten folders, rather than every link on the page.
    I recommend asking this question in a general Python forum, since the problem is not specific to this environment. It is a generic programming exercise, and several different approaches could work.
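
    As one illustration of such an approach, a simple heuristic is to drop every link that points back at the search site itself, remove duplicates, and keep the first ten of what remains. The sketch below is Python 3 and only an approximation: the function name top_external_links is hypothetical, the "same site" test compares just the last two host labels, sponsored links to other sites would still get through, and if the engine wraps result links in its own redirect URLs the heuristic would discard them.

```python
from urllib.parse import urljoin, urlparse


def top_external_links(urls, page_url, limit=10):
    """Return the first `limit` distinct links that leave the search site."""
    own_host = urlparse(page_url).hostname or ""
    # Compare on the last two host labels, e.g. "yahoo.com".
    # Crude: it misjudges hosts under suffixes such as "co.uk".
    own_site = ".".join(own_host.split(".")[-2:])
    seen = set()
    picked = []
    for url in urls:
        absolute = urljoin(page_url, url)  # resolve relative hrefs
        parts = urlparse(absolute)
        host = parts.hostname or ""
        if parts.scheme not in ("http", "https"):
            continue  # mailto:, javascript:, ...
        if host == own_site or host.endswith("." + own_site):
            continue  # navigation and other links back to the search site
        if absolute in seen:
            continue  # same target listed twice
        seen.add(absolute)
        picked.append(absolute)
        if len(picked) == limit:
            break
    return picked


# Made-up link list standing in for everything extracted from the page.
links = [
    "/preferences",
    "http://uk.search.yahoo.com/search?p=python&b=11",
    "http://www.python.org/",
    "http://www.python.org/",
    "mailto:someone@example.com",
    "http://www.onlamp.com/python",
]
print(top_external_links(links, "http://uk.search.yahoo.com/search?p=python"))
# ['http://www.python.org/', 'http://www.onlamp.com/python']
```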
    Mikko Ohtamaa

    http://mfabrik.com
    http://blog.mfabrik.com
