Discussion Board

  1. #1
    Registered User
    Join Date
    Jul 2011
    Posts
    1

    Problem using QXmlStreamReader to parse html tag

    Hi, I want to use QXmlStreamReader to parse an HTML file and get the content of every <p> tag. Some <p> tags contain a nested <em> tag, and in those cases I can't get the full <p> content: QXmlStreamReader does not seem to read the <em> tag.
    Here is the HTML file:
    Code:
    <?xml version='1.0' encoding='utf-8'?>
    <html>
     <body class="calibre">
    <hr class="calibre6" id="calibre_pb_6"/><h3 id="calibre_toc_7" class="calibre7">CHAPTER VI</h3>
    <p class="calibre4">PREPARING TO BE A SAILOR</p>
    <p class="calibre4">"Take you for an old fraud," replied the unabashed first mate of the <em class="calibre5">Fancy</em>. "Of course you would be bankrupted, as you ought to have been long ago, if you gave fifty dollars on every turnip that is brought in; but you could well afford to advance a hundred on this watch, and you know it."</p>
    <p class="calibre4">"Veil, I tell you; I gifs t'venty-fife."</p>
    <p class="calibre4">[Illustration: "'VELL, I TELL YOU. I GIFS YOU TVENTY-FIFE'"]</p>
    <p class="calibre4">"Fifty," said Bonny, firmly.</p>
     </body>
    </html>
    Here is the code that parses the <p> and <em> tags of the HTML. What is wrong with it?
    Code:
    QString CParseEpubHtml::ParseHtml( QString filePath)
    {
        QFile pTmpFile(filePath);
        if(!pTmpFile.open(QIODevice::ReadOnly))
        {
            qWarning("Error opening  file");
            // return -1;
        }
        QXmlStreamReader xmlReader(&pTmpFile);
        xmlReader.setDevice(&pTmpFile);
     
        while(!xmlReader.atEnd() && !xmlReader.hasError())
        {
            xmlReader.readNext();
            if(xmlReader.isStartElement()){
                if( xmlReader.name()=="p")
                {
                    m_ReadContent +=xmlReader.readElementText();
                }
            }
     
            if(xmlReader.name()=="em")
            {
                xmlReader.readNext();
                m_ReadContent+=xmlReader.readElementText();
            }
     
            if(xmlReader.isEndElement())
            {
                if(xmlReader.name()=="p")
                    m_ReadContent+="\n";
                if(xmlReader.name()=="html")
                    break;
            }        
        }
     return m_ReadContent;
    }
    Another question: can TextEdit in QML support CSS? I want to show the text in the TextEdit with the same styling it has in the HTML.

    Thank you for your replies!
    Best regards.

  2. #2
    Super Contributor
    Join Date
    Mar 2009
    Posts
    1,024

    Re: Problem using QXmlStreamReader to parse html tag

    Hi,
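    All XML elements must have a closing tag. The <hr class="calibre6" id="calibre_pb_6"/> tag in your file is self-closing, which is already well-formed XML, so removing it should not be necessary; if you still want to strip it, you can use a RegExp to do that.
    A more likely cause is that readElementText() by default reports an error when it meets a child element such as <em>.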
    BTW real HTML pages cannot be parsed by XMLParser. That's why I use QWebPage with its frames for parsing HTML.
    Here is one way to do that:

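    QWebPage page;
    QWebFrame *frame = page.mainFrame();

    // Point the frame at the file to parse.
    QUrl fileUrl("myEBook.html");
    frame->setUrl(fileUrl);

    // Loading is asynchronous: query the document only after the frame's
    // loadFinished(bool) signal has been emitted.
    QWebElement document = frame->documentElement();
    // Collect every <p> element via a CSS selector.
    QWebElementCollection elements = document.findAll("p");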
    You can use QWebView to show HTML pages; it supports CSS and is well integrated in QML.

  3. #3
    Super Contributor
    Join Date
    Nov 2009
    Location
    Minnesota, USA
    Posts
    3,209

    Re: Problem using QXmlStreamReader to parse html tag

    Quote:
    BTW real HTML pages cannot be parsed by XMLParser.
    Fully compliant HTML can be parsed with an XML parser (with possibly a few exceptions). But probably less than 5% of the HTML out on the web is even moderately compliant. (So I suppose your statement that "real" (i.e., real-life) HTML can't be parsed is true.)
