Discussion Board

  1. #1
    Regular Contributor
    Join Date
    Aug 2007
    Posts
    105

    XML-Parsing of ISO-8859-1 Document

    Hi all,

    I'm having a nice character-encoding problem here. I've got an XML document (see here, but with ä, ü and ö) which is encoded in ISO-8859-1. It is received over HTTP via
    Code:
    // Get the body data supplier
    MHTTPDataSupplier* body = aTransaction.Response().Body();
    TPtrC8 dataChunk;
    body->GetNextDataPart(dataChunk);
    First of all: how can it be that I have an ISO-8859-1 string in an 8-bit descriptor? I thought this had to be 16-bit. But if I display the whole XML string, it displays correctly.
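
    On the 8-bit question, a small illustration may help. ISO-8859-1 is a single-byte encoding, so each character fits in one element of an 8-bit buffer, and each byte value equals the Unicode code point of that character. The sketch below is plain standard C++, not Symbian code; `WidenLatin1` is a made-up name that mimics a byte-wise copy into a 16-bit buffer, assuming the copy zero-extends each byte.

```cpp
#include <cassert>
#include <string>

// Widen ISO-8859-1 bytes to 16-bit code units by zero-extending each byte.
// Hypothetical helper for illustration; it is not a Symbian API.
std::u16string WidenLatin1(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        // Example: byte 0xF6 becomes U+00F6, which is the letter o-umlaut.
        out.push_back(static_cast<char16_t>(b));
    }
    assert(out.size() == bytes.size()); // exactly one code unit per input byte
    return out;
}
```

    For genuine ISO-8859-1 input this widening is already a correct conversion to Unicode, which is consistent with the raw string displaying correctly.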

    Now I'd like to parse this body using CParser. Doing this, I always get Âx in my output, with x being different for different characters like ö, ä or ü. Now, I've found in the forum that CParser accepts only UTF-8, so I tried to convert the data to UTF-8 by doing

    Code:
    //put the TDesC8 in TDesC16
    HBufC16* tempBuf=HBufC16::NewL(iReceivedData->Length());
    tempBuf->Des().Copy(*iReceivedData);
    
    //convert to UTF-8
    TBuf8<1000> test;
    CnvUtfConverter::ConvertFromUnicodeToUtf8(test,*tempBuf);
    delete tempBuf;
    
    //start XML parser
    HBufC8* testy=HBufC8::NewL(test.Length());
    testy->Des().Copy(test);
    iXmlResult= CXmlHandler::NewL(*testy);
    delete testy;
    but then I got my Â even doubled and received ÂÂx in the output. I've also read in Paul Todd's blog:

    Also can be a bit strange that the data is all in ASCII (UTF8 IIRC) and so you need to convert it to unicode or handle it as ASCII.
    but I don't know how I should do that. My OnContentL only does

    Code:
    void CXmlHandler::OnContentL( const TDesC8 &aBytes, TInt aErrorCode )
    {
        	aBuffer = HBufC::NewL( aBytes.Length() );
        	aBuffer->Des().Copy( aBytes );
    }
    Now where the hell should I even start?

    Thanks to all of you!

    Chris
    Last edited by -chris-; 2008-11-26 at 10:15.

  2. #2
    Nokia Developer Moderator
    Join Date
    Feb 2006
    Location
    Budapest, Hungary
    Posts
    28,572

    Re: XML-Parsing of ISO-8859-1 Document

    What happens in the simplest case: pass data to CParser, and get result in OnContentL, no modifications on either side?

  3. #3
    Regular Contributor
    Join Date
    Aug 2007
    Posts
    105

    Re: XML-Parsing of ISO-8859-1 Document

    In that case I get Âx in my output of OnContentL with x being different for different characters like ö,ä or ü.

  4. #4
    Nokia Developer Moderator
    Join Date
    Feb 2006
    Location
    Budapest, Hungary
    Posts
    28,572

    Re: XML-Parsing of ISO-8859-1 Document

    I really do not know, but CParser may decide to do some UTF-8 decoding internally (even though it still provides content in 8-bit descriptors).

    Then you still need the getdata-copy-encode-provide-decode way. I do not really understand your original code; I would try this one:
    Code:
    // Before parsing: widen the 8-bit data to 16-bit, then encode it as UTF-8
    HBufC *unicode=HBufC::NewLC(iReceivedData->Length()); // I hope that iReceivedData is a non-tampered 8-bit descriptor
    unicode->Des().Copy(*iReceivedData);
    
    HBufC8 *utf8=CnvUtfConverter::ConvertFromUnicodeToUtf8L(*unicode);
    CleanupStack::PopAndDestroy(); // unicode
    CleanupStack::PushL(utf8);
    
    iXmlResult= CXmlHandler::NewL(*utf8);
    CleanupStack::PopAndDestroy(); // utf8
    
    // In the content handler: decode the UTF-8 bytes delivered by the parser
    void CXmlHandler::OnContentL( const TDesC8 &aBytes, TInt aErrorCode )
    {
        aBuffer = CnvUtfConverter::ConvertToUnicodeFromUtf8L(aBytes);
    }
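
    To make the decode step in the content handler concrete, here is a minimal sketch of UTF-8 decoding in plain standard C++. It is not the implementation behind `CnvUtfConverter`; `Utf8ToCodePoints` is a hypothetical helper, and it only checks the structure of each sequence (overlong forms are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into Unicode code points.
// Sketch only: malformed or truncated sequences yield U+FFFD.
std::u32string Utf8ToCodePoints(const std::string& utf8)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t extra = 0; // number of continuation bytes expected
        char32_t cp = 0;
        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else { out.push_back(0xFFFD); ++i; continue; } // stray byte
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= utf8.size()) { ok = false; break; }
            const unsigned char c = static_cast<unsigned char>(utf8[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F); // append six payload bits
            ++i;
        }
        out.push_back(ok ? cp : static_cast<char32_t>(0xFFFD));
    }
    assert(out.size() <= utf8.size()); // every code point consumed >= 1 byte
    return out;
}
```

    A byte-wise copy into a 16-bit buffer skips this step and turns each of the two bytes of a character like ö into its own character, which is the kind of garbage described above.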

  5. #5
    Regular Contributor
    Join Date
    Aug 2007
    Posts
    105

    Re: XML-Parsing of ISO-8859-1 Document

    Hi Wizard,

    Thanks a lot for these hints! As you coded it, it didn't do the trick, but doing no encoding and using your UTF-8-to-Unicode line in OnContentL did. I'll add this to the wiki for future reference.

    Thanks again!

    Chris

  6. #6
    Nokia Developer Moderator
    Join Date
    Feb 2006
    Location
    Budapest, Hungary
    Posts
    28,572

    Re: XML-Parsing of ISO-8859-1 Document

    In that case your XML document was already in UTF-8 format, and not in "plain" ISO-8859-1. All you needed was to decode it to Unicode after parsing.
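
    This would also account for the doubling seen with the extra encode step: if the bytes are already UTF-8 and each one is treated as a separate ISO-8859-1 character and encoded again, every non-ASCII character grows from two bytes to four. A minimal sketch in plain standard C++ (`EncodeBytesAsUtf8` is a hypothetical name, not a Symbian API):

```cpp
#include <cassert>
#include <string>

// Treat every input byte as one ISO-8859-1 character and encode it as UTF-8.
// Feeding it data that is already UTF-8 "double-encodes" that data.
std::string EncodeBytesAsUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b)); // ASCII stays one byte
        } else {
            // U+0080..U+00FF needs two bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    assert(out.size() >= bytes.size()); // encoding never shrinks the data
    return out;
}
```

    For ö, the UTF-8 bytes C3 B6 become C3 83 C2 B6. Shown byte by byte, the C2 appears as Â, which may be where the Â in the output comes from.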

  7. #7
    Regular Contributor
    Join Date
    Aug 2007
    Posts
    105

    Re: XML-Parsing of ISO-8859-1 Document

    That's what I thought, although I do not understand it. The browser shows ISO and not UTF; maybe Symbian's HTTP client somehow recodes it? Strange, strange...
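
    One way to settle what actually arrives would be to inspect the raw body bytes before parsing. In ISO-8859-1, ä, ö and ü are the single bytes E4, F6 and FC; in UTF-8 they are the pairs C3 A4, C3 B6 and C3 BC. The sketch below is a structural check in plain standard C++. `LooksLikeUtf8` is a hypothetical helper, and it does not catch every ill-formed sequence (overlong three- and four-byte forms pass). A possible explanation that this thread does not confirm is that the parser reads the encoding from the XML declaration, transcodes internally, and hands UTF-8 to the handler whatever the input encoding was.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return true if 'bytes' has the structure of UTF-8: each lead byte is
// followed by the expected number of continuation bytes (10xxxxxx).
bool LooksLikeUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i++]);
        std::size_t extra = 0;
        if (lead < 0x80) {
            extra = 0;                        // ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;                        // two-byte sequence
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;                        // three-byte sequence
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;                        // four-byte sequence
        } else {
            return false;                     // stray continuation or invalid lead
        }
        for (; extra > 0; --extra) {
            if (i >= bytes.size()) return false; // truncated sequence
            const unsigned char c = static_cast<unsigned char>(bytes[i++]);
            if ((c & 0xC0) != 0x80) return false;
        }
        assert(i <= bytes.size());            // never read past the end
    }
    return true;
}
```

    Pure ASCII passes this check too, so it only says something useful for a body that contains accented characters.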
