
Discussion Board

Results 1 to 9 of 9
  1. #1
    Regular Contributor
    Join Date
    Mar 2008
    Posts
    151

    simple foreign characters (utf-8) problem

    I have repeatedly failed to import foreign characters from a text file - please help. This is a simple problem but I can't fix it!

    I know I can display foreign characters because:

    text = u"héllo world"
    print text

    ...produces héllo world


    If I create a text file:
    hello world
    next line

    then the following code opens and prints it with no problems:

    filename = u"e:\\Python\\order.txt"
    f = file(filename, "r")

    def openfile():
        # Print every line of the already opened file, then close it.
        for line in f:
            print line
        f.close()

    openfile()


    But if the text file contains a foreign character (e.g. é), nothing happens. Please tell me how to teach PyS60 that my text file contains UTF-8 foreign characters. Thanks!

  2. #2
    Nokia Developer Champion
    Join Date
    Feb 2008
    Location
    Ahmedabad, Gujarat, India
    Posts
    3,852

    Re: simple foreign characters (utf-8) problem

    Hi jtullis,
    Welcome to the wonderful Python DiBo. I hope you get the best out of this forum.
    Now, in your code, please try the following change.
    Code:
    filename = u"e:\\Python\\order.txt"
    f = file(filename, "r").decode('utf-8')
    Hope it helps.
    Please give some feedback.

    And please, the next time you write code, use the quotes provided.

  3. #3
    Regular Contributor
    Join Date
    Mar 2008
    Posts
    151

    Re: simple foreign characters (utf-8) problem

    Hi - thanks. It's not quite working.

    This code returns an error, 'attribute error: file object has no attribute decode':
    Code:
    filename = u"e:\\Python\\order.txt"
    f = file(filename, "r").decode('utf-8')
    
    for line in f:
        print line
    f.close()
    So I changed it to this:
    Code:
    filename = u"e:\\Python\\order.txt"
    f = file(filename, "r".decode('utf-8'))
    
    for line in f:
        print line.strip()
    f.close()
    ...which works OK for plain text in my text file order.txt, but as soon as I put a foreign character in there (e.g. é), the program won't print anything to the console. It just hangs.

    I'm using PyS60 1.4.2 on my Nokia N73. Thanks, John

  4. #4
    Super Contributor
    Join Date
    May 2004
    Location
    Tampere, Finland
    Posts
    524

    Re: simple foreign characters (utf-8) problem

    Quote Originally Posted by jtullis View Post
    This code returns an error, 'attribute error: file object has no attribute decode':
    Code:
    filename = u"e:\\Python\\order.txt"
    f = file(filename, "r").decode('utf-8')
    
    for line in f:
        print line
    f.close()
    Yeah, file() returns a file object, which has no decode() attribute/method.


    Quote Originally Posted by jtullis View Post
    So I changed it to this:
    Code:
    filename = u"e:\\Python\\order.txt"
    f = file(filename, "r".decode("utf-8"))
    
    for line in f:
        print line.strip()
    f.close()
    No, this is not correct either. "r".decode("utf-8") only turns the mode string "r" into a Unicode string; the contents of the file are still read as undecoded bytes.

    Try this:

    Code:
    filename = u"e:\\Python\\order.txt"
    
    f = file(filename, "r")
    
    for line in f:
        print line.decode("utf-8")
    
    f.close()

  5. #5
    Regular Contributor
    Join Date
    Mar 2008
    Posts
    151

    Re: simple foreign characters (utf-8) problem

    Thanks again - no joy I'm afraid. Error msg screenshot here http://www.nottingham.ac.uk/~lgxjt2/A0101.jpg

    My text file was made in Notepad, and contains only one line:

    hello wòrld

  6. #6
    Super Contributor
    Join Date
    May 2004
    Location
    Tampere, Finland
    Posts
    524

    Re: simple foreign characters (utf-8) problem

    Quote Originally Posted by jtullis View Post
    My text file was made in Notepad, and contains only one line:
    Then the encoding is not UTF-8. It's probably ISO-8859-1 or ISO-8859-15. It's really up to you which encoding you use in that file. Just replace line.decode("utf-8") with line.decode("iso-8859-1"), or with whatever encoding you're using.
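
    A minimal sketch of why the UTF-8 decode failed and the ISO-8859-1 one works. It is written so that it also runs on a current desktop Python, which is an assumption on my part; it has not been tried on PyS60 1.4.2. The helper name decode_line and the order of the fallback encodings are made up for illustration.

```python
# The bytes an "ANSI" save produces for "hello wòrld":
# ò is stored as the single byte 0xF2, which is not valid UTF-8 here.
ansi_bytes = u"hello w\u00f2rld".encode("iso-8859-1")

def decode_line(raw, encodings=("utf-8", "iso-8859-1")):
    """Try each encoding in turn and return (text, encoding_used)."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeError:
            # This encoding does not fit the bytes; try the next one.
            continue
    raise ValueError("none of the encodings matched")

text, used = decode_line(ansi_bytes)
```

    Note that ISO-8859-1 accepts every possible byte, so it never raises an error. Put it last in the list, and treat a successful ISO-8859-1 decode as a guess rather than as proof of the encoding.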

  7. #7
    Regular Contributor
    Join Date
    Mar 2008
    Posts
    151

    Re: simple foreign characters (utf-8) problem

    Brilliant - problem solved. Thanks.

    If you hadn't been able to tell me that, where could I have found out for myself that Notepad uses that encoding? Also, is there a list in the documentation somewhere of which encodings Python's decode function recognises? This could be useful for people in future.

    Thanks for help, J

  8. #8
    Super Contributor
    Join Date
    May 2004
    Location
    Tampere, Finland
    Posts
    524

    Re: simple foreign characters (utf-8) problem

    Quote Originally Posted by jtullis View Post
    If you hadn't been able to tell me that, where could I have found out for myself that Notepad uses that encoding? Also, is there a list in the documentation somewhere of which encodings Python's decode function recognises? This could be useful for people in future.
    Character encoding is a difficult and broad subject. You need to know computer history going back fifty years to understand why the situation is as it is today. There are a couple of Wikipedia articles about character encoding in general and UTF-8 in particular, which may be of some help.

    The Python documentation has a list of standard encodings, written for Python v2.3. PyS60 is based on Python v2.2.2, so the list is not entirely accurate for it. Also, not all encodings are included in the PyS60 SIS file. You need to find out for yourself which ones are present.
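
    One way to find out which encodings are present is to ask the codecs module for each candidate name and see which lookups succeed. This is only a sketch: the candidate list is an arbitrary example, available_encodings is a made-up helper name, and I have not checked what it reports on an actual PyS60 installation.

```python
import codecs

# Candidate names to probe; extend this list with whatever you need.
CANDIDATES = ["utf-8", "utf-16", "iso-8859-1", "iso-8859-15", "cp1252", "ascii"]

def available_encodings(names):
    """Return the names for which this interpreter can find a codec."""
    found = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            # No codec is registered under this name.
            continue
        found.append(name)
    return found

present = available_encodings(CANDIDATES + ["no-such-encoding"])
```

    Printing present on the phone shows which of the candidates can be passed to decode() there.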

    To add to the confusion, Windows' Notepad does not call the encodings by their correct names. ISO-8859-1 (or more accurately Windows-1252) encoding is called "ANSI" and 16-bit encodings (UCS-2) are simply called "Unicode". Other editors are available where encodings are called by their real names.

    Notepad also adds a BOM (byte order mark) at the beginning of Unicode text files. This may confuse programs that do not expect these extra bytes: two for the 16-bit "Unicode" formats, three for UTF-8.
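
    A small sketch of checking for a BOM and removing it before decoding. The BOM constants come from the standard codecs module of newer Python versions; whether the Python 2.2.2 in PyS60 provides them under these names is something I have not verified. strip_bom is a hypothetical helper name.

```python
import codecs

def strip_bom(raw):
    """Return (data_without_bom, encoding_hint); the hint is None if no BOM is found."""
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if raw.startswith(bom):
            return raw[len(bom):], name
    return raw, None

# Bytes as written for "hi" in a little-endian 16-bit file with a BOM.
sample = codecs.BOM_UTF16_LE + u"hi".encode("utf-16-le")
data, hint = strip_bom(sample)
text = data.decode(hint)
```

    When no BOM is found the hint is None, and the encoding still has to be known or guessed some other way.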

  9. #9
    Regular Contributor
    Join Date
    Mar 2008
    Posts
    151

    Re: simple foreign characters (utf-8) problem

