Discussion Board

  1. #1
    Registered User
    Join Date
    Mar 2003
    Location
    Luton, Bedfordshire, UK
    Posts
    29

    Thoughts and questions re internationalisation

    I've been looking with interest at some of the threads in here dealing with internationalisation, and it seems that there are tradeoffs between fully hard-coded versus fully dynamic, and in-situ versus centralised. My problem is that I'm finding it hard to choose between some of the possibilities, so I thought I might list them as I see them, with supporting arguments, and see if anybody has anything to add or subtract, or a completely new approach that I hadn't thought of.

    To limit the discussion, I'm talking about non-connected apps (so no downloading resources from the internet). Also I'm only thinking about Indo-European languages, since that's all I know even the slightest thing about. I've come up with four different approaches, and after writing this, I think I'm starting to favour hard-coded + in-situ, which is a bit of a surprise.

    So that this doesn't turn into a single giant post, I'm going to do each option as a separate reply to this one.

  2. #2
    Registered User
    Join Date
    Mar 2003
    Location
    Luton, Bedfordshire, UK
    Posts
    29

    Option 1 - language-specific source files

    I use Ant for compilation and packaging, so that gives me my first option: use it to effectively preprocess my source files via the Copy task using a FilterSet to replace tokens, producing sets of language-specific source files.

    Pro: No string lookup necessary at runtime. Maximum speed. Simpler programs: code just uses constants.
    Con: Either package in separate language versions of the app, or Class.forName is necessary at runtime (in which case also add: BIG jar file). Somewhat more complicated build.

    I'm leaning towards rejecting this, as language-specific versions complicate distribution, whereas big jar files complicate downloading.

  3. #3
    Registered User
    Join Date
    Mar 2003
    Location
    Luton, Bedfordshire, UK
    Posts
    29

    Option 2 - language-specific subclasses

    Declare the string constants in the base classes, and define them in language-specific subclasses. To get an idea of what I mean by this, say you're doing an options or settings form. Then you'd have a base Options.java that defined:
    Code:
        protected static String ID_LABEL;
        protected static String ID_OK;
        protected static String ID_CANCEL;
        protected static String ID_MSG;
    and did all the business of adding them to the display, etc. in the appropriate methods. Then you'd have the derived classes that just looked like e.g. this:
    Code:
    public class Options_fr extends Options
    {
        static
        {
            ID_LABEL = "Paramètres";
            ID_OK = "OK";
            ID_CANCEL = "Annuler";
            ID_MSG = "Oui, ça marche!";
        }
    }
    Then your midlet does:
    Code:
        private Options options;
        // ... some time later...
            try
            {
                Class cls = Class.forName( "Options_" + locale );
                this.options = (Options) cls.newInstance();
            }
            catch ( Exception e )
            {
                this.options = new Options_en();
            }
            this.options.init( this );
    Pros: The string definitions are, more-or-less, where you want them to be. That is, they are near the point they are used. You don't have to have them all in a big, central text file, and start counting offsets, which would be a maintenance nightmare. Reasonably easy to use, quite easy to code: code uses variables that look like constants. It's a good deal less bulky in terms of jar size than the first option.
    Cons: The variables can't be final (but then, you'd never assign to SOMETHING_LIKE_THIS in method code, would you?). You have to mess about with Class.forName, and move construction parameters into initialisation methods, which is a bit flaky. Slightly more involved class hierarchy. Still, potentially, a bit big in terms of jar size -- there could end up being a LOT of classes.

    I quite like this one, especially since you could generate the language-specific subclasses from a single source file using Ant's copy task, as in the first option. But I can see that it might strike some people as a bit of an abomination.
    Last edited by GerardMason; 2003-10-01 at 11:56.

  4. #4
    Registered User
    Join Date
    Mar 2003
    Location
    Luton, Bedfordshire, UK
    Posts
    29

    Option 3 - utility resource loader + provider

    Here, a central resource class loads a text or raw file at runtime, and then supplies the strings to client classes via constant indexes. You might use the MIDlet itself for this, as it tends to be passed to UI classes so that they can call its callbacks when they finish; or you might use a specialist class and read the strings in at class-load time via a static initialiser so that you don't have to bother instantiating and passing it.

    Pro: Straightforward, well-understood mode of operation; no gimmicky coding. Smaller jar.
    Con: Messy to use: you need an accessor method, and your nice string constants have been replaced with integer index constants. You end up writing code like 'this.addCommand( new Command( TestMIDlet.getLocalString( TestMIDlet.STRING_COMMAND_BACK ), Command.BACK, 1 ) );', which is ugly. Worse, it's error-prone, with nothing to stop one class asking for, and getting, another class' strings. Maintenance is a chore if you don't add strings at the end of the file, because then you have to renumber your index constants, so neatness is expensive.

    This is the approach that I've seen described elsewhere on this site, and it's the one I like least (even though it's the one I'm actually using at the moment!).
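    For anyone who hasn't seen option 3 in the flesh, here is a minimal sketch of the idea. Everything in it is an assumption made for illustration: the class name LocalStrings, the two index constants, and the file layout (an int count followed by that many writeUTF strings). The in-memory pack() method merely stands in for whatever build step would produce the real file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical central string table for option 3. Assumed layout:
// an int count, then that many strings written with writeUTF.
class LocalStrings
{
    // Integer index constants; they must match the order of the file.
    static final int COMMAND_BACK = 0;
    static final int COMMAND_OK = 1;

    private static String[] strings = new String[ 0 ];

    // Replace the table with the contents of the stream.
    // Returns false (and keeps the old table) on an I/O error.
    static boolean load( InputStream in )
    {
        try
        {
            DataInputStream data = new DataInputStream( in );
            String[] loaded = new String[ data.readInt() ];
            for ( int i = 0; i < loaded.length; i++ )
            {
                loaded[ i ] = data.readUTF();
            }
            strings = loaded;
            return true;
        }
        catch ( IOException e )
        {
            return false;
        }
    }

    // The accessor: an unknown index gives a visible placeholder.
    static String get( int index )
    {
        if ( index < 0 || index >= strings.length )
        {
            return "?" + index;
        }
        return strings[ index ];
    }

    // Stand-in for the build step that would normally write the file.
    static byte[] pack( String[] values )
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream( bytes );
            out.writeInt( values.length );
            for ( int i = 0; i < values.length; i++ )
            {
                out.writeUTF( values[ i ] );
            }
            out.flush();
            return bytes.toByteArray();
        }
        catch ( IOException e )
        {
            throw new IllegalStateException( e.toString() );
        }
    }

    public static void main( String[] args )
    {
        load( new ByteArrayInputStream( pack( new String[] { "Retour", "OK" } ) ) );
        System.out.println( get( COMMAND_BACK ) );
    }
}
```

    In a MIDlet the stream would more likely come from getResourceAsStream on a file in the jar. Note how the index constants and the file order have to be kept in step by hand, which is exactly the maintenance chore described above.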

  5. #5
    Registered User
    Join Date
    Mar 2003
    Location
    Luton, Bedfordshire, UK
    Posts
    29

    Option 4 - improving option 3

    The fourth option is similar to the third in loading from a raw or text file at runtime, but it relies on the magic of static initialisation to improve things slightly; in this case, on the fact that static finals (constants) can be assigned-to in a static initialiser. So your MIDlet (say) has declarations like so:
    Code:
        static final String SETTINGSDIALOG_CHOOSEFONTSIZE;
        static final String SETTINGSDIALOG_FONTSIZESMALL;
        static final String SETTINGSDIALOG_FONTSIZEMEDIUM;
        static final String SETTINGSDIALOG_FONTSIZELARGE;
    and in the static initialiser you open your raw file, read in the UTF strings and assign them to the constants.

    Pro: Gets rid of the accessor method and integer indexes, and allows client classes to say just 'TestMIDlet.SETTINGSDIALOG_CHOOSEFONTSIZE' instead. The strings are final!
    Con: The actual coding is long-winded. Because the reading in and assigning has to be done inside a try-statement, and the compiler will complain if there is any possibility that a constant has no value, you can't do them all in one try statement: you have to do a separate try and catch construct for each one. There's the same problem with the lack of data hiding and error-proneness as the third option.
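    One possible way to soften the long-windedness, offered as a sketch rather than a recommendation: do all the reading in a helper method that catches its own exceptions and returns an array, then assign the constants from that array outside any try block. As far as I can tell the compiler accepts this, because each constant is plainly assigned exactly once. The class name SettingsStrings, the helper names, the "?" fallback and the English sample strings are all made up for the example, and the in-memory stream stands in for the raw file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

class SettingsStrings
{
    static final String SETTINGSDIALOG_CHOOSEFONTSIZE;
    static final String SETTINGSDIALOG_FONTSIZESMALL;
    static final String SETTINGSDIALOG_FONTSIZEMEDIUM;
    static final String SETTINGSDIALOG_FONTSIZELARGE;

    static
    {
        // The risky work is inside readAll(); these assignments sit outside
        // any try block, so each constant is assigned exactly once.
        String[] s = readAll( openStrings(), 4 );
        SETTINGSDIALOG_CHOOSEFONTSIZE = s[ 0 ];
        SETTINGSDIALOG_FONTSIZESMALL = s[ 1 ];
        SETTINGSDIALOG_FONTSIZEMEDIUM = s[ 2 ];
        SETTINGSDIALOG_FONTSIZELARGE = s[ 3 ];
    }

    // Read up to 'count' UTF strings; anything that cannot be read
    // keeps the "?" placeholder (an assumed fallback policy).
    static String[] readAll( InputStream in, int count )
    {
        String[] result = new String[ count ];
        for ( int i = 0; i < count; i++ )
        {
            result[ i ] = "?";
        }
        try
        {
            DataInputStream data = new DataInputStream( in );
            for ( int i = 0; i < count; i++ )
            {
                result[ i ] = data.readUTF();
            }
        }
        catch ( Exception e )
        {
            // Missing or truncated file: leave the remaining placeholders.
        }
        return result;
    }

    // Stand-in for the raw file; a MIDlet would more likely open a
    // resource from the jar here.
    static InputStream openStrings()
    {
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream( bytes );
            out.writeUTF( "Choose font size" );
            out.writeUTF( "Small" );
            out.writeUTF( "Medium" );
            out.writeUTF( "Large" );
            out.flush();
            return new ByteArrayInputStream( bytes.toByteArray() );
        }
        catch ( IOException e )
        {
            return null;
        }
    }

    public static void main( String[] args )
    {
        System.out.println( SETTINGSDIALOG_CHOOSEFONTSIZE );
    }
}
```

    The data-hiding and error-proneness objections are untouched by this; it only trims the try/catch boilerplate.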

  6. #6
    Super Contributor
    Join Date
    Jun 2003
    Location
    Cheshire, UK
    Posts
    7,395
    Option 3

    I confess... I'm an "option 3" user!

    But I don't need code like:
    Code:
    TestMIDlet.getLocalString(TestMIDlet.STRING_COMMAND_BACK)
    First, I have a separate resources class, so my constants' names don't need to be so long (they're all Strings, so prefixing STRING_ is irrelevant).

    Second, I use a C preprocessor, so I can define a macro for this kind of phrase, and just write:
    Code:
    res (COMMAND_BACK)
    There's no need to pass a reference, as I use a class method (they're faster to invoke).

    The biggest pain is, as you say, renumbering the constants... I now have a text file with a list of constant names, and a VBScript (!) program that generates code from it, autonumbering the constant definitions. (In fact, I don't use Java constants at all, so it actually generates a C header file, Resources.h - I was amazed at how much space I saved by replacing final vars with #defines!).
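    To illustrate the autonumbering idea (this is not the VBScript program mentioned above, just a guess at the general shape, written in Java), a generator can be as small as the sketch below. The class name ResourceHeaderGenerator and the sample constant names are hypothetical.

```java
// Hypothetical generator: turns a list of constant names into
// autonumbered #define lines for a header such as Resources.h.
class ResourceHeaderGenerator
{
    static String generate( String[] names )
    {
        StringBuilder out = new StringBuilder();
        for ( int i = 0; i < names.length; i++ )
        {
            // The index is simply the position in the list, so inserting
            // a name in the middle renumbers everything after it.
            out.append( "#define " ).append( names[ i ] )
               .append( ' ' ).append( i ).append( '\n' );
        }
        return out.toString();
    }

    public static void main( String[] args )
    {
        System.out.print( generate( new String[] { "COMMAND_BACK", "COMMAND_OK" } ) );
    }
}
```

    The names would normally be read from the text file, one per line, and the output written to the header before the preprocessor runs.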

    Sharing text resources between classes is a plus for size, as many classes are going to need phrases like "OK", "Cancel", "Back" and so on.

    Graham.

  7. #7
    Registered User
    Join Date
    Mar 2003
    Location
    Luton, Bedfordshire, UK
    Posts
    29
    Yes, I remember reading where you said about using a C preprocessor, and thought "I must ask him about that" -- and then forgot! I suppose that's part of Visual Studio, which I don't have. On the other hand, I do have the freebie version of Borland C++ builder 6 -- is there something in there I could use? It sounds like it could be integrated into the ant build without too much fuss. Could you post a tiny bit of code, and explain the process in baby steps for those of us not too familiar with C?


    Cheers,
    Gerard.

  8. #8
    Super Contributor
    Join Date
    Jun 2003
    Location
    Cheshire, UK
    Posts
    7,395
    Using a C Preprocessor with Java

    In general, using a C preprocessor with Java has more pitfalls than benefits. I've used it, because it cut my JAR file size by about 5%, which makes the difference between "OK" and "too big". (That's 5% of the obfuscated size - about ten times more than I expected. I was surprised at just how much space Java constants take). I don't code like this routinely!

    Constants in C work by having the preprocessor do a sort of "search-and-replace" on your code before the compiler sees it. So, you write (in C):
    Code:
    #define BUFFER_SIZE 512
    
    char arrayCnotJava [BUFFER_SIZE];
    
    printf ("BUFFER_SIZE got replaced\n");
    but by the time the compiler gets it, it has become:
    Code:
    char arrayCnotJava [512];
    
    printf ("BUFFER_SIZE got replaced\n");
    Note that the quoted string gets left alone. It's a smart search-and-replace, because the cpp knows a bit about C; about comments and string literals, for example. You can use a cpp in Java because these bits of syntax are common to both languages.

    Lines that start with a # are preprocessor directives - they're not understood by the C compiler, so the cpp strips them out. The other common directive is #include, which tells the cpp to read another file (a header file) and insert its content at that point. #defines are often placed in header files, so that the same definition can be #included into multiple source code units. So for each .c file, you will usually have a matching .h file, containing all of the "public" constants for that module.

    It has other tricks too, like conditional compilation - including or excluding sections of code depending on some build-time option setting. This is useful for building debug versions, or versions for different platforms.

    Costs are:

    * It will take time to convert your code. It will take more time to convert it back if you don't like it!

    * The cpp's insertion and deletion of code messes all your line numbers up - compilation errors will be given in cpp'd output lines, not original lines. This makes locating errors in the source code more of a pain.

    * If you change a header file, you will need to recompile every class that #includes it. Your existing build process is not likely to expect you to do something so perverse as cpp'ing Java, so it probably won't help you manage all the new file dependencies - at least, not without coaxing.

    If you don't need the reduction in size, it's not worth it. If you do, then you might as well take as much advantage of it as possible. (Another possible reason might be if you needed to produce different versions with vendor-specific classes - with conditional compilation, you could do this from the same source code.)

    I'm using the GNU C preprocessor, cpp, but I've used Microsoft's too. Borland always used to be good at providing some nice command-line tools, and I imagine they still are.

    The GNU CPP documentation is quite good, and covers the various areas where C programmers can shaft themselves, most of which also apply to those of us foolish and/or desperate enough to use it with Java! You can read it at http://gcc.gnu.org/onlinedocs/cpp/ (especially the "pitfalls" section if you're not an experienced C programmer).

    The command I'm using is:
    Code:
    cpp -P -nostdinc -undef MyClass.java > cppout\MyClass.java
    The options essentially make it less "C" oriented. The equivalent for Microsoft's kit involves running the compiler itself:
    Code:
    cl /EP /X /u MyClass.java > cppout\MyClass.java
    It's a powerful tool, and can solve problems that enter into Java when you code for such limited devices (like the fact that symbolic-constants take more space in the code than literals). But remember that it isn't designed to work with Java, and none of your Java tools are designed to work with a C preprocessor.

    Sorry if some of this seems negative - cpp has worked well for me, and solved problems that I otherwise could not have solved. But I want you to be fully aware of both the up and down sides to the process. cpp is a tool - it isn't magic.

    May your destiny be merciful!

    Graham.

  9. #9
    Super Contributor
    Join Date
    Mar 2003
    Location
    Israel
    Posts
    2,280
    I use option 1.
    I have a class with all the language-specific constants and code (or the version-specific ones; I also use this technique to make versions for different devices).
    I don't use Class.forName(), but I make separate packages for each version instead.
    I still have to work out a good build method, though (in other words, learn how to use Ant).

    shmoove

  10. #10
    Registered User
    Join Date
    Jul 2003
    Location
    Finland, Tampere
    Posts
    1,113
    I use Option3 + custom tools

    Here, a central resource class loads a text or raw file at runtime, and then supplies the strings to client classes via constant indexes. You might use the MIDlet itself for this, as it tends to be passed to UI classes so that they can call its callbacks when they finish; or you might use a specialist class and read the strings in at class-load time via a static initialiser so that you don't have to bother instantiating and passing it.


    Pro: Straightforward, well-understood mode of operation; no gimmicky coding. Smaller jar.


    Exactly! I also store all the languages in a single file, which gives better compression.

    Con: Messy to use: you need an accessor method, and your nice string constants have been replaced with integer index constants. You end up writing code like 'this.addCommand( new Command( TestMIDlet.getLocalString( TestMIDlet.STRING_COMMAND_BACK ), Command.BACK, 1 ) );',
    I don't consider it too messy, especially if your accessor's method name is one letter long. Well... in fact my code is even more complex, because I use the Singleton pattern (I'm not satisfied with using static initializers and static methods only).
    My code looks like:
    Code:
    this.addCommand( new Command( Resources.instance().getString( Resources.HIGH_SCORES ), Command.SCREEN, 1 ) );
    Worse, it's error-prone, with nothing to stop one class asking for, and getting, another class' strings.
    Theoretically it can be a problem, but in practice this means that you mix up HIGH_SCORES with NEW_GAME. Isn't it the same as if you confused "High Scores" with "New Game"? These errors are quite obvious and easy to track down.

    Maintenance is a chore if you don't add strings at the end of the file, because then you have to renumber your index constants, so neatness is expensive.

    Here is the point where I must argue!
    I've written a simple tool to edit my language packs. It places all the strings in all the languages together and adds a header, two integers long, giving the number of languages and the number of strings per language. The tool is also responsible for checking that everything is consistent (that the number of strings is the same for every language).

    The tool also generates "public static final int" declarations ready to be copied into a Resource.java, which is otherwise the same for every application; only these constants differ.


    One more important pro of this approach: I can change a game's languages simply by replacing the language pack. No recompilation needed! In fact you can do it with nothing more than your favourite zip archiver!
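    A sketch of what such a language pack might look like in code. The post only specifies the two-integer header, so the rest is assumed for illustration: the class name LanguagePack, the use of writeUTF for the strings, and the ordering (all of language 0, then all of language 1, and so on).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical pack layout: int language count, int strings per language,
// then every string of language 0, every string of language 1, ...
class LanguagePack
{
    // Tool side: check that every language has the same number of
    // strings, then write the header and the strings.
    static byte[] write( String[][] langs )
    {
        int perLang = langs.length == 0 ? 0 : langs[ 0 ].length;
        for ( int l = 0; l < langs.length; l++ )
        {
            if ( langs[ l ].length != perLang )
            {
                throw new IllegalArgumentException( "language " + l + " has "
                    + langs[ l ].length + " strings, expected " + perLang );
            }
        }
        try
        {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream( bytes );
            out.writeInt( langs.length );
            out.writeInt( perLang );
            for ( int l = 0; l < langs.length; l++ )
            {
                for ( int s = 0; s < perLang; s++ )
                {
                    out.writeUTF( langs[ l ][ s ] );
                }
            }
            out.flush();
            return bytes.toByteArray();
        }
        catch ( IOException e )
        {
            throw new IllegalStateException( e.toString() );
        }
    }

    // MIDlet side: skip the earlier languages and read the wanted one.
    static String[] read( byte[] pack, int lang )
    {
        try
        {
            DataInputStream in = new DataInputStream( new ByteArrayInputStream( pack ) );
            int langCount = in.readInt();
            int perLang = in.readInt();
            if ( lang < 0 || lang >= langCount )
            {
                throw new IllegalArgumentException( "no such language: " + lang );
            }
            for ( int i = 0; i < lang * perLang; i++ )
            {
                in.readUTF();
            }
            String[] result = new String[ perLang ];
            for ( int i = 0; i < perLang; i++ )
            {
                result[ i ] = in.readUTF();
            }
            return result;
        }
        catch ( IOException e )
        {
            throw new IllegalStateException( e.toString() );
        }
    }
}
```

    Because the pack is just a data file in the jar, swapping it for another with the same string count changes the languages without touching any class file.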

  11. #11
    Registered User
    Join Date
    Mar 2003
    Location
    Luton, Bedfordshire, UK
    Posts
    29
    Hmm, just looking over the Antenna documentation for an unrelated reason (to try to see how to get it working with the Nokia extensions), I note that it includes a WtkPreprocess task, apparently for this very purpose! I must investigate further...

  12. #12
    Registered User
    Join Date
    Jul 2003
    Location
    Finland, Tampere
    Posts
    1,113
    Heh, as for me, a sophisticated preprocessor would be much better than any tools or whatever. Space is too valuable for MIDlets; I can sacrifice some flexibility to have smaller MIDlets.

    One big requirement: the preprocessor should integrate with JBuilder, otherwise it will be too inflexible for me. Unfortunately I haven't seen any that do, or could not recognize them.

  13. #13
    Registered User
    Join Date
    Jul 2003
    Location
    Finland, Tampere
    Posts
    1,113
    Sorry, guys, I was probably too excited when I heard about a really good preprocessor being available.

    Even if I can use conditional compilation (Antenna can do it), reading text strings from a resource might still take less space than including the strings directly in the code.

    Does anybody want to do a little research on it?

  14. #14
    Regular Contributor
    Join Date
    Aug 2003
    Location
    uk
    Posts
    232

    Re: Line number issue with CPP

    It's easy to fix the line number issue with cpp, so that error line numbers from the preprocessed file match the original file.

    You will need to get a copy of any awk derivative from the web (awk/gawk/mawk/etc).

    You need to run the output of cpp through the following script:

    Code:
    #!/bin/gawk -f
    
    # Author:  Alex Crowther - Dec 2003
    # Purpose: reinsert white space removed by cpp
    # Reason:  So clickable error messages work :)
    
    # BugFix:  17/07/2005 - header detection change again.
    #          It absolutely will work all of the time now.
    #          Honestly ... it will really :)
    
    {  
    	if (NR == 1) 
    		header=$0
    	else
    		if ($0 == header) # we are past the header
    			header="done"
    
    	if ($1 == "#")
    	{
    		if (header == "done") # ie we are past the header fields
    		{
    			while (line_number < $2-1)
    			{
    				line_number++
    				print ""
    			}
    		}
    	}
    	else
    	{
    		line_number++
    		print
    	}
    }
    My preprocessor script line is (this is for bash not ant):

    ( cpp -nostdinc -imacros src/macros.h -Wall $INFILE ) 2> cpp_errors.txt | cppFIX > $OUTFILE

    Where $INFILE is the file before preprocessing and $OUTFILE is the preprocessed file.

    Any preprocessor errors are redirected to the file cpp_errors.txt (example errors include #if and #endif not matching, etc.).

    If you are not running bash, change it to:

    cpp -nostdinc -Wall infile.java | awk -f cppFIX > outfile.java
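    If awk isn't available, the same idea could probably be done in Java as part of the build. The sketch below is a loose port of the script, not a tested replacement: the class name CppLineFix is made up, and it assumes cpp emits line markers of the form '# <number> "file"' and repeats the first marker once the header is over, with the header check treated as a comparison.

```java
// Hypothetical Java port of the awk filter: re-inserts the blank lines
// that cpp removed, using cpp's own line markers as the guide.
class CppLineFix
{
    static String restoreBlankLines( String cppOutput )
    {
        StringBuilder out = new StringBuilder();
        String[] lines = cppOutput.split( "\n" );
        String header = null;
        boolean pastHeader = false;
        int lineNumber = 0;

        for ( int n = 0; n < lines.length; n++ )
        {
            String line = lines[ n ];
            if ( n == 0 )
            {
                header = line;      // remember the first marker
            }
            else if ( line.equals( header ) )
            {
                pastHeader = true;  // first marker seen again: header is over
            }

            String[] fields = line.trim().split( "\\s+" );
            if ( fields[ 0 ].equals( "#" ) )
            {
                // A line marker: pad with blank lines up to the line before
                // the one it names, and drop the marker itself.
                if ( pastHeader )
                {
                    int target = markerLine( fields ) - 1;
                    while ( lineNumber < target )
                    {
                        lineNumber++;
                        out.append( '\n' );
                    }
                }
            }
            else
            {
                lineNumber++;
                out.append( line ).append( '\n' );
            }
        }
        return out.toString();
    }

    // The number after the '#', or 0 if it is missing or not numeric.
    static int markerLine( String[] fields )
    {
        try
        {
            return fields.length > 1 ? Integer.parseInt( fields[ 1 ] ) : 0;
        }
        catch ( NumberFormatException e )
        {
            return 0;
        }
    }
}
```

    Reading the cpp output into a string and writing the result back out is left to the caller.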

    Hope that helps some people.

    Alex
    Last edited by alex_crowther; 2005-07-18 at 00:42.
