| This is gettext.info, produced by makeinfo version 4.13 from |
| gettext.texi. |
| |
| INFO-DIR-SECTION GNU Gettext Utilities |
| START-INFO-DIR-ENTRY |
| * gettext: (gettext). GNU gettext utilities. |
| * autopoint: (gettext)autopoint Invocation. Copy gettext infrastructure. |
| * envsubst: (gettext)envsubst Invocation. Expand environment variables. |
| * gettextize: (gettext)gettextize Invocation. Prepare a package for gettext. |
| * msgattrib: (gettext)msgattrib Invocation. Select part of a PO file. |
| * msgcat: (gettext)msgcat Invocation. Combine several PO files. |
| * msgcmp: (gettext)msgcmp Invocation. Compare a PO file and template. |
| * msgcomm: (gettext)msgcomm Invocation. Match two PO files. |
| * msgconv: (gettext)msgconv Invocation. Convert PO file to encoding. |
| * msgen: (gettext)msgen Invocation. Create an English PO file. |
| * msgexec: (gettext)msgexec Invocation. Process a PO file. |
| * msgfilter: (gettext)msgfilter Invocation. Pipe a PO file through a filter. |
| * msgfmt: (gettext)msgfmt Invocation. Make MO files out of PO files. |
| * msggrep: (gettext)msggrep Invocation. Select part of a PO file. |
| * msginit: (gettext)msginit Invocation. Create a fresh PO file. |
| * msgmerge: (gettext)msgmerge Invocation. Update a PO file from template. |
| * msgunfmt: (gettext)msgunfmt Invocation. Uncompile MO file into PO file. |
| * msguniq: (gettext)msguniq Invocation. Unify duplicates for PO file. |
| * ngettext: (gettext)ngettext Invocation. Translate a message with plural. |
| * xgettext: (gettext)xgettext Invocation. Extract strings into a PO file. |
| * ISO639: (gettext)Language Codes. ISO 639 language codes. |
| * ISO3166: (gettext)Country Codes. ISO 3166 country codes. |
| END-INFO-DIR-ENTRY |
| |
| This file provides documentation for GNU `gettext' utilities. It |
| also serves as a reference for the free Translation Project. |
| |
| Copyright (C) 1995-1998, 2001-2007 Free Software Foundation, Inc. |
| |
| This manual is free documentation. It is dually licensed under the |
| GNU FDL and the GNU GPL. This means that you can redistribute this |
| manual under either of these two licenses, at your choice. |
| |
| This manual is covered by the GNU FDL. Permission is granted to |
| copy, distribute and/or modify this document under the terms of the GNU |
| Free Documentation License (FDL), either version 1.2 of the License, or |
| (at your option) any later version published by the Free Software |
| Foundation (FSF); with no Invariant Sections, with no Front-Cover Text, |
| and with no Back-Cover Texts. A copy of the license is included in |
| *note GNU FDL::. |
| |
| This manual is covered by the GNU GPL. You can redistribute it |
| and/or modify it under the terms of the GNU General Public License |
| (GPL), either version 2 of the License, or (at your option) any later |
| version published by the Free Software Foundation (FSF). A copy of the |
| license is included in *note GNU GPL::. |
| |
| |
| File: gettext.info, Node: Top, Next: Introduction, Prev: (dir), Up: (dir) |
| |
| GNU `gettext' utilities |
| *********************** |
| |
| This manual documents the GNU gettext tools and the GNU libintl |
| library, version 0.18.1. |
| |
| * Menu: |
| |
| * Introduction:: Introduction |
| * Users:: The User's View |
| * PO Files:: The Format of PO Files |
| * Sources:: Preparing Program Sources |
| * Template:: Making the PO Template File |
| * Creating:: Creating a New PO File |
| * Updating:: Updating Existing PO Files |
| * Editing:: Editing PO Files |
| * Manipulating:: Manipulating PO Files |
| * Binaries:: Producing Binary MO Files |
| * Programmers:: The Programmer's View |
| * Translators:: The Translator's View |
| * Maintainers:: The Maintainer's View |
| * Installers:: The Installer's and Distributor's View |
| * Programming Languages:: Other Programming Languages |
| * Conclusion:: Concluding Remarks |
| |
| * Language Codes:: ISO 639 language codes |
| * Country Codes:: ISO 3166 country codes |
| * Licenses:: Licenses |
| |
| * Program Index:: Index of Programs |
| * Option Index:: Index of Command-Line Options |
| * Variable Index:: Index of Environment Variables |
| * PO Mode Index:: Index of Emacs PO Mode Commands |
| * Autoconf Macro Index:: Index of Autoconf Macros |
| * Index:: General Index |
| |
| --- The Detailed Node Listing --- |
| |
| Introduction |
| |
| * Why:: The Purpose of GNU `gettext' |
| * Concepts:: I18n, L10n, and Such |
| * Aspects:: Aspects in Native Language Support |
| * Files:: Files Conveying Translations |
| * Overview:: Overview of GNU `gettext' |
| |
| The User's View |
| |
| * System Installation:: Questions During Operating System Installation |
| * Setting the GUI Locale:: How to Specify the Locale Used by GUI Programs |
| * Setting the POSIX Locale:: How to Specify the Locale According to POSIX |
| * Installing Localizations:: How to Install Additional Translations |
| |
| Setting the POSIX Locale |
| |
| * Locale Names:: What a Locale Specification Looks Like |
| * Locale Environment Variables:: Which Environment Variable Specifies What |
| * The LANGUAGE variable:: How to Specify a Priority List of Languages |
| |
| Preparing Program Sources |
| |
| * Importing:: Importing the `gettext' declaration |
| * Triggering:: Triggering `gettext' Operations |
| * Preparing Strings:: Preparing Translatable Strings |
| * Mark Keywords:: How Marks Appear in Sources |
| * Marking:: Marking Translatable Strings |
| * c-format Flag:: Telling something about the following string |
| * Special cases:: Special Cases of Translatable Strings |
| * Bug Report Address:: Letting Users Report Translation Bugs |
| * Names:: Marking Proper Names for Translation |
| * Libraries:: Preparing Library Sources |
| |
| Making the PO Template File |
| |
| * xgettext Invocation:: Invoking the `xgettext' Program |
| |
| Creating a New PO File |
| |
| * msginit Invocation:: Invoking the `msginit' Program |
| * Header Entry:: Filling in the Header Entry |
| |
| Updating Existing PO Files |
| |
| * msgmerge Invocation:: Invoking the `msgmerge' Program |
| |
| Editing PO Files |
| |
| * KBabel:: KDE's PO File Editor |
| * Gtranslator:: GNOME's PO File Editor |
| * PO Mode:: Emacs's PO File Editor |
| * Compendium:: Using Translation Compendia |
| |
| Emacs's PO File Editor |
| |
| * Installation:: Completing GNU `gettext' Installation |
| * Main PO Commands:: Main Commands |
| * Entry Positioning:: Entry Positioning |
| * Normalizing:: Normalizing Strings in Entries |
| * Translated Entries:: Translated Entries |
| * Fuzzy Entries:: Fuzzy Entries |
| * Untranslated Entries:: Untranslated Entries |
| * Obsolete Entries:: Obsolete Entries |
| * Modifying Translations:: Modifying Translations |
| * Modifying Comments:: Modifying Comments |
| * Subedit:: Mode for Editing Translations |
| * C Sources Context:: C Sources Context |
| * Auxiliary:: Consulting Auxiliary PO Files |
| |
| Using Translation Compendia |
| |
| * Creating Compendia:: Merging translations for later use |
| * Using Compendia:: Using older translations if they fit |
| |
| Manipulating PO Files |
| |
| * msgcat Invocation:: Invoking the `msgcat' Program |
| * msgconv Invocation:: Invoking the `msgconv' Program |
| * msggrep Invocation:: Invoking the `msggrep' Program |
| * msgfilter Invocation:: Invoking the `msgfilter' Program |
| * msguniq Invocation:: Invoking the `msguniq' Program |
| * msgcomm Invocation:: Invoking the `msgcomm' Program |
| * msgcmp Invocation:: Invoking the `msgcmp' Program |
| * msgattrib Invocation:: Invoking the `msgattrib' Program |
| * msgen Invocation:: Invoking the `msgen' Program |
| * msgexec Invocation:: Invoking the `msgexec' Program |
| * Colorizing:: Highlighting parts of PO files |
| * libgettextpo:: Writing your own programs that process PO files |
| |
| Highlighting parts of PO files |
| |
| * The --color option:: Triggering colorized output |
| * The TERM variable:: The environment variable `TERM' |
| * The --style option:: The `--style' option |
| * Style rules:: Style rules for PO files |
| * Customizing less:: Customizing `less' for viewing PO files |
| |
| Producing Binary MO Files |
| |
| * msgfmt Invocation:: Invoking the `msgfmt' Program |
| * msgunfmt Invocation:: Invoking the `msgunfmt' Program |
| * MO Files:: The Format of GNU MO Files |
| |
| The Programmer's View |
| |
| * catgets:: About `catgets' |
| * gettext:: About `gettext' |
| * Comparison:: Comparing the two interfaces |
| * Using libintl.a:: Using libintl.a in own programs |
| * gettext grok:: Being a `gettext' grok |
| * Temp Programmers:: Temporary Notes for the Programmers Chapter |
| |
| About `catgets' |
| |
| * Interface to catgets:: The interface |
| * Problems with catgets:: Problems with the `catgets' interface?! |
| |
| About `gettext' |
| |
| * Interface to gettext:: The interface |
| * Ambiguities:: Solving ambiguities |
| * Locating Catalogs:: Locating message catalog files |
| * Charset conversion:: How to request conversion to Unicode |
| * Contexts:: Solving ambiguities in GUI programs |
| * Plural forms:: Additional functions for handling plurals |
| * Optimized gettext:: Optimization of the *gettext functions |
| |
| Temporary Notes for the Programmers Chapter |
| |
| * Temp Implementations:: Temporary - Two Possible Implementations |
| * Temp catgets:: Temporary - About `catgets' |
| * Temp WSI:: Temporary - Why a single implementation |
| * Temp Notes:: Temporary - Notes |
| |
| The Translator's View |
| |
| * Trans Intro 0:: Introduction 0 |
| * Trans Intro 1:: Introduction 1 |
| * Discussions:: Discussions |
| * Organization:: Organization |
| * Information Flow:: Information Flow |
| * Translating plural forms:: How to fill in `msgstr[0]', `msgstr[1]' |
| * Prioritizing messages:: How to find which messages to translate first |
| |
| Organization |
| |
| * Central Coordination:: Central Coordination |
| * National Teams:: National Teams |
| * Mailing Lists:: Mailing Lists |
| |
| National Teams |
| |
| * Sub-Cultures:: Sub-Cultures |
| * Organizational Ideas:: Organizational Ideas |
| |
| The Maintainer's View |
| |
| * Flat and Non-Flat:: Flat or Non-Flat Directory Structures |
| * Prerequisites:: Prerequisite Works |
| * gettextize Invocation:: Invoking the `gettextize' Program |
| * Adjusting Files:: Files You Must Create or Alter |
| * autoconf macros:: Autoconf macros for use in `configure.ac' |
| * CVS Issues:: Integrating with CVS |
| * Release Management:: Creating a Distribution Tarball |
| |
| Files You Must Create or Alter |
| |
| * po/POTFILES.in:: `POTFILES.in' in `po/' |
| * po/LINGUAS:: `LINGUAS' in `po/' |
| * po/Makevars:: `Makevars' in `po/' |
| * po/Rules-*:: Extending `Makefile' in `po/' |
| * configure.ac:: `configure.ac' at top level |
| * config.guess:: `config.guess', `config.sub' at top level |
| * mkinstalldirs:: `mkinstalldirs' at top level |
| * aclocal:: `aclocal.m4' at top level |
| * acconfig:: `acconfig.h' at top level |
| * config.h.in:: `config.h.in' at top level |
| * Makefile:: `Makefile.in' at top level |
| * src/Makefile:: `Makefile.in' in `src/' |
| * lib/gettext.h:: `gettext.h' in `lib/' |
| |
| Autoconf macros for use in `configure.ac' |
| |
| * AM_GNU_GETTEXT:: AM_GNU_GETTEXT in `gettext.m4' |
| * AM_GNU_GETTEXT_VERSION:: AM_GNU_GETTEXT_VERSION in `gettext.m4' |
| * AM_GNU_GETTEXT_NEED:: AM_GNU_GETTEXT_NEED in `gettext.m4' |
| * AM_GNU_GETTEXT_INTL_SUBDIR:: AM_GNU_GETTEXT_INTL_SUBDIR in `intldir.m4' |
| * AM_PO_SUBDIRS:: AM_PO_SUBDIRS in `po.m4' |
| * AM_ICONV:: AM_ICONV in `iconv.m4' |
| |
| Integrating with CVS |
| |
| * Distributed CVS:: Avoiding version mismatch in distributed development |
| * Files under CVS:: Files to put under CVS version control |
| * autopoint Invocation:: Invoking the `autopoint' Program |
| |
| Other Programming Languages |
| |
| * Language Implementors:: The Language Implementor's View |
| * Programmers for other Languages:: The Programmer's View |
| * Translators for other Languages:: The Translator's View |
| * Maintainers for other Languages:: The Maintainer's View |
| * List of Programming Languages:: Individual Programming Languages |
| * List of Data Formats:: Internationalizable Data |
| |
| The Translator's View |
| |
| * c-format:: C Format Strings |
| * objc-format:: Objective C Format Strings |
| * sh-format:: Shell Format Strings |
| * python-format:: Python Format Strings |
| * lisp-format:: Lisp Format Strings |
| * elisp-format:: Emacs Lisp Format Strings |
| * librep-format:: librep Format Strings |
| * scheme-format:: Scheme Format Strings |
| * smalltalk-format:: Smalltalk Format Strings |
| * java-format:: Java Format Strings |
| * csharp-format:: C# Format Strings |
| * awk-format:: awk Format Strings |
| * object-pascal-format:: Object Pascal Format Strings |
| * ycp-format:: YCP Format Strings |
| * tcl-format:: Tcl Format Strings |
| * perl-format:: Perl Format Strings |
| * php-format:: PHP Format Strings |
| * gcc-internal-format:: GCC internal Format Strings |
| * gfc-internal-format:: GFC internal Format Strings |
| * qt-format:: Qt Format Strings |
| * qt-plural-format:: Qt Plural Format Strings |
| * kde-format:: KDE Format Strings |
| * boost-format:: Boost Format Strings |
| |
| Individual Programming Languages |
| |
| * C:: C, C++, Objective C |
| * sh:: sh - Shell Script |
| * bash:: bash - Bourne-Again Shell Script |
| * Python:: Python |
| * Common Lisp:: GNU clisp - Common Lisp |
| * clisp C:: GNU clisp C sources |
| * Emacs Lisp:: Emacs Lisp |
| * librep:: librep |
| * Scheme:: GNU guile - Scheme |
| * Smalltalk:: GNU Smalltalk |
| * Java:: Java |
| * C#:: C# |
| * gawk:: GNU awk |
| * Pascal:: Pascal - Free Pascal Compiler |
| * wxWidgets:: wxWidgets library |
| * YCP:: YCP - YaST2 scripting language |
| * Tcl:: Tcl - Tk's scripting language |
| * Perl:: Perl |
| * PHP:: PHP Hypertext Preprocessor |
| * Pike:: Pike |
| * GCC-source:: GNU Compiler Collection sources |
| |
| sh - Shell Script |
| |
| * Preparing Shell Scripts:: Preparing Shell Scripts for Internationalization |
| * gettext.sh:: Contents of `gettext.sh' |
| * gettext Invocation:: Invoking the `gettext' program |
| * ngettext Invocation:: Invoking the `ngettext' program |
| * envsubst Invocation:: Invoking the `envsubst' program |
| * eval_gettext Invocation:: Invoking the `eval_gettext' function |
| * eval_ngettext Invocation:: Invoking the `eval_ngettext' function |
| |
| Perl |
| |
| * General Problems:: General Problems Parsing Perl Code |
| * Default Keywords:: Which Keywords Will xgettext Look For? |
| * Special Keywords:: How to Extract Hash Keys |
| * Quote-like Expressions:: What are Strings And Quote-like Expressions? |
| * Interpolation I:: Invalid String Interpolation |
| * Interpolation II:: Valid String Interpolation |
| * Parentheses:: When To Use Parentheses |
| * Long Lines:: How To Grok with Long Lines |
| * Perl Pitfalls:: Bugs, Pitfalls, and Things That Do Not Work |
| |
| Internationalizable Data |
| |
| * POT:: POT - Portable Object Template |
| * RST:: Resource String Table |
| * Glade:: Glade - GNOME user interface description |
| |
| Concluding Remarks |
| |
| * History:: History of GNU `gettext' |
| * References:: Related Readings |
| |
| Language Codes |
| |
| * Usual Language Codes:: Two-letter ISO 639 language codes |
| * Rare Language Codes:: Three-letter ISO 639 language codes |
| |
| Licenses |
| |
| * GNU GPL:: GNU General Public License |
| * GNU LGPL:: GNU Lesser General Public License |
| * GNU FDL:: GNU Free Documentation License |
| |
| |
| File: gettext.info, Node: Introduction, Next: Users, Prev: Top, Up: Top |
| |
| 1 Introduction |
| ************** |
| |
| This chapter explains the goals sought in the creation of GNU |
| `gettext' and the free Translation Project. Then, it explains a few |
| broad concepts around Native Language Support, and positions message |
| translation with regard to other aspects of national and cultural |
| variance, as they apply to programs. It also surveys those files used |
| to convey the translations. It explains how the various tools interact |
| in the initial generation of these files, and later, how the maintenance |
| cycle should usually operate. |
| |
| In this manual, we use _he_ when speaking of the programmer or |
| maintainer, _she_ when speaking of the translator, and _they_ when |
| speaking of the installers or end users of the translated program. |
| This is only a convenience for clarifying the documentation. It is |
| _absolutely_ not meant to imply that some roles are more appropriate to |
| males or females. Besides, as you might guess, GNU `gettext' is meant |
| to be useful for people using computers, whatever their sex, race, |
| religion or nationality! |
| |
| Please send suggestions and corrections to: |
| |
| Internet address: |
| bug-gnu-gettext@gnu.org |
| |
| Please include the manual's edition number and update date in your |
| messages. |
| |
| * Menu: |
| |
| * Why:: The Purpose of GNU `gettext' |
| * Concepts:: I18n, L10n, and Such |
| * Aspects:: Aspects in Native Language Support |
| * Files:: Files Conveying Translations |
| * Overview:: Overview of GNU `gettext' |
| |
| |
| File: gettext.info, Node: Why, Next: Concepts, Prev: Introduction, Up: Introduction |
| |
| 1.1 The Purpose of GNU `gettext' |
| ================================ |
| |
| Usually, programs are written and documented in English, and use |
| English at execution time to interact with users. This is true not |
| only of GNU software, but also of a great deal of proprietary and free |
| software. Using a common language is quite handy for communication |
| between developers, maintainers and users from all countries. On the |
| other hand, most people are less comfortable with English than with |
| their own native language, and would prefer to use their mother tongue |
| for day-to-day work, as far as possible. Many would simply _love_ to |
| see their computer screen showing a lot less English, and far more |
| of their own language. |
| |
| However, to many people, this dream might appear so far-fetched that |
| they may believe it is not even worth spending time thinking about it. |
| They have no confidence at all that the dream might ever come true. |
| Yet some have not lost hope, and have organized themselves. The |
| Translation Project is a formalization of this hope into a workable |
| structure, which has a good chance to get all of us nearer the |
| achievement of a truly multi-lingual set of programs. |
| |
| GNU `gettext' is an important step for the Translation Project, as |
| it is an asset on which we may build many other steps. This package |
| offers to programmers, translators and even users, a well integrated |
| set of tools and documentation. Specifically, the GNU `gettext' |
| utilities are a set of tools that provides a framework within which |
| other free packages may produce multi-lingual messages. These tools |
| include |
| |
| * A set of conventions about how programs should be written to |
| support message catalogs. |
| |
| * A directory and file naming organization for the message catalogs |
| themselves. |
| |
| * A runtime library supporting the retrieval of translated messages. |
| |
| * A few stand-alone programs to massage in various ways the sets of |
| translatable strings, or already translated strings. |
| |
| * A library supporting the parsing and creation of files containing |
| translated messages. |
| |
| * A special mode for Emacs(1) which helps in preparing these sets and |
| bringing them up to date. |
| |
| GNU `gettext' is designed to minimize the impact of |
| internationalization on program sources, keeping this impact as small |
| and hardly noticeable as possible. Internationalization has better |
| chances of succeeding if it is very lightweight, or at least appears |
| to be so, when looking at program sources. |
| |
| The Translation Project also uses the GNU `gettext' distribution as |
| a vehicle for documenting its structure and methods. This goes beyond |
| the strict technicalities of documenting GNU `gettext' proper. This |
| way, translators will find in a single place, as far as possible, |
| all they need to know for properly doing their translating work. This |
| supplemental documentation might also help programmers, and even |
| curious users, in understanding how GNU `gettext' is related to the |
| remainder of the Translation Project, and consequently, get a glimpse |
| of the _big picture_. |
| |
| ---------- Footnotes ---------- |
| |
| (1) In this manual, all mentions of Emacs refer either to GNU Emacs |
| or to XEmacs, which people sometimes call FSF Emacs and Lucid Emacs, |
| respectively. |
| |
| |
| File: gettext.info, Node: Concepts, Next: Aspects, Prev: Why, Up: Introduction |
| |
| 1.2 I18n, L10n, and Such |
| ======================== |
| |
| Two long words appear all the time when we discuss support of native |
| language in programs, and these words have a precise meaning, worth |
| being explained here, once and for all in this document. The words are |
| _internationalization_ and _localization_. Many people, tired of |
| writing these long words over and over again, took up the habit of |
| writing "i18n" and "l10n" instead, quoting the first and last letter of |
| each word, and replacing the run of intermediate letters by a number |
| merely telling how many such letters there are. But in this manual, |
| for the sake of clarity, we will patiently write the names in full, |
| each time... |
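
The abbreviation rule just described can be sketched in C as follows.
This is only an illustration: the function name `numeronym' is
hypothetical and is not part of GNU `gettext'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GNU gettext: writes the abbreviated
   form of WORD into OUT (first letter, count of the letters in between,
   last letter).  Words shorter than three letters are copied unchanged.
   Returns 0 on success, -1 if OUT is too small.  */
int numeronym(const char *word, char *out, size_t outsz)
{
    size_t len = strlen(word);
    int n;

    if (len < 3)
        n = snprintf(out, outsz, "%s", word);
    else
        n = snprintf(out, outsz, "%c%zu%c",
                     word[0], len - 2, word[len - 1]);

    return (n >= 0 && (size_t) n < outsz) ? 0 : -1;
}
```

Applied to "internationalization" (20 letters) it yields "i18n", and
applied to "localization" (12 letters) it yields "l10n".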
| |
| By "internationalization", one refers to the operation by which a |
| program, or a set of programs turned into a package, is made aware of |
| and able to support multiple languages. This is a generalization |
| process, by which the programs are untied from calling only English |
| strings or other English-specific habits, and connected to generic ways |
| of doing the same, instead. Program developers may use various |
| techniques to internationalize their programs. Some of these have been |
| standardized. GNU `gettext' offers one of these standards. *Note |
| Programmers::. |
| |
| By "localization", one means the operation by which, in a set of |
| programs already internationalized, one gives the program all needed |
| information so that it can adapt itself to handle its input and output |
| in a fashion which is correct for some native language and cultural |
| habits. This is a particularization process, by which generic methods |
| already implemented in an internationalized program are used in |
| specific ways. The programming environment puts several functions at |
| the programmer's disposal which allow this runtime configuration. The |
| formal description of a specific set of cultural habits for some country, |
| together with all associated translations targeted to the same native |
| language, is called the "locale" for this language or country. Users |
| achieve localization of programs by setting proper values to special |
| environment variables, prior to executing those programs, identifying |
| which locale should be used. |
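
As a minimal sketch of this runtime configuration, the standard C
function `setlocale' can be asked to adopt whatever locale the user's
environment variables identify. The wrapper name `select_locale' is an
assumption made for this example only.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical wrapper around the standard setlocale() function.
   NAME is a locale name; the empty string "" asks the C library to
   take the locale from the user's environment variables.  Returns the
   name of the locale in effect afterwards.  If the request cannot be
   honored, the previous locale stays active and its name is returned.  */
const char *select_locale(const char *name)
{
    const char *chosen = setlocale(LC_ALL, name);

    return chosen != NULL ? chosen : setlocale(LC_ALL, NULL);
}
```

A program would typically call `select_locale ("")' once, early in
`main', so that the environment settings made by the user take effect
before any output is produced.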
| |
| In fact, locale message support is only one component of the cultural |
| data that makes up a particular locale. There is a whole host of |
| routines and functions provided to aid programmers in developing |
| internationalized software and which allow them to access the data |
| stored in a particular locale. When someone presently refers to a |
| particular locale, they are obviously referring to the data stored |
| within that particular locale. Similarly, if a programmer is referring |
| to "accessing the locale routines", they are referring to the complete |
| suite of routines that access all of the locale's information. |
| |
| One uses the expression "Native Language Support", or merely NLS, |
| for speaking of the overall activity or feature encompassing both |
| internationalization and localization, allowing for multi-lingual |
| interactions in a program. In a nutshell, one could say that |
| internationalization is the operation by which further localizations |
| are made possible. |
| |
| Also, very roughly said, when it comes to multi-lingual messages, |
| internationalization is usually taken care of by programmers, and |
| localization is usually taken care of by translators. |
| |
| |
| File: gettext.info, Node: Aspects, Next: Files, Prev: Concepts, Up: Introduction |
| |
| 1.3 Aspects in Native Language Support |
| ====================================== |
| |
| For a totally multi-lingual distribution, there are many things to |
| translate beyond output messages. |
| |
| * As of today, GNU `gettext' offers a complete toolset for |
| translating messages output by C programs. Perl scripts and shell |
| scripts will also need to be translated. Even if there are today |
| some hooks by which this can be done, these hooks are not |
| integrated as well as they should be. |
| |
| * Some programs, like `autoconf' or `bison', are able to produce |
| other programs (or scripts). Even if the generating programs |
| themselves are internationalized, the generated programs they |
| produce may need internationalization on their own, and this |
| indirect internationalization could be automated right from the |
| generating program. In fact, quite usually, generating and |
| generated programs could be internationalized independently, as |
| the effort needed is fairly orthogonal. |
| |
| * A few programs include textual tables which might need translation |
| themselves, independently of the strings contained in the program |
| itself. For example, RFC 1345 gives an English description for |
| each character which the `recode' program is able to reconstruct |
| at execution. Since these descriptions are extracted from the RFC |
| by mechanical means, translating them properly would require a |
| prior translation of the RFC itself. |
| |
| * Almost all programs accept options, which are often worded so as |
| to be descriptive for English readers; one might want to |
| consider offering translated versions for program options as well. |
| |
| * Many programs read, interpret, compile, or are somewhat driven by |
| input files which are texts containing keywords, identifiers, or |
| replies which are inherently translatable. For example, one may |
| want `gcc' to allow diacriticized characters in identifiers or use |
| translated keywords; `rm -i' might accept something other than `y' |
| or `n' for replies, etc. Even if the program will eventually make |
| most of its output in the foreign languages, one has to decide |
| whether the input syntax, option values, etc., are to be localized |
| or not. |
| |
| * The manual accompanying a package, as well as all documentation |
| files in the distribution, could surely be translated, too. |
| Translating a manual, with the intent of later keeping up with |
| updates, is a major undertaking in itself, generally. |
| |
| |
| As we already stressed, translation is only one aspect of locales. |
| Other internationalization aspects are system services and are handled |
| in GNU `libc'. There are many attributes that are needed to define a |
| country's cultural conventions. These attributes include, besides the |
| country's native language, the formatting of the date and time, the |
| representation of numbers, the symbols for currency, etc. These local |
| "rules" are termed the country's locale. The locale represents the |
| knowledge needed to support the country's native attributes. |
| |
| There are a few major areas which may vary between countries and |
| hence, define what a locale must describe. The following list helps |
| put multi-lingual messages into the proper context of other tasks |
| related to locales. See the GNU `libc' manual for details. |
| |
| _Characters and Codesets_ |
| The codeset most commonly used throughout the USA and most |
| English-speaking parts of the world is the ASCII codeset. However, |
| there are many characters needed by various locales that are not |
| found within this codeset. The 8-bit ISO 8859-1 codeset has most of |
| the special characters needed to handle the major European |
| languages. However, in many cases, choosing ISO 8859-1 is |
| nevertheless not adequate: it doesn't even contain the symbol of |
| the major European currency. Hence each locale will need to specify |
| which codeset it needs to use and will need to have the appropriate |
| character handling routines to cope with the codeset. |
| |
| _Currency_ |
| The symbols used vary from country to country as does the position |
| of the symbol. Software needs to be able to transparently |
| display currency figures in the native mode for each locale. |
| |
| _Dates_ |
| The format of dates varies between locales. For example, Christmas |
| day in 1994 is written as 12/25/94 in the USA and as 25/12/94 in |
| Australia. Other countries might use ISO 8601 dates, etc. |
| |
| Time of the day may be noted as HH:MM, HH.MM, or otherwise. Some |
| locales require time to be specified in 24-hour mode rather than |
| as AM or PM. Further, the nature and yearly extent of the |
| Daylight Saving correction vary widely between countries. |
| |
| _Numbers_ |
| Numbers can be represented differently in different locales. For |
| example, the following numbers are all written correctly for their |
| respective locales: |
| |
| 12,345.67 English |
| 12.345,67 German |
| 12345,67 French |
| 1,2345.67 Asia |
| |
| Some programs could go further and use different unit systems, like |
| English units or Metric units, or even take into account variants |
| about how numbers are spelled in full. |
| |
| _Messages_ |
| The most obvious area is the language support within a locale. |
| This is where GNU `gettext' provides the means for developers and |
| users to easily change the language that the software uses to |
| communicate with the user. |
| |
| |
| These areas of cultural conventions are called _locale categories_. |
| It is an unfortunate term; _locale aspects_ or _locale feature |
| categories_ would be better terms, because each "locale category" |
| describes an area or task that requires localization. The concrete data |
| that describes the cultural conventions for such an area and for a |
| particular culture is also called a _locale category_. In this sense, |
| a locale is composed of several locale categories: the locale category |
| describing the codeset, the locale category describing the formatting |
| of numbers, the locale category containing the translated messages, and |
| so on. |
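
Because a locale is composed of separate categories, a program may
select one category without touching the others. The following sketch
sets only the number formatting category (`LC_NUMERIC') and asks the
standard `localeconv' function for the decimal point in use. The
function name `decimal_point_for' is an assumption for illustration.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switches only the LC_NUMERIC category to
   LOCALE_NAME and returns the decimal-point string of that category,
   or NULL if the C library rejects the locale name.  All other
   categories (messages, dates, and so on) are left unchanged.  The
   returned string may be overwritten by later calls to setlocale()
   or localeconv().  */
const char *decimal_point_for(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;

    return localeconv()->decimal_point;
}
```

For the "C" locale the result is ".". Which other locale names are
accepted depends on the locales installed on the system.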
| |
| Components of locale outside of message handling are standardized in |
| the ISO C standard and the POSIX:2001 standard (also known as the SUSv3 |
| specification). GNU `libc' fully implements this, and most other |
| modern systems provide more or less reasonable support for at least |
| some of the missing components. |
| |
| |
| File: gettext.info, Node: Files, Next: Overview, Prev: Aspects, Up: Introduction |
| |
| 1.4 Files Conveying Translations |
| ================================ |
| |
| The letters PO in `.po' files stand for Portable Object, to distinguish |
| them from `.mo' files, where MO stands for Machine Object. This |
| paradigm, as well as the PO file format, is inspired by the NLS |
| standard developed by Uniforum, and first implemented by Sun in their |
| Solaris system. |
| |
| PO files are meant to be read and edited by humans, and associate |
| each original, translatable string of a given package with its |
| translation in a particular target language. A single PO file is |
| dedicated to a single target language. If a package supports many |
| languages, there is one such PO file per language supported, and each |
| package has its own set of PO files. These PO files are best created by |
| the `xgettext' program, and later updated or refreshed through the |
| `msgmerge' program. Program `xgettext' extracts all marked messages |
| from a set of C files and initializes a PO file with empty |
| translations. Program `msgmerge' takes care of adjusting PO files |
| between releases of the corresponding sources, commenting obsolete |
| entries, initializing new ones, and updating all source line |
| references. Files ending with `.pot' are a kind of base translation |
| file found in distributions, in PO file format. |
| |
| MO files are meant to be read by programs, and are binary in nature. |
| A few systems already offer tools for creating and handling MO files as |
| part of the Native Language Support coming with the system, but the |
| format of these MO files is often different from system to system, and |
| non-portable. The tools already provided with these systems don't |
| support all the features of GNU `gettext'. Therefore GNU `gettext' |
| uses its own format for MO files. Files ending with `.gmo' are really |
| MO files, when it is known that these files use the GNU format. |
| |
| |
| File: gettext.info, Node: Overview, Prev: Files, Up: Introduction |
| |
| 1.5 Overview of GNU `gettext' |
| ============================= |
| |
| The following diagram summarizes the relation between the files |
| handled by GNU `gettext' and the tools acting on these files. It is |
| followed by somewhat detailed explanations, which you should read while |
| keeping an eye on the diagram. Having a clear understanding of these |
| interrelations will surely help programmers, translators and |
| maintainers. |
| |
| Original C Sources ---> Preparation ---> Marked C Sources ---. |
|                                                               | |
|               .---------<--- GNU gettext Library             | |
| .--- make <---+                                              | |
| |             `---------<--------------------+---------------' |
| |                                            | |
| |   .-----<--- PACKAGE.pot <--- xgettext <---'   .---<--- PO Compendium |
| |   |                                            |              ^ |
| |   |                                            `---.          | |
| |   `---.                                            +---> PO editor ---. |
| |       +----> msgmerge ------> LANG.po ---->--------'                  | |
| |   .---'                                                               | |
| |   |                                                                   | |
| |   `-------------<---------------.                                     | |
| |                                 +--- New LANG.po <--------------------' |
| |   .--- LANG.gmo <--- msgfmt <---' |
| |   | |
| |   `---> install ---> /.../LANG/PACKAGE.mo ---. |
| |                                              +---> "Hello world!" |
| `-------> install ---> /.../bin/PROGRAM -------' |
| |
| As a programmer, the first step to bringing GNU `gettext' into your |
| package is identifying, right in the C sources, those strings which are |
| meant to be translatable, and those which are untranslatable. This |
| tedious job can be done a little more comfortably using emacs PO mode, |
| but you can use any means familiar to you for modifying your C sources. |
| Besides this, some other simple, standard changes are needed to properly |
| initialize the translation library. *Note Sources::, for more |
| information about all this. |
| |
| For newly written software the strings of course can and should be |
| marked while writing it. The `gettext' approach makes this very easy. |
| Simply put the following lines at the beginning of each file or in a |
| central header file: |
| |
| #define _(String) (String) |
| #define N_(String) String |
| #define textdomain(Domain) |
| #define bindtextdomain(Package, Directory) |
| |
| Doing this allows you to prepare the sources for internationalization. |
| Later when you feel ready for the step to use the `gettext' library |
| simply replace these definitions by the following: |
| |
| #include <libintl.h> |
| #define _(String) gettext (String) |
| #define gettext_noop(String) String |
| #define N_(String) gettext_noop (String) |
| |
| and link against `libintl.a' or `libintl.so'. Note that on GNU |
| systems, you don't need to link with `libintl' because the `gettext' |
| library functions are already contained in GNU libc. That is all you |
| have to change. |
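| |
| The difference between `_' and `N_' shows up with strings in a static |
| initializer, where no function call is allowed. The following is a |
| minimal sketch that reuses the placeholder definitions above; the table |
| `weekday_names' and the helper `weekday_label' are hypothetical names |
| chosen for illustration, not part of GNU `gettext'. |
| |
```c
#include <stddef.h>

/* Placeholder definitions: both macros are no-ops until libintl is used.  */
#define _(String) (String)
#define N_(String) String

/* N_ only marks the literals for extraction; a static initializer
   cannot contain a function call such as gettext().  */
static const char *const weekday_names[] = {
  N_("Monday"),
  N_("Tuesday"),
  N_("Wednesday")
};

/* Hypothetical helper: translate at the point of use, at run time.
   Returns NULL when the index is out of range.  */
const char *
weekday_label (size_t index)
{
  size_t count = sizeof weekday_names / sizeof weekday_names[0];
  if (index >= count)
    return NULL;
  return _(weekday_names[index]);
}
```
| |
| With the placeholder definitions the helper returns the English string |
| unchanged. Once `_' expands to `gettext', the same call site returns |
| the translation, provided `xgettext' was told to treat `N_' as a |
| keyword so that the literals reached the PO template. |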
| |
| Once the C sources have been modified, the `xgettext' program is |
| used to find and extract all translatable strings, and create a PO |
| template file out of all these. This `PACKAGE.pot' file contains all |
| original program strings. It has sets of pointers to exactly where in |
| C sources each string is used. All translations are set to empty. The |
| letter `t' in `.pot' marks this as a Template PO file, not yet oriented |
| towards any particular language. *Note xgettext Invocation::, for more |
| details about how one calls the `xgettext' program. If you are |
| _really_ lazy, you might be interested in working a lot more right |
| away, and preparing the whole distribution setup (*note Maintainers::). |
| By doing so, you spare yourself typing the `xgettext' command, as `make' |
| should now generate the proper things automatically for you! |
| |
| The first time through, there is no `LANG.po' yet, so the `msgmerge' |
| step may be skipped and replaced by a mere copy of `PACKAGE.pot' to |
| `LANG.po', where LANG represents the target language. See *note |
| Creating:: for details. |
| |
| Then comes the initial translation of messages. Translation in |
| itself is a whole matter, still exclusively meant for humans, and whose |
| complexity far overwhelms the level of this manual. Nevertheless, a |
| few hints are given in some other chapter of this manual (*note |
| Translators::). There you will also find indications about how to |
| contact translating teams, or become part of them, for sharing your |
| translating concerns with others who target the same native language. |
| |
| While adding the translated messages into the `LANG.po' PO file, if |
| you are not using one of the dedicated PO file editors (*note |
| Editing::), you are on your own for ensuring that your efforts fully |
| respect the PO file format, and quoting conventions (*note PO Files::). |
| This is surely not an impossible task, as this is the way many people |
| handled PO files around 1995. On the other hand, by using a PO |
| file editor, most details of the PO file format are taken care of for |
| you, but you have to acquire some familiarity with the PO file editor |
| itself. |
| |
| If some common translations have already been saved into a compendium |
| PO file, translators may use PO mode for initializing untranslated |
| entries from the compendium, and also save selected translations into |
| the compendium, updating it (*note Compendium::). Compendium files are |
| meant to be exchanged between members of a given translation team. |
| |
| Programs, or packages of programs, are dynamic in nature: users write |
| bug reports and suggestions for improvement, maintainers react by |
| modifying programs in various ways. The fact that a package has |
| already been internationalized should not make maintainers shy of |
| adding new strings, or modifying strings already translated. They just |
| do their job the best they can. For the Translation Project to work |
| smoothly, it is important that maintainers do not carry translation |
| concerns on their already loaded shoulders, and that translators be |
| kept as free as possible of programming concerns. |
| |
| The only concern maintainers should have is carefully marking new |
| strings as translatable, when they should be; they should not otherwise |
| worry about them being translated, as this will come in proper time. |
| Consequently, when programs and their strings are adjusted in various |
| ways by maintainers, and for matters usually unrelated to translation, |
| `xgettext' would construct `PACKAGE.pot' files which are evolving over |
| time, so the translations carried by `LANG.po' are slowly fading out of |
| date. |
| |
| It is important for translators (and even maintainers) to understand |
| that package translation is a continuous process in the lifetime of a |
| package, and not something which is done once and for all at the start. |
| After an initial burst of translation activity for a given package, |
| interventions are needed once in a while, because here and there, |
| translated entries become obsolete, and new untranslated entries |
| appear, needing translation. |
| |
| The `msgmerge' program has the purpose of refreshing an already |
| existing `LANG.po' file, by comparing it with a newer `PACKAGE.pot' |
| template file, extracted by `xgettext' out of recent C sources. The |
| refreshing operation adjusts all references to C source locations for |
| strings, since these strings move as programs are modified. Also, |
| `msgmerge' comments out as obsolete, in `LANG.po', those already |
| translated entries which are no longer used in the program sources |
| (*note Obsolete Entries::). It finally discovers new strings and |
| inserts them in the resulting PO file as untranslated entries (*note |
| Untranslated Entries::). *Note msgmerge Invocation::, for more |
| information about what `msgmerge' really does. |
| |
| Whatever route or means taken, the goal is to obtain an updated |
| `LANG.po' file offering translations for all strings. |
| |
| The temporal mobility, or fluidity of PO files, is an integral part |
| of the translation game, and should be well understood, and accepted. |
| People resisting it will have a hard time participating in the |
| Translation Project, or will give a hard time to other participants! In |
| particular, maintainers should relax and include all available official |
| PO files in their distributions, even if these have not recently been |
| updated, without exerting pressure on the translator teams to get the |
| job done. The pressure should rather come from the community of users |
| speaking a particular language, and maintainers should consider |
| themselves fairly relieved of any concern about the adequacy of |
| translation files. On the other hand, translators should reasonably |
| try updating the PO files they are responsible for, while the package |
| is undergoing pretest, prior to an official distribution. |
| |
| Once the PO file is complete and dependable, the `msgfmt' program is |
| used for turning the PO file into a machine-oriented format, which may |
| yield efficient retrieval of translations by the programs of the |
| package, whenever needed at runtime (*note MO Files::). *Note msgfmt |
| Invocation::, for more information about all modes of execution for the |
| `msgfmt' program. |
| |
| Finally, the modified and marked C sources are compiled and linked |
| with the GNU `gettext' library, usually through the operation of |
| `make', given a suitable `Makefile' exists for the project, and the |
| resulting executable is installed somewhere users will find it. The MO |
| files themselves should also be properly installed. Given the |
| appropriate environment variables are set (*note Setting the POSIX |
| Locale::), the program should localize itself automatically, whenever |
| it executes. |
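| |
| The program-side half of this last step can be sketched as follows. |
| This is a minimal sketch, assuming a system whose C library or |
| `libintl' provides `<libintl.h>'; the domain name `hello-sketch', the |
| directory `/usr/share/locale' and the function names `init_i18n' and |
| `greeting' are placeholders, not values prescribed by GNU `gettext'. |
| |
```c
#include <libintl.h>
#include <locale.h>

#define _(String) gettext (String)

/* Hypothetical values; a real package gets them from its build system.  */
#define PACKAGE "hello-sketch"
#define LOCALEDIR "/usr/share/locale"

/* Select the user's locale and the message catalog of this package.  */
void
init_i18n (void)
{
  setlocale (LC_ALL, "");              /* honor LANG, LC_ALL, ...  */
  bindtextdomain (PACKAGE, LOCALEDIR); /* where the MO files are installed  */
  textdomain (PACKAGE);                /* default domain for gettext()  */
}

/* Returns the translation if a catalog is found, else the original.  */
const char *
greeting (void)
{
  return _("Hello world!");
}
```
| |
| When no `hello-sketch' catalog is installed for the user's locale, |
| `greeting' simply returns the original English string. |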
| |
| The remainder of this manual has the purpose of explaining in depth |
| the various steps outlined above. |
| |
| |
| File: gettext.info, Node: Users, Next: PO Files, Prev: Introduction, Up: Top |
| |
| 2 The User's View |
| ***************** |
| |
| Nowadays, when users log into a computer, they usually find that all |
| their programs show messages in their native language - at least for |
| users of languages with an active free software community, like French |
| or German; to a lesser extent for languages with a smaller |
| participation in free software and the GNU project, like Hindi and |
| Filipino. |
| |
| How does this work? How can the user influence the language that is |
| used by the programs? This chapter answers these questions. |
| |
| * Menu: |
| |
| * System Installation:: Questions During Operating System Installation |
| * Setting the GUI Locale:: How to Specify the Locale Used by GUI Programs |
| * Setting the POSIX Locale:: How to Specify the Locale According to POSIX |
| * Installing Localizations:: How to Install Additional Translations |
| |
| |
| File: gettext.info, Node: System Installation, Next: Setting the GUI Locale, Prev: Users, Up: Users |
| |
| 2.1 Operating System Installation |
| ================================= |
| |
| The default language is often already specified during operating |
| system installation. When the operating system is installed, the |
| installer typically asks for the language used for the installation |
| process and, separately, for the language to use in the installed |
| system. Some OS installers only ask for the language once. |
| |
| This determines the system-wide default language for all users. But |
| the installers often give the possibility to install extra |
| localizations for additional languages. For example, the localizations |
| of KDE (the K Desktop Environment) and OpenOffice.org are often bundled |
| separately, as one installable package per language. |
| |
| At this point it is good to consider the intended use of the |
| machine: If it is a machine designated for personal use, additional |
| localizations are probably not necessary. If, however, the machine is |
| in use in an organization or company that has international |
| relationships, one can consider the needs of guest users. If you have |
| a guest from abroad, for a week, what could be his preferred locales? |
| It may be worth installing these additional localizations ahead of |
| time, since they cost only a bit of disk space at this point. |
| |
| The system-wide default language is the locale configuration that is |
| used when a new user account is created. But the user can have his own |
| locale configuration that is different from the one of the other users |
| of the same machine. He can specify it, typically after the first |
| login, as described in the next section. |
| |
| |
| File: gettext.info, Node: Setting the GUI Locale, Next: Setting the POSIX Locale, Prev: System Installation, Up: Users |
| |
| 2.2 Setting the Locale Used by GUI Programs |
| =========================================== |
| |
| The immediately available programs in a user's desktop come from a |
| group of programs called a "desktop environment"; it usually includes |
| the window manager, a web browser, a text editor, and more. The most |
| common free desktop environments are KDE, GNOME, and Xfce. |
| |
| The locale used by GUI programs of the desktop environment can be |
| specified in a configuration screen called "control center", "language |
| settings" or "country settings". |
| |
| Individual GUI programs that are not part of the desktop environment |
| can have their locale specified either in a settings panel, or through |
| environment variables. |
| |
| For some programs, it is possible to specify the locale through |
| environment variables, possibly even to a different locale than the |
| desktop's locale. This means, instead of starting a program through a |
| menu or from the file system, you can start it from the command-line, |
| after having set some environment variables. The environment variables |
| can be those specified in the next section (*note Setting the POSIX |
| Locale::); for some versions of KDE, however, the locale is specified |
| through a variable `KDE_LANG', rather than `LANG' or `LC_ALL'. |
| |
| |
| File: gettext.info, Node: Setting the POSIX Locale, Next: Installing Localizations, Prev: Setting the GUI Locale, Up: Users |
| |
| 2.3 Setting the Locale through Environment Variables |
| ==================================================== |
| |
| As a user, if your language has been installed for this package, in |
| the simplest case, you only have to set the `LANG' environment variable |
| to the appropriate `LL_CC' combination. For example, let's suppose |
| that you speak German and live in Germany. At the shell prompt, merely |
| execute `setenv LANG de_DE' (in `csh'), `export LANG; LANG=de_DE' (in |
| `sh') or `export LANG=de_DE' (in `bash'). This can be done from your |
| `.login' or `.profile' file, once and for all. |
| |
| * Menu: |
| |
| * Locale Names:: What a Locale Specification Looks Like |
| * Locale Environment Variables:: Which Environment Variable Specifies What |
| * The LANGUAGE variable:: How to Specify a Priority List of Languages |
| |
| |
| File: gettext.info, Node: Locale Names, Next: Locale Environment Variables, Prev: Setting the POSIX Locale, Up: Setting the POSIX Locale |
| |
| 2.3.1 Locale Names |
| ------------------ |
| |
| A locale name usually has the form `LL_CC'. Here `LL' is an ISO 639 |
| two-letter language code, and `CC' is an ISO 3166 two-letter country |
| code. For example, for German in Germany, LL is `de', and CC is `DE'. |
| You can find a list of the language codes in appendix *note Language |
| Codes:: and a list of the country codes in appendix *note Country |
| Codes::. |
| |
| You might think that the country code specification is redundant. |
| But in fact, some languages have dialects in different countries. For |
| example, `de_AT' is used for Austria, and `pt_BR' for Brazil. The |
| country code serves to distinguish the dialects. |
| |
| Many locale names have an extended syntax `LL_CC.ENCODING' that also |
| specifies the character encoding. These are in use because between |
| 2000 and 2005, most users switched to locales in UTF-8 encoding. |
| For example, the German locale on glibc systems is nowadays |
| `de_DE.UTF-8'. The older name `de_DE' still refers to the German |
| locale as of 2000 that stores characters in ISO-8859-1 encoding - a |
| text encoding that cannot even accommodate the Euro currency sign. |
| |
| Some locale names use `LL_CC@VARIANT' instead of `LL_CC'. The |
| `@VARIANT' can denote any kind of characteristic that is not already |
| implied by the language LL and the country CC. It can denote a |
| particular monetary unit. For example, on glibc systems, `de_DE@euro' |
| denotes the locale that uses the Euro currency, in contrast to the |
| older locale `de_DE' which implies the use of the currency before 2002. |
| It can also denote a dialect of the language, or the script used to |
| write text (for example, `sr_RS@latin' uses the Latin script, whereas |
| `sr_RS' uses the Cyrillic script to write Serbian), or the orthography |
| rules, or similar. |
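| |
| The naming scheme described so far can be summarized as |
| `LL[_CC][.ENCODING][@VARIANT]'. The following is a minimal sketch of |
| splitting such a name into its parts; the structure `locale_parts' and |
| the function `split_locale_name' are hypothetical and only illustrate |
| the syntax, they are not how any C library parses locale names. |
| |
```c
#include <string.h>

/* Parts of a locale name of the form LL[_CC][.ENCODING][@VARIANT].
   Missing parts are left as empty strings.  */
struct locale_parts
{
  char language[16];
  char country[16];
  char encoding[32];
  char variant[32];
};

/* Copy at most size-1 bytes of the span [start, end) into dst.  */
static void
copy_span (char *dst, size_t size, const char *start, const char *end)
{
  size_t len = (size_t) (end - start);
  if (len >= size)
    len = size - 1;
  memcpy (dst, start, len);
  dst[len] = '\0';
}

/* Hypothetical helper: split NAME at the first '_', '.' and '@'.  */
void
split_locale_name (const char *name, struct locale_parts *out)
{
  const char *end = name + strlen (name);
  const char *at = strchr (name, '@');
  const char *base_end = at ? at : end;
  const char *dot = memchr (name, '.', (size_t) (base_end - name));
  const char *lang_cc_end = dot ? dot : base_end;
  const char *underscore = memchr (name, '_', (size_t) (lang_cc_end - name));

  memset (out, 0, sizeof *out);
  copy_span (out->language, sizeof out->language, name,
             underscore ? underscore : lang_cc_end);
  if (underscore)
    copy_span (out->country, sizeof out->country, underscore + 1, lang_cc_end);
  if (dot)
    copy_span (out->encoding, sizeof out->encoding, dot + 1, base_end);
  if (at)
    copy_span (out->variant, sizeof out->variant, at + 1, end);
}
```
| |
| For `de_DE.UTF-8' this yields the language `de', the country `DE' and |
| the encoding `UTF-8'; for `sr_RS@latin' it yields the variant `latin' |
| and an empty encoding. |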
| |
| On other systems, some variations of this scheme are used, such as |
| `LL'. You can get the list of locales supported by your system for |
| your language by running the command `locale -a | grep '^LL''. |
| |
| There is also a special locale, called `C'. When it is used, it |
| disables all localization: in this locale, all programs standardized by |
| POSIX use English messages and an unspecified character encoding (often |
| US-ASCII, but sometimes also ISO-8859-1 or UTF-8, depending on the |
| operating system). |
| |
| |
| File: gettext.info, Node: Locale Environment Variables, Next: The LANGUAGE variable, Prev: Locale Names, Up: Setting the POSIX Locale |
| |
| 2.3.2 Locale Environment Variables |
| ---------------------------------- |
| |
| A locale is composed of several _locale categories_, see *note |
| Aspects::. When a program looks up locale dependent values, it does |
| this according to the following environment variables, in priority |
| order: |
| |
| 1. `LANGUAGE' |
| |
| 2. `LC_ALL' |
| |
| 3. `LC_xxx', according to selected locale category: `LC_CTYPE', |
| `LC_NUMERIC', `LC_TIME', `LC_COLLATE', `LC_MONETARY', |
| `LC_MESSAGES', ... |
| |
| 4. `LANG' |
| |
| Variables whose value is set but is empty are ignored in this lookup. |
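| |
| The lookup order can be sketched as a search for the first non-empty |
| value. The following is a minimal sketch for a single locale category; |
| it leaves out `LANGUAGE', which only affects messages, and the function |
| names `first_nonempty' and `category_locale' are hypothetical. |
| |
```c
#include <stddef.h>

/* Return the first candidate that is set and non-empty, or NULL.
   Candidates are given in priority order; NULL means "unset".  */
const char *
first_nonempty (const char *const *candidates, size_t count)
{
  size_t i;
  for (i = 0; i < count; i++)
    if (candidates[i] != NULL && candidates[i][0] != '\0')
      return candidates[i];
  return NULL;
}

/* Hypothetical helper: the value that decides one locale category,
   given the values of LC_ALL, of the category's own LC_xxx variable,
   and of LANG.  Returns NULL if none of them is usable.  */
const char *
category_locale (const char *lc_all, const char *lc_category,
                 const char *lang)
{
  const char *const in_priority_order[] = { lc_all, lc_category, lang };
  return first_nonempty (in_priority_order, 3);
}
```
| |
| A caller would pass the results of `getenv'; a variable that is set to |
| the empty string is skipped in the same way as one that is unset. |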
| |
| `LANG' is the normal environment variable for specifying a locale. |
| As a user, you normally set this variable (unless some of the other |
| variables have already been set by the system, in `/etc/profile' or |
| similar initialization files). |
| |
| `LC_CTYPE', `LC_NUMERIC', `LC_TIME', `LC_COLLATE', `LC_MONETARY', |
| `LC_MESSAGES', and so on, are the environment variables meant to |
| override `LANG' and affect a single locale category only. For |
| example, assume you are a Swedish user in Spain, and you want your |
| programs to handle numbers and dates according to Spanish conventions, |
| and only the messages should be in Swedish. Then you could create a |
| locale named `sv_ES' or `sv_ES.UTF-8' by use of the `localedef' |
| program. But it is simpler, and achieves the same effect, to set the |
| `LANG' variable to `es_ES.UTF-8' and the `LC_MESSAGES' variable to |
| `sv_SE.UTF-8'; these two locales come already preinstalled with the |
| operating system. |
| |
| `LC_ALL' is an environment variable that overrides all of these. It |
| is typically used in scripts that run particular programs. For example, |
| `configure' scripts generated by GNU autoconf use `LC_ALL' to make sure |
| that the configuration tests don't operate in locale dependent ways. |
| |
| Some systems, unfortunately, set `LC_ALL' in `/etc/profile' or in |
| similar initialization files. As a user, you therefore have to unset |
| this variable if you want to set `LANG' and optionally some of the other |
| `LC_xxx' variables. |
| |
| The `LANGUAGE' variable is described in the next subsection. |
| |
| |
| File: gettext.info, Node: The LANGUAGE variable, Prev: Locale Environment Variables, Up: Setting the POSIX Locale |
| |
| 2.3.3 Specifying a Priority List of Languages |
| --------------------------------------------- |
| |
| Not all programs have translations for all languages. By default, an |
| English message is shown in place of a nonexistent translation. If you |
| understand other languages, you can set up a priority list of languages. |
| This is done through a different environment variable, called |
| `LANGUAGE'. GNU `gettext' gives preference to `LANGUAGE' over `LC_ALL' |
| and `LANG' for the purpose of message handling, but you still need to |
| have `LANG' (or `LC_ALL') set to the primary language; this is required |
| by other parts of the system libraries. For example, some Swedish |
| users who would rather read translations in German than in English |
| when Swedish is not available set `LANGUAGE' to `sv:de' while leaving |
| `LANG' set to `sv_SE'. |
| |
| Special advice for Norwegian users: The language code for Norwegian |
| bokmål changed from `no' to `nb' recently (in 2003). During the |
| transition period, while some message catalogs for this language are |
| installed under `nb' and some older ones under `no', it is recommended |
| for Norwegian users to set `LANGUAGE' to `nb:no' so that both newer and |
| older translations are used. |
| |
| In the `LANGUAGE' environment variable, but not in the other |
| environment variables, `LL_CC' combinations can be abbreviated as `LL' |
| to denote the language's main dialect. For example, `de' is equivalent |
| to `de_DE' (German as spoken in Germany), and `pt' to `pt_PT' |
| (Portuguese as spoken in Portugal) in this context. |
| |
| Note: The variable `LANGUAGE' is ignored if the locale is set to |
| `C'. In other words, you have to first enable localization, by setting |
| `LANG' (or `LC_ALL') to a value other than `C', before you can use a |
| language priority list through the `LANGUAGE' variable. |
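| |
| The rules of this subsection can be sketched as follows. This is a |
| minimal sketch, not the code used by GNU `gettext': the function |
| `language_candidates' is hypothetical, it treats only the exact name |
| `C' as the disabled locale, and it limits each entry to 31 characters. |
| |
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write into OUT the languages to try, in order,
   and return how many there are (at most MAX).  LOCALE is the value
   already selected through LC_ALL, LC_MESSAGES or LANG.  */
size_t
language_candidates (const char *language_var, const char *locale,
                     char out[][32], size_t max)
{
  size_t n = 0;
  const char *list;

  /* LANGUAGE is ignored when localization is disabled.  */
  if (locale == NULL || strcmp (locale, "C") == 0)
    return 0;

  /* Without LANGUAGE, only the locale itself is tried.  */
  list = (language_var != NULL && language_var[0] != '\0')
         ? language_var : locale;

  while (*list != '\0' && n < max)
    {
      size_t len = strcspn (list, ":");   /* length of the next entry  */
      if (len > 0)
        {
          snprintf (out[n], 32, "%.*s", (int) len, list);
          n++;
        }
      list += len;
      if (*list == ':')
        list++;
    }
  return n;
}
```
| |
| With `LANGUAGE' set to `sv:de' and `LANG' set to `sv_SE', the sketch |
| yields `sv' followed by `de'; with the locale set to `C' it yields |
| nothing, whatever `LANGUAGE' contains. |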
| |
| |
| File: gettext.info, Node: Installing Localizations, Prev: Setting the POSIX Locale, Up: Users |
| |
| 2.4 Installing Translations for Particular Programs |
| =================================================== |
| |
| Languages are not equally well supported in all packages using GNU |
| `gettext', and more translations are added over time. Usually, you use |
| the translations that are shipped with the operating system or with |
| particular packages that you install afterwards. But you can also |
| install newer localizations directly. To do this, you will need to |
| understand where each localization file is stored on the file system. |
| |
| For programs that participate in the Translation Project, you can |
| start looking for translations here: |
| `http://translationproject.org/team/index.html'. A snapshot of this |
| information is also found in the `ABOUT-NLS' file that is shipped with |
| GNU gettext. |
| |
| For programs that are part of the KDE project, the starting point is: |
| `http://i18n.kde.org/'. |
| |
| For programs that are part of the GNOME project, the starting point |
| is: `http://www.gnome.org/i18n/'. |
| |
| For other programs, you may check whether the program's source code |
| package contains some `LL.po' files; often they are kept together in a |
| directory called `po/'. Each `LL.po' file contains the message |
| translations for the language whose abbreviation is LL. |
| |
| |
| File: gettext.info, Node: PO Files, Next: Sources, Prev: Users, Up: Top |
| |
| 3 The Format of PO Files |
| ************************ |
| |
| The GNU `gettext' toolset helps programmers and translators in |
| producing, updating and using translation files, mainly those PO files |
| which are textual, editable files. This chapter explains the format of |
| PO files. |
| |
| A PO file is made up of many entries, each entry holding the relation |
| between an original untranslated string and its corresponding |
| translation. All entries in a given PO file usually pertain to a |
| single project, and all translations are expressed in a single target |
| language. One PO file "entry" has the following schematic structure: |
| |
| WHITE-SPACE |
| # TRANSLATOR-COMMENTS |
| #. EXTRACTED-COMMENTS |
| #: REFERENCE... |
| #, FLAG... |
| #| msgid PREVIOUS-UNTRANSLATED-STRING |
| msgid UNTRANSLATED-STRING |
| msgstr TRANSLATED-STRING |
| |
| The general structure of a PO file should be well understood by the |
| translator. When using PO mode, very little has to be known about the |
| format details, as PO mode takes care of them for her. |
| |
| A simple entry can look like this: |
| |
| #: lib/error.c:116 |
| msgid "Unknown system error" |
| msgstr "Error desconegut del sistema" |
| |
| Entries begin with some optional white space. Usually, when |
| generated through GNU `gettext' tools, there is exactly one blank line |
| between entries. Then comments follow, on lines all starting with the |
| character `#'. There are two kinds of comments: those which have some |
| white space immediately following the `#' - the TRANSLATOR COMMENTS -, |
| which comments are created and maintained exclusively by the |
| translator, and those which have some non-white character just after the |
| `#' - the AUTOMATIC COMMENTS -, which comments are created and |
| maintained automatically by GNU `gettext' tools. Comment lines |
| starting with `#.' contain comments given by the programmer, directed |
| at the translator; these comments are called EXTRACTED COMMENTS because |
| the `xgettext' program extracts them from the program's source code. |
| Comment lines starting with `#:' contain references to the program's |
| source code. Comment lines starting with `#,' contain flags; more |
| about these below. Comment lines starting with `#|' contain the |
| previous untranslated string for which the translator gave a |
| translation. |
| |
| All comments, of either kind, are optional. |
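| |
| Since the kind of a comment is decided by the character that follows |
| the `#', the distinction can be sketched in a few lines. The |
| enumeration and the function `classify_po_line' below are hypothetical |
| names; treating a bare `#' as a translator comment is an assumption of |
| this sketch. |
| |
```c
/* Kinds of lines in a PO entry, told apart by their first characters.  */
enum po_line_kind
{
  PO_TRANSLATOR_COMMENT,  /* "#" followed by white space  */
  PO_EXTRACTED_COMMENT,   /* "#."  */
  PO_REFERENCE,           /* "#:"  */
  PO_FLAGS,               /* "#,"  */
  PO_PREVIOUS,            /* "#|"  */
  PO_OTHER_AUTOMATIC,     /* "#" followed by another non-white character  */
  PO_NOT_A_COMMENT
};

/* Hypothetical helper: classify one line of a PO file.  */
enum po_line_kind
classify_po_line (const char *line)
{
  if (line[0] != '#')
    return PO_NOT_A_COMMENT;
  switch (line[1])
    {
    case '.': return PO_EXTRACTED_COMMENT;
    case ':': return PO_REFERENCE;
    case ',': return PO_FLAGS;
    case '|': return PO_PREVIOUS;
    case ' ': case '\t': case '\n': case '\0':
      return PO_TRANSLATOR_COMMENT;
    default:
      return PO_OTHER_AUTOMATIC;
    }
}
```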
| |
| After white space and comments, entries show two strings, namely |
| first the untranslated string as it appears in the original program |
| sources, and then, the translation of this string. The original string |
| is introduced by the keyword `msgid', and the translation, by `msgstr'. |
| The two strings, untranslated and translated, are quoted in various |
| ways in the PO file, using `"' delimiters and `\' escapes, but the |
| translator does not really have to pay attention to the precise quoting |
| format, as PO mode fully takes care of quoting for her. |
| |
| The `msgid' strings, as well as automatic comments, are produced and |
| managed by other GNU `gettext' tools, and PO mode does not provide |
| means for the translator to alter these. The most she can do is merely |
| deleting them, and only by deleting the whole entry. On the other |
| hand, the `msgstr' string, as well as translator comments, are really |
| meant for the translator, and PO mode gives her the full control she |
| needs. |
| |
| The comment lines beginning with `#,' are special because they are |
| not completely ignored by the programs as comments generally are. The |
| comma separated list of FLAGs is used by the `msgfmt' program to give |
| the user some better diagnostic messages. Currently there are two |
| forms of flags defined: |
| |
| `fuzzy' |
| This flag can be generated by the `msgmerge' program or it can be |
| inserted by the translator herself. It shows that the `msgstr' |
| string might not be a correct translation (anymore). Only the |
| translator can judge if the translation requires further |
| modification, or is acceptable as is. Once satisfied with the |
| translation, she then removes this `fuzzy' attribute. The |
| `msgmerge' program inserts this when it has combined the `msgid' and |
| `msgstr' entries after fuzzy search only. *Note Fuzzy Entries::. |
| |
| `c-format' |
| `no-c-format' |
| These flags should not be added by a human. Instead only the |
| `xgettext' program adds them. In an automated PO file processing |
| system as proposed here, the user's changes would be thrown away |
| again as soon as the `xgettext' program generates a new template |
| file. |
| |
| The `c-format' flag indicates that the untranslated string and the |
| translation are supposed to be C format strings. The `no-c-format' |
| flag indicates that they are not C format strings, even though the |
| untranslated string happens to look like a C format string (with |
| `%' directives). |
| |
| When the `c-format' flag is given for a string the `msgfmt' |
| program does some more tests to check the validity of the |
| translation. *Note msgfmt Invocation::, *note c-format Flag:: and |
| *note c-format::. |
| |
| `objc-format' |
| `no-objc-format' |
| Likewise for Objective C, see *note objc-format::. |
| |
| `sh-format' |
| `no-sh-format' |
| Likewise for Shell, see *note sh-format::. |
| |
| `python-format' |
| `no-python-format' |
| Likewise for Python, see *note python-format::. |
| |
| `lisp-format' |
| `no-lisp-format' |
| Likewise for Lisp, see *note lisp-format::. |
| |
| `elisp-format' |
| `no-elisp-format' |
| Likewise for Emacs Lisp, see *note elisp-format::. |
| |
| `librep-format' |
| `no-librep-format' |
| Likewise for librep, see *note librep-format::. |
| |
| `scheme-format' |
| `no-scheme-format' |
| Likewise for Scheme, see *note scheme-format::. |
| |
| `smalltalk-format' |
| `no-smalltalk-format' |
| Likewise for Smalltalk, see *note smalltalk-format::. |
| |
| `java-format' |
| `no-java-format' |
| Likewise for Java, see *note java-format::. |
| |
| `csharp-format' |
| `no-csharp-format' |
| Likewise for C#, see *note csharp-format::. |
| |
| `awk-format' |
| `no-awk-format' |
| Likewise for awk, see *note awk-format::. |
| |
| `object-pascal-format' |
| `no-object-pascal-format' |
| Likewise for Object Pascal, see *note object-pascal-format::. |
| |
| `ycp-format' |
| `no-ycp-format' |
| Likewise for YCP, see *note ycp-format::. |
| |
| `tcl-format' |
| `no-tcl-format' |
| Likewise for Tcl, see *note tcl-format::. |
| |
| `perl-format' |
| `no-perl-format' |
| Likewise for Perl, see *note perl-format::. |
| |
| `perl-brace-format' |
| `no-perl-brace-format' |
| Likewise for Perl brace, see *note perl-format::. |
| |
| `php-format' |
| `no-php-format' |
| Likewise for PHP, see *note php-format::. |
| |
| `gcc-internal-format' |
| `no-gcc-internal-format' |
| Likewise for the GCC sources, see *note gcc-internal-format::. |
| |
| `gfc-internal-format' |
| `no-gfc-internal-format' |
| Likewise for the GNU Fortran Compiler sources, see *note |
| gfc-internal-format::. |
| |
| `qt-format' |
| `no-qt-format' |
| Likewise for Qt, see *note qt-format::. |
| |
| `qt-plural-format' |
| `no-qt-plural-format' |
| Likewise for Qt plural forms, see *note qt-plural-format::. |
| |
| `kde-format' |
| `no-kde-format' |
| Likewise for KDE, see *note kde-format::. |
| |
| `boost-format' |
| `no-boost-format' |
| Likewise for Boost, see *note boost-format::. |
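| |
| Because the flags form a comma separated list, a tool has to compare |
| whole flag names: `no-c-format' must not be mistaken for `c-format'. |
| The following is a minimal sketch of such a test; the function |
| `po_line_has_flag' is a hypothetical name and is not part of the GNU |
| `gettext' tools. |
| |
```c
#include <string.h>

/* Hypothetical helper: return 1 if the "#," comment LINE lists FLAG as a
   whole word, 0 otherwise.  Flags are separated by commas and spaces.  */
int
po_line_has_flag (const char *line, const char *flag)
{
  size_t flag_len = strlen (flag);
  const char *p;

  if (strncmp (line, "#,", 2) != 0)
    return 0;
  p = line + 2;
  while (*p != '\0')
    {
      size_t len;
      p += strspn (p, ", \t\n");      /* skip separators  */
      len = strcspn (p, ", \t\n");    /* length of the next flag  */
      if (len > 0 && len == flag_len && strncmp (p, flag, len) == 0)
        return 1;
      p += len;
    }
  return 0;
}
```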
| |
| |
| It is also possible to have entries with a context specifier. They |
| look like this: |
| |
| WHITE-SPACE |
| # TRANSLATOR-COMMENTS |
| #. EXTRACTED-COMMENTS |
| #: REFERENCE... |
| #, FLAG... |
| #| msgctxt PREVIOUS-CONTEXT |
| #| msgid PREVIOUS-UNTRANSLATED-STRING |
| msgctxt CONTEXT |
| msgid UNTRANSLATED-STRING |
| msgstr TRANSLATED-STRING |
| |
| The context serves to disambiguate messages with the same |
| UNTRANSLATED-STRING. It is possible to have several entries with the |
| same UNTRANSLATED-STRING in a PO file, provided that they each have a |
| different CONTEXT. Note that an empty CONTEXT string and an absent |
| `msgctxt' line do not mean the same thing. |
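| |
| To see why an empty CONTEXT differs from an absent `msgctxt' line, it |
| can help to look at how a lookup key might be built. The following is |
| only a minimal sketch, not code taken from GNU `gettext': the helpers |
| `make_key' and `same_key' are hypothetical, and the sketch assumes the |
| convention of compiled catalogs in which the context and the message |
| are joined by an EOT (`\004') byte. |
| |
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the lookup key for an entry.  CTX == NULL models an absent
   msgctxt line; CTX == "" models an empty CONTEXT string.  */
static int
make_key (char *buf, size_t size, const char *ctx, const char *msgid)
{
  assert (buf != NULL && msgid != NULL);
  if (ctx == NULL)
    return snprintf (buf, size, "%s", msgid);
  /* Assumed convention: context, EOT byte, then the message.  */
  return snprintf (buf, size, "%s\004%s", ctx, msgid);
}

/* Return 1 if both contexts lead to the same key for MSGID.  */
static int
same_key (const char *ctx1, const char *ctx2, const char *msgid)
{
  char k1[256], k2[256];

  make_key (k1, sizeof k1, ctx1, msgid);
  make_key (k2, sizeof k2, ctx2, msgid);
  return strcmp (k1, k2) == 0;
}
```
| |
| With a NULL context the key is just the message, while an empty |
| context still contributes the separator, so the two entries do not |
| collide. |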
| |
| A different kind of entry is used for translations which involve |
| plural forms. |
| |
| WHITE-SPACE |
| # TRANSLATOR-COMMENTS |
| #. EXTRACTED-COMMENTS |
| #: REFERENCE... |
| #, FLAG... |
| #| msgid PREVIOUS-UNTRANSLATED-STRING-SINGULAR |
| #| msgid_plural PREVIOUS-UNTRANSLATED-STRING-PLURAL |
| msgid UNTRANSLATED-STRING-SINGULAR |
| msgid_plural UNTRANSLATED-STRING-PLURAL |
| msgstr[0] TRANSLATED-STRING-CASE-0 |
| ... |
| msgstr[N] TRANSLATED-STRING-CASE-N |
| |
| Such an entry can look like this: |
| |
| #: src/msgcmp.c:338 src/po-lex.c:699 |
| #, c-format |
| msgid "found %d fatal error" |
| msgid_plural "found %d fatal errors" |
| msgstr[0] "s'ha trobat %d error fatal" |
| msgstr[1] "s'han trobat %d errors fatals" |
| |
| Here also, a `msgctxt' context can be specified before `msgid', like |
| above. |
| |
| Here, additional kinds of flags can be used: |
| |
| `range:' |
| This flag is followed by a range of non-negative numbers, using |
| the syntax `range: MINIMUM-VALUE..MAXIMUM-VALUE'. It designates |
| the possible values that the numeric parameter of the message can |
| take. In some languages, translators may produce slightly better |
| translations if they know that the value can only take on values |
| between 0 and 10, for example. |
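| |
| At run time, a program would typically choose between the forms of a |
| plural entry such as the one shown earlier by calling `ngettext' from |
| `<libintl.h>'. The sketch below is illustrative only: the helper name |
| `format_fatal_errors' is an assumption, and when no catalog is |
| installed `ngettext' simply falls back to the English singular or |
| plural string. |
| |
```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Format the "fatal error" message for N errors into BUF.  */
static int
format_fatal_errors (char *buf, size_t size, unsigned long n)
{
  const char *fmt;

  assert (buf != NULL);
  /* ngettext picks the form that matches N in the current language.  */
  fmt = ngettext ("found %d fatal error", "found %d fatal errors", n);
  return snprintf (buf, size, fmt, (int) n);
}
```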
| |
| The PREVIOUS-UNTRANSLATED-STRING is optionally inserted by the |
| `msgmerge' program, at the same time when it marks a message fuzzy. It |
| helps the translator to see which changes were done by the developers |
| on the UNTRANSLATED-STRING. |
| |
| It happens that some lines, usually whitespace or comments, follow |
| the very last entry of a PO file. Such lines are not part of any entry, |
| and will be dropped when the PO file is processed by the tools, or may |
| disturb some PO file editors. |
| |
| The remainder of this section may be safely skipped by those using a |
| PO file editor, yet it may be interesting for everybody to have a better |
| idea of the precise format of a PO file. On the other hand, those |
| wishing to modify PO files by hand should carefully continue reading on. |
| |
| Each of UNTRANSLATED-STRING and TRANSLATED-STRING respects the C |
| syntax for a character string, including the surrounding quotes and |
| embedded backslashed escape sequences. When the time comes to write |
| multi-line strings, one should not use escaped newlines. Instead, a |
| closing quote should follow the last character on the line to be |
| continued, and an opening quote should resume the string at the |
| beginning of the following PO file line. For example: |
| |
| msgid "" |
| "Here is an example of how one might continue a very long string\n" |
| "for the common case the string represents multi-line output.\n" |
| |
| In this example, the empty string is used on the first line, to allow |
| better alignment of the `H' from the word `Here' over the `f' from the |
| word `for'. In this example, the `msgid' keyword is followed by three |
| strings, which are meant to be concatenated. Concatenating the empty |
| string does not change the resulting overall string, but it is a way |
| for us to comply with the necessity of `msgid' to be followed by a |
| string on the same line, while keeping the multi-line presentation |
| left-justified, as we find this to be a cleaner disposition. The empty |
| string could have been omitted, but only if the string starting with |
| `Here' was promoted on the first line, right after `msgid'.(1) It was |
| not really necessary either to switch between the two last quoted |
| strings immediately after the newline `\n', the switch could have |
| occurred after _any_ other character, we just did it this way because |
| it is neater. |
| |
| One should carefully distinguish between end of lines marked as `\n' |
| _inside_ quotes, which are part of the represented string, and end of |
| lines in the PO file itself, outside string quotes, which have no |
| effect on the represented string. |
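| |
| Since PO strings follow the C syntax, the same rules can be observed |
| in a C program. The following sketch (the name `count_newlines' is |
| hypothetical) stores the example string as three adjacent string |
| tokens and counts the newlines that are really part of it. |
| |
```c
#include <assert.h>
#include <stddef.h>

/* Three adjacent tokens; the line breaks between them are not part
   of the string, only the two `\n' escapes are.  */
static const char continued[] =
  ""
  "Here is an example of how one might continue a very long string\n"
  "for the common case the string represents multi-line output.\n";

/* Count the newline characters contained in S.  */
static size_t
count_newlines (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  for (; *s != '\0'; s++)
    if (*s == '\n')
      n++;
  return n;
}
```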
| |
| Outside strings, white lines and comments may be used freely. |
| Comments start at the beginning of a line with `#' and extend until the |
| end of the PO file line. Comments written by translators should have |
| the initial `#' immediately followed by some white space. If the `#' |
| is not immediately followed by white space, this comment is most likely |
| generated and managed by specialized GNU tools, and might disappear or |
| be replaced unexpectedly when the PO file is given to `msgmerge'. |
| |
| ---------- Footnotes ---------- |
| |
| (1) This limitation is not imposed by GNU `gettext', but is for |
| compatibility with the `msgfmt' implementation on Solaris. |
| |
| |
| File: gettext.info, Node: Sources, Next: Template, Prev: PO Files, Up: Top |
| |
| 4 Preparing Program Sources |
| *************************** |
| |
| For the programmer, changes to the C source code fall into three |
| categories. First, you have to make the localization functions known |
| to all modules needing message translation. Second, you should |
| properly trigger the operation of GNU `gettext' when the program |
| initializes, usually from the `main' function. Last, you should |
| identify, adjust and mark all constant strings in your program needing |
| translation. |
| |
| * Menu: |
| |
| * Importing:: Importing the `gettext' declaration |
| * Triggering:: Triggering `gettext' Operations |
| * Preparing Strings:: Preparing Translatable Strings |
| * Mark Keywords:: How Marks Appear in Sources |
| * Marking:: Marking Translatable Strings |
| * c-format Flag:: Telling something about the following string |
| * Special cases:: Special Cases of Translatable Strings |
| * Bug Report Address:: Letting Users Report Translation Bugs |
| * Names:: Marking Proper Names for Translation |
| * Libraries:: Preparing Library Sources |
| |
| |
| File: gettext.info, Node: Importing, Next: Triggering, Prev: Sources, Up: Sources |
| |
| 4.1 Importing the `gettext' declaration |
| ======================================= |
| |
| Presuming that your set of programs, or package, has been adjusted |
| so all needed GNU `gettext' files are available, and your `Makefile' |
| files are adjusted (*note Maintainers::), each C module having |
| translated C strings should contain the line: |
| |
| #include <libintl.h> |
| |
| Similarly, each C module containing `printf()'/`fprintf()'/... |
| calls with a format string that could be a translated C string (even if |
| the C string comes from a different C module) should contain the line: |
| |
| #include <libintl.h> |
| |
| |
| File: gettext.info, Node: Triggering, Next: Preparing Strings, Prev: Importing, Up: Sources |
| |
| 4.2 Triggering `gettext' Operations |
| =================================== |
| |
| The initialization of locale data should be done with more or less |
| the same code in every program, as demonstrated below: |
| |
| int |
| main (int argc, char *argv[]) |
| { |
| ... |
| setlocale (LC_ALL, ""); |
| bindtextdomain (PACKAGE, LOCALEDIR); |
| textdomain (PACKAGE); |
| ... |
| } |
| |
| PACKAGE and LOCALEDIR should be provided either by `config.h' or by |
| the Makefile. For now consult the `gettext' or `hello' sources for |
| more information. |
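| |
| As a rough sketch of how these pieces might fit together, the code |
| below supplies fallback definitions for PACKAGE and LOCALEDIR. The |
| values `"hello"' and `"/usr/share/locale"' and the helper name |
| `init_i18n' are assumptions made for illustration only. |
| |
```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <string.h>

/* Hypothetical fallbacks; a real package gets these from `config.h'
   or from the Makefile.  */
#ifndef PACKAGE
# define PACKAGE "hello"
#endif
#ifndef LOCALEDIR
# define LOCALEDIR "/usr/share/locale"
#endif

/* Return 1 if PACKAGE became the current text domain, 0 otherwise.  */
static int
init_i18n (void)
{
  const char *domain;

  /* An empty name would reset the domain to the default one.  */
  assert (PACKAGE[0] != '\0');
  setlocale (LC_ALL, "");
  if (bindtextdomain (PACKAGE, LOCALEDIR) == NULL)
    return 0;
  domain = textdomain (PACKAGE);
  return domain != NULL && strcmp (domain, PACKAGE) == 0;
}
```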
| |
| The use of `LC_ALL' might not be appropriate for you. `LC_ALL' |
| includes all locale categories and especially `LC_CTYPE'. This latter |
| category is responsible for determining character classes with the |
| `isalnum' etc. functions from `ctype.h', which can give wrong results |
| in programs that process some kind of input language. For example, |
| this would mean that source code using the ç (c-cedilla character) is |
| runnable in France but not in the U.S. |
| |
| Some systems also have problems with parsing numbers using the |
| `scanf' functions if a locale category other than `LC_ALL' is used. |
| The standards say that formats other than the one known in the |
| `"C"' locale might be recognized. But some systems seem to reject |
| numbers in the `"C"' locale format. In some situations, it might also |
| be a problem with the notation itself which makes it impossible to |
| recognize whether the number is in the `"C"' locale or the local |
| format. This can happen if thousands separator characters are used. |
| Some locales define this character according to the national |
| conventions to `'.'' which is the same character used in the `"C"' |
| locale to denote the decimal point. |
| |
| So it is sometimes necessary to replace the `LC_ALL' line in the |
| code above by a sequence of `setlocale' lines |
| |
| { |
| ... |
| setlocale (LC_CTYPE, ""); |
| setlocale (LC_MESSAGES, ""); |
| ... |
| } |
| |
| On all POSIX conformant systems the locale categories `LC_CTYPE', |
| `LC_MESSAGES', `LC_COLLATE', `LC_MONETARY', `LC_NUMERIC', and `LC_TIME' |
| are available. On some systems which are only ISO C compliant, |
| `LC_MESSAGES' is missing, but a substitute for it is defined in GNU |
| gettext's `<libintl.h>' and in GNU gnulib's `<locale.h>'. |
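| |
| One way to check the effect of such a partial initialization is to |
| query a category that was left alone. This is only a sketch, assuming |
| a system whose `<locale.h>' provides `LC_MESSAGES'; the helper names |
| are hypothetical. |
| |
```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Set only the categories needed for character handling and for
   message translation.  */
static void
init_locale_partially (void)
{
  setlocale (LC_CTYPE, "");
  setlocale (LC_MESSAGES, "");
}

/* Return 1 if number parsing and formatting still follow the "C"
   locale.  */
static int
numeric_locale_is_c (void)
{
  /* A NULL second argument queries the category without changing it.  */
  const char *name = setlocale (LC_NUMERIC, NULL);

  assert (name != NULL);
  return strcmp (name, "C") == 0;
}
```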
| |
| Note that changing the `LC_CTYPE' also affects the functions |
| declared in the `<ctype.h>' standard header and some functions declared |
| in the `<string.h>' and `<stdlib.h>' standard headers. If this is not |
| desirable in your application (for example in a compiler's parser), you |
| can use a set of substitute functions which hardwire the C locale, such |
| as found in the modules `c-ctype', `c-strcase', `c-strcasestr', |
| `c-strtod', `c-strtold' in the GNU gnulib source distribution. |
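| |
| To illustrate what hardwiring the C locale means, here is a small |
| sketch in the spirit of such substitute functions. It is not the |
| gnulib code; the names `ascii_isalnum' and `ascii_alnum_span' are |
| made up for this example. |
| |
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Classify C as a letter or digit without consulting LC_CTYPE.  */
static bool
ascii_isalnum (int c)
{
  return (c >= '0' && c <= '9')
         || (c >= 'A' && c <= 'Z')
         || (c >= 'a' && c <= 'z');
}

/* Length of the leading run of ASCII letters and digits in S.  */
static size_t
ascii_alnum_span (const char *s)
{
  size_t n = 0;

  assert (s != NULL);
  while (ascii_isalnum ((unsigned char) s[n]))
    n++;
  return n;
}
```
| |
| Whatever the user's locale, a byte such as 0xE7 (the c-cedilla in |
| ISO-8859-1) is never treated as a letter by these helpers. |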
| |
| It is also possible to switch the locale forth and back between the |
| environment dependent locale and the C locale, but this approach is |
| normally avoided because a `setlocale' call is expensive, because it is |
| tedious to determine the places where a locale switch is needed in a |
| large program's source, and because switching a locale is not |
| multithread-safe. |
| |
| |
| File: gettext.info, Node: Preparing Strings, Next: Mark Keywords, Prev: Triggering, Up: Sources |
| |
| 4.3 Preparing Translatable Strings |
| ================================== |
| |
| Before strings can be marked for translations, they sometimes need to |
| be adjusted. Usually preparing a string for translation is done right |
| before marking it, during the marking phase which is described in the |
| next sections. What you have to keep in mind while doing that is the |
| following. |
| |
| * Decent English style. |
| |
| * Entire sentences. |
| |
| * Split at paragraphs. |
| |
| * Use format strings instead of string concatenation. |
| |
| * Avoid unusual markup and unusual control characters. |
| |
| Let's look at some examples of these guidelines. |
| |
| Translatable strings should be in good English style. If slang |
| language with abbreviations and shortcuts is used, often translators |
| will not understand the message and will produce very inappropriate |
| translations. |
| |
| "%s: is parameter\n" |
| |
| This is nearly untranslatable: Is the displayed item _a_ parameter or |
| _the_ parameter? |
| |
| "No match" |
| |
| The ambiguity in this message makes it unintelligible: Is the program |
| attempting to set something on fire? Does it mean "The given object does |
| not match the template"? Does it mean "The template does not fit for any |
| of the objects"? |
| |
| In both cases, adding more words to the message will help both the |
| translator and the English speaking user. |
| |
| Translatable strings should be entire sentences. It is often not |
| possible to translate single verbs or adjectives in a substitutable way. |
| |
| printf ("File %s is %s protected", filename, rw ? "write" : "read"); |
| |
| Most translators will not look at the source and will thus only see the |
| string `"File %s is %s protected"', which is unintelligible. Change |
| this to |
| |
| printf (rw ? "File %s is write protected" : "File %s is read protected", |
| filename); |
| |
| This way the translator will not only understand the message, she will |
| also be able to find the appropriate grammatical construction. A French |
| translator for example translates "write protected" like "protected |
| against writing". |
| |
| Entire sentences are also important because in many languages, the |
| declension of some word in a sentence depends on the gender or the |
| number (singular/plural) of another part of the sentence. There are |
| usually more interdependencies between words than in English. The |
| consequence is that asking a translator to translate two half-sentences |
| and then combining these two half-sentences through dumb string |
| concatenation will not work, for many languages, even though it would |
| work for English. That's why translators need to handle entire |
| sentences. |
| |
| Often sentences don't fit into a single line. If a sentence is |
| output using two subsequent `printf' statements, like this |
| |
| printf ("Locale charset \"%s\" is different from\n", lcharset); |
| printf ("input file charset \"%s\".\n", fcharset); |
| |
| the translator would have to translate two half sentences, but nothing |
| in the POT file would tell her that the two half sentences belong |
| together. It is necessary to merge the two `printf' statements so that |
| the translator can handle the entire sentence at once and decide at |
| which place to insert a line break in the translation (if at all): |
| |
| printf ("Locale charset \"%s\" is different from\n\ |
| input file charset \"%s\".\n", lcharset, fcharset); |
| |
| You may now ask: how about two or more adjacent sentences? Like in |
| this case: |
| |
| puts ("Apollo 13 scenario: Stack overflow handling failed."); |
| puts ("On the next stack overflow we will crash!!!"); |
| |
| Should these two statements be merged into a single one? I would |
| recommend merging them if the two sentences are related to each other, |
| because that makes it easier for the translator to understand and |
| translate both. On the other hand, if one of the two messages is a |
| stereotypic one, occurring in other places as well, you will do a |
| favour to the translator by not merging the two. (Identical messages |
| occurring in several places are combined by xgettext, so the |
| translator has to handle them once only.) |
| |
| Translatable strings should be limited to one paragraph; don't let a |
| single message be longer than ten lines. The reason is that when the |
| translatable string changes, the translator is faced with the task of |
| updating the entire translated string. Maybe only a single word will |
| have changed in the English string, but the translator doesn't see that |
| (with the current translation tools), therefore she has to proofread |
| the entire message. |
| |
| Many GNU programs have a `--help' output that extends over several |
| screen pages. It is a courtesy towards the translators to split such a |
| message into several ones of five to ten lines each. While doing that, |
| you can also attempt to split the documented options into groups, such |
| as the input options, the output options, and the informative output |
| options. This will help every user to find the option he is looking |
| for. |
| |
| Hardcoded string concatenation is sometimes used to construct English |
| strings: |
| |
| strcpy (s, "Replace "); |
| strcat (s, object1); |
| strcat (s, " with "); |
| strcat (s, object2); |
| strcat (s, "?"); |
| |
| In order to present to the translator only entire sentences, and also |
| because in some languages the translator might want to swap the order |
| of `object1' and `object2', it is necessary to change this to use a |
| format string: |
| |
| sprintf (s, "Replace %s with %s?", object1, object2); |
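| |
| The benefit of a format string can be sketched as follows. The |
| helper `format_replace_question' is hypothetical, and the second |
| format in the usage note relies on the POSIX `%N$' positional |
| notation, which is assumed to be supported by the `printf' family of |
| the target system. |
| |
```c
#include <assert.h>
#include <stdio.h>

/* Format the question with FMT, which stands for the (possibly
   translated) format string.  */
static int
format_replace_question (char *buf, size_t size, const char *fmt,
                         const char *object1, const char *object2)
{
  assert (buf != NULL && fmt != NULL);
  return snprintf (buf, size, fmt, object1, object2);
}
```
| |
| Called with `"Replace %s with %s?"' the arguments appear in source |
| order, while a translation written as `"%2$s replaces %1$s?"' swaps |
| them without any change to the C code. |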
| |
| A similar case is compile time concatenation of strings. The ISO C |
| 99 include file `<inttypes.h>' contains a macro `PRId64' that can be |
| used as a formatting directive for outputting an `int64_t' integer |
| through `printf'. It expands to a constant string, usually "d" or "ld" |
| or "lld" or something like this, depending on the platform. Assume you |
| have code like |
| |
| printf ("The amount is %0" PRId64 "\n", number); |
| |
| The `gettext' tools and library have special support for these |
| `<inttypes.h>' macros. You can therefore simply write |
| |
| printf (gettext ("The amount is %0" PRId64 "\n"), number); |
| |
| The PO file will contain the string "The amount is %0<PRId64>\n". The |
| translators will provide a translation containing "%0<PRId64>" as well, |
| and at runtime the `gettext' function's result will contain the |
| appropriate constant string, "d" or "ld" or "lld". |
| |
| This works only for the predefined `<inttypes.h>' macros. If you |
| have defined your own similar macros, let's say `MYPRId64', that are |
| not known to `xgettext', the solution for this problem is to change the |
| code like this: |
| |
| char buf1[100]; |
| sprintf (buf1, "%0" MYPRId64, number); |
| printf (gettext ("The amount is %s\n"), buf1); |
| |
| This means, you put the platform dependent code in one statement, |
| and the internationalization code in a different statement. Note that |
| a buffer length of 100 is safe, because all available hardware integer |
| types are limited to 128 bits, and to print a 128 bit integer one needs |
| at most 54 characters, regardless whether in decimal, octal or |
| hexadecimal. |
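| |
| A self-contained variant of this two-step approach might look like |
| the sketch below. Here `MYPRId64' is defined as an alias of `PRId64' |
| purely for illustration, `format_amount' is a hypothetical helper, |
| and `snprintf' is used instead of `sprintf' so that the buffer sizes |
| are enforced. |
| |
```c
#include <assert.h>
#include <inttypes.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical project macro that `xgettext' does not know about.  */
#define MYPRId64 PRId64

/* Format NUMBER into OUT using a translatable message.  */
static int
format_amount (char *out, size_t size, int64_t number)
{
  char buf1[100];

  assert (out != NULL);
  /* Platform dependent part: no translatable string involved.  */
  snprintf (buf1, sizeof buf1, "%" MYPRId64, number);
  /* Internationalized part: only `%s' appears in the message.  */
  return snprintf (out, size, gettext ("The amount is %s\n"), buf1);
}
```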
| |
| All this applies to other programming languages as well. For |
| example, in Java and C#, string concatenation is very frequently used, |
| because it is a compiler built-in operator. Like in C, in Java, you |
| would change |
| |
| System.out.println("Replace "+object1+" with "+object2+"?"); |
| |
| into a statement involving a format string: |
| |
| System.out.println( |
| MessageFormat.format("Replace {0} with {1}?", |
| new Object[] { object1, object2 })); |
| |
| Similarly, in C#, you would change |
| |
| Console.WriteLine("Replace "+object1+" with "+object2+"?"); |
| |
| into a statement involving a format string: |
| |
| Console.WriteLine( |
| String.Format("Replace {0} with {1}?", object1, object2)); |
| |
| Unusual markup or control characters should not be used in |
| translatable strings. Translators will likely not understand the |
| particular meaning of the markup or control characters. |
| |
| For example, if you have a convention that `|' delimits the |
| left-hand and right-hand part of some GUI elements, translators will |
| often not understand it without specific comments. It might be better |
| to have the translator translate the left-hand and right-hand part |
| separately. |
| |
| Another example is the `argp' convention to use a single `\v' |
| (vertical tab) control character to delimit two sections inside a |
| string. This is flawed. Some translators may convert it to a simple |
| newline, some to blank lines. With some PO file editors it may not be |
| easy to even enter a vertical tab control character. So, you cannot be |
| sure that the translation will contain a `\v' character, at the |
| corresponding position. The solution is, again, to let the translator |
| translate two separate strings and combine at run-time the two |
| translated strings with the `\v' required by the convention. |
| |
| HTML markup, however, is common enough that it's probably ok to use |
| in translatable strings. But please bear in mind that the GNU gettext |
| tools don't verify that the translations are well-formed HTML. |
| |
| |
| File: gettext.info, Node: Mark Keywords, Next: Marking, Prev: Preparing Strings, Up: Sources |
| |
| 4.4 How Marks Appear in Sources |
| =============================== |
| |
| All strings requiring translation should be marked in the C sources. |
| Marking is done in such a way that each translatable string appears to |
| be the sole argument of some function or preprocessor macro. There are |
| only a few such possible functions or macros meant for translation, and |
| their names are said to be marking keywords. The marking is attached |
| to strings themselves, rather than to what we do with them. This |
| approach has more uses. A blatant example is an error message produced |
| by formatting. The format string needs translation, as well as some |
| strings inserted through some `%s' specification in the format, while |
| the result from `sprintf' may have so many different instances that it |
| is impractical to list them all in some `error_string_out()' routine, |
| say. |
| |
| This marking operation has two goals. The first goal of marking is |
| for triggering the retrieval of the translation, at run time. The |
| keyword is possibly resolved into a routine able to dynamically return |
| the proper translation, as far as possible or wanted, for the argument |
| string. Most localizable strings are found in executable positions, |
| that is, attached to variables or given as parameters to functions. |
| But this is not universal usage, and some translatable strings appear |
| in structured initializations. *Note Special cases::. |
| |
| The second goal of the marking operation is to help `xgettext' |
| properly extract all translatable strings when it scans a set of |
| program sources and produces PO file templates. |
| |
| The canonical keyword for marking translatable strings is `gettext'; |
| it gave its name to the whole GNU `gettext' package. For packages |
| making only light use of the `gettext' keyword, macro or function, it |
| is easily used _as is_. However, for packages using the `gettext' |
| interface more heavily, it is usually more convenient to give the main |
| keyword a shorter, less obtrusive name. Indeed, the keyword might |
| appear on a lot of strings all over the package, and programmers |
| usually do not want nor need their program sources to remind them |
| forcefully, all the time, that they are internationalized. Further, a |
| long keyword has the disadvantage of using more horizontal space, |
| forcing more indentation work on sources for those trying to keep them |
| within 79 or 80 columns. |
| |
| Many packages use `_' (a simple underline) as a keyword, and write |
| `_("Translatable string")' instead of `gettext ("Translatable |
| string")'. Further, the coding rule from the GNU standards that |
| requires a space between the keyword and the opening parenthesis is |
| relaxed, in practice, for this particular usage. So, the textual |
| overhead per translatable string is reduced to only three characters: |
| the underline and the two parentheses. However, even if GNU `gettext' |
| uses this convention internally, it does not offer it officially. The |
| real, genuine keyword is truly `gettext' indeed. It is fairly easy for |
| those wanting to use `_' instead of `gettext' to declare: |
| |
| #include <libintl.h> |
| #define _(String) gettext (String) |
| |
| instead of merely using `#include <libintl.h>'. |
| |
| The marking keywords `gettext' and `_' take the translatable string |
| as sole argument. It is also possible to define marking functions that |
| take it at another argument position. It is even possible to make the |
| marked argument position depend on the total number of arguments of the |
| function call; this is useful in C++. All this is achieved using |
| `xgettext''s `--keyword' option. How to pass such an option to |
| `xgettext', assuming that `gettextize' is used, is described in *note |
| po/Makevars:: and *note AM_XGETTEXT_OPTION::. |
| |
| Note also that long strings can be split across lines, into multiple |
| adjacent string tokens. Automatic string concatenation is performed at |
| compile time according to ISO C and ISO C++; `xgettext' also supports |
| this syntax. |
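| |
| As an illustration, the charset message from the previous section |
| could be written with adjacent string tokens, as in this sketch. The |
| helper name `format_charset_warning' is hypothetical; without an |
| installed catalog, `gettext' returns the English string unchanged. |
| |
```c
#include <assert.h>
#include <libintl.h>
#include <stdio.h>

/* Format the warning into BUF; the message is one translatable
   string even though it spans two source lines.  */
static int
format_charset_warning (char *buf, size_t size,
                        const char *lcharset, const char *fcharset)
{
  assert (buf != NULL && lcharset != NULL && fcharset != NULL);
  return snprintf (buf, size,
                   gettext ("Locale charset \"%s\" is different from\n"
                            "input file charset \"%s\".\n"),
                   lcharset, fcharset);
}
```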
| |
| Later on, the maintenance is relatively easy. If, as a programmer, |
| you add or modify a string, you will have to ask yourself if the new or |
| altered string requires translation, and include it within `_()' if you |
| think it should be translated. For example, `"%s"' is an example of a |
| string _not_ requiring translation. But `"%s: %d"' _does_ require |
| translation, because in French, unlike in English, it's customary to |
| put a space before a colon. |
| |
| |
| File: gettext.info, Node: Marking, Next: c-format Flag, Prev: Mark Keywords, Up: Sources |
| |
| 4.5 Marking Translatable Strings |
| ================================ |
| |
| In PO mode, one set of features is meant more for the programmer than |
| for the translator, and allows him to interactively mark which strings, |
| in a set of program sources, are translatable, and which are not. Even |
| if it is a fairly easy job for a programmer to find and mark such |
| strings by other means, using any editor of his choice, PO mode makes |
| this work more comfortable. Further, this gives translators who feel a |
| little like programmers, or programmers who feel a little like |
| translators, a tool letting them work at marking translatable strings |
| in the program sources, while simultaneously producing a set of |
| translation in some language, for the package being internationalized. |
| |
| The set of program sources targeted by the PO mode commands described |
| here should have an Emacs tags table constructed for your project, |
| prior to using these PO file commands. This is easy to do. In any |
| shell window, change the directory to the root of your project, then |
| execute a command resembling: |
| |
| etags src/*.[hc] lib/*.[hc] |
| |
| presuming here you want to process all `.h' and `.c' files from the |
| `src/' and `lib/' directories. This command will explore all said |
| files and create a `TAGS' file in your root directory, somewhat |
| summarizing the contents using a special file format Emacs can |
| understand. |
| |
| For packages following the GNU coding standards, there is a make |
| goal `tags' or `TAGS' which constructs the tag files in all directories |
| and for all files containing source code. |
| |
| Once your `TAGS' file is ready, the following commands assist the |
| programmer at marking translatable strings in his set of sources. But |
| these commands are necessarily driven from within a PO file window, and |
| it is likely that you do not even have such a PO file yet. This is not |
| a problem at all, as you may safely open a new, empty PO file, mainly |
| for using these commands. This empty PO file will slowly fill in while |
| you mark strings as translatable in your program sources. |
| |
| `,' |
| Search through program sources for a string which looks like a |
| candidate for translation (`po-tags-search'). |
| |
| `M-,' |
| Mark the last string found with `_()' (`po-mark-translatable'). |
| |
| `M-.' |
| Mark the last string found with a keyword taken from a set of |
| possible keywords. This command with a prefix allows some |
| management of these keywords (`po-select-mark-and-mark'). |
| |
| |
| The `,' (`po-tags-search') command searches for the next occurrence |
| of a string which looks like a possible candidate for translation, and |
| displays the program source in another Emacs window, positioned in such |
| a way that the string is near the top of this other window. If the |
| string is too big to fit whole in this window, it is positioned so only |
| its end is shown. In any case, the cursor is left in the PO file |
| window. If the shown string would be better presented differently in |
| different native languages, you may mark it using `M-,' or `M-.'. |
| Otherwise, you might rather ignore it and skip to the next string by |
| merely repeating the `,' command. |
| |
| A string is a good candidate for translation if it contains a |
| sequence of three or more letters. A string containing at most two |
| letters in a row will be considered as a candidate if it has more |
| letters than non-letters. The command disregards strings containing no |
| letters, or isolated letters only. It also disregards strings within |
| comments, or strings already marked with some keyword PO mode knows |
| (see below). |
| |
| If you have never told Emacs about some `TAGS' file to use, the |
| command will request that you specify one from the minibuffer, the |
| first time you use the command. You may later change your `TAGS' file |
| by using the regular Emacs command `M-x visit-tags-table', which will |
| ask you to name the precise `TAGS' file you want to use. *Note Tag |
| Tables: (emacs)Tags. |
| |
| Each time you use the `,' command, the search resumes from where it |
| was left by the previous search, and goes through all program sources, |
| obeying the `TAGS' file, until all sources have been processed. |
| However, by giving a prefix argument to the command (`C-u ,'), you may |
| request that the search be restarted all over again from the first |
| program source; but in this case, strings that you recently marked as |
| translatable will be automatically skipped. |
| |
| Using this `,' command does not prevent the use of other regular Emacs |
| tags commands. For example, regular `tags-search' or |
| `tags-query-replace' commands may be used without disrupting the |
| independent `,' search sequence. However, as implemented, the |
| _initial_ `,' command (or the `,' command used with a prefix) might |
| also reinitialize the regular Emacs tags searching to the first tags |
| file; this reinitialization might be considered spurious. |
| |
| The `M-,' (`po-mark-translatable') command will mark the recently |
| found string with the `_' keyword. The `M-.' |
| (`po-select-mark-and-mark') command will request that you type one |
| keyword from the minibuffer and use that keyword for marking the |
| string. Both commands will automatically create a new PO file |
| untranslated entry for the string being marked, and make it the current |
| entry (making it easy for you to immediately proceed to its |
| translation, if you feel like doing it right away). It is possible |
| that the modifications made to the program source by `M-,' or `M-.' |
| render some source line longer than 80 columns, forcing you to break |
| and re-indent this line differently. You may use the `O' command from |
| PO mode, or any other window changing command from Emacs, to break out |
| into the program source window, and do any needed adjustments. You |
| will have to use some regular Emacs command to return the cursor to the |
| PO file window, if you want command `,' for the next string, say. |
| |
| The `M-.' command has a few built-in speedups, so you do not have to |
| explicitly type all keywords all the time. The first such speedup is |
| that you are presented with a _preferred_ keyword, which you may accept |
| by merely typing `<RET>' at the prompt. The second speedup is that you |
| may type any non-ambiguous prefix of the keyword you really mean, and |
| the command will complete it automatically for you. This also means |
| that PO mode has to _know_ all your possible keywords, and that it will |
| not accept mistyped keywords. |
| |
| If you reply `?' to the keyword request, the command gives a list of |
| all known keywords, from which you may choose. When the command is |
| prefixed by an argument (`C-u M-.'), it inhibits updating any program |
| source or PO file buffer, and does some simple keyword management |
| instead. In this case, the command asks for a keyword, written in |
| full, which becomes a new allowed keyword for later `M-.' commands. |
| Moreover, this new keyword automatically becomes the _preferred_ |
| keyword for later commands. By typing an already known keyword in |
| response to `C-u M-.', one merely changes the _preferred_ keyword and |
| does nothing more. |
| |
| All keywords known for `M-.' are recognized by the `,' command when |
| scanning for strings, and strings already marked by any of those known |
| keywords are automatically skipped. If many PO files are opened |
| simultaneously, each one has its own independent set of known keywords. |
| There is no provision in PO mode, currently, for deleting a known |
| keyword; you have to quit the file (maybe using `q') and reopen it |
| afresh. When a PO file is newly brought up in an Emacs window, only |
| `gettext' and `_' are known as keywords, and `gettext' is preferred for |
| the `M-.' command. In fact, it is not useful to prefer `_', as this |
| one is already built into the `M-,' command. |
| |
| |
| File: gettext.info, Node: c-format Flag, Next: Special cases, Prev: Marking, Up: Sources |
| |
| 4.6 Special Comments preceding Keywords |
| ======================================= |
| |
| In C programs strings are often used within calls of functions from |
| the `printf' family. The special thing about these format strings is |
| that they can contain format specifiers introduced with `%'. Assume we |
| have the code |
| |
| printf (gettext ("String `%s' has %d characters\n"), s, strlen (s)); |
| |
| A possible German translation for the above string might be: |
| |
| "%d Zeichen lang ist die Zeichenkette `%s'" |
| |
| A C programmer, even one who cannot speak German, will recognize that |
| there is something wrong here. The order of the two format specifiers |
| has changed, but of course the order of the arguments in the `printf' |
| call has not. This will most probably lead to problems because now the |
| length of the string is regarded as the address of a string. |
| |
| To prevent errors at runtime caused by translations, the `msgfmt' |
| tool can check statically whether the arguments in the original and the |
| translation string match in type and number. If this is not the case |
| and the `-c' option has been passed to `msgfmt', `msgfmt' will give an |
| error and refuse to produce a MO file. Thus consistent use of `msgfmt |
| -c' will catch the error, so that it cannot cause problems at |
| runtime. |
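
The kind of check described above can be sketched in C. This is only an illustration under simplifying assumptions, not the algorithm `msgfmt' actually uses: the function names and the `MAX_ARGS' limit are hypothetical, length modifiers such as `l' are ignored, and `*' widths are not handled. It also accepts POSIX numbered directives such as `%1$s'.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ARGS 16   /* hypothetical limit for this sketch */

/* Record the conversion character expected for each argument of FORMAT.
   Returns the number of arguments, or -1 if the format is malformed or
   refers to an argument outside 1..MAX_ARGS.  */
static int
collect_directives (const char *format, char conv[MAX_ARGS])
{
  int count = 0;
  int next = 0;

  memset (conv, 0, MAX_ARGS);
  for (const char *p = format; *p != '\0'; p++)
    {
      if (*p != '%')
        continue;
      p++;
      if (*p == '%')
        continue;               /* "%%" is a literal percent sign */

      /* Optional POSIX argument number, as in %2$d.  */
      int pos = next;
      int n = 0;
      const char *q = p;
      while (isdigit ((unsigned char) *q) && n < 1000)
        n = n * 10 + (*q++ - '0');
      if (q > p && *q == '$')
        {
          pos = n - 1;
          p = q + 1;
        }

      /* Skip flags, width, precision and length modifiers.  */
      while (*p != '\0' && strchr ("-+ #0123456789.hlLqjzt", *p) != NULL)
        p++;
      if (*p == '\0' || pos < 0 || pos >= MAX_ARGS)
        return -1;

      conv[pos] = *p;
      next = pos + 1;
      if (next > count)
        count = next;
    }
  return count;
}

/* True if MSGID and MSGSTR consume the same number of arguments and
   expect the same conversion for each argument position.  */
bool
formats_compatible (const char *msgid, const char *msgstr)
{
  char a[MAX_ARGS], b[MAX_ARGS];
  int na = collect_directives (msgid, a);
  int nb = collect_directives (msgstr, b);

  return na >= 0 && na == nb && memcmp (a, b, MAX_ARGS) == 0;
}
```

With this sketch, a translation that merely swaps `%s' and `%d' is rejected, because argument 1 would be read as a number and argument 2 as a string.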
| |
| To keep the word order of the above German translation, one has to |
| write |
| |
| "%2$d Zeichen lang ist die Zeichenkette `%1$s'" |
| |
| The routines in `msgfmt' know about this special notation. |
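
A small C sketch of how the numbered directives behave at runtime may help. It assumes a C library that implements the POSIX `%N$' notation (the GNU C library does); the function name `describe_german' is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Format the German sentence for S into BUF.  Returns what snprintf
   returns.  Assumes POSIX numbered-argument support in the C library.  */
int
describe_german (char *buf, size_t size, const char *s)
{
  /* The arguments stay in the order of the original call (string first,
     then length); the numbers in the directives select which argument
     each directive consumes.  */
  return snprintf (buf, size,
                   "%2$d Zeichen lang ist die Zeichenkette `%1$s'\n",
                   s, (int) strlen (s));
}
```

The call site is unchanged compared to the English original; only the format string differs, which is what allows a translator to reorder the sentence.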
| |
| Because not all strings in a program must be format strings it is not |
| useful for `msgfmt' to test all the strings in the `.po' file. This |
| might cause problems because the string might contain what looks like a |
| format specifier, but the string is not used in `printf'. |
| |
| Therefore `xgettext' adds a special tag to those messages it |
| thinks might be a format string. There is no absolute rule for this, |
| only a heuristic. In the `.po' file the entry is marked using the |
| `c-format' flag in the `#,' comment line (*note PO Files::). |
| |
| The careful reader now might say that this again can cause problems. |
| The heuristic might guess it wrong. This is true and therefore |
| `xgettext' knows about a special kind of comment which lets the |
| programmer take over the decision. If in the same line as or the |
| immediately preceding line to the `gettext' keyword the `xgettext' |
| program finds a comment containing the words `xgettext:c-format', it |
| will mark the string in any case with the `c-format' flag. This kind |
| of comment should be used when `xgettext' does not recognize the string |
| as a format string but it really is one and it should be tested. |
| Please note that when the comment is in the same line as the `gettext' |
| keyword, it must be before the string to be translated. |
| |
| This situation happens quite often. The `printf' function is often |
| called with strings which do not contain a format specifier. Of course |
| one would normally use `fputs' but it does happen. In this case |
| `xgettext' does not recognize this as a format string but what happens |
| if the translation introduces a valid format specifier? The `printf' |
| function will try to access one of the parameters but none exists |
| because the original code does not pass any parameters. |
| |
| `xgettext' of course could make a wrong decision the other way |
| round, i.e. a string marked as a format string actually is not a format |
| string. In this case `msgfmt' might give too many warnings and |
| would refuse to compile the `.po' file. The method to prevent this |
| wrong decision is similar to the one used above, only the comment to |
| use must contain the string `xgettext:no-c-format'. |
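
The placement of both comments can be sketched as follows. The function names and message texts are hypothetical, and `gettext' is replaced by a pass-through macro only so that the fragment builds without libintl; `xgettext' works on the source text, so it would see the keyword and the comments either way.

```c
#include <stdio.h>

/* Stand-in for this sketch only.  Real code would include <libintl.h>
   and call the gettext function.  */
#define gettext(Msgid) (Msgid)

int
format_banner (char *buf, size_t size)
{
  /* The string has no directive, but it is used as a format string, so
     the check is requested explicitly on the line before the keyword.  */
  /* xgettext:c-format */
  return snprintf (buf, size, gettext ("Welcome to the installer\n"));
}

const char *
placeholder_hint (void)
{
  /* The string looks like a format string but is never passed to a
     printf-like function, so the check is switched off.  */
  /* xgettext:no-c-format */
  return gettext ("Type %d where a number belongs");
}
```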
| |
| If a string is marked with `c-format' and this is not correct the |
| user can find out who is responsible for the decision. See *note |
| xgettext Invocation:: to see how the `--debug' option can be used for |
| solving this problem. |
| |
| |
| File: gettext.info, Node: Special cases, Next: Bug Report Address, Prev: c-format Flag, Up: Sources |
| |
| 4.7 Special Cases of Translatable Strings |
| ========================================= |
| |
| The attentive reader might now point out that it is not always |
| possible to mark translatable strings with `gettext' or something like |
| it. Consider the following case: |
| |
| { |
| static const char *messages[] = { |
| "some very meaningful message", |
| "and another one" |
| }; |
| const char *string; |
| ... |
| string |
| = index > 1 ? "a default message" : messages[index]; |
| |
| fputs (string, stdout); |
| ... |
| } |
| |
| While it is no problem to mark the string `"a default message"' it |
| is not possible to mark the string initializers for `messages'. What |
| is to be done? We have to fulfill two tasks. First we have to mark the |
| strings so that the `xgettext' program (*note xgettext Invocation::) |
| can find them, and second we have to translate the strings at runtime |
| before printing them. |
| |
| The first task can be fulfilled by creating a new keyword, which |
| names a no-op. For the second we have to mark all access points to a |
| string from the array. So one solution can look like this: |
| |
| #define gettext_noop(String) String |
| |
| { |
| static const char *messages[] = { |
| gettext_noop ("some very meaningful message"), |
| gettext_noop ("and another one") |
| }; |
| const char *string; |
| ... |
| string |
| = index > 1 ? gettext ("a default message") : gettext (messages[index]); |
| |
| fputs (string, stdout); |
| ... |
| } |
| |
| Please convince yourself that the string which is written by `fputs' |
| is translated in any case. How to make `xgettext' aware of the |
| additional keyword `gettext_noop' is explained in *note xgettext |
| Invocation::. |
| |
| The above is of course not the only solution. You could also come |
| up with the following one: |
| |
| #define gettext_noop(String) String |
| |
| { |
| static const char *messages[] = { |
| gettext_noop ("some very meaningful message"), |
| gettext_noop ("and another one") |
| }; |
| const char *string; |
| ... |
| string |
| = index > 1 ? gettext_noop ("a default message") : messages[index]; |
| |
| fputs (gettext (string), stdout); |
| ... |
| } |
| |
| But this has a drawback. The programmer has to take care to use |
| `gettext_noop' for the string `"a default message"'. Using `gettext' |
| there would translate the string twice, which could in rare cases have |
| unpredictable results. |
| |
| One advantage is that you need not do control flow analysis to make |
| sure the output is really translated in any case. Such analysis is |
| generally not very difficult, but wherever it is, you can use this |
| second method. |
| |
| |
| File: gettext.info, Node: Bug Report Address, Next: Names, Prev: Special cases, Up: Sources |
| |
| 4.8 Letting Users Report Translation Bugs |
| ========================================= |
| |
| Code sometimes has bugs, but translations sometimes have bugs too. |
| The users need to be able to report them. Reporting translation bugs |
| to the programmer or maintainer of a package is not very useful, since |
| the maintainer must never change a translation, except on behalf of the |
| translator. Hence the translation bugs must be reported to the |
| translators. |
| |
| Here is a way to organize this so that the maintainer does not need |
| to forward translation bug reports, nor even keep a list of the |
| addresses of the translators or their translation teams. |
| |
| Every program has a place where it shows the bug report address. For |
| GNU programs, it is the code which handles the "--help" option, |
| typically in a function called "usage". In this place, instruct the |
| translator to add her own bug reporting address. For example, if that |
| code has a statement |
| |
| printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT); |
| |
| you can add some translator instructions like this: |
| |
| /* TRANSLATORS: The placeholder indicates the bug-reporting address |
| for this package. Please add _another line_ saying |
| "Report translation bugs to <...>\n" with the address for translation |
| bugs (typically your translation team's web or email address). */ |
| printf (_("Report bugs to <%s>.\n"), PACKAGE_BUGREPORT); |
| |
| These will be extracted by `xgettext', leading to a .pot file that |
| contains this: |
| |
| #. TRANSLATORS: The placeholder indicates the bug-reporting address |
| #. for this package. Please add _another line_ saying |
| #. "Report translation bugs to <...>\n" with the address for translation |
| #. bugs (typically your translation team's web or email address). |
| #: src/hello.c:178 |
| #, c-format |
| msgid "Report bugs to <%s>.\n" |
| msgstr "" |
| |
| |
| File: gettext.info, Node: Names, Next: Libraries, Prev: Bug Report Address, Up: Sources |
| |
| 4.9 Marking Proper Names for Translation |
| ======================================== |
| |
| Should names of persons, cities, locations etc. be marked for |
| translation or not? People who only know languages that can be written |
| with Latin letters (English, Spanish, French, German, etc.) are tempted |
| to say "no", because names usually do not change when transported |
| between these languages. However, in general when translating from one |
| script to another, names are translated too, usually phonetically or by |
| transliteration. For example, Russian or Greek names are converted to |
| the Latin alphabet when being translated to English, and English or |
| French names are converted to the Katakana script when being translated |
| to Japanese. This is necessary because the speakers of the target |
| language in general cannot read the script the name is originally |
| written in. |
| |
| As a programmer, you should therefore make sure that names are marked |
| for translation, with a special comment telling the translators that it |
| is a proper name and how to pronounce it. In its simple form, it looks |
| like this: |
| |
| printf (_("Written by %s.\n"), |
| /* TRANSLATORS: This is a proper name. See the gettext |
| manual, section Names. Note this is actually a non-ASCII |
| name: The first name is (with Unicode escapes) |
| "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois". |
| Pronunciation is like "fraa-swa pee-nar". */ |
| _("Francois Pinard")); |
| |
| The GNU gnulib library offers a module `propername' |
| (`http://www.gnu.org/software/gnulib/MODULES.html#module=propername') |
| which takes care to automatically append the original name, in |
| parentheses, to the translated name. For names that cannot be written |
| in ASCII, it also frees the translator from the task of entering the |
| appropriate non-ASCII characters if no script change is needed. In |
| this more comfortable form, it looks like this: |
| |
| printf (_("Written by %s and %s.\n"), |
| proper_name ("Ulrich Drepper"), |
| /* TRANSLATORS: This is a proper name. See the gettext |
| manual, section Names. Note this is actually a non-ASCII |
| name: The first name is (with Unicode escapes) |
| "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois". |
| Pronunciation is like "fraa-swa pee-nar". */ |
| proper_name_utf8 ("Francois Pinard", "Fran\303\247ois Pinard")); |
| |
| You can also write the original name directly in Unicode (rather than |
| with Unicode escapes or HTML entities) and denote the pronunciation |
| using the International Phonetic Alphabet (see |
| `http://www.wikipedia.org/wiki/International_Phonetic_Alphabet'). |
| |
| As a translator, you should use some care when translating names, |
| because it is frustrating if people see their names mutilated or |
| distorted. |
| |
| If your language uses the Latin script, all you need to do is to |
| reproduce the name as perfectly as you can within the usual character |
| set of your language. In this particular case, this means to provide a |
| translation containing the c-cedilla character. If your language uses |
| a different script and the people speaking it don't usually read Latin |
| words, it means transliteration. If the programmer used the simple |
| case, you should still give, in parentheses, the original writing of |
| the name - for the sake of the people that do read the Latin script. |
| If the programmer used the `propername' module mentioned above, you |
| don't need to give the original writing of the name in parentheses, |
| because the program will already do so. Here is an example, using |
| Greek as the target script: |
| |
| #. This is a proper name. See the gettext |
| #. manual, section Names. Note this is actually a non-ASCII |
| #. name: The first name is (with Unicode escapes) |
| #. "Fran\u00e7ois" or (with HTML entities) "Fran&ccedil;ois". |
| #. Pronunciation is like "fraa-swa pee-nar". |
| msgid "Francois Pinard" |
| msgstr "\phi\rho\alpha\sigma\omicron\alpha \pi\iota\nu\alpha\rho" |
| " (Francois Pinard)" |
| |
| Because translation of names is such a sensitive domain, it is a good |
| idea to test your translation before submitting it. |
| |
| |
| File: gettext.info, Node: Libraries, Prev: Names, Up: Sources |
| |
| 4.10 Preparing Library Sources |
| ============================== |
| |
| When you are preparing a library, not a program, for the use of |
| `gettext', only a few details are different. Here we assume that the |
| library has a translation domain and a POT file of its own. (If it |
| uses the translation domain and POT file of the main program, then the |
| previous sections apply without changes.) |
| |
| 1. The library code doesn't call `setlocale (LC_ALL, "")'. It's the |
| responsibility of the main program to set the locale. The |
| library's documentation should mention this fact, so that |
| developers of programs using the library are aware of it. |
| |
| 2. The library code doesn't call `textdomain (PACKAGE)', because it |
| would interfere with the text domain set by the main program. |
| |
| 3. The initialization code for a program was |
| |
| setlocale (LC_ALL, ""); |
| bindtextdomain (PACKAGE, LOCALEDIR); |
| textdomain (PACKAGE); |
| |
| For a library it is reduced to |
| |
| bindtextdomain (PACKAGE, LOCALEDIR); |
| |
| If your library's API doesn't already have an initialization |
| function, you need to create one, containing at least the |
| `bindtextdomain' invocation. However, you usually don't need to |
| export and document this initialization function: It is sufficient |
| that all entry points of the library call the initialization |
| function if it hasn't been called before. The typical idiom used |
| to achieve this is a static boolean variable that indicates |
| whether the initialization function has been called. Like this: |
| |
| static bool libfoo_initialized; |
| |
| static void |
| libfoo_initialize (void) |
| { |
| bindtextdomain (PACKAGE, LOCALEDIR); |
| libfoo_initialized = true; |
| } |
| |
| /* This function is part of the exported API. */ |
| struct foo * |
| create_foo (...) |
| { |
| /* Must ensure the initialization is performed. */ |
| if (!libfoo_initialized) |
| libfoo_initialize (); |
| ... |
| } |
| |
| /* This function is part of the exported API. The argument must be |
| non-NULL and have been created through create_foo(). */ |
| int |
| foo_refcount (struct foo *argument) |
| { |
| /* No need to invoke the initialization function here, because |
| create_foo() must already have been called before. */ |
| ... |
| } |
| |
| 4. The usual declaration of the `_' macro in each source file was |
| |
| #include <libintl.h> |
| #define _(String) gettext (String) |
| |
| for a program. For a library, which has its own translation |
| domain, it reads like this: |
| |
| #include <libintl.h> |
| #define _(String) dgettext (PACKAGE, String) |
| |
| In other words, `dgettext' is used instead of `gettext'. |
| Similarly, the `dngettext' function should be used in place of the |
| `ngettext' function. |
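
As a hedged illustration of the last point, here is a sketch of plural handling inside a library. The domain name `libfoo', the macro name `P_' and the function name are hypothetical, and the fragment assumes a system whose `<libintl.h>' provides `dngettext' (as the GNU C library does). To have `xgettext' extract the strings, the `P_' macro would also have to be declared as a keyword, for instance with `--keyword=P_:1,2'.

```c
#include <libintl.h>
#include <stdio.h>

#define PACKAGE "libfoo"   /* hypothetical translation domain */

/* Plural-aware marker bound to the library's own domain, so that the
   text domain chosen by the main program is not involved.  */
#define P_(Singular, Plural, N) dngettext (PACKAGE, Singular, Plural, N)

/* Write a count message for N items into BUF.  */
int
libfoo_describe_count (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size, P_("%lu item", "%lu items", n), n);
}
```

When no message catalog for the domain is found, `dngettext' falls back to the English strings, choosing the singular form only for N equal to 1.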
| |
| |
| File: gettext.info, Node: Template, Next: Creating, Prev: Sources, Up: Top |
| |
| 5 Making the PO Template File |
| ***************************** |
| |
| After preparing the sources, the programmer creates a PO template |
| file. This section explains how to use `xgettext' for this purpose. |
| |
| `xgettext' creates a file named `DOMAINNAME.po'. You should then |
| rename it to `DOMAINNAME.pot'. (Why doesn't `xgettext' create it under |
| the name `DOMAINNAME.pot' right away? The answer is: for historical |
| reasons. When `xgettext' was specified, the distinction between a PO |
| file and PO file template was fuzzy, and the suffix `.pot' wasn't in |
| use at that time.) |
| |
| * Menu: |
| |
| * xgettext Invocation:: Invoking the `xgettext' Program |
| |
| |
| File: gettext.info, Node: xgettext Invocation, Prev: Template, Up: Template |
| |
| 5.1 Invoking the `xgettext' Program |
| =================================== |
| |
| xgettext [OPTION] [INPUTFILE] ... |
| |
| The `xgettext' program extracts translatable strings from given |
| input files. |
| |
| 5.1.1 Input file location |
| ------------------------- |
| |
| `INPUTFILE ...' |
| Input files. |
| |
| `-f FILE' |
| `--files-from=FILE' |
| Read the names of the input files from FILE instead of getting |
| them from the command line. |
| |
| `-D DIRECTORY' |
| `--directory=DIRECTORY' |
| Add DIRECTORY to the list of directories. Source files are |
| searched relative to this list of directories. The resulting `.po' |
| file will be written relative to the current directory, though. |
| |
| |
| If INPUTFILE is `-', standard input is read. |
| |
| 5.1.2 Output file location |
| -------------------------- |
| |
| `-d NAME' |
| `--default-domain=NAME' |
| Use `NAME.po' for output (instead of `messages.po'). |
| |
| `-o FILE' |
| `--output=FILE' |
| Write output to specified file (instead of `NAME.po' or |
| `messages.po'). |
| |
| `-p DIR' |
| `--output-dir=DIR' |
| Output files will be placed in directory DIR. |
| |
| |
| If the output FILE is `-' or `/dev/stdout', the output is written to |
| standard output. |
| |
| 5.1.3 Choice of input file language |
| ----------------------------------- |
| |
| `-L NAME' |
| `--language=NAME' |
| Specifies the language of the input files. The supported languages |
| are `C', `C++', `ObjectiveC', `PO', `Python', `Lisp', `EmacsLisp', |
| `librep', `Scheme', `Smalltalk', `Java', `JavaProperties', `C#', |
| `awk', `YCP', `Tcl', `Perl', `PHP', `GCC-source', `NXStringTable', |
| `RST', `Glade'. |
| |
| `-C' |
| `--c++' |
| This is a shorthand for `--language=C++'. |
| |
| |
| By default the language is guessed depending on the input file name |
| extension. |
| |
| 5.1.4 Input file interpretation |
| ------------------------------- |
| |
| `--from-code=NAME' |
| Specifies the encoding of the input files. This option is needed |
| only if some untranslated message strings or their corresponding |
| comments contain non-ASCII characters. Note that Tcl and Glade |
| input files are always assumed to be in UTF-8, regardless of this |
| option. |
| |
| |
| By default the input files are assumed to be in ASCII. |
| |
| 5.1.5 Operation mode |
| -------------------- |
| |
| `-j' |
| `--join-existing' |
| Join messages with existing file. |
| |
| `-x FILE' |
| `--exclude-file=FILE' |
| Entries from FILE are not extracted. FILE should be a PO or POT |
| file. |
| |
| `-c[TAG]' |
| `--add-comments[=TAG]' |
| Place comment blocks starting with TAG and preceding keyword lines |
| in the output file. Without a TAG, the option means to put _all_ |
| comment blocks preceding keyword lines in the output file. |
| |
| |
| 5.1.6 Language specific options |
| ------------------------------- |
| |
| `-a' |
| `--extract-all' |
| Extract all strings. |
| |
| This option has an effect with most languages, namely C, C++, |
| ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, |
| Tcl, Perl, PHP, GCC-source, Glade. |
| |
| `-k[KEYWORDSPEC]' |
| `--keyword[=KEYWORDSPEC]' |
| Specify KEYWORDSPEC as an additional keyword to be looked for. |
| Without a KEYWORDSPEC, the option means to not use default |
| keywords. |
| |
| If KEYWORDSPEC is a C identifier ID, `xgettext' looks for strings |
| in the first argument of each call to the function or macro ID. |
| If KEYWORDSPEC is of the form `ID:ARGNUM', `xgettext' looks for |
| strings in the ARGNUMth argument of the call. If KEYWORDSPEC is |
| of the form `ID:ARGNUM1,ARGNUM2', `xgettext' looks for strings in |
| the ARGNUM1st argument and in the ARGNUM2nd argument of the call, |
| and treats them as singular/plural variants for a message with |
| plural handling. Also, if KEYWORDSPEC is of the form |
| `ID:CONTEXTARGNUMc,ARGNUM' or `ID:ARGNUM,CONTEXTARGNUMc', |
| `xgettext' treats strings in the CONTEXTARGNUMth argument as a |
| context specifier. And, as a special-purpose support for GNOME, |
| if KEYWORDSPEC is of the form `ID:ARGNUMg', `xgettext' recognizes |
| the ARGNUMth argument as a string with context, using the GNOME |
| `glib' syntax `"msgctxt|msgid"'. |
| Furthermore, if KEYWORDSPEC is of the form `ID:...,TOTALNUMARGSt', |
| `xgettext' recognizes this argument specification only if the |
| number of actual arguments is equal to TOTALNUMARGS. This is |
| useful for disambiguating overloaded function calls in C++. |
| Finally, if KEYWORDSPEC is of the form `ID:ARGNUM...,"XCOMMENT"', |
| `xgettext', when extracting a message from the specified argument |
| strings, adds an extracted comment XCOMMENT to the message. Note |
| that when used through a normal shell command line, the |
| double-quotes around the XCOMMENT need to be escaped. |
| |
| This option has an effect with most languages, namely C, C++, |
| ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Java, C#, awk, |
| Tcl, Perl, PHP, GCC-source, Glade. |
| |
| The default keyword specifications, which are always looked for if |
| not explicitly disabled, are language dependent. They are: |
| |
| * For C, C++, and GCC-source: `gettext', `dgettext:2', |
| `dcgettext:2', `ngettext:1,2', `dngettext:2,3', |
| `dcngettext:2,3', `gettext_noop', and `pgettext:1c,2', |
| `dpgettext:2c,3', `dcpgettext:2c,3', `npgettext:1c,2,3', |
| `dnpgettext:2c,3,4', `dcnpgettext:2c,3,4'. |
| |
| * For Objective C: Like for C, and also `NSLocalizedString', |
| `_', `NSLocalizedStaticString', `__'. |
| |
| * For Shell scripts: `gettext', `ngettext:1,2', `eval_gettext', |
| `eval_ngettext:1,2'. |
| |
| * For Python: `gettext', `ugettext', `dgettext:2', |
| `ngettext:1,2', `ungettext:1,2', `dngettext:2,3', `_'. |
| |
| * For Lisp: `gettext', `ngettext:1,2', `gettext-noop'. |
| |
| * For EmacsLisp: `_'. |
| |
| * For librep: `_'. |
| |
| * For Scheme: `gettext', `ngettext:1,2', `gettext-noop'. |
| |
| * For Java: `GettextResource.gettext:2', |
| `GettextResource.ngettext:2,3', |
| `GettextResource.pgettext:2c,3', |
| `GettextResource.npgettext:2c,3,4', `gettext', `ngettext:1,2', |
| `pgettext:1c,2', `npgettext:1c,2,3', `getString'. |
| |
| * For C#: `GetString', `GetPluralString:1,2', |
| `GetParticularString:1c,2', |
| `GetParticularPluralString:1c,2,3'. |
| |
| * For awk: `dcgettext', `dcngettext:1,2'. |
| |
| * For Tcl: `::msgcat::mc'. |
| |
| * For Perl: `gettext', `%gettext', `$gettext', `dgettext:2', |
| `dcgettext:2', `ngettext:1,2', `dngettext:2,3', |
| `dcngettext:2,3', `gettext_noop'. |
| |
| * For PHP: `_', `gettext', `dgettext:2', `dcgettext:2', |
| `ngettext:1,2', `dngettext:2,3', `dcngettext:2,3'. |
| |
| * For Glade 1: `label', `title', `text', `format', `copyright', |
| `comments', `preview_text', `tooltip'. |
| |
| To disable the default keyword specifications, the option `-k' or |
| `--keyword' or `--keyword=', without a KEYWORDSPEC, can be used. |
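
The keyword specification forms described above can be illustrated with a few wrapper functions. All names here (`tr_at', `tr_count', `tr_in', `example.c') are hypothetical, and the wrappers simply return their msgid so that the sketch is self-contained; a real program would forward to `gettext', `ngettext' or `pgettext'.

```c
#include <stdio.h>

/* Translatable string is argument 2: keyword spec "tr_at:2".  */
const char *
tr_at (int severity, const char *msgid)
{
  (void) severity;
  return msgid;
}

/* Singular and plural are arguments 1 and 2: spec "tr_count:1,2".  */
const char *
tr_count (const char *singular, const char *plural, unsigned long n)
{
  return n == 1 ? singular : plural;
}

/* Argument 1 is the context, argument 2 the string: spec "tr_in:1c,2".  */
const char *
tr_in (const char *context, const char *msgid)
{
  (void) context;
  return msgid;
}

/* An invocation matching the wrappers above might look like this:
     xgettext --keyword=tr_at:2 --keyword=tr_count:1,2
              --keyword=tr_in:1c,2 example.c
   (written on one command line).  */
void
print_summary (unsigned long n)
{
  puts (tr_at (2, "Scan finished"));
  printf (tr_count ("%lu file\n", "%lu files\n", n), n);
  puts (tr_in ("menu", "Open"));
}
```

Since the specifications are given in addition to the defaults, calls to `gettext' and the other default keywords in the same file would still be extracted.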
| |
| `--flag=WORD:ARG:FLAG' |
| Specifies additional flags for strings occurring as part of the |
| ARGth argument of the function WORD. The possible flags are the |
| possible format string indicators, such as `c-format', and their |
| negations, such as `no-c-format', possibly prefixed with `pass-'. |
| The meaning of `--flag=FUNCTION:ARG:LANG-format' is that in |
| language LANG, the specified FUNCTION expects as ARGth argument a |
| format string. (For those of you familiar with GCC function |
| attributes, `--flag=FUNCTION:ARG:c-format' is roughly equivalent |
| to the declaration `__attribute__ ((__format__ (__printf__, ARG, |
| ...)))' attached to FUNCTION in a C source file.) For example, if |
| you use the `error' function from GNU libc, you can specify its |
| behaviour through `--flag=error:3:c-format'. The effect of this |
| specification is that `xgettext' will mark as format strings all |
| `gettext' invocations that occur as ARGth argument of FUNCTION. |
| This is useful when such strings contain no format string |
| directives: together with the checks done by `msgfmt -c' it will |
| ensure that translators cannot accidentally use format string |
| directives that would lead to a crash at runtime. |
| The meaning of `--flag=FUNCTION:ARG:pass-LANG-format' is that in |
| language LANG, if the FUNCTION call occurs in a position that must |
| yield a format string, then its ARGth argument must yield a format |
| string of the same type as well. (If you know GCC function |
| attributes, the `--flag=FUNCTION:ARG:pass-c-format' option is |
| roughly equivalent to the declaration `__attribute__ |
| ((__format_arg__ (ARG)))' attached to FUNCTION in a C source file.) |
| For example, if you use the `_' shortcut for the `gettext' |
| function, you should use `--flag=_:1:pass-c-format'. The effect |
| of this specification is that `xgettext' will propagate a format |
| string requirement for a `_("string")' call to its first argument, |
| the literal `"string"', and thus mark it as a format string. This |
| is useful when such strings contain no format string directives: |
| together with the checks done by `msgfmt -c' it will ensure that |
| translators cannot accidentally use format string directives that |
| would lead to a crash at runtime. |
| This option has an effect with most languages, namely C, C++, |
| ObjectiveC, Shell, Python, Lisp, EmacsLisp, librep, Scheme, Java, |
| C#, awk, YCP, Tcl, Perl, PHP, GCC-source. |
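
The following sketch shows the situation the `--flag' option addresses. The function `log_message' and the variable `last_message' are hypothetical, and `_' is a pass-through macro here only so that the fragment builds without libintl. Under the description above, the options `--flag=log_message:2:c-format' and `--flag=_:1:pass-c-format' would be the ones to pass to `xgettext' for this code.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for this sketch only; normally this expands to gettext.  */
#define _(String) (String)

char last_message[256];

/* Hypothetical printf-like reporter: argument 2 is a format string.  */
void
log_message (int level, const char *format, ...)
{
  va_list args;

  (void) level;
  va_start (args, format);
  vsnprintf (last_message, sizeof last_message, format, args);
  va_end (args);
}

void
report_done (void)
{
  /* The string contains no directive, so a heuristic alone might not
     treat it as a format string.  With the two flags, it is marked
     c-format, and `msgfmt -c' can reject a translation that adds one.  */
  log_message (1, _("Operation complete\n"));
}
```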
| |
| `-T' |
| `--trigraphs' |
| Understand ANSI C trigraphs for input. |
| This option has an effect only with the languages C, C++, |
| ObjectiveC. |
| |
| `--qt' |
| Recognize Qt format strings. |
| This option has an effect only with the language C++. |
| |
| `--kde' |
| Recognize KDE 4 format strings. |
| This option has an effect only with the language C++. |
| |
| `--boost' |
| Recognize Boost format strings. |
| This option has an effect only with the language C++. |
| |
| `--debug' |
| Use the flags `c-format' and `possible-c-format' to show who was |
| responsible for marking a message as a format string. The latter |
| form is used if the `xgettext' program decided; the former form is |
| used if the programmer prescribed it. |
| |
| By default only the `c-format' form is used. The translator should |
| not have to care about these details. |
| |
| |
| This implementation of `xgettext' is able to process a few awkward |
| cases, like strings in preprocessor macros, ANSI concatenation of |
| adjacent strings, and escaped end of lines for continued strings. |
| |
| 5.1.7 Output details |
| -------------------- |
| |
| `--color' |
| `--color=WHEN' |
| Specify whether or when to use colors and other text attributes. |
| See *note The --color option:: for details. |
| |
| `--style=STYLE_FILE' |
| Specify the CSS style rule file to use for `--color'. See *note |
| The --style option:: for details. |
| |
| `--force-po' |
| Always write an output file even if no message is defined. |
| |
| `-i' |
| `--indent' |
| Write the .po file using indented style. |
| |
| `--no-location' |
| Do not write `#: FILENAME:LINE' lines. Note that using this |
| option makes it harder for technically skilled translators to |
| understand each message's context. |
| |
| `-n' |
| `--add-location' |
| Generate `#: FILENAME:LINE' lines (default). |
| |
| `--strict' |
| Write out a strict Uniforum conforming PO file. Note that this |
| Uniforum format should be avoided because it doesn't support the |
| GNU extensions. |
| |
| `--properties-output' |
| Write out a Java ResourceBundle in Java `.properties' syntax. Note |
| that this file format doesn't support plural forms and silently |
| drops obsolete messages. |
| |
| `--stringtable-output' |
| Write out a NeXTstep/GNUstep localized resource file in `.strings' |
| syntax. Note that this file format doesn't support plural forms. |
| |
| `-w NUMBER' |
| `--width=NUMBER' |
| Set the output page width. Long strings in the output files will |
| be split across multiple lines in order to ensure that each line's |
| width (= number of screen columns) is less than or equal to the given |
| NUMBER. |
| |
| `--no-wrap' |
| Do not break long message lines. Message lines whose width |
| exceeds the output page width will not be split into several |
| lines. Only file reference lines which are wider than the output |
| page width will be split. |
| |
| `-s' |
| `--sort-output' |
| Generate sorted output. Note that using this option makes it much |
| harder for the translator to understand each message's context. |
| |
| `-F' |
| `--sort-by-file' |
| Sort output by file location. |
| |
| `--omit-header' |
| Don't write header with `msgid ""' entry. |
| |
| This is useful for testing purposes because it eliminates a source |
| of variance for generated `.gmo' files. With `--omit-header', two |
| invocations of `xgettext' on the same files with the same options |
| at different times are guaranteed to produce the same results. |
| |
| Note that using this option will lead to an error if the resulting |
| file would not entirely be in ASCII. |
| |
| `--copyright-holder=STRING' |
| Set the copyright holder in the output. STRING should be the |
| copyright holder of the surrounding package. (Note that the msgid |
| strings, extracted from the package's sources, belong to the |
| copyright holder of the package.) Translators are expected to |
| transfer or disclaim the copyright for their translations, so that |
| package maintainers can distribute them without legal risk. If |
| STRING is empty, the output files are marked as being in the |
| public domain; in this case, the translators are expected to |
| disclaim their copyright, again so that package maintainers can |
| distribute them without legal risk. |
| |
| The default value for STRING is the Free Software Foundation, Inc., |
| simply because `xgettext' was first used in the GNU project. |
| |
| `--foreign-user' |
| Omit FSF copyright in output. This option is equivalent to |
| `--copyright-holder='''. It can be useful for packages outside |
| the GNU project that want their translations to be in the public |
| domain. |
| |
| `--package-name=PACKAGE' |
| Set the package name in the header of the output. |
| |
| `--package-version=VERSION' |
| Set the package version in the header of the output. This option |
| has an effect only if the `--package-name' option is also used. |
| |
| `--msgid-bugs-address=EMAIL@ADDRESS' |
| Set the reporting address for msgid bugs. This is the email |
| address or URL to which the translators shall report bugs in the |
| untranslated strings: |
| |
| - Strings which are not entire sentences, see the maintainer |
| guidelines in *note Preparing Strings::. |
| |
| - Strings which use unclear terms or require additional context |
| to be understood. |
| |
| - Strings which make invalid assumptions about notation of |
| date, time or money. |
| |
| - Pluralisation problems. |
| |
| - Incorrect English spelling. |
| |
| - Incorrect formatting. |
| |
| It can be your email address, or a mailing list address where |
| translators can write to without being subscribed, or the URL of a |
| web page through which the translators can contact you. |
| |
| The default value is empty, which means that translators will be |
| clueless! Don't forget to specify this option. |
| |
| `-m[STRING]' |
| `--msgstr-prefix[=STRING]' |
| Use STRING (or "" if not specified) as prefix for msgstr values. |
| |
| `-M[STRING]' |
| `--msgstr-suffix[=STRING]' |
| Use STRING (or "" if not specified) as suffix for msgstr values. |
| |
| |
| 5.1.8 Informative output |
| ------------------------ |
| |
| `-h' |
| `--help' |
| Display this help and exit. |
| |
| `-V' |
| `--version' |
| Output version information and exit. |
| |
| |
| |
| File: gettext.info, Node: Creating, Next: Updating, Prev: Template, Up: Top |
| |
| 6 Creating a New PO File |
| ************************ |
| |
| When starting a new translation, the translator creates a file called |
| `LANG.po', as a copy of the `PACKAGE.pot' template file with |
| modifications in the initial comments (at the beginning of the file) |
| and in the header entry (the first entry, near the beginning of the |
| file). |
| |
| The easiest way to do so is by use of the `msginit' program. For |
| example: |
| |
| $ cd PACKAGE-VERSION |
| $ cd po |
| $ msginit |
| |
| The alternative way is to do the copy and modifications by hand. To |
| do so, the translator copies `PACKAGE.pot' to `LANG.po'. Then she |
| modifies the initial comments and the header entry of this file. |
| |
| * Menu: |
| |
| * msginit Invocation:: Invoking the `msginit' Program |
| * Header Entry:: Filling in the Header Entry |
| |
| |
| File: gettext.info, Node: msginit Invocation, Next: Header Entry, Prev: Creating, Up: Creating |
| |
| 6.1 Invoking the `msginit' Program |
| ================================== |
| |
| msginit [OPTION] |
| |
| The `msginit' program creates a new PO file, initializing the meta |
| information with values from the user's environment. |
| |
| 6.1.1 Input file location |
| ------------------------- |
| |
| `-i INPUTFILE' |
| `--input=INPUTFILE' |
| Input POT file. |
| |
| |
| If no INPUTFILE is given, the current directory is searched for the |
| POT file. If it is `-', standard input is read. |
| |
| 6.1.2 Output file location |
| -------------------------- |
| |
| `-o FILE' |
| `--output-file=FILE' |
| Write output to specified PO file. |
| |
| |
| If no output file is given, it depends on the `--locale' option or |
| the user's locale setting. If it is `-', the results are written to |
| standard output. |
| |
| 6.1.3 Input file syntax |
| ----------------------- |
| |
| `-P' |
| `--properties-input' |
| Assume the input file is a Java ResourceBundle in Java |
| `.properties' syntax, not in PO file syntax. |
| |
| `--stringtable-input' |
| Assume the input file is a NeXTstep/GNUstep localized resource |
| file in `.strings' syntax, not in PO file syntax. |
| |
| |
| 6.1.4 Output details |
| -------------------- |
| |
| `-l LL_CC' |
| `--locale=LL_CC' |
| Set target locale. LL should be a language code, and CC should be |
| a country code. The command `locale -a' can be used to output a |
| list of all installed locales. The default is the user's locale |
| setting. |
| |
| `--no-translator' |
| Declares that the PO file will not have a human translator and is |
| instead automatically generated. |
| |
| `--color' |
| `--color=WHEN' |
| Specify whether or when to use colors and other text attributes. |
| See *note The --color option:: for details. |
| |
| `--style=STYLE_FILE' |
| Specify the CSS style rule file to use for `--color'. See *note |
| The --style option:: for details. |
| |
| `-p' |
| `--properties-output' |
| Write out a Java ResourceBundle in Java `.properties' syntax. Note |
| that this file format doesn't support plural forms and silently |
| drops obsolete messages. |
| |
| `--stringtable-output' |
| Write out a NeXTstep/GNUstep localized resource file in `.strings' |
| syntax. Note that this file format doesn't support plural forms. |
| |
| `-w NUMBER' |
| `--width=NUMBER' |
| Set the output page width. Long strings in the output files will |
| be split across multiple lines in order to ensure that each line's |
| width (= number of screen columns) is less than or equal to the |
| given NUMBER. |
| |
| `--no-wrap' |
| Do not break long message lines. Message lines whose width |
| exceeds the output page width will not be split into several |
| lines. Only file reference lines which are wider than the output |
| page width will be split. |
| |
| |
| 6.1.5 Informative output |
| ------------------------ |
| |
| `-h' |
| `--help' |
| Display this help and exit. |
| |
| `-V' |
| `--version' |
| Output version information and exit. |
| |
| |
| |
| File: gettext.info, Node: Header Entry, Prev: msginit Invocation, Up: Creating |
| |
| 6.2 Filling in the Header Entry |
| =============================== |
| |
| The initial comments "SOME DESCRIPTIVE TITLE", "YEAR" and "FIRST |
| AUTHOR <EMAIL@ADDRESS>, YEAR" ought to be replaced by sensible |
| information. This can be done in any text editor; if Emacs is used and |
| it switched to PO mode automatically (because it has recognized the |
| file's suffix), you can disable it by typing `M-x fundamental-mode'. |
| |
| Modifying the header entry can already be done using PO mode: in |
| Emacs, type `M-x po-mode RET' and then `RET' again to start editing the |
| entry. You should fill in the following fields. |
| |
| Project-Id-Version |
| This is the name and version of the package. Fill it in if it has |
| not already been filled in by `xgettext'. |
| |
| Report-Msgid-Bugs-To |
| This has already been filled in by `xgettext'. It contains an |
| email address or URL where you can report bugs in the untranslated |
| strings: |
| |
| - Strings which are not entire sentences, see the maintainer |
| guidelines in *note Preparing Strings::. |
| |
| - Strings which use unclear terms or require additional context |
| to be understood. |
| |
| - Strings which make invalid assumptions about notation of |
| date, time or money. |
| |
| - Pluralisation problems. |
| |
| - Incorrect English spelling. |
| |
| - Incorrect formatting. |
| |
| POT-Creation-Date |
| This has already been filled in by `xgettext'. |
| |
| PO-Revision-Date |
| You don't need to fill this in. It will be filled by the PO file |
| editor when you save the file. |
| |
| Last-Translator |
| Fill in your name and email address (without double quotes). |
| |
| Language-Team |
| Fill in the English name of the language, and the email address or |
| homepage URL of the language team you are part of. |
| |
| Before starting a translation, it is a good idea to get in touch |
| with your translation team, not only to make sure you don't do |
| duplicated work, but also to coordinate difficult linguistic |
| issues. |
| |
| In the Free Translation Project, each translation team has its own |
| mailing list. The up-to-date list of teams can be found at the |
| Free Translation Project's homepage, |
| `http://translationproject.org/', in the "Teams" area. |
| |
| Language |
| Fill in the language code of the language. This can be in one of |
| three forms: |
| |
| - `LL', an ISO 639 two-letter language code (lowercase). See |
| *note Language Codes:: for the list of codes. |
| |
| - `LL_CC', where `LL' is an ISO 639 two-letter language code |
| (lowercase) and `CC' is an ISO 3166 two-letter country code |
| (uppercase). The country code specification is not redundant: |
| Some languages have dialects in different countries. For |
| example, `de_AT' is used for Austria, and `pt_BR' for Brazil. |
| The country code serves to distinguish the dialects. See |
| *note Language Codes:: and *note Country Codes:: for the |
| lists of codes. |
| |
| - `LL_CC@VARIANT', where `LL' is an ISO 639 two-letter language |
| code (lowercase), `CC' is an ISO 3166 two-letter country code |
| (uppercase), and `VARIANT' is a variant designator. The |
| variant designator (lowercase) can be a script designator, |
| such as `latin' or `cyrillic'. |
| |
| The naming convention `LL_CC' is also the way locales are named on |
| systems based on GNU libc. But there are three important |
| differences: |
| |
| * In this PO file field, but not in locale names, `LL_CC' |
| combinations denoting a language's main dialect are |
| abbreviated as `LL'. For example, `de' is equivalent to |
| `de_DE' (German as spoken in Germany), and `pt' to `pt_PT' |
| (Portuguese as spoken in Portugal) in this context. |
| |
| * In this PO file field, suffixes like `.ENCODING' are not used. |
| |
| * In this PO file field, variant designators that are not |
| relevant to message translation, such as `@euro', are not |
| used. |
| |
| So, if your locale name is `de_DE.UTF-8', the language |
| specification in PO files is just `de'. |
| |
| Content-Type |
| Replace `CHARSET' with the character encoding used for your |
| language, in your locale, or UTF-8. This field is needed for |
| correct operation of the `msgmerge' and `msgfmt' programs, as well |
| as for users whose locale's character encoding differs from yours |
| (see *note Charset conversion::). |
| |
| You get the character encoding of your locale by running the shell |
| command `locale charmap'. If the result is `C' or |
| `ANSI_X3.4-1968', which is equivalent to `ASCII' (= `US-ASCII'), |
| it means that your locale is not correctly configured. In this |
| case, ask your translation team which charset to use. `ASCII' is |
| not usable for any language except Latin. |
| |
| Because the PO files must be portable to operating systems with |
| less advanced internationalization facilities, the character |
| encodings that can be used are limited to those supported by both |
| GNU `libc' and GNU `libiconv'. These are: `ASCII', `ISO-8859-1', |
| `ISO-8859-2', `ISO-8859-3', `ISO-8859-4', `ISO-8859-5', |
| `ISO-8859-6', `ISO-8859-7', `ISO-8859-8', `ISO-8859-9', |
| `ISO-8859-13', `ISO-8859-14', `ISO-8859-15', `KOI8-R', `KOI8-U', |
| `KOI8-T', `CP850', `CP866', `CP874', `CP932', `CP949', `CP950', |
| `CP1250', `CP1251', `CP1252', `CP1253', `CP1254', `CP1255', |
| `CP1256', `CP1257', `GB2312', `EUC-JP', `EUC-KR', `EUC-TW', |
| `BIG5', `BIG5-HKSCS', `GBK', `GB18030', `SHIFT_JIS', `JOHAB', |
| `TIS-620', `VISCII', `GEORGIAN-PS', `UTF-8'. |
| |
| In the GNU system, the following encodings are frequently used for |
| the corresponding languages. |
| |
| * `ISO-8859-1' for Afrikaans, Albanian, Basque, Breton, |
| Catalan, Cornish, Danish, Dutch, English, Estonian, Faroese, |
| Finnish, French, Galician, German, Greenlandic, Icelandic, |
| Indonesian, Irish, Italian, Malay, Manx, Norwegian, Occitan, |
| Portuguese, Spanish, Swedish, Tagalog, Uzbek, Walloon, |
| |
| * `ISO-8859-2' for Bosnian, Croatian, Czech, Hungarian, Polish, |
| Romanian, Serbian, Slovak, Slovenian, |
| |
| * `ISO-8859-3' for Maltese, |
| |
| * `ISO-8859-5' for Macedonian, Serbian, |
| |
| * `ISO-8859-6' for Arabic, |
| |
| * `ISO-8859-7' for Greek, |
| |
| * `ISO-8859-8' for Hebrew, |
| |
| * `ISO-8859-9' for Turkish, |
| |
| * `ISO-8859-13' for Latvian, Lithuanian, Maori, |
| |
| * `ISO-8859-14' for Welsh, |
| |
| * `ISO-8859-15' for Basque, Catalan, Dutch, English, Finnish, |
| French, Galician, German, Irish, Italian, Portuguese, |
| Spanish, Swedish, Walloon, |
| |
| * `KOI8-R' for Russian, |
| |
| * `KOI8-U' for Ukrainian, |
| |
| * `KOI8-T' for Tajik, |
| |
| * `CP1251' for Bulgarian, Belarusian, |
| |
| * `GB2312', `GBK', `GB18030' for simplified writing of Chinese, |
| |
| * `BIG5', `BIG5-HKSCS' for traditional writing of Chinese, |
| |
| * `EUC-JP' for Japanese, |
| |
| * `EUC-KR' for Korean, |
| |
| * `TIS-620' for Thai, |
| |
| * `GEORGIAN-PS' for Georgian, |
| |
| * `UTF-8' for any language, including those listed above. |
| |
| When single quote characters or double quote characters are used in |
| translations for your language, and your locale's encoding is one |
| of the ISO-8859-* charsets, it is best if you create your PO files |
| in UTF-8 encoding, instead of your locale's encoding. This is |
| because in UTF-8 the real quote characters can be represented |
| (single quote characters: U+2018, U+2019, double quote characters: |
| U+201C, U+201D), whereas none of ISO-8859-* charsets has them all. |
| Users in UTF-8 locales will see the real quote characters, whereas |
| users in ISO-8859-* locales will see the vertical apostrophe and |
| the vertical double quote instead (because that's what the |
| character set conversion will transliterate them to). |
| |
| To enter such quote characters under X11, you can change your |
| keyboard mapping using the `xmodmap' program. The X11 names of |
| the quote characters are "leftsinglequotemark", |
| "rightsinglequotemark", "leftdoublequotemark", |
| "rightdoublequotemark", "singlelowquotemark", "doublelowquotemark". |
| |
| Note that only recent versions of GNU Emacs support the UTF-8 |
| encoding: Emacs 20 with Mule-UCS, and Emacs 21. As of January |
| 2001, XEmacs doesn't support the UTF-8 encoding. |
| |
| The character encoding name can be written in either upper or |
| lower case. Usually upper case is preferred. |
| |
| Content-Transfer-Encoding |
| Set this to `8bit'. |
| |
| Plural-Forms |
| This field is optional. It is only needed if the PO file has |
| plural forms. You can find them by searching for the |
| `msgid_plural' keyword. The format of the plural forms field is |
| described in *note Plural forms:: and *note Translating plural |
| forms::. |
| |
| |
| File: gettext.info, Node: Updating, Next: Editing, Prev: Creating, Up: Top |
| |
| 7 Updating Existing PO Files |
| **************************** |
| |
| * Menu: |
| |
| * msgmerge Invocation:: Invoking the `msgmerge' Program |
| |
| |
| File: gettext.info, Node: msgmerge Invocation, Prev: Updating, Up: Updating |
| |
| 7.1 Invoking the `msgmerge' Program |
| =================================== |
| |
| msgmerge [OPTION] DEF.po REF.pot |
| |
| The `msgmerge' program merges two Uniforum style .po files together. |
| The DEF.po file is an existing PO file with translations which will be |
| taken over to the newly created file as long as they still match; |
| comments will be preserved, but extracted comments and file positions |
| will be discarded. The REF.pot file is the last created PO file with |
| up-to-date source references but old translations, or a PO Template file |
| (generally created by `xgettext'); any translations or comments in the |
| file will be discarded; however, dot comments and file positions will be |
| preserved. Where an exact match cannot be found, fuzzy matching is |
| used to produce better results. |
| |
| 7.1.1 Input file location |
| ------------------------- |
| |
| `DEF.po' |
| Translations referring to old sources. |
| |
| `REF.pot' |
| References to the new sources. |
| |
| `-D DIRECTORY' |
| `--directory=DIRECTORY' |
| Add DIRECTORY to the list of directories. Source files are |
| searched relative to this list of directories. The resulting `.po' |
| file will be written relative to the current directory, though. |
| |
| `-C FILE' |
| `--compendium=FILE' |
| Specify an additional library of message translations. *Note |
| Compendium::. This option may be specified more than once. |
| |
| |
| 7.1.2 Operation mode |
| -------------------- |
| |
| `-U' |
| `--update' |
| Update DEF.po. Do nothing if DEF.po is already up to date. |
| |
| |
| 7.1.3 Output file location |
| -------------------------- |
| |
| `-o FILE' |
| `--output-file=FILE' |
| Write output to specified file. |
| |
| |
| The results are written to standard output if no output file is |
| specified or if it is `-'. |
| |
| 7.1.4 Output file location in update mode |
| ----------------------------------------- |
| |
| The result is written back to DEF.po. |
| |
| `--backup=CONTROL' |
| Make a backup of DEF.po. |
| |
| `--suffix=SUFFIX' |
| Override the usual backup suffix. |
| |
| |
| The version control method may be selected via the `--backup' option |
| or through the `VERSION_CONTROL' environment variable. Here are the |
| values: |
| |
| `none' |
| `off' |
| Never make backups (even if `--backup' is given). |
| |
| `numbered' |
| `t' |
| Make numbered backups. |
| |
| `existing' |
| `nil' |
| Make numbered backups if numbered backups for this file already |
| exist, otherwise make simple backups. |
| |
| `simple' |
| `never' |
| Always make simple backups. |
| |
| |
| The backup suffix is `~', unless set with `--suffix' or the |
| `SIMPLE_BACKUP_SUFFIX' environment variable. |
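| |
| The following sketch shows update mode with a simple backup. The |
| file names and the suffix `.bak' are chosen only for the example. |
| |
```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Existing translation file (sample contents).
cat > de.po <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello"
msgstr "Hallo"
EOF

# Newer template that adds one message.
cat > hello.pot <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello"
msgstr ""

msgid "Welcome"
msgstr ""
EOF

# Update de.po in place, keeping the previous version as de.po.bak.
msgmerge --quiet --update --backup=simple --suffix=.bak de.po hello.pot
ls
```
| |
| Afterwards `de.po' contains the new "Welcome" message and |
| `de.po.bak' holds the contents from before the update. |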
| |
| 7.1.5 Operation modifiers |
| ------------------------- |
| |
| `-m' |
| `--multi-domain' |
| Apply REF.pot to each of the domains in DEF.po. |
| |
| `-N' |
| `--no-fuzzy-matching' |
| Do not use fuzzy matching when an exact match is not found. This |
| may speed up the operation considerably. |
| |
| `--previous' |
| Keep the previous msgids of translated messages, marked with `#|', |
| when adding the fuzzy marker to such messages. |
| |
| 7.1.6 Input file syntax |
| ----------------------- |
| |
| `-P' |
| `--properties-input' |
| Assume the input files are Java ResourceBundles in Java |
| `.properties' syntax, not in PO file syntax. |
| |
| `--stringtable-input' |
| Assume the input files are NeXTstep/GNUstep localized resource |
| files in `.strings' syntax, not in PO file syntax. |
| |
| |
| 7.1.7 Output details |
| -------------------- |
| |
| `--lang=CATALOGNAME' |
| Specify the `Language' field to be used in the header entry. See |
| *note Header Entry:: for the meaning of this field. Note: The |
| `Language-Team' and `Plural-Forms' fields are left unchanged. If |
| this option is not specified, the `Language' field is inferred, as |
| best as possible, from the `Language-Team' field. |
| |
| `--color' |
| `--color=WHEN' |
| Specify whether or when to use colors and other text attributes. |
| See *note The --color option:: for details. |
| |
| `--style=STYLE_FILE' |
| Specify the CSS style rule file to use for `--color'. See *note |
| The --style option:: for details. |
| |
| `--force-po' |
| Always write an output file even if it contains no message. |
| |
| `-i' |
| `--indent' |
| Write the .po file using indented style. |
| |
| `--no-location' |
| Do not write `#: FILENAME:LINE' lines. |
| |
| `--add-location' |
| Generate `#: FILENAME:LINE' lines (default). |
| |
| `--strict' |
| Write out a strict Uniforum conforming PO file. Note that this |
| Uniforum format should be avoided because it doesn't support the |
| GNU extensions. |
| |
| `-p' |
| `--properties-output' |
| Write out a Java ResourceBundle in Java `.properties' syntax. Note |
| that this file format doesn't support plural forms and silently |
| drops obsolete messages. |
| |
| `--stringtable-output' |
| Write out a NeXTstep/GNUstep localized resource file in `.strings' |
| syntax. Note that this file format doesn't support plural forms. |
| |
| `-w NUMBER' |
| `--width=NUMBER' |
| Set the output page width. Long strings in the output files will |
| be split across multiple lines in order to ensure that each line's |
| width (= number of screen columns) is less than or equal to the |
| given NUMBER. |
| |
| `--no-wrap' |
| Do not break long message lines. Message lines whose width |
| exceeds the output page width will not be split into several |
| lines. Only file reference lines which are wider than the output |
| page width will be split. |
| |
| `-s' |
| `--sort-output' |
| Generate sorted output. Note that using this option makes it much |
| harder for the translator to understand each message's context. |
| |
| `-F' |
| `--sort-by-file' |
| Sort output by file location. |
| |
| |
| 7.1.8 Informative output |
| ------------------------ |
| |
| `-h' |
| `--help' |
| Display this help and exit. |
| |
| `-V' |
| `--version' |
| Output version information and exit. |
| |
| `-v' |
| `--verbose' |
| Increase verbosity level. |
| |
| `-q' |
| `--quiet' |
| `--silent' |
| Suppress progress indicators. |
| |
| |
| |
| File: gettext.info, Node: Editing, Next: Manipulating, Prev: Updating, Up: Top |
| |
| 8 Editing PO Files |
| ****************** |
| |
| * Menu: |
| |
| * KBabel:: KDE's PO File Editor |
| * Gtranslator:: GNOME's PO File Editor |
| * PO Mode:: Emacs's PO File Editor |
| * Compendium:: Using Translation Compendia |
| |
| |
| File: gettext.info, Node: KBabel, Next: Gtranslator, Prev: Editing, Up: Editing |
| |
| 8.1 KDE's PO File Editor |
| ======================== |
| |
| |
| File: gettext.info, Node: Gtranslator, Next: PO Mode, Prev: KBabel, Up: Editing |
| |
| 8.2 GNOME's PO File Editor |
| ========================== |
| |
| |
| File: gettext.info, Node: PO Mode, Next: Compendium, Prev: Gtranslator, Up: Editing |
| |
| 8.3 Emacs's PO File Editor |
| ========================== |
| |
| For those of you who are lucky enough to use Emacs, PO mode has been |
| specifically created to provide a cozy environment for editing or |
| modifying PO files. While editing a PO file, PO mode allows for the |
| easy browsing of auxiliary and compendium PO files, as well as for |
| following references into the set of C program sources from which PO |
| files have been derived. It has a few special features, among which |
| are the interactive marking of program strings as translatable, and the |
| validation of PO files with easy repositioning to PO file lines showing |
| errors. |
| |
| To begin with, besides the main PO mode commands (*note Main PO |
| Commands::), you should know how to move between entries (*note Entry |
| Positioning::), and how to handle untranslated entries (*note |
| Untranslated Entries::). |
| |
| * Menu: |
| |
| * Installation:: Completing GNU `gettext' Installation |
| * Main PO Commands:: Main Commands |
| * Entry Positioning:: Entry Positioning |
| * Normalizing:: Normalizing Strings in Entries |
| * Translated Entries:: Translated Entries |
| * Fuzzy Entries:: Fuzzy Entries |
| * Untranslated Entries:: Untranslated Entries |
| * Obsolete Entries:: Obsolete Entries |
| * Modifying Translations:: Modifying Translations |
| * Modifying Comments:: Modifying Comments |
| * Subedit:: Mode for Editing Translations |
| * C Sources Context:: C Sources Context |
| * Auxiliary:: Consulting Auxiliary PO Files |
| |
| |
| File: gettext.info, Node: Installation, Next: Main PO Commands, Prev: PO Mode, Up: PO Mode |
| |
| 8.3.1 Completing GNU `gettext' Installation |
| ------------------------------------------- |
| |
| Once you have received, unpacked, configured and compiled the GNU |
| `gettext' distribution, the `make install' command puts in place the |
| programs `xgettext', `msgfmt', `gettext', and `msgmerge', as well as |
| their available message catalogs. To top off a comfortable |
| installation, you might also want to make the PO mode available to your |
| Emacs users. |
| |
| During the installation of the PO mode, you might want to modify your |
| file `.emacs', once and for all, so it contains a few lines looking |
| like: |
| |
| (setq auto-mode-alist |
| (cons '("\\.po\\'\\|\\.po\\." . po-mode) auto-mode-alist)) |
| (autoload 'po-mode "po-mode" "Major mode for translators to edit PO files" t) |
| |
| Later, whenever you edit some `.po' file, or any file having the |
| string `.po.' within its name, Emacs loads `po-mode.elc' (or |
| `po-mode.el') as needed, and automatically activates PO mode commands |
| for the associated buffer. The string _PO_ appears in the mode line |
| for any buffer for which PO mode is active. Many PO files may be |
| active at once in a single Emacs session. |
| |
| If you are using Emacs version 20 or newer, and have already |
| installed the appropriate international fonts on your system, you may |
| also tell Emacs how to determine automatically the coding system of |
| every PO file. This will often (but not always) cause the necessary |
| fonts to be loaded and used for displaying the translations on your |
| Emacs screen. For this to happen, add the lines: |
| |
| (modify-coding-system-alist 'file "\\.po\\'\\|\\.po\\." |
| 'po-find-file-coding-system) |
| (autoload 'po-find-file-coding-system "po-mode") |
| |
| to your `.emacs' file. If, with this, you still see boxes instead of |
| international characters, try a different font set (via Shift Mouse |
| button 1). |
| |
| |
| File: gettext.info, Node: Main PO Commands, Next: Entry Positioning, Prev: Installation, Up: PO Mode |
| |
| 8.3.2 Main PO mode Commands |
| --------------------------- |
| |
| After setting up Emacs with something similar to the lines in *note |
| Installation::, PO mode is activated for a window when Emacs finds a PO |
| file in that window. This makes the window read-only and establishes |
| `po-mode-map'. PO mode is a genuine Emacs mode of its own and is not |
| derived from text mode in any way. Functions found on `po-mode-hook', |
| if any, will be executed. |
| |
| When PO mode is active in a window, the letters `PO' appear in the |
| mode line for that window. The mode line also displays how many |
| entries of each kind are held in the PO file. For example, the string |
| `132t+3f+10u+2o' would tell the translator that the PO file contains |
| 132 translated entries (*note Translated Entries::), 3 fuzzy entries |
| (*note Fuzzy Entries::), 10 untranslated entries (*note Untranslated |
| Entries::) and 2 obsolete entries (*note Obsolete Entries::). |
| Items with a zero count are not shown. So, in this example, if the |
| fuzzy entries were unfuzzied, the untranslated entries were translated |
| and the obsolete entries were deleted, the mode line would merely |
| display `145t' for the counters. |
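| |
| The arithmetic behind the counters can be checked with a few lines of |
| shell. The variable names are made up for this illustration, and |
| nothing here is part of PO mode itself. |
| |
```shell
# Mode line counters as in the example from the text.
counters='132t+3f+10u+2o'

translated=0 fuzzy=0 untranslated=0 obsolete=0
old_ifs=$IFS
IFS='+'
for item in $counters; do
  count=${item%?}          # everything except the trailing letter
  kind=${item#"$count"}    # the trailing letter: t, f, u or o
  case $kind in
    t) translated=$count ;;
    f) fuzzy=$count ;;
    u) untranslated=$count ;;
    o) obsolete=$count ;;
  esac
done
IFS=$old_ifs

# Fuzzy and untranslated entries become translated; obsolete ones vanish.
after="$((translated + fuzzy + untranslated))t"
echo "$after"
```
| |
| The script prints `145t', the sum of the translated, fuzzy and |
| untranslated counts; the two obsolete entries do not contribute. |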
| |
| The main PO commands are those which do not fit into the other |
| categories of subsequent sections. These allow for quitting PO mode or |
| for managing windows in special ways. |
| |
| `_' |
| Undo last modification to the PO file (`po-undo'). |
| |
| `Q' |
| Quit processing and save the PO file (`po-quit'). |
| |
| `q' |
| Quit processing, possibly after confirmation |
| (`po-confirm-and-quit'). |
| |
| `0' |
| Temporarily leave the PO file window (`po-other-window'). |
| |
| `?' |
| `h' |
| Show help about PO mode (`po-help'). |
| |
| `=' |
| Give some PO file statistics (`po-statistics'). |
| |
| `V' |
| Batch validate the format of the whole PO file (`po-validate'). |
| |
| |
| The command `_' (`po-undo') interfaces to the Emacs _undo_ facility. |
| *Note Undoing Changes: (emacs)Undo. Each time `_' is typed, |
| modifications which the translator did to the PO file are undone a |
| little more. For the purpose of undoing, each PO mode command is |
| atomic. This is especially true for the `<RET>' command: the whole |
| edit made by a single use of this command is undone at once, even if |
| the edit itself implied several actions. However, while in the |
| editing window, one can undo the editing work in much smaller steps. |
| |
| The commands `Q' (`po-quit') and `q' (`po-confirm-and-quit') are |
| used when the translator is done with the PO file. The former is a bit |
| less verbose than the latter. If the file has been modified, it is |
| saved to disk first. In both cases, and prior to all this, the |
| commands check if any untranslated messages remain in the PO file and, |
| if so, the translator is asked if she really wants to leave off working |
| with this PO file. This is the preferred way of getting rid of an |
| Emacs PO file buffer. Merely killing it through the usual command |
| `C-x k' (`kill-buffer') is not the tidiest way to proceed. |
| |
| The command `0' (`po-other-window') is another, softer way to leave |
| PO mode temporarily. It just moves the cursor to some other Emacs |
| window, and pops one up if necessary. For example, if the translator |
| just got PO mode to show some source context in some other window, she |
| might discover some apparent bug in the program source that needs |
| correction. This command lets the translator switch roles, become a |
| programmer, and have the cursor right in the window containing the |
| program she wants to modify. By later getting the cursor back in the |
| PO file window, or by asking Emacs to edit this file once again, PO |
| mode is then recovered. |
| |
| The command `h' (`po-help') displays a summary of all available PO |
| mode commands. The translator should then type any character to resume |
| normal PO mode operations. The command `?' has the same effect as `h'. |
| |
| The command `=' (`po-statistics') computes the total number of |
| entries in the PO file, the ordinal of the current entry (counted from |
| 1), the number of untranslated entries, the number of obsolete entries, |
| and displays all these numbers. |
| |
| The command `V' (`po-validate') launches `msgfmt' in checking and |
| verbose mode over the current PO file. This command first offers to |
| save the current PO file on disk. The `msgfmt' tool, from GNU |
| `gettext', has the purpose of creating a MO file out of a PO file, and |
| PO mode uses the features of this program for checking the overall |
| format of a PO file, as well as all individual entries. |
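| |
| Outside Emacs, a comparable check can be run by hand. The sketch |
| below assumes the command line `msgfmt --check --verbose -o |
| /dev/null'; the exact options PO mode passes may differ. The file |
| `bad.po' and its contents are invented, with a deliberate mismatch |
| between the format directives of msgid and msgstr. |
| |
```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A PO file with a c-format entry whose translation uses %s instead of %d.
cat > bad.po <<'EOF'
msgid ""
msgstr ""
"Project-Id-Version: hello 1.0\n"
"PO-Revision-Date: 2010-01-02 09:30+0100\n"
"Last-Translator: Erika Mustermann <erika@example.org>\n"
"Language-Team: German <de@example.org>\n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#, c-format
msgid "%d files"
msgstr "%s Dateien"
EOF

# msgfmt exits with a nonzero status when the checks find an error.
if msgfmt --check --verbose -o /dev/null bad.po; then
  result=valid
else
  result=invalid
fi
echo "$result"
```
| |
| Here `msgfmt' reports the mismatching format directive on standard |
| error, and the script prints `invalid'. |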
| |
| The program `msgfmt' runs asynchronously with Emacs, so the |
| translator regains control immediately while her PO file is being |
| studied. Error output is collected in the Emacs `*compilation*' buffer, |
| displayed in another window. The regular Emacs command `C-x`' |
| (`next-error'), as well as other usual compile commands, allow the |
| translator to reposition quickly to the offending parts of the PO file. |
| Once the cursor is on the line in error, the translator may decide on |
| any PO mode action which would help correct the error. |
| |
| |
| File: gettext.info, Node: Entry Positioning, Next: Normalizing, Prev: Main PO Commands, Up: PO Mode |
| |
| 8.3.3 Entry Positioning |
| ----------------------- |
| |
| The cursor in a PO file window is almost always part of an entry. |
| The only exceptions are the special case when the cursor is after the |
| last entry in the file, or when the PO file is empty. The entry that |
| contains the cursor is called the current entry. Many PO mode commands |
| operate on the current entry, so moving the cursor does more than allow |
| the translator to browse the PO file: it also selects the entry on |
| which commands operate. |
| |
| Some PO mode commands alter the position of the cursor in a |
| specialized way. A few of those special-purpose positioning commands |
| are described here; the others are described in the following sections |
| (for a complete list try `C-h m'): |
| |
| `.' |
| Redisplay the current entry (`po-current-entry'). |
| |
| `n' |
| Select the entry after the current one (`po-next-entry'). |
| |
| `p' |
| Select the entry before the current one (`po-previous-entry'). |
| |
| `<' |
| Select the first entry in the PO file (`po-first-entry'). |
| |
| `>' |
| Select the last entry in the PO file (`po-last-entry'). |
| |
| `m' |
| Record the location of the current entry for later use |
| (`po-push-location'). |
| |
| `r' |
| Return to a previously saved entry location (`po-pop-location'). |
| |
| `x' |
| Exchange the current entry location with the previously saved one |
| (`po-exchange-location'). |
| |
| |
| Any Emacs command able to reposition the cursor may be used to |
| select the current entry in PO mode, including commands which move by |
| characters, lines, paragraphs, screens or pages, and search commands. |
| However, there is a kind of standard way to display the current entry |
| in PO mode, which usual Emacs commands moving the cursor do not |
| especially try to enforce. The command `.' (`po-current-entry') has |
| the sole purpose of redisplaying the current entry properly, after the |
| current entry has been changed by means external to PO mode, or the |
| Emacs screen otherwise altered. |
| |
| It is yet to be decided if PO mode helps the translator, or otherwise |
| irritates her, by forcing a rigid window disposition while she is doing |
| her work. We originally had quite precise ideas about how windows |
| should behave, but on the other hand, anyone used to Emacs is often |
| happy to keep full control. Maybe a fixed window disposition might be |
| offered as a PO mode option that the translator might activate or |
| deactivate at will, so it could be offered on an experimental basis. |
| If nobody feels a real need for using it, or a compulsion for writing |
| it, we should drop this whole idea. The incentive for doing it should |
| come from translators rather than programmers, as opinions from an |
| experienced translator are surely worth more to me than opinions from |
| programmers _thinking_ about how _others_ should do translation. |
| |
| The commands `n' (`po-next-entry') and `p' (`po-previous-entry') |
| move the cursor to the entry following, or preceding, the current one. |
| If `n' is given while the cursor is on the last entry of the PO file, |
| or if `p' is given while the cursor is on the first entry, the cursor |
| does not move. |
| |
| The commands `<' (`po-first-entry') and `>' (`po-last-entry') move |
| the cursor to the first entry, or last entry, of the PO file. When the |
| cursor is located past the last entry in a PO file, most PO mode |
| commands will return an error saying `After last entry'. Moreover, the |
| commands `<' and `>' have the special property of being able to work |
| even when the cursor is not within any PO file entry, so one may use |
| them to correct this situation. But even these commands will fail on |
| a truly empty PO file. There are development plans for PO mode to |
| interactively fill an empty PO file from sources. |
| *Note Marking::. |
| |
| The translator may decide, before working at the translation of a |
| particular entry, that she needs to browse the remainder of the PO |
| file, maybe for finding the terminology or phraseology used in related |
| entries. She can of course use the standard Emacs idioms for saving |
| the current cursor location in some register, and use that register for |
| getting back, or else, use the location ring. |
| |
| PO mode offers another approach, by which cursor locations may be |
| saved onto a special stack. The command `m' (`po-push-location') |
| merely adds the location of current entry to the stack, pushing the |
| already saved locations under the new one. The command `r' |
| (`po-pop-location') consumes the top stack element and repositions the |
| cursor to the entry associated with that top element. This position is |
| then lost, for the next `r' will move the cursor to the previously |
| saved location, and so on until no locations remain on the stack. |
| |
| If the translator wants the position to be kept on the location |
| stack, maybe for taking a look at the entry associated with the top |
| element, then go elsewhere with the intent of getting back later, she |
| ought to use `m' immediately after `r'. |
| |
| The command `x' (`po-exchange-location') simultaneously repositions |
| the cursor to the entry associated with the top element of the stack of |
| saved locations, and replaces that top element with the location of the |
| current entry before the move. Consequently, repeating the `x' command |
| alternates between two entries. To achieve this, the translator |
| positions the cursor on the first entry, uses `m', then moves to the |
| second entry, and simply uses `x' to make the switch. |
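The behaviour of `m', `r' and `x' described above can be modelled with an ordinary stack. The following is only an illustrative sketch in Python, not the PO mode implementation: the class name `LocationStack', the use of integers as entry locations, and the error raised on an empty stack are all assumptions made for the example.

```python
class LocationStack:
    """Toy model of the saved-location stack; entries are plain integers."""

    def __init__(self, current):
        self.current = current  # location of the current entry
        self.saved = []         # the top of the stack is the end of the list

    def goto(self, entry):
        """Move the cursor to ENTRY by any ordinary means."""
        self.current = entry

    def push(self):
        """Model of `m': save the current location on top of the stack."""
        self.saved.append(self.current)

    def pop(self):
        """Model of `r': go to the top location and remove it from the stack."""
        if not self.saved:
            # Assumption: what PO mode does here is not stated in the text.
            raise IndexError("no saved location")
        self.current = self.saved.pop()

    def exchange(self):
        """Model of `x': swap the current location with the top location."""
        if not self.saved:
            raise IndexError("no saved location")
        self.current, self.saved[-1] = self.saved[-1], self.current
```

In this model, calling `push' right after `pop' puts the location back on the stack, which mirrors the advice to use `m' immediately after `r', and two successive calls to `exchange' bring the cursor back where it started.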
| |
| |
| File: gettext.info, Node: Normalizing, Next: Translated Entries, Prev: Entry Positioning, Up: PO Mode |
| |
| 8.3.4 Normalizing Strings in Entries |
| ------------------------------------ |
| |
| There are many different ways for encoding a particular string into a |
| PO file entry, because there are so many different ways to split and |
| quote multi-line strings, and even to represent special characters by |
| backslash escape sequences. Some features of PO mode rely on the |
| ability of PO mode to scan an already existing PO file for a |
| particular string encoded into the `msgid' field of some entry. Even |
| if PO mode has all the built-in machinery for implementing this |
| recognition easily, doing it fast is technically difficult. To |
| facilitate a solution to this efficiency problem, we decided on a |
| canonical representation for strings. |
| |
| A conventional representation of strings in a PO file is currently |
| under discussion, and PO mode experiments with a canonical |
| representation. Having both `xgettext' and PO mode converging towards |
| a uniform way of representing equivalent strings would be useful, as |
| the internal normalization needed by PO mode could be automatically |
| satisfied when using `xgettext' from GNU `gettext'. An explicit PO |
| mode normalization should then be only necessary for PO files imported |
| from elsewhere, or for when the convention itself evolves. |
| |
| So, for achieving normalization of at least the strings of a given |
| PO file needing a canonical representation, the following PO mode |
| command is available: |
| |
| `M-x po-normalize' |
| Tidy the whole PO file by making entries more uniform. |
| |
| |
| The special command `M-x po-normalize', which has no associated |
| keys, revises all entries, ensuring that strings of both original and |
| translated entries use uniform internal quoting in the PO file. It |
| also removes any crumb after the last entry. This command may be |
| useful for PO files freshly imported from elsewhere, or if we ever |
| improve on the canonical quoting format we use. This canonical format |
| is not only meant for getting cleaner PO files, but also for greatly |
| speeding up `msgid' string lookup for some other PO mode commands. |
| |
| `M-x po-normalize' presently makes three passes over the entries. |
| The first implements heuristics for converting PO files for GNU |
| `gettext' 0.6 and earlier, in which `msgid' and `msgstr' fields were |
| using K&R style C string syntax for multi-line strings. These |
| heuristics may fail for comments not related to obsolete entries and |
| ending with a backslash; they also depend on subsequent passes for |
| finalizing the proper commenting of continued lines for obsolete |
| entries. This first pass might disappear once all old PO files have |
| been adjusted. The second and third passes normalize all `msgid' and |
| `msgstr' strings respectively. They also clean out those trailing |
| backslashes used by XView's `msgfmt' for continued lines. |
| |
| Having such an explicit normalizing command allows for importing PO |
| files from other sources, but also eases the evolution of the current |
| convention, evolution driven mostly by aesthetic concerns, as of now. |
| It is easy to make suggested adjustments at a later time, as the |
| normalizing command and eventually, other GNU `gettext' tools should |
| greatly automate conformance. A description of the canonical string |
| format is given below, for the particular benefit of those not having |
| Emacs handy, and who would nevertheless want to handcraft their PO |
| files in nice ways. |
| |
| Right now, in PO mode, strings are single line or multi-line. A |
| string goes multi-line if and only if it has _embedded_ newlines, that |
| is, if it matches `[^\n]\n+[^\n]'. So, we would have: |
| |
| msgstr "\n\nHello, world!\n\n\n" |
| |
| but, replacing the space by a newline, this becomes: |
| |
| msgstr "" |
| "\n" |
| "\n" |
| "Hello,\n" |
| "world!\n" |
| "\n" |
| "\n" |
| |
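The rule above can be written down in a few lines. This is a minimal sketch in Python, assuming hypothetical helper names (`po_quote', `layout_field') and handling only a few common escapes; it is not the code PO mode uses.

```python
import re

# A string is laid out on several lines only when a newline sits between
# two non-newline characters, as in the rule quoted above.
EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def po_quote(text):
    """Quote TEXT as one PO string (only a few common escapes are handled)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\t", "\\t")
    return '"' + escaped + '"'


def layout_field(keyword, text):
    """Return the PO file lines for one field, e.g. KEYWORD = "msgstr"."""
    if not EMBEDDED_NEWLINE.search(text):
        return [keyword + " " + po_quote(text)]
    # Multi-line form: an empty first string, then one string per line.
    pieces = re.findall(r"[^\n]*\n|[^\n]+", text)
    return [keyword + ' ""'] + [po_quote(piece) for piece in pieces]
```

Applied to the two strings of the example, `layout_field' yields a single line for the first one and the seven lines shown above for the second one (without the indentation of the continuation lines).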
| We are deliberately using a caricatural example here, to make the |
| point clearer. Usually, multi-line strings do not look that bad. It |
| is probable that we will implement the following suggestion. We might |
| lump together all initial newlines into the first string, in place of |
| the empty string, and also all newlines introducing empty lines (that |
| is, in a run of N > 1 newlines, the last N-1 newlines would go together |
| on a separate string), so making the previous example appear: |
| |
| msgstr "\n\n" |
| "Hello,\n" |
| "world!\n" |
| "\n\n" |
| |
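The suggested layout can be sketched in the same spirit. Since this layout is only a proposal, the Python code below is merely one possible reading of it; the names `po_quote' and `layout_field_proposed' are hypothetical, and only a few common escapes are handled.

```python
import re


def po_quote(text):
    """Quote TEXT as one PO string (only a few common escapes are handled)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\t", "\\t")
    return '"' + escaped + '"'


def layout_field_proposed(keyword, text):
    """Lay out TEXT with newline runs lumped together, as suggested above."""
    if not re.search(r"[^\n]\n+[^\n]", text):
        # No embedded newline: the string stays on a single line.
        return [keyword + " " + po_quote(text)]
    # All initial newlines go into the first string.
    leading = re.match(r"\n*", text).group()
    rest = text[len(leading):]
    # Each piece is either a line of text with its own newline, or the
    # remaining newlines of a run (those introducing empty lines).
    pieces = re.findall(r"[^\n]+\n?|\n+", rest)
    return [keyword + " " + po_quote(leading)] + [po_quote(p) for p in pieces]
```

With the caricatural string of the example, this gives the four lines shown above; when the string has no initial newline, the first line is still `msgstr ""'.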
| There are a few yet undecided little points about string |
| normalization, to be documented in this manual, once these questions |
| settle. |
| |
| |
| File: gettext.info, Node: Translated Entries, Next: Fuzzy Entries, Prev: Normalizing, Up: PO Mode |
| |
| 8.3.5 Translated Entries |
| ------------------------ |
| |
| Each PO file entry for which the `msgstr' field has been filled with |
| a translation, and which is not marked as fuzzy (*note Fuzzy Entries::), |
| is said to be a "translated" entry. Only translated entries will later |
| be compiled by GNU `msgfmt' and become usable in programs. Other entry |
| types will be excluded; translation will not occur for them. |
| |
| Some commands are more specifically related to translated entry |
| processing. |
| |
| `t' |
| Find the next translated entry (`po-next-translated-entry'). |
| |
| `T' |
| Find the previous translated entry |
| (`po-previous-translated-entry'). |
| |
| |
| The commands `t' (`po-next-translated-entry') and `T' |
| (`po-previous-translated-entry') move forwards or backwards, searching |
| for a translated entry. If none is found, the search is extended and |
| wraps around in the PO file buffer. |
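The wrap-around search may be pictured as follows. This Python sketch assumes that the PO file is represented as a plain list of entry types (a representation invented for the example), and that a search which wraps all the way around may land on the current entry again; the text does not say how PO mode handles that last case.

```python
def find_next(kinds, current, wanted):
    """Index of the next entry of type WANTED after CURRENT, wrapping around.

    KINDS is a list of entry types such as "translated" or "fuzzy".
    Returns None when no entry of the wanted type exists.
    """
    count = len(kinds)
    for step in range(1, count + 1):
        index = (current + step) % count
        if kinds[index] == wanted:
            return index
    return None


def find_previous(kinds, current, wanted):
    """Same search as find_next, but moving backwards."""
    count = len(kinds)
    for step in range(1, count + 1):
        index = (current - step) % count
        if kinds[index] == wanted:
            return index
    return None
```

The same two functions serve equally well for the fuzzy, untranslated and obsolete searches, since only the wanted type changes.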
| |
| Translated entries usually result from the translator having edited |
| in a translation for them, *note Modifying Translations::. However, if |
| the variable `po-auto-fuzzy-on-edit' is not `nil', the entry having |
| received a new translation first becomes a fuzzy entry, which ought to |
| be later unfuzzied before becoming an official, genuine translated |
| entry. *Note Fuzzy Entries::. |
| |
| |
| File: gettext.info, Node: Fuzzy Entries, Next: Untranslated Entries, Prev: Translated Entries, Up: PO Mode |
| |
| 8.3.6 Fuzzy Entries |
| ------------------- |
| |
| Each PO file entry may have a set of "attributes", which are |
| qualities given a name and explicitly associated with the translation, |
| using a special system comment. One of these attributes has the name |
| `fuzzy', and entries having this attribute are said to have a fuzzy |
| translation. They are called fuzzy entries, for short. |
| |
| Fuzzy entries, even if they count as translated entries for most |
| other purposes, usually call for revision by the translator. They may |
| be produced by applying the program `msgmerge' to update an older |
| translated PO file according to a new PO template file, when this tool |
| hypothesizes that some new `msgid' has been modified only slightly from |
| an older one, and chooses to pair what it thinks to be the old |
| translation with the new modified entry. The slight alteration in the |
| original string (the `msgid' string) should often be reflected in the |
| translated string, and this requires the intervention of the |
| translator. For this reason, `msgmerge' might mark some entries as |
| being fuzzy. |
| |
| Also, the translator may decide herself to mark an entry as fuzzy |
| for her own convenience, when she wants to remember that the entry has |
| to be later revisited. So, some commands are more specifically related |
| to fuzzy entry processing. |
| |
| `f' |
| Find the next fuzzy entry (`po-next-fuzzy-entry'). |
| |
| `F' |
| Find the previous fuzzy entry (`po-previous-fuzzy-entry'). |
| |
| `<TAB>' |
| Remove the fuzzy attribute of the current entry (`po-unfuzzy'). |
| |
| |
| The commands `f' (`po-next-fuzzy-entry') and `F' |
| (`po-previous-fuzzy-entry') move forwards or backwards, chasing for a |
| fuzzy entry. If none is found, the search is extended and wraps around |
| in the PO file buffer. |
| |
| The command `<TAB>' (`po-unfuzzy') removes the fuzzy attribute |
| associated with an entry, usually leaving it translated. Further, if |
| the variable `po-auto-select-on-unfuzzy' is not `nil', the `<TAB>' |
| command will automatically search for another interesting entry to |
| work on. The initial value of `po-auto-select-on-unfuzzy' is `nil'. |
| |
| The initial value of `po-auto-fuzzy-on-edit' is `nil'. However, if |
| the variable `po-auto-fuzzy-on-edit' is set to `t', any entry edited |
| through the `<RET>' command is marked fuzzy, as a way to ensure some |
| kind of double check, later. In this case, the usual paradigm is that |
| an entry becomes fuzzy (if not already) whenever the translator |
| modifies it. If she is satisfied with the translation, she then uses |
| `<TAB>' to pick another entry to work on, clearing the fuzzy attribute |
| at the same time. If she is not satisfied yet, she merely uses `<SPC>' |
| to move to another entry, leaving the entry fuzzy. |
| |
| The translator may also use the `<DEL>' command |
| (`po-fade-out-entry') over any translated entry to mark it as being |
| fuzzy, as a reminder that she wants to return to this entry later. |
| |
| Also, when the time comes to quit working on a PO file buffer with the |
| `q' command, the translator is asked for confirmation if any fuzzy |
| entries remain. |
| |
| |
| File: gettext.info, Node: Untranslated Entries, Next: Obsolete Entries, Prev: Fuzzy Entries, Up: PO Mode |
| |
| 8.3.7 Untranslated Entries |
| -------------------------- |
| |
| When `xgettext' originally creates a PO file, unless told otherwise, |
| it initializes the `msgid' field with the untranslated string, and |
| leaves the `msgstr' string to be empty. Such entries, having an empty |
| translation, are said to be "untranslated" entries. Later, when the |
| programmer slightly modifies some string right in the program, this |
| change is later reflected in the PO file by the appearance of a new |
| untranslated entry for the modified string. |
| |
| The usual commands moving from entry to entry consider untranslated |
| entries on the same level as active entries. Untranslated entries are |
| easily recognizable by the fact they end with `msgstr ""'. |
| |
| The work of the translator might be (quite naively) seen as the |
| process of seeking for an untranslated entry, editing a translation for |
| it, and repeating these actions until no untranslated entries remain. |
| Some commands are more specifically related to untranslated entry |
| processing. |
| |
| `u' |
| Find the next untranslated entry (`po-next-untranslated-entry'). |
| |
| `U' |
| Find the previous untranslated entry |
| (`po-previous-untranslated-entry'). |
| |
| `k' |
| Turn the current entry into an untranslated one (`po-kill-msgstr'). |
| |
| |
| The commands `u' (`po-next-untranslated-entry') and `U' |
| (`po-previous-untranslated-entry') move forwards or backwards, |
| searching for an untranslated entry. If none is found, the search is |
| extended and wraps around in the PO file buffer. |
| |
| An entry can be turned back into an untranslated entry by merely |
| emptying its translation, using the command `k' (`po-kill-msgstr'). |
| *Note Modifying Translations::. |
| |
| Also, when the time comes to quit working on a PO file buffer with the |
| `q' command, the translator is asked for confirmation if any |
| untranslated entries remain. |
| |
| |
| File: gettext.info, Node: Obsolete Entries, Next: Modifying Translations, Prev: Untranslated Entries, Up: PO Mode |
| |
| 8.3.8 Obsolete Entries |
| ---------------------- |
| |
| By "obsolete" PO file entries, we mean those entries which are |
| commented out, usually by `msgmerge' when it finds that the translation |
| is no longer needed by the package being localized. |
| |
| The usual commands moving from entry to entry consider obsolete |
| entries on the same level as active entries. Obsolete entries are |
| easily recognizable by the fact that all their lines start with `#', |
| even those lines containing `msgid' or `msgstr'. |
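The four kinds of entries described so far can be told apart from the text of an entry alone. The following Python sketch is only an illustration under stated assumptions: the function name `classify_entry' is hypothetical, obsolete lines are taken to start with `#~', the fuzzy attribute is looked for on a `#,' comment line, and plural forms (`msgstr[0]' and so on) are not handled.

```python
def classify_entry(entry_text):
    """Classify one PO entry as obsolete, fuzzy, untranslated or translated."""
    lines = [line.strip() for line in entry_text.splitlines() if line.strip()]
    # Obsolete entries are commented out (assumption: with a `#~' prefix).
    if any(line.startswith("#~") for line in lines):
        return "obsolete"
    # Attributes live on a special comment line, separated by commas.
    for line in lines:
        if line.startswith("#,"):
            flags = [flag.strip() for flag in line[2:].split(",")]
            if "fuzzy" in flags:
                return "fuzzy"
    # Concatenate the msgstr string and its continuation lines.
    translation = ""
    in_msgstr = False
    for line in lines:
        if line.startswith("msgstr"):
            in_msgstr = True
            line = line[len("msgstr"):].strip()
        elif not line.startswith('"'):
            in_msgstr = False
        if in_msgstr and line.startswith('"') and line.endswith('"'):
            translation += line[1:-1]
    return "translated" if translation else "untranslated"
```

Note that a multi-line translation also begins with `msgstr ""', which is why the continuation lines have to be examined before an entry is declared untranslated.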
| |
| Commands exist for emptying the translation or reinitializing it to |
| the original untranslated string. Commands interfacing with the kill |
| ring may force some previously saved text into the translation. The |
| user may interactively edit the translation. All these commands may |
| apply to obsolete entries, carefully leaving the entry obsolete after |
| the fact. |
| |
| Moreover, some commands are more specifically related to obsolete |
| entry processing. |
| |
| `o' |
| Find the next obsolete entry (`po-next-obsolete-entry'). |
| |
| `O' |
| Find the previous obsolete entry (`po-previous-obsolete-entry'). |
| |
| `<DEL>' |
| Make an active entry obsolete, or zap out an obsolete entry |
| (`po-fade-out-entry'). |
| |
| |
| The commands `o' (`po-next-obsolete-entry') and `O' |
| (`po-previous-obsolete-entry') move forwards or backwards, chasing for |
| an obsolete entry. If none is found, the search is extended and wraps |
| around in the PO file buffer. |
| |
| PO mode does not provide ways for un-commenting an obsolete entry |
| and making it active, because this would reintroduce an original |
| untranslated string which does not correspond to any marked string in |
| the program sources. This goes with the philosophy of never |
| introducing useless `msgid' values. |
| |
| However, it is possible to comment out an active entry, so making it |
| obsolete. GNU `gettext' utilities will later react to the |
| disappearance of a translation by using the untranslated string. The |
| command `<DEL>' (`po-fade-out-entry') pushes the current entry a little |
| further towards annihilation. If the entry is active (it is a |
| translated entry), then it is first made fuzzy. If it is already fuzzy, |
| then the entry is merely commented out, with confirmation. If the entry |
| is already obsolete, then it is completely deleted from the PO file. |
| It is easy to recycle the translation so deleted into some other PO file |
| entry, usually one which is untranslated. *Note Modifying |
| Translations::. |
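The successive effects of `<DEL>' amount to a small state machine. The Python sketch below merely restates the three transitions given above; the table name `FADE_OUT', the pseudo-state "deleted", and the error raised for any other kind of entry are assumptions made for the example, as the text does not say what happens to an untranslated entry.

```python
# Each use of `<DEL>' pushes an entry one step further towards annihilation.
FADE_OUT = {
    "translated": "fuzzy",    # an active, translated entry is first made fuzzy
    "fuzzy": "obsolete",      # a fuzzy entry is commented out (after confirmation)
    "obsolete": "deleted",    # an obsolete entry is removed from the PO file
}


def fade_out(kind):
    """Return the state an entry of type KIND reaches after one `<DEL>'."""
    try:
        return FADE_OUT[kind]
    except KeyError:
        # Assumption: other kinds of entries are outside this sketch.
        raise ValueError("transition not described for %r" % kind)
```

Three successive applications thus take a translated entry all the way to deletion.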
| |
| Here is a quite interesting problem to solve for later development of |
| PO mode, for those nights you are not sleepy. The idea would be that |
| PO mode might become bright enough, one of these days, to make good |
| guesses at retrieving the most probable candidate, among all obsolete |
| entries, for initializing the translation of a newly appeared string. |
| I think it might be quite a hard problem to do this algorithmically, as |
| we have to develop good and efficient measures of string similarity. |
| Right now, PO mode leaves the decision entirely to the translator when |
| the time comes to find the adequate obsolete translation; it merely |
| tries to provide handy tools to help her do so. |
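To give an idea of what such a guess could look like, here is a small Python sketch built on the standard `difflib' module. It is not part of PO mode, and the similarity measure of `difflib' is only one candidate among many; the function name `best_obsolete_candidate', the dictionary of obsolete entries and the cutoff of 0.6 are all assumptions made for the example.

```python
import difflib


def best_obsolete_candidate(new_msgid, obsolete, cutoff=0.6):
    """Pick the obsolete entry whose msgid is most similar to NEW_MSGID.

    OBSOLETE maps the msgid of each obsolete entry to its translation.
    Returns a (msgid, translation) pair, or None when nothing is similar
    enough according to CUTOFF.
    """
    matches = difflib.get_close_matches(
        new_msgid, list(obsolete), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return matches[0], obsolete[matches[0]]
```

A translation retrieved this way would still need revision by the translator, much like the fuzzy entries produced by `msgmerge'.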
| |
| |
| File: gettext.info, Node: Modifying Translations, Next: Modifying Comments, Prev: Obsolete Entries, Up: PO Mode |
| |
| 8.3.9 Modifying Translations |
| ---------------------------- |
| |
| PO mode prevents direct modification of the PO file by the usual |
| means Emacs gives for altering a buffer's contents. By doing so, it |
| aims to help the translator avoid little clerical errors about |
| the overall file format, or the proper quoting of strings, as those |
| errors would be easily made. Other kinds of errors are still possible, |
| but some may be caught and diagnosed by the batch validation process, |
| which the translator may always trigger by the `V' command. For all |
| other errors, the translator has to rely on her own judgment, and also |
| on the linguistic reports submitted to her by the users of the |
| translated package, having the same mother tongue. |
| |
| When the time comes to create a translation, or to correct an error |
| diagnosed mechanically or reported by a user, the translator has to |
| resort to the following commands for modifying the translations. |
| |
| `<RET>' |
| Interactively edit the translation (`po-edit-msgstr'). |
| |
| `<LFD>' |
| `C-j' |
| Reinitialize the translation with the original, untranslated string |
| (`po-msgid-to-msgstr'). |
| |
| `k' |
| Save the translation on the kill ring, and delete it |
| (`po-kill-msgstr'). |
| |
| `w' |
| Save the translation on the kill ring, without deleting it |
| (`po-kill-ring-save-msgstr'). |
| |
| `y' |
| Replace the translation, taking the new from the kill ring |
| (`po-yank-msgstr'). |
| |
| |
| The command `<RET>' (`po-edit-msgstr') opens a new Emacs window |
| meant to edit in a new translation, or to modify an already existing |
|