fe/imp/ConvertUTF.h File Reference

Conversions between UTF-32, UTF-16, and UTF-8.


Namespaces

namespace  fe
 A scope for Ferry API.
namespace  fe::imp
 A scope for imported stuff.

Typedefs

typedef unsigned short fe::imp::UTF16
typedef unsigned long fe::imp::UTF32
typedef unsigned char fe::imp::UTF8

Enumerations

enum  fe::imp::ConversionFlags { fe::imp::strictConversion, fe::imp::lenientConversion, fe::imp::firstCodePointStrictConversion, fe::imp::firstCodePointLenientConversion }
enum  fe::imp::ConversionResult { fe::imp::conversionOK, fe::imp::sourceExhausted, fe::imp::targetExhausted, fe::imp::sourceIllegal }

Functions

fe::imp::ConversionResult Fe_ConvertUTF16toUTF32 (const fe::imp::UTF16 **sourceStart, const fe::imp::UTF16 *sourceEnd, fe::imp::UTF32 **targetStart, fe::imp::UTF32 *targetEnd, fe::imp::ConversionFlags flags)
fe::imp::ConversionResult Fe_ConvertUTF16toUTF8 (const fe::imp::UTF16 **sourceStart, const fe::imp::UTF16 *sourceEnd, fe::imp::UTF8 **targetStart, fe::imp::UTF8 *targetEnd, fe::imp::ConversionFlags flags)
fe::imp::ConversionResult Fe_ConvertUTF32toUTF16 (const fe::imp::UTF32 **sourceStart, const fe::imp::UTF32 *sourceEnd, fe::imp::UTF16 **targetStart, fe::imp::UTF16 *targetEnd, fe::imp::ConversionFlags flags)
fe::imp::ConversionResult Fe_ConvertUTF32toUTF8 (const fe::imp::UTF32 **sourceStart, const fe::imp::UTF32 *sourceEnd, fe::imp::UTF8 **targetStart, fe::imp::UTF8 *targetEnd, fe::imp::ConversionFlags flags)
fe::imp::ConversionResult Fe_ConvertUTF8toUTF16 (const fe::imp::UTF8 **sourceStart, const fe::imp::UTF8 *sourceEnd, fe::imp::UTF16 **targetStart, fe::imp::UTF16 *targetEnd, fe::imp::ConversionFlags flags)
fe::imp::ConversionResult Fe_ConvertUTF8toUTF32 (const fe::imp::UTF8 **sourceStart, const fe::imp::UTF8 *sourceEnd, fe::imp::UTF32 **targetStart, fe::imp::UTF32 *targetEnd, fe::imp::ConversionFlags flags)
bool Fe_IsLegalUTF32 (fe::imp::UTF32 ch)
bool Fe_IsLegalUTF8Sequence (const fe::imp::UTF8 *source, const fe::imp::UTF8 *sourceEnd)


Detailed Description

Conversions between UTF-32, UTF-16, and UTF-8.

This file is a modified version of http://unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.h, and the implementation is a modified version of http://unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c.

Several functions are included here, forming a complete set of conversions between the three formats.

Each of these routines takes pointers to input buffers and output buffers. The input buffers are const.

Each routine converts the text between *sourceStart and sourceEnd, putting the result into the buffer between *targetStart and targetEnd. Note: the end pointers point one past the last item, so (sourceEnd - 1) points to the last item.

The return result indicates whether the conversion was successful, and if not, whether the problem was in the source or target buffers. (Only the first encountered problem is indicated.)

After the conversion, *sourceStart and *targetStart are both updated to point just past the last text successfully converted in the respective buffers.
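The calling convention described above can be illustrated with a small sketch. The function toyConvertUTF32toUTF16 and the demo below are hypothetical stand-ins written for this example; they are not the Ferry implementation, they take no flags argument, and they always reject illegal input. The local typedefs and enum only mirror the names listed on this page. The sketch shows how the start pointers advance and how a caller might resume after a targetExhausted result.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins that mirror the names documented for fe::imp.
typedef unsigned long  UTF32;
typedef unsigned short UTF16;
enum ConversionResult { conversionOK, sourceExhausted, targetExhausted, sourceIllegal };

// Toy converter with the same pointer convention as the Fe_Convert* routines.
// Assumption: always strict; this is NOT the Ferry implementation.
ConversionResult toyConvertUTF32toUTF16(const UTF32** sourceStart, const UTF32* sourceEnd,
                                        UTF16** targetStart, UTF16* targetEnd)
{
    ConversionResult result = conversionOK;
    const UTF32* source = *sourceStart;
    UTF16* target = *targetStart;
    while (source < sourceEnd) {
        UTF32 ch = *source;
        if (ch > 0x10FFFFUL || (ch >= 0xD800UL && ch <= 0xDFFFUL)) {
            result = sourceIllegal;                 // stop at the offending item
            break;
        }
        std::size_t needed = (ch > 0xFFFFUL) ? 2 : 1;   // surrogate pair above the BMP
        if (static_cast<std::size_t>(targetEnd - target) < needed) {
            result = targetExhausted;               // source stays at the unconverted item
            break;
        }
        if (needed == 1) {
            *target++ = static_cast<UTF16>(ch);
        } else {
            ch -= 0x10000UL;
            *target++ = static_cast<UTF16>(0xD800 + (ch >> 10));
            *target++ = static_cast<UTF16>(0xDC00 + (ch & 0x3FF));
        }
        ++source;
    }
    *sourceStart = source;   // one past the last item converted
    *targetStart = target;   // one past the last item written
    return result;
}

// Converts "A" followed by U+1F600 into a buffer that is one unit too small,
// then resumes into a fresh buffer from where the first call stopped.
bool demoResume()
{
    const UTF32 text[] = { 0x41UL, 0x1F600UL };
    const UTF32* src = text;
    const UTF32* srcEnd = text + 2;     // one past the last item

    UTF16 small[2];
    UTF16* dst = small;
    ConversionResult r = toyConvertUTF32toUTF16(&src, srcEnd, &dst, small + 2);
    assert(r == targetExhausted);
    assert(src == text + 1 && dst == small + 1);

    UTF16 rest[2];
    dst = rest;
    r = toyConvertUTF32toUTF16(&src, srcEnd, &dst, rest + 2);
    return r == conversionOK && src == srcEnd && dst == rest + 2
        && small[0] == 0x41 && rest[0] == 0xD83D && rest[1] == 0xDE00;
}
```

Because the start pointers are passed by address, the caller never has to compute how much was consumed or produced; the difference between the updated pointer and the original buffer start gives the count directly.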

Input parameters:

These conversion functions take a fe::imp::ConversionFlags argument. When this flag is set to fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion, both irregular sequences and isolated surrogates cause an error. When the flag is set to fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, both irregular sequences and isolated surrogates are converted. When the flag is set to fe::imp::firstCodePointStrictConversion or fe::imp::firstCodePointLenientConversion, the function returns after converting the first code point.

If the flag is fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion, all illegal sequences will cause an error return. This includes sequences such as: <F4 90 80 80>, <C0 80>, or <A0> in UTF-8, and values above 0x10FFFF in UTF-32.
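To make the three UTF-8 examples concrete, here is a minimal sketch of a well-formedness check. The helper isWellFormedUtf8 is hypothetical and takes a length rather than the (source, sourceEnd) pair used by Fe_IsLegalUTF8Sequence; it follows the general UTF-8 rules and is not taken from the Ferry sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if the `length` bytes at `s` form exactly one
// well-formed UTF-8 sequence. A sketch of the rules, not Fe_IsLegalUTF8Sequence.
bool isWellFormedUtf8(const unsigned char* s, std::size_t length)
{
    if (length == 0) return false;
    unsigned char lead = s[0];
    std::size_t expected;
    unsigned long minValue;   // smallest code point that needs this many bytes
    unsigned long cp;
    if (lead < 0x80)      { expected = 1; minValue = 0x0;     cp = lead; }
    else if (lead < 0xC0) { return false; }              // lone continuation byte, e.g. <A0>
    else if (lead < 0xE0) { expected = 2; minValue = 0x80;    cp = lead & 0x1F; }
    else if (lead < 0xF0) { expected = 3; minValue = 0x800;   cp = lead & 0x0F; }
    else if (lead < 0xF8) { expected = 4; minValue = 0x10000; cp = lead & 0x07; }
    else                  { return false; }              // F8..FF never appear in UTF-8
    if (length != expected) return false;
    for (std::size_t i = 1; i < expected; ++i) {
        if ((s[i] & 0xC0) != 0x80) return false;         // continuation bytes are 10xxxxxx
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < minValue) return false;                     // overlong form, e.g. <C0 80>
    if (cp > 0x10FFFFUL) return false;                   // beyond Unicode, e.g. <F4 90 80 80>
    if (cp >= 0xD800UL && cp <= 0xDFFFUL) return false;  // surrogate code points
    return true;
}

// Checks the three illegal sequences named in the text plus one legal sequence.
bool demoIllegalSequences()
{
    const unsigned char beyond[]   = { 0xF4, 0x90, 0x80, 0x80 };  // decodes to 0x110000
    const unsigned char overlong[] = { 0xC0, 0x80 };              // overlong encoding of 0
    const unsigned char stray[]    = { 0xA0 };                    // continuation byte alone
    const unsigned char euro[]     = { 0xE2, 0x82, 0xAC };        // U+20AC, well-formed
    assert(isWellFormedUtf8(euro, sizeof euro));
    return !isWellFormedUtf8(beyond, sizeof beyond)
        && !isWellFormedUtf8(overlong, sizeof overlong)
        && !isWellFormedUtf8(stray, sizeof stray);
}
```

Each of the three sequences fails for a different reason: the first decodes to a value above 0x10FFFF, the second is an overlong encoding, and the third is a continuation byte with no lead byte.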

When the flag is set to fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, characters over 0x10FFFF are converted to the replacement character; otherwise (when the flag is set to fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion) they constitute an error.
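The difference between the two modes for out-of-range values can be sketched as follows. The function toyFilterCodePoint and the reduced two-value flags enum are hypothetical, and the sketch assumes the replacement character is U+FFFD, as in the upstream Unicode sample code; it covers only the over-0x10FFFF rule described above.

```cpp
#include <cassert>

// Hypothetical stand-ins that mirror the names documented for fe::imp.
typedef unsigned long UTF32;
enum ConversionResult { conversionOK, sourceExhausted, targetExhausted, sourceIllegal };
enum ConversionFlags { strictConversion, lenientConversion };  // reduced set for the sketch

const UTF32 kMaxLegalUTF32   = 0x10FFFFUL;
const UTF32 kReplacementChar = 0xFFFDUL;   // assumption: U+FFFD REPLACEMENT CHARACTER

// Passes a code point through, handling values above 0x10FFFF according to the flag.
ConversionResult toyFilterCodePoint(UTF32 in, ConversionFlags flags, UTF32* out)
{
    if (in > kMaxLegalUTF32) {
        if (flags == strictConversion) {
            return sourceIllegal;          // strict: report an error, write nothing
        }
        *out = kReplacementChar;           // lenient: substitute and carry on
        return conversionOK;
    }
    *out = in;
    return conversionOK;
}

// Runs the same out-of-range value through both modes.
bool demoStrictVersusLenient()
{
    UTF32 out = 0;
    ConversionResult strictResult = toyFilterCodePoint(0x110000UL, strictConversion, &out);
    assert(strictResult == sourceIllegal && out == 0);   // output untouched on error

    ConversionResult lenientResult = toyFilterCodePoint(0x110000UL, lenientConversion, &out);
    return strictResult == sourceIllegal
        && lenientResult == conversionOK
        && out == kReplacementChar;
}
```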

Output parameters:

Note:
When the flag is set to fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, the conversion routines may produce output code unit sequences that encode illegal characters. This is intended behavior, though it is not Unicode conformant.

Generated on Tue Nov 18 21:08:22 2008 for Ferry by doxygen 1.5.7.1