fe/imp/ConvertUTF.h File Reference

Conversions between UTF-32, UTF-16, and UTF-8.

Namespaces

namespace fe

A scope for Ferry API.

namespace fe::imp

A scope for imported stuff.

Typedefs

typedef unsigned short fe::imp::UTF16

typedef unsigned long fe::imp::UTF32

typedef unsigned char fe::imp::UTF8

Enumerations

enum fe::imp::ConversionFlags { fe::imp::strictConversion, fe::imp::lenientConversion, fe::imp::firstCodePointStrictConversion, fe::imp::firstCodePointLenientConversion }

enum fe::imp::ConversionResult { fe::imp::conversionOK, fe::imp::sourceExhausted, fe::imp::targetExhausted, fe::imp::sourceIllegal }

Functions

fe::imp::ConversionResult Fe_ConvertUTF16toUTF32 (const fe::imp::UTF16 **sourceStart, const fe::imp::UTF16 *sourceEnd, fe::imp::UTF32 **targetStart, fe::imp::UTF32 *targetEnd, fe::imp::ConversionFlags flags)

fe::imp::ConversionResult Fe_ConvertUTF16toUTF8 (const fe::imp::UTF16 **sourceStart, const fe::imp::UTF16 *sourceEnd, fe::imp::UTF8 **targetStart, fe::imp::UTF8 *targetEnd, fe::imp::ConversionFlags flags)

fe::imp::ConversionResult Fe_ConvertUTF32toUTF16 (const fe::imp::UTF32 **sourceStart, const fe::imp::UTF32 *sourceEnd, fe::imp::UTF16 **targetStart, fe::imp::UTF16 *targetEnd, fe::imp::ConversionFlags flags)

fe::imp::ConversionResult Fe_ConvertUTF32toUTF8 (const fe::imp::UTF32 **sourceStart, const fe::imp::UTF32 *sourceEnd, fe::imp::UTF8 **targetStart, fe::imp::UTF8 *targetEnd, fe::imp::ConversionFlags flags)

fe::imp::ConversionResult Fe_ConvertUTF8toUTF16 (const fe::imp::UTF8 **sourceStart, const fe::imp::UTF8 *sourceEnd, fe::imp::UTF16 **targetStart, fe::imp::UTF16 *targetEnd, fe::imp::ConversionFlags flags)

fe::imp::ConversionResult Fe_ConvertUTF8toUTF32 (const fe::imp::UTF8 **sourceStart, const fe::imp::UTF8 *sourceEnd, fe::imp::UTF32 **targetStart, fe::imp::UTF32 *targetEnd, fe::imp::ConversionFlags flags)

bool Fe_IsLegalUTF32 (fe::imp::UTF32 ch)

bool Fe_IsLegalUTF8Sequence (const fe::imp::UTF8 *source, const fe::imp::UTF8 *sourceEnd)

Detailed Description

Conversions between UTF-32, UTF-16, and UTF-8.

This file is a modified version of http://unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.h, and its implementation is a modified version of http://unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c.

Several functions are included here, forming a complete set of conversions between the three formats.

Each of these routines takes pointers to input buffers and output buffers. The input buffers are const.

Each routine converts the text between *sourceStart and sourceEnd, putting the result into the buffer between *targetStart and targetEnd. Note: the end pointers point one *past* the last item, e.g. (sourceEnd - 1) is the last item.

The return result indicates whether the conversion was successful, and if not, whether the problem was in the source or target buffers. (Only the first encountered problem is indicated.)

After the conversion, *sourceStart and *targetStart are both updated to point just past the last text successfully converted in the respective buffers.

Input parameters:

sourceStart - pointer to a pointer to the source buffer. The contents of this are modified on return so that it points at the next thing to be converted;
targetStart - similarly, pointer to pointer to the target buffer;
sourceEnd, targetEnd - respectively pointers to the ends of the two buffers, for overflow checking only.
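
The pointer protocol described above can be sketched with a small stand-in converter. This is not the Ferry implementation: the names Utf16, Utf32, Result and sketchUtf16ToUtf32 are hypothetical, the flags argument is omitted, and the routine always behaves strictly. It only illustrates how *sourceStart and *targetStart advance and how the half-open end pointers are used.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the documented types; NOT the Ferry header.
typedef unsigned short Utf16;
typedef unsigned long  Utf32;
enum Result { kOk, kSourceExhausted, kTargetExhausted, kSourceIllegal };

// Strict UTF-16 -> UTF-32 conversion that follows the documented protocol:
// both start pointers are passed by address and updated on return, and the
// end pointers mark one past the last item of each buffer.
Result sketchUtf16ToUtf32(const Utf16** sourceStart, const Utf16* sourceEnd,
                          Utf32** targetStart, Utf32* targetEnd) {
    assert(sourceStart && targetStart);
    const Utf16* src = *sourceStart;
    Utf32* dst = *targetStart;
    Result result = kOk;
    while (src < sourceEnd) {
        const Utf16* unitStart = src;  // where the current code point begins
        Utf32 ch = *src++;
        if (ch >= 0xD800 && ch <= 0xDBFF) {           // high surrogate
            if (src >= sourceEnd) {                   // pair is cut off
                src = unitStart; result = kSourceExhausted; break;
            }
            Utf32 low = *src;
            if (low < 0xDC00 || low > 0xDFFF) {       // no low surrogate follows
                src = unitStart; result = kSourceIllegal; break;
            }
            ++src;
            ch = ((ch - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
        } else if (ch >= 0xDC00 && ch <= 0xDFFF) {    // unpaired low surrogate
            src = unitStart; result = kSourceIllegal; break;
        }
        if (dst >= targetEnd) {                       // no room for this code point
            src = unitStart; result = kTargetExhausted; break;
        }
        *dst++ = ch;
    }
    *sourceStart = src;  // next item to convert
    *targetStart = dst;  // next free slot in the target
    return result;
}
```

A caller would pass the addresses of its own cursor variables, for example sketchUtf16ToUtf32(&s, in + n, &t, out + m), and on a target overflow could flush the output, reset t, and call again with the updated s.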

These conversion functions take a fe::imp::ConversionFlags argument. When it is fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion, both irregular sequences and isolated surrogates cause an error. When it is fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, both irregular sequences and isolated surrogates are converted. When it is fe::imp::firstCodePointStrictConversion or fe::imp::firstCodePointLenientConversion, the function returns after converting the first code point.

If the flag is fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion, all illegal sequences will cause an error return. This includes sequences such as: <F4 90 80 80>, <C0 80>, or <A0> in UTF-8, and values above 0x10FFFF in UTF-32.
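
To make the UTF-8 examples above concrete, here is a minimal well-formedness check written from the Unicode definition of UTF-8. The function sketchIsLegalUtf8 and its (pointer, length) signature are hypothetical and are not Fe_IsLegalUTF8Sequence; the sketch only shows why each of the three listed sequences is rejected.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, NOT part of the Ferry API: returns true when the
// `length` bytes at `s` form exactly one well-formed UTF-8 sequence.
bool sketchIsLegalUtf8(const unsigned char* s, std::size_t length) {
    assert(s);
    if (length == 0) return false;
    const unsigned char lead = s[0];
    std::size_t need = 0;
    unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the second byte
    if (lead <= 0x7F) {
        need = 1;
    } else if (lead >= 0xC2 && lead <= 0xDF) {
        need = 2;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        need = 3;
        if (lead == 0xE0) lo = 0xA0;     // excludes overlong 3-byte forms
        if (lead == 0xED) hi = 0x9F;     // excludes surrogates D800..DFFF
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        need = 4;
        if (lead == 0xF0) lo = 0x90;     // excludes overlong 4-byte forms
        if (lead == 0xF4) hi = 0x8F;     // excludes values above 0x10FFFF
    } else {
        return false;  // 0x80..0xC1 and 0xF5..0xFF never start a sequence
    }
    if (length != need) return false;
    if (need > 1 && (s[1] < lo || s[1] > hi)) return false;
    for (std::size_t i = 2; i < need; ++i) {
        if (s[i] < 0x80 || s[i] > 0xBF) return false;  // continuation bytes
    }
    return true;
}
```

Under this check, <F4 90 80 80> fails because it would encode a value above 0x10FFFF, <C0 80> fails because C0 can only begin an overlong form, and <A0> fails because a continuation byte cannot start a sequence.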

When the flag is set to fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, characters over 0x10FFFF are converted to the replacement character; otherwise (when the flag is set to fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion) they constitute an error.
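
The strict and lenient treatment of a single UTF-32 value can be sketched as follows. The struct SketchOutcome and the function sketchFilterUtf32 are hypothetical illustrations, not library code; the sketch assumes that the replacement character is U+FFFD and that lenient mode passes surrogate values through unchanged, as the flag description above suggests. How the real routines report the condition through fe::imp::ConversionResult is not modeled here.

```cpp
#include <cassert>

// Hypothetical result record, NOT part of the Ferry API.
struct SketchOutcome {
    bool illegal;         // the input value was not a legal code point
    bool produced;        // a value was emitted to the output
    unsigned long value;  // emitted value; meaningful only when produced
};

// Applies the strict/lenient rule to one UTF-32 value.
SketchOutcome sketchFilterUtf32(unsigned long ch, bool strict) {
    const unsigned long kMaxLegal = 0x10FFFFUL;
    const unsigned long kReplacement = 0xFFFDUL;  // assumed replacement character
    const bool tooLarge = ch > kMaxLegal;
    const bool surrogate = ch >= 0xD800UL && ch <= 0xDFFFUL;

    SketchOutcome out;
    out.illegal = tooLarge || surrogate;
    if (out.illegal && strict) {
        out.produced = false;  // strict: stop with an error, emit nothing
        out.value = 0;
    } else {
        out.produced = true;   // lenient, or the value was legal anyway
        out.value = tooLarge ? kReplacement : ch;
    }
    // Whatever is emitted never exceeds the Unicode range.
    assert(!out.produced || out.value <= kMaxLegal);
    return out;
}
```

For example, 0x110000 yields no output in strict mode and U+FFFD in lenient mode, while a lone surrogate such as 0xD800 is rejected in strict mode and emitted as-is in lenient mode.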

Output parameters:

The value fe::imp::sourceIllegal is returned from some routines if the input sequence is malformed. When fe::imp::sourceIllegal is returned, the source pointer points to the illegal value that caused the problem. E.g., when a UTF-8 sequence is malformed, it points to the start of the malformed sequence. If the flag is fe::imp::lenientConversion and the input sequence is malformed, the return value indicates an error, but the conversion completes, i.e. the input sequence is read and processed up to sourceEnd. If the flag is fe::imp::firstCodePointLenientConversion and the input sequence is malformed, the return value indicates an error, but the conversion completes, i.e. the input sequence is read and processed up to the first code unit following the last code unit of the decoded code point.

Note: When the flag is set to fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, the conversion routines may produce output code unit sequences that encode illegal characters. This is intended behavior, though it is not Unicode conformant.

