fe/imp/ConvertUTF.h File Reference
Conversions between UTF-32, UTF-16, and UTF-8.
|
Namespaces |
namespace | fe |
| A scope for Ferry API.
|
namespace | fe::imp |
| A scope for imported stuff.
|
Typedefs |
typedef unsigned short | fe::imp::UTF16 |
typedef unsigned long | fe::imp::UTF32 |
typedef unsigned char | fe::imp::UTF8 |
Enumerations |
enum | fe::imp::ConversionFlags { fe::imp::strictConversion,
fe::imp::lenientConversion,
fe::imp::firstCodePointStrictConversion,
fe::imp::firstCodePointLenientConversion
} |
enum | fe::imp::ConversionResult { fe::imp::conversionOK,
fe::imp::sourceExhausted,
fe::imp::targetExhausted,
fe::imp::sourceIllegal
} |
Functions |
fe::imp::ConversionResult | Fe_ConvertUTF16toUTF32 (const fe::imp::UTF16 **sourceStart, const fe::imp::UTF16 *sourceEnd, fe::imp::UTF32 **targetStart, fe::imp::UTF32 *targetEnd, fe::imp::ConversionFlags flags) |
fe::imp::ConversionResult | Fe_ConvertUTF16toUTF8 (const fe::imp::UTF16 **sourceStart, const fe::imp::UTF16 *sourceEnd, fe::imp::UTF8 **targetStart, fe::imp::UTF8 *targetEnd, fe::imp::ConversionFlags flags) |
fe::imp::ConversionResult | Fe_ConvertUTF32toUTF16 (const fe::imp::UTF32 **sourceStart, const fe::imp::UTF32 *sourceEnd, fe::imp::UTF16 **targetStart, fe::imp::UTF16 *targetEnd, fe::imp::ConversionFlags flags) |
fe::imp::ConversionResult | Fe_ConvertUTF32toUTF8 (const fe::imp::UTF32 **sourceStart, const fe::imp::UTF32 *sourceEnd, fe::imp::UTF8 **targetStart, fe::imp::UTF8 *targetEnd, fe::imp::ConversionFlags flags) |
fe::imp::ConversionResult | Fe_ConvertUTF8toUTF16 (const fe::imp::UTF8 **sourceStart, const fe::imp::UTF8 *sourceEnd, fe::imp::UTF16 **targetStart, fe::imp::UTF16 *targetEnd, fe::imp::ConversionFlags flags) |
fe::imp::ConversionResult | Fe_ConvertUTF8toUTF32 (const fe::imp::UTF8 **sourceStart, const fe::imp::UTF8 *sourceEnd, fe::imp::UTF32 **targetStart, fe::imp::UTF32 *targetEnd, fe::imp::ConversionFlags flags) |
bool | Fe_IsLegalUTF32 (fe::imp::UTF32 ch) |
bool | Fe_IsLegalUTF8Sequence (const fe::imp::UTF8 *source, const fe::imp::UTF8 *sourceEnd) |
Detailed Description
Conversions between UTF-32, UTF-16, and UTF-8.
This file is a modified version of http://unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.h; the implementation is likewise a modified version of http://unicode.org/Public/PROGRAMS/CVTUTF/ConvertUTF.c.
Several functions are included here, forming a complete set of conversions between the three formats.
Each of these routines takes pointers to input buffers and output buffers. The input buffers are const.
Each routine converts the text between *sourceStart and sourceEnd, putting the result into the buffer between *targetStart and targetEnd. Note: the end pointers are *after* the last item: e.g. (sourceEnd - 1) is the last item.
The return result indicates whether the conversion was successful, and if not, whether the problem was in the source or target buffers. (Only the first encountered problem is indicated.)
After the conversion, *sourceStart and *targetStart are both updated to point just past the last text successfully converted in the respective buffers.
Input parameters:
- sourceStart - pointer to a pointer to the source buffer. The pointed-to pointer is modified on return so that it points at the next item to be converted;
- targetStart - similarly, pointer to pointer to the target buffer;
- sourceEnd, targetEnd - respectively pointers to the ends of the two buffers, for overflow checking only.
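The calling convention described above (half-open ranges, pointer-to-pointer start arguments that are advanced on return) can be illustrated with a small stand-alone sketch. The namespace calling_sketch, its type aliases, and the function convertUTF32toUTF8 below are hypothetical stand-ins written for this example; they are not the Ferry declarations and only mimic the strict behaviour for one conversion direction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are NOT the fe::imp declarations.
namespace calling_sketch {

using UTF8 = unsigned char;
using UTF32 = std::uint32_t;

enum ConversionResult { conversionOK, sourceExhausted, targetExhausted, sourceIllegal };

// Strict UTF-32 to UTF-8 conversion that follows the pointer convention described above.
inline ConversionResult convertUTF32toUTF8(const UTF32** sourceStart, const UTF32* sourceEnd,
                                           UTF8** targetStart, UTF8* targetEnd) {
    const UTF32* source = *sourceStart;
    UTF8* target = *targetStart;
    ConversionResult result = conversionOK;
    while (source < sourceEnd) {  // sourceEnd points one past the last item
        const UTF32 ch = *source;
        // Strict behaviour: surrogates and values above 0x10FFFF are errors.
        if ((ch >= 0xD800 && ch <= 0xDFFF) || ch > 0x10FFFF) {
            result = sourceIllegal;
            break;
        }
        const std::size_t bytes = ch < 0x80 ? 1 : ch < 0x800 ? 2 : ch < 0x10000 ? 3 : 4;
        if (static_cast<std::size_t>(targetEnd - target) < bytes) {
            result = targetExhausted;  // nothing is written for this code point
            break;
        }
        switch (bytes) {
        case 1:
            target[0] = static_cast<UTF8>(ch);
            break;
        case 2:
            target[0] = static_cast<UTF8>(0xC0 | (ch >> 6));
            target[1] = static_cast<UTF8>(0x80 | (ch & 0x3F));
            break;
        case 3:
            target[0] = static_cast<UTF8>(0xE0 | (ch >> 12));
            target[1] = static_cast<UTF8>(0x80 | ((ch >> 6) & 0x3F));
            target[2] = static_cast<UTF8>(0x80 | (ch & 0x3F));
            break;
        default:
            target[0] = static_cast<UTF8>(0xF0 | (ch >> 18));
            target[1] = static_cast<UTF8>(0x80 | ((ch >> 12) & 0x3F));
            target[2] = static_cast<UTF8>(0x80 | ((ch >> 6) & 0x3F));
            target[3] = static_cast<UTF8>(0x80 | (ch & 0x3F));
            break;
        }
        target += bytes;
        ++source;
    }
    // Both start pointers now point just past the last successfully converted text.
    *sourceStart = source;
    *targetStart = target;
    return result;
}

// Usage: a target buffer that is too small for the whole input.
inline void example() {
    const UTF32 text[] = {0x41, 0xE9, 0x20AC};  // U+0041, U+00E9, U+20AC
    UTF8 out[4];
    const UTF32* source = text;
    UTF8* target = out;
    const ConversionResult result = convertUTF32toUTF8(&source, text + 3, &target, out + 4);
    assert(result == targetExhausted);  // U+20AC needs 3 bytes, only 1 is left
    assert(source == text + 2);         // points at the code point not yet converted
    assert(target == out + 3);          // 1 byte for U+0041, 2 bytes for U+00E9
}

}  // namespace calling_sketch
```

Because the start pointers are advanced in place, a caller can resume after targetExhausted by supplying a fresh target buffer and calling again with the same source pointer.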
These conversion functions take a fe::imp::ConversionFlags argument. When the flag is set to fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion, both irregular sequences and isolated surrogates cause an error. When the flag is set to fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, both irregular sequences and isolated surrogates are converted. When the flag is set to fe::imp::firstCodePointStrictConversion or fe::imp::firstCodePointLenientConversion, the function returns after converting the first code point.
If the flag is fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion, all illegal sequences will cause an error return. This includes sequences such as: <F4 90 80 80>, <C0 80>, or <A0> in UTF-8, and values above 0x10FFFF in UTF-32.
When the flag is set to fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, characters over 0x10FFFF are converted to the replacement character; otherwise (when the flag is set to fe::imp::strictConversion or fe::imp::firstCodePointStrictConversion) they constitute an error.
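To make the listed illegal sequences concrete, the following is a minimal sketch of a well-formedness check for a single UTF-8 sequence, based on the byte ranges in the Unicode Standard's table of well-formed UTF-8 byte sequences. The namespace utf8_sketch and the function wellFormedUTF8Length are hypothetical names introduced here; this is not the implementation of Fe_IsLegalUTF8Sequence, whose exact behaviour is not described on this page.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only; NOT part of the Ferry API.
namespace utf8_sketch {

// Returns the length (1..4) of the well-formed UTF-8 sequence starting at
// `source`, or 0 if the bytes are ill-formed or cut off by `sourceEnd`.
inline std::size_t wellFormedUTF8Length(const unsigned char* source,
                                        const unsigned char* sourceEnd) {
    if (source >= sourceEnd) return 0;
    const unsigned char lead = *source;
    if (lead <= 0x7F) return 1;  // ASCII

    std::size_t length = 0;
    unsigned char lo = 0x80;  // allowed range of the second byte
    unsigned char hi = 0xBF;
    if (lead >= 0xC2 && lead <= 0xDF) {
        length = 2;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        length = 3;
        if (lead == 0xE0) lo = 0xA0;  // rejects overlong 3-byte forms
        if (lead == 0xED) hi = 0x9F;  // rejects surrogates D800..DFFF
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        length = 4;
        if (lead == 0xF0) lo = 0x90;  // rejects overlong 4-byte forms
        if (lead == 0xF4) hi = 0x8F;  // rejects values above 0x10FFFF
    } else {
        return 0;  // 80..BF (stray trail byte), C0..C1 (overlong), F5..FF
    }

    if (static_cast<std::size_t>(sourceEnd - source) < length) return 0;  // truncated
    if (source[1] < lo || source[1] > hi) return 0;
    for (std::size_t i = 2; i < length; ++i) {
        if (source[i] < 0x80 || source[i] > 0xBF) return 0;
    }
    return length;
}

}  // namespace utf8_sketch
```

Under this check, <F4 90 80 80> fails because the second byte exceeds 8F (the value would be above 0x10FFFF), <C0 80> fails because C0 can only start an overlong form, and <A0> fails because a trail byte cannot start a sequence.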
Output parameters:
- The value fe::imp::sourceIllegal is returned from some routines if the input sequence is malformed. When fe::imp::sourceIllegal is returned, the source pointer points to the illegal value that caused the problem. E.g., in UTF-8, when a sequence is malformed, it points to the start of the malformed sequence. If the flag is fe::imp::lenientConversion and the input sequence is malformed, the return value indicates an error, but the conversion completes, i.e. the input sequence is read and processed up to sourceEnd. If the flag is fe::imp::firstCodePointLenientConversion and the input sequence is malformed, the return value indicates an error, but the conversion completes, i.e. the input sequence is read and processed up to the first code unit following the last code unit of the decoded code point.
- Note:
- When the flag is set to fe::imp::lenientConversion or fe::imp::firstCodePointLenientConversion, the conversion routines may produce output code unit sequences that encode illegal characters. This is intended behavior, though it is not Unicode conformant.
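The lenient behaviour described above can be sketched for one direction, UTF-32 to UTF-16. Everything in the namespace lenient_sketch is a hypothetical stand-in, not the Ferry code. The sketch assumes three things taken from the description rather than from the implementation: values above 0x10FFFF become the replacement character U+FFFD, isolated surrogates are copied through unchanged (which yields non-conformant output), and the first problem encountered is remembered for the return value while conversion continues. Treating a surrogate in UTF-32 input as an error for the return value is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; these are NOT the fe::imp declarations.
namespace lenient_sketch {

using UTF16 = std::uint16_t;
using UTF32 = std::uint32_t;

enum ConversionResult { conversionOK, sourceExhausted, targetExhausted, sourceIllegal };

inline ConversionResult lenientUTF32toUTF16(const UTF32** sourceStart, const UTF32* sourceEnd,
                                            UTF16** targetStart, UTF16* targetEnd) {
    const UTF32* source = *sourceStart;
    UTF16* target = *targetStart;
    ConversionResult result = conversionOK;
    while (source < sourceEnd) {
        UTF32 ch = *source;
        if (ch > 0x10FFFF) {
            ch = 0xFFFD;  // replacement character
            if (result == conversionOK) result = sourceIllegal;
        } else if (ch >= 0xD800 && ch <= 0xDFFF) {
            // Assumption: the isolated surrogate is copied as is, but still reported.
            if (result == conversionOK) result = sourceIllegal;
        }
        const std::size_t units = ch < 0x10000 ? 1 : 2;
        if (static_cast<std::size_t>(targetEnd - target) < units) {
            // Only the first problem is reported; the pointers show where conversion stopped.
            if (result == conversionOK) result = targetExhausted;
            break;
        }
        if (units == 1) {
            target[0] = static_cast<UTF16>(ch);
        } else {
            const UTF32 v = ch - 0x10000;  // 20-bit value split into a surrogate pair
            target[0] = static_cast<UTF16>(0xD800 + (v >> 10));
            target[1] = static_cast<UTF16>(0xDC00 + (v & 0x3FF));
        }
        target += units;
        ++source;
    }
    *sourceStart = source;
    *targetStart = target;
    return result;
}

}  // namespace lenient_sketch
```

For the input 0x1F600, 0x110000, 0xD800, 0x41 this sketch writes D83D DE00 FFFD D800 0041 and returns sourceIllegal with the source pointer at the end of the input; the lone D800 in the output is the kind of non-conformant result the note refers to.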