Unit synachar


Description

Charset conversion support

This unit contains routines for converting text between many charsets.

It uses either built-in conversion tables or the external Iconv library. Iconv is used when the Iconv library knows the requested conversion. When the Iconv library is not found, or does not know the requested conversion, the internal routines are used instead. (You can also disable Iconv support from your program; see DisableIconv.)

The internal routines know all major charsets for Europe and America. For East Asian charsets you must use the Iconv library.

Functions and Procedures

Overview

function CharsetConversion(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar): AnsiString;
function CharsetConversionEx(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar; const TransformTable: array of Word): AnsiString;
function CharsetConversionTrans(Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar; const TransformTable: array of Word; Translit: Boolean): AnsiString;
function GetCurCP: TMimeChar;
function GetCurOEMCP: TMimeChar;
function GetCPFromID(Value: AnsiString): TMimeChar;
function GetIDFromCP(Value: TMimeChar): AnsiString;
function NeedCharsetConversion(const Value: AnsiString): Boolean;
function IdealCharsetCoding(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeSetChar): TMimeChar;
function GetBOM(Value: TMimeChar): AnsiString;
function StringToWide(const Value: AnsiString): WideString;
function WideToString(const Value: WideString): AnsiString;

Description

function CharsetConversion(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar): AnsiString;

Convert Value from one charset to another. See: CharsetConversionEx
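A minimal sketch of a single conversion, assuming the Synapse units are on the compiler's unit path. The ToHex helper is hypothetical and exists only so the result can be inspected byte by byte; the sample text is built from byte escapes so it does not depend on the encoding of the source file.

```pascal
program ConvDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, synachar;

// Hypothetical helper: renders each byte of S as two hex digits.
function ToHex(const S: AnsiString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[I]), 2) + ' ';
end;

var
  Latin2, Utf8: AnsiString;
begin
  // Byte $E8 is 'c with caron' in ISO-8859-2.
  Latin2 := 'kole' + AnsiChar($E8) + 'ko';
  Utf8 := CharsetConversion(Latin2, ISO_8859_2, UTF_8);
  WriteLn('input : ', ToHex(Latin2));
  WriteLn('output: ', ToHex(Utf8));
end.
```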

function CharsetConversionEx(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar; const TransformTable: array of Word): AnsiString;

Convert Value from one charset to another, applying an additional character replacement table during the conversion. See: Replace_None and Replace_Czech
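As an illustration, the sketch below passes Replace_Czech as the replacement table while converting a Czech name from ISO-8859-2. The choice of ISO_8859_1 as the target is only for the example, and the exact output is not asserted here.

```pascal
program StripDiacritics;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

var
  Latin2, Plain: AnsiString;
begin
  // In ISO-8859-2, byte $F8 is 'r with caron' and byte $E1 is 'a with acute'.
  Latin2 := 'Dvo' + AnsiChar($F8) + AnsiChar($E1) + 'k';
  // Replace_Czech lists both characters, so the call asks for them to be
  // replaced with their base letters during the conversion.
  Plain := CharsetConversionEx(Latin2, ISO_8859_2, ISO_8859_1, Replace_Czech);
  WriteLn(Plain);
end.
```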

function CharsetConversionTrans(Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeChar; const TransformTable: array of Word; Translit: Boolean): AnsiString;

Convert Value from one charset to another, applying an additional character replacement table during the conversion. This function is similar to CharsetConversionEx, but it lets you disable transliteration of unconvertible characters.
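The following sketch calls the function twice on the same input, once with Translit set to True and once with False, so the two results can be compared. What is emitted for an unconvertible character in each mode is not specified on this page, so the example only prints the bytes. ToHex is a hypothetical helper.

```pascal
program TranslitDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, synachar;

// Hypothetical helper: renders each byte of S as two hex digits.
function ToHex(const S: AnsiString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[I]), 2) + ' ';
end;

var
  Utf8: AnsiString;
begin
  // UTF-8 bytes C5 99 encode U+0159 (r with caron), which ISO-8859-1 lacks.
  Utf8 := 'Dvo' + AnsiChar($C5) + AnsiChar($99) + 'ak';
  WriteLn('translit on : ',
    ToHex(CharsetConversionTrans(Utf8, UTF_8, ISO_8859_1, Replace_None, True)));
  WriteLn('translit off: ',
    ToHex(CharsetConversionTrans(Utf8, UTF_8, ISO_8859_1, Replace_None, False)));
end.
```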

function GetBOM(Value: TMimeChar): AnsiString;

Return the BOM (Byte Order Mark) for the given Unicode charset.
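A small sketch that prints the BOM bytes for a few Unicode charsets and prepends one to a payload. The selection of charsets and the ToHex helper are assumptions made for the example.

```pascal
program BomDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  SysUtils, synachar;

// Hypothetical helper: renders each byte of S as two hex digits.
function ToHex(const S: AnsiString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[I]), 2) + ' ';
end;

const
  Charsets: array[0..2] of TMimeChar = (UTF_8, UCS_2, UCS_2LE);
var
  I: Integer;
  Payload: AnsiString;
begin
  for I := Low(Charsets) to High(Charsets) do
    WriteLn(GetIDFromCP(Charsets[I]), ': ', ToHex(GetBOM(Charsets[I])));
  // Typical use: put the mark in front of text already encoded in that charset.
  Payload := GetBOM(UTF_8) + 'plain text';
  WriteLn('payload length with BOM: ', Length(Payload));
end.
```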

function GetCPFromID(Value: AnsiString): TMimeChar;

Convert a string containing a charset name to the corresponding TMimeChar value.
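A typical use is turning a charset name found in received data into a TMimeChar that the conversion functions accept. The sketch below assumes the name is spelled in a form the unit recognizes; how unknown names are handled is not described on this page.

```pascal
program HeaderCharset;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

var
  Declared, Body, Utf8: AnsiString;
  CP: TMimeChar;
begin
  Declared := 'ISO-8859-2';  // hypothetical value taken from a message header
  // Sample body bytes in that charset ($F8 and $E1 are accented letters).
  Body := 'Dvo' + AnsiChar($F8) + AnsiChar($E1) + 'k';
  CP := GetCPFromID(Declared);
  Utf8 := CharsetConversion(Body, CP, UTF_8);
  WriteLn('converted ', Length(Body), ' bytes to ', Length(Utf8), ' bytes');
end.
```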

function GetCurCP: TMimeChar;

Return the charset used by the operating system.

function GetCurOEMCP: TMimeChar;

Return the charset used by the operating system as the OEM charset (for example, in a Windows DOS box).
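The two functions can be combined with CharsetConversion to prepare text for the local environment. This is a sketch only: it prints the detected charset names and converts a UTF-8 string to the OEM charset without writing the converted bytes to the console, since console output handling differs between compilers.

```pascal
program LocalCharsets;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

var
  Utf8, ForConsole: AnsiString;
begin
  WriteLn('system charset: ', GetIDFromCP(GetCurCP));
  WriteLn('OEM charset   : ', GetIDFromCP(GetCurOEMCP));
  // UTF-8 bytes C3 A9 encode U+00E9 (e with acute).
  Utf8 := 'caf' + AnsiChar($C3) + AnsiChar($A9);
  ForConsole := CharsetConversion(Utf8, UTF_8, GetCurOEMCP);
  WriteLn('bytes after conversion: ', Length(ForConsole));
end.
```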

function GetIDFromCP(Value: TMimeChar): AnsiString;

Convert a TMimeChar value to a string containing the name of the charset.
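One place where the charset name is needed is when labelling outgoing data. The header line in this sketch is a hypothetical example and the exact spelling of the returned name is not asserted.

```pascal
program CharsetLabel;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

var
  CP: TMimeChar;
begin
  CP := ISO_8859_2;
  // Build a header line that announces the charset of the body.
  WriteLn('Content-Type: text/plain; charset=', GetIDFromCP(CP));
end.
```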

function IdealCharsetCoding(const Value: AnsiString; CharFrom: TMimeChar; CharTo: TMimeSetChar): TMimeChar;

Find the best target charset in a set of TMimeChar values, that is, the one that leaves the fewest unconvertible characters.
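A sketch of choosing a target charset before converting. The candidate set is an assumption for the example; which member is chosen for this input is not asserted.

```pascal
program PickCharset;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

var
  Latin2, Encoded: AnsiString;
  Best: TMimeChar;
begin
  // In ISO-8859-2, byte $F8 is 'r with caron' and byte $E1 is 'a with acute'.
  Latin2 := 'Dvo' + AnsiChar($F8) + AnsiChar($E1) + 'k';
  // Ask which of the candidate charsets can represent the text best.
  Best := IdealCharsetCoding(Latin2, ISO_8859_2, [ISO_8859_1, ISO_8859_2, UTF_8]);
  WriteLn('chosen charset: ', GetIDFromCP(Best));
  Encoded := CharsetConversion(Latin2, ISO_8859_2, Best);
  WriteLn('encoded length: ', Length(Encoded));
end.
```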

function NeedCharsetConversion(const Value: AnsiString): Boolean;

Return True when Value needs to be converted, that is, when it is not pure 7-bit ASCII.
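This check can be used to skip conversion for plain ASCII text. A minimal sketch, with a hypothetical Report procedure:

```pascal
program NeedsConversion;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

// Hypothetical helper: prints what the check says about S.
procedure Report(const Name, S: AnsiString);
begin
  if NeedCharsetConversion(S) then
    WriteLn(Name, ': not 7-bit ASCII, convert before use')
  else
    WriteLn(Name, ': 7-bit ASCII, no conversion needed');
end;

begin
  Report('ascii', 'hello');
  Report('latin2', 'kole' + AnsiChar($E8) + 'ko');  // contains byte $E8
end.
```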

function StringToWide(const Value: AnsiString): WideString;

Convert a binary string with Unicode content to a WideString.

function WideToString(const Value: WideString): AnsiString;

Convert a WideString to a binary string with Unicode content.
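The sketch below pairs these two functions with CharsetConversion. It assumes that the "binary string with Unicode content" is the UCS_2 form produced by CharsetConversion; this page does not state that explicitly, so treat the pairing as an assumption and check the round-trip result.

```pascal
program WideRoundTrip;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

var
  Latin2, Back: AnsiString;
  W: WideString;
begin
  // In ISO-8859-2, byte $F8 is 'r with caron' and byte $E1 is 'a with acute'.
  Latin2 := 'Dvo' + AnsiChar($F8) + AnsiChar($E1) + 'k';
  // Assumption: UCS_2 is the binary form StringToWide expects.
  W := StringToWide(CharsetConversion(Latin2, ISO_8859_2, UCS_2));
  WriteLn('wide length: ', Length(W));
  Back := CharsetConversion(WideToString(W), UCS_2, ISO_8859_2);
  WriteLn('round trip intact: ', Back = Latin2);
end.
```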

Types

TMimeChar

Type with all supported charsets.

  • ISO_8859_1:
  • ISO_8859_2:
  • ISO_8859_3:
  • ISO_8859_4:
  • ISO_8859_5:
  • ISO_8859_6:
  • ISO_8859_7:
  • ISO_8859_8:
  • ISO_8859_9:
  • ISO_8859_10:
  • ISO_8859_13:
  • ISO_8859_14:
  • ISO_8859_15:
  • CP1250:
  • CP1251:
  • CP1252:
  • CP1253:
  • CP1254:
  • CP1255:
  • CP1256:
  • CP1257:
  • CP1258:
  • KOI8_R:
  • CP895:
  • CP852:
  • UCS_2:
  • UCS_4:
  • UTF_8:
  • UTF_7:
  • UTF_7mod:
  • UCS_2LE:
  • UCS_4LE:
  • UTF_16:
  • UTF_16LE:
  • UTF_32:
  • UTF_32LE:
  • C99:
  • JAVA:
  • ISO_8859_16:
  • KOI8_U:
  • KOI8_RU:
  • CP862:
  • CP866:
  • MAC:
  • MACCE:
  • MACICE:
  • MACCRO:
  • MACRO:
  • MACCYR:
  • MACUK:
  • MACGR:
  • MACTU:
  • MACHEB:
  • MACAR:
  • MACTH:
  • ROMAN8:
  • NEXTSTEP:
  • ARMASCII:
  • GEORGIAN_AC:
  • GEORGIAN_PS:
  • KOI8_T:
  • MULELAO:
  • CP1133:
  • TIS620:
  • CP874:
  • VISCII:
  • TCVN:
  • ISO_IR_14:
  • JIS_X0201:
  • JIS_X0208:
  • JIS_X0212:
  • GB1988_80:
  • GB2312_80:
  • ISO_IR_165:
  • ISO_IR_149:
  • EUC_JP:
  • SHIFT_JIS:
  • CP932:
  • ISO_2022_JP:
  • ISO_2022_JP1:
  • ISO_2022_JP2:
  • GB2312:
  • CP936:
  • GB18030:
  • ISO_2022_CN:
  • ISO_2022_CNE:
  • HZ:
  • EUC_TW:
  • BIG5:
  • CP950:
  • BIG5_HKSCS:
  • EUC_KR:
  • CP949:
  • CP1361:
  • ISO_2022_KR:
  • CP737:
  • CP775:
  • CP853:
  • CP855:
  • CP857:
  • CP858:
  • CP860:
  • CP861:
  • CP863:
  • CP864:
  • CP865:
  • CP869:
  • CP1125:
TMimeSetChar = set of TMimeChar;

A set containing any combination of charsets.

Constants

IconvOnlyChars: set of TMimeChar = [UTF_16, UTF_16LE, UTF_32, UTF_32LE, C99, JAVA, ISO_8859_16, KOI8_U, KOI8_RU, CP862, CP866, MAC, MACCE, MACICE, MACCRO, MACRO, MACCYR, MACUK, MACGR, MACTU, MACHEB, MACAR, MACTH, ROMAN8, NEXTSTEP, ARMASCII, GEORGIAN_AC, GEORGIAN_PS, KOI8_T, MULELAO, CP1133, TIS620, CP874, VISCII, TCVN, ISO_IR_14, JIS_X0201, JIS_X0208, JIS_X0212, GB1988_80, GB2312_80, ISO_IR_165, ISO_IR_149, EUC_JP, SHIFT_JIS, CP932, ISO_2022_JP, ISO_2022_JP1, ISO_2022_JP2, GB2312, CP936, GB18030, ISO_2022_CN, ISO_2022_CNE, HZ, EUC_TW, BIG5, CP950, BIG5_HKSCS, EUC_KR, CP949, CP1361, ISO_2022_KR, CP737, CP775, CP853, CP855, CP857, CP858, CP860, CP861, CP863, CP864, CP865, CP869, CP1125];

Set of charsets supported by Iconv library only.

NoIconvChars: set of TMimeChar = [CP895, UTF_7mod];

Set of charsets supported by internal routines only.
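The two sets can be tested with the `in` operator to see which backend a charset depends on. A minimal sketch; the Backend function is hypothetical.

```pascal
program BackendCheck;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

// Hypothetical helper: describes which set, if any, lists the charset.
function Backend(CP: TMimeChar): string;
begin
  if CP in IconvOnlyChars then
    Result := 'Iconv library only'
  else if CP in NoIconvChars then
    Result := 'internal routines only'
  else
    Result := 'not listed in either set';
end;

begin
  WriteLn('BIG5      : ', Backend(BIG5));
  WriteLn('CP895     : ', Backend(CP895));
  WriteLn('ISO_8859_2: ', Backend(ISO_8859_2));
end.
```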

Replace_None: array[0..0] of Word = (0);

Null character replacement table. (Use it to disable character replacement.)

Replace_Czech: array[0..59] of Word = ( $00E1, $0061, $010D, $0063, $010F, $0064, $010E, $0044, $00E9, $0065, $011B, $0065, $00ED, $0069, $0148, $006E, $00F3, $006F, $0159, $0072, $0161, $0073, $0165, $0074, $00FA, $0075, $016F, $0075, $00FD, $0079, $017E, $007A, $00C1, $0041, $010C, $0043, $00C9, $0045, $011A, $0045, $00CD, $0049, $0147, $004E, $00D3, $004F, $0158, $0052, $0160, $0053, $0164, $0054, $00DA, $0055, $016E, $0055, $00DD, $0059, $017D, $005A );

Character replacement table for removing Czech diacritics.
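Judging from the values listed in Replace_Czech, a replacement table appears to be a flat list of pairs: a source Unicode code point followed by its replacement ($00E1 then $0061, $010D then $0063, and so on). Under that assumption, a custom table could be sketched as follows. Replace_German is hypothetical and not part of this unit.

```pascal
program CustomTable;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

const
  // Hypothetical table, not part of synachar: (source, replacement) pairs.
  Replace_German: array[0..7] of Word = (
    $00E4, $0061,  // a with diaeresis -> a
    $00F6, $006F,  // o with diaeresis -> o
    $00FC, $0075,  // u with diaeresis -> u
    $00DF, $0073   // sharp s -> s
  );

var
  Latin1, Plain: AnsiString;
begin
  // In ISO-8859-1, byte $F6 is 'o with diaeresis'.
  Latin1 := 'K' + AnsiChar($F6) + 'ln';
  Plain := CharsetConversionEx(Latin1, ISO_8859_1, CP1252, Replace_German);
  WriteLn(Plain);
end.
```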

Variables

DisableIconv: Boolean = False;

Set this variable to globally disable or enable Iconv support.
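A minimal sketch of forcing the internal routines by setting the variable before any conversion is made. Since the charsets in IconvOnlyChars are described as supported by Iconv only, conversions involving them presumably cannot succeed while the flag is set.

```pascal
program NoIconv;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
uses
  synachar;

var
  Latin2, Utf8: AnsiString;
begin
  DisableIconv := True;  // use only the built-in conversion tables from now on
  // Byte $E8 is 'c with caron' in ISO-8859-2.
  Latin2 := 'kole' + AnsiChar($E8) + 'ko';
  Utf8 := CharsetConversion(Latin2, ISO_8859_2, UTF_8);
  WriteLn('converted ', Length(Latin2), ' bytes to ', Length(Utf8), ' bytes');
end.
```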


Generated by PasDoc 0.8.8.2 on 2005-01-19 20:01:19