| README 2007/05/31 |
| |
| Oniguruma ---- (C) K.Kosako <sndgk393 AT ybb DOT ne DOT jp> |
| |
| http://www.geocities.jp/kosako3/oniguruma/ |
| |
| Oniguruma is a regular expression library. |
| Its distinguishing feature is that a different character encoding |
| can be specified for each regular expression object. |
| |
| Supported character encodings: |
| |
| ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, |
| EUC-JP, EUC-TW, EUC-KR, EUC-CN, |
| Shift_JIS, Big5, GB18030, KOI8-R, CP1251, |
| ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, |
| ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, |
| ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16 |
| |
| * GB18030: contributed by KUBO Takehiro |
| * CP1251: contributed by Byte |
| ------------------------------------------------------------ |
| |
| License |
| |
| BSD license. |
| |
| |
| Install |
| |
| Case 1: Unix and Cygwin platforms |
| |
| 1. ./configure |
| 2. make |
| 3. make install |
| |
| * uninstall |
| |
| make uninstall |
| |
| * test (ASCII/EUC-JP) |
| |
| make atest |
| |
| * configuration check |
| |
| onig-config --cflags |
| onig-config --libs |
| onig-config --prefix |
| onig-config --exec-prefix |
| |
| |
| |
| Case 2: Win32 platform (VC++) |
| |
| 1. copy win32\Makefile Makefile |
| 2. copy win32\config.h config.h |
| 3. nmake |
| |
| onig_s.lib: static link library |
| onig.dll: dynamic link library |
| |
| * test (ASCII/Shift_JIS) |
| 4. copy win32\testc.c testc.c |
| 5. nmake ctest |
| |
| |
| |
| Regular Expressions |
| |
| See doc/RE (or doc/RE.ja for Japanese). |
| |
| |
| Usage |
| |
| Include oniguruma.h in your program. (Oniguruma API) |
| See doc/API for Oniguruma API. |
| |
| If you want to disable the definition of the UChar type |
| (== unsigned char) in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION |
| before including oniguruma.h. |
| |
| If you want to disable the definition of the regex_t type in |
| oniguruma.h, define ONIG_ESCAPE_REGEX_T_COLLISION before including |
| oniguruma.h. |
| |
| Example compile/link command line on Unix or Cygwin |
| (assuming prefix == /usr/local): |
| |
| cc sample.c -L/usr/local/lib -lonig |
| |
| |
| To use the static link library (onig_s.lib) on Win32, |
| add the option -DONIG_EXTERN=extern to the C compiler command line. |
| |
| |
| |
| Sample Programs |
| |
| sample/simple.c minimal example (Oniguruma API) |
| sample/names.c example of the named group callback. |
| sample/encode.c example of some encodings. |
| sample/listcap.c example of the capture history. |
| sample/posix.c POSIX API sample. |
| sample/sql.c example of the variable meta characters. |
| (SQL-like pattern matching) |
| |
| Test Programs |
| sample/syntax.c Perl, Java and ASIS syntax test. |
| sample/crnl.c --enable-crnl-as-line-terminator test |
| |
| |
| Source Files |
| |
| oniguruma.h Oniguruma API header file. (public) |
| onig-config.in configuration check program template. |
| |
| regenc.h character encodings framework header file. |
| regint.h internal definitions |
| regparse.h internal definitions for regparse.c and regcomp.c |
| regcomp.c compiling and optimization functions |
| regenc.c character encodings framework. |
| regerror.c error message function |
| regext.c extended API functions. (deluxe version API) |
| regexec.c search and match functions |
| regparse.c parsing functions. |
| regsyntax.c pattern syntax functions and built-in syntax definitions. |
| regtrav.c capture history tree data traverse functions. |
| regversion.c version info function. |
| st.h hash table functions header file |
| st.c hash table functions |
| |
| oniggnu.h GNU regex API header file. (public) |
| reggnu.c GNU regex API functions |
| |
| onigposix.h POSIX API header file. (public) |
| regposerr.c POSIX error message function. |
| regposix.c POSIX API functions. |
| |
| enc/mktable.c character type table generator. |
| enc/ascii.c ASCII encoding. |
| enc/euc_jp.c EUC-JP encoding. |
| enc/euc_tw.c EUC-TW encoding. |
| enc/euc_kr.c EUC-KR, EUC-CN encoding. |
| enc/sjis.c Shift_JIS encoding. |
| enc/big5.c Big5 encoding. |
| enc/gb18030.c GB18030 encoding. |
| enc/koi8.c KOI8 encoding. |
| enc/koi8_r.c KOI8-R encoding. |
| enc/cp1251.c CP1251 encoding. |
| enc/iso8859_1.c ISO-8859-1 encoding. (Latin-1) |
| enc/iso8859_2.c ISO-8859-2 encoding. (Latin-2) |
| enc/iso8859_3.c ISO-8859-3 encoding. (Latin-3) |
| enc/iso8859_4.c ISO-8859-4 encoding. (Latin-4) |
| enc/iso8859_5.c ISO-8859-5 encoding. (Cyrillic) |
| enc/iso8859_6.c ISO-8859-6 encoding. (Arabic) |
| enc/iso8859_7.c ISO-8859-7 encoding. (Greek) |
| enc/iso8859_8.c ISO-8859-8 encoding. (Hebrew) |
| enc/iso8859_9.c ISO-8859-9 encoding. (Latin-5 or Turkish) |
| enc/iso8859_10.c ISO-8859-10 encoding. (Latin-6 or Nordic) |
| enc/iso8859_11.c ISO-8859-11 encoding. (Thai) |
| enc/iso8859_13.c ISO-8859-13 encoding. (Latin-7 or Baltic Rim) |
| enc/iso8859_14.c ISO-8859-14 encoding. (Latin-8 or Celtic) |
| enc/iso8859_15.c ISO-8859-15 encoding. (Latin-9 or West European with Euro) |
| enc/iso8859_16.c ISO-8859-16 encoding. |
| (Latin-10 or South-Eastern European with Euro) |
| enc/utf8.c UTF-8 encoding. |
| enc/utf16_be.c UTF-16BE encoding. |
| enc/utf16_le.c UTF-16LE encoding. |
| enc/utf32_be.c UTF-32BE encoding. |
| enc/utf32_le.c UTF-32LE encoding. |
| enc/unicode.c Unicode information data. |
| |
| win32/Makefile Makefile for Win32 (VC++) |
| win32/config.h config.h for Win32 |
| |
| |
| |
| ToDo |
| |
| ? case fold flag: Katakana <-> Hiragana. |
| ? add ONIG_OPTION_NOTBOS/NOTEOS. (\A, \z, \Z) |
| ?? \X (== \PM\pM*) |
| ?? implement syntax behavior ONIG_SYN_CONTEXT_INDEP_ANCHORS. |
| ?? transmission stopper. (return ONIG_STOP from match_at()) |
| |
| and I'm thankful to Akinori MUSHA. |
| |
| |
| Mail Address: K.Kosako <sndgk393 AT ybb DOT ne DOT jp> |