commit | e9e7e81a1abf75dfbe5c729f55ff58608c47b38a | |
---|---|---|
author | mirabilos <t.glaser@tarent.de> | Sun Jul 28 18:30:56 2019 |
committer | mirabilos <mirabilos@evolvis.org> | Mon Jul 29 14:55:09 2019 |
tree | 170f1d6fa473e158c5e579e2f23d8bae1bbb5ff3 | |
parent | 1193457d7276cd49968f46e98fccc78a27881937 | |
fix memrchr detection

this function is defined not on Linux but on GNU, that is, systems
with glibc 2.2 or higher; also, use an intermediate HAVE_MEMRCHR
symbol that people with alternative C libraries can define to
indicate its presence
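The detection scheme described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the wrapper name `FindLastByte` and the portable fallback loop are assumptions made for the example. Only the `HAVE_MEMRCHR` symbol and the "glibc 2.2 or higher" condition come from the message above.

```cpp
#include <cstddef>
#include <cstring>  // on glibc this also makes __GLIBC__ / __GLIBC_MINOR__ visible

// Key the check on the C library (glibc >= 2.2), not on the kernel.
// Builds against another libc that ships memrchr can define HAVE_MEMRCHR
// themselves, for example with -DHAVE_MEMRCHR on the compiler command line.
#if !defined(HAVE_MEMRCHR) && defined(__GLIBC__) && defined(__GLIBC_MINOR__)
#if (__GLIBC__ > 2) || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2)
#define HAVE_MEMRCHR 1
#endif
#endif

// Hypothetical wrapper (not part of CED): returns a pointer to the last
// occurrence of byte c in the first n bytes of s, or nullptr if absent.
inline const void* FindLastByte(const void* s, int c, std::size_t n) {
#if defined(HAVE_MEMRCHR)
  return memrchr(s, c, n);
#else
  // Portable fallback: scan backwards from the end of the buffer.
  const unsigned char* p = static_cast<const unsigned char*>(s) + n;
  while (n-- > 0) {
    if (*--p == static_cast<unsigned char>(c)) return p;
  }
  return nullptr;
#endif
}
```

Because the guard only tests whether `HAVE_MEMRCHR` is defined, its value does not matter in this sketch. A build that must force the fallback would leave the symbol undefined on a non-glibc system.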
Compact Encoding Detection (CED for short) is a library written in C++ that scans given raw bytes and detects the most likely text encoding.
Basic usage:
#include <cstring>  // strlen

#include "compact_enc_det/compact_enc_det.h"

const char* text = "Input text";
bool is_reliable;
int bytes_consumed;

Encoding encoding = CompactEncDet::DetectEncoding(
    text, strlen(text),
    nullptr, nullptr, nullptr,
    UNKNOWN_ENCODING,
    UNKNOWN_LANGUAGE,
    CompactEncDet::WEB_CORPUS,
    false,
    &bytes_consumed,
    &is_reliable);
You need CMake to build the package. After unzipping the source code, run autogen.sh to build everything automatically. The script also downloads the Google Test framework, which is needed to build the unit test.
$ cd compact_enc_det
$ ./autogen.sh
...
$ bin/ced_unittest
On Windows, run cmake . to download the test framework and generate project files for Visual Studio.
D:\packages\compact_enc_det> cmake .