commit | e57cdc44bd541d10669312a6fdc59fc4bf52d2b9 | |
---|---|---|
author | Jinsuk Kim <jinsukkim@google.com> | Wed Nov 30 22:36:31 2016 |
committer | GitHub <noreply@github.com> | Wed Nov 30 22:36:31 2016 |
tree | 5b3397fbdd49f372e914eda354fd11fa78f13350 | |
parent | 9012c0ab648025dd0f8df14294bf5d6d73793ac9 | |
parent | 3882135777a95d993a04a59a9328ab179dfb2707 | |
Merge pull request #3 from sgraham/master

Encode compact_enc_det.cc as utf-8

Compact Encoding Detection (CED for short) is a library written in C++ that scans given raw bytes and detects the most likely text encoding.
Basic usage:
    #include <cstring>  // strlen

    #include "compact_enc_det/compact_enc_det.h"

    const char* text = "Input text";
    bool is_reliable;
    int bytes_consumed;

    Encoding encoding = CompactEncDet::DetectEncoding(
        text, strlen(text),
        nullptr, nullptr, nullptr,
        UNKNOWN_ENCODING,
        UNKNOWN_LANGUAGE,
        CompactEncDet::WEB_CORPUS,
        false,
        &bytes_consumed,
        &is_reliable);
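To give a feel for what "scanning raw bytes" can mean, here is a minimal, self-contained sketch of a single byte-level check: whether a buffer is structurally plausible UTF-8. This is not CED's algorithm and does not use its API. The helper name `LooksLikeUtf8` is hypothetical, and the check is deliberately simplified, so it does not reject every overlong or surrogate sequence.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only; NOT part of CED.
// Returns true if every byte sequence in [data, data + len) has a valid
// UTF-8 lead byte followed by the right number of continuation bytes.
// Structural check only: some overlong and surrogate forms still pass.
bool LooksLikeUtf8(const char* data, std::size_t len) {
  const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
  std::size_t i = 0;
  while (i < len) {
    const unsigned char b = p[i];
    std::size_t extra = 0;  // number of continuation bytes expected
    if (b < 0x80) {
      extra = 0;  // 7-bit ASCII
    } else if (b >= 0xC2 && b <= 0xDF) {
      extra = 1;  // 2-byte sequence
    } else if (b >= 0xE0 && b <= 0xEF) {
      extra = 2;  // 3-byte sequence
    } else if (b >= 0xF0 && b <= 0xF4) {
      extra = 3;  // 4-byte sequence
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (extra > len - i - 1) {
      return false;  // sequence is truncated at the end of the buffer
    }
    for (std::size_t k = 1; k <= extra; ++k) {
      if ((p[i + k] & 0xC0) != 0x80) {
        return false;  // expected a continuation byte (10xxxxxx)
      }
    }
    i += extra + 1;
  }
  return true;
}
```

A check like this only answers a yes/no question about one encoding. A detector such as CED has to weigh many candidate encodings against each other, which is why the real call also accepts hints and reports whether its answer is reliable.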
You need CMake to build the package. After unzipping the source code, run autogen.sh to build everything automatically. The script also downloads the Google Test framework needed to build the unit test.
    $ cd compact_enc_det
    $ ./autogen.sh
    ...
    $ bin/ced_unittest
On Windows, run "cmake ." to download the test framework and generate project files for Visual Studio.
    D:\packages\compact_enc_det> cmake .