commit | 3882135777a95d993a04a59a9328ab179dfb2707 | |
---|---|---|
author | Scott Graham <scottmg@chromium.org> | Wed Nov 30 19:39:48 2016 |
committer | Scott Graham <scottmg@chromium.org> | Wed Nov 30 19:39:48 2016 |
tree | 5b3397fbdd49f372e914eda354fd11fa78f13350 | |
parent | 9012c0ab648025dd0f8df14294bf5d6d73793ac9 | |
Encode compact_enc_det.cc as utf-8
Compact Encoding Detection (CED for short) is a library written in C++ that scans given raw bytes and detects the most likely text encoding.
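To give a feel for what "scanning raw bytes" involves, here is a toy sketch that checks whether a buffer is well-formed UTF-8. It is not CED's algorithm and not part of its API: the function name `LooksLikeUtf8` is hypothetical, and a real detector such as CED has to weigh many candidate encodings rather than answer a yes/no question about one.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of CED): returns true if `bytes` is
// well-formed UTF-8. Rejects stray or missing continuation bytes, truncated
// sequences, overlong forms, surrogates, and values above U+10FFFF.
bool LooksLikeUtf8(const std::string& bytes) {
  // Smallest code point that legitimately needs a sequence of each length.
  static const unsigned int kMinCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};

  const std::size_t n = bytes.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
    if (b0 < 0x80) {  // plain 7-bit ASCII byte
      ++i;
      continue;
    }

    std::size_t len = 0;
    unsigned int cp = 0;
    if ((b0 & 0xE0) == 0xC0) {
      len = 2;
      cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
      len = 3;
      cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
      len = 4;
      cp = b0 & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }

    if (i + len > n) return false;  // sequence runs past the end of the buffer

    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char bk = static_cast<unsigned char>(bytes[i + k]);
      if ((bk & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (bk & 0x3F);
    }

    if (cp < kMinCodePoint[len]) return false;        // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                  // outside Unicode range
    i += len;
  }
  return true;
}
```

For instance, `LooksLikeUtf8("caf\xC3\xA9")` accepts the UTF-8 bytes for "café", while `LooksLikeUtf8("caf\xE9")` rejects the Latin-1 bytes for the same word. A check like this can only say that a buffer is consistent with one encoding; choosing the most likely one among many is the job CED is built for.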
Basic usage:

    #include <cstring>  // strlen

    #include "compact_enc_det/compact_enc_det.h"

    const char* text = "Input text";
    bool is_reliable;
    int bytes_consumed;

    Encoding encoding = CompactEncDet::DetectEncoding(
        text, strlen(text),
        nullptr, nullptr, nullptr,
        UNKNOWN_ENCODING,
        UNKNOWN_LANGUAGE,
        CompactEncDet::WEB_CORPUS,
        false,
        &bytes_consumed,
        &is_reliable);

You need CMake to build the package. After unzipping the source code, run `autogen.sh` to build everything automatically. The script also downloads the Google Test framework, which is needed to build the unit test.

    $ cd compact_enc_det
    $ ./autogen.sh
    ...
    $ bin/ced_unittest

On Windows, run `cmake .` to download the test framework and generate project files for Visual Studio.

    D:\packages\compact_enc_det> cmake .