Segment strings by lines, graphemes, words, and sentences.
This module is published as its own crate (icu_segmenter) and as part of the icu crate. See the latter for more details on the ICU4X project.
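This module contains segmenter implementations for the following rules:
- Line segmentation compatible with Unicode Standard Annex #14, Unicode Line Breaking Algorithm.
- Grapheme cluster, word, and sentence segmentation compatible with Unicode Standard Annex #29, Unicode Text Segmentation.

Line breaking and word breaking rely on the Unicode line-break and word-break properties.

Find line break opportunities: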
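```rust
use icu::segmenter::LineSegmenter;

// Construct a line segmenter with default options.
let segmenter = LineSegmenter::new_auto(Default::default());

// Each item is a byte offset into the string where a line may break.
let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 6, 13, 17, 23, 29, 36]);
```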
See [LineSegmenter] for more examples.
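Line break opportunities mark where a line *may* wrap, not where it must. As a hedged sketch of how a caller might consume them, the hypothetical helper `wrap_lines` below (not part of `icu_segmenter`) performs greedy wrapping over breakpoints like those in the example above: it starts a new line at the previous opportunity whenever the current line would exceed `max_chars`. Measuring width in `char`s is a simplifying assumption; real layout needs display widths, and mandatory breaks (such as after `\n`) are not handled here.

```rust
/// Hypothetical helper: greedily packs the segments between line break
/// opportunities into lines of at most `max_chars` characters.
/// `breakpoints` are byte offsets into `text`, starting with 0.
/// A segment longer than `max_chars` is placed on its own line, unsplit.
fn wrap_lines<'a>(text: &'a str, breakpoints: &[usize], max_chars: usize) -> Vec<&'a str> {
    let mut lines = Vec::new();
    let mut line_start = 0;
    let mut last_break = 0;
    for &bp in breakpoints.iter().skip(1) {
        // Width of the candidate line, ignoring trailing whitespace.
        let candidate = text[line_start..bp].trim_end();
        if candidate.chars().count() > max_chars && last_break > line_start {
            // Too wide: close the line at the previous opportunity.
            lines.push(text[line_start..last_break].trim_end());
            line_start = last_break;
        }
        last_break = bp;
    }
    // Emit whatever remains after the last forced break.
    if line_start < text.len() {
        lines.push(text[line_start..].trim_end());
    }
    lines
}
```

With the breakpoints `[0, 6, 13, 17, 23, 29, 36]` and `max_chars = 12`, the example text wraps into `"Hello World."`, `"Xin chào thế"`, and `"giới!"`. Breaking only at reported opportunities is what keeps words such as `chào` intact.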
Find all grapheme cluster boundaries:
```rust
use icu::segmenter::GraphemeClusterSegmenter;

let segmenter = GraphemeClusterSegmenter::new();

// Multi-byte characters such as "à" (2 bytes) and "ế" (3 bytes) each form a
// single cluster, so some consecutive boundaries differ by more than 1.
let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(
    &breakpoints,
    &[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21,
        22, 23, 24, 25, 28, 29, 30, 31, 34, 35, 36
    ]
);
```
See [GraphemeClusterSegmenter] for more examples.
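Because the boundaries are byte offsets into the UTF-8 string, they can be used directly to slice it: the example text has 36 bytes but 31 grapheme clusters (32 boundaries). A hedged sketch of one common use, truncating to at most `n` user-perceived characters without splitting a cluster; `truncate_graphemes` is a hypothetical helper, not part of `icu_segmenter`, and it assumes the boundary list starts with 0 and ends with `text.len()`, as in the example above.

```rust
/// Hypothetical helper: returns the longest prefix of `text` that contains
/// at most `n` grapheme clusters, given the cluster boundaries of `text`
/// (byte offsets starting with 0 and ending with `text.len()`).
fn truncate_graphemes<'a>(text: &'a str, boundaries: &[usize], n: usize) -> &'a str {
    // boundaries[n] is the byte offset where the n-th cluster ends; if there
    // are fewer than `n` clusters, the whole string is returned.
    match boundaries.get(n) {
        Some(&end) => &text[..end],
        None => text,
    }
}
```

Truncating the example text to 20 clusters yields `"Hello World. Xin chà"`, which keeps the two-byte `à` whole; cutting at byte 20 instead would land inside that character and make the slice panic.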
Find all word boundaries:
```rust
use icu::segmenter::{options::WordBreakInvariantOptions, WordSegmenter};

let segmenter = WordSegmenter::new_auto(WordBreakInvariantOptions::default());

// Boundaries surround words as well as spaces and punctuation.
let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(
    &breakpoints,
    &[0, 5, 6, 11, 12, 13, 16, 17, 22, 23, 28, 29, 35, 36]
);
```
See [WordSegmenter] for more examples.
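Word boundaries delimit the spaces and punctuation between words as well as the words themselves, so consecutive pairs of breakpoints yield both. The sketch below keeps only segments whose first character is alphanumeric. This is a simplifying heuristic assumed for illustration: `words` is a hypothetical helper, and the word iterator in `icu_segmenter` can classify segments by word type itself (see [WordSegmenter]), which is the more reliable approach.

```rust
/// Hypothetical helper: splits `text` at the given word boundaries and keeps
/// segments whose first character is alphanumeric (a heuristic stand-in for
/// a proper word-type check).
fn words<'a>(text: &'a str, boundaries: &[usize]) -> Vec<&'a str> {
    boundaries
        .windows(2)
        // Each adjacent pair of boundaries delimits one segment.
        .map(move |w| &text[w[0]..w[1]])
        // Drop whitespace and punctuation segments.
        .filter(|seg| seg.chars().next().map_or(false, char::is_alphanumeric))
        .collect()
}
```

Applied to the example text and boundaries, this returns `["Hello", "World", "Xin", "chào", "thế", "giới"]`.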
Segment the string into sentences:
```rust
use icu::segmenter::{options::SentenceBreakInvariantOptions, SentenceSegmenter};

let segmenter = SentenceSegmenter::new(SentenceBreakInvariantOptions::default());

// The first sentence ends after "Hello World. " (byte 13); the second ends
// at the end of the string.
let breakpoints: Vec<usize> = segmenter
    .segment_str("Hello World. Xin chào thế giới!")
    .collect();
assert_eq!(&breakpoints, &[0, 13, 36]);
```
See [SentenceSegmenter] for more examples.
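A sentence segment keeps any whitespace that follows its terminator: in the example, bytes 0..13 are `"Hello World. "`, including the trailing space. A hedged sketch of turning sentence breakpoints into trimmed sentence strings; `sentences` is a hypothetical helper, not part of `icu_segmenter`.

```rust
/// Hypothetical helper: turns sentence boundaries (byte offsets starting
/// with 0) into trimmed sentence slices, skipping segments that are empty
/// after trimming.
fn sentences<'a>(text: &'a str, boundaries: &[usize]) -> Vec<&'a str> {
    boundaries
        .windows(2)
        // Slice each segment and strip surrounding whitespace.
        .map(move |w| text[w[0]..w[1]].trim())
        .filter(|s| !s.is_empty())
        .collect()
}
```

With the breakpoints `[0, 13, 36]`, the example text yields `["Hello World.", "Xin chào thế giới!"]`.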
For more information on development, authorship, contributing, etc., please visit the ICU4X home page.