Documentation ¶
Overview ¶
Package graphemes implements Unicode grapheme cluster boundaries: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
Index ¶
- func NewScanner(r io.Reader) *iterators.Scanner
- func NewSegmenter(data []byte) *iterators.Segmenter
- func NewStringSegmenter(data string) *iterators.StringSegmenter
- func SegmentAll(data []byte) [][]byte
- func SegmentAllString(data string) []string
- func SplitFunc(data []byte, atEOF bool) (advance int, token []byte, err error)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func NewScanner ¶ added in v1.0.4
NewScanner returns a Scanner that tokenizes graphemes per https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries. Iterate through graphemes by calling Scan() until it returns false, then check Err(). See also the bufio.Scanner docs.
Example ¶
package main
import (
"fmt"
"log"
"strings"
"github.com/clipperhouse/uax29/graphemes"
)
func main() {
text := "Hello, 世界. Nice dog! 👍🐶"
reader := strings.NewReader(text)
scanner := graphemes.NewScanner(reader)
// Scan returns true until error or EOF
for scanner.Scan() {
fmt.Printf("%q\n", scanner.Text())
}
// Gotta check the error!
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
}
Output:
"H"
"e"
"l"
"l"
"o"
","
" "
"世"
"界"
"."
" "
"N"
"i"
"c"
"e"
" "
"d"
"o"
"g"
"!"
" "
"👍"
"🐶"
func NewSegmenter ¶ added in v1.7.0
NewSegmenter returns a Segmenter, which is an iterator over the source text. Iterate while Next() is true, and access the grapheme via Bytes().
Example ¶
package main
import (
"fmt"
"log"
"github.com/clipperhouse/uax29/graphemes"
)
func main() {
text := []byte("Hello, 世界. Nice dog! 👍🐶")
segments := graphemes.NewSegmenter(text)
// Next() returns true until end of data or error
for segments.Next() {
fmt.Printf("%q\n", segments.Bytes())
}
// Should check the error
if err := segments.Err(); err != nil {
log.Fatal(err)
}
}
Output:
"H"
"e"
"l"
"l"
"o"
","
" "
"世"
"界"
"."
" "
"N"
"i"
"c"
"e"
" "
"d"
"o"
"g"
"!"
" "
"👍"
"🐶"
func NewStringSegmenter ¶ added in v1.16.0
func NewStringSegmenter(data string) *iterators.StringSegmenter
NewStringSegmenter returns a StringSegmenter, which is an iterator over the source text. Iterate while Next() is true, and access the grapheme via Text().
func SegmentAll ¶ added in v1.7.0
SegmentAll iterates through all graphemes and collects them into a [][]byte. This is a convenience function: if you will be allocating such a slice anyway, it saves you some code.
The downside is that the allocation is unbounded: it grows as O(n) in the number of graphemes. Use Segmenter for more bounded memory usage.
Example ¶
package main
import (
"fmt"
"github.com/clipperhouse/uax29/graphemes"
)
func main() {
text := []byte("Hello, 世界. Nice dog! 👍🐶")
segments := graphemes.SegmentAll(text)
fmt.Printf("%q\n", segments)
}
Output: ["H" "e" "l" "l" "o" "," " " "世" "界" "." " " "N" "i" "c" "e" " " "d" "o" "g" "!" " " "👍" "🐶"]
func SegmentAllString ¶ added in v1.16.0
SegmentAllString iterates through all graphemes and collects them into a []string. This is a convenience function: if you will be allocating such a slice anyway, it saves you some code.
The downside is that the allocation is unbounded: it grows as O(n) in the number of graphemes. Use StringSegmenter for more bounded memory usage.
Types ¶
This section is empty.