tokenize

Version: v0.0.0-...-b5add9b
Published: Dec 8, 2016 License: MIT Imports: 0 Imported by: 0

README

Tokenize

Takes any text as a string and the tokenization characters as a slice of runes ([]rune), and returns the result as a slice of string tokens ([]string). The result set holds the tokenized words followed by the runes tokenized on, in order.

Example

Print a set of all substrings tokenized by the following punctuation characters ['.', '!', '?', ',', ' '].

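package main

import (
	"fmt"
	// Also import this tokenize package; its import path is not shown here.
)

func main() {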
	str := "Lorem ipsum dolor sit amet! consectetur adipiscing elit. Nunc viverra, quam sit amet varius accumsan, augue mi viverra lacus, sed hendrerit justo magna eu augue. Aliquam in pretium justo. Nulla pulvinar tempus tempus. Nulla luctus lacus sed gravida congue. Aliquam a est magna. Nullam condimentum dui ut tortor placerat accumsan. Nullam eu ligula ante. Quisque finibus est eu lorem gravida, sit amet hendrerit metus pellentesque. Fusce vitae arcu sem."

	var punctuation = []rune{'.', '!', '?', ',', ' '}

	words := tokenize.Create(str, punctuation)
	for _, s := range words {
		fmt.Println(s)
	}
}

The above returns all words by splitting the text on all punctuation and spaces.

License

This project is released under the terms of the MIT License.

Documentation

Functions

func Create

func Create(text string, tokenizeon []rune) []string

Create takes any text as a string and the runes to tokenize on, and returns a slice of string tokens. The result set holds the tokenized words followed by the runes tokenized on, in order.
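The description can be read as saying that the delimiter runes appear in the output alongside the words. This page shows no sample output, so that reading is an assumption. Under it, a tokenizer might look like the sketch below. `tokenizeKeepingDelims` is a hypothetical name and is not the package's Create.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// tokenizeKeepingDelims is a hypothetical function, not the package's Create.
// It returns every word and every delimiter rune as separate tokens,
// in the order they occur in text.
func tokenizeKeepingDelims(text string, delims []rune) []string {
	isDelim := func(r rune) bool {
		for _, d := range delims {
			if d == r {
				return true
			}
		}
		return false
	}

	var tokens []string
	start := 0 // byte offset where the current word begins
	for i := 0; i < len(text); {
		r, size := utf8.DecodeRuneInString(text[i:])
		if isDelim(r) {
			if i > start {
				tokens = append(tokens, text[start:i]) // word before the delimiter
			}
			tokens = append(tokens, text[i:i+size]) // the delimiter itself
			start = i + size
		}
		i += size
	}
	if start < len(text) {
		tokens = append(tokens, text[start:]) // trailing word, if any
	}
	return tokens
}

func main() {
	for _, t := range tokenizeKeepingDelims("Hi, you!", []rune{',', ' ', '!'}) {
		fmt.Printf("%q\n", t)
	}
}
```

Decoding with `utf8.DecodeRuneInString` keeps the byte offsets correct for multi-byte characters, so the sketch also works on non-ASCII text.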

func RuneIndexOf

func RuneIndexOf(r []rune, el rune) int

RuneIndexOf returns the index of a rune in a slice of runes, or -1 if the rune is not in the slice.
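A lookup with this contract is usually a linear scan. The sketch below is a hypothetical re-implementation named `runeIndexOf`, not the package's source. It assumes the first matching index is returned when a rune occurs more than once, which the description does not specify.

```go
package main

import "fmt"

// runeIndexOf is a hypothetical stand-in for RuneIndexOf.
// It returns the index of the first occurrence of el in rs,
// or -1 if el is not present.
func runeIndexOf(rs []rune, el rune) int {
	for i, r := range rs {
		if r == el {
			return i
		}
	}
	return -1
}

func main() {
	punctuation := []rune{'.', '!', '?', ',', ' '}
	fmt.Println(runeIndexOf(punctuation, '?')) // 2
	fmt.Println(runeIndexOf(punctuation, ';')) // -1
}
```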
