mahonia

Published: Mar 14, 2018 License: Apache-2.0 Imports: 8 Imported by: 0

README

mahonia

Mahonia is a character-set conversion library implemented in Go. All data is compiled into the executable; it doesn't need any external data files.

Copied from http://code.google.com/p/mahonia/

install

go get github.com/henrylee2cn/mahonia

example

  package main

  import (
  	"fmt"

  	"github.com/henrylee2cn/mahonia"
  )

  func main() {
  	// NewEncoder returns an Encoder for the named charset.
  	enc := mahonia.NewEncoder("gbk")
  	// Convert a string from UTF-8 to GBK encoding.
  	fmt.Println(enc.ConvertString("hello,世界"))
  }

Documentation

Overview

This package is a character-set conversion library for Go.

(DEPRECATED: use golang.org/x/text/encoding, formerly code.google.com/p/go.text/encoding, perhaps along with golang.org/x/net/html/charset, formerly code.google.com/p/go.net/html/charset)

Index

Constants

const (
	// SUCCESS means that the character was converted with no problems.
	SUCCESS = Status(iota)

	// INVALID_CHAR means that the source contained invalid bytes, or that the character
	// could not be represented in the destination encoding.
	// The Encoder or Decoder should have output a substitute character.
	INVALID_CHAR

	// NO_ROOM means there were not enough input bytes to form a complete character,
	// or there was not enough room in the output buffer to write a complete character.
	// No bytes were written, and no internal state was changed in the Encoder or Decoder.
	NO_ROOM

	// STATE_ONLY means that bytes were read or written indicating a state transition,
	// but no actual character was processed. (Examples: byte order marks, ISO-2022 escape sequences)
	STATE_ONLY
)
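The four statuses tell a caller how to advance through its input. The following is a minimal sketch of such a loop, not the package's own implementation. It redeclares `Status` and `Decoder` locally, with the shapes documented on this page, so that it runs without the package; `asciiDecoder` and `decodeAll` are hypothetical helpers introduced only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins mirroring the documented declarations, so the sketch
// runs without importing the package.
type Status int

const (
	SUCCESS = Status(iota)
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

type Decoder func(p []byte) (c rune, size int, status Status)

// asciiDecoder is a hypothetical toy Decoder: bytes below 0x80 decode to
// themselves, anything else is reported as INVALID_CHAR with U+FFFD.
func asciiDecoder(p []byte) (rune, int, Status) {
	if len(p) == 0 {
		return 0, 0, NO_ROOM
	}
	if p[0] < 0x80 {
		return rune(p[0]), 1, SUCCESS
	}
	return 0xFFFD, 1, INVALID_CHAR
}

// decodeAll drives d over data. It returns the decoded text, the number
// of bytes consumed (less than len(data) when the input ends in an
// incomplete character), and whether every character converted cleanly.
func decodeAll(d Decoder, data []byte) (out string, consumed int, ok bool) {
	var sb strings.Builder
	ok = true
	for consumed < len(data) {
		c, size, status := d(data[consumed:])
		switch status {
		case NO_ROOM:
			// Incomplete character: nothing was consumed, so stop and
			// let the caller supply more input.
			return sb.String(), consumed, ok
		case STATE_ONLY:
			// A state transition such as a byte order mark: skip the
			// bytes and emit nothing.
		case INVALID_CHAR:
			ok = false
			sb.WriteRune(c) // substitute character chosen by the decoder
		case SUCCESS:
			sb.WriteRune(c)
		}
		consumed += size
	}
	return sb.String(), consumed, ok
}

func main() {
	out, n, ok := decodeAll(asciiDecoder, []byte("hi\xffyo"))
	fmt.Printf("%q consumed=%d ok=%v\n", out, n, ok)
}
```

Stopping on NO_ROOM without advancing relies on the documented guarantee that no bytes were written and no internal state was changed.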

Variables

This section is empty.

Functions

func RegisterCharset

func RegisterCharset(cs *Charset)

RegisterCharset adds a charset to the charsetMap.

Types

type Charset

type Charset struct {
	// Name is the character set's canonical name.
	Name string

	// Aliases is a list of alternate names.
	Aliases []string

	// NewDecoder returns a Decoder to convert from the charset to Unicode.
	NewDecoder func() Decoder

	// NewEncoder returns an Encoder to convert from Unicode to the charset.
	NewEncoder func() Encoder
}

A Charset represents a character set that can be converted. It contains functions that create the Decoders and Encoders used to decode and encode strings in that character set.
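To show how the two factory fields fit together, here is a sketch of a Charset value for ISO-8859-1. The types are redeclared locally, following the declarations on this page, so the example runs on its own. The charset name, alias, and the decoder and encoder bodies are assumptions made for illustration, not definitions taken from the package.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented declarations.
type Status int

const (
	SUCCESS = Status(iota)
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

type Decoder func(p []byte) (c rune, size int, status Status)
type Encoder func(p []byte, c rune) (size int, status Status)

type Charset struct {
	Name       string
	Aliases    []string
	NewDecoder func() Decoder
	NewEncoder func() Encoder
}

// latin1 is a hypothetical charset definition. In ISO-8859-1 every byte
// value equals the Unicode code point it represents.
var latin1 = Charset{
	Name:    "x-demo-latin1", // illustrative name only
	Aliases: []string{"x-demo-l1"},
	NewDecoder: func() Decoder {
		return func(p []byte) (rune, int, Status) {
			if len(p) == 0 {
				return 0, 0, NO_ROOM
			}
			return rune(p[0]), 1, SUCCESS
		}
	},
	NewEncoder: func() Encoder {
		return func(p []byte, c rune) (int, Status) {
			if len(p) == 0 {
				return 0, NO_ROOM
			}
			if c < 0 || c > 0xFF {
				p[0] = '?' // substitute for unrepresentable characters
				return 1, INVALID_CHAR
			}
			p[0] = byte(c)
			return 1, SUCCESS
		}
	},
}

func main() {
	// Each call to a factory yields a fresh converter.
	dec := latin1.NewDecoder()
	c, _, _ := dec([]byte{0xE9})

	enc := latin1.NewEncoder()
	buf := make([]byte, 1)
	n, status := enc(buf, c)
	fmt.Printf("%U % x %v\n", c, buf[:n], status == SUCCESS)
}
```

With the real package, a pointer to a value shaped like this would be what RegisterCharset accepts.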

func GetCharset

func GetCharset(name string) *Charset

GetCharset fetches a charset by name. If the name is not found, it returns nil.

type Decoder

type Decoder func(p []byte) (c rune, size int, status Status)

A Decoder is a function that decodes a character set, one character at a time. It works much like utf8.DecodeRune, but has an additional status return value.
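As an illustration of the signature, the sketch below implements a simplified big-endian UTF-16 decoder that can return each of the four statuses. It is not code from the package: `utf16beDecoder` is a hypothetical function, surrogate pairs are deliberately left unhandled, and `Status` and `Decoder` are redeclared locally so the example runs on its own.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented declarations.
type Status int

const (
	SUCCESS = Status(iota)
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

type Decoder func(p []byte) (c rune, size int, status Status)

// utf16beDecoder is a hypothetical, simplified Decoder for big-endian
// UTF-16 that handles only the Basic Multilingual Plane.
func utf16beDecoder(p []byte) (c rune, size int, status Status) {
	if len(p) < 2 {
		return 0, 0, NO_ROOM // not enough bytes for one code unit
	}
	c = rune(p[0])<<8 | rune(p[1])
	switch {
	case c == 0xFEFF:
		return 0, 2, STATE_ONLY // byte order mark: consumed, no character
	case c >= 0xD800 && c <= 0xDFFF:
		return 0xFFFD, 2, INVALID_CHAR // surrogates are out of scope here
	}
	return c, 2, SUCCESS
}

func main() {
	var d Decoder = utf16beDecoder
	// A byte order mark, U+4E16, then one stray byte.
	data := []byte{0xFE, 0xFF, 0x4E, 0x16, 0x00}
	for len(data) > 0 {
		c, size, status := d(data)
		if status == NO_ROOM {
			fmt.Println("incomplete character, bytes left:", len(data))
			break
		}
		fmt.Printf("%U size=%d status=%d\n", c, size, status)
		data = data[size:]
	}
}
```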

func EntityDecoder

func EntityDecoder() Decoder

EntityDecoder returns a Decoder that decodes HTML character entities. If there is no valid character entity at the current position, it returns INVALID_CHAR. So it needs to be combined with another Decoder via FallbackDecoder.

func FallbackDecoder

func FallbackDecoder(decoders ...Decoder) Decoder

FallbackDecoder combines a series of Decoders into one. If the first Decoder returns a status of INVALID_CHAR, the others are tried as well.

Note: if the text to be decoded ends with a sequence of bytes that is not a valid character in the first charset, but it could be the beginning of a valid character, the FallbackDecoder will give a status of NO_ROOM instead of falling back to the other Decoders.
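The note above can be reproduced with a small stand-alone sketch. It is an assumption about how such a combinator might behave, not the package's source: `fallback`, `utf8Decoder`, and `latin1Decoder` are hypothetical, and the types are redeclared locally. A lone 0xE9 byte is a valid ISO-8859-1 character, but it could also start a three-byte UTF-8 sequence, so the first decoder answers NO_ROOM and the second is never consulted.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Local stand-ins mirroring the documented declarations.
type Status int

const (
	SUCCESS = Status(iota)
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

type Decoder func(p []byte) (c rune, size int, status Status)

// utf8Decoder is a hypothetical strict UTF-8 Decoder.
func utf8Decoder(p []byte) (rune, int, Status) {
	if !utf8.FullRune(p) {
		return 0, 0, NO_ROOM // empty, or a prefix of a longer sequence
	}
	c, size := utf8.DecodeRune(p)
	if c == utf8.RuneError && size == 1 {
		return c, 1, INVALID_CHAR
	}
	return c, size, SUCCESS
}

// latin1Decoder is a hypothetical ISO-8859-1 Decoder.
func latin1Decoder(p []byte) (rune, int, Status) {
	if len(p) == 0 {
		return 0, 0, NO_ROOM
	}
	return rune(p[0]), 1, SUCCESS
}

// fallback tries each decoder in order and moves on only when one
// reports INVALID_CHAR. Any other status, including NO_ROOM, is final.
func fallback(decoders ...Decoder) Decoder {
	return func(p []byte) (rune, int, Status) {
		for _, d := range decoders {
			c, size, status := d(p)
			if status != INVALID_CHAR {
				return c, size, status
			}
		}
		return 0xFFFD, 1, INVALID_CHAR
	}
}

func main() {
	d := fallback(utf8Decoder, latin1Decoder)

	// 0xE9 followed by '!' is invalid UTF-8, so ISO-8859-1 takes over.
	c, _, status := d([]byte("\xe9!"))
	fmt.Printf("%U fell back: %v\n", c, status == SUCCESS)

	// 0xE9 alone might begin a UTF-8 sequence: NO_ROOM, no fallback.
	_, _, status = d([]byte("\xe9"))
	fmt.Println("trailing byte gives NO_ROOM:", status == NO_ROOM)
}
```

A caller that knows it has reached the end of its input may therefore need to treat a trailing NO_ROOM as an invalid character itself.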

func NewDecoder

func NewDecoder(name string) Decoder

NewDecoder returns a Decoder to decode the named charset. If the name is not found, it returns nil.

func (Decoder) ConvertString

func (d Decoder) ConvertString(s string) string

ConvertString converts a string from d's encoding to UTF-8.

func (Decoder) ConvertStringOK

func (d Decoder) ConvertStringOK(s string) (result string, ok bool)

ConvertStringOK converts a string from d's encoding to UTF-8. It also returns a boolean indicating whether every character was converted successfully.

func (Decoder) NewReader

func (d Decoder) NewReader(rd io.Reader) *Reader

NewReader creates a new Reader that uses the receiver to decode text.

func (Decoder) Translate

func (d Decoder) Translate(data []byte, eof bool) (n int, cdata []byte, err error)

Translate enables a Decoder to implement go-charset's Translator interface.

type Encoder

type Encoder func(p []byte, c rune) (size int, status Status)

An Encoder is a function that encodes a character set, one character at a time. It works much like utf8.EncodeRune, but has an additional status return value.
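The sketch below shows one way a caller might drive an Encoder over a whole string, in the spirit of ConvertStringOK. It is an illustration under stated assumptions rather than the package's code: `asciiEncoder` and `encodeString` are hypothetical, and `Status` and `Encoder` are redeclared locally so the example runs without the package.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented declarations.
type Status int

const (
	SUCCESS = Status(iota)
	INVALID_CHAR
	NO_ROOM
	STATE_ONLY
)

type Encoder func(p []byte, c rune) (size int, status Status)

// asciiEncoder is a hypothetical Encoder for 7-bit ASCII. Characters it
// cannot represent are written as '?' and reported as INVALID_CHAR.
func asciiEncoder(p []byte, c rune) (size int, status Status) {
	if len(p) == 0 {
		return 0, NO_ROOM
	}
	if c >= 0 && c < 0x80 {
		p[0] = byte(c)
		return 1, SUCCESS
	}
	p[0] = '?'
	return 1, INVALID_CHAR
}

// encodeString applies e to each rune of s and reports whether every
// character was encoded without substitution.
func encodeString(e Encoder, s string) (string, bool) {
	buf := make([]byte, 4) // scratch space for one encoded character
	var out []byte
	ok := true
	for _, c := range s {
		size, status := e(buf, c)
		if status == INVALID_CHAR {
			ok = false
		}
		out = append(out, buf[:size]...)
	}
	return string(out), ok
}

func main() {
	s, ok := encodeString(asciiEncoder, "h\u00e9llo")
	fmt.Println(s, ok)
}
```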

func NewEncoder

func NewEncoder(name string) Encoder

NewEncoder returns an Encoder to encode the named charset.

func (Encoder) ConvertString

func (e Encoder) ConvertString(s string) string

ConvertString converts a string from UTF-8 to e's encoding.

func (Encoder) ConvertStringOK

func (e Encoder) ConvertStringOK(s string) (result string, ok bool)

ConvertStringOK converts a string from UTF-8 to e's encoding. It also returns a boolean indicating whether every character was converted successfully.

func (Encoder) NewWriter

func (e Encoder) NewWriter(wr io.Writer) *Writer

NewWriter creates a new Writer that uses the receiver to encode text.

type MBCSTable

type MBCSTable struct {
	// contains filtered or unexported fields
}

An MBCSTable holds the data to convert to and from Unicode.

func (*MBCSTable) AddCharacter

func (table *MBCSTable) AddCharacter(c rune, bytes string)

AddCharacter adds a character to the table. c is its Unicode code point, and bytes contains the bytes used to encode it in the character set.

func (*MBCSTable) Decoder

func (table *MBCSTable) Decoder() Decoder

func (*MBCSTable) Encoder

func (table *MBCSTable) Encoder() Encoder

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader implements character-set decoding for an io.Reader object.

func (*Reader) Read

func (b *Reader) Read(p []byte) (n int, err error)

Read reads data into p. It returns the number of bytes read into p. It calls Read at most once on the underlying Reader, hence n may be less than len(p). At EOF, the count will be zero and err will be io.EOF.

func (*Reader) ReadRune

func (b *Reader) ReadRune() (c rune, size int, err error)

ReadRune reads a single Unicode character and returns the rune and its size in bytes.

type Status

type Status int

Status is the type for the status return value from a Decoder or Encoder.

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer implements character-set encoding for an io.Writer object.

func (*Writer) Write

func (w *Writer) Write(p []byte) (n int, err error)

Write encodes and writes the data from p.

func (*Writer) WriteRune

func (w *Writer) WriteRune(c rune) (size int, err error)
