utf8reader

Version: v0.1.1 Published: Nov 26, 2024 License: MIT Imports: 6 Imported by: 1

README

utf8reader

A simple Go package that wraps an io.Reader in a UTF-8 encoded io.Reader. It automatically detects the encoding of the input and converts it to UTF-8.

Usage


package main

import (
    "bytes"
    "fmt"
    "io"

    "github.com/kpym/utf8reader"
)

func main() {
    // Create a reader with KOI8-R encoded "Това е на български"
    r := bytes.NewReader([]byte{0xF4, 0xCF, 0xD7, 0xC1, 0x20, 0xC5, 0x20, 0xCE, 0xC1, 0x20, 0xC2, 0xDF, 0xCC, 0xC7, 0xC1, 0xD2, 0xD3, 0xCB, 0xC9})
    // Wrap it; the encoding is detected automatically
    reader := utf8reader.New(r)

    // Read the whole content; a single Read call is not guaranteed to return everything
    out, err := io.ReadAll(reader)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(string(out))
    // Output: Това е на български
}

Documentation

Go Reference

License

MIT

Documentation

Overview

utf8reader is a package that detects the encoding of a reader and provides a new reader that converts the input to UTF-8. The Unicode normalization form can be set to NFC or NFD.

Functions

func WithNormalizationForm

func WithNormalizationForm(nor string) option

WithNormalizationForm sets the normalization form. The normalization form can be "NFC" or "NFD". By default no normalization is done.
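
To illustrate what the two forms mean, here is a minimal standard-library sketch. It does not call utf8reader. It only shows that the same visible character can be stored precomposed (as NFC produces) or decomposed (as NFD produces), which changes byte length, rune count, and string equality. The helper name describe is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper that reports the UTF-8 byte length
// and the number of code points in s.
func describe(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	nfc := "\u00e9"  // "é" as one precomposed code point (NFC form)
	nfd := "e\u0301" // "e" followed by a combining acute accent (NFD form)

	b1, r1 := describe(nfc)
	b2, r2 := describe(nfd)
	fmt.Println(b1, r1)     // 2 1
	fmt.Println(b2, r2)     // 3 2
	fmt.Println(nfc == nfd) // false
}
```

Choosing one form for all input makes byte-wise comparisons of such strings consistent.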

func WithPeakSize

func WithPeakSize(size int) option

WithPeakSize sets the number of bytes to peek. By default it peeks 4096 bytes. The peeked bytes are used to detect the encoding.

Types

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader is a reader that converts the input to UTF-8.

func New

func New(r io.Reader, options ...option) *Reader

New returns a reader that converts the input to UTF-8 if it is not already encoded in UTF-8. If the encoding cannot be detected, it returns a buffered version of the original reader.

func (*Reader) Encoding

func (r *Reader) Encoding() string

Encoding returns the detected encoding.

func (*Reader) Peak

func (r *Reader) Peak() ([]byte, error)

Peak returns the first bytes of the reader transformed to UTF-8. This function should be called before any Read operation.

func (*Reader) Read

func (r *Reader) Read(p []byte) (n int, err error)

Read reads the UTF-8 encoded bytes from the reader.
