tldr

v0.5.0
Published: Jan 19, 2018 License: MIT Imports: 7 Imported by: 0

README

tldr

When you are too lazy to read the entire text


What?

tldr is a Go package that summarizes text automatically using the LexRank algorithm.

How?

LexRank has two main steps: weighing and ranking. tldr includes two weighing algorithms (Jaccard coefficient and Hamming distance) and two ranking algorithms (PageRank and centrality). The default settings use Hamming distance and PageRank.
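The two steps can be illustrated with textbook definitions. The sketch below is not tldr's implementation. It assumes each sentence is represented as a set of integer word IDs, takes the Jaccard coefficient as |A ∩ B| / |A ∪ B|, takes the Hamming distance as the number of word IDs present in exactly one of the two sentences, and ranks sentences by the sum of their Jaccard similarities to all other sentences (a simple degree centrality). All function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// toSet turns a sentence's word IDs into a set for fast membership tests.
func toSet(ids []int) map[int]bool {
	set := make(map[int]bool, len(ids))
	for _, id := range ids {
		set[id] = true
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B|, a similarity in [0, 1].
func jaccard(a, b []int) float64 {
	sa, sb := toSet(a), toSet(b)
	inter := 0
	for id := range sa {
		if sb[id] {
			inter++
		}
	}
	union := len(sa) + len(sb) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

// hamming counts the word IDs present in exactly one of the two sentences,
// i.e. the size of the symmetric difference of the two sets.
func hamming(a, b []int) int {
	sa, sb := toSet(a), toSet(b)
	diff := 0
	for id := range sa {
		if !sb[id] {
			diff++
		}
	}
	for id := range sb {
		if !sa[id] {
			diff++
		}
	}
	return diff
}

// rankByCentrality scores each sentence by the sum of its Jaccard
// similarities to every other sentence and returns the sentence indices,
// highest score first. Ties keep their original order.
func rankByCentrality(sentences [][]int) []int {
	scores := make([]float64, len(sentences))
	for i := range sentences {
		for j := i + 1; j < len(sentences); j++ {
			w := jaccard(sentences[i], sentences[j])
			scores[i] += w
			scores[j] += w
		}
	}
	order := make([]int, len(sentences))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(x, y int) bool {
		return scores[order[x]] > scores[order[y]]
	})
	return order
}

func main() {
	// Assumed input: three sentences already mapped to word IDs.
	sentences := [][]int{{1, 2, 3}, {2, 3, 4}, {3, 4, 5}}
	fmt.Println(jaccard(sentences[0], sentences[1]))
	fmt.Println(hamming(sentences[0], sentences[1]))
	fmt.Println(rankByCentrality(sentences))
}
```

The middle sentence shares the most words with the others, so it ranks first. Note that Jaccard measures similarity while Hamming measures difference, so a ranking step has to treat the two weights accordingly.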

Is This Fast?

Test it yourself. My system is an i3-3217 @ 1.8GHz with single-channel 4GB RAM, running Ubuntu 15.10 with kernel 4.5.0:

$ go test -bench . -benchmem -benchtime 5s -cpu 4
BenchmarkSummarizeCentralityHamming-4	    2000	   6429340 ns/op	  401204 B/op	    3551 allocs/op
BenchmarkSummarizeCentralityJaccard-4	     200	  30036357 ns/op	 3449461 B/op	   12543 allocs/op
BenchmarkSummarizePagerankHamming-4  	    1000	   7015008 ns/op	  420665 B/op	    3731 allocs/op
BenchmarkSummarizePagerankJaccard-4  	     200	  31066764 ns/op	 3469629 B/op	   12737 allocs/op

So, not bad huh?

Installation

go get github.com/JesusIslam/tldr

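Example

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/JesusIslam/tldr"
)

func main() {
	// Number of sentences to keep in the summary.
	intoSentences := 3

	// Read the text to summarize.
	textB, err := os.ReadFile("./sample.txt")
	if err != nil {
		log.Fatal(err)
	}

	// Create a summarizer with the default settings and run it.
	bag := tldr.New()
	result, err := bag.Summarize(string(textB), intoSentences)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}

Testing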

To run the tests, use go test. You need gomega and ginkgo installed first.

Dependencies?

tldr depends on the pagerank package, which you can install with go get github.com/alixaxel/pagerank.

License?

Check the LICENSE file. tldr: MIT.

Have fun!

Documentation

Overview

Dependencies:

go get github.com/alixaxel/pagerank

WARNING: This package is not thread safe, so you cannot use the same *Bag from multiple goroutines at once.
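One common way to live with a type that is not safe for concurrent use is to serialize access with a mutex. The sketch below is an illustration under assumptions, not something the package provides: the summarizer interface copies the Summarize signature documented for *Bag, while lockedSummarizer and the firstN stand-in are hypothetical names used only to keep the example runnable without the tldr package.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// summarizer matches the Summarize signature documented for *tldr.Bag.
type summarizer interface {
	Summarize(text string, num int) (string, error)
}

// lockedSummarizer (hypothetical) serializes calls to a summarizer that is
// not safe for concurrent use.
type lockedSummarizer struct {
	mu    sync.Mutex
	inner summarizer
}

func (l *lockedSummarizer) Summarize(text string, num int) (string, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.inner.Summarize(text, num)
}

// firstN is a stand-in for a real summarizer. It keeps mutable state
// between calls, which is what makes unsynchronized sharing unsafe.
type firstN struct {
	calls int
}

func (f *firstN) Summarize(text string, num int) (string, error) {
	f.calls++ // would be a data race without the mutex in lockedSummarizer
	parts := strings.SplitAfter(text, ". ")
	if num < 0 {
		num = 0
	}
	if num > len(parts) {
		num = len(parts)
	}
	return strings.TrimSpace(strings.Join(parts[:num], "")), nil
}

func main() {
	inner := &firstN{}
	shared := &lockedSummarizer{inner: inner}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := shared.Summarize("One. Two. Three.", 2); err != nil {
				fmt.Println("summarize failed:", err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("calls:", inner.calls)
}
```

The wrapper only helps if every goroutine goes through it. Reading or writing the exported fields of the wrapped value directly from another goroutine would bypass the lock.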

Index

Constants

const (
	VERSION                              = "0.5.0"
	DEFAULT_ALGORITHM                    = "pagerank"
	DEFAULT_WEIGHING                     = "hamming"
	DEFAULT_DAMPING                      = 0.85
	DEFAULT_TOLERANCE                    = 0.0001
	DEFAULT_THRESHOLD                    = 0.001
	DEFAULT_MAX_CHARACTERS               = 0
	DEFAULT_SENTENCES_DISTANCE_THRESHOLD = 0.95
)

The default value of each setting.

Variables

This section is empty.

Functions

func Intersection

func Intersection(src, dst []int) []int

func ReverseEdge added in v0.4.1

func ReverseEdge(num []*Edge)

func ReverseRank added in v0.4.1

func ReverseRank(num []*Rank)

func SanitizeWord

func SanitizeWord(word string) string

func SymmetricDifference

func SymmetricDifference(src, dst []int) []int

func TokenizeSentences added in v0.4.1

func TokenizeSentences(text string) []string

func UniqSentences

func UniqSentences(sentences [][]string, sentenceDistanceThreshold float64)

Types

type Bag

type Bag struct {
	BagOfWordsPerSentence [][]string
	OriginalSentences     []string
	Dict                  map[string]int
	Nodes                 []*Node
	Edges                 []*Edge
	Ranks                 []int

	MaxCharacters              int
	Algorithm                  string // "centrality" or "pagerank" or "custom"
	Weighing                   string // "hamming" or "jaccard" or "custom"
	Damping                    float64
	Tolerance                  float64
	Threshold                  float64
	SentencesDistanceThreshold float64
	// contains filtered or unexported fields
}

func New

func New() *Bag

Create new summarizer

func (*Bag) Set added in v0.4.1

func (bag *Bag) Set(m int, d, t, th, sth float64, alg, w string)

Set sets, in argument order, max characters (m), damping (d), tolerance (t), threshold (th), sentences distance threshold (sth), algorithm (alg), and weighing (w).

func (*Bag) SetCustomAlgorithm added in v0.4.4

func (bag *Bag) SetCustomAlgorithm(f func(e []*Edge) []int)

func (*Bag) SetCustomWeighing added in v0.4.4

func (bag *Bag) SetCustomWeighing(f func(src, dst []int) float64)

func (*Bag) SetDictionary added in v0.4.4

func (bag *Bag) SetDictionary(dict map[string]int)

Useful if you already have your own dictionary (for example, from your database). The dictionary is a map[string]int where the key is the word and the value is its position in the vector, starting from 1.
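As an illustration of the dictionary shape described above, the following sketch builds a map[string]int whose positions start from 1. The helper name buildDictionary is hypothetical, and lowercasing the words is an assumption: the documentation here does not say how keys must be normalized to match the words tldr extracts.

```go
package main

import (
	"fmt"
	"strings"
)

// buildDictionary (hypothetical helper) assigns each distinct lowercased
// word a vector position, starting from 1, in order of first appearance.
func buildDictionary(words []string) map[string]int {
	dict := make(map[string]int)
	for _, w := range words {
		w = strings.ToLower(w)
		if w == "" {
			continue
		}
		if _, seen := dict[w]; !seen {
			dict[w] = len(dict) + 1
		}
	}
	return dict
}

func main() {
	dict := buildDictionary(strings.Fields("The cat saw the other cat"))
	fmt.Println(dict["the"], dict["cat"], dict["saw"], dict["other"])
}
```

A map built this way has the type that SetDictionary accepts, so it could be passed as bag.SetDictionary(dict).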

func (*Bag) SetWordTokenizer added in v0.5.0

func (bag *Bag) SetWordTokenizer(f func(string) []string)
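SetWordTokenizer takes a func(string) []string. A minimal sketch of a function with that shape is shown below. The name splitWords and the splitting rule (keep letters, digits, and apostrophes) are assumptions for illustration. What input tldr passes to the tokenizer is not documented here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitWords (hypothetical) splits its input on every rune that is not a
// letter, a digit, or an apostrophe, and drops empty tokens.
func splitWords(s string) []string {
	return strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '\''
	})
}

func main() {
	fmt.Printf("%q\n", splitWords("Don't panic: it's 42, really!"))
}
```

Because the signature matches, such a function could be installed with bag.SetWordTokenizer(splitWords).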

func (*Bag) Summarize

func (bag *Bag) Summarize(text string, num int) (string, error)

Summarize summarizes the text into num sentences.

type ByScore added in v0.4.4

type ByScore []*Rank

func (ByScore) Len added in v0.4.4

func (b ByScore) Len() int

func (ByScore) Less added in v0.4.4

func (b ByScore) Less(i, j int) bool

func (ByScore) Swap added in v0.4.4

func (b ByScore) Swap(i, j int)
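The Len, Less, and Swap methods are the three methods of sort.Interface, so a ByScore value can be handed to sort.Sort. Since Rank's fields are unexported, the sketch below uses a hypothetical stand-in type with assumed fields, and the ascending order in Less is an assumption, not a statement about how tldr orders ranks.

```go
package main

import (
	"fmt"
	"sort"
)

// rank is a hypothetical stand-in for tldr.Rank, whose fields are unexported.
type rank struct {
	idx   int     // sentence index (assumed field)
	score float64 // ranking score (assumed field)
}

// byScore mirrors the shape of ByScore: a slice of pointers that
// implements sort.Interface.
type byScore []*rank

func (b byScore) Len() int           { return len(b) }
func (b byScore) Less(i, j int) bool { return b[i].score < b[j].score } // ascending: an assumption
func (b byScore) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }

// sortByScore sorts ranks in place, lowest score first.
func sortByScore(ranks []*rank) {
	sort.Sort(byScore(ranks))
}

func main() {
	ranks := []*rank{{0, 0.7}, {1, 1.0}, {2, 0.2}}
	sortByScore(ranks)
	for _, r := range ranks {
		fmt.Println(r.idx, r.score)
	}
}
```

The same pattern applies to ByWeight over []*Edge.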

type ByWeight added in v0.4.4

type ByWeight []*Edge

func (ByWeight) Len added in v0.4.4

func (b ByWeight) Len() int

func (ByWeight) Less added in v0.4.4

func (b ByWeight) Less(i, j int) bool

func (ByWeight) Swap added in v0.4.4

func (b ByWeight) Swap(i, j int)

type Edge

type Edge struct {
	// contains filtered or unexported fields
}

type Node

type Node struct {
	// contains filtered or unexported fields
}

type Rank

type Rank struct {
	// contains filtered or unexported fields
}
