kowalski

v5.3.0
Published: Mar 14, 2022 License: MIT Imports: 22 Imported by: 1

README

Kowalski

Kowalski is a Go library for performing various operations to help solve puzzles, and an accompanying Discord bot.

Supported functions

Wildcard matching

AKA Crossword Solving. Given an input with one or more missing letters (represented by ? characters), returns a list of dictionary words that match.
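
The matching rule described above can be sketched in a few lines. This is an illustration only, not Kowalski's implementation: the names matchPattern and matchAll and the in-memory word list are hypothetical, and the sketch assumes lowercase ASCII input.

```go
package main

import "fmt"

// matchPattern reports whether word fits pattern, where each '?' in the
// pattern stands for exactly one unknown letter. (Hypothetical helper.)
func matchPattern(pattern, word string) bool {
	if len(pattern) != len(word) {
		return false
	}
	for i := 0; i < len(pattern); i++ {
		if pattern[i] != '?' && pattern[i] != word[i] {
			return false
		}
	}
	return true
}

// matchAll returns every word in the given list that fits the pattern.
func matchAll(pattern string, words []string) []string {
	var out []string
	for _, w := range words {
		if matchPattern(pattern, w) {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	words := []string{"cat", "cot", "cut", "dog", "coat"}
	fmt.Println(matchAll("c?t", words)) // prints [cat cot cut]
}
```

A linear scan like this is fine for small lists; a real solver would likely use an index or a probabilistic structure to avoid visiting every word.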

Anagram solving

Given a set of letters, possibly including ? wildcards, checks all possible anagrams and returns a list of dictionary words that match.
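
One way to test a single candidate word against a set of letters with ? wildcards is to compare letter counts. The sketch below assumes lowercase a-z input; isAnagram is a hypothetical name, and the approach is not necessarily how Kowalski does it.

```go
package main

import "fmt"

// isAnagram reports whether word can be formed from exactly the given
// letters, where each '?' in letters may stand for any one letter.
func isAnagram(letters, word string) bool {
	if len(letters) != len(word) {
		return false
	}
	var counts [26]int
	wild := 0
	for i := 0; i < len(letters); i++ {
		c := letters[i]
		switch {
		case c == '?':
			wild++
		case c >= 'a' && c <= 'z':
			counts[c-'a']++
		default:
			return false
		}
	}
	for i := 0; i < len(word); i++ {
		c := word[i]
		if c < 'a' || c > 'z' {
			return false
		}
		if counts[c-'a'] > 0 {
			counts[c-'a']-- // use a concrete letter first
		} else if wild > 0 {
			wild-- // otherwise spend a wildcard
		} else {
			return false
		}
	}
	// Lengths are equal and every letter of word consumed one slot,
	// so nothing can be left over at this point.
	return true
}

func main() {
	fmt.Println(isAnagram("tca", "cat")) // prints true
	fmt.Println(isAnagram("t?a", "cat")) // prints true
	fmt.Println(isAnagram("tca", "cot")) // prints false
}
```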

Morse decoding

Given a Morse-encoded word (represented with - and . characters) without spaces, finds all valid dictionary words that could match.
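
Because the input has no letter boundaries, one input can decode to several words. A simple way to illustrate this is to encode each candidate word and compare it with the input. The sketch below takes that route; toMorse and fromMorse are hypothetical names, the word list is made up, and Kowalski itself may search quite differently.

```go
package main

import (
	"fmt"
	"strings"
)

// International Morse code for a-z, indexed by letter.
var morse = [26]string{
	".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..",
	".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.",
	"...", "-", "..-", "...-", ".--", "-..-", "-.--", "--..",
}

// toMorse encodes a lowercase word with no separators between letters.
// It returns false if the word contains anything other than a-z.
func toMorse(word string) (string, bool) {
	var sb strings.Builder
	for i := 0; i < len(word); i++ {
		c := word[i]
		if c < 'a' || c > 'z' {
			return "", false
		}
		sb.WriteString(morse[c-'a'])
	}
	return sb.String(), true
}

// fromMorse returns the words whose spaceless encoding equals input.
func fromMorse(input string, words []string) []string {
	var out []string
	for _, w := range words {
		if enc, ok := toMorse(w); ok && enc == input {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	words := []string{"cat", "tea", "na", "dog"}
	fmt.Println(fromMorse("-..-", words)) // prints [tea na]
}
```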

Image processing

Various utilities to analyse images, find hidden parts, etc.

Library usage

SpellChecker

Most functions that involve English text use the SpellChecker struct, which can indicate whether a word is a valid dictionary word or not. To create a new SpellChecker you must provide it with an io.Reader from which it can read words line-by-line, and a rough estimate of the number of words it will find:

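package example

import (
  "os"

  "github.com/csmith/kowalski/v5"
)

func create() (*kowalski.SpellChecker, error) {
  f, err := os.Open("file.txt")
  if err != nil {
    return nil, err
  }
  defer f.Close()

  // 100 is a rough estimate of the number of words in file.txt.
  return kowalski.CreateSpellChecker(f, 100)
}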

As creating a spellchecker can be expensive and cumbersome, Kowalski supports serialising data to disk and loading it back. Some examples of these serialised models are available in the models directory. To load a model:

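package example

import (
  "os"

  "github.com/csmith/kowalski/v5"
)

func create() (*kowalski.SpellChecker, error) {
  f, err := os.Open("file.wl")
  if err != nil {
    return nil, err
  }
  defer f.Close()

  // The file must have been written by SaveSpellChecker.
  return kowalski.LoadSpellChecker(f)
}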

This repository also contains a command-line tool to generate a new SpellChecker and export the serialised model:

go run ./cmd/compile -in wordlist.txt -out model.wl

Vellum

The fst package contains automata for use with the Vellum finite state transducer. This allows for quick matching of anagrams, regular expressions, etc., against a pre-prepared transducer.

To use these you need to open a transducer using Vellum, create an iterator, and then iterate over it:

package example

import (
  "github.com/blevesearch/vellum"
  "github.com/csmith/kowalski/v5/fst"
)

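func create() {
  transducer, err := vellum.Open("some_file.fst")
  if err != nil {
    panic(err)
  }
  defer transducer.Close()

  // Search reports vellum.ErrIteratorDone if nothing matches, so that error
  // is checked after the loop rather than treated as a failure here.
  i, err := transducer.Search(fst.NewMorseAutomaton("... --- .-.. .. -.-."), nil, nil)
  for err == nil {
    key, val := i.Current()
    println("Do something here with", string(key), val)
    err = i.Next()
  }
  if err != vellum.ErrIteratorDone {
    panic(err)
  }
}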

Discord bot

This repository also contains a Discord bot that allows users to perform analysis.

It currently supports these commands:

!anagram Attempts to find single-word anagrams, expanding '*' and '?' wildcards
!analysis Analyses text and provides a summary of potentially interesting findings [Aliases: !analyze, !analyse]
!chunk Splits the text into chunks of a given size
!colours Counts the colours within the image [Aliases: !colors]
!hidden Finds hidden pixels in images [Aliases: !hiddenpixels]
!letters Shows a frequency histogram of the number of letters in the input
!match Attempts to expand '?' wildcards to find a single-word match
!morse Attempts to split a morse code input to spell a single word
!multigram Attempts to find multi-word anagrams, expanding '?' wildcards [Aliases: !multianagram]
!multimatch Attempts to expand '?' wildcards to find multi-word matches
!obo Finds all words that are one character different from the input [Aliases: !offbyone, !ob1]
!rgb Splits an image into its red, green and blue channels
!shift Shows the result of the 25 possible caesar shifts [Aliases: !caesar]
!t9 Attempts to treat a series of numbers as T9 input to spell a single word
!transpose Transposes columns to rows and rows to columns
!wordsearch Searches for words in the given text grid
!help Shows this help text
!fstanagram Attempts to find anagrams from wikipedia, expanding '*' wildcards [Aliases: !fstagram]
!fstregex Attempts to find word matches from wikipedia using regexp [Aliases: !fstre]
!fstmorse Attempts to find word matches from wikipedia using morse
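
As an illustration of what a command such as !shift computes, the following sketch produces the 25 non-trivial Caesar shifts of a string. The function names are hypothetical and this is not the bot's code; non-letters are passed through unchanged, which is an assumption.

```go
package main

import "fmt"

// shift rotates every ASCII letter in input forward by n places
// (0 <= n < 26), leaving other characters unchanged.
func shift(input string, n int) string {
	out := []byte(input)
	for i, c := range out {
		switch {
		case c >= 'a' && c <= 'z':
			out[i] = 'a' + (c-'a'+byte(n))%26
		case c >= 'A' && c <= 'Z':
			out[i] = 'A' + (c-'A'+byte(n))%26
		}
	}
	return string(out)
}

// allShifts returns the shifts by 1 through 25, in that order.
func allShifts(input string) []string {
	out := make([]string, 0, 25)
	for n := 1; n <= 25; n++ {
		out = append(out, shift(input, n))
	}
	return out
}

func main() {
	fmt.Println(shift("Hello, World", 13)) // prints Uryyb, Jbeyq
	for n, s := range allShifts("ibm") {
		fmt.Println(n+1, s)
	}
}
```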

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Anagram

func Anagram(ctx context.Context, checker *SpellChecker, word string) ([]string, error)

Anagram finds all single-word anagrams of the given word, expanding '?' as a single wildcard character.

func Analyse

func Analyse(checker *SpellChecker, input string) []string

Analyse performs various forms of text analysis on the input and returns findings.

func Chunk

func Chunk(input string, parts ...int) []string

Chunk takes the input, and splits it up into chunks of the given length. If the input is longer than the list of part lengths, the lengths will be repeated.
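
The repeating-lengths behaviour can be pictured with a small stand-alone sketch. It is not the library's code: it works on bytes rather than runes, and it assumes that a final chunk shorter than the requested length is still emitted and that a non-positive length stops the split.

```go
package main

import "fmt"

// chunk splits input into pieces whose lengths cycle through parts.
func chunk(input string, parts ...int) []string {
	var out []string
	if len(parts) == 0 {
		return out
	}
	for i := 0; len(input) > 0; i++ {
		n := parts[i%len(parts)] // wrap around when the lengths run out
		if n <= 0 {
			break // guard against an endless loop on bad lengths
		}
		if n > len(input) {
			n = len(input)
		}
		out = append(out, input[:n])
		input = input[n:]
	}
	return out
}

func main() {
	fmt.Println(chunk("abcdefgh", 3, 1)) // prints [abc d efg h]
}
```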

func Dedupe

func Dedupe(options *multiplexOptions)

Dedupe removes duplicate entries from multiplexed results. That is, if the first checker provides words A, B and C, the second checker provides B and D, and the third A, D, and E, then the result will be: {A,B,C},{D},{E}.
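
The example in the description can be reproduced with a short sketch that keeps only the first occurrence of each word across the result sets. The dedupe function here is a hypothetical stand-in operating on plain slices; it does not use the library's option mechanism.

```go
package main

import "fmt"

// dedupe keeps each word only in the first result set where it appears.
func dedupe(results [][]string) [][]string {
	seen := make(map[string]bool)
	out := make([][]string, len(results))
	for i, words := range results {
		for _, w := range words {
			if !seen[w] {
				seen[w] = true
				out[i] = append(out[i], w)
			}
		}
	}
	return out
}

func main() {
	results := [][]string{
		{"A", "B", "C"}, // first checker
		{"B", "D"},      // second checker
		{"A", "D", "E"}, // third checker
	}
	fmt.Println(dedupe(results)) // prints [[A B C] [D] [E]]
}
```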

func FindWords

func FindWords(checker *SpellChecker, input string) []string

FindWords attempts to find substrings of the input that are valid words according to the checker. Duplicates may be present in the output if they occur at multiple positions.
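
A naive version of this substring search, using a plain map as a stand-in for the checker, might look like the sketch below. The findWords name and the map-based dictionary are assumptions for illustration; note how a word occurring at two positions appears twice.

```go
package main

import "fmt"

// findWords returns every substring of input that is in dict, in order of
// start position. A word found at several positions is reported each time.
func findWords(dict map[string]bool, input string) []string {
	var out []string
	for start := 0; start < len(input); start++ {
		for end := start + 1; end <= len(input); end++ {
			if dict[input[start:end]] {
				out = append(out, input[start:end])
			}
		}
	}
	return out
}

func main() {
	dict := map[string]bool{"cat": true, "at": true}
	fmt.Println(findWords(dict, "catat")) // prints [cat at at]
}
```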

func FromMorse

func FromMorse(checker *SpellChecker, input string) []string

FromMorse takes a sequence of morse signals (as ASCII dots and hyphens) and returns a set of possible words that could be constructed from them.

func FromT9

func FromT9(checker *SpellChecker, input string) []string

FromT9 takes an input that represents a sequence of key presses on a T9 keyboard and returns possible words that match. The input should not contain spaces (the "0" digit) - words should be solved independently, to avoid an explosion of possible results.
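
For illustration, T9 lookup can be sketched by converting each candidate word to its key sequence and comparing it with the input. The names toT9 and fromT9 and the word list are hypothetical, the sketch uses the standard phone keypad layout, and the library may work differently.

```go
package main

import "fmt"

// t9Digits gives the keypad digit for each letter a-z:
// 2=abc 3=def 4=ghi 5=jkl 6=mno 7=pqrs 8=tuv 9=wxyz.
const t9Digits = "22233344455566677778889999"

// toT9 converts a lowercase word to its key presses. It returns false if
// the word contains anything other than a-z.
func toT9(word string) (string, bool) {
	out := make([]byte, len(word))
	for i := 0; i < len(word); i++ {
		c := word[i]
		if c < 'a' || c > 'z' {
			return "", false
		}
		out[i] = t9Digits[c-'a']
	}
	return string(out), true
}

// fromT9 returns the words whose key presses equal input.
func fromT9(input string, words []string) []string {
	var out []string
	for _, w := range words {
		if keys, ok := toT9(w); ok && keys == input {
			out = append(out, w)
		}
	}
	return out
}

func main() {
	words := []string{"good", "home", "gone", "cat"}
	fmt.Println(fromT9("4663", words)) // prints [good home gone]
}
```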

func HiddenPixels

func HiddenPixels(reader io.Reader) (io.Reader, error)

HiddenPixels attempts to find patterns of hidden pixels (those that are consistently used near very similar colours).

func Match

func Match(ctx context.Context, checker *SpellChecker, pattern string) ([]string, error)

Match returns all valid words that match the given pattern, expanding '?' as a single character wildcard.

func MultiAnagram

func MultiAnagram(ctx context.Context, checker *SpellChecker, word string) ([]string, error)

MultiAnagram finds all single- and multi-word anagrams of the given word, expanding '?' as a single wildcard character. To avoid duplicates, words are sorted lexicographically (i.e., "a ball" will be returned and "ball a" will not).

func MultiMatch

func MultiMatch(ctx context.Context, checker *SpellChecker, pattern string) ([]string, error)

MultiMatch returns valid sequences of words that match the given pattern, expanding '?' as a single character wildcard. To reduce the search space, multi-match will first try to look for matches consisting only of longer words, then gradually reduce that threshold until at least one match is found.

func MultiplexAnagram

func MultiplexAnagram(ctx context.Context, checkers []*SpellChecker, pattern string, opts ...MultiplexOption) ([][]string, error)

MultiplexAnagram performs the Anagram operation over a number of different checkers.

func MultiplexFindWords

func MultiplexFindWords(checkers []*SpellChecker, pattern string, opts ...MultiplexOption) [][]string

MultiplexFindWords performs the FindWords operation over a number of different checkers.

func MultiplexFromMorse

func MultiplexFromMorse(checkers []*SpellChecker, pattern string, opts ...MultiplexOption) [][]string

MultiplexFromMorse performs the FromMorse operation over a number of different checkers.

func MultiplexFromT9

func MultiplexFromT9(checkers []*SpellChecker, pattern string, opts ...MultiplexOption) [][]string

MultiplexFromT9 performs the FromT9 operation over a number of different checkers.

func MultiplexMatch

func MultiplexMatch(ctx context.Context, checkers []*SpellChecker, pattern string, opts ...MultiplexOption) ([][]string, error)

MultiplexMatch performs the Match operation over a number of different checkers.

func MultiplexMultiAnagram

func MultiplexMultiAnagram(ctx context.Context, checkers []*SpellChecker, pattern string, opts ...MultiplexOption) ([][]string, error)

MultiplexMultiAnagram performs the MultiAnagram operation over a number of different checkers.

func MultiplexMultiMatch

func MultiplexMultiMatch(ctx context.Context, checkers []*SpellChecker, pattern string, opts ...MultiplexOption) ([][]string, error)

MultiplexMultiMatch performs the MultiMatch operation over a number of different checkers.

func MultiplexOffByOne

func MultiplexOffByOne(ctx context.Context, checkers []*SpellChecker, pattern string, opts ...MultiplexOption) ([][]string, error)

MultiplexOffByOne performs the OffByOne operation over a number of different checkers.

func MultiplexWordSearch

func MultiplexWordSearch(checkers []*SpellChecker, pattern []string, opts ...MultiplexOption) [][]string

MultiplexWordSearch performs the WordSearch operation over a number of different checkers.

func OffByOne

func OffByOne(ctx context.Context, checker *SpellChecker, input string) ([]string, error)

OffByOne returns all words that can be made by performing one character change on the input. The input is assumed to be a single, lowercase word containing a-z chars only.
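
A brute-force sketch of this idea tries every other letter at every position and keeps the results found in a dictionary. It assumes that "one character change" means a substitution (not an insertion or deletion), and it uses a plain map in place of the checker; offByOne is a hypothetical name.

```go
package main

import "fmt"

// offByOne returns the dictionary words that differ from input in exactly
// one position. The input is assumed to be lowercase a-z.
func offByOne(dict map[string]bool, input string) []string {
	var out []string
	buf := []byte(input)
	for i := range buf {
		orig := buf[i]
		for c := byte('a'); c <= 'z'; c++ {
			if c == orig {
				continue // skip the unchanged word itself
			}
			buf[i] = c
			if dict[string(buf)] {
				out = append(out, string(buf))
			}
		}
		buf[i] = orig // restore before moving to the next position
	}
	return out
}

func main() {
	dict := map[string]bool{"cat": true, "cot": true, "bat": true, "car": true, "dog": true}
	fmt.Println(offByOne(dict, "cat")) // prints [bat cot car]
}
```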

func SaveSpellChecker

func SaveSpellChecker(writer io.Writer, checker *SpellChecker) error

SaveSpellChecker serialises the given checker and writes it to the writer. It can later be restored with LoadSpellChecker.

func Score

func Score(checker *SpellChecker, input string) float64

Score assigns a score to an input showing how likely it is to be English text. A score of 1.0 means almost certainly English, a score of 0.0 means almost certainly not. This is fairly arbitrary and is not very good.

func SplitRGB

func SplitRGB(reader io.Reader) (r, g, b io.Reader, err error)

SplitRGB reads an image, and returns three readers containing PNG encoded images containing just the red, green and blue channels respectively. If the input image is excessively small, the outputs will be scaled up.

func Transpose

func Transpose(input []string) []string

Transpose rotates the input text so rows become columns, and columns become rows.
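
A minimal sketch of the operation follows. Padding short rows with spaces is an assumption made here so that ragged input still produces rectangular output; how the library treats ragged input is not stated.

```go
package main

import "fmt"

// transpose turns rows into columns. Rows shorter than the widest row are
// padded with spaces (an assumption of this sketch).
func transpose(input []string) []string {
	width := 0
	for _, row := range input {
		if len(row) > width {
			width = len(row)
		}
	}
	out := make([]string, width)
	for col := 0; col < width; col++ {
		buf := make([]byte, len(input))
		for r, row := range input {
			if col < len(row) {
				buf[r] = row[col]
			} else {
				buf[r] = ' '
			}
		}
		out[col] = string(buf)
	}
	return out
}

func main() {
	fmt.Println(transpose([]string{"abc", "def"})) // prints [ad be cf]
}
```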

func WordSearch

func WordSearch(checker *SpellChecker, input []string) []string

WordSearch returns all words found by FindWords in the input word search grid. Words may occur horizontally, vertically or diagonally, and may read in either direction. If a word is found multiple times in different places it will be returned multiple times.
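
The eight-direction search can be sketched by walking outwards from every cell and checking each growing string against a dictionary. This is only an illustration with a map in place of the checker; the wordSearch name and the choice to ignore single-letter words are assumptions.

```go
package main

import "fmt"

// wordSearch walks from every cell in all eight directions and reports
// each string of two or more letters that is in dict.
func wordSearch(dict map[string]bool, grid []string) []string {
	dirs := [8][2]int{
		{0, 1}, {0, -1}, {1, 0}, {-1, 0}, // right, left, down, up
		{1, 1}, {-1, -1}, {1, -1}, {-1, 1}, // the four diagonals
	}
	var out []string
	for r := range grid {
		for c := 0; c < len(grid[r]); c++ {
			for _, d := range dirs {
				var word []byte
				y, x := r, c
				for y >= 0 && y < len(grid) && x >= 0 && x < len(grid[y]) {
					word = append(word, grid[y][x])
					if len(word) > 1 && dict[string(word)] {
						out = append(out, string(word))
					}
					y, x = y+d[0], x+d[1]
				}
			}
		}
	}
	return out
}

func main() {
	grid := []string{
		"cat",
		"oxo",
		"dog",
	}
	dict := map[string]bool{"cat": true, "dog": true, "cod": true, "god": true}
	fmt.Println(wordSearch(dict, grid)) // prints [cat cod dog god]
}
```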

Types

type ColourCount

type ColourCount struct {
	Colour color.Color
	Count  int
}

func ExtractColours

func ExtractColours(reader io.Reader) ([]ColourCount, error)

ExtractColours returns a sorted slice containing each individual colour used in the image, and the total number of pixels that have that colour. Colours are sorted from most-used to least-used.
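
The counting and sorting described here can be approximated with the standard library alone. The sketch below is not the library's code: colourCount, countColours and samplePNG are hypothetical names, colours are normalised to color.RGBA for use as map keys, and the order of colours with equal counts is unspecified.

```go
package main

import (
	"bytes"
	"fmt"
	"image"
	"image/color"
	"image/png"
	"io"
	"sort"
)

type colourCount struct {
	Colour color.Color
	Count  int
}

// countColours decodes an image and counts pixels per colour, returning
// the colours from most-used to least-used.
func countColours(r io.Reader) ([]colourCount, error) {
	img, _, err := image.Decode(r)
	if err != nil {
		return nil, err
	}
	counts := make(map[color.RGBA]int)
	b := img.Bounds()
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			counts[color.RGBAModel.Convert(img.At(x, y)).(color.RGBA)]++
		}
	}
	out := make([]colourCount, 0, len(counts))
	for c, n := range counts {
		out = append(out, colourCount{Colour: c, Count: n})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Count > out[j].Count })
	return out, nil
}

// samplePNG builds a 2x2 PNG with three red pixels and one blue pixel.
func samplePNG() *bytes.Buffer {
	img := image.NewRGBA(image.Rect(0, 0, 2, 2))
	red := color.RGBA{R: 255, A: 255}
	blue := color.RGBA{B: 255, A: 255}
	img.Set(0, 0, red)
	img.Set(1, 0, red)
	img.Set(0, 1, red)
	img.Set(1, 1, blue)
	var buf bytes.Buffer
	if err := png.Encode(&buf, img); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	counts, err := countColours(samplePNG())
	if err != nil {
		panic(err)
	}
	for _, c := range counts {
		fmt.Println(c.Colour, c.Count)
	}
}
```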

type MultiplexOption

type MultiplexOption func(*multiplexOptions)

type SpellChecker

type SpellChecker struct {
	// contains filtered or unexported fields
}

SpellChecker provides a way to tell whether a word exists in a dictionary.

func CreateSpellChecker

func CreateSpellChecker(reader io.Reader, wordCount int) (*SpellChecker, error)

CreateSpellChecker creates a new SpellChecker by reading words line-by-line from the given reader. The wordCount parameter should be an approximation of the number of words available.

This is likely to be a relatively expensive operation; for routine use prefer saving the spell checker via SaveSpellChecker and restoring it with LoadSpellChecker.

func LoadSpellChecker

func LoadSpellChecker(reader io.Reader) (*SpellChecker, error)

LoadSpellChecker attempts to load a SpellChecker that was previously saved with SaveSpellChecker.

func (*SpellChecker) Prefix

func (c *SpellChecker) Prefix(prefix string) bool

Prefix determines - probabilistically - whether the given string is a prefix of any known word. There is a small chance of false positives, i.e. an input that is not a prefix to any word in the word list might be incorrectly identified as valid; there is no chance of false negatives.

func (*SpellChecker) Valid

func (c *SpellChecker) Valid(word string) bool

Valid determines - probabilistically - whether the given word was in the word list used to create this SpellChecker. There is a small chance of false positives, i.e. a word that wasn't in the word list might be incorrectly identified as valid; there is no chance of false negatives.
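
The guarantee described for Valid and Prefix (possible false positives, no false negatives) is the same one offered by a Bloom filter. The documentation above does not say which data structure SpellChecker uses, so the sketch below is only an illustration of how such a guarantee can arise; the bloom type, its sizes and its hashing scheme are all assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter: k bit positions are set per word.
type bloom struct {
	bits []bool
	k    int
}

func newBloom(size, hashes int) *bloom {
	return &bloom{bits: make([]bool, size), k: hashes}
}

// positions derives k bit positions from one 64-bit FNV-1a hash using
// double hashing (h1 + i*h2).
func (b *bloom) positions(word string) []int {
	h := fnv.New64a()
	h.Write([]byte(word))
	sum := h.Sum64()
	h1, h2 := uint64(uint32(sum)), uint64(uint32(sum>>32))
	out := make([]int, b.k)
	for i := range out {
		out[i] = int((h1 + uint64(i)*h2) % uint64(len(b.bits)))
	}
	return out
}

// Add records a word by setting all of its bit positions.
func (b *bloom) Add(word string) {
	for _, p := range b.positions(word) {
		b.bits[p] = true
	}
}

// Valid returns false only if the word was definitely never added. A true
// result may occasionally be a false positive caused by bit collisions.
func (b *bloom) Valid(word string) bool {
	for _, p := range b.positions(word) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	b := newBloom(1024, 3)
	b.Add("cat")
	b.Add("dog")
	fmt.Println(b.Valid("cat")) // prints true: added words are never missed
	fmt.Println(b.Valid("zebra"))
}
```

An added word always finds all of its bits set, which is why false negatives cannot occur; an unseen word is wrongly accepted only if other words happen to have set every one of its positions.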

Directories

Path Synopsis
cmd
