gocd

package module
v0.1.12
Published: Feb 9, 2022 License: MIT Imports: 13 Imported by: 0

README

gocd

gocd is a Go library for matching and parsing company designators (like Limited, LLC, Incorporée) attached to company names.

It uses (and bundles) Profound Networks' company designator dataset, maintained here:

https://github.com/ProfoundNetworks/company_designator

Usage

Install the package:

    go get github.com/ProfoundNetworks/gocd

Then parse a company name:

    // Instantiate a parser
    parser, err := gocd.New()
    if err != nil {
            log.Fatal(err)
    }

    // Parse a company name string (Parse returns a *Result and an error)
    res, err := parser.Parse("Profound Networks LLC")
    if err != nil {
            log.Fatal(err)
    }

    // Report parse results
    fmt.Println(res.Input)      // Profound Networks LLC
    fmt.Println(res.Matched)    // true
    fmt.Println(res.ShortName)  // Profound Networks
    fmt.Println(res.Designator) // LLC
    fmt.Println(res.Position)   // end

If no designators are found, res.Matched will be false, res.ShortName will equal res.Input, and res.Position will be "none".

Status

gocd is alpha software. Interfaces may break and change until an official version 1.0.0 is released. gocd uses semantic versioning conventions.

Copyright 2021 Profound Networks LLC

This project is licensed under the terms of the MIT licence.

Documentation

Index

Constants

const (
	DefaultDataset   = "/company_designator.yml"
	StrBeginBefore   = `^\pZ*`
	StrBeginAfter    = `[\pZ\pP]\pZ*(.+?)\pZ*$`
	StrEndBefore     = `^\pZ*(.+?)\pZ*([\pZ\pP])\pZ*`
	StrEndAfter      = `\pZ*$`
	StrEndContBefore = `^\pZ*(.+?)\pZ*`
	StrEndContAfter  = `\pZ*$`
)

Variables

var EndDesignatorBlacklist = map[string]bool{
	"Vennootschap": true,
	"L.L.C.":       true,
	"L.C.":         true,
	"Co.":          true,
	"Co. L.L.C.":   true,
}

Go's standard regexp engine uses Perl-style leftmost-first matching rather than POSIX-style leftmost-longest semantics, which bites us where one alternate is a proper substring of another, e.g. `Vennootschap` vs `Vennootschap Onder Firma`. We work around this by blacklisting the shorter variant and doing a second-pass match if the first one fails.
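The difference in match semantics can be seen with the standard library alone. The following is a minimal sketch, not gocd's internal code: the `alternates` pattern and the helper names `firstMatch`, `longestMatch`, and `twoPass` are hypothetical, and the two-pass helper only illustrates the general idea of trying the longer variant before the blacklisted shorter one.

```go
package main

import (
	"fmt"
	"regexp"
)

// alternates is an illustrative pattern, not the one gocd builds internally.
// The shorter alternate is listed first, so leftmost-first matching prefers it.
const alternates = `Vennootschap|Vennootschap Onder Firma`

// firstMatch returns what Go's default (Perl-style, leftmost-first) engine matches.
func firstMatch(s string) string {
	return regexp.MustCompile(alternates).FindString(s)
}

// longestMatch switches the same pattern to leftmost-longest matching.
func longestMatch(s string) string {
	re := regexp.MustCompile(alternates)
	re.Longest()
	return re.FindString(s)
}

// twoPass tries the longer variant first and falls back to the
// shorter (blacklisted) variant only if the first pass finds nothing.
func twoPass(s string) string {
	if m := regexp.MustCompile(`Vennootschap Onder Firma`).FindString(s); m != "" {
		return m
	}
	return regexp.MustCompile(`Vennootschap`).FindString(s)
}

func main() {
	in := "Jansen Vennootschap Onder Firma"
	fmt.Println(firstMatch(in))   // the shorter alternate wins
	fmt.Println(longestMatch(in)) // the longer alternate wins
	fmt.Println(twoPass(in))      // the longer alternate wins via the first pass
}
```

Calling `Longest()` on every pattern would be another way to get longest-match behaviour; the blacklist approach described above avoids changing the semantics of the whole pattern.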

var LangContinua = map[string]bool{
	"zh": true,
	"ja": true,
	"ko": true,
}

In languages with continuous scripts, we don't require a word break ([\pZ\pP]) before/after designators.
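To illustrate the effect of the word-break requirement, here is a hedged sketch that reuses the `StrEnd*` constants listed above. How gocd actually assembles its patterns is not shown in this documentation, so the composition (before + designator + after), the helper name `splitEnd`, and the sample company names are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern fragments copied from the package constants above.
const (
	StrEndBefore     = `^\pZ*(.+?)\pZ*([\pZ\pP])\pZ*`
	StrEndAfter      = `\pZ*$`
	StrEndContBefore = `^\pZ*(.+?)\pZ*`
	StrEndContAfter  = `\pZ*$`
)

// splitEnd is a hypothetical helper: it looks for a single designator at the
// end of input and returns the remaining short name and the designator.
// When continuous is true, no separator is required before the designator.
func splitEnd(input, designator string, continuous bool) (string, string, bool) {
	before, after := StrEndBefore, StrEndAfter
	if continuous {
		before, after = StrEndContBefore, StrEndContAfter
	}
	re := regexp.MustCompile(before + `(` + regexp.QuoteMeta(designator) + `)` + after)
	m := re.FindStringSubmatch(input)
	if m == nil {
		return input, "", false
	}
	// The short name is the first capture group; the designator is the last.
	return m[1], m[len(m)-1], true
}

func main() {
	// A space separates the name from the designator, so the word-break form matches.
	fmt.Println(splitEnd("Profound Networks LLC", "LLC", false))

	// No separator precedes the designator, so only the continuous form matches.
	fmt.Println(splitEnd("トヨタ自動車株式会社", "株式会社", false))
	fmt.Println(splitEnd("トヨタ自動車株式会社", "株式会社", true))
}
```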

Functions

This section is empty.

Types

type Context

type Context struct {
	// contains filtered or unexported fields
}

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

func New

func New() (*Parser, error)

New returns a new Parser using the default company designator dataset.

func (*Parser) Parse

func (p *Parser) Parse(input string) (*Result, error)

Parse matches an input company name string against the company designator dataset and returns a Result containing the match results and any parsed components.

type PositionType

type PositionType int

const (
	None PositionType = iota
	End
	EndFallback
	EndCont
	Begin
	BeginFallback
)

func (PositionType) String

func (p PositionType) String() string
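Because PositionType has a String method, it satisfies fmt.Stringer, which is why printing res.Position yields a label such as "end" rather than an integer. The sketch below shows one way such a method could be written; it is not gocd's implementation. Only the labels "none" and "end" are confirmed by the README, so the `knownLabels` map is a hypothetical name and the other constants deliberately fall back to a numeric form rather than guessing their labels.

```go
package main

import "fmt"

// PositionType mirrors the declaration shown above.
type PositionType int

const (
	None PositionType = iota
	End
	EndFallback
	EndCont
	Begin
	BeginFallback
)

// knownLabels (hypothetical) lists only the labels the README confirms.
var knownLabels = map[PositionType]string{
	None: "none",
	End:  "end",
}

// String makes PositionType satisfy fmt.Stringer. Values without a
// confirmed label are rendered numerically instead of being guessed.
func (p PositionType) String() string {
	if s, ok := knownLabels[p]; ok {
		return s
	}
	return fmt.Sprintf("PositionType(%d)", int(p))
}

func main() {
	fmt.Println(End)  // fmt calls String() automatically
	fmt.Println(None)
}
```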

type Remap

type Remap map[string]*regexp.Regexp

type Result

type Result struct {
	Input      string       // Initial input string
	Matched    bool         // True if a Designator was found
	ShortName  string       // Input with any matched Designator removed
	Designator string       // The Designator found in input, if any (verbatim)
	Position   PositionType // The Designator position, if found
}
