linguist

Version: v0.0.0-...-b1a8da6
Published: Jul 3, 2020 License: Apache-2.0 Imports: 11 Imported by: 4

README

linguist

Go port of GitHub Linguist.

Many thanks to @petermattis for laying the groundwork for this project, and especially for suggesting the use of naive Bayesian classification.

Thanks also to @jbrukh for github.com/jbrukh/bayesian.

install

prerequisites:
go get github.com/jteeuwen/go-bindata/go-bindata
mkdir -p $GOPATH/src/github.com/dayvonjersen/linguist
git clone --depth=1 https://github.com/dayvonjersen/linguist $GOPATH/src/github.com/dayvonjersen/linguist
go get -d github.com/dayvonjersen/linguist
cd $GOPATH/src/github.com/dayvonjersen/linguist
make
l

see also

command-line reference implementation, which is documented separately

tokenizer | (godoc reference)

Documentation

Overview

Detect programming language of source files. Go port of GitHub Linguist: https://github.com/github/linguist

Prerequisites:

go get github.com/jteeuwen/go-bindata/go-bindata

Installation:

mkdir -p $GOPATH/src/github.com/dayvonjersen/linguist
git clone --depth=1 https://github.com/dayvonjersen/linguist $GOPATH/src/github.com/dayvonjersen/linguist
go get -d github.com/dayvonjersen/linguist
cd $GOPATH/src/github.com/dayvonjersen/linguist
make
l

Usage:

Please refer to the source code for the reference implementation at:

https://github.com/dayvonjersen/linguist/tree/master/cmd/l

See also:

https://github.com/dayvonjersen/linguist/tree/master/tokenizer

Functions

func Analyse

func Analyse(contents []byte, hints []string) (language string)

Uses Naive Bayesian Classification on the file contents provided.

Returns the name of a programming language, or the empty string if one could not be determined.

It is recommended to use LanguageByContents() instead of this function directly.

Obtain hints from LanguageHints()

NOTE(tso): May yield inaccurate results
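To make the classification step above more concrete, here is a minimal sketch of naive Bayes over whitespace-separated tokens. It is not this package's implementation (which builds on github.com/jbrukh/bayesian and its own tokenizer). The type classifier, its methods, the toy training samples, the uniform prior, and the idea that hints restrict the candidate set are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// classifier is a toy naive Bayes model (hypothetical, for illustration only).
type classifier struct {
	tokenCounts map[string]map[string]int // language -> token -> count
	totals      map[string]int            // language -> total tokens seen
	vocab       map[string]bool           // every token seen during training
}

func newClassifier() *classifier {
	return &classifier{
		tokenCounts: map[string]map[string]int{},
		totals:      map[string]int{},
		vocab:       map[string]bool{},
	}
}

// train records the tokens of one labelled sample.
func (c *classifier) train(language, sample string) {
	if c.tokenCounts[language] == nil {
		c.tokenCounts[language] = map[string]int{}
	}
	for _, tok := range strings.Fields(sample) {
		c.tokenCounts[language][tok]++
		c.totals[language]++
		c.vocab[tok] = true
	}
}

// classify scores only the candidate languages (assumed here to play the role
// of hints) and returns the best one, or "" if no candidate was trained.
func (c *classifier) classify(contents string, candidates []string) string {
	best, bestScore := "", math.Inf(-1)
	for _, lang := range candidates {
		counts, ok := c.tokenCounts[lang]
		if !ok {
			continue
		}
		score := 0.0 // log-probability under a uniform prior
		for _, tok := range strings.Fields(contents) {
			// Add-one smoothing keeps unseen tokens from zeroing the product.
			p := float64(counts[tok]+1) / float64(c.totals[lang]+len(c.vocab))
			score += math.Log(p)
		}
		if score > bestScore {
			best, bestScore = lang, score
		}
	}
	return best
}

func main() {
	c := newClassifier()
	c.train("Go", "package main func import fmt")
	c.train("Python", "def import print self")
	fmt.Println(c.classify("func main", []string{"Go", "Python"}))
}
```

With so little training data the scores are easily swayed by a single token, which is one way to read the note about inaccurate results.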

func IsBinary

func IsBinary(contents []byte) bool

Checks contents for known character escape codes which frequently show up in binary files but rarely (if ever) in text.

Use this check before calling LanguageByContents to reduce the likelihood of passing it binary data, which can cause inaccurate results.
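A check of this kind can be sketched as a scan for control bytes. The exact set of bytes this package looks for is not documented here, so the function looksBinary and the byte ranges it tests are assumptions, not the package's logic.

```go
package main

import "fmt"

// looksBinary reports whether data contains a NUL byte or another control
// character that is unusual in text. The chosen byte set is an assumption.
func looksBinary(data []byte) bool {
	for _, b := range data {
		switch {
		case b == '\t', b == '\n', b == '\r', b == '\f':
			continue // whitespace that is common in text files
		case b < 0x20, b == 0x7f:
			return true // other ASCII control characters
		}
	}
	return false
}

func main() {
	fmt.Println(looksBinary([]byte("package main\n")))
	fmt.Println(looksBinary([]byte{0x7f, 'E', 'L', 'F', 0x00}))
}
```

A byte-level heuristic like this is cheap, but it will misjudge some inputs (for example, text containing terminal escape sequences), so treat it as a filter rather than a guarantee.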

func IsDocumentation

func IsDocumentation(path string) bool

Checks if path contains a filename commonly belonging to documentation.
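As a rough illustration of path-based matching, the sketch below tests a path against a few regular expressions. The pattern list docPatterns and the function looksLikeDocumentation are hypothetical and abbreviated; the patterns this package actually uses are not listed on this page.

```go
package main

import (
	"fmt"
	"regexp"
)

// docPatterns is a hypothetical, abbreviated list of documentation patterns.
var docPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(^|/)[Dd]ocs?/`),
	regexp.MustCompile(`(^|/)README(\.|$)`),
	regexp.MustCompile(`(^|/)CHANGE(S|LOG)?(\.|$)`),
	regexp.MustCompile(`(^|/)LICEN[CS]E(\.|$)`),
}

// looksLikeDocumentation reports whether any pattern matches the path.
func looksLikeDocumentation(path string) bool {
	for _, re := range docPatterns {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

func main() {
	for _, p := range []string{"docs/index.md", "README.md", "cmd/l/main.go"} {
		fmt.Println(p, looksLikeDocumentation(p))
	}
}
```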

func IsVendored

func IsVendored(path string) bool

Checks if path contains a filename commonly belonging to configuration files.

func LanguageByContents

func LanguageByContents(contents []byte, hints []string) string

Attempts to detect the language of a source file based on its contents and a slice of hints to the possible answer.

Obtain hints with LanguageHints()

Returns the empty string if a language could not be determined.

func LanguageByFilename

func LanguageByFilename(filename string) string

Attempts to determine the language of a source file based solely on common naming conventions and file extensions from the languages.yml file provided by https://github.com/github/linguist

Returns the empty string in ambiguous or unrecognized cases.

func LanguageColor

func LanguageColor(language string) string

Convenience function that returns the color associated with the language, in HTML Hex notation (e.g. "#123ABC") from the languages.yml file provided by https://github.com/github/linguist

Returns the empty string if there is no associated color for the language.
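If the numeric components are needed (for drawing a language bar, say), the returned string can be parsed with the standard library. The helper parseHexColor below is a hypothetical name and not part of this package; it assumes the "#RRGGBB" form shown above and reports an error for anything else, including the empty string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHexColor converts "#RRGGBB" into its red, green and blue components.
func parseHexColor(s string) (r, g, b uint8, err error) {
	hex := strings.TrimPrefix(s, "#")
	if len(hex) != 6 {
		return 0, 0, 0, fmt.Errorf("want #RRGGBB, got %q", s)
	}
	v, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, 0, 0, err
	}
	return uint8(v >> 16), uint8(v >> 8), uint8(v), nil
}

func main() {
	r, g, b, err := parseHexColor("#123ABC")
	fmt.Println(r, g, b, err)
}
```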

func LanguageHints

func LanguageHints(filename string) (hints []string)

Attempts to detect all possible languages of a source file based solely on common naming conventions and file extensions from the languages.yml file provided by https://github.com/github/linguist

Intended to be used with LanguageByContents.

May return an empty slice.
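The relationship between LanguageByFilename and LanguageHints can be pictured with a small lookup table: the former answers only when exactly one language fits, the latter returns every candidate. The table byExtension and the functions hints and byFilename are hand-written stand-ins, not data or code from this package, and the real lookup also considers naming conventions beyond the extension.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// byExtension is a hypothetical, abbreviated extension table.
var byExtension = map[string][]string{
	".go": {"Go"},
	".py": {"Python"},
	".h":  {"C", "C++", "Objective-C"},
	".m":  {"MATLAB", "Objective-C"},
}

// hints returns every candidate language for filename; the result may be empty.
func hints(filename string) []string {
	return byExtension[strings.ToLower(filepath.Ext(filename))]
}

// byFilename returns a language only when the extension is unambiguous.
func byFilename(filename string) string {
	if c := hints(filename); len(c) == 1 {
		return c[0]
	}
	return ""
}

func main() {
	for _, f := range []string{"main.go", "vector.h", "notes.xyz"} {
		fmt.Printf("%s: byFilename=%q hints=%v\n", f, byFilename(f), hints(f))
	}
}
```

When byFilename comes back empty but hints does not, the candidates are what a content-based classifier would choose between.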

func ShouldIgnoreContents

func ShouldIgnoreContents(contents []byte) bool

Checks if contents should not be passed to LanguageByContents.

(this simply calls IsBinary)

func ShouldIgnoreFilename

func ShouldIgnoreFilename(filename string) bool

Checks if filename should not be passed to LanguageByFilename.

(this simply calls IsVendored and IsDocumentation)
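Putting the functions above together, one plausible call order is: filter by filename, try the filename alone, filter by contents, then classify with hints. The sketch below wires that order through a struct of function fields whose signatures mirror the documented ones, so it runs without the package installed. The detector type, detect, and stubDetector are hypothetical; in real use the fields would be set to the package's functions of the same names. The reference implementation in cmd/l may order things differently.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// detector holds functions with the signatures documented for this package.
type detector struct {
	ShouldIgnoreFilename func(filename string) bool
	ShouldIgnoreContents func(contents []byte) bool
	LanguageByFilename   func(filename string) string
	LanguageHints        func(filename string) []string
	LanguageByContents   func(contents []byte, hints []string) string
}

// detect returns the language of one file, or "" if it was skipped or unknown.
func (d detector) detect(filename string, contents []byte) string {
	if d.ShouldIgnoreFilename(filename) {
		return ""
	}
	if lang := d.LanguageByFilename(filename); lang != "" {
		return lang // the filename alone was unambiguous
	}
	if d.ShouldIgnoreContents(contents) {
		return ""
	}
	return d.LanguageByContents(contents, d.LanguageHints(filename))
}

// stubDetector returns simplistic stand-ins so the sketch is runnable.
func stubDetector() detector {
	return detector{
		ShouldIgnoreFilename: func(f string) bool { return strings.HasPrefix(f, "vendor/") },
		ShouldIgnoreContents: func(c []byte) bool { return bytes.IndexByte(c, 0) >= 0 },
		LanguageByFilename: func(f string) string {
			if strings.HasSuffix(f, ".go") {
				return "Go"
			}
			return ""
		},
		LanguageHints: func(f string) []string {
			if strings.HasSuffix(f, ".h") {
				return []string{"C", "C++"}
			}
			return nil
		},
		LanguageByContents: func(c []byte, hints []string) string {
			if len(hints) > 0 {
				return hints[0] // stub: a real classifier inspects c
			}
			return ""
		},
	}
}

func main() {
	d := stubDetector()
	fmt.Printf("%q\n", d.detect("main.go", []byte("package main")))
	fmt.Printf("%q\n", d.detect("vector.h", []byte("int x;")))
}
```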

Directories

Path Synopsis
cmd
l
go port of https://github.com/github/linguist/blob/master/lib/linguist/tokenizer.rb in their words: # Generic programming language tokenizer.
