README

linguist

godoc reference

Go port of GitHub Linguist.

Many thanks to @petermattis for his initial work in laying the groundwork for this project, and especially for suggesting the use of naive Bayesian classification.

Thanks also to @jbrukh for github.com/jbrukh/bayesian.

install

prerequisites:
go get github.com/jteeuwen/go-bindata/go-bindata

installation:
mkdir -p $GOPATH/src/github.com/dayvonjersen/linguist
git clone --depth=1 https://github.com/dayvonjersen/linguist $GOPATH/src/github.com/dayvonjersen/linguist
go get -d github.com/dayvonjersen/linguist
cd $GOPATH/src/github.com/dayvonjersen/linguist
make
l

see also

command-line reference implementation, which is documented separately

tokenizer | (godoc reference)

Documentation

Overview

    Detect programming language of source files. Go port of GitHub Linguist: https://github.com/github/linguist

    Prerequisites:

    go get github.com/jteeuwen/go-bindata/go-bindata
    

    Installation:

    mkdir -p $GOPATH/src/github.com/dayvonjersen/linguist
    git clone --depth=1 https://github.com/dayvonjersen/linguist $GOPATH/src/github.com/dayvonjersen/linguist
    go get -d github.com/dayvonjersen/linguist
    cd $GOPATH/src/github.com/dayvonjersen/linguist
    make
    l
    

    Usage:

    Please refer to the source code for the reference implementation at:

    https://github.com/dayvonjersen/linguist/tree/master/cmd/l

    See also:

    https://github.com/dayvonjersen/linguist/tree/master/tokenizer

    Index

    Constants

    This section is empty.

    Variables

    This section is empty.

    Functions

    func Analyse

    func Analyse(contents []byte, hints []string) (language string)

      Uses Naive Bayesian Classification on the file contents provided.

      Returns the name of a programming language, or the empty string if one could not be determined.

      It is recommended to call LanguageByContents() rather than using this function directly.

      Obtain hints from LanguageHints().

      NOTE(tso): May yield inaccurate results.
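
As a rough illustration of naive Bayesian classification restricted to a set of hints, here is a toy sketch. It is not the model linguist uses (that relies on github.com/jbrukh/bayesian and the project's own tokenizer); the classifier type, its whitespace tokenization, the uniform prior, and the training samples are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// classifier is a toy multinomial naive Bayes model (hypothetical,
// not linguist's implementation).
type classifier struct {
	tokenCounts map[string]map[string]int // language -> token -> count
	totals      map[string]int            // language -> total tokens seen
	vocab       map[string]bool           // every token seen in training
}

func newClassifier() *classifier {
	return &classifier{
		tokenCounts: map[string]map[string]int{},
		totals:      map[string]int{},
		vocab:       map[string]bool{},
	}
}

// train records the whitespace-separated tokens of sample under language.
func (c *classifier) train(language, sample string) {
	if c.tokenCounts[language] == nil {
		c.tokenCounts[language] = map[string]int{}
	}
	for _, tok := range strings.Fields(sample) {
		c.tokenCounts[language][tok]++
		c.totals[language]++
		c.vocab[tok] = true
	}
}

// classify scores only the languages named in hints (uniform prior) and
// returns the best one, or "" when no hinted language has been trained.
func (c *classifier) classify(contents string, hints []string) string {
	best, bestScore := "", math.Inf(-1)
	for _, lang := range hints {
		counts, ok := c.tokenCounts[lang]
		if !ok {
			continue
		}
		score := 0.0
		denom := float64(c.totals[lang] + len(c.vocab))
		for _, tok := range strings.Fields(contents) {
			// Laplace smoothing keeps unseen tokens from zeroing the score.
			score += math.Log(float64(counts[tok]+1) / denom)
		}
		if score > bestScore {
			best, bestScore = lang, score
		}
	}
	return best
}

func main() {
	c := newClassifier()
	c.train("Go", "package main func import fmt")
	c.train("Python", "def import print self")
	fmt.Println(c.classify("func main package", []string{"Go", "Python"}))
}
```

With so little training data the result is only as good as the samples, which is consistent with the note above that classification may yield inaccurate results.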

    func IsBinary

    func IsBinary(contents []byte) bool

      Checks contents for known character escape codes which frequently show up in binary files but rarely (if ever) in text.

      Use this check before calling LanguageByContents to reduce the likelihood of passing it binary data, which can cause inaccurate results.
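
To make the idea of a byte-level binary check concrete, here is a minimal sketch. The function looksBinary is a hypothetical stand-in, and the exact set of bytes it rejects (control bytes other than common whitespace) is an assumption; the real IsBinary may test for a different set.

```go
package main

import "fmt"

// looksBinary (hypothetical) reports whether contents holds a control
// byte that plain text rarely contains, such as NUL or DEL.
func looksBinary(contents []byte) bool {
	for _, b := range contents {
		switch b {
		case '\t', '\n', '\r', '\f':
			continue // common whitespace in text files
		}
		if b < 0x20 || b == 0x7f {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(looksBinary([]byte("package main\n")))
	fmt.Println(looksBinary([]byte{0x7f, 'E', 'L', 'F', 0x02, 0x01, 0x00}))
}
```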

    func IsDocumentation

    func IsDocumentation(path string) bool

      Checks if path contains a filename commonly belonging to documentation.

    func IsVendored

    func IsVendored(path string) bool

      Checks if path contains a filename commonly belonging to configuration files.

    func LanguageByContents

    func LanguageByContents(contents []byte, hints []string) string

      Attempts to detect the language of a source file based on its contents and a slice of hints to the possible answer.

      Obtain hints with LanguageHints().

      Returns the empty string if a language could not be determined.

    func LanguageByFilename

    func LanguageByFilename(filename string) string

      Attempts to determine the language of a source file based solely on common naming conventions and file extensions from the languages.yml file provided by https://github.com/github/linguist

      Returns the empty string in ambiguous or unrecognized cases.

    func LanguageColor

    func LanguageColor(language string) string

      Convenience function that returns the color associated with the language, in HTML Hex notation (e.g. "#123ABC") from the languages.yml file provided by https://github.com/github/linguist

      Returns the empty string if there is no associated color for the language.
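
A caller that needs numeric components rather than the hex string could convert the value along these lines. The helper parseHexColor is hypothetical and not part of this package; it assumes the "#RRGGBB" form shown in the example above and returns an error for anything else, including the empty string returned for languages without a color.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseHexColor (hypothetical helper) converts "#RRGGBB" into its
// red, green and blue components.
func parseHexColor(s string) (r, g, b uint8, err error) {
	hex := strings.TrimPrefix(s, "#")
	if len(hex) != 6 {
		return 0, 0, 0, fmt.Errorf("want 6 hex digits, got %q", s)
	}
	v, err := strconv.ParseUint(hex, 16, 32)
	if err != nil {
		return 0, 0, 0, err
	}
	return uint8(v >> 16), uint8(v >> 8), uint8(v), nil
}

func main() {
	r, g, b, err := parseHexColor("#123ABC")
	fmt.Println(r, g, b, err)
}
```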

    func LanguageHints

    func LanguageHints(filename string) (hints []string)

      Attempts to detect all possible languages of a source file based solely on common naming conventions and file extensions from the languages.yml file provided by https://github.com/github/linguist

      Intended to be used with LanguageByContents.

      May return an empty slice.
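
The difference between LanguageByFilename (one answer or the empty string) and LanguageHints (every candidate) can be sketched with a small lookup table. The table and the helpers hintsFor and byFilename below are hypothetical; the real functions draw on the full languages.yml data and may also consider naming conventions beyond the extension.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// candidates is a hand-written excerpt for illustration only.
var candidates = map[string][]string{
	".go": {"Go"},
	".py": {"Python"},
	".h":  {"C", "C++", "Objective-C"},
}

// hintsFor returns every language registered for the file's extension.
// The result is empty (nil) for unknown extensions.
func hintsFor(filename string) []string {
	return candidates[filepath.Ext(filename)]
}

// byFilename returns a language only when exactly one candidate exists,
// and "" in ambiguous or unrecognized cases.
func byFilename(filename string) string {
	if langs := hintsFor(filename); len(langs) == 1 {
		return langs[0]
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", byFilename("main.go"))
	fmt.Printf("%q\n", byFilename("util.h"))
	fmt.Println(hintsFor("util.h"))
}
```

In the ambiguous case the candidate list is what would be handed to a contents-based classifier as its hints.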

    func ShouldIgnoreContents

    func ShouldIgnoreContents(contents []byte) bool

      Checks if contents should not be passed to LanguageByContents.

      (this simply calls IsBinary)

    func ShouldIgnoreFilename

    func ShouldIgnoreFilename(filename string) bool

      Checks if filename should not be passed to LanguageByFilename.

      (this simply calls IsVendored and IsDocumentation)
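
One plausible way to chain the functions documented above is sketched here. To keep the example runnable without the package, the functions are injected through a struct whose fields mirror the documented signatures; the api type, detect, and stub are hypothetical names, and the ordering is an assumption rather than a description of what cmd/l does. In real use each field would be set to the corresponding linguist function.

```go
package main

import "fmt"

// api mirrors the documented signatures (hypothetical wrapper type).
type api struct {
	ShouldIgnoreFilename func(filename string) bool
	ShouldIgnoreContents func(contents []byte) bool
	LanguageByFilename   func(filename string) string
	LanguageHints        func(filename string) []string
	LanguageByContents   func(contents []byte, hints []string) string
}

// detect returns the language of one file, or "" if the file is ignored
// or no language could be determined.
func detect(l api, filename string, contents []byte) string {
	if l.ShouldIgnoreFilename(filename) {
		return ""
	}
	// Cheap check first: the filename alone is often enough.
	if lang := l.LanguageByFilename(filename); lang != "" {
		return lang
	}
	if l.ShouldIgnoreContents(contents) {
		return ""
	}
	return l.LanguageByContents(contents, l.LanguageHints(filename))
}

// stub returns canned implementations, for illustration only.
func stub() api {
	return api{
		ShouldIgnoreFilename: func(f string) bool { return f == "README.md" },
		ShouldIgnoreContents: func(c []byte) bool { return len(c) > 0 && c[0] == 0 },
		LanguageByFilename: func(f string) string {
			if f == "main.go" {
				return "Go"
			}
			return ""
		},
		LanguageHints: func(f string) []string { return []string{"C", "C++"} },
		LanguageByContents: func(c []byte, hints []string) string {
			if len(hints) > 0 {
				return hints[0]
			}
			return ""
		},
	}
}

func main() {
	l := stub()
	fmt.Println(detect(l, "main.go", nil))
	fmt.Println(detect(l, "util.h", []byte("int x;")))
}
```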

    Types

    This section is empty.

    Directories

    Path Synopsis
    cmd
    l
    go port of https://github.com/github/linguist/blob/master/lib/linguist/tokenizer.rb in their words: # Generic programming language tokenizer.