Documentation ¶
Index ¶
- Variables
- func Asset(name string) ([]byte, error)
- func AssetDir(name string) ([]string, error)
- func AssetInfo(name string) (os.FileInfo, error)
- func AssetNames() []string
- func ExtractLicenseFiles(files []string, reader func(string) (string, error)) []string
- func ExtractReadmeFiles(files []string, reader func(string) (string, error)) []string
- func InvestigateFilesLicenses(fileNames []string, reader func(string) (string, error)) (map[string]float32, error)
- func InvestigateLicenseText(text string) map[string]float32
- func InvestigateLicenseTexts(texts []string) map[string]float32
- func InvestigateProjectLicenses(path string) (map[string]float32, error)
- func InvestigateReadmeText(text string) map[string]float32
- func InvestigateReadmeTexts(texts []string) map[string]float32
- func MustAsset(name string) []byte
- func NormalizeLicenseText(text string, strict bool) string
- func PreprocessHTML(htmlSource string) string
- func PreprocessMarkdown(text string) string
- func PreprocessRestructuredText(text string) string
- func RestoreAsset(dir, name string) error
- func RestoreAssets(dir, name string) error
- type LicenseDatabase
- type WeightedMinHasher
Constants ¶
This section is empty.
Variables ¶
var (
	// ErrNoLicenseFound is raised if no license files were found.
	ErrNoLicenseFound = errors.New("no license file was found")
)
Functions ¶
func Asset ¶
Asset loads and returns the asset for the given name. It returns an error if the asset could not be found or could not be loaded.
func AssetDir ¶
AssetDir returns the file names below a certain directory embedded in the file by go-bindata. For example if you run go-bindata on data/... and data contains the following hierarchy:
data/
  foo.txt
  img/
    a.png
    b.png

then AssetDir("data") would return []string{"foo.txt", "img"},
AssetDir("data/img") would return []string{"a.png", "b.png"},
AssetDir("foo.txt") and AssetDir("notexist") would return an error, and
AssetDir("") would return []string{"data"}.
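The lookup rules above can be illustrated with a small self-contained sketch. It does not use the generated go-bindata code: the `node` type and the `buildTree` and `assetDir` helpers are hypothetical names introduced here, and the sorted output is a choice made for this example only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// node is a hypothetical stand-in for the embedded asset tree:
// a nil children map marks a file, a non-nil map marks a directory.
type node struct {
	children map[string]*node
}

// buildTree turns a flat list of asset paths into a nested tree.
func buildTree(paths []string) *node {
	root := &node{children: map[string]*node{}}
	for _, p := range paths {
		cur := root
		parts := strings.Split(p, "/")
		for i, part := range parts {
			next, ok := cur.children[part]
			if !ok {
				next = &node{}
				if i < len(parts)-1 {
					// Every path element except the last is a directory.
					next.children = map[string]*node{}
				}
				cur.children[part] = next
			}
			cur = next
		}
	}
	return root
}

// assetDir lists the entries directly below name. It returns an error
// when name does not exist or refers to a file.
func assetDir(root *node, name string) ([]string, error) {
	cur := root
	if name != "" {
		for _, part := range strings.Split(name, "/") {
			next, ok := cur.children[part]
			if !ok {
				return nil, fmt.Errorf("asset %s not found", name)
			}
			cur = next
		}
	}
	if cur.children == nil {
		return nil, fmt.Errorf("asset %s is a file, not a directory", name)
	}
	names := make([]string, 0, len(cur.children))
	for child := range cur.children {
		names = append(names, child)
	}
	sort.Strings(names) // sorted only to keep this sketch deterministic
	return names, nil
}

func main() {
	root := buildTree([]string{"data/foo.txt", "data/img/a.png", "data/img/b.png"})
	for _, name := range []string{"data", "data/img", "foo.txt", ""} {
		entries, err := assetDir(root, name)
		fmt.Printf("%q: %v %v\n", name, entries, err)
	}
}
```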
func AssetInfo ¶
AssetInfo loads and returns the asset info for the given name. It returns an error if the asset could not be found or could not be loaded.
func ExtractLicenseFiles ¶
ExtractLicenseFiles returns the list of possible license texts. The file names are matched against the template. `reader` is used to read the file contents.
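The `reader` callback decouples file access from detection, so the same logic can run over a disk, an archive, or memory. The sketch below shows one way such a callback could be supplied. The `licenseName` pattern, `mapReader`, and `candidateTexts` are assumptions made for illustration; the template the package actually uses is not shown in this documentation.

```go
package main

import (
	"fmt"
	"regexp"
)

// licenseName is an assumed file name pattern for this sketch only.
var licenseName = regexp.MustCompile(`(?i)^(licen[cs]e|copying)(\.|$)`)

// mapReader returns a reader callback backed by an in-memory map,
// matching the func(string) (string, error) shape used by this package.
func mapReader(files map[string]string) func(string) (string, error) {
	return func(name string) (string, error) {
		text, ok := files[name]
		if !ok {
			return "", fmt.Errorf("file %q not found", name)
		}
		return text, nil
	}
}

// candidateTexts reads every file whose name matches the pattern and
// skips the files that the reader cannot load.
func candidateTexts(names []string, reader func(string) (string, error)) []string {
	var texts []string
	for _, name := range names {
		if !licenseName.MatchString(name) {
			continue
		}
		text, err := reader(name)
		if err != nil {
			continue
		}
		texts = append(texts, text)
	}
	return texts
}

func main() {
	files := map[string]string{
		"LICENSE":     "Permission is hereby granted...",
		"COPYING.txt": "GNU GENERAL PUBLIC LICENSE...",
		"main.go":     "package main",
	}
	names := []string{"LICENSE", "main.go", "COPYING.txt", "LICENSE.md"}
	for _, text := range candidateTexts(names, mapReader(files)) {
		fmt.Println(text)
	}
}
```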
func ExtractReadmeFiles ¶
ExtractReadmeFiles searches for README files. `reader` is used to read the file contents.
func InvestigateFilesLicenses ¶
func InvestigateFilesLicenses( fileNames []string, reader func(string) (string, error)) (map[string]float32, error)
InvestigateFilesLicenses scans the given list of file names, reads them with `reader` and detects the licenses. Each match is assigned a confidence from 0 to 1, where 1 means 100% confident.
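Because the result is an unordered map from license name to confidence, callers usually need to rank it before use. The following is a minimal sketch of that post-processing step. The `match` type, the `rank` helper, the threshold, and the sample values are all made up for illustration and do not come from the package.

```go
package main

import (
	"fmt"
	"sort"
)

// match pairs a license name with its confidence (hypothetical type).
type match struct {
	License    string
	Confidence float32
}

// rank drops matches below threshold and orders the rest by descending
// confidence, breaking ties by license name for stable output.
func rank(results map[string]float32, threshold float32) []match {
	var matches []match
	for name, conf := range results {
		if conf >= threshold {
			matches = append(matches, match{name, conf})
		}
	}
	sort.Slice(matches, func(i, j int) bool {
		if matches[i].Confidence != matches[j].Confidence {
			return matches[i].Confidence > matches[j].Confidence
		}
		return matches[i].License < matches[j].License
	})
	return matches
}

func main() {
	// Invented values standing in for a detection result.
	results := map[string]float32{"MIT": 0.95, "ISC": 0.8, "BSD-3-Clause": 0.4}
	for _, m := range rank(results, 0.5) {
		fmt.Printf("%s %.2f\n", m.License, m.Confidence)
	}
}
```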
func InvestigateLicenseText ¶
InvestigateLicenseText takes the license text and returns the most probable reference licenses matched. Each match is assigned a confidence from 0 to 1, where 1 means 100% confident.
func InvestigateLicenseTexts ¶
InvestigateLicenseTexts takes the list of candidate license texts and returns the most probable reference licenses matched. Each match is assigned a confidence from 0 to 1, where 1 means 100% confident.
func InvestigateProjectLicenses ¶
InvestigateProjectLicenses returns the most probable reference licenses matched for the given file tree. Each match is assigned a confidence from 0 to 1, where 1 means 100% confident.
func InvestigateReadmeText ¶
InvestigateReadmeText scans the README file for licensing information and returns the probable license names found with named entity recognition (NER).
func InvestigateReadmeTexts ¶
InvestigateReadmeTexts scans README files for licensing information and outputs the probable names using NER.
func MustAsset ¶
MustAsset is like Asset but panics when Asset would return an error. It simplifies safe initialization of global variables.
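The "Must" convention can be sketched without the generated code. In the example below, `assets`, `asset`, and `mustAsset` are hypothetical stand-ins for the embedded storage and its accessors; only the panic-on-error behavior mirrors the description above.

```go
package main

import "fmt"

// assets is a hypothetical in-memory stand-in for the embedded data.
var assets = map[string][]byte{"greeting.txt": []byte("hello")}

// asset returns the named asset or an error when it is missing.
func asset(name string) ([]byte, error) {
	data, ok := assets[name]
	if !ok {
		return nil, fmt.Errorf("asset %s not found", name)
	}
	return data, nil
}

// mustAsset is like asset but panics instead of returning an error.
func mustAsset(name string) []byte {
	data, err := asset(name)
	if err != nil {
		panic("asset: " + err.Error())
	}
	return data
}

// A package-level variable can be initialized in a single expression,
// because mustAsset has exactly one return value.
var greeting = string(mustAsset("greeting.txt"))

func main() {
	fmt.Println(greeting)
}
```

A missing asset then fails loudly at program start rather than surfacing later as an empty value.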
func NormalizeLicenseText ¶
NormalizeLicenseText makes a license text ready for analysis. It follows SPDX guidelines at https://spdx.org/spdx-license-list/matching-guidelines
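To give a feel for what such normalization involves, here is a deliberately simplified sketch that covers only three of the SPDX matching rules: case is ignored, typographic quotes are treated as equal to plain ones, and runs of whitespace collapse to a single space. `normalizeSketch` is a hypothetical helper, not the package's implementation, and it does not model the `strict` flag.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteReplacer maps typographic quotes to their ASCII equivalents.
var quoteReplacer = strings.NewReplacer(
	"“", "\"", "”", "\"",
	"‘", "'", "’", "'",
)

// normalizeSketch lowercases the text, unifies quotes, and collapses
// all whitespace runs (including newlines) into single spaces.
func normalizeSketch(text string) string {
	text = quoteReplacer.Replace(text)
	text = strings.ToLower(text)
	return strings.Join(strings.Fields(text), " ")
}

func main() {
	fmt.Println(normalizeSketch("Permission  is\n hereby\tGRANTED, “AS IS”"))
}
```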
func PreprocessHTML ¶
PreprocessHTML converts HTML to plain text. For example, it strips all the tags.
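As a rough illustration of tag stripping, the sketch below removes anything that looks like a tag with a regular expression and decodes character entities using the standard library. This is a naive approach chosen for brevity and is an assumption of this example: it mishandles scripts, comments, and attributes containing `>`, and robust HTML handling generally needs a real parser. `stripTags` is a hypothetical name.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// tagPattern matches anything that looks like an HTML tag.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// stripTags replaces tags with spaces, decodes entities such as &amp;,
// and collapses the remaining whitespace.
func stripTags(src string) string {
	text := tagPattern.ReplaceAllString(src, " ")
	text = html.UnescapeString(text)
	return strings.Join(strings.Fields(text), " ")
}

func main() {
	fmt.Println(stripTags("<p>MIT &amp; <b>BSD</b></p>"))
}
```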
func PreprocessMarkdown ¶
PreprocessMarkdown converts Markdown to plain text. It tries to revert all the decorations.
func PreprocessRestructuredText ¶
PreprocessRestructuredText converts ReStructuredText to plain text. It tries to revert all the decorations.
func RestoreAsset ¶
RestoreAsset restores an asset under the given directory.
func RestoreAssets ¶
RestoreAssets restores an asset under the given directory recursively.
Types ¶
type LicenseDatabase ¶
type LicenseDatabase struct {
	Debug bool
	// contains filtered or unexported fields
}
LicenseDatabase holds the license texts, their hashes and the hashtables to query for nearest neighbors.
func (LicenseDatabase) Length ¶
func (db LicenseDatabase) Length() int
Length returns the number of registered licenses.
func (*LicenseDatabase) Load ¶
func (db *LicenseDatabase) Load()
Load takes the licenses from the embedded storage, normalizes, hashes them and builds the LSH hashtables.
func (*LicenseDatabase) QueryLicenseText ¶
func (db *LicenseDatabase) QueryLicenseText(text string) map[string]float32
QueryLicenseText returns the most similar registered licenses.
func (*LicenseDatabase) QueryReadmeText ¶
func (db *LicenseDatabase) QueryReadmeText(text string) map[string]float32
QueryReadmeText tries to detect licenses mentioned in the README.
func (LicenseDatabase) VocabularySize ¶
func (db LicenseDatabase) VocabularySize() int
VocabularySize returns the number of unique unigrams.
type WeightedMinHasher ¶
type WeightedMinHasher struct {
// contains filtered or unexported fields
}
WeightedMinHasher calculates Weighted MinHashes. See https://ekzhu.github.io/datasketch/weightedminhash.html
func NewWeightedMinHasher ¶
func NewWeightedMinHasher(dim int, sampleSize int, seed int64) *WeightedMinHasher
NewWeightedMinHasher initializes a new instance of WeightedMinHasher. `dim` is the bag size. `sampleSize` is the hash length. `seed` is the random generator seed, as Weighted MinHash is probabilistic.
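To show how `dim`, `sampleSize`, and `seed` interact, here is a compact sketch of Weighted MinHash based on consistent weighted sampling, the scheme described on the linked datasketch page. It is an independent illustration and not this package's code: `sketchHasher`, `newSketchHasher`, `hash`, and `similarity` are hypothetical names, and the input vector is assumed to have exactly `dim` non-negative entries.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// sketchHasher holds per-sample, per-dimension random parameters.
type sketchHasher struct {
	rs, lnCs, betas [][]float64
}

// gamma21 draws from Gamma(2, 1) as the sum of two Exp(1) variates.
func gamma21(rng *rand.Rand) float64 {
	return rng.ExpFloat64() + rng.ExpFloat64()
}

// newSketchHasher draws all random parameters up front, so the same
// seed always yields the same hasher.
func newSketchHasher(dim, sampleSize int, seed int64) *sketchHasher {
	rng := rand.New(rand.NewSource(seed))
	h := &sketchHasher{}
	for i := 0; i < sampleSize; i++ {
		rs := make([]float64, dim)
		lnCs := make([]float64, dim)
		betas := make([]float64, dim)
		for k := 0; k < dim; k++ {
			rs[k] = gamma21(rng)
			lnCs[k] = math.Log(gamma21(rng))
			betas[k] = rng.Float64()
		}
		h.rs = append(h.rs, rs)
		h.lnCs = append(h.lnCs, lnCs)
		h.betas = append(h.betas, betas)
	}
	return h
}

// hash returns one (dimension, quantized log-weight) pair per sample.
// Dimensions with zero weight never win a sample.
func (h *sketchHasher) hash(weights []float64) [][2]int {
	out := make([][2]int, len(h.rs))
	for i := range h.rs {
		bestK, bestT, bestLnA := -1, 0, math.Inf(1)
		for k, w := range weights {
			if w <= 0 {
				continue
			}
			t := math.Floor(math.Log(w)/h.rs[i][k] + h.betas[i][k])
			lnY := (t - h.betas[i][k]) * h.rs[i][k]
			lnA := h.lnCs[i][k] - lnY - h.rs[i][k]
			if lnA < bestLnA {
				bestK, bestT, bestLnA = k, int(t), lnA
			}
		}
		out[i] = [2]int{bestK, bestT}
	}
	return out
}

// similarity estimates the weighted Jaccard similarity as the fraction
// of samples on which two hashes agree.
func similarity(a, b [][2]int) float64 {
	same := 0
	for i := range a {
		if a[i] == b[i] {
			same++
		}
	}
	return float64(same) / float64(len(a))
}

func main() {
	h := newSketchHasher(4, 128, 7)
	a := h.hash([]float64{1, 2, 0, 0})
	b := h.hash([]float64{1, 1, 1, 0})
	fmt.Printf("estimated similarity: %.2f\n", similarity(a, b))
}
```

A larger `sampleSize` lowers the variance of the estimate at the cost of longer hashes, while the seed only fixes which random parameters are drawn; two hashes are comparable only when they come from hashers built with the same `dim`, `sampleSize`, and `seed`.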
Directories ¶
Path | Synopsis |
---|---|
cmd | |