Documentation ¶
Overview ¶
Package src provides a set of structures for representing a project and its related source code independently of the programming language. In other words, it provides a generic representation (abstraction) of source code.
Goal ¶
The goal of this package is to provide a generic representation of a project that can be analyzed by the anlzr package as well as an API for encoding/decoding it to/from JSON.
A presentation video is available on the DevMine website:
http://devmine.ch/news/2015/06/08/srcanlzr-presentation/
Usage ¶
Two kinds of programs interact with a src.Project: language parsers and VCS support tools. The former visit all source files inside the project folder and parse each one in order to fill the src.Project.Packages field (and a few others). The latter read the folder that contains the VCS data and fill the src.Project.Repo structure. The next two sections describe them in more detail.
Language parsers ¶
Language parsers must output the same structure as defined by the src.Project type. They first have to parse a project to obtain the language-specific AST for that project. Then, they have to map that AST onto the generic AST defined in the ast package:
http://godoc.org/github.com/DevMine/srcanlzr/src/ast
For more details on how to write a language parser for srcanlzr, refer to this tutorial:
http://devmine.ch/news/2015/05/31/how-to-write-a-parser/
VCS support tools ¶
Language parsers must not provide any information related to Version Control Systems (VCS). VCS metadata is the job of repotool:
http://devmine.ch/news/2015/06/01/repotool-presentation/
http://devmine.ch/doc/repotool/
Example ¶
For the following Go source file (greet/main.go):
    package main

    import (
        "fmt"
    )

    func greet(name string) {
        fmt.Printf("Hello, %s!\n", name)
    }

    func main() {
        name := "World"
        greet(name)
    }
The language parser must produce the following JSON output:
    {
      "name": "greet",
      "loc": 5,
      "languages": [
        {
          "language": "go",
          "paradigms": ["compiled", "concurrent", "imperative", "structured"]
        }
      ],
      "packages": [
        {
          "loc": 5,
          "name": "greet",
          "path": "/home/revan/go/src/foo/greet",
          "source_files": [
            {
              "functions": [
                {
                  "body": [
                    {
                      "expression": {
                        "arguments": [
                          {
                            "expression_name": "BASIC_LIT",
                            "kind": "STRING",
                            "value": "Hello, %s!\\n"
                          },
                          {
                            "expression_name": "IDENT",
                            "name": "name"
                          }
                        ],
                        "expression_name": "CALL",
                        "function": {
                          "function_name": "Printf",
                          "namespace": "fmt"
                        },
                        "line": 0
                      },
                      "statement_name": "EXPR"
                    }
                  ],
                  "loc": 0,
                  "name": "greet",
                  "type": {
                    "parameters": [
                      {
                        "name": "name",
                        "type": "string"
                      }
                    ]
                  },
                  "visibility": ""
                },
                {
                  "body": [
                    {
                      "left_hand_side": [
                        {
                          "expression_name": "IDENT",
                          "name": "name"
                        }
                      ],
                      "line": 1,
                      "right_hand_side": [
                        {
                          "expression_name": "BASIC_LIT",
                          "kind": "STRING",
                          "value": "World"
                        }
                      ],
                      "statement_name": "ASSIGN"
                    },
                    {
                      "expression": {
                        "arguments": [
                          {
                            "expression_name": "IDENT",
                            "name": "name"
                          }
                        ],
                        "expression_name": "CALL",
                        "function": {
                          "function_name": "greet",
                          "namespace": ""
                        },
                        "line": 0
                      },
                      "statement_name": "EXPR"
                    }
                  ],
                  "loc": 0,
                  "name": "main",
                  "type": null,
                  "visibility": ""
                }
              ],
              "imports": ["fmt"],
              "language": {
                "language": "go",
                "paradigms": ["compiled", "concurrent", "imperative", "structured"]
              },
              "loc": 5,
              "path": "/home/revan/go/src/foo/greet/main.go"
            }
          ]
        }
      ]
    }
Lines of Code counting ¶
The number of real lines of code must be precomputed by the language parsers. This is the only "feature" that must be precomputed, because it has several uses:
1. Eliminate empty projects
2. Evaluate project size
3. Verify that the decoding is correct
4. Normalize various counts
5. ...
Therefore, this count must be accurate and strictly follow these rules:
We only count statements and declarations as lines of code. Comments, package declarations, imports, expressions, etc. must not be taken into account. Since an example is worth more than a thousand words, let's consider the following snippet:
    // Package doc (does not count as a line of code)
    package main // does not count as a line of code

    import "fmt" // does not count as a line of code

    func main() { // counts as 1 line of code
        fmt.Println(
            "Hello, World!",
        ) // counts as 1 line of code
    }
The expected number of lines of code is 2: the declaration of the main function and the call to the fmt.Println function.
Performance ¶
The DevMine project deals with terabytes of source code; therefore, the JSON decoding must be efficient. That is why we implemented our own JSON decoder that focuses on performance. To do so, we had to make some choices and impose some constraints on language parsers in order to make this process as fast as possible.
JSON is usually unpredictable, which forces JSON parsers to be generic in order to deal with every possible kind of input. In DevMine, we have a well-defined structure; thus, instead of writing a generic JSON decoder, we wrote one that only decodes src.Project objects. This significantly improves performance since we don't need to use reflection, generic types (interface{}) or type assertions. The drawback of this choice is that we have to update the decoder every time we modify our structures.
Most JSON parsers assume that the JSON input is potentially invalid (i.e. malformed). We don't. Unlike json.Unmarshal, we don't check for well-formedness.
We also force the language parsers to put the "expression_name" and "statement_name" fields at the beginning of the JSON object. We use that convention to decode generic ast.Expr and ast.Stmt without reading the whole JSON object.
Besides, we restrict the supported JSON types to:
    string
    int64
    float64
    bool
    object
    array
All objects used (even inside an array) must be pointers. This is required by the decoder generator.
The only officially supported encoding is UTF-8.
Constants ¶
    const (
        Git = "git"
        Hg  = "mercurial"
        SVN = "subversion"
        Bzr = "bazaar"
        CVS = "cvs"
    )
Supported VCS (Version Control System)
    const (
        Go     = "go"
        Ruby   = "ruby"
        Python = "python"
        C      = "c"
        Java   = "java"
        Scala  = "scala"
    )
Supported programming languages
    const (
        Structured     = "structured"
        Imperative     = "imperative"
        Procedural     = "procedural"
        Compiled       = "compiled"
        Concurrent     = "concurrent"
        Functional     = "functional"
        ObjectOriented = "object oriented"
        Generic        = "generic"
        Reflective     = "reflective"
    )
Supported paradigms
Types ¶
type Language ¶
    type Language struct {
        // The programming language name (e.g. go, ruby, java, etc.)
        //
        // The name must match one of the supported programming languages defined in
        // the constants.
        Lang string `json:"language"` // TODO rename into name

        // The paradigms of the programming language (e.g. structured, imperative,
        // object oriented, etc.)
        //
        // The name must match one of the supported paradigms defined in the
        // constants.
        Paradigms []string `json:"paradigms"`
    }
A Language represents a programming language.
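Since both fields must match the values defined in the constants, a language parser may want to check its output before emitting it. The sketch below shows one way to do so; the lookup tables, the validate helper and the local copies of the constants and of the Language type are assumptions made for the example and are not part of the package API.

```go
package main

import "fmt"

// Local copies of the documented constants, so the example is self-contained.
const (
	Go     = "go"
	Ruby   = "ruby"
	Python = "python"
	C      = "c"
	Java   = "java"
	Scala  = "scala"
)

// Hypothetical lookup tables built from the documented values.
var supportedLangs = map[string]bool{
	Go: true, Ruby: true, Python: true, C: true, Java: true, Scala: true,
}

var supportedParadigms = map[string]bool{
	"structured": true, "imperative": true, "procedural": true,
	"compiled": true, "concurrent": true, "functional": true,
	"object oriented": true, "generic": true, "reflective": true,
}

// Language mirrors the documented type.
type Language struct {
	Lang      string   `json:"language"`
	Paradigms []string `json:"paradigms"`
}

// validate (hypothetical helper) reports the first unsupported name found.
func validate(l *Language) error {
	if !supportedLangs[l.Lang] {
		return fmt.Errorf("unsupported language %q", l.Lang)
	}
	for _, p := range l.Paradigms {
		if !supportedParadigms[p] {
			return fmt.Errorf("unsupported paradigm %q", p)
		}
	}
	return nil
}

func main() {
	ok := &Language{Lang: Go, Paradigms: []string{"compiled", "concurrent"}}
	bad := &Language{Lang: "cobol"}
	fmt.Println(validate(ok))
	fmt.Println(validate(bad))
}
```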
type Package ¶
    type Package struct {
        // The package documentation, or nil.
        // TODO support documentation for multiple languages.
        Doc []string `json:"doc,omitempty"`

        // The package name. This should be the name of the parent folder.
        Name string `json:"name"`

        // The full path of the package. The path must be relative to the root of
        // the project and never be an absolute path.
        Path string `json:"path"`

        // The list of all source files contained in the package.
        SrcFiles []*SrcFile `json:"source_files"`

        // The total number of lines of code of the package.
        LoC int64 `json:"loc"`
    }
Package holds information about a package, which is, basically, just a folder that contains at least one source file.
type Project ¶
    type Project struct {
        // The name of the project. Since it may be something really difficult to
        // guess, it should generally be the name of the folder containing the
        // project.
        Name string `json:"name"`

        // The repository in which the project is hosted, or nil. This field is not
        // meant to be filled by one of the language parsers. Only repotool should
        // take care of it. For more details, see:
        // https://github.com/DevMine/repotool
        //
        // Since this field uses an external type, it is not unmarshalled by
        // src.Unmarshal itself but by the standard json.Unmarshal function.
        // To do so, its unmarshalling is deferred using json.RawMessage.
        // See the RepoRaw field.
        Repo *model.Repository `json:"repository,omitempty"`

        // The list of all programming languages used by the project. Each language
        // must be added by the corresponding language parsers if and only if the
        // project contains at least one line of code written in this language.
        Langs []*Language `json:"languages"`

        // List of all packages of the project. We call "package" every folder that
        // contains at least one source file.
        Packages []*Package `json:"packages"`

        // The total number of lines of code in the whole project, independently of
        // the language.
        LoC int64 `json:"loc"`
    }
Project is the root of the src API and therefore it must be at the root of the JSON.
It contains the metadata of a project and the list of all packages.
func DecodeFile ¶
DecodeFile decodes a JSON encoded src.Project read from a given file.
func MergeAll ¶
MergeAll merges a list of projects.
There must be at least one project. If exactly one project is given, MergeAll just returns a copy of that project. Moreover, the projects must be distinct.
The merge only performs shallow copies, which means that if a field value is a pointer, the memory address is copied and not the value it points to.
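The practical consequence of a shallow copy can be sketched as follows. This is not MergeAll itself: the reduced Project and Package types and the shallowCopy helper are assumptions made for the example. It only shows that pointer fields stay shared between the original and the copy.

```go
package main

import "fmt"

// Reduced stand-ins for the src types, for illustration only.
type Package struct {
	Name string
	LoC  int64
}

type Project struct {
	Name     string
	Packages []*Package
	LoC      int64
}

// shallowCopy (hypothetical helper) copies the Project value itself. Slice
// and pointer fields are copied as references, so both projects end up
// sharing the same Package values.
func shallowCopy(p *Project) *Project {
	c := *p
	return &c
}

func main() {
	orig := &Project{
		Name:     "greet",
		Packages: []*Package{{Name: "greet", LoC: 5}},
		LoC:      5,
	}
	cp := shallowCopy(orig)

	cp.Name = "copy"                // plain field: only the copy changes
	cp.Packages[0].Name = "renamed" // shared pointer: visible through orig too

	fmt.Println(orig.Name, orig.Packages[0].Name) // prints greet renamed
}
```

Callers that need to modify a merged project without affecting its inputs would therefore have to copy the pointed-to values themselves.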
func (*Project) Encode ¶
Encode writes the JSON representation of the project into w.
For now, encoding still makes use of the json package of the standard library.
func (*Project) EncodeToFile ¶
EncodeToFile writes the JSON representation of the project into a file located at path.
type SrcFile ¶
    type SrcFile struct {
        // The path of the source file, relative to the root of the project.
        Path string `json:"path"`

        // Programming language used.
        Lang *Language `json:"language"`

        // List of the imports used by the source file.
        Imports []string `json:"imports,omitempty"`

        // Types definition
        TypeSpecs []*ast.TypeSpec `json:"type_specifiers,omitempty"`

        // Structures definition
        // TODO rename JSON key into structures
        Structs []*ast.StructType `json:"structs,omitempty"`

        // List of constants defined at the file level (e.g. global constants)
        Constants []*ast.GlobalDecl `json:"constants,omitempty"`

        // List of variables defined at the file level (e.g. global variables)
        Vars []*ast.GlobalDecl `json:"variables,omitempty"`

        // List of functions
        Funcs []*ast.FuncDecl `json:"functions,omitempty"`

        // List of interfaces
        Interfaces []*ast.Interface `json:"interfaces,omitempty"`

        // List of classes
        Classes []*ast.ClassDecl `json:"classes,omitempty"`

        // List of enums
        Enums []*ast.EnumDecl `json:"enums,omitempty"`

        // List of traits
        // See http://en.wikipedia.org/wiki/Trait_%28computer_programming%29
        Traits []*ast.Trait `json:"traits,omitempty"`

        // The total number of lines of code.
        LoC int64 `json:"loc"`
    }
SrcFile holds information about a source file.