The decomp project
The aim of this project is to implement a decompilation pipeline composed of independent components interacting through well-defined interfaces, as further described in the design documents of the project.
go get github.com/decomp/decomp/...
From a high-level perspective, the components of the decompilation pipeline are conceptually grouped into three modules. Firstly, the front-end translates a source language (e.g. x86 assembly) into LLVM IR, a platform-independent low-level intermediate representation. Secondly, the middle-end structures the LLVM IR by identifying high-level control flow primitives (e.g. pre-test loops, 2-way conditionals). Lastly, the back-end translates the structured LLVM IR into a high-level target programming language (e.g. Go).
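The three-module structure described above can be sketched as a chain of functions. This is only an illustration: the `Stage` type, `Compose`, and the three placeholder stages are hypothetical names that are not part of the project, and the stages merely tag their input instead of lifting, structuring or translating code.

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is a hypothetical pipeline component: it consumes one textual
// representation and produces the next one.
type Stage func(input string) (string, error)

// Compose chains stages so that the output of each stage becomes the input
// of the next; the first error aborts the pipeline.
func Compose(stages ...Stage) Stage {
	return func(input string) (string, error) {
		cur := input
		for i, stage := range stages {
			out, err := stage(cur)
			if err != nil {
				return "", fmt.Errorf("stage %d: %w", i, err)
			}
			cur = out
		}
		return cur, nil
	}
}

// frontEnd is a placeholder for binary lifting (assumption: it only tags
// its input and rejects empty input).
func frontEnd(asm string) (string, error) {
	if strings.TrimSpace(asm) == "" {
		return "", fmt.Errorf("empty input")
	}
	return "ir(" + asm + ")", nil
}

// middleEnd is a placeholder for control flow structuring.
func middleEnd(ir string) (string, error) {
	return "structured(" + ir + ")", nil
}

// backEnd is a placeholder for code generation.
func backEnd(structured string) (string, error) {
	return "go(" + structured + ")", nil
}

func main() {
	decompile := Compose(frontEnd, middleEnd, backEnd)
	out, err := decompile("mov eax, 1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because every stage shares the same signature, a stage can be swapped out or run on its own without touching the others.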
The following poster summarizes the current capabilities of the decompilation pipeline, using a composition of independent components to translate LLVM IR to Go.
Translate machine code (e.g. x86 assembly) to LLVM IR.
Perform control flow analysis on the LLVM IR to identify high-level control flow primitives (e.g. pre-test loops).
Control flow graph generation tool.
Generate control flow graphs from LLVM IR assembly (*.ll -> *.dot).
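To illustrate the kind of output involved, the following sketch renders a control flow graph in the Graphviz DOT format. It does not parse LLVM IR and is not the ll2dot implementation; the `Block` type and `ToDOT` function are hypothetical, and the basic blocks are supplied by hand.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Block is a hypothetical, simplified basic block: a label plus the labels
// of the blocks its terminator may branch to.
type Block struct {
	Label string
	Succs []string
}

// ToDOT renders the blocks of one function as a DOT digraph with one node
// per basic block and one edge per possible branch. Labels are quoted with
// %q, which matches DOT quoting for plain ASCII labels.
func ToDOT(funcName string, blocks []Block) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "digraph %q {\n", funcName)
	for _, b := range blocks {
		fmt.Fprintf(&sb, "\t%q;\n", b.Label)
	}
	for _, b := range blocks {
		// Copy and sort the successors to keep the output deterministic.
		succs := append([]string(nil), b.Succs...)
		sort.Strings(succs)
		for _, s := range succs {
			fmt.Fprintf(&sb, "\t%q -> %q;\n", b.Label, s)
		}
	}
	sb.WriteString("}\n")
	return sb.String()
}

func main() {
	// Control flow graph of a function containing a single pre-test loop.
	blocks := []Block{
		{Label: "entry", Succs: []string{"cond"}},
		{Label: "cond", Succs: []string{"body", "exit"}},
		{Label: "body", Succs: []string{"cond"}},
		{Label: "exit"},
	}
	fmt.Print(ToDOT("f", blocks))
}
```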
Control flow recovery tool.
Recover control flow primitives from control flow graphs (*.dot -> *.json).
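As a rough illustration of what recovering a primitive can mean, the sketch below looks for 1-way conditionals in a small hand-written graph and prints the matches as JSON. The matching rule, the `Primitive` type and the JSON field names are assumptions made for this example; they are not the algorithm or the output format of the restructure tool.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Primitive is a hypothetical record describing one recovered primitive.
type Primitive struct {
	Kind  string            `json:"kind"`
	Nodes map[string]string `json:"nodes"`
}

// Graph maps each node to the list of its successors.
type Graph map[string][]string

// preds computes the predecessor lists of the graph.
func (g Graph) preds() map[string][]string {
	p := make(map[string][]string)
	for from, succs := range g {
		for _, to := range succs {
			p[to] = append(p[to], from)
		}
	}
	return p
}

// FindIfs returns every 1-way conditional in g: a node cond that branches to
// either body or exit, where body has cond as its only predecessor and exit
// as its only successor.
func FindIfs(g Graph) []Primitive {
	preds := g.preds()
	var names []string
	for n := range g {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic iteration order
	var prims []Primitive
	for _, cond := range names {
		succs := g[cond]
		if len(succs) != 2 {
			continue
		}
		// Try both assignments of the two successors to body and exit.
		for i := 0; i < 2; i++ {
			body, exit := succs[i], succs[1-i]
			if body == cond || exit == cond || body == exit {
				continue
			}
			if len(preds[body]) == 1 && len(g[body]) == 1 && g[body][0] == exit {
				prims = append(prims, Primitive{
					Kind:  "if",
					Nodes: map[string]string{"cond": cond, "body": body, "exit": exit},
				})
			}
		}
	}
	return prims
}

func main() {
	g := Graph{
		"entry": {"then", "join"},
		"then":  {"join"},
		"join":  nil,
	}
	out, err := json.MarshalIndent(FindIfs(g), "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```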
Translate structured LLVM IR to a high-level target language (e.g. Go).
Go code generation tool.
Decompile LLVM IR assembly to Go source code (*.ll -> *.go).
Go post-processing tool.
Post-process Go source code to make it more idiomatic (*.go -> *.go).
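A post-processing step of this kind can be sketched with the standard go/parser, go/ast and go/format packages. The example rewrites statements of the form `x = x + 1` into `x++`. The rewrite rule and the `incStmt` and `simplify` helpers are assumptions chosen for illustration, not a list of what go-post does, and only statements that sit directly in a block are handled.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// incStmt reports whether stmt has the form "x = x + 1" and, if so, returns
// the equivalent "x++" statement.
func incStmt(stmt ast.Stmt) (ast.Stmt, bool) {
	assign, ok := stmt.(*ast.AssignStmt)
	if !ok || assign.Tok != token.ASSIGN || len(assign.Lhs) != 1 || len(assign.Rhs) != 1 {
		return nil, false
	}
	lhs, ok := assign.Lhs[0].(*ast.Ident)
	if !ok {
		return nil, false
	}
	bin, ok := assign.Rhs[0].(*ast.BinaryExpr)
	if !ok || bin.Op != token.ADD {
		return nil, false
	}
	x, ok := bin.X.(*ast.Ident)
	if !ok || x.Name != lhs.Name {
		return nil, false
	}
	one, ok := bin.Y.(*ast.BasicLit)
	if !ok || one.Kind != token.INT || one.Value != "1" {
		return nil, false
	}
	return &ast.IncDecStmt{X: lhs, TokPos: assign.TokPos, Tok: token.INC}, true
}

// simplify parses src, replaces every "x = x + 1" statement found directly
// inside a block with "x++", and returns the formatted result.
func simplify(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		if block, ok := n.(*ast.BlockStmt); ok {
			for i, stmt := range block.List {
				if inc, ok := incStmt(stmt); ok {
					block.List[i] = inc
				}
			}
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	const src = "package p\n\nfunc f() int {\n\tx := 0\n\tx = x + 1\n\treturn x\n}\n"
	out, err := simplify(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```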
Version 0.2 (2018-01-30)
Primary focus of version 0.2: project-wide compilation speed.
Developing decompilation components should be fun.
There seems to be an inverse correlation between depending on a huge C++ library and having fun developing decompilation components.
Version 0.2 of the decompilation pipeline strives to resolve this issue by leveraging an LLVM IR library written in pure Go. Prior to this release, project-wide compilation could take several hours to complete. Now, it completes in less than 1 minute -- the established hard limit for all future releases.
Version 0.1 (2015-04-21)
Primary focus of version 0.1: compositional decompilation.
Decompilers should be composable and open source.
A decompilation pipeline should be composed of individual components, each with a single purpose and well-defined input and output.
Version 0.1 of the decomp project explores the feasibility of composing a decompilation pipeline from independent components, and the potential of exposing those components to the end-user.
For further background, refer to the Compositional Decompilation using LLVM IR design document.
Version 0.3 (to be released)
Primary focus of version 0.3: type-aware binary lifting.
Decompilers rely on high-quality binary lifting.
The quality of the output IR of the binary lifting front-end fundamentally determines the quality of the output of the entire decompilation pipeline.
Version 0.3 aims to improve the quality of the output LLVM IR by implementing a type-aware binary lifting front-end.
Version 0.4 (to be released)
Primary focus of version 0.4: control flow analysis.
Decompilers should recover high-level control flow primitives.
One of the primary differences between low-level assembly and high-level source code is the use of high-level control flow primitives; e.g. 1-way, 2-way and n-way conditionals (the n-way case corresponding to switch), and pre- and post-test loops.
Version 0.4 seeks to recover high-level control flow primitives using robust control flow analysis algorithms.
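A common first step in loop recovery is to find the edges that close a cycle. The sketch below uses a depth-first search and reports every edge whose target is still on the search stack. It is a generic textbook technique shown under the assumption of a hand-written graph, not the analysis planned for this release; the `Edge` type and `BackEdges` function are hypothetical names. On reducible control flow graphs the edges found this way coincide with the dominator-based definition of a back edge.

```go
package main

import "fmt"

// Edge is a directed edge of a control flow graph.
type Edge struct {
	From, To string
}

// BackEdges performs a depth-first search from entry and returns every edge
// whose target is still on the DFS stack when the edge is explored. Each
// such edge closes a cycle and therefore indicates a loop.
func BackEdges(g map[string][]string, entry string) []Edge {
	const (
		white = iota // not visited yet
		grey         // currently on the DFS stack
		black        // completely explored
	)
	color := make(map[string]int)
	var edges []Edge
	var visit func(n string)
	visit = func(n string) {
		color[n] = grey
		for _, s := range g[n] {
			switch color[s] {
			case white:
				visit(s)
			case grey:
				edges = append(edges, Edge{From: n, To: s})
			}
		}
		color[n] = black
	}
	visit(entry)
	return edges
}

func main() {
	// Pre-test loop: the loop header "cond" is tested before "body" runs.
	g := map[string][]string{
		"entry": {"cond"},
		"cond":  {"body", "exit"},
		"body":  {"cond"},
	}
	for _, e := range BackEdges(g, "entry") {
		fmt.Printf("back edge: %s -> %s\n", e.From, e.To)
	}
}
```

In this example the target of the back edge, `cond`, is the loop header; since the header itself has an edge leaving the loop, the loop is tested before its body executes.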
Version 0.5 (to be released)
Primary focus of version 0.5: fault tolerance.
Decompilers should be robust.
Decompilation components should respond well to unexpected states and incomplete analysis.
Version 0.5 focuses on stability, and seeks to stress test the decompilation pipeline using semi-real world software (see the challenge issue series).
Version 0.6 (to be released)
Primary focus of version 0.6: data flow analysis.
Version 0.7 (to be released)
Primary focus of version 0.7: type analysis.
|Path|Synopsis|
|---|---|
|cfa|Package cfa implements control flow analysis of control flow graphs.|
|cfa/primitive|Package primitive defines the types used to represent high-level control flow primitives.|
|cmd/go-post|The go-post tool post-processes Go source code to make it more idiomatic (*.go -> *.go).|
|cmd/go-post/internal/diff|Package diff implements a Diff function that compares two inputs using the 'diff' tool.|
|cmd/ll2dot|The ll2dot tool generates control flow graphs from LLVM IR assembly (*.ll -> *.dot).|
|cmd/ll2go|The ll2go tool decompiles LLVM IR assembly to Go source code (*.ll -> *.go).|
|cmd/restructure|The restructure tool recovers control flow primitives from DOT control flow graphs (*.dot -> *.json).|
|graph/cfg|Package cfg provides access to control flow graphs of LLVM IR functions.|