aces

package module
v2.2.2
Published: May 3, 2024 License: MIT Imports: 7 Imported by: 0

README

Aces

Any Character Encoding Set

Aces is a command line utility that lets you encode any data to a character set of your choice.

Psst... it is also now a library that you can use for encoding and decoding, and for reading and writing at the bit level! See documentation here.
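
For a quick taste of the library, here is a minimal sketch (assuming the v2 module import path github.com/quackduck/aces/v2; NewCoding and Coding.Encode are documented below):

package main

import (
	"os"
	"strings"

	aces "github.com/quackduck/aces/v2" // assumed v2 module import path
)

func main() {
	// Build a coding for the 4-character set "HhAa" (2 bits per character).
	coding, err := aces.NewCoding([]rune("HhAa"))
	if err != nil {
		panic(err)
	}
	// Encode "Foo Bar" to stdout, like `echo -n "Foo Bar" | aces HhAa` below.
	if err := coding.Encode(os.Stdout, strings.NewReader("Foo Bar")); err != nil {
		panic(err)
	}
}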

For example, you could encode "Foo Bar" to a combination of these four characters: "HhAa", resulting in this hilarious sequence of laughs:

hHhAhAaahAaaHAHHhHHAhAHhhaHA

With Aces installed, you can actually do that with:

$ echo -n "Foo Bar" | aces HhAa
hHhAhAaahAaaHAHHhHHAhAHhhaHA

This was the original use of Aces (it was called ha, increased data size by 4X, and had no decoder).

If you're on macOS, you can even convert that output to speech:

echo -n "Matthew Stanciu" | aces HhAa | say

Make your own wacky encoding:

$ echo HELLO WORLD | aces "DORK BUM"
RRD RBO RKD M  DRBU MBRRRKD RDOR

You can also use emojis:

$ echo -n yay | aces 🥇🥈🥉
🥇🥈🥉🥇🥉🥇🥉🥉🥇🥉🥉🥇🥈🥇🥉🥇🥉🥉🥇🥉🥇🥈🥇

With Aces, you can see the actual 0s and 1s of files:

aces 01 < $(which echo)

You can also write hex/octal/binary/your own format by hand:

echo C2A70A   | aces -d 0123456789ABCDEF # try this!
echo .+=...++ | aces -d ./+=

Convert binary to hex:

echo 01001010 | aces -d 01 | aces 0123456789ABCDEF

Also check out the examples!

Installing

macOS or Linux with linuxbrew
brew install quackduck/tap/aces
Other platforms

Head over to releases and download the latest binary!

Usage

Aces - Encode in any character set

Usage:
  aces <charset>                  - encode data from STDIN into <charset>
  aces -d/--decode <charset>      - decode data from STDIN from <charset>
  aces -v/--version | -h/--help   - print version or this help message

  Aces reads from STDIN for your data and outputs the result to STDOUT. An optimized algorithm is used
  for character sets with a power of 2 length. Newlines are ignored when decoding.

Examples:
  echo hello world | aces "<>(){}[]" | aces --decode "<>(){}[]"      # basic usage
  echo matthew stanciu | aces HhAa | say                             # make funny sounds (macOS)
  aces " X" < /bin/echo                                              # see binaries visually
  echo 0100100100100001 | aces -d 01 | aces 0123456789abcdef         # convert bases
  echo Calculus | aces 01                                            # what's stuff in binary?
  echo Aces™ | base64 | aces -d
  ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/   # even decode base64
  echo -n yay | aces 🥇🥈🥉                                          # emojis work too! 
  Set the encoding/decoding buffer size with --bufsize <size> (default 16KiB).

  File issues, contribute or star at github.com/quackduck/aces

How does it work?

To answer that, we need to know how encoding works in general. Let's take the example of Base64.

Base64
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

That is the Base64 character set. As you may expect, it's 64 characters long.

Let's say we want to somehow represent these two bytes in those 64 characters:

00001001 10010010    # 09 92 in hex

To do that, Base64 does something very smart: it uses the bits, interpreted as a number, as indices into the character set.

To explain what that means, let's consider what possible values 6 bits can represent: 000000 (decimal 0) to 111111 (decimal 63). Since 0 to 63 is the exact range of indices that can be used with the 64-element character set, we'll group our 8-bit chunks (bytes) of data into 6-bit chunks (to use as indices):

000010 011001 0010

000010 is 2 in decimal, so by using it as an index of the character set, Base64 adds C (index 2) to the result.

011001 is 16 + 8 + 1 = 25 in decimal, so Base64 appends Z (index 25) to the result.

You may have spotted a problem with the next chunk - it's only 4 bits long!

To get around this, Base64 pretends it's a 6-bit chunk and simply appends as many zeros as are needed:

0010 + 00 => 001000

001000 is 8 in decimal, so Base64 appends I to the result.

But then, on the decoding side, how do you know where real data ends and where the pretend data starts?

It turns out that we don't need to do anything. On the decoding side, we know that the decoded data has to be a multiple of 8 bits. So, the decoder ignores the bits which make the output not a multiple of 8 bits, which will always be the extra bits we added.

Finally, encoding 00001001 10010010 to Base64 should result in CZI

Try this in your terminal with the real Base64!

echo -n -e \\x09\\x92 | base64 # base64 also adds a "=" character called "padding" to fit to a standard input length to output length ratio
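
The same check in Go, using the standard library's encoding/base64 (which also appends the "=" padding):

package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// 0x09 0x92 is 00001001 10010010 in binary.
	fmt.Println(base64.StdEncoding.EncodeToString([]byte{0x09, 0x92})) // prints "CZI=" ("CZI" plus padding)
}
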
Aces

Now we generalize this to all character sets of any length.

Generalizing the characters is easy: we just switch out the characters of the array storing the character set.

Changing the length of the character set is harder. For every character set length, we need to figure out how many bits the chunked data should have.

In the Base64 example, the chunk length (let's call it that) was 6. The character set length was 64.

It looks like 2^(chunk len) = set len. We can prove this is true with this observation:

Every bit can either be 1 or 0, so the total number of possible values of a certain number of bits is just 2^(number of bits). (If you need further proof, observe that every bit we add doubles the total possibilities, since there's an additional choice: the new bit being 0 or the new bit being 1.)

The total number of possible values must equal the length of the character set (of course, since we need the indices to cover all the characters of the set).

So, to find the number of bits the chunked data should have, we just do log2(character set length). Then, we divide the bytes into chunks of that many bits (which was pretty hard to implement: knowing when to read more bytes, crossing over into the next byte to fetch more bits, etc, etc.), use those bits as indices for the user-supplied character set, and print the result.
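
As a rough sketch (illustrative code only, not the package's actual implementation: it buffers the whole input instead of streaming), the power-of-2 path boils down to something like this:

package main

import (
	"fmt"
	"math/bits"
)

// encodePow2 encodes data with a charset whose length is a power of 2.
// The real implementation streams input and handles chunk boundaries with a bit reader.
func encodePow2(data []byte, charset []rune) string {
	chunkLen := bits.TrailingZeros(uint(len(charset))) // log2(set length) for powers of 2
	out := []rune{}
	acc, nbits := 0, 0
	for _, b := range data {
		acc = acc<<8 | int(b) // pull in the next byte
		nbits += 8
		for nbits >= chunkLen {
			nbits -= chunkLen
			out = append(out, charset[acc>>nbits]) // top chunkLen bits are the next index
			acc &= (1 << nbits) - 1                // drop the bits we just used
		}
	}
	if nbits > 0 { // pad the final partial chunk with zeros
		out = append(out, charset[acc<<(chunkLen-nbits)])
	}
	return string(out)
}

func main() {
	base64set := "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
	fmt.Println(encodePow2([]byte{0x09, 0x92}, []rune(base64set))) // CZI, as worked out above
}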

Unfortunately, this algorithm only works for character sets with a length that is a power of 2. For character sets with a length that is not a power of 2, we need to do something else.

Sets that are not a power of 2 in length use an algorithm that may not have the same output as other encoders with the same character set. For example, using the base58 character set does not mean that the output will be the same as a base58-specific encoder. This is because most encoders interpret data as a number and use a base conversion algorithm to convert it to the character set. For non-power-of-2 charsets, this requires all data to be read before encoding, which is not possible with streams. To enable stream encoding for non-power-of-2 charsets, Aces converts the base of a default of 8 bytes of data at a time, which is not the same as converting the base of the entire data.
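
A minimal sketch of that per-chunk base conversion (a hypothetical helper, not the package's code; it also skips the leading-zero-byte handling a real encoder needs):

package main

import (
	"fmt"
	"math/big"
)

// encodeChunk converts one chunk of bytes, interpreted as a big-endian number,
// into digits of base len(charset). Converting a few bytes at a time like this
// lets the encoder stream, at the cost of differing from whole-input base conversion.
func encodeChunk(chunk []byte, charset []rune) string {
	n := new(big.Int).SetBytes(chunk)
	base := big.NewInt(int64(len(charset)))
	mod := new(big.Int)
	digits := []rune{}
	for n.Sign() > 0 {
		n.DivMod(n, base, mod)
		digits = append([]rune{charset[mod.Int64()]}, digits...) // prepend the next digit
	}
	if len(digits) == 0 {
		digits = []rune{charset[0]} // an all-zero chunk still produces output
	}
	return string(digits)
}

func main() {
	// Convert one small chunk of "yay" into the 3-emoji set (the output won't match
	// the aces CLI exactly, since aces uses a larger default chunk size).
	fmt.Println(encodeChunk([]byte("yay"), []rune("🥇🥈🥉")))
}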

Easy! (Nope, this is the work of several showers and a lot of late night pondering :)

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BitReader

type BitReader struct {
	// contains filtered or unexported fields
}

BitReader reads a constant number of bits from an io.Reader

func NewBitReader

func NewBitReader(chunkLen uint8, in io.Reader) (*BitReader, error)

NewBitReader returns a BitReader that reads chunkLen bits at a time from in.

func NewBitReaderSize

func NewBitReaderSize(chunkLen uint8, in io.Reader, bufSize int) (*BitReader, error)

NewBitReaderSize is like NewBitReader but allows setting the internal buffer size

func (*BitReader) Read

func (br *BitReader) Read() (byte, error)

Read returns the next chunkLen bits from the stream. If there is no more data to read, it returns io.EOF. For example, if chunkLen is 3 and the next 3 bits are 101, Read returns 5, nil.
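
For example (assuming the v2 module import path github.com/quackduck/aces/v2), reading 4 bits (one hex nibble) at a time:

package main

import (
	"fmt"
	"io"
	"strings"

	aces "github.com/quackduck/aces/v2" // assumed v2 module import path
)

func main() {
	br, err := aces.NewBitReader(4, strings.NewReader("Hi"))
	if err != nil {
		panic(err)
	}
	for {
		chunk, err := br.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		fmt.Printf("%x", chunk)
	}
	fmt.Println() // prints 4869, the hex of "Hi"
}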

type BitWriter

type BitWriter struct {
	// contains filtered or unexported fields
}

BitWriter writes a constant number of bits to an io.Writer

func NewBitWriter

func NewBitWriter(chunkLen uint8, out io.Writer) *BitWriter

NewBitWriter returns a BitWriter that writes chunkLen bits at a time to out.

func NewBitWriterSize

func NewBitWriterSize(chunkLen uint8, out io.Writer, bufSize int) *BitWriter

NewBitWriterSize is like NewBitWriter but allows setting the internal buffer size

func (*BitWriter) Flush

func (bw *BitWriter) Flush() error

Flush writes any remaining data in the buffer to the underlying io.Writer.

func (*BitWriter) Write

func (bw *BitWriter) Write(b byte) error

Write writes the last chunkLen bits from b to the stream. For example, if chunkLen is 3 and b is 00000101, Write writes 101.
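
For example, the reverse of the BitReader sketch above: writing the hex nibbles of "Hi" 4 bits at a time (again assuming the v2 import path):

package main

import (
	"bytes"
	"fmt"

	aces "github.com/quackduck/aces/v2" // assumed v2 module import path
)

func main() {
	var buf bytes.Buffer
	bw := aces.NewBitWriter(4, &buf)
	for _, nibble := range []byte{0x4, 0x8, 0x6, 0x9} { // 0x48 0x69 == "Hi"
		if err := bw.Write(nibble); err != nil {
			panic(err)
		}
	}
	if err := bw.Flush(); err != nil { // flush any remaining buffered bits
		panic(err)
	}
	fmt.Println(buf.String()) // Hi
}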

type Coding

type Coding interface {
	// SetBufferSize sets internal buffer sizes
	SetBufferSize(size int)
	// SetByteChunkSize sets the number of bytes whose base is converted at a time if the character set does not have
	// a length that is a power of 2. Encoding and decoding must be done with the same byte chunk size. The size must
	// be greater than 0 and less than 256.
	SetByteChunkSize(size int)
	// Encode reads from src and encodes to dst
	Encode(dst io.Writer, src io.Reader) error
	// Decode reads from src and decodes to dst
	Decode(dst io.Writer, src io.Reader) error
}

Coding represents an encoding scheme for a character set. See NewCoding for more detail.

func NewCoding

func NewCoding(charset []rune) (Coding, error)

NewCoding creates a new coding with the given character set.

For example,

NewCoding([]rune("0123456789abcdef"))

creates a hex encoding scheme, and

NewCoding([]rune(" ❗"))

creates a binary encoding scheme: 0s are represented by a space and 1s are represented by an exclamation mark.

While a character set of any length can be used, those with power of 2 lengths (2, 4, 8, 16, 32, 64, 128, 256) use a more optimized algorithm.

Sets that are not a power of 2 in length use an algorithm that may not have the same output as other encoders with the same character set. For example, using the base58 character set does not mean that the output will be the same as a base58-specific encoder.

This is because most encoders interpret data as a number and use a base conversion algorithm to convert it to the character set. For non-power-of-2 charsets, this requires all data to be read before encoding, which is not possible with streams. To enable stream encoding for non-power-of-2 charsets, Aces converts a default of 4 bytes (adjustable with Coding.SetByteChunkSize) of data at a time, which is not the same as converting the base of the entire data. If stream encoding is not necessary, use StaticCoding, for which using the base58 character set, for example, will produce the same output as a base58-specific encoder.
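
For example, a round trip through a non-power-of-2 set with an explicit byte chunk size (a sketch, assuming the v2 import path):

package main

import (
	"bytes"
	"fmt"
	"strings"

	aces "github.com/quackduck/aces/v2" // assumed v2 module import path
)

func main() {
	// A 3-character set, so the chunked base-conversion path is used.
	coding, err := aces.NewCoding([]rune("🥇🥈🥉"))
	if err != nil {
		panic(err)
	}
	coding.SetByteChunkSize(4) // must match on the encoding and decoding side

	var encoded bytes.Buffer
	if err := coding.Encode(&encoded, strings.NewReader("yay")); err != nil {
		panic(err)
	}
	var decoded bytes.Buffer
	if err := coding.Decode(&decoded, &encoded); err != nil {
		panic(err)
	}
	fmt.Println(decoded.String()) // yay
}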

type StaticCoding

type StaticCoding struct {
	// contains filtered or unexported fields
}

func NewStaticCoding

func NewStaticCoding(charset []rune) (*StaticCoding, error)

NewStaticCoding creates a StaticCoding with the given character set, which must be a set of unique runes. StaticCoding differs from Coding in that it does not accept streamed input, but instead requires the entire input to be provided at once. So, StaticCoding is not recommended for very large inputs. It encodes by changing the mathematical base of the input (interpreted as a binary number) to the length of the charset. Each null byte at the beginning of the input is encoded as the first character in the charset.

For example,

NewStaticCoding([]rune("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"))

creates the base58 encoding scheme compatible with Bitcoin's implementation.

func (*StaticCoding) Decode

func (c *StaticCoding) Decode(data string) ([]byte, error)

func (*StaticCoding) Encode

func (c *StaticCoding) Encode(data []byte) (string, error)
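
For example (a sketch, assuming the v2 import path github.com/quackduck/aces/v2):

package main

import (
	"fmt"

	aces "github.com/quackduck/aces/v2" // assumed v2 module import path
)

func main() {
	// The Bitcoin-style base58 set from the NewStaticCoding example above.
	sc, err := aces.NewStaticCoding([]rune("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"))
	if err != nil {
		panic(err)
	}
	encoded, err := sc.Encode([]byte("Foo Bar"))
	if err != nil {
		panic(err)
	}
	decoded, err := sc.Decode(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(encoded, string(decoded)) // the base58 string, then "Foo Bar"
}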
