terf

v0.0.4
Published: Dec 27, 2019 License: GPL-3.0 Imports: 19 Imported by: 0

README

===============================================================================
terf - TensorFlow TFRecords file format Reader/Writer
===============================================================================

|godoc|

terf is a Go library for reading/writing TensorFlow `TFRecords files
<https://www.tensorflow.org/versions/r1.1/api_guides/python/python_io#tfrecords_format_details>`_.
The goals of this project are twofold:

1. Read/Write TensorFlow TFRecords files in Go
2. Provide an easy way to generate example image datasets for use in TensorFlow

With terf you can easily build, inspect, and extract image datasets from the
command line without having to install TensorFlow. terf was developed for use
with `MARCO <https://marco.ccr.buffalo.edu>`_ but should work with most image
datasets. The record layout is based on the ImageNet dataset used by the
Inception research model in TensorFlow.

-------------------------------------------------------------------------------
Install
-------------------------------------------------------------------------------

Binaries for your platform can be found `here <https://github.com/ubccr/terf/releases>`_.

Usage::

    $ ./terf --help

-------------------------------------------------------------------------------
Examples
-------------------------------------------------------------------------------

~~~~~~~~~~~~~~~~~~~~~~~~~
Create an image dataset
~~~~~~~~~~~~~~~~~~~~~~~~~

Suppose you have a directory of labeled images and want to build an image
dataset that can be used in TensorFlow. The first step is to generate a CSV
file in the following format::

	image_path,image_id,label_id,label_text,label_raw,source

Where image_path is the path to the raw image file, image_id is the unique
identifier for the image, label_id is the integer identifier of the normalized
label, label_text is the normalized label, label_raw is the integer identifier
of the raw label, and source is the integer identifier of the source
(organization, creator, etc.) that produced the image. For example::

	image_path,image_id,label_id,label_text,label_raw,source
	/data/03c3_G6_ImagerDefaults_6.jpg,123,1,Crystals,12,101
	/data/X0000056450155200509052032.png,124,0,Clear,15,104
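
A CSV in this format can be produced with Go's standard ``encoding/csv`` package. The following is a minimal sketch, not part of terf: the ``row`` type and ``rowsToCSV`` helper are hypothetical names introduced for illustration, and the rows are the two sample records shown above.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"log"
	"strconv"
)

// row mirrors one line of the input CSV. The type name is hypothetical.
type row struct {
	ImagePath string
	ImageID   int
	LabelID   int
	LabelText string
	LabelRaw  int
	Source    int
}

// rowsToCSV renders a header plus one record per row, in the column order
// image_path,image_id,label_id,label_text,label_raw,source.
func rowsToCSV(rows []row) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	header := []string{"image_path", "image_id", "label_id", "label_text", "label_raw", "source"}
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, r := range rows {
		rec := []string{
			r.ImagePath,
			strconv.Itoa(r.ImageID),
			strconv.Itoa(r.LabelID),
			r.LabelText,
			strconv.Itoa(r.LabelRaw),
			strconv.Itoa(r.Source),
		}
		if err := w.Write(rec); err != nil {
			return "", err
		}
	}
	// Flush buffered records and surface any deferred write error.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := rowsToCSV([]row{
		{"/data/03c3_G6_ImagerDefaults_6.jpg", 123, 1, "Crystals", 12, 101},
		{"/data/X0000056450155200509052032.png", 124, 0, "Clear", 15, 104},
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

Using ``encoding/csv`` rather than string concatenation means label text containing commas or quotes is escaped correctly.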


To build the image dataset run the following command::

	$ ./terf -d build --input images.csv --output train_directory/ --size 1024	

This will convert the image data into a sharded dataset of TFRecords files in
the train_directory/ output directory::
	
	train_directory/train-00000-of-00024
	train_directory/train-00001-of-00024
	...
	train_directory/train-00023-of-00024
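
The shard names in this listing follow a ``<prefix>-<index>-of-<count>`` pattern with five-digit zero padding. The sketch below shows one way such names and the shard count could be computed. It assumes each shard holds at most ``--size`` records; ``numShards`` and ``shardName`` are hypothetical helpers rather than terf functions, and 24000 is a made-up record count.

```go
package main

import "fmt"

// numShards returns how many files are needed to hold total records
// when each file holds at most size records (ceiling division).
func numShards(total, size int) int {
	if total <= 0 || size <= 0 {
		return 0
	}
	return (total + size - 1) / size
}

// shardName formats a shard file name as <prefix>-<index>-of-<count>
// using five-digit zero padding, matching the listing above.
func shardName(prefix string, index, count int) string {
	return fmt.Sprintf("%s-%05d-of-%05d", prefix, index, count)
}

func main() {
	n := numShards(24000, 1024) // 24000 is an illustrative record count
	for i := 0; i < n; i++ {
		fmt.Println(shardName("train", i, n))
	}
}
```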

Each TFRecord file will contain ~1024 records. Each record within the TFRecord
file is a serialized Example proto. The Example proto contains the following
fields::

	image/height: integer, image height in pixels
	image/width: integer, image width in pixels
	image/colorspace: string, specifying the colorspace, always 'RGB'
	image/channels: integer, specifying the number of channels, always 3
	image/class/label: integer, specifying the index in a normalized classification layer
	image/class/raw: integer, specifying the index in the raw (original) classification layer
	image/class/source: integer, specifying the index of the source (creator of the image)
	image/class/text: string, specifying the human-readable version of the normalized label
	image/format: string, specifying the format, always 'JPEG'
	image/filename: string containing the basename of the image file
	image/id: integer, specifying the unique id for the image
	image/encoded: string, containing JPEG encoded image in RGB colorspace

~~~~~~~~~~~~~~~~~~~~~~~~~
Inspect an image dataset
~~~~~~~~~~~~~~~~~~~~~~~~~

Generate summary statistics on an image dataset::

	$ ./terf -d summary --input train_directory/
	INFO[0000] Processing file  path=train_directory/train-00001-of-00001 zlib=false
	Total: 10
	Label: 
		- Clear: 5
		- Precipitate: 4
		- Crystals: 1
	Source: 
		- 2: 2
		- 3: 6
		- 1: 2
	Label ID: 
		- 1: 1
		- 0: 5
		- 3: 4
	Label Raw: 
		- 30: 1
		- 2: 3
		- 8: 1
		- 16: 1
		- 1: 2
		- 14: 2

~~~~~~~~~~~~~~~~~~~~~~~~~
Extract an image dataset
~~~~~~~~~~~~~~~~~~~~~~~~~

Extract the raw image data from a dataset::

	$ ./terf -d extract --input train_directory -o dump/
	INFO[0000] Processing file    path=train_directory/train-00001-of-00001 zlib=false
	$ find dump/
	dump/
	dump/info.csv
	dump/Clear
	dump/Clear/396612.jpg
	dump/Clear/90089.jpg
	dump/Clear/192089.jpg
	dump/Clear/283709.jpg
	dump/Clear/82162.jpg
	dump/Precipitate
	dump/Precipitate/286612.jpg
	dump/Precipitate/421709.jpg
	dump/Precipitate/296118.jpg
	dump/Precipitate/163507.jpg
	dump/Crystals
	dump/Crystals/80373.jpg
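
The listing suggests that each extracted image is written to ``<output>/<label text>/<image id>.<extension>``. A minimal sketch of building such a path follows. ``extractPath`` is a hypothetical helper and the layout is inferred from the listing only, not taken from terf's source.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// extractPath builds <outDir>/<labelText>/<id>.<ext>, the layout seen in
// the find output above. The function and its parameters are hypothetical.
func extractPath(outDir, labelText string, id int64, ext string) string {
	return filepath.Join(outDir, labelText, strconv.FormatInt(id, 10)+"."+ext)
}

func main() {
	fmt.Println(extractPath("dump", "Clear", 396612, "jpg"))
}
```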


~~~~~~~~~~~~~~~~~~~~~~
Go
~~~~~~~~~~~~~~~~~~~~~~

Parse a TFRecords file in Go:

.. code-block:: go

	// Open TFRecord file
	in, err := os.Open("train-000")
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	r := terf.NewReader(in)

	count := 0
	for {
		// example will be a TensorFlow Example proto
		example, err := r.Next()
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}

		// Do something with example

		id := terf.ExampleFeatureInt64(example, "image/id")
		labelID := terf.ExampleFeatureInt64(example, "image/class/label")
		labelText := string(terf.ExampleFeatureBytes(example, "image/class/text"))

		fmt.Printf("Image: %d Label: %s (%d)\n", id, labelText, labelID)
		count++
	}

	fmt.Printf("Total records: %d\n", count)

-------------------------------------------------------------------------------
License
-------------------------------------------------------------------------------

terf is released under the GPLv3 License. See the LICENSE file.

.. |godoc| image:: https://godoc.org/github.com/golang/gddo?status.svg
    :target: https://godoc.org/github.com/ubccr/terf
    :alt: Godoc

Documentation

Overview

Package terf implements a reader/writer for TensorFlow TFRecords files.

Constants

This section is empty.

Variables

This section is empty.

Functions

func BytesFeature

func BytesFeature(val []byte) *protobuf.Feature

BytesFeature is a helper function for encoding TensorFlow Example proto Bytes features

func ExampleFeatureBytes

func ExampleFeatureBytes(example *protobuf.Example, key string) []byte

ExampleFeatureBytes is a helper function for decoding a proto Bytes feature from a TensorFlow Example. If key is not found, it returns a default value.

func ExampleFeatureBytesList added in v0.0.4

func ExampleFeatureBytesList(example *protobuf.Example, key string) [][]byte

ExampleFeatureBytesList is a helper function for decoding a proto Bytes feature list from a TensorFlow Example. If key is not found, it returns a default value.

func ExampleFeatureFloat

func ExampleFeatureFloat(example *protobuf.Example, key string) float64

ExampleFeatureFloat is a helper function for decoding a proto Float feature from a TensorFlow Example. If key is not found, it returns a default value.

func ExampleFeatureFloatList added in v0.0.4

func ExampleFeatureFloatList(example *protobuf.Example, key string) []float32

ExampleFeatureFloatList is a helper function for decoding a proto Float feature list from a TensorFlow Example. If key is not found, it returns a default value.

func ExampleFeatureInt64

func ExampleFeatureInt64(example *protobuf.Example, key string) int64

ExampleFeatureInt64 is a helper function for decoding a proto Int64 feature from a TensorFlow Example. If key is not found, it returns a default value.

func ExampleFeatureInt64List added in v0.0.4

func ExampleFeatureInt64List(example *protobuf.Example, key string) []int64

ExampleFeatureInt64List is a helper function for decoding a proto Int64 feature list from a TensorFlow Example. If key is not found, it returns a default value.

func FloatFeature

func FloatFeature(val float32) *protobuf.Feature

FloatFeature is a helper function for encoding TensorFlow Example proto Float features

func Int64Feature

func Int64Feature(val int64) *protobuf.Feature

Int64Feature is a helper function for encoding TensorFlow Example proto Int64 features

Types

type Image

type Image struct {
	// Unique ID for the image
	ID int

	// Width in pixels of the image
	Width int

	// Height in pixels of the image
	Height int

	// Integer ID for the normalized label (class)
	LabelID int

	// Integer ID for the raw label
	LabelRaw int

	// The human-readable version of the normalized label
	LabelText string

	// Integer ID for the source of the image. This is typically the
	// organization or owner that created the image
	SourceID int

	// Base filename of the original image
	Filename string

	// Image format (JPEG, PNG)
	Format string

	// Image colorspace (RGB, Gray)
	Colorspace string

	// Raw image data
	Raw []byte
}

Image is an Example image for training/validating in TensorFlow

func NewImage

func NewImage(r io.Reader, id, labelID, labelRaw int, labelText, filename string, sourceID int) (*Image, error)

NewImage returns a new Image. r is the io.Reader for the raw image data, id is the unique identifier for the image, labelID is the integer identifier of the normalized label, labelRaw is the integer identifier for the raw label, labelText is the normalized label, filename is the base name of the file, and sourceID is the source that produced the image

func (*Image) MarshalCSV

func (i *Image) MarshalCSV(baseDir string) []string

MarshalCSV encodes Image i into a CSV record. This is the inverse of UnmarshalCSV. The image_path will be generated based on the id of the image and the provided baseDir.

func (*Image) MarshalExample

func (i *Image) MarshalExample() (*protobuf.Example, error)

MarshalExample converts the Image to a TensorFlow Example proto. The Example proto schema is as follows:

image/height: integer, image height in pixels
image/width: integer, image width in pixels
image/colorspace: string, specifying the colorspace
image/channels: integer, specifying the number of channels, always 3
image/class/label: integer, specifying the index in a normalized classification layer
image/class/raw: integer, specifying the index in the raw (original) classification layer
image/class/source: integer, specifying the index of the source (creator of the image)
image/class/text: string, specifying the human-readable version of the normalized label
image/format: string, specifying the format
image/filename: string containing the basename of the image file
image/id: integer, specifying the unique id for the image
image/encoded: string, containing the raw encoded image

func (*Image) Name

func (i *Image) Name() string

Name returns the generated base filename for the image: [id].[format]

func (*Image) Read added in v0.0.2

func (i *Image) Read(r io.Reader) error

Read reads raw image data from r, parses the image config, and sets Format, Colorspace, Width, and Height.

func (*Image) Save

func (i *Image) Save(file string) error

Save writes the Image to a file

func (*Image) ToJPEG added in v0.0.2

func (i *Image) ToJPEG() error

ToJPEG converts Image to JPEG format in RGB colorspace

func (*Image) UnmarshalCSV

func (i *Image) UnmarshalCSV(row []string) error

UnmarshalCSV decodes data from a single CSV record row into Image i. The CSV record row is expected to be in the following format:

image_path,image_id,label_id,label_text,label_raw,source
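
To make the column order concrete, here is a minimal sketch of decoding one such row with the standard library. It is not terf's implementation: the `record` type and `parseRow` function are hypothetical stand-ins, and the sketch assumes the four numeric columns are plain decimal integers.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
)

// record holds the six CSV columns. It is a stand-in for illustration,
// not terf's Image type.
type record struct {
	ImagePath string
	ImageID   int
	LabelID   int
	LabelText string
	LabelRaw  int
	Source    int
}

// parseRow converts one CSV record into a record value and reports the
// first column that fails integer conversion.
func parseRow(row []string) (record, error) {
	if len(row) != 6 {
		return record{}, fmt.Errorf("expected 6 columns, got %d", len(row))
	}
	rec := record{ImagePath: row[0], LabelText: row[3]}
	// Integer columns and the fields they fill.
	ints := []struct {
		name string
		col  int
		dst  *int
	}{
		{"image_id", 1, &rec.ImageID},
		{"label_id", 2, &rec.LabelID},
		{"label_raw", 4, &rec.LabelRaw},
		{"source", 5, &rec.Source},
	}
	for _, f := range ints {
		v, err := strconv.Atoi(row[f.col])
		if err != nil {
			return record{}, fmt.Errorf("%s: %w", f.name, err)
		}
		*f.dst = v
	}
	return rec, nil
}

func main() {
	rec, err := parseRow([]string{"/data/X0000056450155200509052032.png", "124", "0", "Clear", "15", "104"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", rec)
}
```

When reading a whole file with a header line, the header row would need to be skipped before calling a function like this, since its columns are not integers.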

func (*Image) UnmarshalExample

func (i *Image) UnmarshalExample(example *protobuf.Example) error

UnmarshalExample decodes data from a TensorFlow example proto into Image i. This is the inverse of MarshalExample.

func (*Image) Write

func (i *Image) Write(w io.Writer) error

Write writes the raw Image data to w

type Reader

type Reader struct {
	// contains filtered or unexported fields
}

Reader implements a reader for TFRecords with Example protos

Example
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/markdicksonjr/terf"
)

func main() {
	// Open TFRecord file
	in, err := os.Open("train-000")
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	r := terf.NewReader(in)

	count := 0
	for {
		// example will be a TensorFlow Example proto
		example, err := r.Next()
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}

		// Do something with example

		id := terf.ExampleFeatureInt64(example, "image/id")
		labelID := terf.ExampleFeatureInt64(example, "image/class/label")
		labelText := string(terf.ExampleFeatureBytes(example, "image/class/text"))

		fmt.Printf("Image: %d Label: %s (%d)\n", id, labelText, labelID)
		count++
	}

	fmt.Printf("Total records: %d\n", count)
}

Example (Compressed)
package main

import (
	"compress/zlib"
	"fmt"
	"io"
	"log"
	"os"

	"github.com/markdicksonjr/terf"
)

func main() {
	// Open TFRecord file
	in, err := os.Open("train-000")
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	// Create new zlib Reader
	zin, err := zlib.NewReader(in)
	if err != nil {
		log.Fatal(err)
	}
	defer zin.Close()

	r := terf.NewReader(zin)

	count := 0
	for {
		// example will be a TensorFlow Example proto
		example, err := r.Next()
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}

		// Do something with example

		id := terf.ExampleFeatureInt64(example, "image/id")
		labelID := terf.ExampleFeatureInt64(example, "image/class/label")
		labelText := string(terf.ExampleFeatureBytes(example, "image/class/text"))

		fmt.Printf("Image: %d Label: %s (%d)\n", id, labelText, labelID)
		count++
	}

	fmt.Printf("Total records: %d\n", count)
}

func NewReader

func NewReader(r io.Reader) *Reader

NewReader returns a new Reader

func (*Reader) Next

func (r *Reader) Next() (*protobuf.Example, error)

Next reads the next Example from the TFRecords input

type Writer

type Writer struct {
	// contains filtered or unexported fields
}

Writer implements a writer for TFRecords with Example protos

Example
package main

import (
	"log"
	"os"

	"github.com/markdicksonjr/terf"
)

func main() {
	// Open output file
	out, err := os.Create("train-001")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Create new terf Writer
	w := terf.NewWriter(out)

	// Read in image data from file
	reader, err := os.Open("image.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer reader.Close()

	// Create new terf Image with labels and source
	img, err := terf.NewImage(reader, 1, 12, 104, "Crystal", "image.jpg", 10)
	if err != nil {
		log.Fatal(err)
	}

	// Marshal image to Example proto
	example, err := img.MarshalExample()
	if err != nil {
		log.Fatal(err)
	}

	// Write Example proto
	err = w.Write(example)
	if err != nil {
		log.Fatal(err)
	}

	// Write any buffered data to the underlying writer
	w.Flush()
	if err := w.Error(); err != nil {
		log.Fatal(err)
	}
}

Example (Compressed)
package main

import (
	"compress/zlib"
	"log"
	"os"

	"github.com/markdicksonjr/terf"
)

func main() {
	// Open output file
	out, err := os.Create("train-001")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Create zlib writer
	zout := zlib.NewWriter(out)
	defer zout.Close()

	// Create new terf Writer
	w := terf.NewWriter(zout)

	// Read in image data from file
	reader, err := os.Open("image.jpg")
	if err != nil {
		log.Fatal(err)
	}
	defer reader.Close()

	// Create new terf Image with labels and source
	img, err := terf.NewImage(reader, 1, 12, 104, "Crystal", "image.jpg", 10)
	if err != nil {
		log.Fatal(err)
	}

	// Marshal image to Example proto
	example, err := img.MarshalExample()
	if err != nil {
		log.Fatal(err)
	}

	// Write Example proto
	err = w.Write(example)
	if err != nil {
		log.Fatal(err)
	}

	// Write any buffered data to the underlying writer
	w.Flush()
	if err := w.Error(); err != nil {
		log.Fatal(err)
	}
}

func NewWriter

func NewWriter(w io.Writer) *Writer

NewWriter returns a new Writer

func (*Writer) Error added in v0.0.2

func (w *Writer) Error() error

Error reports any error that has occurred during a previous Write or Flush.

func (*Writer) Flush added in v0.0.2

func (w *Writer) Flush()

Flush writes any buffered data to the underlying io.Writer. To check if an error occurred during the Flush, call Error.

func (*Writer) Write

func (w *Writer) Write(ex *protobuf.Example) error

Write writes the Example in TFRecords format
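
For readers unfamiliar with the on-disk layout, the TFRecords format frames each payload as an 8-byte little-endian length, a masked CRC-32C of those length bytes, the payload itself, and a masked CRC-32C of the payload. The sketch below illustrates that framing using only the standard library. It is not terf's implementation: `maskedCRC`, `writeRecord`, `readRecord`, and `roundTrip` are hypothetical names, and the payloads are arbitrary bytes rather than serialized Example protos.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"io"
	"log"
)

var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// maskedCRC returns the masked CRC-32C used by the TFRecords format.
func maskedCRC(b []byte) uint32 {
	c := crc32.Checksum(b, castagnoli)
	return ((c >> 15) | (c << 17)) + 0xa282ead8
}

// writeRecord frames payload as: length (uint64 LE), masked CRC of the
// length bytes, payload, masked CRC of the payload.
func writeRecord(w io.Writer, payload []byte) error {
	var hdr [12]byte
	binary.LittleEndian.PutUint64(hdr[:8], uint64(len(payload)))
	binary.LittleEndian.PutUint32(hdr[8:], maskedCRC(hdr[:8]))
	var ftr [4]byte
	binary.LittleEndian.PutUint32(ftr[:], maskedCRC(payload))
	for _, b := range [][]byte{hdr[:], payload, ftr[:]} {
		if _, err := w.Write(b); err != nil {
			return err
		}
	}
	return nil
}

// readRecord reads one framed record and verifies both checksums.
// It returns io.EOF when the input ends cleanly before a new record.
func readRecord(r io.Reader) ([]byte, error) {
	var hdr [12]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err // io.EOF only if no header bytes were read
	}
	if binary.LittleEndian.Uint32(hdr[8:]) != maskedCRC(hdr[:8]) {
		return nil, errors.New("length checksum mismatch")
	}
	n := binary.LittleEndian.Uint64(hdr[:8])
	buf := make([]byte, n+4)
	if _, err := io.ReadFull(r, buf); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // header present but body missing
		}
		return nil, err
	}
	payload := buf[:n]
	if binary.LittleEndian.Uint32(buf[n:]) != maskedCRC(payload) {
		return nil, errors.New("payload checksum mismatch")
	}
	return payload, nil
}

// roundTrip writes payloads to an in-memory buffer and reads them back.
func roundTrip(payloads [][]byte) ([][]byte, error) {
	var buf bytes.Buffer
	for _, p := range payloads {
		if err := writeRecord(&buf, p); err != nil {
			return nil, err
		}
	}
	var out [][]byte
	for {
		p, err := readRecord(&buf)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, p)
	}
}

func main() {
	recs, err := roundTrip([][]byte{[]byte("first"), []byte("second")})
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range recs {
		fmt.Printf("%d bytes: %s\n", len(r), r)
	}
}
```

The checksum on the length field lets a reader reject a corrupt header before allocating a buffer of the claimed size.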

Directories

Path Synopsis
cmd
Package terf is a generated protocol buffer package.
Package terf is a generated protocol buffer package.
