Documentation ¶
Overview ¶
Package inceptionv3 provides a pre-trained InceptionV3 model, or simply its architecture.
This library creates the model architecture and optionally loads the pre-trained weights from Google. It can be used with or without the top layer.
Reference:
- Rethinking the Inception Architecture for Computer Vision (CVPR 2016), http://arxiv.org/abs/1512.00567
Based on the Keras implementation:
- Source: https://github.com/keras-team/keras/blob/v2.12.0/keras/applications/inception_v3.py
- Documentation: https://keras.io/api/applications/inceptionv3/
To use it, start with BuildGraph. If using the pre-trained weights, call DownloadAndUnpackWeights once -- it is a no-op if the weights have already been downloaded and unpacked.
If using it for transfer learning, be mindful that it uses batch normalization, which has its own considerations; see the discussion in https://pub.towardsai.net/batchnorm-for-transfer-learning-df17d2897db6 .
Example ¶
Transfer learning model example:
var (
	flagDataDir             = flag.String("data", "~/work/my_model", "Directory where to save and load model data.")
	flagInceptionPreTrained = flag.Bool("pretrained", true, "If using inception model, whether to use the pre-trained weights to transfer learn")
	flagInceptionFineTuning = flag.Bool("finetuning", true, "If using inception model, whether to fine-tune the inception model")
)

func ModelGraph(ctx *context.Context, spec any, inputs []*Node) []*Node {
	_ = spec // Not needed.
	image := inputs[0]

	// Adjust the image to the format expected by InceptionV3.
	channelsConfig := images.ChannelsLast
	image = inceptionv3.PreprocessImage(image, channelsConfig)
	image = inceptionv3.ScaleImageValuesTorch(image)

	// Only pass a weights directory if transfer-learning from the pre-trained model.
	var preTrainedPath string
	if *flagInceptionPreTrained {
		preTrainedPath = *flagDataDir
	}

	// InceptionV3 embedding, optionally fine-tuned.
	logits := inceptionv3.BuildGraph(ctx, image).
		PreTrained(preTrainedPath).
		SetPooling(inceptionv3.MaxPooling).
		Trainable(*flagInceptionFineTuning).Done()

	// Linear layer on top of the InceptionV3 embedding.
	logits = fnn.New(ctx, logits, 1).Done()
	return []*Node{logits}
}

func main() {
	…
	if *flagInceptionPreTrained {
		err := inceptionv3.DownloadAndUnpackWeights(*flagDataDir)
		AssertNoError(err)
	}
	…
}
Index ¶
- Constants
- func DownloadAndUnpackWeights(baseDir string) (err error)
- func KidMetric(dataDir string, kidImageSize int, maxImageValue float64, ...) metrics.Interface
- func PathToTensor(baseDir, tensorName string) string
- func PoolingStrings() []string
- func PreprocessImage(image *Node, maxValue float64, channelsConfig images.ChannelsAxisConfig) *Node
- type Config
- func (cfg *Config) BatchNormScale(value bool) *Config
- func (cfg *Config) ChannelsAxis(channelsAxisConfig images.ChannelsAxisConfig) *Config
- func (cfg *Config) ClassificationTop(useTop bool) *Config
- func (cfg *Config) Done() (output *Node)
- func (cfg *Config) PreTrained(baseDir string) *Config
- func (cfg *Config) SetPooling(pooling Pooling) *Config
- func (cfg *Config) Trainable(trainable bool) *Config
- func (cfg *Config) WithAliases(useAliases bool) *Config
- type KidBuilder
- type Pooling
Constants ¶
const (
	// WeightsURL is the URL for the whole model, including the top layer, a 1000-classes linear layer on top.
	WeightsURL = "https://storage.googleapis.com/tensorflow/keras-applications/inception_v3/inception_v3_weights_tf_dim_ordering_tf_kernels.h5"

	// WeightsH5Checksum is the SHA256 checksum of the weights file.
	WeightsH5Checksum = "00c9ea4e4762f716ac4d300d6d9c2935639cc5e4d139b5790d765dcbeea539d0"

	// WeightsH5Name is the name of the local ".h5" file with the weights.
	WeightsH5Name = "weights.h5"

	// UnpackedWeightsName is the name of the subdirectory that will hold the unpacked weights.
	UnpackedWeightsName = "gomlx_weights"
)
const BuildScope = "InceptionV3"
BuildScope is used by BuildGraph as a new sub-scope for the InceptionV3 layers.
const ClassificationImageSize = 299
ClassificationImageSize is the required image size when using the InceptionV3 model for classification (with the top layer): the image should be 299 x 299.
const EmbeddingSize = 2048
EmbeddingSize is the size of the output embedding (when not using the top layer).
const MinimumImageSize = 75
MinimumImageSize is the minimum width and height required for the input images.
const NumberOfClasses = 1000
NumberOfClasses when using the top layer.
Variables ¶
This section is empty.
Functions ¶
func DownloadAndUnpackWeights ¶
func DownloadAndUnpackWeights(baseDir string) (err error)

DownloadAndUnpackWeights downloads and unpacks the pre-trained weights to the given baseDir. It only does the work if the files are not there yet (downloaded and unpacked).
It is verbose and uses a progressbar if downloading/unpacking. It is quiet if there is nothing to do, that is, if the files are already there.
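A minimal sketch of calling it once at program start-up; the directory value and the error handling are illustrative:

baseDir := "~/work/my_model" // Illustrative; use the same directory later in Config.PreTrained.
if err := inceptionv3.DownloadAndUnpackWeights(baseDir); err != nil {
	log.Fatalf("Failed to download InceptionV3 weights: %v", err)
}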
func KidMetric ¶ added in v0.4.0
func KidMetric(dataDir string, kidImageSize int, maxImageValue float64, channelsConfig images.ChannelsAxisConfig) metrics.Interface
KidMetric returns a metric that takes a generated image and a label image and returns a measure of similarity.
[Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) was proposed as a replacement for the popular [Frechet Inception Distance (FID) metric](https://arxiv.org/abs/1706.08500) for measuring image generation quality. Both metrics measure the difference in the generated and training distributions in the representation space of an InceptionV3 network pretrained on ImageNet.
The implementation is based on the Keras one, described in https://keras.io/examples/generative/ddim/
To directly calculate KID, as opposed to using it as a metric, see NewKidBuilder below.
Parameters:
- `dataDir`: directory where to download and unpack the InceptionV3 weights. They are reused from there in subsequent calls.
- `kidImageSize`: resize input images (labels and predictions) to `kidImageSize x kidImageSize` before running the KID metric calculation. It should be between 75 and 299. Smaller values make the metric faster.
- `maxImageValue`: maximum value the images can take at any channel -- if set to 0 it doesn't rescale the pixel values, and the images are expected to have values between -1.0 and 1.0. Passed to the `PreprocessImage` function.
- `channelsConfig`: informs what is the channels axis, commonly set to `images.ChannelsLast`. Passed to `PreprocessImage` function.
Note: `images` refers to package `github.com/gomlx/gomlx/types/tensor/image`.
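For illustration, a hedged sketch of constructing the metric; the parameter values are arbitrary examples, and how the resulting metrics.Interface is plugged into an evaluation loop depends on your training setup:

// Resize images to 75x75 (faster) and expect pixel values in [0, 255].
kidMetric := inceptionv3.KidMetric(*flagDataDir, 75, 255.0, images.ChannelsLast)
// kidMetric implements metrics.Interface and can be handed to the
// evaluation/training loop alongside other metrics.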
func PathToTensor ¶
func PathToTensor(baseDir, tensorName string) string

PathToTensor returns the path to tensorName (its name within the h5 file).
func PoolingStrings ¶ added in v0.17.1
func PoolingStrings() []string
PoolingStrings returns a slice of all String values of the enum
func PreprocessImage ¶
func PreprocessImage(image *Node, maxValue float64, channelsConfig images.ChannelsAxisConfig) *Node
PreprocessImage converts the image to a format usable by the InceptionV3 model.
It performs 3 tasks:
- Scales the values to the range -1.0 to 1.0: this is how the model was originally trained. It requires `maxValue` to be carefully set to the maximum value of the images -- it is assumed the image values are scaled from 0 to `maxValue`. Set `maxValue` to zero to skip this step.
- It removes the alpha channel, in case it is provided.
- The minimum image size accepted by InceptionV3 is 75x75: if either dimension is smaller than that, the image is resized accordingly, preserving the aspect ratio.
Input `image` must have a batch dimension (rank=4), have either 3 or 4 channels, and its values must be scaled from 0 to maxValue (unless maxValue is set to 0, in which case no rescaling is applied).
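A short sketch, assuming a batch of images with values in the range 0 to 255:

// image: shape [batch, height, width, channels], values in [0, 255].
image = inceptionv3.PreprocessImage(image, 255.0, images.ChannelsLast)
// image now has 3 channels, values in [-1.0, 1.0], and is at least 75x75.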
Types ¶
type Config ¶
type Config struct {
// contains filtered or unexported fields
}
Config for instantiating an InceptionV3 model. After the configuration is set, call Done, and it will build the InceptionV3 graph with the loaded variables.
See BuildGraph to construct a Config object and for a usage example.
func BuildGraph ¶
BuildGraph for InceptionV3 model.
For a model with pre-trained weights, call Config.PreTrained.
It returns a Config object that can be further configured. Once the configuration is finished, call `Done` and it will return the embedding (or classification) of the given image.
See example in the package inceptionv3 documentation.
Parameters:
- ctx: context.Context where variables are created and loaded. Variables will be re-used if they were already created before in the current scope. That means one can call BuildGraph more than once and have the same model be used for more than one input -- for instance, for 2-tower models (see the sketch after this section). To instantiate more than one model with different weights, just use the context in a different scope.
- image: image tensor (`*Node`) on which to apply the model. There must be 3 channels, and they must be scaled from -1.0 to 1.0 -- see PreprocessImage to scale the image accordingly if needed. If using ClassificationTop(true), the images must be of size 299x299 (defined as the constant `ClassificationImageSize`). Otherwise the minimum image size is 75x75.
The original model has weights in `dtypes.Float32`. (TODO: If the image has a different `DType`, it will try to convert the weights and work the model fully on the image's `DType`. This hasn't been extensively tested, so no guarantees of quality.)
The implementation follows closely the definition in https://github.com/keras-team/keras/blob/v2.12.0/keras/applications/inception_v3.py
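For instance, a hedged sketch of a 2-tower model that shares the same weights for two inputs -- the image node names are illustrative, and ctx.In is assumed to be the context's sub-scope accessor:

// Both calls are made in the same context scope, so the InceptionV3
// variables are created once and shared between the two towers.
embedA := inceptionv3.BuildGraph(ctx, imageA).
	PreTrained(*flagDataDir).SetPooling(inceptionv3.MeanPooling).Done()
embedB := inceptionv3.BuildGraph(ctx, imageB).
	PreTrained(*flagDataDir).SetPooling(inceptionv3.MeanPooling).Done()

// To instantiate a second model with independent weights instead, build it
// under a different context scope, e.g. ctx.In("tower_b").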
func (*Config) BatchNormScale ¶ added in v0.3.1
func (cfg *Config) BatchNormScale(value bool) *Config

BatchNormScale sets whether to add a scaling variable to the BatchNorm layers. It defaults to false. If set to true, the scale is initialized to 1.0, so it has no impact unless fine-tuned.

The original model doesn't use it, but it may be handy if training from scratch.
func (*Config) ChannelsAxis ¶
func (cfg *Config) ChannelsAxis(channelsAxisConfig images.ChannelsAxisConfig) *Config
ChannelsAxis configures the axis for the channels (aka. "depth" or "features") dimension. The default is `images.ChannelsLast`, meaning the "channels" dimension comes last.
Note: `images` refers to package `github.com/gomlx/gomlx/types/tensor/image`.
It returns the modified Config object, so calls can be cascaded.
func (*Config) ClassificationTop ¶
func (cfg *Config) ClassificationTop(useTop bool) *Config

ClassificationTop configures whether to include the classification layer at the very top of the model.

Typically, if using only the embeddings, set this to false. If actually classifying images, set this to true: it includes a last linear layer and returns the logits for each of the 1000 classes the model was trained on.
This is only useful if PreTrained weights are configured.
It returns the modified Config object, so calls can be cascaded.
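A hedged sketch of classifying with the pre-trained top; the input is assumed to be already preprocessed and of size 299x299:

// logits has shape [batch, NumberOfClasses] (1000 classes).
logits := inceptionv3.BuildGraph(ctx, image).
	PreTrained(*flagDataDir).
	ClassificationTop(true).
	Done()
// Take an arg-max over the last axis to get the predicted class per example.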
func (*Config) Done ¶
func (cfg *Config) Done() (output *Node)
Done builds the graph based on the configuration set.
func (*Config) PreTrained ¶
func (cfg *Config) PreTrained(baseDir string) *Config

PreTrained configures the graph to load the pre-trained weights. It takes as an argument `baseDir`, the directory where the weights have been downloaded with DownloadAndUnpackWeights -- use the same value used there.

An empty value ("") indicates not to use the pre-trained weights, in which case an untrained InceptionV3 graph is built. This is the default.
It returns the modified Config object, so calls can be cascaded.
func (*Config) SetPooling ¶
func (cfg *Config) SetPooling(pooling Pooling) *Config

SetPooling configures the pooling applied at the very top of the model.

If set to NoPooling, the default, it returns a 4D tensor with 2048 channels (see ChannelsAxis for the order of the axes). If set to MaxPooling or MeanPooling, it pools over the last spatial dimensions, using Max or Mean respectively.
This is only used if not using ClassificationTop.
It returns the modified Config object, so calls can be cascaded.
func (*Config) Trainable ¶
func (cfg *Config) Trainable(trainable bool) *Config

Trainable configures whether the variables created will be set as trainable or not -- see `context.Variable`.

If using the pre-trained weights as frozen values, set this to false -- and consider using `StopGradient()` on the value returned by Done, to prevent any gradients from propagating into the model. It's an error to set this to false if not using pre-trained weights (see PreTrained). The default is true, which allows fine-tuning of the InceptionV3 model.

Notice that `Trainable(false)` will also mark the batch normalization layers for inference only.
It returns the modified Config object, so calls can be cascaded.
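A hedged sketch of the frozen feature-extractor pattern described above; `numClasses` is a hypothetical placeholder, and StopGradient is the graph-package op mentioned above (here assumed to be dot-imported, as in the package example):

// Frozen InceptionV3: variables are not trainable and gradients do not
// propagate back into the model.
embedding := inceptionv3.BuildGraph(ctx, image).
	PreTrained(*flagDataDir).
	SetPooling(inceptionv3.MaxPooling).
	Trainable(false).
	Done()
embedding = StopGradient(embedding)
// Only this classification head is trained.
logits := fnn.New(ctx, embedding, numClasses).Done()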
func (*Config) WithAliases ¶ added in v0.17.0
func (cfg *Config) WithAliases(useAliases bool) *Config

WithAliases will create aliases to the output of each layer.

This facilitates capturing and manipulating those outputs for any purpose, for instance for "style transfer" (https://arxiv.org/abs/1508.06576), where losses are attached to various layers.
See more about graph nodes aliasing in Node.WithAlias, Graph.PushAliasScope, Graph.PopAliasScope and Graph.IterAliasedNodes.
Notice that if you call the model more than once -- on different inputs -- you will need to change the current scope with Graph.PushAliasScope before using the Inception model, so it doesn't create duplicate aliases.
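A hedged sketch of capturing per-layer outputs; the Node.Graph() accessor and a string argument to Graph.PushAliasScope are assumptions about the graph API mentioned above:

g := image.Graph()
g.PushAliasScope("inception_a") // Avoids duplicate aliases if the model is called again.
embedding := inceptionv3.BuildGraph(ctx, image).
	PreTrained(*flagDataDir).
	WithAliases(true).
	Done()
g.PopAliasScope()
// The outputs of each InceptionV3 layer can now be retrieved through the
// graph's alias API (see Graph.IterAliasedNodes), e.g. to attach style-transfer losses.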
type KidBuilder ¶ added in v0.4.0
type KidBuilder struct {
// contains filtered or unexported fields
}
KidBuilder builds the graph to calculate [Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) between two sets of images. See details in KidMetric.
func NewKidBuilder ¶ added in v0.4.0
func NewKidBuilder(dataDir string, kidImageSize int, maxImageValue float64, channelsConfig images.ChannelsAxisConfig) *KidBuilder
NewKidBuilder configures a KidBuilder.
KidBuilder builds the graph to calculate [Kernel Inception Distance (KID)](https://arxiv.org/abs/1801.01401) between `labels` and `predictions` batches of images. The metric is normalized by the `labels` images, so it's not symmetric.
See details in KidMetric.
- `dataDir`: directory where to download and unpack the InceptionV3 weights. They are reused from there in subsequent calls.
- `kidImageSize`: resize input images (labels and predictions) to `kidImageSize x kidImageSize` before running the KID metric calculation. It should be between 75 and 299. Smaller values make the metric faster.
- `maxImageValue`: maximum value the images can take at any channel -- if set to 0 it doesn't rescale the pixel values, and the images are expected to have values between -1.0 and 1.0. Passed to the `PreprocessImage` function.
- `channelsConfig`: informs what is the channels axis, commonly set to `images.ChannelsLast`. Passed to `PreprocessImage` function.
Note: `images` refers to package `github.com/gomlx/gomlx/types/tensor/image`.
func (*KidBuilder) BuildGraph ¶ added in v0.4.0
func (builder *KidBuilder) BuildGraph(ctx *context.Context, labels, predictions []*Node) (output *Node)
BuildGraph returns the mean KID score of two batches, see KidMetric.
It returns a scalar with the mean distance between the images provided in labels and predictions.
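A hedged sketch of computing KID directly; `realImages` and `generatedImages` are hypothetical image batch nodes, and the parameter values are illustrative:

builder := inceptionv3.NewKidBuilder(*flagDataDir, 75, 255.0, images.ChannelsLast)
// kid is a scalar *Node: the mean KID of the generated images against the real ones.
kid := builder.BuildGraph(ctx, []*Node{realImages}, []*Node{generatedImages})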
type Pooling ¶
type Pooling int
Pooling to be used at the top of the model.
func PoolingString ¶ added in v0.17.1
PoolingString retrieves an enum value from the enum constants' string name. It returns an error if the string is not part of the enum.
func PoolingValues ¶ added in v0.17.1
func PoolingValues() []Pooling
PoolingValues returns all values of the enum
func (Pooling) IsAPooling ¶ added in v0.17.1
IsAPooling returns "true" if the value is listed in the enum definition. "false" otherwise
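A small hedged sketch of round-tripping the enum through its string names -- the (Pooling, error) return of PoolingString is an assumption based on the description above:

for _, name := range inceptionv3.PoolingStrings() {
	p, err := inceptionv3.PoolingString(name)
	if err != nil {
		panic(err) // Should not happen for names coming from PoolingStrings.
	}
	fmt.Println(name, p.IsAPooling()) // IsAPooling is true for every listed value.
}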