lens

v0.1.0
Warning

This package is not in the latest version of its module.

Published: Dec 11, 2018 License: Apache-2.0 Imports: 18 Imported by: 1

README

Lens

Lens is an opt-in search engine and data collection tool that aids content discovery on the distributed web. Initially integrated with TEMPORAL, Lens will let users opt in to having the data they upload indexed and made searchable, and will award them RTC for participating in the data collection process. Users can then search by content "keywords", such as "document" or "api", and Lens will use each keyword to retrieve all matching content.

Searching through Lens will be facilitated through the TEMPORAL web interface. Optionally, we will offer a service independent of TEMPORAL to which users can submit content for indexing. This, however, is not compensated with RTC. To receive RTC, you must participate in Lens indexing through the TEMPORAL web interface.

Supported Formats

Only IPFS CIDs are supported, and they must be plaintext files. We attempt to determine the content type via MIME type sniffing, and use the result to decide whether we can analyze the content.

Please see the following table for the content types we can index. If a type is listed as <type>/*, any subtype of that MIME type is supported.

Mime Type        Support Level  Tested Types
text/*           Alpha          text/plain, text/html
image/*          Alpha          image/jpeg
application/pdf  Alpha          application/pdf
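
As a rough illustration of the sniff-then-validate idea, here is a minimal sketch using Go's standard `http.DetectContentType`. It is not taken from the Lens source: the `sniff` and `isSupported` helpers and the `supportedTypes` list are hypothetical names that only mirror the table above, and Lens itself may use a different sniffing library.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// supportedTypes mirrors the table above (hypothetical variable, not part of Lens).
// A trailing "/*" means any subtype of that MIME type is accepted.
var supportedTypes = []string{"text/*", "image/*", "application/pdf"}

// sniff guesses the MIME type from the leading bytes of data and strips
// parameters such as "; charset=utf-8".
func sniff(data []byte) string {
	contentType := http.DetectContentType(data)
	if i := strings.Index(contentType, ";"); i >= 0 {
		contentType = contentType[:i]
	}
	return strings.TrimSpace(contentType)
}

// isSupported reports whether mimeType matches an entry in supportedTypes.
func isSupported(mimeType string) bool {
	for _, pattern := range supportedTypes {
		if strings.HasSuffix(pattern, "/*") {
			// "text/*" becomes the prefix "text/".
			if strings.HasPrefix(mimeType, strings.TrimSuffix(pattern, "*")) {
				return true
			}
		} else if mimeType == pattern {
			return true
		}
	}
	return false
}

func main() {
	samples := [][]byte{
		[]byte("hello distributed web"),
		[]byte("%PDF-1.4 example"),
		[]byte("PK\x03\x04 zip archive"),
	}
	for _, s := range samples {
		t := sniff(s)
		fmt.Printf("%-20s supported=%v\n", t, isSupported(t))
	}
}
```

In practice only the first bytes of the content fetched from IPFS would need to be passed to `sniff`, since `http.DetectContentType` considers at most the first 512 bytes.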

Processing

We support two types of processing: index requests and search requests.

Indexing
  1. When receiving an index request, we check to make sure the object to be indexed is a supported data type.
  2. We then attempt to determine the mime type of whatever object is being indexed, and validate it to make sure it's a supported format.
  3. We then extract consumable data from the object through an xtractor service.
  4. After extracting usable data, we then send it to an analyzer service, which is responsible for analyzing content to create meta-data.
  5. After the meta-data is generated, we then pass it on to the core of the lens service.
  6. The lens service is responsible for creating lens objects, which are valid IPLD objects, and storing them within IPFS and within a local badgerds instance.

The following objects are created during an indexing request:

Keyword Object:

  • A keyword object contains all of the Lens Identifiers for content that can be searched for with this keyword

Object:

  • An object is content that was indexed, and includes a Lens Identifier for this content within the lens system (note, this is simply to enable easy lookup and is not valid outside of Lens)
  • Also included are all the keywords that can be used to search for this particular content
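
The relationship between the two object kinds can be sketched as follows. This is only an illustration under our own assumptions: the `Keyword`, `Object`, and `Index` types below are hypothetical in-memory stand-ins, the string IDs and hashes are placeholders, and the real service stores its objects as IPLD objects in IPFS and badgerds rather than in Go maps.

```go
package main

import "fmt"

// Object is an illustrative stand-in for indexed content: its Lens
// Identifier, the content that was indexed, and its searchable keywords.
type Object struct {
	LensID      string // placeholder; real Lens Identifiers are UUIDs
	ContentHash string
	Keywords    []string
}

// Keyword lists every Lens Identifier that can be found with one keyword.
type Keyword struct {
	Name    string
	LensIDs []string
}

// Index keeps both object kinds in memory (an assumption for this sketch).
type Index struct {
	Objects  map[string]*Object
	Keywords map[string]*Keyword
}

// NewIndex returns an empty Index.
func NewIndex() *Index {
	return &Index{
		Objects:  make(map[string]*Object),
		Keywords: make(map[string]*Keyword),
	}
}

// Add records obj and appends its Lens Identifier to the keyword object
// of each of its keywords, creating keyword objects as needed.
func (ix *Index) Add(obj *Object) {
	ix.Objects[obj.LensID] = obj
	for _, name := range obj.Keywords {
		kw, ok := ix.Keywords[name]
		if !ok {
			kw = &Keyword{Name: name}
			ix.Keywords[name] = kw
		}
		kw.LensIDs = append(kw.LensIDs, obj.LensID)
	}
}

func main() {
	ix := NewIndex()
	ix.Add(&Object{LensID: "id-1", ContentHash: "QmExampleHashA", Keywords: []string{"api", "document"}})
	ix.Add(&Object{LensID: "id-2", ContentHash: "QmExampleHashB", Keywords: []string{"api"}})
	fmt.Println(ix.Keywords["api"].LensIDs)
}
```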

For image indexing, we currently run the images against pre-trained InceptionV5 TensorFlow models. In the future, we will more than likely migrate to models we train ourselves, leveraging our extensive GPU computing infrastructure.

Searching

  1. When receiving a search request, we are simply provided with a list of keywords to search through.
  2. Using these keywords, we then search through badgerds to see if each keyword has been seen before. If it has, we pull the list of all lens identifiers that the keyword matches.
  3. After repeating step 2 for all keywords, we then search through badgerds to find the objects that the lens identifiers refer to.
  4. The user is then sent a list of the names (i.e., IPFS content hashes) of all matching objects.
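
The search steps above can be sketched over two plain maps. This is a simplified illustration, not the Lens implementation: the `search` function, the map-based stores standing in for badgerds, and the placeholder hashes are all hypothetical, and skipping identifiers that were already returned is our own assumption.

```go
package main

import "fmt"

// search walks the steps above. keywordIDs maps a keyword to the Lens
// Identifiers it matches; objects maps a Lens Identifier to a content hash.
// Both maps stand in for lookups against badgerds.
func search(keywords []string, keywordIDs map[string][]string, objects map[string]string) []string {
	seen := make(map[string]bool)
	var hashes []string
	for _, kw := range keywords {
		ids, ok := keywordIDs[kw]
		if !ok {
			continue // keyword has never been seen before
		}
		for _, id := range ids {
			if seen[id] {
				continue // assumption: report each object only once
			}
			seen[id] = true
			if hash, found := objects[id]; found {
				hashes = append(hashes, hash)
			}
		}
	}
	return hashes
}

func main() {
	keywordIDs := map[string][]string{
		"api":      {"id-1", "id-2"},
		"document": {"id-1"},
	}
	objects := map[string]string{
		"id-1": "QmExampleHashA",
		"id-2": "QmExampleHashB",
	}
	fmt.Println(search([]string{"document", "api", "unknown"}, keywordIDs, objects))
}
```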

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type APIOpts

type APIOpts struct {
	IP   string
	Port string
}

APIOpts defines options for the lens API

type ConfigOpts

type ConfigOpts struct {
	UseChainAlgorithm   bool
	DataStorePath       string
	ModelsPath          string
	TesseractConfigPath string
	API                 APIOpts
}

ConfigOpts are options used to configure the lens service
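
For illustration, a configuration might be populated as below. To keep the snippet self-contained, it redeclares the two structs locally instead of importing the package. Every value shown (paths, IP, port) is a hypothetical placeholder, and the `addr` helper is our own addition rather than part of the package.

```go
package main

import (
	"fmt"
	"net"
)

// APIOpts and ConfigOpts are local copies of the documented structs.
type APIOpts struct {
	IP   string
	Port string
}

type ConfigOpts struct {
	UseChainAlgorithm   bool
	DataStorePath       string
	ModelsPath          string
	TesseractConfigPath string
	API                 APIOpts
}

// addr joins the API IP and port into a listen address (hypothetical helper).
func addr(api APIOpts) string {
	return net.JoinHostPort(api.IP, api.Port)
}

func main() {
	// All values below are placeholders chosen for this example.
	opts := ConfigOpts{
		UseChainAlgorithm:   false, // semantics are not documented on this page
		DataStorePath:       "/data/lens/badgerds",
		ModelsPath:          "/data/lens/models",
		TesseractConfigPath: "/data/lens/tesseract",
		API:                 APIOpts{IP: "127.0.0.1", Port: "9998"},
	}
	fmt.Println("API address:", addr(opts.API))
	fmt.Println("datastore:", opts.DataStorePath)
}
```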

type Object

type Object struct {
	ContentHash string    `json:"lens_object_content_hash"`
	LensID      uuid.UUID `json:"lens_id"`
}

Object is the response from a successful Lens indexing operation
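
The struct tags determine how an Object appears when encoded as JSON. The sketch below shows the resulting field names. It redeclares the struct locally and, as an assumption made to avoid a third-party import, uses a plain string in place of uuid.UUID. The content hash and identifier are placeholders, and `encode` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Object is a local copy of the documented struct. LensID is a string
// here instead of uuid.UUID so the example needs only the standard library.
type Object struct {
	ContentHash string `json:"lens_object_content_hash"`
	LensID      string `json:"lens_id"`
}

// encode returns the JSON form of o (hypothetical helper).
func encode(o Object) (string, error) {
	b, err := json.Marshal(o)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(Object{
		ContentHash: "QmExampleHash", // placeholder, not a real CID
		LensID:      "00000000-0000-0000-0000-000000000000",
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```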

type Service

type Service struct {
	// contains filtered or unexported fields
}

Service contains the various components of Lens

func NewService

NewService is used to generate our Lens service

func (*Service) Get

func (s *Service) Get(keyname string) ([]byte, error)

Get is used to search for an object identifier by key name

func (*Service) KeywordSearch

func (s *Service) KeywordSearch(keywords []string) ([]models.Object, error)

KeywordSearch is used to search by keyword

func (*Service) Magnify

func (s *Service) Magnify(hash string, reindex bool) (metadata *models.MetaData, err error)

Magnify is used to examine a given content hash, determine if it's parsable, and return the summarized meta-data. Returned parameters are in the format of: meta-data, error

func (*Service) Store

func (s *Service) Store(name string, meta *models.MetaData) (*Object, error)

Store is used to store our collected meta data in a formatted object

func (*Service) Update

func (s *Service) Update(id uuid.UUID, name string, meta *models.MetaData) (*Object, error)

Update is used to update an object

Directories

Path Synopsis
analyzer
ocr
cmd
temporal-lens command
Code generated by counterfeiter.
Package text handles analyzing textual data for the purpose of meta data extraction
Package utils provides utility tools for lens
xtractor
planetary
Package planetary handles extraction of consumable data from IPLD objects
