modware-import

module
v0.0.0-...-cfe9fea Latest
Published: Sep 13, 2025 License: BSD-2-Clause

CLI application suite for importing and managing dictyBase data, including feature annotations, gene data, and development tools.

Overview

This repository contains a comprehensive suite of CLI tools for managing dictyBase data operations:

  • Feature Annotation CLI: Modern gRPC-based tools for loading, updating, and managing feature annotations from various data sources
  • Import CLI: Legacy migration tools with S3 integration for importing dictyBase data
  • Kubernetes Deployment CLI: Tools for deploying and running import commands in Kubernetes clusters
  • Mock gRPC Server: Development and testing tool that provides a complete implementation of the FeatureAnnotationService

Components

Feature Annotation CLI

A production CLI application for importing and managing dictyBase feature annotation data. The CLI provides six main commands for different data operations:

  • load-feature-annotation - Load feature annotations from ArangoDB to gRPC service
  • load-csv-to-arangodb - Update ArangoDB collection from CSV file with batch processing
  • gene-updater - Update gene annotations by stripping HTML and using gRPC
  • gene-product-updater - Update gene products from legacy database
  • load-gene-product-from-csv - Load gene products from CSV files
  • load-synonyms - Load synonyms from ArangoDB to gRPC service

For detailed usage instructions, command options, and examples, see the Feature Annotation CLI Reference.
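
The batch processing mentioned for load-csv-to-arangodb can be illustrated with a minimal sketch. It is not the project's implementation: the readBatches helper, the column names, and the identifiers below are hypothetical, and the step that would write each batch to ArangoDB is left out. The sketch only shows the general idea of reading a CSV with a header row and grouping the rows into fixed-size batches.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// readBatches parses CSV text whose first row is a header and groups the
// data rows into batches of at most batchSize rows. Each row becomes a
// map from column name to value.
// Hypothetical helper for illustration; not part of modware-import.
func readBatches(data string, batchSize int) ([][]map[string]string, error) {
	if batchSize < 1 {
		return nil, fmt.Errorf("batch size must be positive, got %d", batchSize)
	}
	r := csv.NewReader(strings.NewReader(data))
	header, err := r.Read()
	if err != nil {
		return nil, fmt.Errorf("reading header: %w", err)
	}
	var batches [][]map[string]string
	var current []map[string]string
	for {
		// The reader rejects rows whose field count differs from the header.
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, fmt.Errorf("reading row: %w", err)
		}
		row := make(map[string]string, len(header))
		for i, name := range header {
			row[name] = rec[i]
		}
		current = append(current, row)
		if len(current) == batchSize {
			batches = append(batches, current)
			current = nil
		}
	}
	// Flush the final, possibly smaller, batch.
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches, nil
}

func main() {
	// Made-up sample data; the column names and IDs are placeholders.
	data := "id,name\nrow-1,actA\nrow-2,myoB\nrow-3,pakA\n"
	batches, err := readBatches(data, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, b := range batches {
		// A real loader would send each batch to the database here.
		fmt.Printf("batch %d: %d rows\n", i+1, len(b))
	}
}
```

Batching bounds the amount of data sent per database request, so a failure affects at most one batch rather than the whole file.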

Import CLI

A legacy command-line application for importing dictyBase data during migration. It covers several kinds of data and operations:

  • ArangoDB Management - Database schema and data management operations
  • Data File Processing - Handling and transformation of data files
  • Ontology Management - Import and management of ontological data
  • Stock Center Data - Loading of stock center information
  • UniProt ID Mapping - Protein identifier mapping operations

The tool supports S3 integration for file storage and can process CSV-formatted data from various sources.

For complete command reference and usage examples, see the Importer Documentation.

Kubernetes Deployment CLI

A command-line tool for deploying and running import commands in Kubernetes clusters. It provides:

  • Cluster Deployment - Deploy import operations as Kubernetes jobs
  • S3 Integration - Configure S3 access for input files and log storage
  • Namespace Management - Target specific Kubernetes namespaces
  • Log Management - Centralized logging with S3 backup

For deployment guides and configuration details, see the Kubernetes Deployment documentation.

Mock gRPC Server

A development and testing tool that implements a complete FeatureAnnotationService for integration testing of gRPC clients.

Features

  • Complete gRPC Implementation: All 8 FeatureAnnotationService methods (Create, Get, Update, Delete, AddTag, UpdateTag, RemoveTag, List operations)
  • Thread-Safe Storage: In-memory storage with indexes for efficient lookups
  • Mock Data: Pre-loaded with 5 realistic feature annotations (actA, myoB, pakA, rasG, discoidin1)
  • gRPC Reflection: Enabled for easy testing with grpcurl
  • Configurable: CLI options for port, logging, and TLS
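
The "thread-safe storage with indexes" idea can be sketched as follows. This is an assumption about the general shape, not the mock server's actual code: the annotation struct, the store type, its method names, and the "ann-N" identifiers are all hypothetical. The sketch guards a primary map and a secondary name index with one sync.RWMutex so the two stay consistent under concurrent access.

```go
package main

import (
	"fmt"
	"sync"
)

// annotation is a simplified, hypothetical record; the real
// FeatureAnnotation message carries more fields.
type annotation struct {
	ID   string
	Name string
}

// store keeps annotations in memory. byID is the primary map and
// byName is a secondary index from name to ID. One RWMutex guards both.
type store struct {
	mu     sync.RWMutex
	byID   map[string]annotation
	byName map[string]string
}

func newStore() *store {
	return &store{
		byID:   make(map[string]annotation),
		byName: make(map[string]string),
	}
}

// put inserts or replaces an annotation and keeps the name index in sync.
func (s *store) put(a annotation) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.byID[a.ID]; ok {
		delete(s.byName, old.Name) // drop the stale index entry
	}
	s.byID[a.ID] = a
	s.byName[a.Name] = a.ID
}

// getByName resolves a name through the index under a read lock.
func (s *store) getByName(name string) (annotation, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	id, ok := s.byName[name]
	if !ok {
		return annotation{}, false
	}
	a, ok := s.byID[id]
	return a, ok
}

// remove deletes an annotation and its index entry; it reports whether
// the ID existed.
func (s *store) remove(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	a, ok := s.byID[id]
	if !ok {
		return false
	}
	delete(s.byID, id)
	delete(s.byName, a.Name)
	return true
}

func main() {
	s := newStore()
	names := []string{"actA", "myoB", "pakA", "rasG", "discoidin1"}
	var wg sync.WaitGroup
	for i, n := range names {
		wg.Add(1)
		go func(i int, n string) { // concurrent writers are safe
			defer wg.Done()
			s.put(annotation{ID: fmt.Sprintf("ann-%d", i+1), Name: n})
		}(i, n)
	}
	wg.Wait()
	a, ok := s.getByName("rasG")
	fmt.Println(a.ID, ok)
}
```

A read-write mutex lets many lookups proceed in parallel while writes take exclusive access, which suits a test server that is read far more often than it is modified.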

Quick Start

# Build and run
cd cmd/mock-grpc-server
go build -o mock-grpc-server .
./mock-grpc-server --port 9000 --log-level info

# Test with grpcurl
grpcurl -plaintext localhost:9000 list
grpcurl -plaintext -d '{"id": "DDB_G0267398"}' \
  localhost:9000 dictybase.feature_annotation.FeatureAnnotationService/GetFeatureAnnotation

Testing

The server supports the following testing scenarios:

  • Service Discovery: grpcurl -plaintext localhost:9000 list
  • Data Retrieval: Pre-loaded annotations for immediate testing
  • CRUD Operations: Full create, read, update, delete functionality
  • Validation Testing: Email formats, DOI patterns, required fields
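
The kinds of checks listed under validation testing (email formats, DOI patterns, required fields) might look like the sketch below. Everything here is an assumption for illustration: the input struct, its field names, and the DOI regular expression are hypothetical and are not taken from the mock server, whose actual rules may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/mail"
	"regexp"
	"strings"
)

// doiPattern is a loose, commonly used shape for DOIs: "10.", a 4 to 9
// digit registrant code, "/", and a non-empty suffix without spaces.
// Assumption: the mock server may use a different pattern.
var doiPattern = regexp.MustCompile(`^10\.\d{4,9}/\S+$`)

// input is a hypothetical request shape with one field per check.
type input struct {
	ID        string // required
	CreatedBy string // must parse as an email address
	DOI       string // optional, validated only when present
}

// validate collects every failed check and returns them as one error,
// or nil when all checks pass.
func validate(in input) error {
	var errs []error
	if strings.TrimSpace(in.ID) == "" {
		errs = append(errs, errors.New("id is required"))
	}
	// mail.ParseAddress follows RFC 5322, so it also accepts the
	// "Name <user@host>" form.
	if _, err := mail.ParseAddress(in.CreatedBy); err != nil {
		errs = append(errs, fmt.Errorf("created_by: invalid email %q", in.CreatedBy))
	}
	if in.DOI != "" && !doiPattern.MatchString(in.DOI) {
		errs = append(errs, fmt.Errorf("doi: %q does not look like a DOI", in.DOI))
	}
	return errors.Join(errs...) // nil when errs is empty (Go 1.20+)
}

func main() {
	// Made-up values for demonstration only.
	good := input{ID: "ann-1", CreatedBy: "curator@example.org", DOI: "10.1000/xyz123"}
	bad := input{ID: "", CreatedBy: "not-an-email", DOI: "doi:abc"}
	fmt.Println("good:", validate(good))
	fmt.Println("bad:", validate(bad))
}
```

Returning all failures at once, rather than stopping at the first, gives a client every problem in a single response.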

Architecture

The server combines thread-safe in-memory storage with indexed lookups, input validation, and graceful shutdown handling.

Directories

cmd

  • baserow - command
  • import - command
  • k8s - command
  • loader - command
  • logto - command

internal

  • baserow/ontology - Package ontology provides functionality to interact with ontologies stored in a Baserow table.
  • cli
  • datasource/csv/stockcenter - Package stockcenter is the data source for stockcenter and related data.
  • datasource/tsv/stockcenter - Package stockcenter is the data source for stockcenter and related data.
  • datasource/xls/phenotype - Package phenotype defines the structure and associated methods for handling phenotype annotations.
  • datasource/xls/strain - Package strain provides structures and functions to work with biological strain annotations.
  • git
