#+TITLE: Sftppush Event Pipeline
#+SETUPFILE: ~/s3olmax/org/conf/setup.config
#+FILETAGS: :sftppush:golang:go-channels:cobra:viper:
[[https://github.com/olmax99/sftppush/workflows/Go/badge.svg][https://github.com/olmax99/sftppush/workflows/Go/badge.svg]]
The *sftppush* is a mini pipeline for =file write-close event > decompress > s3 archive=.
It was originally intended to replace, e.g., low-compute serverless functions that
simply push files from an SFTP server into an S3 bucket location. Compared with
mounting the SFTP server's file system directly onto S3 via FUSE, this solution
seems better suited to production use cases.
* Prerequisites
- AWS cli ([[https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html][https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html]])
- Go 1.15.x
* Design
#+CAPTION: solution concept
#+attr_html: :width 100%
[[file:images/sftppush_solution_concept.png][file:./images/sftppush_solution_concept.png]]
#+CAPTION: concurrency
#+attr_html: :width 100%
[[file:images/sftppush_concurrency_design.png][file:./images/sftppush_concurrency_design.png]]
You will most likely want to run this project on an SFTP server that receives a
constant stream of data files.
The *sftppush* project is intended to run in a Linux (Ubuntu/Debian) VM. It
captures CLOSE_WRITE events for files on the file system, based on one or more
source directories.
The =watch --source= flag can take a single directory as well as a configuration
file containing multiple directories. With multiple target directories, a
separate =go watch process= is spawned for each of them.
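The one-watcher-per-directory idea can be sketched with goroutines and a shared channel. This is a minimal illustration, not the project's actual code: =Event=, =WatchFunc=, =watchAll= and =collect= are hypothetical names, and the fake watcher in =main= stands in for a real file system watcher.

#+BEGIN_SRC go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Event is a hypothetical stand-in for a close-write notification.
type Event struct{ Dir, File string }

// WatchFunc is a hypothetical per-directory watcher. A real one would block
// on file system events; it must return for the merged channel to close.
type WatchFunc func(dir string, out chan<- Event)

// watchAll starts one goroutine per directory and merges all events into a
// single channel, which is closed once every watcher has returned.
func watchAll(dirs []string, watch WatchFunc) <-chan Event {
	out := make(chan Event)
	var wg sync.WaitGroup
	for _, d := range dirs {
		wg.Add(1)
		go func(dir string) {
			defer wg.Done()
			watch(dir, out)
		}(d)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// collect drains the channel and returns the events as sorted "dir/file" strings.
func collect(events <-chan Event) []string {
	var got []string
	for ev := range events {
		got = append(got, ev.Dir+"/"+ev.File)
	}
	sort.Strings(got)
	return got
}

func main() {
	// Fake watcher: emits a single event per directory and returns.
	fake := func(dir string, out chan<- Event) {
		out <- Event{Dir: dir, File: "data.csv.gz"}
	}
	for _, s := range collect(watchAll([]string{"/device1/data", "/device2/data"}, fake)) {
		fmt.Println(s)
	}
}
#+END_SRC

Merging into one channel keeps the downstream stages independent of how many directories are watched.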
* Quick Launch
** 1. Install the sftppush tool locally
_NOTE:_ ONLY TESTED ON UBUNTU AND DEBIAN (this project relies on the Linux
CLOSE_WRITE event)
*** Ubuntu/Debian
#+begin_src bash
$ git clone https://github.com/olmax99/sftppush.git
$ cd sftppush
$ make build
$ ./bin/sftppush-0.1.0-linux_amd64 help
#+end_src
This creates a versioned binary in =./bin/=, e.g. =./bin/sftppush-0.2.2-linux_amd64=.
Use the file name produced by your build in the commands that follow.
** 2. Create a configuration file
_Recommended_: Create =config.yaml= in the project root and pass it with the =--config= (or =-c=) flag.
#+BEGIN_EXAMPLE
All source directories for fsnotify are determined by:
<defaults.userpath> + <watch.source.name> + <watch.source.paths>
#+END_EXAMPLE
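As a rough illustration of that rule, the sketch below expands a user path and a list of sources into the directories to watch. =Source= and =watchDirs= are hypothetical names, and =filepath.Join= is an assumption made for the example: the tool's actual joining logic is not shown in this README and may be stricter about separators.

#+BEGIN_SRC go
package main

import (
	"fmt"
	"path/filepath"
)

// Source mirrors one entry under watch.source (hypothetical type).
type Source struct {
	Name  string
	Paths []string
}

// watchDirs builds <userpath>/<name>/<path> for every path of every source.
// filepath.Join normalises duplicate or missing separators.
func watchDirs(userPath string, sources []Source) []string {
	var dirs []string
	for _, s := range sources {
		for _, p := range s.Paths {
			dirs = append(dirs, filepath.Join(userPath, s.Name, p))
		}
	}
	return dirs
}

func main() {
	sources := []Source{
		{Name: "sftpuser1", Paths: []string{"/device1/data", "/device2/data"}},
	}
	for _, d := range watchDirs("/home/my_test_dir/", sources) {
		fmt.Println(d)
	}
	// On Linux this prints:
	// /home/my_test_dir/sftpuser1/device1/data
	// /home/my_test_dir/sftpuser1/device2/data
}
#+END_SRC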
=./config.yaml=
#+BEGIN_SRC yaml
defaults:
  userpath:     # Set by default, can be overwritten here or with environment variable
  s3target: olmax-test-sftppush-126912
  awsprofile: ***
  awsregion: ***
# log:
#   level: info
#   location: "syslog" || <abs/path/to/logfile>
#   format: json
watch:
  source:
    - name: sftpuser1
      paths:
        - /path/to/source/directory1
        - /path/to/source/directory2
    # - name: sftpuser2
    #   paths:
    #     - path/to/source/directory1
    #   s3target: olmax-test-sftppush-126912
#+END_SRC
By default (without a =log:= section), =sftppush= tries to log to =~/.sftppush/sftppush.log=.
- If the directory does not exist, it logs to =Stderr= instead.
- Optionally, =syslog= can be used, but this requires =rsyslog= to be active.
- The log level defaults to =debug=, which adds overhead; consider =info= for regular operation.
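The file-or-=Stderr= fallback described above might look roughly like the following. This is a sketch under assumptions, not the project's logger setup: =logOutput= is a hypothetical helper, and the file is left open for the lifetime of the process.

#+BEGIN_SRC go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// logOutput returns a writer for log output: the file sftppush.log inside dir
// when dir exists and the file can be opened, otherwise os.Stderr.
func logOutput(dir string) io.Writer {
	info, err := os.Stat(dir)
	if err != nil || !info.IsDir() {
		return os.Stderr
	}
	name := filepath.Join(dir, "sftppush.log")
	f, err := os.OpenFile(name, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return os.Stderr
	}
	return f
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "."
	}
	// Writes to ~/.sftppush/sftppush.log if that directory exists, else to Stderr.
	w := logOutput(filepath.Join(home, ".sftppush"))
	fmt.Fprintln(w, "logger initialised")
}
#+END_SRC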
** 3. Run the event watcher on a single local directory
If a config file exists, there is no need to set the =--source= flags. Flags
override config file values.
Running it should be as simple as:
#+BEGIN_SRC bash
$ ./bin/sftppush-0.2.0-linux_amd64 --config config.yaml watch
# EXAMPLE 1: Run without config with two sources
$ SFTPPUSH_DEFAULTS_S3TARGET=*** SFTPPUSH_DEFAULTS_AWSPROFILE=*** \
./bin/sftppush-0.2.0-linux_amd64 watch \
--source="name=sftpuser1,paths=/device1/data /device2/data" \
--source="name=sftpuser2,paths=/device1/data /device2/data"
# EXAMPLE 2: Run with a custom User directory - needs trailing '/'
$ SFTPPUSH_DEFAULTS_USERPATH="/home/my_test_dir/" ./bin/sftppush-0.2.0-linux_amd64 -c config.yaml watch
#+END_SRC
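The examples above mix flags, =SFTPPUSH_*= environment variables and config file values. A minimal sketch of how such a lookup could be ordered follows, assuming the common precedence flag > environment > config file. =envKey= and =resolve= are hypothetical helpers, and the config value in =main= is made up for illustration; only the variable naming pattern is taken from the examples.

#+BEGIN_SRC go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envKey maps a config key such as "defaults.s3target" to the environment
// variable name used in the examples, e.g. SFTPPUSH_DEFAULTS_S3TARGET.
func envKey(key string) string {
	return "SFTPPUSH_" + strings.ToUpper(strings.ReplaceAll(key, ".", "_"))
}

// resolve returns the flag value if set, else the environment variable,
// else the config file value (assumed precedence).
func resolve(key, flagVal string, config map[string]string) string {
	if flagVal != "" {
		return flagVal
	}
	if v, ok := os.LookupEnv(envKey(key)); ok {
		return v
	}
	return config[key]
}

func main() {
	// Hypothetical config file content.
	config := map[string]string{"defaults.userpath": "/home/"}
	os.Setenv("SFTPPUSH_DEFAULTS_USERPATH", "/home/my_test_dir/")
	// The environment variable wins over the config value: /home/my_test_dir/
	fmt.Println(resolve("defaults.userpath", "", config))
}
#+END_SRC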
* Testing
Some tests require the OS file system. Optionally, set ~DOCKER=1~ to run the
tests inside a Docker container.
#+BEGIN_SRC
$ [DOCKER=1] make test
#+END_SRC
* General Instructions
** 1. Missing CLOSE_WRITE event
- CLOSE_WRITE events are not cross-platform and are currently only readily
  accessible on Linux file systems. Because of this restriction, the [[https://github.com/fsnotify/fsnotify][fsnotify]]
  maintainers have kept the [[https://github.com/fsnotify/fsnotify/pull/313/commits/f8197a386f88a88067d4cea863694c73ab561c55][respective PR]] in a pending state.
- Note that the =sftppush= project relies heavily on this feature.
- One workaround is to fork the original [[https://github.com/fsnotify/fsnotify][fsnotify]] and apply the
  changes made in the [[https://github.com/fsnotify/fsnotify/pull/313][respective PR]].
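For orientation, the underlying Linux event can be observed without fsnotify by using the inotify calls in Go's standard =syscall= package. The sketch below is Linux-only and is not how =sftppush= itself watches directories. =closeWriteNames= and =demo= are hypothetical helpers, and the parser assumes a little-endian machine such as amd64.

#+BEGIN_SRC go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io/ioutil" // kept for Go 1.15 compatibility
	"os"
	"path/filepath"
	"syscall"
)

// closeWriteNames reads one batch of inotify events from fd and returns the
// names of the files that reported IN_CLOSE_WRITE.
func closeWriteNames(fd int) ([]string, error) {
	buf := make([]byte, 4096)
	n, err := syscall.Read(fd, buf)
	if err != nil {
		return nil, err
	}
	var names []string
	for off := 0; off+syscall.SizeofInotifyEvent <= n; {
		// Event layout: wd(4) mask(4) cookie(4) len(4) name(len, NUL padded).
		mask := binary.LittleEndian.Uint32(buf[off+4 : off+8])
		nameLen := int(binary.LittleEndian.Uint32(buf[off+12 : off+16]))
		start := off + syscall.SizeofInotifyEvent
		name := bytes.TrimRight(buf[start:start+nameLen], "\x00")
		if mask&syscall.IN_CLOSE_WRITE != 0 {
			names = append(names, string(name))
		}
		off = start + nameLen
	}
	return names, nil
}

// demo watches a temporary directory, writes one file into it and returns
// the file names reported as closed after writing.
func demo() ([]string, error) {
	dir, err := ioutil.TempDir("", "closewrite")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	fd, err := syscall.InotifyInit()
	if err != nil {
		return nil, err
	}
	defer syscall.Close(fd)

	if _, err := syscall.InotifyAddWatch(fd, dir, syscall.IN_CLOSE_WRITE); err != nil {
		return nil, err
	}
	// WriteFile opens, writes and closes the file, which triggers the event.
	if err := ioutil.WriteFile(filepath.Join(dir, "upload.csv"), []byte("a,b\n"), 0o644); err != nil {
		return nil, err
	}
	return closeWriteNames(fd)
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, "inotify demo failed:", err)
		os.Exit(1)
	}
	fmt.Println(names)
}
#+END_SRC

The event fires only once the writer closes the file, which is why it suits uploads that arrive in several chunks.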
* Useful Commands
#+BEGIN_SRC
$ aws s3api --profile *** list-objects --bucket *** --query 'Contents[?contains(Key,``)].{Key: Key, Size: Size}' --output table | wc -l
# install golangci-lint
curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | sh -s -- -b $(go env GOPATH)/bin v1.31.0
#+END_SRC
* Logger Setup
Default:
- output file at =~/.sftppush/sftppush.log=
- log level: =debug=
- format: =text=