GCS Coherency Validation Tool - User Manual
This tool facilitates comprehensive validation of gcsfuse consistency,
coherency, and caching behaviors. It supports various testing workflows,
including single-node testing and a distributed dual-node testing workflow.
Table of Contents
- Architecture & Buckets
- Quick Start: Infrastructure Provisioning
- Prerequisites & Setup
- Workflows Overview
- Getting Started
- Workflow 1: dual_node_mounts
- Workflow 2: single_node_dual_mounts
- Workflow 3: single_node_single_mount
- Scenario Management & Aliases
- File System Operations Reference
- Go Tools Reference
- Asynchronous & Interactive Operations
- Configuration & Environment Control
- Logging & Debugging
- Troubleshooting
Architecture & Buckets
This tool requires two distinct GCS buckets:
1. Shared Code & State Bucket (SHARED_BUCKET)
- Purpose: Stores the tool's source code, shared configuration files, and the logs for dual-node tests.
- Mount Point: Must be mounted at `$HOME/work/shared` on all participating VMs.
- Permissions: All test VMs need Read/Write access.
- Example Name: `<user>-coherency-work-shared-bucket-<region>` (e.g., `gargnitin-coherency-work-shared-bucket-asiase1`).
2. Test Target Bucket (TEST_BUCKET)
- Purpose: The actual bucket being tested for consistency. The tool will automatically mount and unmount this bucket during tests.
- Mount Point: The tool creates mount points at `$HOME/work/test_buckets/<bucket>-mountX`.
- Configuration: You must update the tool's config files with this bucket's name.
- Example Name: `<user>-test-hns-<region>` (e.g., `gargnitin-test-hns-asiase1`).
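Taken together, the two buckets map onto a small local directory layout. As a summary sketch (the setup steps below create and mount these for real):

```shell
# Sketch of the local workspace layout this manual assumes.
mkdir -p "$HOME/work/shared"         # SHARED_BUCKET mount point
mkdir -p "$HOME/work/test_buckets"   # TEST_BUCKET mount points (<bucket>-mountX)
mkdir -p "$HOME/work/tasks"          # local logs/state for single-node workflows
ls "$HOME/work"
```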
Quick Start: Infrastructure Provisioning
If you do not have buckets or VMs yet, use these commands (requires gcloud).
0. Variables Setup (Run on ALL terminals)
Define these variables once to make the following commands copy-pasteable.
Important: These variables are session-specific. It is highly recommended to
store them in a setup script (e.g., ~/.bashrc or a custom setup.sh) and
source it for persistence across VM disconnections or new terminal/SSH
sessions.
# Common Configuration
export REGION="us-west4" # Example: us-west4, asia-southeast1
export ZONE_1="${REGION}-a"
export ZONE_2="${REGION}-b"
export PROJECT_ID=$(gcloud config get-value project)
export USER_PREFIX="yourname" # Your username or unique prefix
# Bucket Names
export SHARED_BUCKET="${USER_PREFIX}-coherency-shared-${REGION}"
export TEST_BUCKET="${USER_PREFIX}-test-hns-${REGION}"
# VM Names
export VM1_NAME="${USER_PREFIX}-vm1-leader-${REGION}"
export VM2_NAME="${USER_PREFIX}-vm2-follower-${REGION}"
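One way to persist these variables across SSH sessions, as recommended above, is a small setup file sourced from `~/.bashrc`. This is only a sketch: the filename `setup_env.sh` and the example values are placeholders, not part of the tool.

```shell
# Write the session variables to a setup file once, then source it from
# ~/.bashrc so they survive disconnects. Path and values are placeholders.
cat > "$HOME/setup_env.sh" <<'EOF'
export REGION="us-west4"
export USER_PREFIX="yourname"
export SHARED_BUCKET="${USER_PREFIX}-coherency-shared-${REGION}"
export TEST_BUCKET="${USER_PREFIX}-test-hns-${REGION}"
EOF
# Append the source line only once (idempotent)
grep -q 'setup_env.sh' "$HOME/.bashrc" 2>/dev/null || echo '. "$HOME/setup_env.sh"' >> "$HOME/.bashrc"
. "$HOME/setup_env.sh"
echo "$SHARED_BUCKET"
```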
1. Create Buckets
Buckets are created with Hierarchical Namespace and Uniform Bucket-Level Access enabled.
# Shared Bucket (Infrastructure)
gcloud storage buckets create gs://${SHARED_BUCKET} \
--location=${REGION} \
--enable-hierarchical-namespace \
--uniform-bucket-level-access
# Test Target Bucket (The one under test)
gcloud storage buckets create gs://${TEST_BUCKET} \
--location=${REGION} \
--enable-hierarchical-namespace \
--uniform-bucket-level-access
2. Create VMs
Create two VMs (Leader/VM1 and Follower/VM2). Specs: Ubuntu 25.04, 40GB Boot Disk, Access: Full Cloud Platform scope (required for GCS Fuse and management).
# VM1 (Leader)
gcloud compute instances create ${VM1_NAME} \
--zone=${ZONE_1} \
--machine-type=e2-standard-8 \
--image-family=ubuntu-2504-amd64 --image-project=ubuntu-os-cloud \
--boot-disk-size=40GB \
--scopes=https://www.googleapis.com/auth/cloud-platform
# VM2 (Follower)
gcloud compute instances create ${VM2_NAME} \
--zone=${ZONE_2} \
--machine-type=e2-standard-8 \
--image-family=ubuntu-2504-amd64 --image-project=ubuntu-os-cloud \
--boot-disk-size=40GB \
--scopes=https://www.googleapis.com/auth/cloud-platform
Prerequisites & Setup
1. VM Provisioning
- Single Node Workflows: Require 1 VM.
- Dual Node Workflow: Requires 2 VMs (referred to as VM1/Leader and VM2/Follower).
- OS: Linux (Ubuntu/Debian recommended).
2. Software Installation (On All VMs)
You must install Go, GCS Fuse, and Python on all VMs involved in the testing.
a. Install Go (Version 1.24.10)
The tool uses Go for direct I/O operations (write.go, read.go).
# Remove any existing go installation
sudo rm -rf /usr/local/go
# Download and install go (Adjust OS/Arch if not linux-amd64)
wget https://go.dev/dl/go1.24.10.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.24.10.linux-amd64.tar.gz
# Add go binary to PATH (Add this to your ~/.bashrc for persistence)
echo "export PATH=\$PATH:/usr/local/go/bin" >> ~/.bashrc
source ~/.bashrc
# Verify
go version
b. Install GCS Fuse (Latest)
Follow the official GCS Fuse Installation Guide.
Option 1: Standard Installation (Ubuntu 24.04 and older)
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb https://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install -y gcsfuse
Option 2: Modern/Ubuntu 25.04+ (Keyring Method)
Use this if you encounter "NO_PUBKEY" errors or are running Ubuntu 25.04+.
# 1. Add the public key to the system keyring (Dearmor ensures binary format)
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /usr/share/keyrings/gcsfuse-keyring.gpg
# 2. Add the repo (forcing 'noble' codename for stability on newer releases)
echo "deb [signed-by=/usr/share/keyrings/gcsfuse-keyring.gpg] https://packages.cloud.google.com/apt gcsfuse-noble main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
# 3. Update and install
sudo apt-get update
sudo apt-get install -y gcsfuse
c. Install Python System Dependencies
The tool requires Python 3. Install the system-level dependencies first.
# 1. Install System Python Dependencies
sudo apt-get update
sudo apt-get install -y python3 python3-pip python3-venv git
3. Setup the Shared Bucket (One-time Setup)
You need to populate your ${SHARED_BUCKET} with the tool code.
On one VM (e.g., VM1):
1. Mount the empty shared bucket:

mkdir -p $HOME/work/shared
# Safely unmount if already mounted (Idempotent)
(fusermount -uz $HOME/work/shared || true)
# Mount the shared bucket
gcsfuse --implicit-dirs ${SHARED_BUCKET} $HOME/work/shared

2. Download and install the tool into the bucket: Run the following block to clone/update the repo and copy the validation tools into the mounted bucket.

cd /tmp
if [ -d "gcsfuse-tools" ]; then
  echo "Updating existing repo..."
  cd gcsfuse-tools
  git pull origin main
  cd ..
else
  echo "Cloning repo..."
  git clone https://github.com/GoogleCloudPlatform/gcsfuse-tools.git
fi
# Copy the python tools to the shared mount
if [ ! -d "$HOME/work/shared/coherency-validation/python" ]; then
  mkdir -p $HOME/work/shared/coherency-validation/python
fi
# Copy contents (idempotent update)
cp -rf gcsfuse-tools/coherency-validation/python/* $HOME/work/shared/coherency-validation/python/
echo "Tool code deployed/updated in shared bucket."
On the other VM (VM2): Simply mount the bucket to access the code deployed by VM1.
mkdir -p $HOME/work/shared
(fusermount -uz $HOME/work/shared || true)
gcsfuse --implicit-dirs ${SHARED_BUCKET} $HOME/work/shared
4. Setup Python Virtual Environment (On All VMs)
Now that the code is present in the shared mount, set up the virtual environment.
# 1. Navigate to the tool directory in the shared mount
cd $HOME/work/shared/coherency-validation/python
# 2. Setup the Virtual Environment
bash setup_venv.sh
# 3. Activate the Environment (Required before running tests)
source ~/.cache/coherency-validation/.venv/bin/activate
5. Configure Hostnames (For Dual Node)
If running the dual_node_mounts workflow, the tool needs to know which VM is
"Mount 1" and which is "Mount 2".
Run on VM1 (Leader) ONLY:
# Get the internal hostnames (what 'socket.gethostname()' sees)
# We assume VM1 is running this command.
VM1_HOSTNAME=$(hostname)
# You must manually set VM2's hostname if you are not running this via gcloud ssh
# or just look it up. For automation, if you know the naming convention:
# VM2_HOSTNAME="${VM2_NAME}.${ZONE_2}.c.${PROJECT_ID}.internal"
# Ideally, verify by running `hostname` on VM2.
echo "Configuring VM1: $VM1_HOSTNAME"
echo "Configuring VM2: $VM2_NAME (You might need the full FQDN)"
# Update VM1 (Leader) - Regex matches any existing hostname string
sed -i "s/if \".*\" in HOSTNAME:/if \"${VM1_NAME}\" in HOSTNAME:/" $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py
# Verify
grep -q "if \"${VM1_NAME}\" in HOSTNAME:" $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py || echo "Error: Failed to configure VM1 hostname. Please manually configure ${VM1_NAME} in $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py for MOUNT_NUMBER 1"
# Update VM2 (Follower)
sed -i "s/elif \".*\" in HOSTNAME:/elif \"${VM2_NAME}\" in HOSTNAME:/" $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py
# Verify
grep -q "elif \"${VM2_NAME}\" in HOSTNAME:" $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py || echo "Error: Failed to configure VM2 hostname. Please manually configure ${VM2_NAME} in $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py for MOUNT_NUMBER 2"
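If you want to see what the substitution does before touching the real config, here is the same regex applied to a throwaway file (the VM name `my-vm1` is a placeholder):

```shell
# Dry-run illustration of the hostname substitution on a sample config line.
cfg=$(mktemp)
printf 'if "some-old-vm" in HOSTNAME:\n' > "$cfg"
sed -i 's/if ".*" in HOSTNAME:/if "my-vm1" in HOSTNAME:/' "$cfg"
result=$(cat "$cfg")
echo "$result"
rm -f "$cfg"
```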
6. Create Workspace Directories
Create the following directories on all VMs (local state):
mkdir -p $HOME/work/tasks
mkdir -p $HOME/work/test_buckets
7. Configure the Test Bucket Name
You must tell the tool which bucket to use for the actual testing.
Run on VM1 (Leader) ONLY:
# Update Dual Node Config
sed -i "s/BUCKET_NAME *= *\".*\"/BUCKET_NAME = \"${TEST_BUCKET}\"/" $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py
grep -q "BUCKET_NAME *= *\"${TEST_BUCKET}\"" $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py || echo "Error: Failed to set bucket in dual_node_mounts. Please manually configure ${TEST_BUCKET} as BUCKET_NAME in $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py."
# Update Single Node Single Mount Config
sed -i "s/BUCKET_NAME *= *\".*\"/BUCKET_NAME = \"${TEST_BUCKET}\"/" $HOME/work/shared/coherency-validation/python/single_node_single_mount/config.py
grep -q "BUCKET_NAME *= *\"${TEST_BUCKET}\"" $HOME/work/shared/coherency-validation/python/single_node_single_mount/config.py || echo "Error: Failed to set bucket in single_node_single_mount. Please manually configure ${TEST_BUCKET} as BUCKET_NAME in $HOME/work/shared/coherency-validation/python/single_node_single_mount/config.py."
# Update Single Node Dual Mounts Config
sed -i "s/BUCKET_NAME *= *\".*\"/BUCKET_NAME = \"${TEST_BUCKET}\"/" $HOME/work/shared/coherency-validation/python/single_node_dual_mounts/config.py
grep -q "BUCKET_NAME *= *\"${TEST_BUCKET}\"" $HOME/work/shared/coherency-validation/python/single_node_dual_mounts/config.py || echo "Error: Failed to set bucket in single_node_dual_mounts. Please manually configure ${TEST_BUCKET} as BUCKET_NAME in $HOME/work/shared/coherency-validation/python/single_node_dual_mounts/config.py."
Workflows Overview
| Workflow | Description | Use Case | Log Location |
|---|---|---|---|
| dual_node_mounts | 2 VMs, 1 Mount each. | Distributed coherency (does VM 2 see VM 1's changes?). | ~/work/shared/... (Preserved in Shared Bucket) |
| single_node_dual_mounts | 1 VM, 2 Mount points. | Local coherency (does Mount 2 see Mount 1's changes?). | ~/work/tasks/... |
| single_node_single_mount | 1 VM, 1 Mount point. | Basic sanity checks, concurrent reads/writes on same mount. | ~/work/tasks/... |
Getting Started
1. Navigate to the tool directory (in the shared mount):

cd $HOME/work/shared/coherency-validation/python

2. Source the aliases:

source workflow_aliases.sh

3. Select your workflow:

set_workflow
# Select from the menu:
# 1. dual_node_mounts (Distributed)
# 2. single_node_dual_mounts (Local)
# 3. single_node_single_mount (Local)

- Important: You must run this on ALL participating VMs.
- Consistency: Ensure you select the SAME workflow ID (e.g., '1') on all VMs so they share the correct configuration and aliases.
- This loads the environment variables and aliases specific to that workflow.
Workflow 1: dual_node_mounts
Execution: Coordinated testing between VM1 (Leader) and VM2 (Follower). Logs are stored in the Shared Bucket, so they are automatically preserved even if VMs are deleted.
Important Logic:
- Shared Log: Both VMs write to the same log file in the shared bucket.
- Safety Sleeps: To ensure metadata propagation across the shared bucket (which relies on gcsfuse), the tool enforces a configurable sleep (default 15s) after writing to shared state files.
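The safety-sleep idea is simple: write the shared file, then wait before any follower is expected to read it. A self-contained sketch (the temp directory stands in for the shared-bucket mount, and the sleep is shortened from the tool's 15s default):

```shell
# Write-then-wait pattern used for shared state files (illustration only).
SLEEP_SECONDS=1                       # tool default is 15; shortened here
STATE_DIR=$(mktemp -d)                # stands in for the shared-bucket mount
echo "scenario_id=13" > "${STATE_DIR}/current_scenario"
sleep "${SLEEP_SECONDS}"              # let gcsfuse metadata propagate
state=$(cat "${STATE_DIR}/current_scenario")
echo "follower sees: ${state}"
rm -rf "${STATE_DIR}"
```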
Steps:
1. On VM1 (Leader):

execute_scenario <ID>   # Example: execute_scenario 13

- Initializes the test config in the shared bucket.
- Resets VM1's mount of the Test Bucket.
- Prints instructions.

2. On VM2 (Follower):

execute_scenario   # No ID required.

- Detects the active scenario from the shared config.
- Resets VM2's mount of the Test Bucket.
- Prints instructions.

3. Completion:
- Run `complete_scenario` (on either VM) to clean up.
Workflow 2: single_node_dual_mounts
Execution: Mounts the Test Bucket twice locally (Mount 1 & Mount 2). Logs are stored locally.
- List Scenarios: `execute_scenario --list`
- Run (Step Mode): `execute_scenario_stepmode <ID>` (follow printed instructions)
- Run (Auto Mode): `execute_scenario_complete <ID>`
Workflow 3: single_node_single_mount
Execution: Automated tests on a single mount point. Logs are stored locally
in ~/work/tasks.
- List Scenarios: `execute_scenario --list`
- Run (Step Mode): `execute_scenario_stepmode <ID>`
- Run (Auto Mode): `execute_scenario_complete <ID>`
Supported Scenarios:
- Basic CRUD, Symlinks, Sync/Flush testing.
- Concurrency: Reading/Writing large files from multiple threads (e.g., Scenario 25, 26).
Scenario Management & Aliases
These high-level aliases manage the lifecycle of a test scenario.
- `execute_scenario [ID]`: Starts a scenario (in step mode) or joins an existing one. Resets mounts and prepares the environment.
- `complete_scenario` / `mark_scenario_completed`: Marks the current scenario as successfully finished. Cleans up temporary state files and finalizes the log. Must be run at the end of every scenario.
- `abort_scenario` / `abort_current_scenario`: Forcibly stops the current scenario without marking it as success. Useful if a test hangs or you want to restart.
- `fail_scenario`: Explicitly marks the scenario as FAILED in the log and cleans up.
File System Operations Reference
These aliases run the actual file system tests. They often have assertions built in (e.g., `readfileandfail` expects the read to fail).
IMPORTANT: These operations should be run only after starting a scenario
(execute_scenario) and from inside the mounted directory. The tool
typically switches your working directory to the mount automatically, but you
should verify you are in a path like .../test_buckets/<bucket>-mountX before
running them.
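A tiny guard along these lines can catch the wrong-directory mistake before an operation runs. The helper name `in_test_mount` is ours, not part of the tool:

```shell
# Hypothetical guard: succeed only when a path lies under a test-bucket mount.
in_test_mount() {
  case "$1" in
    "$HOME"/work/test_buckets/*) return 0 ;;
    *) return 1 ;;
  esac
}
# Example checks:
in_test_mount "$HOME/work/test_buckets/my-bucket-mount1" && echo "inside a mount"
in_test_mount "/tmp" || echo "not inside a mount"
```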
Basic File Operations
- `createfile`: Creates `sample.txt` with default content.
- `createfilewith2ndcontent`: Creates `sample.txt` with "sample_content2".
- `create2ndfile`: Creates `sample2.txt`.
- `readfile`: Reads `sample.txt` and prints content.
- `readfilehasoriginalcontent`: Reads `sample.txt` and asserts it contains "sample_content".
- `readfilehasupdatedcontent`: Reads `sample.txt` and asserts it contains "sample_content2".
- `updatefile`: Overwrites `sample.txt` with new content.
- `deletefile`: Deletes `sample.txt`.
- `listfile`: Checks if `sample.txt` exists (via `ls`).
- `renamefile`: Renames `sample.txt` to `sample2.txt`.
Negative Testing (Expect Failure)
- `readfileandfail`: Tries to read `sample.txt`, succeeds if the read FAILS.
- `listfileandfail`: Tries to list `sample.txt`, succeeds if it does NOT exist.
- `read2ndfileandfail`: Tries to read `sample2.txt`, succeeds if it FAILS.
Directory Operations
- `createdir`: Creates `sample_dir`.
- `listdir`: Checks if `sample_dir` exists.
- `deletedir`: Deletes `sample_dir`.
- `renamedir`: Renames `sample_dir` to `sample_dir2`.
- `listdirandfail`: Checks if `sample_dir` does NOT exist.
Symlink Operations
- `createsymlink`: Creates `sample.lnk` pointing to `sample.txt`.
- `listsymlink`: Checks if the symlink exists.
- `readfromsymlink`: Reads the target content via the symlink.
- `deletesymlink`: Deletes the symlink.
- `listsymlinkandfail`: Checks if the symlink is gone.
Advanced I/O (Go-based)
- `writedirectfile`: Writes using `O_DIRECT`.
- `readdirectfile`: Reads using `O_DIRECT`.
- `writefilewithoutsync`: Writes without calling `fsync()`.
- `writefilewithoutflush`: Writes without calling `close()` (or flush), holding the handle open.
- `writebigfile`: Writes a large (2GB) file.
- `writebigfileconcurrently`: Spawns multiple threads to write to the same large file simultaneously (stress test).
Go Tools Reference
The Python framework relies on compiled Go programs for operations that require precise control over system calls (direct I/O, flush control, threading), which is difficult to achieve in pure Python.
write.go
A robust file writing utility with low-level flags.
- Usage: `go run write.go [flags] <filepath>`
- Flags:
  - `--content <str>`: String content to write.
  - `--size <str>`: File size to generate (e.g., "1G", "10M"). Overrides content.
  - `--direct`: Uses `O_DIRECT` (bypasses kernel page cache). Writes are aligned to 4096 bytes.
  - `--no-sync`: Skips `file.Sync()` (fsync).
  - `--no-flush`: Skips `file.Close()`. Blocks execution until interrupted (Ctrl+C). Used to simulate open handles.
  - `--duplicate-writes <N>`: Spawns N concurrent threads writing the same content to the same file. Used to test race conditions.
read.go
A simple file reader that supports Direct I/O.
- Usage: `go run read.go [flags] <filepath>`
- Flags:
  - `--direct`: Uses `O_DIRECT`.
read_concurrently.go
A concurrent, multi-threaded reader for stress testing.
- Usage: `go run read_concurrently.go [flags] <filepath>`
- Flags:
  - `--size <str>`: Expected file size (verifies the file is not truncated).
  - `--threads <N>`: Number of concurrent read threads.
  - `--verify`: Verifies content matches the deterministic pattern generated by `write.go`.
  - `--direct`: Uses `O_DIRECT`.
Asynchronous & Interactive Operations
Asynchronous Operations
Some operations, particularly those involving large file writes in stress tests, run in the background.
- `writebigfileasync`: Starts writing a large file (2GB) in the background.
- `writedirectbigfileasync`: Starts writing a large file with `O_DIRECT` in the background.
- `waitforbackgroundjobs`: Blocks until all currently running background jobs (started by the above commands) have finished.
Usage Pattern:
# Start simultaneous writes (e.g., in a dual-node scenario)
mount1
writebigfileasync & # Start job 1
mount2
writebigfileasync & # Start job 2
waitforbackgroundjobs # Wait for both to finish
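The aliases above wrap ordinary shell job control. Stripped of the tool, the underlying pattern looks like this (filenames are placeholders):

```shell
# Plain-shell skeleton of the async pattern: start jobs with '&', then
# 'wait' blocks until every background child has exited.
tmp=$(mktemp -d)
(sleep 0.2; touch "${tmp}/job1.done") &
(sleep 0.1; touch "${tmp}/job2.done") &
wait                                   # returns only after both jobs finish
ls "${tmp}"
echo "all background jobs finished"
```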
Interactive / Blocking Operations
These operations intentionally block execution to simulate specific file handle states (e.g., holding a file open without flushing). You must manually interrupt them.
- `writefilewithoutflush`: Writes data but does NOT close the file descriptor. It hangs indefinitely to keep the handle open.
- `writedirectfilewithoutflush`: Same as above, but with `O_DIRECT`.
- `writefilewithoutsyncorflush`: Same as above, but also skips `fsync()`.
Usage Pattern:
1. Run the command: `writefilewithoutflush`
2. The terminal will show: `>> Waiting for interrupt signal (Ctrl+C) to exit...`
3. Perform your check (e.g., verify file visibility from another mount).
4. Press Ctrl+C to interrupt the process, forcing it to close the handle and (usually) flush data.
Configuration & Environment Control
You can inspect and modify the environment using these aliases:
Runtime Settings
- `set_sleep_seconds <N>`: Sets the duration (in seconds) the tool waits after writing to a shared file (e.g., the shared log or config). Default is 15s. Increase this if you see "No scenario running" errors due to slow GCS Fuse metadata propagation. Example: `set_sleep_seconds 30`
- `enable_logging`: Enables logging of command output to the log file.
- `disable_logging`: Disables ALL logging to the file. No commands, headers, or output will be written to the shared log. Useful for extreme latency-sensitivity testing where even log I/O is undesirable.
Status & Inspection
- `current_config`: Prints the content of the global workflow configuration (JSON).
- `current_logfile`: Prints the absolute path to the log file currently being used.
- `current_scenario`: Prints the name of the currently active scenario.
- `current_mount`: Prints whether the current shell is configured as Mount 1 or Mount 2 (based on hostname).
Logging & Debugging
Log Persistence:
- Dual Node: Logs are safe in the `SHARED_BUCKET`.
- Single Node: Logs are in `~/work/tasks` (local SSD). You must manually copy these to the shared bucket if you wish to preserve them after deleting the VM.
cp -r ~/work/tasks/coherency-validation/python/single_node_single_mount/exec_log_*.log ~/work/shared/saved_logs/
Manual Logging:
log_custom "Observation: File appeared after 3 seconds."
Troubleshooting
- "No scenario currently running":
  - For dual_node, ensure VM1 started the scenario first.
  - Check that `$HOME/work/shared` is mounted correctly on both VMs.
- Git Errors / "Unable to read tree":
  - Avoid running git commands inside the `~/work/shared` mount; the deployed code there is a plain copy, not a git checkout. Perform git operations in `/tmp` and copy files over.
- Indentation/Syntax Errors:
  - Check `execute_scenarios.py`.
- "Invalid mount number (0)":
  - Cause: The tool could not map your VM's hostname to a Mount Number (1 or 2).
  - Fix: Ensure you ran the `sed` commands in Step 5 correctly. Verify by running:

    grep "in HOSTNAME" $HOME/work/shared/coherency-validation/python/dual_node_mounts/config.py

    The output must match your actual VM hostnames (check with the `hostname` command).