README
Performance Benchmark
Toolset for measuring the performance of the Orbs network
V1 Performance Optimization
Principles
- Reproducible: the suite should be easily runnable by anyone and as automatic as possible
- Production oriented: the use case under test should resemble a real production environment
- Start simple: the behavior of distributed systems is complex and nuanced, so aim for the basics first
- Pareto principle: focus efforts on the easiest 20% that brings 80% of the results, and ignore the long tail
- Evidence based: focus on empirical data and numbers only, and ignore gut feelings
KPIs
- TPS: the maximum number of simple token transfer transactions per second, within a pool of 100K addresses
- Confirmation time: the time from reception of such a transaction to the reply with its receipt
- Cost: the infrastructure cost of operating a node (mostly the AWS machine price)
How do we combine the multiple KPIs? Limit (2) and (3) to reasonable values (e.g. 95% of transactions are committed under 5 seconds on a medium-sized AWS machine) and measure the maximum value of (1).
General workflow
- Measure a baseline (KPI value at a fixed git commit and a fixed configuration profile)
- Extract profiling information during the test (from node metrics + Golang pprof in production)
- Analyze profile samples and identify the top bottlenecks
- Propose an optimization (in code or configuration) to improve one of the top bottlenecks
- Measure the proposal (KPI value on the exact same scenario, but incorporating the proposed change)
- Accept the proposed change if the KPI improved
- Rinse and repeat
Scenarios
Basic
- Set up a new virtual chain
  - No history (e.g. no block persistence)
  - No impact from other virtual chains (e.g. prefer not to share a dispatcher)
  - Number of nodes identical to the production scenario
  - Nodes reside in 4-6 popular AWS regions (EU, US; around 100 ms ping between them)
  - AWS machine type is predetermined
  - Code base configured as in production (e.g. without Info logs)
- Simulate client traffic
  - Using one of the official client SDKs
  - Generate a significant number of `BenchmarkToken.transfer` transactions: all transactions are sent by the contract deployer (the owner of the entire supply), transferring 1 token to a random address (from a pool of 100K addresses)
  - Transactions are sent evenly to all gateways (all nodes)
  - Transactions should be sent from multiple machines if the processing rate is faster than the send rate
- Measure the main KPIs periodically during the scenario
  - TPS
    - Rely on the metrics system's TPS measure
    - Double-check against the metrics system's count of total committed transactions (over time)
  - Confirmation time
    - Rely on the metrics system's confirmation time measure
    - Double-check against the client's view of this time
  - Cost
    - Predetermined
- Extract profiling information periodically during the scenario
  - Separately from each node (it's enough to sample some of the nodes)
  - All basic Golang profiling types (cpu, heap, goroutines, locks, etc.)
  - Core bottleneck metrics from the machines (CPU usage, network usage, etc.)
- Stop the scenario once we have enough stable measurements
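Since the nodes are written in Go, one way to pull the basic profile types is straight from the standard net/http/pprof endpoints. A hedged sketch, assuming each node exposes net/http/pprof at its base URL (the cpu profile, served at /debug/pprof/profile, takes a timed sample and is left out here for brevity):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
)

// The basic snapshot profile types exposed by Go's net/http/pprof package.
var profileTypes = []string{"heap", "goroutine", "block", "mutex"}

// profileURL builds the standard net/http/pprof URL for one profile type.
func profileURL(baseURL, profile string) string {
	return baseURL + "/debug/pprof/" + profile
}

// fetchProfiles downloads one snapshot of each profile type into outDir.
// The node must actually expose net/http/pprof on baseURL for this to work.
func fetchProfiles(baseURL, outDir string) error {
	for _, p := range profileTypes {
		resp, err := http.Get(profileURL(baseURL, p))
		if err != nil {
			return err
		}
		out, err := os.Create(filepath.Join(outDir, p+".pprof"))
		if err == nil {
			_, err = io.Copy(out, resp.Body)
			out.Close()
		}
		resp.Body.Close()
		if err != nil {
			return err
		}
		fmt.Println("saved", p, "profile")
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: fetch-profiles <node_base_url> [out_dir]")
		return
	}
	outDir := "."
	if len(os.Args) > 2 {
		outDir = os.Args[2]
	}
	if err := fetchProfiles(os.Args[1], outDir); err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
}
```

The saved .pprof files can then be inspected offline with `go tool pprof`.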
User Guide
Running a performance test
- Nodes setup: `config/nodes-config.json`
The Stability network lets us test a long-running network.
Each node exposes JSON metrics via the /metrics endpoint, and prints metrics and errors to its logs.
It can optionally print all logs, for debugging purposes.
Metrics collection and processing
Logging
- Enable logs: curl -XPOST http://<node_ip>/vchains/<vchain_id>/debug/logs/filter-off
- Disable logs: curl -XPOST http://<node_ip>/vchains/<vchain_id>/debug/logs/filter-on

Logs should not be left enabled for long, as they are very verbose.
Logs are sent to logz.io
Logz.io configuration
TBD: how to send logs there
IP Addresses of the nodes (topology)
The IP addresses of the nodes are stored in the file /opt/galileo/testnet-configuration/benchmark/ips.json on the client machine.
These IPs must match the actual AWS IPs where the nodes are installed.
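Before redeploying, it can help to sanity-check that every entry in ips.json parses as an IP address. A small Go sketch (the assumed file format, a JSON map from node name to IP string, is an assumption about ips.json, not its documented schema):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// validateIPs parses an ips.json payload (assumed format: a JSON map from
// node name to IP string) and returns the names whose IPs fail to parse.
func validateIPs(data []byte) ([]string, error) {
	var m map[string]string
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	var bad []string
	for name, ip := range m {
		if net.ParseIP(ip) == nil {
			bad = append(bad, name)
		}
	}
	return bad, nil
}

func main() {
	// Inline sample instead of reading /opt/galileo/.../ips.json.
	sample := []byte(`{"node1": "34.216.213.19", "node2": "not-an-ip"}`)
	bad, err := validateIPs(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("invalid entries:", bad) // → invalid entries: [node2]
}
```

Catching a malformed IP here is much cheaper than discovering it after the nodes have been redeployed with a broken topology.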
Tampering with IP addresses
For testing purposes, such as preventing a specific node from communicating with the other nodes, you can do the following:

    ssh ec2-user@34.216.213.19
    sudo su
    cd /opt/galileo/testnet-configuration/benchmark

- Make a backup of the file ips.json
- Modify the file ips.json
- Redeploy the app from Slackbot: `deploy <commit> <vchain>` - this will recreate each node's config file based on ips.json, then redeploy and restart the nodes, applying the new IP configuration
- When the test is done and you wish to re-enable all nodes, restore the ips.json file and redeploy.
Updating with a new build
- Slackbot: ... not yet
Go to the performance_benchmark project:

    cd galileo
    export API_ENDPOINT=http://18.219.170.177/vchains/2000/api/v1/
    export BASE_URL=http://18.219.170.177/vchains/2000
    export STRESS_TEST_NUMBER_OF_TRANSACTIONS=100
    export VCHAIN=2000
    ./extract.sh
Extracting measurements from a live network
- Goroutine stack traces
- Performance metrics