db2viz
db2viz is a data pipeline project that demonstrates how to move data from an on-premises database (Postgres) to Google Cloud BigQuery for visualization in Looker Studio.

Project Structure
cmd/main.go: The entry point for the application.
config/config.go: Configuration loader.
internal/db/postgres_connector.go: Connects to the Postgres database.
internal/db/db.go: Contains logic for connecting to the database and loading data.
internal/data/loader.go: Loads data from Postgres.
internal/data/transformer.go: Transforms data before uploading.
internal/gcp/pubsub.go: Publishes data to Google Cloud Pub/Sub.
scripts/run_postgres_docker.sh: Script to run a Postgres container.
Getting Started
-
Run the Postgres Docker container
./scripts/run_postgres_docker.sh
-
Configure your config/config.yaml with the necessary details for Postgres and Google Cloud Pub/Sub.
postgres:
host: localhost
port: 5432
user: postgres
password: mysecretpassword
dbname: tfmv
sslmode: disable
tables:
- name: nation
schema: production
topic_id: nation
- name: regions
schema: production
topic_id: regions
concurrency: 2
pubsub:
project_id: tfmv-371720
credentials: /path/to/your/service-account.json
workers: 20
-
Build and run the Go application
docker build -t db2viz .
docker run --rm db2viz
Description
This project demonstrates a simple ETL (Extract, Transform, Load) pipeline:
- Extract: Data is extracted from a Postgres database.
- Transform: The data is transformed as necessary within the application.
- Load:
- The transformed data is published to a Google Cloud Pub/Sub topic.
- Pub/Sub subscriptions are used to directly populate the BigQuery tables as the data streams into Pub/Sub.
The data is ultimately loaded into Google BigQuery for visualization in Looker Studio.
Using Pub/Sub to BigQuery
To stream data from Pub/Sub to BigQuery, follow these steps:
- Ensure you have the necessary permissions and enable the required APIs for Pub/Sub and BigQuery in your Google Cloud project.
- Create Pub/Sub subscriptions that write directly to BigQuery:
- Define a subscription for each Pub/Sub topic associated with your tables.
- Configure the subscription to use a push endpoint that writes to BigQuery.
Visualizing Data in Looker Studio
Once the data is in BigQuery, you can use Looker Studio to visualize it:
- Go to Looker Studio (formerly Data Studio).
- Click on "Create" and select "Data Source".
- Choose "BigQuery" as the connector.
- Select your Google Cloud project, dataset, and table.
- Click "Connect".
- Create your report and add visualizations as needed.
Networking
To set up connectivity between your on-premises network and Google Cloud, you can use one of the following methods:
Cloud VPN
- Create a Cloud VPN Gateway: Establish a VPN gateway in GCP.
- Configure VPN Tunnels: Set up IPsec tunnels between your on-premises VPN gateway and the GCP VPN gateway.
- Use Cloud Router: Configure dynamic routing using BGP.
- Configure Firewall Rules: Allow traffic between on-premises and GCP.
Cloud Interconnect
For higher bandwidth and lower latency:
- Order Circuits: Arrange for dedicated or partner interconnect circuits.
- Create Interconnect Connection: Establish a physical connection to GCP.
- Configure VLAN Attachments: Set up VLANs for traffic routing.
- Use Cloud Router: Manage dynamic routing.
Networking Example
To set up an HA VPN:
- Create an HA VPN Gateway:
gcloud compute target-vpn-gateways create ha-vpn-gateway --region us-central1
