GCS2Postgres
Overview
GCS2Postgres is a Go-based solution designed to facilitate the loading of various open data formats stored in Google Cloud Storage (GCS) and BigQuery into a PostgreSQL database. This solution leverages Google BigQuery for data extraction and transformation, providing a seamless and scalable data pipeline.

Features
- Supports multiple file formats: CSV, JSON, Parquet, Avro, Iceberg.
- Utilizes Google BigQuery to create external tables for data processing.
- Implements connection pooling with
pgx/v5 for efficient database operations.
- Configurable via YAML file for flexibility.
- Retrieves PostgreSQL credentials securely from Google Secret Manager.
- Concurrent data processing to maximize performance.
Configuration
The configuration is managed through a config.yaml file. Below is an example configuration:
postgres:
host: "localhost"
port: 5432
user: "postgres"
dbname: "tfmv"
sslmode: "disable"
secret_name: "projects/your_project_number/secrets/your_secret_name/versions/latest"
gcs:
bucket_name: "your_bucket_name"
project_id: "your_gcp_project_id"
dataset: "your_bigquery_dataset"
files:
- name: "regions.parquet"
table: "regions"
- name: "cities.avro"
table: "cities"
concurrent_jobs: 3
bq:
project_id: "your_gcp_project_id"
dataset: "your_bigquery_dataset"
tables:
- name: "nation"
table: "nation"
Usage
- Set up the YAML configuration file with the necessary details, including PostgreSQL connection settings and GCS details.
- Ensure the necessary permissions are granted for accessing GCS, BigQuery, and Google Secret Manager.
- Build the project using the Go build tool:
go build -o GCS2Postgres
./GCS2Postgres
Example
To load data from multiple files in a GCS bucket into PostgreSQL, ensure your config.yaml is properly set up and run the application. The application will create external tables in BigQuery for each file, fetch the data, and load it into the specified PostgreSQL database.
Contributing
Contributions are welcome! Please fork this repository, make your changes, and submit a pull request.
License
This project is licensed under the MIT License. See the LICENSE file for more details.