Skip to Content
Kaiwa logoHomeDashboard
Quick Start

Quick Start

Setting Up the Schema

To generate queries, the system needs to understand your database schema. This includes:

  • A list of relevant tables (those you will query, and those needed for joins).
  • The relevant fields within each table.
  • Optionally, aliases for tables and fields to make legacy or unintuitive names more meaningful without changing the underlying database.

Relevant fields include those used in relations—such as foreign keys or join conditions—on both sides of the relationship.

Example Schema Definition

{ "tables": [ { "name": "users", "typing": "strict", "fields": [ { "name": "id", "type": "uuid" }, { "name": "name", "type": "string" }, { "name": "created_at", "type": "datetime" } ] } ] }

Schema Typing Modes

Each table must specify a typing mode that controls how the query generator accesses fields:

  • strict - The model can only access fields explicitly defined in the schema. This is the recommended mode for SQL databases where the schema is fixed.

A loose typing mode for schemaless databases (like MongoDB) is planned for a future release. This will allow the model to access any fields while still having type information for specified ones.

Supported Data Types

To generate type-safe queries, you must specify the data types for each field. The system supports both primitive and composite types.

Primitive Types

The following primitive types are currently supported:

  • string - Text data
  • integer - Signed 64-bit integers
  • unsignedinteger - Unsigned 64-bit integers
  • float - 64-bit floating point numbers
  • datetime - Date and time with timezone
  • date - Date only values
  • time - Time only values
  • bool - Boolean true/false values
  • oid - Database object identifiers
  • uuid - Universally unique identifiers (UUID v4)

Composite Types

  • nullable - Wraps any type to allow null values
  • enum - Predefined set of allowed values with optional aliases
  • object - Nested structure with typed fields
  • array - List of items of a single type

Nullable Fields

Wrap any type with nullable to allow null values:

{ "name": "deleted_at", "type": "nullable", "value": { "type": "datetime" } }

Enum Fields

Define a fixed set of allowed values. Enum variants can have aliases to map legacy or cryptic database values to human-readable names:

{ "name": "status", "type": "enum", "variants": [ { "value": "A", "alias": "Active", "description": "User is active" }, { "value": "I", "alias": "Inactive" }, { "value": "P", "alias": "Pending" } ] }

Enum values can be strings, integers, booleans, or floats:

{ "name": "priority", "type": "enum", "variants": [ { "value": 1, "alias": "Low" }, { "value": 2, "alias": "Medium" }, { "value": 3, "alias": "High" } ] }

Object Fields

Define nested structures with their own typed fields:

{ "name": "address", "type": "object", "fields": [ { "name": "street", "type": "string" }, { "name": "city", "type": "string" }, { "name": "zip", "type": "string" } ] }

Array Fields

Define lists of items of a single type:

{ "name": "tags", "type": "array", "item": { "type": "string" } }

Arrays can contain any type, including objects:

{ "name": "addresses", "type": "array", "item": { "type": "object", "fields": [ { "name": "street", "type": "string" }, { "name": "city", "type": "string" } ] } }

Example with Data Types

{ "tables": [ { "name": "products", "typing": "strict", "fields": [ { "name": "id", "type": "uuid" }, { "name": "name", "type": "string" }, { "name": "price", "type": "float" }, { "name": "in_stock", "type": "bool" }, { "name": "category", "type": "string" }, { "name": "created_date", "type": "date" }, { "name": "last_updated", "type": "datetime" } ] } ] }

Supported Database Engines

To generate accurate queries for your specific database, you need to specify which database engine you’re using. Currently supported engines include:

  • PostgreSQL - Full-featured relational database
  • ClickHouse - Column-oriented OLAP database for analytics
  • BigQuery - Google Cloud’s serverless data warehouse

Complete Schema Example

Here’s a complete schema configuration example:

{ "default_engine": { "type": "postgres", "version": "16.0.0" }, "schema": { "tables": [ { "name": "customers", "typing": "strict", "fields": [ { "name": "id", "type": "integer" }, { "name": "name", "type": "string" }, { "name": "email", "type": "string" }, { "name": "registered_at", "type": "datetime" }, { "name": "active", "type": "bool" } ] }, { "name": "orders", "typing": "strict", "fields": [ { "name": "id", "type": "uuid" }, { "name": "customer_id", "type": "integer" }, { "name": "total", "type": "float" }, { "name": "status", "type": "string" } ] } ] } }

With your schema properly configured, you’re ready to start generating type-safe queries for your database!

Adding Your Schema

You can add your schema using one of the following methods:

  • Using our REST API 
  • Through the dashboard, which currently supports importing the JSON schema format shown above. Future releases will feature a comprehensive GUI for schema management.

Tunnel Setup

The primary methods to use KaiwaDB are through our REST API or admin dashboard. For impatient users, we offer tunnels intended primarily for demo purposes.

The tunnel can be installed using one of the following methods.

Docker

A prebuilt multi-arch image (linux/amd64, linux/arm64) is published to GitHub Container Registry:

docker pull ghcr.io/kaiwadb/tunnel:latest docker run --rm --network=host ghcr.io/kaiwadb/tunnel:latest --token "<token>"

Use --network=host if you want the database discovery scan to find databases on the host. Without it the container only sees the Docker bridge subnet and the scan returns nothing. See Running in Docker below for details and alternatives.

Shell installer (macOS / Linux)

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/kaiwadb/tunnel/releases/download/v0.4.0/kaiwadb-tunnel-installer.sh | sh

PowerShell installer (Windows)

powershell -c "irm https://github.com/kaiwadb/tunnel/releases/download/v0.4.0/kaiwadb-tunnel-installer.ps1 | iex"

Once installed, the tunnel can be started under a VPN where it can reach the database. The admin can then add database connections in the admin dashboard and send queries directly from the web app without requiring any additional infrastructure setup.

You can create a tunnel token in the settings of the admin dashboard to authenticate the tunnel.

Basic CLI Usage

After installing the tunnel, you can use the CLI to interact with the KaiwaDB service:

kaiwadb-tunnel --token "<token>"

Replace <token> with your tunnel token generated from the admin dashboard.

Database Discovery

When the dashboard requests a scan, the tunnel enumerates likely database endpoints reachable from the host it runs on and reports back a list of candidates (engine, IP, port, version hint, where the hint came from). The operator confirms which ones to register; the tunnel never opens a connection beyond the protocol fingerprint.

The scan is wall-clock-budgeted (default 30s, configurable 1–120s) and concurrency-capped, so it stays bounded on any network.

What gets probed

Discovery runs in layers; each layer contributes candidate hosts, then a single TCP-probe pass fingerprints whatever answers on each (candidate × engine-default-port) tuple.

  • L0 — local signals. Network interfaces (if_addrs), /etc/hosts entries with database-like names, /proc/net/tcp listeners, ARP/NDP neighbour cache, default gateway (plus a few common offsets like .2, .10), and DB-related environment variables (DATABASE_URL, PGHOST, MONGODB_URI, …).
  • L1 — DNS hints. Resolves <label>.<search-domain> for each /etc/resolv.conf search domain against well-known DB labels (db, postgres, mysql, mongo, clickhouse, …). Picks up managed databases that are pure DNS endpoints (RDS, Atlas).
  • L2 — targeted TCP probe. Every L0+L1 host × engine default port.
  • L3 — subnet enumeration. Every host in each private NIC subnet, plus any non-default routes (WireGuard/OpenVPN AllowedIPs), plus any extra CIDRs you pass in. Subnets larger than max_prefix (default /22, configurable /16/32) are skipped to keep the budget bounded.
  • L4 — protocol fingerprint. Every open port gets a lightweight engine-specific handshake (PostgreSQL startup, MySQL greeting, ClickHouse HTTP /ping, etc.) to confirm the engine and extract a version hint.

Supported engines and the ports probed:

EnginePorts
PostgreSQL5432
MySQL / MariaDB3306, 3307
MSSQL1433
MongoDB27017, 27018, 27019
ClickHouse8123, 8443, 9000, 9440

Tuning a scan

The scan request accepts a few optional knobs:

FieldDefaultNotes
timeout_ms30000Wall-clock budget, clamped to 1000120000.
max_prefix22Smallest IPv4 prefix length enumerated. /16/32. Bigger numbers = smaller subnets = fewer hosts.
extra_cidrs[]Additional IPv4 CIDRs to enumerate, e.g. ["10.0.5.0/24"]. Useful when the host can route to a subnet that isn’t on any local NIC.
include_dns_labelsbuilt-in listOverride the DNS labels probed against each search domain.
include_search_domainsfrom /etc/resolv.confOverride the search domains used for L1.

Running in Docker

By default Docker puts the container in its own network namespace on the bridge network. The tunnel then only sees:

  • its own eth0 on the Docker bridge subnet (typically 172.17.0.0/16),
  • lo,
  • the container’s own /proc/net/tcp, ARP cache, routes, and /etc/hosts.

The bridge subnet is /16 — larger than the default max_prefix of /22 — so it gets skipped as “too large”, and the scan reports zero candidates even though the host has reachable databases.

Recommended fix — share the host network namespace (Linux only):

docker run --rm --network=host ghcr.io/kaiwadb/tunnel:latest --token "<token>"

The container now sees the host’s NICs, routes, ARP cache, /proc/net/tcp listeners, and default gateway. Discovery behaves as if the binary were running directly on the host.

--network=host is Linux-only. On Docker Desktop (macOS / Windows) the flag exists but the container still runs inside a Linux VM, so it sees the VM’s network, not the host’s. For those platforms either run the native binary directly, or use the extra_cidrs workaround below.

Alternative — supply extra_cidrs. If you can’t use host networking (Kubernetes, Swarm, hardened deployments), tell the scan which subnet to sweep:

{ "extra_cidrs": ["192.168.1.0/24"], "timeout_ms": 30000 }

This works in any network mode but gives up the local hints (ARP, /proc/net, gateway) — it’s a pure TCP sweep of the CIDR you specify.

Alternative — --add-host. If you already know the DB hostname, add it to the container’s /etc/hosts so the L0 hosts-file pass picks it up:

docker run --rm \ --add-host db.internal:10.0.0.5 \ ghcr.io/kaiwadb/tunnel:latest --token "<token>"

Interpreting the report

Each candidate carries the sources that contributed it (arp, etc_hosts, dns, proc_net, gateway, subnet_scan, env, extra_cidr, routed). A candidate hit via multiple sources is higher-confidence than one found purely by subnet sweep.

The report also lists subnets_scanned and subnets_skipped (with the reason: too_large, loopback, link_local, public, invalid) so you can tell whether a subnet you expected to be probed actually was, or whether you need to raise max_prefix or add an extra_cidrs entry.

Last updated on