Quick Start
Setting Up the Schema
To generate queries, the system needs to understand your database schema. This includes:
- A list of relevant tables (those you will query, and those needed for joins).
- The relevant fields within each table.
- Optionally, aliases for tables and fields to make legacy or unintuitive names more meaningful without changing the underlying database.
Relevant fields include those used in relations—such as foreign keys or join conditions—on both sides of the relationship.
Example Schema Definition
Schema JSON
{
"tables": [
{
"name": "users",
"typing": "strict",
"fields": [
{ "name": "id", "type": "uuid" },
{ "name": "name", "type": "string" },
{ "name": "created_at", "type": "datetime" }
]
}
]
}Schema Typing Modes
Each table must specify a typing mode that controls how the query generator accesses fields:
strict- The model can only access fields explicitly defined in the schema. This is the recommended mode for SQL databases where the schema is fixed.
A loose typing mode for schemaless databases (like MongoDB) is planned for a
future release. This will allow the model to access any fields while still
having type information for specified ones.
Supported Data Types
To generate type-safe queries, you must specify the data types for each field. The system supports both primitive and composite types.
Primitive Types
The following primitive types are currently supported:
string- Text datainteger- Signed 64-bit integersunsignedinteger- Unsigned 64-bit integersfloat- 64-bit floating point numbersdatetime- Date and time with timezonedate- Date only valuestime- Time only valuesbool- Boolean true/false valuesoid- Database object identifiersuuid- Universally unique identifiers (UUID v4)
Composite Types
nullable- Wraps any type to allow null valuesenum- Predefined set of allowed values with optional aliasesobject- Nested structure with typed fieldsarray- List of items of a single type
Nullable Fields
Wrap any type with nullable to allow null values:
{
"name": "deleted_at",
"type": "nullable",
"value": { "type": "datetime" }
}Enum Fields
Define a fixed set of allowed values. Enum variants can have aliases to map legacy or cryptic database values to human-readable names:
{
"name": "status",
"type": "enum",
"variants": [
{ "value": "A", "alias": "Active", "description": "User is active" },
{ "value": "I", "alias": "Inactive" },
{ "value": "P", "alias": "Pending" }
]
}Enum values can be strings, integers, booleans, or floats:
{
"name": "priority",
"type": "enum",
"variants": [
{ "value": 1, "alias": "Low" },
{ "value": 2, "alias": "Medium" },
{ "value": 3, "alias": "High" }
]
}Object Fields
Define nested structures with their own typed fields:
{
"name": "address",
"type": "object",
"fields": [
{ "name": "street", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "zip", "type": "string" }
]
}Array Fields
Define lists of items of a single type:
{
"name": "tags",
"type": "array",
"item": { "type": "string" }
}Arrays can contain any type, including objects:
{
"name": "addresses",
"type": "array",
"item": {
"type": "object",
"fields": [
{ "name": "street", "type": "string" },
{ "name": "city", "type": "string" }
]
}
}Example with Data Types
{
"tables": [
{
"name": "products",
"typing": "strict",
"fields": [
{ "name": "id", "type": "uuid" },
{ "name": "name", "type": "string" },
{ "name": "price", "type": "float" },
{ "name": "in_stock", "type": "bool" },
{ "name": "category", "type": "string" },
{ "name": "created_date", "type": "date" },
{ "name": "last_updated", "type": "datetime" }
]
}
]
}Supported Database Engines
To generate accurate queries for your specific database, you need to specify which database engine you’re using. Currently supported engines include:
- PostgreSQL - Full-featured relational database
- ClickHouse - Column-oriented OLAP database for analytics
- BigQuery - Google Cloud’s serverless data warehouse
Complete Schema Example
Here’s a complete schema configuration example:
Basic Configuration
{
"default_engine": { "type": "postgres", "version": "16.0.0" },
"schema": {
"tables": [
{
"name": "customers",
"typing": "strict",
"fields": [
{ "name": "id", "type": "integer" },
{ "name": "name", "type": "string" },
{ "name": "email", "type": "string" },
{ "name": "registered_at", "type": "datetime" },
{ "name": "active", "type": "bool" }
]
},
{
"name": "orders",
"typing": "strict",
"fields": [
{ "name": "id", "type": "uuid" },
{ "name": "customer_id", "type": "integer" },
{ "name": "total", "type": "float" },
{ "name": "status", "type": "string" }
]
}
]
}
}With your schema properly configured, you’re ready to start generating type-safe queries for your database!
Adding Your Schema
You can add your schema using one of the following methods:
- Using our REST API
- Through the dashboard, which currently supports importing the JSON schema format shown above. Future releases will feature a comprehensive GUI for schema management.
Tunnel Setup
The primary methods to use KaiwaDB are through our REST API or admin dashboard. For impatient users, we offer tunnels intended primarily for demo purposes.
The tunnel can be installed using one of the following methods.
Docker
A prebuilt multi-arch image (linux/amd64, linux/arm64) is published to GitHub Container Registry:
docker pull ghcr.io/kaiwadb/tunnel:latest
docker run --rm --network=host ghcr.io/kaiwadb/tunnel:latest --token "<token>"Use --network=host if you want the database discovery
scan to find databases on the host. Without it the
container only sees the Docker bridge subnet and the scan returns nothing.
See Running in Docker below for details and
alternatives.
Shell installer (macOS / Linux)
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/kaiwadb/tunnel/releases/download/v0.4.0/kaiwadb-tunnel-installer.sh | shPowerShell installer (Windows)
powershell -c "irm https://github.com/kaiwadb/tunnel/releases/download/v0.4.0/kaiwadb-tunnel-installer.ps1 | iex"Once installed, the tunnel can be started under a VPN where it can reach the database. The admin can then add database connections in the admin dashboard and send queries directly from the web app without requiring any additional infrastructure setup.
You can create a tunnel token in the settings of the admin dashboard to authenticate the tunnel.
Basic CLI Usage
After installing the tunnel, you can use the CLI to interact with the KaiwaDB service:
kaiwadb-tunnel --token "<token>"Replace <token> with your tunnel token generated from the admin dashboard.
Database Discovery
When the dashboard requests a scan, the tunnel enumerates likely database endpoints reachable from the host it runs on and reports back a list of candidates (engine, IP, port, version hint, where the hint came from). The operator confirms which ones to register; the tunnel never opens a connection beyond the protocol fingerprint.
The scan is wall-clock-budgeted (default 30s, configurable 1–120s) and concurrency-capped, so it stays bounded on any network.
What gets probed
Discovery runs in layers; each layer contributes candidate hosts, then a
single TCP-probe pass fingerprints whatever answers on each
(candidate × engine-default-port) tuple.
- L0 — local signals. Network interfaces (
if_addrs),/etc/hostsentries with database-like names,/proc/net/tcplisteners, ARP/NDP neighbour cache, default gateway (plus a few common offsets like.2,.10), and DB-related environment variables (DATABASE_URL,PGHOST,MONGODB_URI, …). - L1 — DNS hints. Resolves
<label>.<search-domain>for each/etc/resolv.confsearch domain against well-known DB labels (db,postgres,mysql,mongo,clickhouse, …). Picks up managed databases that are pure DNS endpoints (RDS, Atlas). - L2 — targeted TCP probe. Every L0+L1 host × engine default port.
- L3 — subnet enumeration. Every host in each private NIC subnet, plus
any non-default routes (WireGuard/OpenVPN
AllowedIPs), plus any extra CIDRs you pass in. Subnets larger thanmax_prefix(default/22, configurable/16–/32) are skipped to keep the budget bounded. - L4 — protocol fingerprint. Every open port gets a lightweight
engine-specific handshake (PostgreSQL startup, MySQL greeting,
ClickHouse HTTP
/ping, etc.) to confirm the engine and extract a version hint.
Supported engines and the ports probed:
| Engine | Ports |
|---|---|
| PostgreSQL | 5432 |
| MySQL / MariaDB | 3306, 3307 |
| MSSQL | 1433 |
| MongoDB | 27017, 27018, 27019 |
| ClickHouse | 8123, 8443, 9000, 9440 |
Tuning a scan
The scan request accepts a few optional knobs:
| Field | Default | Notes |
|---|---|---|
timeout_ms | 30000 | Wall-clock budget, clamped to 1000–120000. |
max_prefix | 22 | Smallest IPv4 prefix length enumerated. /16–/32. Bigger numbers = smaller subnets = fewer hosts. |
extra_cidrs | [] | Additional IPv4 CIDRs to enumerate, e.g. ["10.0.5.0/24"]. Useful when the host can route to a subnet that isn’t on any local NIC. |
include_dns_labels | built-in list | Override the DNS labels probed against each search domain. |
include_search_domains | from /etc/resolv.conf | Override the search domains used for L1. |
Running in Docker
By default Docker puts the container in its own network namespace on the
bridge network. The tunnel then only sees:
- its own
eth0on the Docker bridge subnet (typically172.17.0.0/16), lo,- the container’s own
/proc/net/tcp, ARP cache, routes, and/etc/hosts.
The bridge subnet is /16 — larger than the default max_prefix of /22
— so it gets skipped as “too large”, and the scan reports zero candidates
even though the host has reachable databases.
Recommended fix — share the host network namespace (Linux only):
docker run --rm --network=host ghcr.io/kaiwadb/tunnel:latest --token "<token>"The container now sees the host’s NICs, routes, ARP cache, /proc/net/tcp
listeners, and default gateway. Discovery behaves as if the binary were
running directly on the host.
--network=host is Linux-only. On Docker Desktop (macOS / Windows) the
flag exists but the container still runs inside a Linux VM, so it sees
the VM’s network, not the host’s. For those platforms either run the
native binary directly, or use the extra_cidrs workaround below.
Alternative — supply extra_cidrs. If you can’t use host networking
(Kubernetes, Swarm, hardened deployments), tell the scan which subnet to
sweep:
{
"extra_cidrs": ["192.168.1.0/24"],
"timeout_ms": 30000
}This works in any network mode but gives up the local hints (ARP,
/proc/net, gateway) — it’s a pure TCP sweep of the CIDR you specify.
Alternative — --add-host. If you already know the DB hostname, add it
to the container’s /etc/hosts so the L0 hosts-file pass picks it up:
docker run --rm \
--add-host db.internal:10.0.0.5 \
ghcr.io/kaiwadb/tunnel:latest --token "<token>"Interpreting the report
Each candidate carries the sources that contributed it
(arp, etc_hosts, dns, proc_net, gateway, subnet_scan, env,
extra_cidr, routed). A candidate hit via multiple sources is
higher-confidence than one found purely by subnet sweep.
The report also lists subnets_scanned and subnets_skipped (with the
reason: too_large, loopback, link_local, public, invalid) so you
can tell whether a subnet you expected to be probed actually was, or
whether you need to raise max_prefix or add an extra_cidrs entry.