robust-config: Type-Safe Configuration

Every sufficiently long-lived application eventually develops a configuration problem. It starts innocuously: a JSON file, maybe an .env file, a YAML blob checked into the repo. A few lookups at startup and everything is fine. Then the app grows. Multiple environments. Multiple services. A new engineer who does not know which keys are required. A deployment that fails at 2am because someone put "8080" (a string) where 8080 (an integer) was expected.

The configuration problem is ultimately a typing problem. The data has structure -- it has always had structure -- but nothing enforces it. I built robust-config to fix that, using a combination of Apache Thrift for the schema and a small Python DSL for per-environment values.

The pieces

There are four moving parts:

A .thrift file declaring the config structure with types and required/optional semantics
A .cconf file with per-environment values, written in a minimal Python-style DSL
A compiler that validates the .cconf against the schema and emits a clean JSON file
Generated C++ headers and Python dataclasses that load the JSON without any raw dictionary access

The JSON file is the contract between the build system and the application. The schema is the contract between the engineer writing config and the engineer writing the application. The compiler enforces both.

Defining the schema

Config structure lives in a .thrift file. If you already use Thrift for RPC, the syntax is identical. Here is a representative schema for a small web application (the definitions of CacheConfig and LogLevel, which AppConfig references, are not shown):

struct DatabaseConfig {
    1: required string  host
    2: required i32     port            = 5432
    3: optional string  username
    4: optional string  password
    5: optional string  database_name   = "myapp"
    6: optional i32     max_connections = 20
    7: optional bool    ssl_enabled     = false
}

struct ServerConfig {
    1: required i32          port               = 8080
    2: optional string       bind_address       = "0.0.0.0"
    3: optional i32          max_connections    = 1000
    4: optional list<string> allowed_origins
}

struct AppConfig {
    1: required DatabaseConfig database
    2: required ServerConfig   server
    3: optional CacheConfig    cache
    4: optional LogLevel       log_level      = LogLevel.INFO
    5: optional bool           enable_metrics = false
}

required fields must appear in the .cconf file. optional fields that are absent and have no schema default are omitted from the output JSON entirely; absent fields that do have a schema default are written out with that default. Defaults declared in the schema are also baked into the generated stubs as native C++ initializers and Python dataclass defaults, so the application never needs to duplicate them.
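
To make "baked into the generated stubs" concrete, here is a minimal sketch of how a stub generator could turn field descriptions into Python dataclass source. The Field tuple and render_dataclass function are hypothetical names invented for this illustration; they are not robust-config's actual internals, and the real generator may work quite differently.

```python
from typing import NamedTuple, Optional, Sequence


class Field(NamedTuple):
    """Hypothetical description of one schema field."""
    name: str
    py_type: str            # e.g. "str", "int", "bool"
    required: bool
    default: Optional[str]  # default as Python source text, or None


def render_dataclass(struct_name: str, fields: Sequence[Field]) -> str:
    """Emit dataclass source; schema defaults become dataclass defaults."""
    lines = ["@dataclass", f"class {struct_name}:"]
    # dataclasses require fields without defaults to come first
    ordered = sorted(fields, key=lambda f: not (f.required and f.default is None))
    for f in ordered:
        annotation = f.py_type if f.required else f"Optional[{f.py_type}]"
        if f.default is not None:
            lines.append(f"    {f.name}: {annotation} = {f.default}")
        elif f.required:
            lines.append(f"    {f.name}: {annotation}")
        else:
            lines.append(f"    {f.name}: {annotation} = None")
    return "\n".join(lines)


print(render_dataclass("DatabaseConfig", [
    Field("host", "str", True, None),
    Field("port", "int", True, "5432"),
    Field("username", "str", False, None),
    Field("ssl_enabled", "bool", False, "False"),
]))
```

The emitted text still needs the usual dataclass and Optional imports at the top of the generated file. The point is only that a default written once in the schema can flow mechanically into the stub.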

Writing a .cconf file

A .cconf file names the schema and declares a single config expression. The struct classes are automatically in scope -- you do not import anything:

schema = "../schema.thrift"

config = AppConfig(
    database=DatabaseConfig(
        host="db.prod.internal",
        port=5432,
        username="myapp",
        database_name="myapp_prod",
        max_connections=50,
        ssl_enabled=True,
    ),
    server=ServerConfig(
        port=8080,
        max_connections=2000,
        allowed_origins=[
            "https://example.com",
            "https://app.example.com",
        ],
    ),
    log_level="WARNING",
    enable_metrics=True,
)

The development config for the same app is much shorter -- optional fields and fields with schema defaults are simply omitted. You only write what differs from the defaults.

The struct classes are not imported from anywhere. The compiler parses the schema first, then injects a class for every struct into the execution namespace before running your file. This means you cannot use the struct classes outside of a .cconf file, which is intentional. The same restricted namespace means there is no import os, no subprocess.run, no way to call out to the network. A .cconf file is a configuration declaration, not a program.
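
The injection step can be sketched in a few lines of Python. This is an illustration under assumptions, not the compiler's real code: make_struct_class and run_cconf are hypothetical names, the struct classes here only record their keyword arguments, and the struct names are passed in by hand instead of being parsed from a .thrift file.

```python
def make_struct_class(name: str):
    """Build a trivial class that records its keyword arguments."""
    def __init__(self, **fields):
        self.fields = fields
    return type(name, (), {"__init__": __init__})


def run_cconf(source: str, struct_names):
    """Execute .cconf source with only the struct classes in scope."""
    namespace = {name: make_struct_class(name) for name in struct_names}
    # An empty __builtins__ removes __import__, open, and friends, so
    # `import os` raises ImportError inside the config file.
    namespace["__builtins__"] = {}
    exec(source, namespace)
    return namespace["schema"], namespace["config"]


schema, config = run_cconf(
    'schema = "../schema.thrift"\n'
    'config = ServerConfig(port=8080, allowed_origins=["https://example.com"])\n',
    ["ServerConfig"],
)
print(schema, config.fields["port"])
```

Emptying __builtins__ is a guard against accidental misuse, not a hardened sandbox; a determined author can still escape it, so .cconf files should be treated as trusted, reviewed input.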

Compiling

Running the compiler against the production config above:

$ robust-config compile configs/production.cconf -o build/production.json
Compiled: configs/production.cconf -> build/production.json

The output JSON is fully expanded -- all defaults are filled in, all types validated:

{
  "database": {
    "host": "db.prod.internal",
    "port": 5432,
    "username": "myapp",
    "database_name": "myapp_prod",
    "max_connections": 50,
    "ssl_enabled": true
  },
  "server": {
    "port": 8080,
    "bind_address": "0.0.0.0",
    "max_connections": 2000,
    "allowed_origins": [
      "https://example.com",
      "https://app.example.com"
    ]
  },
  "log_level": "WARNING",
  "enable_metrics": true
}
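
The expansion visible in the server block above (bind_address filled in from the schema, explicit values kept) can be modelled roughly as follows. This is a sketch under assumptions: expand, SERVER_FIELDS, and the _NO_DEFAULT sentinel are hypothetical, and the dict is a hand-written stand-in for whatever the compiler builds from the .thrift file.

```python
_NO_DEFAULT = object()  # sentinel: the schema declares no default

# Hypothetical in-memory form of ServerConfig: field name -> schema default.
SERVER_FIELDS = {
    "port": 8080,
    "bind_address": "0.0.0.0",
    "max_connections": 1000,
    "allowed_origins": _NO_DEFAULT,
}


def expand(values: dict, field_defaults: dict) -> dict:
    """Merge explicit values over schema defaults, in schema order."""
    out = {}
    for name, default in field_defaults.items():
        if name in values:
            out[name] = values[name]
        elif default is not _NO_DEFAULT:
            out[name] = default
        # otherwise: unset with no default, so it stays out of the JSON
    return out


production_server = expand(
    {"port": 8080, "max_connections": 2000,
     "allowed_origins": ["https://example.com", "https://app.example.com"]},
    SERVER_FIELDS,
)
print(production_server)
```

With an empty values dict, the same function returns only the three fields that carry defaults, and allowed_origins is left out.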

You can validate without writing output:

$ robust-config validate configs/production.cconf
Valid: configs/production.cconf
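
The validate subcommand lends itself to a pre-merge check over every config in the repository. The helper below is a sketch, not part of robust-config: validate_all is a hypothetical name, the configs/*.cconf layout is taken from the examples above, and it assumes validate signals failure through a non-zero exit code.

```python
import glob
import subprocess


def validate_all(pattern: str = "configs/*.cconf", run=subprocess.run) -> list:
    """Run `robust-config validate` on each match; return the failing paths."""
    failed = []
    for path in sorted(glob.glob(pattern)):
        result = run(["robust-config", "validate", path])
        if result.returncode != 0:
            failed.append(path)
    return failed
```

A CI step could call validate_all() and exit non-zero when the returned list is not empty. The run parameter exists so the loop can be exercised without the real binary.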

What the compiler catches

Type errors fail immediately with a precise field path:

$ robust-config compile /tmp/bad.cconf -o /tmp/bad.json
Error: AppConfig.database.port: expected int, got str

Missing required fields:

Error: AppConfig.database: required field is not set

Unknown fields, which catches renames and typos:

Error: AppConfig.database: unknown field(s) ['db_host'] (not in 'DatabaseConfig')

Invalid enum values:

Error: AppConfig.log_level: 'VERBOSE' is not a valid LogLevel member.
Valid: ['DEBUG', 'ERROR', 'INFO', 'WARNING']

All errors exit with code 1. In a CI pipeline, compilation failure means no deployment. The unknown-field error is the one that catches the rename case. If you change db_host to host in the schema, every stale .cconf file fails loudly on the next build instead of silently carrying a field the schema no longer recognizes.
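
For readers curious how checks like these fit together, here is a minimal sketch of a validator for a single struct. It mirrors the message formats shown above, but everything else is an assumption: ConfigError, validate, and the DATABASE_FIELDS table are hypothetical and hand-written, and the real compiler derives its rules from the .thrift schema.

```python
class ConfigError(Exception):
    """Raised with a field path when a value violates the schema."""


# Hypothetical schema form: field name -> (expected Python type, required?)
DATABASE_FIELDS = {
    "host": (str, True),
    "port": (int, True),
    "username": (str, False),
}


def validate(path: str, struct_name: str, fields: dict, values: dict) -> None:
    """Raise ConfigError for unknown, missing, or mistyped fields."""
    unknown = sorted(set(values) - set(fields))
    if unknown:
        raise ConfigError(
            f"{path}: unknown field(s) {unknown} (not in '{struct_name}')")
    for name, (expected, required) in fields.items():
        if name not in values:
            if required:
                raise ConfigError(f"{path}.{name}: required field is not set")
            continue
        value = values[name]
        # bool is a subclass of int in Python, so reject it explicitly
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            raise ConfigError(
                f"{path}.{name}: expected {expected.__name__}, "
                f"got {type(value).__name__}")


try:
    validate("AppConfig.database", "DatabaseConfig", DATABASE_FIELDS,
             {"host": "db.prod.internal", "port": "5432"})
except ConfigError as err:
    print(f"Error: {err}")
```

Checking for unknown fields before anything else is a deliberate choice in this sketch: a renamed key then surfaces as the rename it is, instead of as a missing required field.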

Generating stubs

Once you have a schema, generate the language-specific accessor code:

$ robust-config generate-stubs schema.thrift --cpp myapp_config.h --python myapp_config.py
C++ stubs: myapp_config.h
Python stubs: myapp_config.py

The generated C++ header turns every struct into a native struct with std::optional<T> for optional fields and value-initialized members for required ones. Enum types become enum class. A load() static method handles JSON deserialization:

// Generated by robust-config from schema.thrift. Do not edit.
#pragma once
#include <cstdint>
#include <optional>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

namespace myapp { namespace config {

enum class LogLevel : int { DEBUG = 0, INFO = 1, WARNING = 2, ERROR = 3 };

struct DatabaseConfig {
    std::string host{};
    int32_t port{5432};
    std::optional<std::string> username{};
    std::optional<std::string> password{};
    std::optional<std::string> database_name{"myapp"};
    std::optional<int32_t>     max_connections{20};
    std::optional<bool>        ssl_enabled{false};

    static DatabaseConfig from_json(const nlohmann::json& j);
    static DatabaseConfig load(const std::string& path);
};

struct AppConfig {
    DatabaseConfig database{};
    ServerConfig   server{};
    std::optional<CacheConfig> cache{};
    std::optional<LogLevel>    log_level{LogLevel::INFO};
    std::optional<bool>        enable_metrics{false};

    static AppConfig from_json(const nlohmann::json& j);
    static AppConfig load(const std::string& path);
};

}} // namespace myapp::config

Using it in C++:

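#include <iostream>

#include "myapp_config.h"

using namespace myapp::config;

int main(int argc, char* argv[]) {
    // the config path is the only argument; bail out early if it is missing
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <config.json>\n";
        return 1;
    }
    auto cfg = AppConfig::load(argv[1]);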

    // required fields -- no optional unwrap needed
    std::cout << "Connecting to " << cfg.database.host
              << ":" << cfg.database.port << "\n";

    // optional field -- value_or() for the fallback
    bool ssl = cfg.database.ssl_enabled.value_or(false);

    // optional nested struct -- check before dereferencing
    if (cfg.cache) {
        start_cache(cfg.cache->host, cfg.cache->port);
    }

    // optional enum
    if (cfg.log_level == LogLevel::DEBUG) {
        enable_verbose_logging();
    }
}

If someone renames database_name to db_name in the schema and regenerates the header, every callsite using .database_name is a compile error -- not a runtime surprise discovered in production.

The Python stubs are dataclasses with Optional[T] annotations, and each enum becomes a str, Enum subclass:

# Generated by robust-config from schema.thrift. Do not edit.
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import json

class LogLevel(str, Enum):
    DEBUG   = "DEBUG"
    INFO    = "INFO"
    WARNING = "WARNING"
    ERROR   = "ERROR"

@dataclass
class DatabaseConfig:
    host:            str
    port:            int = 5432
    username:        Optional[str]  = None
    password:        Optional[str]  = None
    database_name:   Optional[str]  = 'myapp'
    max_connections: Optional[int]  = 20
    ssl_enabled:     Optional[bool] = False

    @classmethod
    def load(cls, path: str) -> "DatabaseConfig": ...

@dataclass
class AppConfig:
    database:       DatabaseConfig
    server:         ServerConfig
    cache:          Optional[CacheConfig]  = None
    log_level:      Optional[LogLevel]     = LogLevel.INFO
    enable_metrics: Optional[bool]         = False

    @classmethod
    def load(cls, path: str) -> "AppConfig": ...

And in use:

from myapp_config import AppConfig, LogLevel

cfg = AppConfig.load("production.json")

print(cfg.database.host)   # str -- always present
print(cfg.database.port)   # int -- always present

if cfg.log_level == LogLevel.DEBUG:
    enable_verbose_logging()

if cfg.cache:
    start_cache(cfg.cache.host, cfg.cache.port)

mypy and most editors will catch accesses to fields that do not exist. The generated stubs carry all the schema defaults, so even if you never look at the .thrift file the dataclass tells you what is optional and what the fallback values are.
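
The bodies of the load() methods are elided in the stubs above. As a rough idea of what such a loader might do, the sketch below builds nested dataclasses from already-parsed JSON. from_dict is a hypothetical helper, and the classes are trimmed-down stand-ins for the generated ones; the actual generated code may look nothing like this.

```python
import dataclasses
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union, get_args, get_origin, get_type_hints


def from_dict(cls, data: dict):
    """Build dataclass `cls` from a dict, recursing into nested structs."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in dataclasses.fields(cls):
        if f.name not in data:
            continue  # absent key: the dataclass default applies
        tp = hints[f.name]
        if get_origin(tp) is Union:  # unwrap Optional[T] to T
            tp = next(a for a in get_args(tp) if a is not type(None))
        value = data[f.name]
        if value is not None and dataclasses.is_dataclass(tp):
            value = from_dict(tp, value)
        elif (value is not None and get_origin(tp) is None
              and isinstance(tp, type) and issubclass(tp, Enum)):
            value = tp(value)  # "WARNING" -> LogLevel.WARNING
        kwargs[f.name] = value
    return cls(**kwargs)


# Trimmed-down stand-ins for the generated classes.
class LogLevel(str, Enum):
    INFO = "INFO"
    WARNING = "WARNING"


@dataclass
class DatabaseConfig:
    host: str
    port: int = 5432


@dataclass
class AppConfig:
    database: DatabaseConfig
    log_level: Optional[LogLevel] = LogLevel.INFO


cfg = from_dict(AppConfig, {"database": {"host": "db.prod.internal"},
                            "log_level": "WARNING"})
print(cfg.database.port, cfg.log_level)
```

A load(path) classmethod would then amount to opening the file, calling json.load, and passing the result to from_dict.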

Running the application

Putting it together -- the Python consumer loading the compiled production config:

$ python src/main.py build/production.json
=== MyApp Configuration ===

[database]
  host:            db.prod.internal
  port:            5432
  username:        myapp
  database:        myapp_prod
  max_connections: 50
  ssl:             yes

[server]
  port:            8080
  bind:            0.0.0.0
  max_connections: 2000
  allowed_origins:
    - https://example.com
    - https://app.example.com

[app]
  metrics:         enabled

Meson integration

The natural home for config compilation is the build system. Here is the complete meson.build for the example project:

project('myapp', 'cpp',
  version: '0.1.0',
  meson_version: '>=0.63.0',
  default_options: ['cpp_std=c++17'],
)

robust_config = find_program('robust-config', required: true)
schema = files('schema.thrift')

# Stubs: regenerate whenever schema.thrift changes
myapp_config_h = custom_target('myapp_config_h',
  input:            schema,
  output:           'myapp_config.h',
  command:          [robust_config, 'generate-stubs', '@INPUT@', '--cpp', '@OUTPUT@'],
  build_by_default: true,
  install:          true,
  install_dir:      get_option('includedir') / 'myapp',
)

# Config files: recompile when DSL or schema changes
production_json = custom_target('production_json',
  input:        'configs/production.cconf',
  output:       'production.json',
  depend_files: schema,
  command:      [robust_config, 'compile', '@INPUT@', '-o', '@OUTPUT@'],
  build_by_default: true,
  install:          true,
  install_dir:      get_option('datadir') / 'myapp' / 'config',
)

development_json = custom_target('development_json',
  input:        'configs/development.cconf',
  output:       'development.json',
  depend_files: schema,
  command:      [robust_config, 'compile', '@INPUT@', '-o', '@OUTPUT@'],
  build_by_default: true,
)

myapp_config_dep = declare_dependency(
  sources:             myapp_config_h,
  include_directories: include_directories('.'),
)
meson.override_dependency('myapp-config', myapp_config_dep)

nlohmann_json = dependency('nlohmann_json',
  fallback: ['nlohmann_json', 'nlohmann_json_dep'],
)

myapp_exe = executable('myapp',
  sources:      'src/main.cpp',
  dependencies: [myapp_config_dep, nlohmann_json],
)

run_target('run',
  command: [myapp_exe, production_json],
  depends: [myapp_exe, production_json],
)

Three things worth pointing out:

depend_files: schema on the config compile targets means that changing schema.thrift automatically invalidates and recompiles every .cconf file. You cannot silently ship a stale config because the build will not succeed with one.

declare_dependency(sources: myapp_config_h) guarantees that any target using myapp_config_dep waits for the header to be generated before compiling, even in a fully parallel build. This is easy to get wrong by hand and tedious to debug when you do.

meson.override_dependency allows a downstream project that pulls this in as a subproject to call dependency('myapp-config') and get everything -- include paths, build ordering, and all -- in a single line.

Building and running:

$ meson setup build
$ meson compile -C build
$ meson compile -C build run
Connecting to db.prod.internal:5432
Metrics: enabled

The run target passes the compiled production.json as an argument to the binary. Meson handles the dependency ordering, so the JSON is always up to date before the binary reads it.

Consuming as a subproject

If another project wants to depend on this one:

myproject/
  meson.build
  subprojects/
    myapp/        <-- git submodule or wrap file
  src/
    consumer.cpp

# myproject/meson.build
project('my-consumer', 'cpp', default_options: ['cpp_std=c++17'])

myapp_dep    = dependency('myapp-config', fallback: ['myapp', 'myapp_config_dep'])
nlohmann_dep = dependency('nlohmann_json')

executable('consumer', 'src/consumer.cpp',
  dependencies: [myapp_dep, nlohmann_dep],
)

// src/consumer.cpp
#include "myapp_config.h"   // path injected by myapp_config_dep

int main(int argc, char* argv[]) {
    auto cfg = myapp::config::AppConfig::load(argv[1]);
    // ...
}

#include "myapp_config.h" resolves to the generated file without any manual path plumbing. The subproject handles it.

What it is not

It is not a secrets manager. .cconf files should not be committed with real passwords in them. Leave sensitive fields unset and populate them at deploy time by injecting values into the JSON, or use a secrets backend that produces JSON compatible with the schema.
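
One way to do the deploy-time injection is a small script that reads the compiled JSON and fills a single field from the environment. This is a sketch, not a feature of robust-config: inject_secret and the MYAPP_DB_PASSWORD variable name are hypothetical, and the inline dict stands in for the contents of the compiled production.json.

```python
import json
import os


def inject_secret(config: dict, dotted_path: str, env_var: str) -> dict:
    """Return a copy of `config` with one field filled from the environment."""
    value = os.environ.get(env_var)
    if value is None:
        raise KeyError(f"environment variable {env_var} is not set")
    result = json.loads(json.dumps(config))  # cheap deep copy of JSON data
    node = result
    *parents, leaf = dotted_path.split(".")
    for key in parents:
        node = node[key]
    node[leaf] = value
    return result


# Stand-in for json.load() on the compiled file; the variable is set
# here only so the example runs on its own.
compiled = {"database": {"host": "db.prod.internal", "port": 5432}}
os.environ.setdefault("MYAPP_DB_PASSWORD", "example-only")
deployed = inject_secret(compiled, "database.password", "MYAPP_DB_PASSWORD")
print(sorted(deployed["database"]))
```

Because password is declared in the schema, the injected JSON still loads through the generated stubs. The secret never appears in the .cconf file or in the repository.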

It is not a runtime config system. The JSON is loaded once at startup. No file watching, no hot reload, no config service. This is intentional -- startup configuration and runtime configuration are different problems and mixing them adds complexity that usually is not worth it.

It is not a templating system. There is no f"{env}-db.internal" in a .cconf file. If you need per-environment string construction, write a generator script that calls the compiler programmatically.
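
Such a generator script could look roughly like this. The compiler's programmatic API is not described here, so this sketch shells out to the robust-config compile command shown earlier instead. render_cconf, compile_env, the template contents, and the choice of writing the generated .cconf next to the output JSON are all assumptions made for illustration.

```python
import json
import subprocess

# Hypothetical template; {host} is the only placeholder.
CCONF_TEMPLATE = '''schema = "../schema.thrift"

config = AppConfig(
    database=DatabaseConfig(host={host}),
    server=ServerConfig(port=8080),
)
'''


def render_cconf(env: str) -> str:
    """Build the per-environment hostname outside the .cconf file."""
    host = f"{env}-db.internal"
    # json.dumps yields a double-quoted literal that is also valid Python
    return CCONF_TEMPLATE.format(host=json.dumps(host))


def compile_env(env: str, out_dir: str = "build") -> None:
    """Write the generated .cconf, then compile it with the CLI."""
    cconf_path = f"{out_dir}/{env}.cconf"
    with open(cconf_path, "w", encoding="utf-8") as fh:
        fh.write(render_cconf(env))
    subprocess.run(
        ["robust-config", "compile", cconf_path, "-o", f"{out_dir}/{env}.json"],
        check=True,  # a validation failure raises CalledProcessError
    )


print(render_cconf("staging"))
```

The generated file is still an ordinary .cconf, so it goes through exactly the same validation as a hand-written one.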

The core idea

Configuration has a schema whether you declare it or not. Every application has a mental model of what fields exist, what their types are, which are required, and what the defaults should be. robust-config makes that model explicit -- in a Thrift file -- and then enforces it throughout the toolchain: in the compiler that validates .cconf files, in the generated C++ stubs that give you type-checked access at compile time, and in the generated Python dataclasses that give you IDE completion and mypy coverage.

The result is that configuration mistakes become build failures rather than production incidents.