Every sufficiently long-lived application eventually develops a configuration problem. It starts
innocuously: a JSON file, maybe a .env file, a YAML blob checked into the repo. A
few lookups at startup and everything is fine. Then the app grows. Multiple environments. Multiple
services. A new engineer who does not know which keys are required. A deployment that fails at 2am
because someone put "8080" (a string) where 8080 (an integer) was
expected.
The configuration problem is ultimately a typing problem. The data has structure -- it has always had structure -- but nothing enforces it. I built robust-config to fix that, using a combination of Apache Thrift for the schema and a small Python DSL for per-environment values.
There are four moving parts:
- A .thrift file declaring the config structure with types and required/optional semantics.
- A .cconf file with per-environment values, written in a minimal Python-style DSL.
- A compiler that validates the .cconf against the schema and emits a clean JSON file.
- Generated C++ and Python stubs that give the application typed access to the compiled JSON.
The JSON file is the contract between the build system and the application. The schema is the contract between the engineer writing config and the engineer writing the application. The compiler enforces both.
Config structure lives in a .thrift file. If you already use Thrift for RPC, the
syntax is identical. Here is a representative schema for a small web application:
struct DatabaseConfig {
1: required string host
2: required i32 port = 5432
3: optional string username
4: optional string password
5: optional string database_name = "myapp"
6: optional i32 max_connections = 20
7: optional bool ssl_enabled = false
}
struct ServerConfig {
1: required i32 port = 8080
2: optional string bind_address = "0.0.0.0"
3: optional i32 max_connections = 1000
4: optional list<string> allowed_origins
}
enum LogLevel {
DEBUG = 0
INFO = 1
WARNING = 2
ERROR = 3
}
struct CacheConfig {
1: required string host
2: required i32 port
}
struct AppConfig {
1: required DatabaseConfig database
2: required ServerConfig server
3: optional CacheConfig cache
4: optional LogLevel log_level = LogLevel.INFO
5: optional bool enable_metrics = false
}
required fields must appear in the .cconf file. optional
fields that are absent are omitted from the output JSON entirely -- the application falls back to
compiled-in defaults. Defaults declared in the schema are baked into the generated stubs as native
C++ initializers and Python dataclass defaults, so they never need to be duplicated.
A .cconf file names the schema and declares a single config expression. The struct
classes are automatically in scope -- you do not import anything:
schema = "../schema.thrift"
config = AppConfig(
database=DatabaseConfig(
host="db.prod.internal",
port=5432,
username="myapp",
database_name="myapp_prod",
max_connections=50,
ssl_enabled=True,
),
server=ServerConfig(
port=8080,
max_connections=2000,
allowed_origins=[
"https://example.com",
"https://app.example.com",
],
),
log_level="WARNING",
enable_metrics=True,
)
The development config for the same app is much shorter -- optional fields and fields with schema defaults are simply omitted. You only write what differs from the defaults.
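For illustration, a development .cconf in that spirit might look like this (the specific values are hypothetical):

```python
# development.cconf (hypothetical values; the struct classes are injected by the compiler)
schema = "../schema.thrift"

config = AppConfig(
    database=DatabaseConfig(host="localhost"),
    server=ServerConfig(port=8080),
)
```

Everything not written here comes from the schema defaults or is omitted from the output JSON.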
The struct classes are not imported from anywhere. The compiler parses the schema first, then
injects a class for every struct into the execution namespace before running your file. This means
you cannot use the struct classes outside of a .cconf file, which is intentional --
it also means there is no import os, no subprocess.run, no way to call
out to the network. A .cconf file is a configuration declaration, not a program.
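The mechanism can be sketched in plain Python. This is an illustration of the pattern, not the actual compiler internals; DatabaseConfig here stands in for a class parsed from the schema:

```python
import dataclasses

# Stand-in for a struct class the compiler would build from the schema.
@dataclasses.dataclass
class DatabaseConfig:
    host: str
    port: int = 5432

# Execution namespace: only the struct classes, no builtins. With an empty
# __builtins__, `import os` raises ImportError, and open()/exec() are unavailable
# inside the executed source.
namespace = {"__builtins__": {}, "DatabaseConfig": DatabaseConfig}

source = 'config = DatabaseConfig(host="localhost")'
exec(compile(source, "example.cconf", "exec"), namespace)
config = namespace["config"]
```

Calling a dataclass constructor needs no builtins, so declarations work while imports and I/O do not.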
Running the compiler against the production config above:
$ robust-config compile configs/production.cconf -o build/production.json
Compiled: configs/production.cconf -> build/production.json
The output JSON is fully expanded -- all defaults are filled in, all types validated:
{
"database": {
"host": "db.prod.internal",
"port": 5432,
"username": "myapp",
"database_name": "myapp_prod",
"max_connections": 50,
"ssl_enabled": true
},
"server": {
"port": 8080,
"bind_address": "0.0.0.0",
"max_connections": 2000,
"allowed_origins": [
"https://example.com",
"https://app.example.com"
]
},
"log_level": "WARNING",
"enable_metrics": true
}
You can validate without writing output:
$ robust-config validate configs/production.cconf
Valid: configs/production.cconf
Type errors fail immediately with a precise field path:
$ robust-config compile /tmp/bad.cconf -o /tmp/bad.json
Error: AppConfig.database.port: expected int, got str
Missing required fields:
Error: AppConfig.database: required field is not set
Unknown fields, which catches renames and typos:
Error: AppConfig.database: unknown field(s) ['db_host'] (not in 'DatabaseConfig')
Invalid enum values:
Error: AppConfig.log_level: 'VERBOSE' is not a valid LogLevel member. Valid: ['DEBUG', 'ERROR', 'INFO', 'WARNING']
All errors exit with code 1. In a CI pipeline, compilation failure means no deployment. That last
class of error -- the unknown field -- is what catches the rename case. If you change
db_host to host in the schema, every stale .cconf file fails
loudly on the next build instead of silently ignoring the field it no longer recognizes.
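The shape of that validation is easy to picture: a walk over the schema that threads a field path through every check. A simplified sketch (a flat schema, not the compiler's actual code; the real tool recurses into nested structs):

```python
# Simplified sketch: a flat schema maps field name -> (type, required).
# The error strings mirror the compiler's output format shown above.
def validate(value: dict, schema: dict, path: str) -> None:
    for field, (ftype, required) in schema.items():
        if field not in value:
            if required:
                raise ValueError(f"{path}.{field}: required field is not set")
            continue
        if not isinstance(value[field], ftype):
            raise ValueError(f"{path}.{field}: expected {ftype.__name__}, "
                             f"got {type(value[field]).__name__}")
    unknown = sorted(set(value) - set(schema))
    if unknown:
        raise ValueError(f"{path}: unknown field(s) {unknown}")

db_schema = {"host": (str, True), "port": (int, True)}
validate({"host": "db.prod.internal", "port": 5432}, db_schema, "AppConfig.database")
```

Passing `"8080"` for port raises `AppConfig.database.port: expected int, got str`; passing a `db_host` key raises the unknown-field error.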
Once you have a schema, generate the language-specific accessor code:
$ robust-config generate-stubs schema.thrift --cpp myapp_config.h --python myapp_config.py
C++ stubs: myapp_config.h
Python stubs: myapp_config.py
The generated C++ header turns every struct into a native struct with std::optional<T>
for optional fields and value-initialized members for required ones. Enum types become
enum class. A load() static method handles JSON deserialization:
// Generated by robust-config from schema.thrift. Do not edit.
#pragma once
#include <optional>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>
namespace myapp { namespace config {
enum class LogLevel : int { DEBUG = 0, INFO = 1, WARNING = 2, ERROR = 3 };
struct DatabaseConfig {
std::string host{};
int32_t port{5432};
std::optional<std::string> username{};
std::optional<std::string> database_name{"myapp"};
std::optional<int32_t> max_connections{20};
std::optional<bool> ssl_enabled{false};
static DatabaseConfig from_json(const nlohmann::json& j);
static DatabaseConfig load(const std::string& path);
};
// ServerConfig and CacheConfig elided for brevity.
struct AppConfig {
DatabaseConfig database{};
ServerConfig server{};
std::optional<CacheConfig> cache{};
std::optional<LogLevel> log_level{LogLevel::INFO};
std::optional<bool> enable_metrics{false};
static AppConfig from_json(const nlohmann::json& j);
static AppConfig load(const std::string& path);
};
}} // namespace myapp::config
Using it in C++:
#include <iostream>
#include "myapp_config.h"
using namespace myapp::config;
int main(int argc, char* argv[]) {
auto cfg = AppConfig::load(argv[1]);
// required fields -- no optional unwrap needed
std::cout << "Connecting to " << cfg.database.host
<< ":" << cfg.database.port << "\n";
// optional field -- value_or() for the fallback
bool ssl = cfg.database.ssl_enabled.value_or(false);
// optional nested struct -- check before dereferencing
if (cfg.cache) {
start_cache(cfg.cache->host, cfg.cache->port);
}
// optional enum
if (cfg.log_level == LogLevel::DEBUG) {
enable_verbose_logging();
}
}
If someone renames database_name to db_name in the schema and regenerates
the header, every callsite using .database_name is a compile error -- not a runtime
surprise discovered in production.
The Python stubs are dataclasses with Optional[T] annotations and enum members as a
str, Enum subclass:
# Generated by robust-config from schema.thrift. Do not edit.
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import json
class LogLevel(str, Enum):
DEBUG = "DEBUG"
INFO = "INFO"
WARNING = "WARNING"
ERROR = "ERROR"
@dataclass
class DatabaseConfig:
host: str
port: int = 5432
username: Optional[str] = None
database_name: Optional[str] = "myapp"
max_connections: Optional[int] = 20
ssl_enabled: Optional[bool] = False
@classmethod
def load(cls, path: str) -> "DatabaseConfig": ...
# ServerConfig and CacheConfig elided for brevity.
@dataclass
class AppConfig:
database: DatabaseConfig
server: ServerConfig
cache: Optional[CacheConfig] = None
log_level: Optional[LogLevel] = LogLevel.INFO
enable_metrics: Optional[bool] = False
@classmethod
def load(cls, path: str) -> "AppConfig": ...
And in use:
from myapp_config import AppConfig, LogLevel
cfg = AppConfig.load("production.json")
print(cfg.database.host) # str -- always present
print(cfg.database.port) # int -- always present
if cfg.log_level == LogLevel.DEBUG:
enable_verbose_logging()
if cfg.cache:
start_cache(cfg.cache.host, cfg.cache.port)
mypy and most editors will catch accesses to fields that do not exist. The generated
stubs carry all the schema defaults, so even if you never look at the .thrift file
the dataclass tells you what is optional and what the fallback values are.
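How a generated load() fills in defaults can be sketched with a plain dataclass. This is a hypothetical shape; from_dict is an illustrative helper, not necessarily the generated API:

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class DatabaseConfig:
    host: str
    port: int = 5432
    username: Optional[str] = None

    @classmethod
    def from_dict(cls, d: dict) -> "DatabaseConfig":
        # Keys absent from the compiled JSON fall back to the dataclass
        # defaults, which carry the schema defaults.
        return cls(**d)

    @classmethod
    def load(cls, path: str) -> "DatabaseConfig":
        with open(path) as f:
            return cls.from_dict(json.load(f))

cfg = DatabaseConfig.from_dict({"host": "localhost"})
```

Because the defaults live on the dataclass, a JSON file that only says `{"host": "localhost"}` still yields a fully populated object.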
Putting it together -- the Python consumer loading the compiled production config:
$ python src/main.py build/production.json
=== MyApp Configuration ===
[database]
host: db.prod.internal
port: 5432
username: myapp
database: myapp_prod
max_connections: 50
ssl: yes
[server]
port: 8080
bind: 0.0.0.0
max_connections: 2000
allowed_origins:
- https://example.com
- https://app.example.com
[app]
metrics: enabled
The natural home for config compilation is the build system. Here is the complete
meson.build for the example project:
project('myapp', 'cpp',
version: '0.1.0',
meson_version: '>=0.63.0',
default_options: ['cpp_std=c++17'],
)
robust_config = find_program('robust-config', required: true)
schema = files('schema.thrift')
# Stubs: regenerate whenever schema.thrift changes
myapp_config_h = custom_target('myapp_config_h',
input: schema,
output: 'myapp_config.h',
command: [robust_config, 'generate-stubs', '@INPUT@', '--cpp', '@OUTPUT@'],
build_by_default: true,
install: true,
install_dir: get_option('includedir') / 'myapp',
)
# Config files: recompile when DSL or schema changes
production_json = custom_target('production_json',
input: 'configs/production.cconf',
output: 'production.json',
depend_files: schema,
command: [robust_config, 'compile', '@INPUT@', '-o', '@OUTPUT@'],
build_by_default: true,
install: true,
install_dir: get_option('datadir') / 'myapp' / 'config',
)
development_json = custom_target('development_json',
input: 'configs/development.cconf',
output: 'development.json',
depend_files: schema,
command: [robust_config, 'compile', '@INPUT@', '-o', '@OUTPUT@'],
build_by_default: true,
)
myapp_config_dep = declare_dependency(
sources: myapp_config_h,
include_directories: include_directories('.'),
)
meson.override_dependency('myapp-config', myapp_config_dep)
nlohmann_json = dependency('nlohmann_json',
fallback: ['nlohmann_json', 'nlohmann_json_dep'],
)
myapp_exe = executable('myapp',
sources: 'src/main.cpp',
dependencies: [myapp_config_dep, nlohmann_json],
)
run_target('run',
command: [myapp_exe, production_json],
depends: [myapp_exe, production_json],
)
Three things worth pointing out:
depend_files: schema on the config compile targets means that changing
schema.thrift automatically invalidates and recompiles every .cconf file.
You cannot silently ship a stale config because the build will not succeed with one.
declare_dependency(sources: myapp_config_h) guarantees that any target using
myapp_config_dep waits for the header to be generated before compiling, even in a
fully parallel build. This is easy to get wrong by hand and tedious to debug when you do.
meson.override_dependency allows a downstream project that pulls this in as a
subproject to call dependency('myapp-config') and get everything -- include paths,
build ordering, and all -- in a single line.
Building and running:
$ meson setup build
$ meson compile -C build
$ meson compile -C build run
Connecting to db.prod.internal:5432
Cache: cache.prod.internal:6379
Metrics: enabled
The run target passes the compiled production.json as an argument to the
binary. Meson handles the dependency ordering, so the JSON is always up to date before the binary
reads it.
If another project wants to depend on this one:
myproject/
meson.build
subprojects/
myapp/ <-- git submodule or wrap file
src/
consumer.cpp
# myproject/meson.build
project('my-consumer', 'cpp', default_options: ['cpp_std=c++17'])
myapp_dep = subproject('myapp').get_variable('myapp_config_dep')
nlohmann_dep = dependency('nlohmann_json')
executable('consumer', 'src/consumer.cpp',
dependencies: [myapp_dep, nlohmann_dep],
)
// src/consumer.cpp
#include "myapp_config.h" // path injected by myapp_config_dep
int main(int argc, char* argv[]) {
auto cfg = myapp::config::AppConfig::load(argv[1]);
// ...
}
#include "myapp_config.h" resolves to the generated file without any manual path
plumbing. The subproject handles it.
It is not a secrets manager. .cconf files should not be committed with real passwords
in them. Leave sensitive fields unset and populate them at deploy time by injecting values into the
JSON, or use a secrets backend that produces JSON compatible with the schema.
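One way to do that deploy-time injection, as a sketch: inject_secrets and the dotted-path convention below are hypothetical, not part of robust-config:

```python
import copy
import json

def inject_secrets(config: dict, secrets: dict) -> dict:
    """Overlay secret values onto the compiled config, keyed by dotted paths."""
    merged = copy.deepcopy(config)
    for dotted, value in secrets.items():
        node = merged
        *parents, leaf = dotted.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return merged

compiled = json.loads('{"database": {"host": "db.prod.internal", "port": 5432}}')
merged = inject_secrets(compiled, {"database.password": "from-the-vault"})
```

The output still conforms to the schema as long as the injected fields exist on it, so the application-side stubs need no changes.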
It is not a runtime config system. The JSON is loaded once at startup. No file watching, no hot reload, no config service. This is intentional -- startup configuration and runtime configuration are different problems and mixing them adds complexity that usually is not worth it.
It is not a templating system. There is no f"{env}-db.internal" in a
.cconf file. If you need per-environment string construction, write a generator
script that calls the compiler programmatically.
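A generator script in that spirit might render the .cconf text itself and then shell out to the CLI. A sketch, where the host-naming scheme and file layout are assumptions:

```python
import subprocess

def render_cconf(env: str) -> str:
    # Hypothetical naming scheme: each environment gets its own database host.
    return (
        'schema = "../schema.thrift"\n'
        "config = AppConfig(\n"
        f'    database=DatabaseConfig(host="{env}-db.internal"),\n'
        "    server=ServerConfig(),\n"
        ")\n"
    )

def build_all(envs=("production", "staging", "development")) -> None:
    for env in envs:
        path = f"configs/{env}.cconf"
        with open(path, "w") as f:
            f.write(render_cconf(env))
        # Compile each rendered file with the robust-config CLI shown earlier.
        subprocess.run(
            ["robust-config", "compile", path, "-o", f"build/{env}.json"],
            check=True,
        )
```

The string construction happens in the generator, so each emitted .cconf stays a plain declaration and still goes through schema validation.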
Configuration has a schema whether you declare it or not. Every application has a mental model of
what fields exist, what their types are, which are required, and what the defaults should be.
robust-config makes that model explicit -- in a Thrift file -- and then enforces it throughout the
toolchain: in the compiler that validates .cconf files, in the generated C++ stubs
that give you type-checked access at compile time, and in the generated Python dataclasses that
give you IDE completion and mypy coverage.
The result is that configuration mistakes become build failures rather than production incidents.