How to test config before restarting service?

What happened?

I see no (apparent) way to test a changed configuration before restarting pomerium.
I tried running a second instance of the service on a different port, to see if it would start without errors.
The existing instance creates a socket file and some other files under /tmp. I’m guessing this blocks a second instance from starting, unless there’s some way to override these paths in the config…?

What did you expect to happen?

Either a command line option (e.g. -verify_config) or being able to interactively start pomerium on a different TCP port to verify that the config is OK

How’d it happen?

  1. Copied working config.yaml to configtest.yaml
  2. Changed address to 0.0.0.0:4443
  3. Ran ‘pomerium -config configtest.yaml’

What’s your environment like?

  • Pomerium version (retrieve with pomerium --version): 0.20.0-1668445494+9413123c
  • Server Operating System/Architecture/Cloud: RHEL 8.7

What’s your config.yaml?

# This is the test config file. It's similar to the original config, except that 443 has been changed to 4443
address: 0.0.0.0:4443

authenticate_service_url: https://auth.my_internal_service.com

# https://www.pomerium.com/docs/reference/certificates.html
autocert: false
certificates:
  - cert: /etc/pki/http/my_configured_cert.crt
    key: /etc/pki/http/my_configured_cert.key

shared_secret: <generated secret>
cookie_secret: <another generated secret>
idp_provider: oidc
idp_provider_url: https://sso.my_internal_service.com/auth/realms/pomerium
idp_client_id: pomerium-client-001
idp_client_secret: <sso client secret>

routes:
- from: https://test.my_internal_service.com
  to: https://internal_test.my_internal_service.com
  tls_skip_verify: true
  policy:
  - allow:
      or:
      - domain:
          is: my_internal_domain.com

What did you see in the logs?

{"service":"envoy","name":"envoy","time":"2023-01-10T12:47:13+01:00","message":"unable to bind domain socket with base_id=86667480, id=0, errno=98 (see --base-id option)"}
{"level":"info","address":"127.0.0.1:34647","time":"2023-01-10T12:47:13+01:00","message":"grpc: dialing"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"time":"2023-01-10T12:47:13+01:00","message":"enabled authorize service"}
{"level":"info","Algorithm":"ES256","KeyID":"<REDACTED>","Public Key":{"use":"sig","kty":"EC","kid":"<REDACTED>","crv":"P-256","alg":"ES256","x":"<REDACTED>","y":"<REDACTED>"},"time":"2023-01-10T12:47:13+01:00","message":"authorize: signing key"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"time":"2023-01-10T12:47:13+01:00","message":"enabled databroker service"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"address":"127.0.0.1:34647","time":"2023-01-10T12:47:13+01:00","message":"grpc: dialing"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"time":"2023-01-10T12:47:13+01:00","message":"enabled proxy service"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"addr":"127.0.0.1:40931","time":"2023-01-10T12:47:13+01:00","message":"starting control-plane gRPC server"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"addr":"127.0.0.1:33583","time":"2023-01-10T12:47:13+01:00","message":"starting control-plane http server"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"addr":"127.0.0.1:42417","time":"2023-01-10T12:47:13+01:00","message":"starting control-plane debug server"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"addr":"127.0.0.1:39099","time":"2023-01-10T12:47:13+01:00","message":"starting control-plane metrics server"}
{"level":"info","name":"identity_manager","duration":30000,"time":"2023-01-10T12:47:13+01:00","message":"acquire lease"}
{"level":"info","time":"2023-01-10T12:47:13+01:00","message":"using in-memory store"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"service":"identity_manager","syncer_id":"identity_manager","syncer_type":"","time":"2023-01-10T12:47:13+01:00","message":"initial sync"}
{"level":"info","type":"","time":"2023-01-10T12:47:13+01:00","message":"sync latest"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"service":"identity_manager","syncer_id":"identity_manager","syncer_type":"","time":"2023-01-10T12:47:13+01:00","message":"listening for updates"}
{"level":"info","config_file_source":"/etc/pomerium/configtest.yaml","bootstrap":true,"service":"identity_manager","sessions":0,"users":0,"time":"2023-01-10T12:47:13+01:00","message":"initial sync complete"}
{"level":"info","server_version":16683166791686608810,"record_version":0,"time":"2023-01-10T12:47:13+01:00","message":"sync"}
{"level":"fatal","pid":1059873,"time":"2023-01-10T12:47:14+01:00","message":"envoy: subprocess exited"}

Additional context

We use an external configuration management system that edits config.yaml and restarts the pomerium service if the yaml contents have changed. I would like to be able to verify that no breaking changes have been introduced to the file, so we can elect to restart the service or roll back the changes.

I don’t know if changing the socket path is sufficient to be able to run two instances of the service on one server, but in /tmp/pomerium-envoyxxxxxxxx/envoy-config.yaml I see the settings for it, though I’ve been unable to modify them with the bootstrap options (Envoy Bootstrap Options | Pomerium)

Pomerium provides a programmatic config API that should be used for that purpose instead; it supports concurrent modifications, config validation, and more.

Hi, and thanks for the response.

Isn’t the API reserved for the enterprise version? Is there any way to handle this on a purely open source installation?

Also, won’t the enterprise version require a config.yaml for some sort of base configuration?

A Kubernetes deployment naturally gives you declarative configuration updates as part of open source. You may monitor the /status of the Pomerium CRD to see whether a config update was applied or not.

In the enterprise deployment you have a very minimal config.yaml that only contains bootstrap parameters, which are largely static and almost never change. Routes, access policies, and external data sources are configured via the console API or UI.

Ok, so if I understand correctly, there’s no way to handle this on a standard open source RPM installation? And there’s no config parameter to change the socket path that’s shown in the envoy-config.yaml?

"admin": {
   "address": {
      "pipe": {
         "mode": 384,
         "path": "/tmp/pomerium-envoy-admin.sock"
      }
   },
   "profilePath": "/dev/null"
},

(provided that would be sufficient to allow a second pomerium instance to run)

You may designate a different temp dir via the standard TMPDIR environment variable.
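For illustration (the directory and config path here are examples, not paths from this thread): TMPDIR is honored by the standard temp-file helpers, so a throwaway test instance could in principle be pointed at its own directory:

```shell
# Example only: point a second, throwaway instance at its own temp dir.
#   TMPDIR=/tmp/pomerium-test pomerium -config /etc/pomerium/configtest.yaml
# TMPDIR is honored by the usual temp-file helpers, e.g. mktemp:
mkdir -p /tmp/pomerium-test
TMPDIR=/tmp/pomerium-test mktemp -t demo.XXXXXX   # path lands under /tmp/pomerium-test
```

Whether this is enough to isolate every file the service creates is a separate question.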

This also bothered me, and I reported it here but nothing ever came of it.

I ended up writing a Bash script that:
(1) Parses the YAML to make sure there are no syntax errors that would cause pomerium to crash
(2) Backs up the running config
(3) Updates the config and restarts pomerium
(4) Monitors for a crash
(5) If a crash is detected, rolls back to the backup config and restarts pomerium again
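The steps above can be sketched roughly as follows. This is a hedged sketch, not the author's actual script: all paths are examples, and `restart_service`/`service_is_up` are stand-ins for the real commands (e.g. `systemctl restart pomerium` / `systemctl is-active --quiet pomerium`, or Docker Swarm equivalents). Step (1), the YAML syntax check, would run before `deploy` is called.

```shell
#!/usr/bin/env bash
# Sketch of the backup/restart/rollback flow. Example paths; the
# service commands below are stand-ins so the sketch is self-contained.
set -u

CONFIG=config.yaml          # live config (example path)
CANDIDATE=configtest.yaml   # candidate config from config management
BACKUP=config.yaml.bak

restart_service() { echo "restarting pomerium"; }  # e.g. systemctl restart pomerium
service_is_up()   { true; }                        # e.g. systemctl is-active --quiet pomerium

deploy() {
  cp "$CONFIG" "$BACKUP"       # (2) back up the running config
  cp "$CANDIDATE" "$CONFIG"    # (3) install the candidate and restart
  restart_service
  sleep 1                      # (4) give it a moment, then check for a crash
  if ! service_is_up; then
    cp "$BACKUP" "$CONFIG"     # (5) roll back and restart again
    restart_service
    echo "rolled back" >&2
    return 1
  fi
  echo "deployed"
}
```

In a real deployment the sleep-and-check in step (4) would typically loop for longer than one second, since envoy can exit a little while after startup.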

This took care of 75% of my issues. I’m on Docker Swarm and not k8s, but perhaps such an approach would help you.

Unfortunately the Enterprise Console crashes unless the config is 100% valid, too… I’ve also learned the release notes are unreliable and don’t list all breaking changes. At this point, if I can deploy an upgrade/config change with fewer than a dozen attempts/crashes I call it a win :smiley:

Thanks, but setting TMPDIR didn’t change the path of the socket file, just the envoy-config folder.

Hmm… Seems like my goal is less obtainable than I hoped for. Anyway, thanks for the input. I guess a solution like yours may be the best approach atm. :slight_smile:

Pomerium core will reject an invalid config on first launch.

If you supply an invalid config after Pomerium has started, it will simply be rejected when the config change is detected.

Pomerium Enterprise has a very minimal config that is not watched for updates and is expected to be correct: once you have the console up, there’s pretty much nothing else to configure via the config file; you perform subsequent system configuration via the UI or API.

If you need to perform routine mutations of the config (altering routes and policies), you should:

  • configure the core and enterprise bootstrap config and do not update it
  • use the existing programmatic APIs, either via Kubernetes Ingress or the Enterprise API, to perform config mutations.

Please help us understand your config validation needs better: do you store your config and secrets externally, e.g. in git and a secret management solution, and how do you propagate them to pomerium currently?

Would something like running pomerium with an additional option --dry-run work for your purposes?

We use Puppet to build the config.yaml from several sources (variables for most settings and a dictionary/hash for the routes). So when the config is changed somehow (most likely after somebody changes or adds a route), Puppet restarts the service. Then, if the config is invalid, Pomerium won’t start again.

If you supply an invalid config after Pomerium has started, it will simply be rejected when the config change is detected.

Do I understand you correctly that Pomerium will re-read the config.yaml without a restart, but only if the configuration is ok?

Would something like running pomerium with an additional option --dry-run work for your purposes?

If the dry-run option checked the config and returned exit code 0 or 1 depending on whether it was good or bad, then yes, that would be useful in this scenario.
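That exit-code check could be scripted like this. This is a hypothetical sketch: a `--dry-run` flag with these semantics does not exist as of 0.20.x, and the function below is a stub standing in for the real binary so the example is self-contained:

```shell
# Stub standing in for a hypothetical `pomerium --dry-run` that exits 0
# for a valid config and non-zero otherwise. Drop this on a real host.
pomerium() { [ "$1" = "--dry-run" ] && return 0; }

# Config management could gate the restart on the exit code:
if pomerium --dry-run -config configtest.yaml; then
  echo "config OK, safe to restart"
else
  echo "config invalid, keeping current config" >&2
fi
```

With Puppet, such a check could be wired in as a `validate_cmd` on the config file resource, so an invalid file never replaces the live one.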

By the way, is it possible (in Pomerium core) to split the config into multiple files, like putting the routes in a separate one?

Yes, pomerium watches not just the config file but also all other referenced files for changes.

Please help us understand your config validation needs better: do you store your config and secrets externally, e.g. in git and a secret management solution, and how do you propagate them to pomerium currently?

Note: I have no affiliation to torch.

I’ve got maybe 40 services running on Docker Swarm, of which half are in-house and half are external. Basically all config is stored in git, using text files + Docker volume mounts, envvars, or Swarm secrets as appropriate. This works great. Pomerium replaced a homegrown auth + nginx solution for us, which was configured with text files. The initial setup was frustrating, because pomerium crashes on any invalid config with arcane error messages. Some of that was user error I could have resolved by reading the documentation more carefully, but sometimes pomerium’s documentation or code was wrong. Nginx requires magical incantations, too, but
(1) nginx -t to test the config works very well
(2) StackOverflow and friends supplement the official docs
(3) if, e.g., a single nginx route is misconfigured, the rest remains running; Pomerium is all-or-nothing

Once I get everything set up, life is good, and pomerium can fade into the background for a few months. The breaking changes are rough, though; I’ve experienced 4 in the past year or so. They are undocumented in the Release Notes, there’s basically no deprecation period, they break the whole install in one go, and it’s difficult to see them coming since they’re discussed in secret on the internal bug tracker.

We’re a small shop and don’t have the scale to need Enterprise features. I’m happy to pay for an Enterprise license as a support contract and stay on the OSS Core. Doing everything in the GUI would mean losing git, losing the ability to roll back bad config easily, maintaining another database (somewhat incompatible with the Core PSQL), using much sparser documentation, and worrying about recovery when the console breaks. Maybe I’m a stubborn fool, of course :slight_smile:

Would something like running pomerium with an additional option --dry-run work for your purposes?

Yes, this would be great. It’s basically what #3222 is asking for and would allow me to test quickly before deploying. Even better if it gives useful error messages and has an output (or error code) that’s easily scriptable. Thanks for your consideration.

Yes pomerium watches not just the config file but also all other referenced files for changes.

In my experience, the hot-reload of config has never worked with volume mounts on Docker. (It’s probably a Docker problem where the underlying filesystem change events are not coming all the way up.)