No healthy upstream with all-in-one container

What happened?

I would like to set up Pomerium to handle SSO for several Docker-based services. The goal is to have it act as a proxy in front of the other services and protect them with simple SSO, where each service runs in its own docker compose deployment and they connect over a bridge network.
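
As a rough sketch of that layout (the network and service names below are placeholders, not my actual ones): a user-defined bridge network is created once on the host with docker network create shared-proxy-net, and each separate docker-compose deployment declares it as external, so the proxy can reach the other containers by name.

version: "3"
services:
  someservice:                   # placeholder; each deployment lists its own services here
    image: strm/helloworld-http
    networks:
      - shared-proxy-net
networks:
  shared-proxy-net:
    external: true               # join the pre-created network instead of creating a per-project one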

What did you expect to happen?

  1. SSO via https://site.domain.com
  2. Access services through URI prefixes such as /qbt
  3. Service accessed

How’d it happen?

  1. Go to https://site.domain.com/qbt
  2. Get SSO prompt and authenticate
  3. Saw the error “no healthy upstream”

What’s your environment like?

  • Pomerium version (retrieve with pomerium --version): v0.19.1
  • Server Operating System/Architecture/Cloud: Docker 20.10.18 on Fedora Core 35

What’s your config.yaml?

# See detailed configuration settings : https://www.pomerium.com/docs/reference/


# this is the domain the identity provider will callback after a user authenticates
authenticate_service_url: https://auth.domain.com:8443/

####################################################################################
# Certificate settings:  https://www.pomerium.com/docs/reference/certificates.html #
# The example below assumes a certificate and key file will be mounted to a volume #
# available to the  Docker image.                                                  #
####################################################################################
certificate_file: /pomerium/cert.pem
certificate_key_file: /pomerium/privkey.pem


##################################################################################
# Identity provider settings : https://www.pomerium.com/docs/identity-providers/ #
# The keys required in this section vary depending on your IdP. See the          #
# appropriate docs for your IdP to configure Pomerium accordingly.               #
##################################################################################
idp_provider: google
idp_client_id: supersecret
idp_client_secret: supersecret
idp_service_account: supersecret

# Generate 256 bit random keys  e.g. `head -c32 /dev/urandom | base64`
cookie_secret: supersecret

# https://pomerium.com/reference/#routes
routes:
  - from: https://auth.domain.com:8443
    to: http://verify
    policy:
      - allow:
          or:
            - domain:
                is: foo.com
    pass_identity_headers: true    
  - from: https://site.domain.com:8443
    to: http://web1
    policy:
      - allow:
          or:
            - domain:
                is: chipnick.com
    pass_identity_headers: true
    allow_websockets: true
  - from: https://site.domain.com:8443
    to: http://192.168.178.01:8080/qbt
    prefix: /qbt
    policy:
      - allow:
          or:
            - domain:
                is: foo.com
    pass_identity_headers: true
  - from: https://site.domain.com:8443
    to: http://192.168.178.01:8989/sonarr
    prefix: /sonarr
    policy:
      - allow:
          or:
            - domain:
                is: foo.com
    pass_identity_headers: true

What did you see in the logs?

pomerium    | {"level":"info","service":"envoy","upstream-cluster":"","method":"GET","authority":"site.domain.com:8443","path":"/qbt/","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","referer":"","forwarded-for":"192.168.178.01","request-id":"b7ca0787-e7c0-4fc2-a689-06a00f307120","duration":26.18886,"size":19,"response-code":503,"response-code-details":"no_healthy_upstream","time":"2022-10-03T20:44:39+02:00","message":"http-request"}

Additional context

I tried adding a simple hello-world web server container and received the same error. The docker-compose file is below:

version: "3"
services:
  pomerium:
    image: pomerium/pomerium:latest
    container_name: pomerium
    network_mode: bridge
    environment:
      - PUID=960
      - PGID=1002
      - TZ=Europe/Berlin
    volumes:
      - /var/opt/ssl/certificates/site.domain.com.crt:/pomerium/cert.pem:ro
      - /var/opt/ssl/certificates/site.domain.com.key:/pomerium/privkey.pem:ro
      - ./config/config.yaml:/pomerium/config.yaml:ro
    ports:
      - 8443:443
    restart: unless-stopped
  verify:
    image: pomerium/verify:latest
    expose:
      - 8000
  web1:
    image: strm/helloworld-http
    network_mode: bridge
    expose:
      - 80

You are overwriting the route to your authentication service; you do not need to do that. That route is handled internally by Pomerium in all-in-one mode.

authenticate_service_url: https://auth.domain.com:8443/

Thus, if you want to see the results of verify, you need to define it as a separate route, i.e.:

routes:
  - from: https://verify.domain.com:8443
    to: http://verify
    policy:
      - allow:
          or:
            - domain:
                is: foo.com
    pass_identity_headers: true    

I wasn’t aware of that; I started with the default config. I’m still getting the “no healthy upstream” error when trying to access https://site.domain.com:8443.

The error below indeed indicates that there is no healthy upstream. That usually happens under two scenarios:

  • there is a DNS error: the host in the route’s to: URL cannot be resolved into an IP address, or
  • the upstream was excluded by either an active or a passive health check.

pomerium    | {"level":"info","service":"envoy","upstream-cluster":"","method":"GET","authority":"site.domain.com:8443","path":"/qbt/","user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36","referer":"","forwarded-for":"192.168.178.01","request-id":"b7ca0787-e7c0-4fc2-a689-06a00f307120","duration":26.18886,"size":19,"response-code":503,"response-code-details":"no_healthy_upstream","time":"2022-10-03T20:44:39+02:00","message":"http-request"}

As you do not have any health checks configured, it is most likely a DNS issue.
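
One quick way to check the DNS hypothesis (a sketch; adjust the network name to whatever your containers are actually attached to, e.g. bridge when using network_mode: bridge) is to run a throwaway busybox container on that network and try to resolve an upstream name:

docker run --rm --network bridge busybox nslookup web1

If that fails while the containers sit on the default bridge network, moving them onto a user-defined network is the usual fix, since Docker only provides container-name DNS on user-defined networks.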

Please try creating the following simple route and see if you can make it work:

routes:
  - from: https://httpbin.domain.com:8443
    to: https://httpbin.org
    allow_public_unauthenticated_access: true

If it does not work, please enable

log_level: debug

and look for errors related to your cluster.
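
For example (assuming the container name from the compose file above), something along these lines will surface the cluster- and DNS-related entries:

docker logs pomerium 2>&1 | grep -iE 'cluster|dns'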