Massive RAM usage while using K8s CRD

Hi everyone!

What happened?

We have switched to Pomerium all-in-one with the Kubernetes CRD for both HTTPS and TCP rules.
Since then, RAM usage has skyrocketed and pods take a long time to come online.
Persistence is configured with Postgres.

If the TCP rules are removed from the CRD, RAM usage and startup times are back to normal.

What’s your environment like?

  • Pomerium version (retrieve with pomerium --version):
  • Server Operating System/Architecture/Cloud:

What’s your config.yaml?

using CRD

What did you see in the logs?

During runtime, this error message is printed several times:

"Deprecated field: type envoy.type.matcher.v3.RegexMatcher Using deprecated option 'envoy.type.matcher.v3.RegexMatcher.google_re2' from file regex.proto. This configuration will be removed from Envoy soon. Please see Version history — envoy 1.26.0-dev-2d259e documentation for details. If continued use of this field is absolutely necessary, see Runtime — envoy 1.26.0-dev-2d259e documentation for how to apply a temporary and highly discouraged override."

no other errors are printed.
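For reference, the "override" that warning points to is Envoy's runtime deprecated-feature mechanism. Below is a minimal sketch of what such a runtime layer looks like in an Envoy bootstrap; note this only illustrates the Envoy mechanism — the Pomerium all-in-one image manages Envoy's bootstrap itself, so this may not be directly configurable there:

```yaml
# Sketch only: an Envoy static runtime layer re-enabling a deprecated field.
# The key follows Envoy's "envoy.deprecated_features:<fully.qualified.field>"
# naming convention; Envoy's own docs call this a temporary, discouraged override.
layered_runtime:
  layers:
    - name: static_layer
      static_layer:
        envoy.deprecated_features:envoy.type.matcher.v3.RegexMatcher.google_re2: true
```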

Additional context

Please find attached the monitoring screenshots.



Here is the log after a pod crash:

{"level":"error","error":"rpc error: code = Canceled desc = context canceled","time":"2023-01-20T09:52:00Z","message":"access log stream error, disconnecting"}

{"level":"error","error":"rpc error: code = Canceled desc = context canceled","time":"2023-01-20T09:52:00Z","message":"access log stream error, disconnecting"}

{"level":"error","error":"rpc error: code = Canceled desc = context canceled","time":"2023-01-20T09:52:00Z","message":"access log stream error, disconnecting"}

{"level":"error","error":"rpc error: code = Canceled desc = context canceled","time":"2023-01-20T09:52:00Z","message":"access log stream error, disconnecting"}

{"level":"error","error":"rpc error: code = Canceled desc = context canceled","time":"2023-01-20T09:52:00Z","message":"access log stream error, disconnecting"}

{"level":"error","syncer_id":"databroker","syncer_type":"type.googleapis.com/pomerium.config.Config","error":"error receiving sync record: rpc error: code = Unavailable desc = error reading from server: EOF","time":"2023-01-20T09:52:00Z","message":"sync"}

{"level":"error","error":"rpc error: code = Canceled desc = context canceled","time":"2023-01-20T09:52:00Z","message":"access log stream error, disconnecting"}

{"level":"error","error":"rpc error: code = Canceled desc = context canceled","time":"2023-01-20T09:52:00Z","message":"access log stream error, disconnecting"}

{"level":"error","error":"rpc error: code = Canceled desc = context canceled","time":"2023-01-20T09:52:00Z","message":"access log stream error, disconnecting"}

{"level":"error","syncer_id":"databroker","syncer_type":"type.googleapis.com/pomerium.config.Config","error":"error calling sync: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:36961: connect: connection refused\"","time":"2023-01-20T09:52:01Z","message":"sync"}

{"level":"fatal","pid":19,"time":"2023-01-20T09:52:01Z","message":"envoy: subprocess exited"}

This happens sometimes on startup and sometimes when adding or editing a rule via CRD.

Hi,

Could you please clarify which release you upgraded from, and to which?

Could you please give us a rough estimate of the number of Ingress resources you have, and how many of them use TCP? I assume you mean Ingress objects annotated with ingress.pomerium.io/tcp_upstream: 'true' (Ingress Configuration | Pomerium)?

  1. This is not a crash but rather various Pomerium internal services winding down (note the "context canceled" errors). There must be some other error earlier in the log that actually caused the issue.

  2. When you say CRD, do you mean an Ingress resource or the Pomerium global settings? I’m a bit confused here.

There’s a lot happening here; would you join our Slack channel so we can troubleshoot it synchronously?

Hi @denis ,

We were running 0.19.1 via the Helm chart and switched to the latest version via the all-in-one deployment.

Sure thing, I’m already in the Slack channel.

Thank you

I can confirm that we are using TCP Ingress resources:

447 Ingresses, of which 341 are TCP.
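For anyone wanting to reproduce such a count, here is a rough sketch, assuming kubectl access and that TCP routes are marked with the ingress.pomerium.io/tcp_upstream annotation (the exact quoting of 'true' may differ per manifest):

```shell
# Live cluster (hypothetical, not from this thread):
#   kubectl get ingress -A -o yaml | grep -c "tcp_upstream: 'true'"
# Demo of the same grep on an inline sample of annotation lines:
printf '%s\n' \
  "ingress.pomerium.io/tcp_upstream: 'true'" \
  'other: annotation' \
  "ingress.pomerium.io/tcp_upstream: 'true'" \
  | grep -c "tcp_upstream: 'true'"
```

grep -c counts matching lines, so this prints 2 for the sample: one count per annotated Ingress.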

thank you

Here is the config:

pomerium.yaml

apiVersion: ingress.pomerium.io/v1
kind: Pomerium
metadata:
  name: global
spec:
  secrets: pomerium/bootstrap
  authenticate:
    url: https://authenticate.redacted
  identityProvider:
    provider: oidc
    url: https://keycloak.redacted
    secret: pomerium/idp
  certificates:
    - pomerium/pomerium-wildcard-tls
  storage:
    postgres:
      secret: pomerium/dbsecret
  jwtClaimHeaders:
    additionalProperties: email, groups, user, preferred_username

Example TCP Ingress:

apiVersion: v1
kind: Service
metadata:
  name: redacted-ssh-service-tcp
spec:
  type: ExternalName
  externalName: redacted
  ports:
    - protocol: TCP
      name: ssh
      port: 22

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jente-demo-ssh-ssh-ingress-tcp
  namespace: pomerium
  annotations:
    ingress.pomerium.io/tcp_upstream: 'true'
    ingress.pomerium.io/allowed_idp_claims: |
      groups:
        - redacted
      preferred_username:
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
        - redacted
spec:
  ingressClassName: pomerium
  tls:
    - hosts:
        - redacted
      secretName: pomerium-wildcard-tls
  rules:
    - host: redacted
      http:
        paths:
          - pathType: ImplementationSpecific
            backend:
              service:
                name: redacted
                port:
                  name: ssh