GKE container startup using cloud-sql-proxy sidecar

We are using cloud-sql-proxy as a sidecar to connect securely to a Postgres Cloud SQL instance from GKE. During pod startup, if the proxy isn't fully ready and accepting connections, pega-web fails to start. The window is a matter of milliseconds.

Error from container log:
SEVERE [main] com.pega.pegarules.internal.bootstrap.PRBootstrapDataSource. Unable to connect to database.
java.sql.SQLException: Cannot create PoolableConnectionFactory (Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.)

At this point the container does not fully start and waits for the startupProbe to time out before it is restarted. On the next start the container comes up without issue, since the cloud-sql-proxy sidecar is by then running and accepting connections.

This has a huge impact on scaling workloads, which take much longer than needed.

To Reproduce
Configure a sidecar container for connections to Cloud SQL from the web tier.

tier:
  - name: "web"
    custom:
      sidecarContainers:
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.7.0
          args:
            - "--private-ip"
            - "--port=5432"
            - "<INSTANCE_CONNECTION_NAME>"
          securityContext:
            runAsNonRoot: true
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
    nodeType: "WebUser"

Expected behavior
It would be nice if there were some retry logic at startup for connecting to the PRBootstrapDataSource, or if the Pega images were modified to delay startup until the sidecar is ready.

The cloud-sql-proxy has health checks. Those could be enabled, and then the Pega image could check that the proxy is online before starting.

https://github.com/GoogleCloudPlatform/cloud-sql-proxy/tree/main/examples/k8s-health-check#cloud-sql-proxy-health-checks

# Poll the proxy's liveness endpoint until it responds successfully.
until curl --output /dev/null --silent --head --fail http://127.0.0.1:9090/liveness; do
    printf '.'
    sleep .1
done
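One caveat with that loop is that it never gives up if the proxy fails to come up. A bounded variant might look like the sketch below. It assumes the proxy was started with health checks enabled (the `--health-check` flag) and is serving them on port 9090, as in the linked example. The helper names `wait_for` and `proxy_alive` and the `WAIT_INTERVAL` variable are made up for illustration, and the fractional `sleep` relies on a `sleep` implementation that accepts decimals (GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper (not part of any Pega image): retry a probe command
# until it succeeds or the timeout elapses.
# Usage: wait_for <timeout_seconds> <command> [args...]
WAIT_INTERVAL="${WAIT_INTERVAL:-0.1}"   # assumed polling interval, in seconds

wait_for() {
    timeout_s=$1
    shift
    deadline=$(( $(date +%s) + timeout_s ))
    until "$@"; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timed out after ${timeout_s}s waiting for: $*" >&2
            return 1
        fi
        sleep "$WAIT_INTERVAL"
    done
}

# Probe the proxy's liveness endpoint. Assumes health checks are enabled
# on the sidecar and exposed at 127.0.0.1:9090.
proxy_alive() {
    curl --output /dev/null --silent --fail http://127.0.0.1:9090/liveness
}
```

In a wrapper entrypoint this could be used as `wait_for 30 proxy_alive && exec "$@"`, where `"$@"` would be the image's original start command. The 30-second limit is an arbitrary choice and should stay well below the startupProbe budget.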

Has anyone run into this issue with GKE? Are there other potential workarounds for it?

@JohnB16838424 I'm assuming that this issue has since been logged with our GCS team via the MSP, but if not, please do so now.

@MarijeSchillern

This issue remains unresolved despite multiple support tickets and opening issues on the Helm chart repository. Unfortunately, there has been minimal response or progress from Pega.

As it stands, we are experiencing slow startup times while scaling, which is far from ideal for our operations. We would greatly appreciate more immediate attention and guidance to address this.

@JohnB16838424 regarding the 'there has been minimal response from Pega' comment, could you please give me the INC ID of the ticket so that I can chase this?

@MarijeSchillern

INC-A17115 - Pega-web-tomcat container startup using cloud-sql-proxy sidecar

@JohnB16838424 thanks for that.

I see from INC-A17115 (closed a year ago…) that you had informed GCS that you had already created an issue on the Helm chart repository (Pega-web-tomcat container startup using cloud-sql-proxy sidecar, Issue #633, pegasystems/pega-helm-charts on GitHub).

The Helm team stated that you could probably work around this issue by modifying the Pega images to delay startup until the sidecar is ready:

“…the workaround would be to modify the images to delay startup until the sidecar is ready. That health check that you posted looks like potentially an effective tool for implementing such a delay. We do not currently have any plans to enhance the images to add a generic delay mechanism, so you’ll have to implement it for yourself. Making changes to the bootstrap process is unfortunately out of scope for this project. To request a change to that, you’ll have to reach out to Pega’s customer support to request an enhancement to the Pega Platform itself”.

You were asked to contact GCS simply to have them log an enhancement request.

In INC-A17115 they informed you that ultimately your issue was a Pega Consulting matter, but they created an enhancement request, FDBK-105941 (Pega-web-tomcat container startup using cloud-sql-proxy sidecar), for future consideration within the product.

GCS confirmed that, since this feature is not available at the moment, it is out of Pega Support scope. If you want to get this prioritized, please reach out to your Account Executive and present the feedback item to them.

If you really want to implement this, then as mentioned earlier your best option would be Pega Consulting until Engineering works on this enhancement request.