The “no healthy upstream” error in vCenter usually hits at the worst time — often right after a reboot or during maintenance, and almost always paired with a 503 Service Unavailable response.
At its core, the error means the Envoy proxy can’t reach a backend service like vpxd, vsphere-ui, or vapi-endpoint. The cause is usually one of two things: disk space exhaustion or an expired certificate.
This guide walks you through 5 troubleshooting fixes to find the root cause and get your vSphere environment back online quickly.
Here are the most common technical causes for this vCenter no healthy upstream error.
Once you understand the underlying reasons for the vCenter no healthy upstream error, you can take a systematic approach to remediate the services. This error generally indicates that the Envoy proxy cannot find a functional backend service to route your request to.
Whether you are experiencing no healthy upstream after reboot or during normal operations, following these technical steps will help you identify and resolve the failure.
Before modifying any configurations, ensure the VCSA has the “breathing room” to run its processes. Disk exhaustion is a primary reason services fail to initialize.
df -h
What to look for: Closely examine the output for any partition at 100% capacity. In a VCSA environment, /storage/log and /storage/core are the most frequent culprits.
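As a rough sketch of the check described above, the helper below reads `df -P` output on standard input and prints only the mount points at or above a usage threshold. The function name `flag_full_partitions` and the 90% default are illustrative assumptions, not part of any VMware tooling.

```shell
# Hypothetical helper: list mount points whose usage is at or above a threshold.
# Reads POSIX-format `df -P` output on stdin; $1 is the threshold in percent
# (the default of 90 is an assumption, pick whatever suits your environment).
flag_full_partitions() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {
            use = $5              # "Capacity" column, e.g. "100%"
            sub(/%/, "", use)
            if (use + 0 >= limit) print $6 " " $5
        }'
}

# Usage on the appliance:
#   df -P | flag_full_partitions 90
```

Using `df -P` instead of `df -h` keeps every filesystem on one line, which makes the column positions predictable for parsing.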
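The Fix: If /storage/log is full, you may need to clear old compressed logs manually. If the environment has outgrown its current allocation, expand the virtual disk in the appliance VM's settings and extend the filesystem inside the VCSA (VMware documents an autogrow script for this). Restarting services does not by itself resize a partition, but once space is available, restart the service stack so that services that failed for lack of space can start again: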
service-control --stop --all && service-control --start --all
Certificate expiration is one of the most common causes of the vCenter no healthy upstream message. If the Security Token Service (STS) certificate or the Machine SSL certificate has expired or is otherwise invalid, services cannot mutually authenticate.
Always take a file-based backup or a VM snapshot before performing a certificate reset. For a more secure and efficient backup solution, consider using i2Backup.
Run this command to check expiration dates across all VECS certificate stores (the STS signing certificate is kept outside VECS, so it needs a separate check):
for i in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list); do echo "STORE: $i"; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $i --text | grep -ie "Not After"; done
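The loop above prints raw "Not After" dates and leaves the comparison to the reader. A minimal sketch of automating that comparison follows, assuming the timestamps look like `Oct 12 10:21:33 2025 GMT` (the usual X.509 text form). The function names `not_after_to_ymd` and `is_expired` are hypothetical helpers, and the check works at day granularity only.

```shell
# Hypothetical helper: convert a "Not After" timestamp such as
# "Oct 12 10:21:33 2025 GMT" into a sortable YYYYMMDD number.
# Assumes the field order: month name, day, time, year, timezone.
not_after_to_ymd() {
    echo "$1" | awk '{
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $1) + 2) / 3
        printf "%04d%02d%02d\n", $4, m, $2
    }'
}

# Succeeds (exit 0) if the certificate date is earlier than today (UTC).
# $2 optionally overrides "today" as YYYYMMDD, mainly for testing.
is_expired() {
    today="${2:-$(date -u +%Y%m%d)}"
    [ "$(not_after_to_ymd "$1")" -lt "$today" ]
}

# Usage (the sed pattern assumes lines of the form "Not After : <date>"):
#   ... | sed -n 's/.*Not After *: *//p' | while read -r d; do
#       is_expired "$d" && echo "EXPIRED: $d"
#   done
```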
The Fix: If any certificates are listed as expired, you need to launch the VMware Certificate Manager utility:
/usr/lib/vmware-vmca/bin/certificate-manager
In most “No Healthy Upstream” scenarios caused by total certificate failure, Option 8 (Reset all Certificates) is the most effective choice for a full recovery.
If disks are clear and certificates are valid, the “upstream” (the service itself) may simply be stopped or crashed.
To identify which specific service is unhealthy:
service-control --status --all
What to look for: Check if vsphere-ui (the HTML5 client) or vpxd (the core vCenter service) is in a “Stopped” state.
The Fix: If critical services are stopped, attempt a clean manual restart of the entire service stack:
service-control --stop --all
service-control --start --all
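To avoid scanning the status output by eye, the sketch below extracts just the stopped services. It assumes the output groups service names under header lines such as `Running:` and `Stopped:`, with the names indented on the following lines; if your build formats the output differently, the parsing will need adjusting. `stopped_services` is an illustrative name.

```shell
# Hypothetical helper: print one stopped service per line.
# Assumes input shaped like:
#   Running:
#    applmgmt lookupsvc vpxd
#   Stopped:
#    vmcam vsphere-ui
stopped_services() {
    awk '
        /^Stopped:/     { in_stopped = 1; next }   # start of the Stopped group
        /^[A-Za-z]+:/   { in_stopped = 0 }         # any other group header ends it
        in_stopped      { for (i = 1; i <= NF; i++) print $i }
    '
}

# Usage on the appliance:
#   service-control --status --all | stopped_services | grep -E '^(vpxd|vsphere-ui)$'
```

Some services are stopped by default, so a non-empty list is not automatically a problem; the `grep` in the usage line narrows it to the two services named above.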
vSphere services rely heavily on precise timing and name resolution. If the VCSA cannot “see” itself or if its clock has drifted, tokens will be rejected.
nslookup <vcenter-fqdn>
date
The Fix: If the time is off by more than a few minutes, the Security Token Service will reject authentication requests, triggering the upstream error. Correct the time via the vCenter Management Interface (VAMI) or the date command and ensure NTP is synchronized.
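The "few minutes" rule of thumb can be turned into a simple comparison between the appliance clock and a trusted reference. The sketch below is only an illustration: `clock_drift_ok` is a hypothetical helper, the 300-second default tolerance is an assumption rather than a documented limit, and how you obtain the reference time (for example from another host you trust) is up to you.

```shell
# Hypothetical helper: succeed if two epoch timestamps differ by at most
# max_s seconds. $1 = local epoch, $2 = reference epoch,
# $3 = tolerance in seconds (default 300 is an assumed value).
clock_drift_ok() {
    here="$1"
    there="$2"
    max_s="${3:-300}"
    diff=$(( here - there ))
    [ "$diff" -lt 0 ] && diff=$(( -diff ))   # absolute value
    [ "$diff" -le "$max_s" ]
}

# Usage (reference_epoch obtained from a trusted source of your choice):
#   clock_drift_ok "$(date -u +%s)" "$reference_epoch" || echo "Clock drift too large"
```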
If your services appear to be running according to service-control, but you still encounter the “no healthy upstream” error, the issue likely lies within the Java runtime of the vSphere Client. When the service starts but cannot fully initialize, the decisive clue is usually in the vSphere Client logs.
Access the shell and tail the vSphere Client log:
tail -n 100 /var/log/vmware/vsphere-ui/logs/vsphere_client_virgo.log
What to look for:
java.lang.OutOfMemoryError: This confirms that the VCSA does not have enough physical RAM allocated to support the Java Heap size required by the UI service. This is a common culprit for vCenter no healthy upstream after reboot following an upgrade.
java.lang.NullPointerException: Often indicates a corrupted plugin or a failure to communicate with the lookup service.
The Fix: If you identify an OutOfMemoryError, you need to shut down the VCSA and increase the memory allocation in the vSphere settings.
If you see plugin-related exceptions, you may need to clear the Serenity database or unregister stale extensions using the Managed Object Browser (MOB).
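A quick way to triage the log is to count how often each of the two exceptions appears, rather than reading the last 100 lines manually. The following is a minimal sketch; `summarize_ui_log` is a hypothetical helper that simply matches the exception names in whatever text it receives on standard input.

```shell
# Hypothetical helper: count OutOfMemoryError and NullPointerException
# lines in log text read from stdin.
summarize_ui_log() {
    awk '
        /OutOfMemoryError/     { oom++ }
        /NullPointerException/ { npe++ }
        END { printf "oom=%d npe=%d\n", oom, npe }
    '
}

# Usage on the appliance:
#   tail -n 1000 /var/log/vmware/vsphere-ui/logs/vsphere_client_virgo.log | summarize_ui_log
```

A non-zero `oom` count points toward the memory fix, while a non-zero `npe` count points toward the plugin cleanup.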
Q1: Why does vCenter show “no healthy upstream” even after restarting services?
A: Most likely, the root cause (expired certificates, low RAM, or DNS issues) wasn’t fixed. Check the logs or run the quick troubleshooting commands to find the issue.
Q2: Can I fix vCenter no healthy upstream without SSH access?
A: Yes. Open the VCSA virtual machine's console from the ESXi Host Client to reach the appliance shell, or use the VAMI to fix time/DNS issues. For certificate resets, SSH is easier but not always required.
Q3: Is it safe to use a VM snapshot to fix the error?
A: Snapshot rollback can work for temporary fixes, but it’s not recommended long-term. Always take a file-based backup before rolling back, and fix the root cause (e.g., expired certificates) afterward.
Q4: Why does df -h show normal disk usage while I still get the error?
A: The issue is likely DNS, time sync, or a service deadlock. Check DNS resolution and time first, then restart all services.
Q5: How do I check if Envoy proxy is running in vCenter 8.0?
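A: Run this command: service-control --status envoyproxy. If it’s stopped, restart it with service-control --start envoyproxy. The service name can differ between builds; service-control --list shows the names registered on your appliance.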
Q6: Do I need to restart vCenter after fixing certificates?
A: Yes. After resetting certificates, restart all services with service-control --stop --all && service-control --start --all to apply changes.
Q7: Why does the error happen only after a vCenter upgrade?
A: Upgrades often increase RAM requirements or don’t auto-renew certificates. Check RAM allocation and certificate expiration first.
Q8: Can plugin issues cause vCenter no healthy upstream?
A: Yes. Corrupted or outdated plugins can crash the vsphere-ui service. Check the UI logs for NullPointerException and unregister stale plugins.
Q9: What’s the minimum RAM for vCenter 8.0 to avoid this error?
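A: For vCenter 8.0, VMware’s sizing table lists 14GB of RAM for the smallest (Tiny) deployment, so treat that as the floor and allocate more for larger inventories. An appliance running below its documented size is more likely to trigger the OOM killer and the no healthy upstream error.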
Q10: How long should I wait after restarting vCenter before troubleshooting?
A: Wait 10–20 minutes. Services can take time to fully start, especially after a reboot or upgrade. If the error still shows after 20 minutes, start troubleshooting.
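The waiting period can be scripted instead of watched. The sketch below polls an arbitrary check command until it succeeds or a timeout expires; `wait_until` is a hypothetical helper, and the curl probe in the usage comment (including the `<vcenter-fqdn>` placeholder and the `/ui/` path) is an assumption about how you might test the front end.

```shell
# Hypothetical helper: retry a command until it succeeds or time runs out.
# $1 = timeout in seconds, $2 = polling interval in seconds,
# remaining arguments = the command to run.
wait_until() {
    timeout_s="$1"
    interval_s="$2"
    shift 2
    elapsed=0
    while ! "$@"; do
        # Give up once the time budget is spent
        [ "$elapsed" -ge "$timeout_s" ] && return 1
        sleep "$interval_s"
        elapsed=$(( elapsed + interval_s ))
    done
    return 0
}

# Usage: wait up to 20 minutes, probing every 30 seconds, for a non-503 reply.
#   ui_ready() { [ "$(curl -k -s -o /dev/null -w '%{http_code}' "https://<vcenter-fqdn>/ui/")" != "503" ]; }
#   wait_until 1200 30 ui_ready || echo "Still unhealthy after 20 minutes, start troubleshooting"
```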
The vCenter no healthy upstream error is essentially a “smoke signal” indicating that the VCSA’s core services are unable to communicate. While the 503 message is a generic response from the reverse proxy, identifying the root cause – be it expired certificates, exhausted disk space, or a service crash – is straightforward when using a systematic CLI approach.
To fix the no healthy upstream error effectively, always check your STS certificates and partition health first. By maintaining adequate resource headroom and monitoring your service logs, you can prevent no healthy upstream after reboot scenarios and keep your vSphere environment stable. Always remember to take a file-based backup or VM snapshot before performing significant certificate or service remediation.