Kubernetes Troubleshooting
invalid host in "tcp://<ip>:8080" of the "listen" directive
Kubernetes auto-injects service addresses as env vars (SERVICENAME_PORT=tcp://...), and nginx then tries to use that value in its listen directive.
Disable service-link env vars on the pod:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: romm
  namespace: romm
spec:
  template:
    spec:
      enableServiceLinks: false # ← this line
Large uploads rejected with 413 Request Entity Too Large
The ingress controller is capping request body size. Default for nginx-ingress is 1 MB, which won't survive a single ROM upload.
Add the annotation:
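A minimal sketch of the Ingress metadata, assuming the community ingress-nginx controller. The annotation sets nginx's client_max_body_size for this Ingress. The value "0" disables the size check entirely; a value such as "10g" sets a cap instead. Pick whichever suits your setup.

```yaml
metadata:
  annotations:
    # "0" disables the body-size check; use e.g. "10g" to keep a cap.
    nginx.ingress.kubernetes.io/proxy-body-size: "0"
```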
Traefik equivalent:
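A sketch only, assuming Traefik's Kubernetes CRD provider is installed (apiVersion traefik.io/v1alpha1; older releases use traefik.containo.us/v1alpha1). Traefik configures request-body limits through a buffering Middleware rather than a size annotation. The middleware name romm-upload-limit and the 10 GiB figure are placeholders, not values from the RomM docs.

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: romm-upload-limit # hypothetical name
  namespace: romm
spec:
  buffering:
    maxRequestBodyBytes: 10737418240 # 10 GiB, illustrative
---
# Fragment of the Ingress: attach the middleware as <namespace>-<name>@kubernetescrd
metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.middlewares: romm-romm-upload-limit@kubernetescrd
```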
If you proxy through Cloudflare, check your plan's limits: the free tier caps uploads at 100 MB regardless of what your cluster allows.
WebSockets disconnect immediately
Ingress isn't forwarding the WebSocket upgrade.
nginx-ingress:
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
permission denied writing to /romm/resources
The container is running as a non-root user but the PVC came up with wrong ownership. Two fixes:
- Init container that chowns the PV on first start:
initContainers:
  - name: fix-permissions
    image: busybox
    command:
      [
        "sh",
        "-c",
        "chown -R 1000:1000 /romm/resources /romm/assets /romm/config /redis-data",
      ]
    volumeMounts:
      - { name: resources, mountPath: /romm/resources }
      - { name: assets, mountPath: /romm/assets }
      - { name: config, mountPath: /romm/config }
      - { name: redis-data, mountPath: /redis-data }
    securityContext:
      runAsUser: 0
- Storage class that supports fsGroup: add fsGroup: 1000 to the pod's securityContext. Works on most CSI drivers but not all.
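The fsGroup option might look like the fragment below. This is a sketch that assumes the same UID/GID of 1000 used in the chown command above. The fsGroupChangePolicy line is an optional addition, not part of the original advice; it skips the recursive ownership change when the volume root already matches.

```yaml
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1000
        # Optional: avoid re-chowning the whole volume on every mount.
        fsGroupChangePolicy: OnRootMismatch
```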
Pod can reach the DB but crashes with ConnectionRefused
RomM starts before the DB is ready, fails, and ends up in a crash loop because each restart comes before the DB has finished starting.
Fix: add an init container to the app pod that waits, so RomM doesn't start until the DB is reachable, or add a readinessProbe + generous startupProbe on the DB StatefulSet so the DB is only reported ready once it accepts connections.
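The waiting init container can be sketched as follows. The Service name romm-db and port 3306 are placeholders for your own DB Service and port, and the loop assumes the busybox image's nc supports the -z flag (recent builds do; verify on the tag you pin).

```yaml
initContainers:
  - name: wait-for-db
    image: busybox
    command:
      - sh
      - -c
      # romm-db and 3306 are hypothetical; substitute your DB Service name and port.
      - 'until nc -z -w 2 romm-db 3306; do echo waiting for db; sleep 2; done'
```

Init containers run to completion before the main container starts, so the app is only launched once the port accepts connections.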
Scheduler tasks don't run
If Redis is an external service and the pod can't reach it, scheduled tasks silently don't fire.
Check:
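# Wrap in sh -c with single quotes so the variables expand inside the pod, not in your local shell
kubectl exec -n romm deploy/romm -- sh -c 'redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" -a "$REDIS_PASSWD" ping'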
# should print "PONG"
If that fails, either a network policy is blocking the pod or the Redis service name doesn't resolve.
OOMKilled during large scans
Scans of big libraries with hash calculation spike memory. Default pod memory limits are often too tight.
Raise the limit:
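A sketch of the relevant Deployment fragment. The container name romm and the memory figures are illustrative assumptions, not recommended values; size them to your library and node capacity.

```yaml
spec:
  template:
    spec:
      containers:
        - name: romm # assumed container name
          resources:
            requests:
              memory: 1Gi # illustrative
            limits:
              memory: 4Gi # illustrative; raise if scans are still OOMKilled
```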
Or disable hashing on the Scan page to cut memory use by ~80% (you lose RetroAchievements + Hasheous matching, see Metadata Providers).
Still stuck?
- Full install reference: Kubernetes
- Discord #romm-support channel