Best Practices

Best practices for running Code Blind in production.

Overview

Running Code Blind in production takes consideration, from planning your launch to figuring out the best course of action for cluster and Code Blind upgrades. On this page, we’ve collected some general best practices. We also have cloud-specific pages for:

If you are interested in submitting best practices for your cloud provider or on-prem environment, please contribute!

Separation of Code Blind from GameServer nodes

When running in production, Code Blind should be scheduled on a dedicated pool of nodes, separate from the nodes where game servers are scheduled, for better isolation and resiliency. By default, Code Blind prefers to be scheduled on nodes labeled with agones.dev/agones-system=true and tolerates the node taint agones.dev/agones-system=true:NoExecute. If no dedicated nodes are available, Code Blind will run on regular nodes. See taints and tolerations for more information about Kubernetes taints and tolerations.
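As a rough illustration of the scheduling behavior described above, the fragment below shows how a toleration and a preferred node affinity for that label and taint might look in a pod spec. It is a sketch written for this page, not a copy of the manifests Code Blind installs; only the label and taint (agones.dev/agones-system=true) come from the text above, and the weight value is an arbitrary assumption.

```yaml
# Illustrative pod spec fragment; not taken from the Code Blind install manifests.
spec:
  tolerations:
    # Allows the pod onto nodes tainted agones.dev/agones-system=true:NoExecute.
    - key: "agones.dev/agones-system"
      operator: "Equal"
      value: "true"
      effect: "NoExecute"
  affinity:
    nodeAffinity:
      # "preferred" (not "required"): the scheduler favors labeled nodes but
      # can still place the pod on regular nodes if none are available.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1  # assumed value, for illustration only
          preference:
            matchExpressions:
              - key: "agones.dev/agones-system"
                operator: "In"
                values: ["true"]
```

Because the affinity is a preference rather than a requirement, this matches the fallback noted above: without dedicated nodes, the pods still schedule on regular nodes.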

If you are collecting Metrics using our standard Prometheus installation, see the installation guide for instructions on configuring a separate node pool for the agones.dev/agones-metrics=true taint.

See Creating a Cluster for initial set up on your cloud provider.

Redundant Clusters

Allocate Across Clusters

Code Blind supports Multi-cluster Allocation, allowing you to allocate from a set of clusters rather than relying on a single cluster as a potential point of failure. There are several other options for multi-cluster allocation:

Spread

You should consider spreading your game servers in two ways:

  • Across geographic fault domains (GCP regions, AWS availability zones, separate datacenters, etc.): This is desirable for geographic fault isolation, but also for optimizing client latency to the game server.
  • Within a fault domain: Kubernetes Clusters are single points of failure. A single misconfigured RBAC rule, an overloaded Kubernetes Control Plane, etc. can prevent new game server allocations, or worse, disrupt existing sessions. Running multiple clusters within a fault domain also allows for easier upgrades.
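To make the idea of spreading allocations over several clusters more concrete, here is a hedged sketch of two allocation policies for the Multi-cluster Allocation feature mentioned earlier. It assumes Code Blind keeps the upstream Agones GameServerAllocationPolicy resource and its multicluster.agones.dev/v1 API group, which this page does not confirm. Every name, endpoint, and secret below is a hypothetical placeholder.

```yaml
# Assumption: the upstream Agones multi-cluster API is unchanged in Code Blind.
# All names, endpoints and secrets are hypothetical placeholders.
apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: allocate-from-cluster-a
  namespace: default
spec:
  connectionInfo:
    clusterName: "cluster-a"
    allocationEndpoints:
      - allocator.cluster-a.example.com
    namespace: default
    secretName: allocator-client-to-cluster-a
  priority: 1   # same priority as cluster-b, so both are candidates
  weight: 50    # relative share of allocations within this priority
---
apiVersion: multicluster.agones.dev/v1
kind: GameServerAllocationPolicy
metadata:
  name: allocate-from-cluster-b
  namespace: default
spec:
  connectionInfo:
    clusterName: "cluster-b"
    allocationEndpoints:
      - allocator.cluster-b.example.com
    namespace: default
    secretName: allocator-client-to-cluster-b
  priority: 1
  weight: 50
```

With equal priority and weight, allocations would be divided roughly evenly between the two clusters, so neither is the sole point of failure within the fault domain.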

Google Kubernetes Engine Best Practices

Best practices for running Code Blind on Google Kubernetes Engine (GKE).


Last modified February 28, 2024: initial publish (7818be8)