Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource.


This post will discuss, at a high level, the different load-balancing algorithms provided by NGINX, a popular web server that can also be used as a reverse proxy, load balancer, mail proxy, and HTTP cache.


The NGINX web server can act as a very capable software load balancer, in addition to its more traditional roles serving static content over HTTP and dynamic content using FastCGI handlers for scripts. Because NGINX uses a non-threaded, event-driven architecture, it is able to outperform web servers like Apache under many circumstances. This is particularly true in deployments that receive heavy loads.


For the sake of this discussion, we will focus mainly on HTTP load balancing, which you configure in the http context.
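
For reference, both of the configuration blocks shown in the following sections sit inside the http context of the NGINX configuration, roughly like this (a structural sketch only; the block contents are filled in below):

http {
    upstream backend {
        # upstream group: the backend servers
    }

    server {
        # virtual server that proxies requests to the upstream group
    }
}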

HTTP Load Balancing in NGINX

You enable load balancing with two configuration blocks, which we’ll show in their basic form, without optional parameters:

Server Block

The server block defines a virtual server that listens for traffic with the characteristics you define, and proxies it to a named group of upstream servers. In our examples, the virtual server listens on the default port (80) for the HTTP traffic sent to www.example.com, and proxies it to the upstream server group called backend. This block is the same in all our examples.

server {
    server_name www.example.com;

    location / {
       proxy_pass http://backend;
    }
}

Upstream Block

The upstream block names an upstream group and lists the servers that belong to it, identified by hostname, IP address, or UNIX‑domain socket path. In our examples, the upstream group called backend includes three servers: web1, web2, and web3.

upstream backend {
    server web1;
    server web2;
    server web3;
}

Load Balancing Methods

Round-Robin

Round robin is the default load-balancing method; it distributes requests across the servers in the upstream pool in the order they are listed. The load balancer runs through the list of upstream servers in sequence, assigning the next request to each one in turn. Because this is the default, the configuration above does not need to change:

upstream backend {
    server web1;
    server web2;
    server web3;
}

server {
    server_name www.example.com;

    location / {
       proxy_pass http://backend;
    }
}

Weighted-Round-Robin

Weights can be added to round robin to produce a weighted round robin, which is useful when the capacity of the upstream servers varies. The higher the integer value of the weight, the more often the server is selected in the rotation: each server receives a share of requests proportional to its weight relative to the sum of all weights. The weight parameter to the server directive sets a server's weight; the default is 1.

upstream backend {
    server web1 weight=4;
    server web2 weight=3;
    server web3;
    server web4 backup;
}

In this example, web1 has a weight of 4, web2 a weight of 3, and web3 the default weight of 1. web4 also has the default weight but is marked as a backup server, so it does not receive requests unless all of the other servers are unavailable. With this configuration of weights, out of every 8 requests, 4 are sent to web1, 3 to web2, and 1 to web3.

Least Connections

Least connections balances load by proxying the current request to the upstream server with the fewest open connections proxied through NGINX. This distributes load more fairly when some requests take longer to complete than others. Like round robin, least connections also takes weights into account when deciding which server to send the connection to. The directive name is least_conn:

upstream backend {
    least_conn;
    server web1;
    server web2;
    server web3;
}

Least Time (available in NGINX Plus)

Least time is similar to least connections in that it proxies to the upstream server with the least number of current connections, but it favors the servers with the lowest average response times. It is one of the more sophisticated load-balancing algorithms and fits the needs of highly performant web applications: it adds value over least connections because a small number of connections does not necessarily mean the quickest response. The directive name is least_time.
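
A minimal sketch, assuming NGINX Plus is in use (least_time requires a parameter: header measures the time to receive the response header, while last_byte measures the time to receive the full response):

upstream backend {
    least_time header;
    server web1;
    server web2;
    server web3;
}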

IP Hash

Only supported for HTTP. IP hash uses the client IP address as the hash key. Slightly different from using the $remote_addr variable in a generic hash, this algorithm uses the first three octets of an IPv4 address or the entire IPv6 address. This method ensures that clients are proxied to the same upstream server as long as that server is available, which is extremely helpful when session state is a concern and is not handled by shared memory of the application. This method also takes the weight parameter into consideration when distributing the hash. The directive name is ip_hash.

upstream backend {
    ip_hash;
    server web1;
    server web2;
    server web3;
}

If one of the servers needs to be temporarily removed from the load‑balancing rotation, it can be marked with the down parameter in order to preserve the current hashing of client IP addresses. Requests that were to be processed by this server are automatically sent to the next server in the group:

upstream backend {
    ip_hash;
    server web1;
    server web2;
    server web3 down;
}

Generic Hash

The administrator defines a hash key from given text, variables of the request or runtime, or both. NGINX distributes the load among the servers by producing a hash for the current request and mapping it to one of the upstream servers. This method is useful when you need more control over where requests are sent, or when you want to determine which upstream server most likely has the data cached. Note that when a server is added to or removed from the pool, the hashed requests are redistributed.

The optional consistent parameter to the hash directive minimizes the effect of that redistribution by enabling ketama consistent-hash load balancing: requests are distributed across the upstream servers based on the user-defined hashed key value, and if a server is added to or removed from the group, only a few keys are remapped. This minimizes cache misses in the case of load-balancing cache servers or other applications that accumulate state. The directive name is hash.

upstream backend {
    # hash requires a key; $request_uri is one common choice, and
    # consistent enables ketama consistent hashing
    hash $request_uri consistent;
    server web1;
    server web2;
    server web3;
}

Random

Each request is passed to a randomly selected server. If the two parameter is specified, NGINX first randomly selects two servers, taking server weights into account, and then chooses one of those two servers using the specified method (least_conn by default). The random load-balancing method should be used for distributed environments where multiple load balancers are passing requests to the same set of backends. For environments where the load balancer has a full view of all requests, use another load-balancing method, such as round robin, least connections, or least time. The directive name is random.
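
A minimal sketch using the two parameter with the least_conn method:

upstream backend {
    random two least_conn;
    server web1;
    server web2;
    server web3;
}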

Closing

Each of these load-balancing methods also supports a multitude of other parameters, which can be paired in different combinations with one or several of the load-balancing algorithms mentioned above. This allows advanced custom tailoring for more complex needs.
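
As an illustrative sketch, the configuration below combines the least-connections method with several of the standard server directive parameters (the specific values shown here are arbitrary):

upstream backend {
    least_conn;
    # web1 has twice the capacity; take it out of rotation for 30s
    # after 3 failed attempts
    server web1 weight=2 max_fails=3 fail_timeout=30s;
    server web2;
    # web3 receives traffic only when the other servers are unavailable
    server web3 backup;
}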