1. Introduction
When the cluster.routing.allocation.same_shard.host: true setting is enabled and you start two Elasticsearch instances on the same physical host, the cluster health status turns YELLOW.
This is because the .security index and other system indices have the "auto_expand_replicas" setting (for the .security index it defaults to "0-1"; some indices use "0-all"), which makes Elasticsearch scale the number of replicas with the number of nodes up to the configured maximum; with "0-all" the replica count becomes (number of nodes - 1), so one node holds the primary shard and all the rest should hold a replica. Because cluster.routing.allocation.same_shard.host does not allow a replica to be allocated on the same physical machine as its primary, the replica stays unassigned, the index health is yellow, and therefore the cluster health is yellow.
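For reference, this is a cluster-level shard allocation setting; in elasticsearch.yml it would look like the single line below (in this article it is passed to Docker as an environment variable instead):
cluster.routing.allocation.same_shard.host: true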
In this article I am going to show you how to reproduce this issue and how to fix it.
2. Start Elasticsearch cluster
To reproduce the issue you have to run two Elasticsearch instances with the same IP address. If you run the Docker containers on the default bridge network, each container gets its own IP address and the experiment will not work. If you use the host Docker network driver instead, both Elasticsearch nodes share the host's IP address.
Running Docker with the host network driver requires Linux as the host operating system, so you have to run the experiment either directly on Linux or inside a Linux virtual machine.
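If you want to double-check later that the host driver is really in effect, a quick sketch (assuming the container names mac01 and mac02 used in this article) is to ask Docker for the network mode, which should print host for both containers:
docker inspect -f '{{.HostConfig.NetworkMode}}' mac01
docker inspect -f '{{.HostConfig.NetworkMode}}' mac02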
2.1 Running first node – cluster still green
Notice that in the command below the network is set to host and an environment variable sets cluster.routing.allocation.same_shard.host to true.
docker run --rm \
--name mac01 \
--net host \
-e cluster.routing.allocation.same_shard.host="true" \
-e ES_JAVA_OPTS="-Xms1g -Xmx1g" \
-e node.name="mac01" \
docker.elastic.co/elasticsearch/elasticsearch:8.11.1
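Since the command is not detached, it keeps running in the foreground; run the following steps from a second terminal. One way to confirm the node has finished starting (a sketch, the exact log message can vary between versions) is to grep the container logs:
docker logs mac01 2>&1 | grep -i "started"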
Run the command below to set the password for the elastic user; the named pipe is only there to feed the confirmation and the new password (123456) to elasticsearch-reset-password non-interactively. This also creates the .security index. At this point the cluster health is still green.
docker exec -it mac01 bash -c "(mkfifo pipe1); ( (elasticsearch-reset-password -u elastic -i < pipe1) & ( echo $'y\n123456\n123456' > pipe1) );sleep 5;rm pipe1"
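If you prefer to type the password yourself instead of piping it in, the plain interactive form of the same tool works as well:
docker exec -it mac01 elasticsearch-reset-password -u elastic -i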
Check the cluster health:
curl -k -u elastic:123456 -XGET "https://localhost:9200/_cluster/health?pretty"
response:
{
"cluster_name" : "docker-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 1,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
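Optionally, you can confirm that the allocation setting actually reached the node. A sketch using the nodes info API with a response filter (if the environment variable was picked up, the filtered response should show same_shard host set to true):
curl -k -u elastic:123456 -XGET "https://localhost:9200/_nodes?filter_path=nodes.*.settings.cluster.routing.allocation.same_shard&pretty"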
2.2 Running second node – cluster becomes yellow
Starting a second node triggers auto-expansion of the .security index replica count to 1 (two nodes minus one). That is a problem, because the replica and the primary shard cannot be placed on the same physical host due to the cluster.routing.allocation.same_shard.host setting you added. Note that in the command below the container is named mac02 while node.name is left as mac01, so both nodes later show up with the same node name and only their node IDs and transport ports differ.
token=`docker exec -it mac01 elasticsearch-create-enrollment-token -s node | tr -d '\r\n'`
docker run --rm \
--name mac02 \
--net host \
-d \
-e ENROLLMENT_TOKEN=$token \
-e cluster.routing.allocation.same_shard.host="true" \
-e node.name="mac01" \
-m 1GB docker.elastic.co/elasticsearch/elasticsearch:8.11.1
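Before checking the health, you can confirm that the second node actually joined and watch the replica count of the .security index expand (both are standard APIs; adjust the credentials if yours differ):
curl -k -u elastic:123456 -XGET "https://localhost:9200/_cat/nodes?v&h=name,ip,node.role"
curl -k -u elastic:123456 -XGET "https://localhost:9200/.security/_settings?filter_path=*.settings.index.number_of_replicas&pretty"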
Check the status:
curl -k -u elastic:123456 -XGET "https://localhost:9200/_cluster/health?pretty"
response:
{
"cluster_name" : "docker-cluster",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 1,
"active_shards" : 1,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 1,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.0
}
3. Status of .security index shards allocation
Checking the shard allocation shows that the replica is not assigned to any Elasticsearch node, which is what makes the cluster yellow.
curl -k -u elastic:123456 -XGET "https://localhost:9200/_cat/shards?v&s=state:asc&index=.security"
response:
index shard prirep state docs store dataset ip node
.security-7 0 r UNASSIGNED
.security-7 0 p STARTED 2 12.8kb 12.8kb 10.211.55.9 mac01
You can get further details by asking the cluster allocation explain API for the reason:
curl -k -u elastic:123456 -XGET "https://localhost:9200/_cluster/allocation/explain" \
-H 'content-type: application/json' -d'
{
"index": ".security-7",
"shard": 0,
"primary": false
}'
example response:
{
"index": ".security-7",
"shard": 0,
"primary": false,
"current_state": "unassigned",
"unassigned_info": {
"reason": "REPLICA_ADDED",
"at": "2023-11-30T01:39:42.285Z",
"last_allocation_status": "no_attempt"
},
"can_allocate": "no",
"allocate_explanation": "Elasticsearch isn't allowed to allocate this shard to any of the nodes in the cluster. Choose a node to which you expect this shard to be allocated, find this node in the node-by-node explanation, and address the reasons which prevent Elasticsearch from allocating this shard there.",
"node_allocation_decisions": [
{
"node_id": "7x2VkrDxSCaqrCnVdPPETQ",
"node_name": "mac01",
"transport_address": "10.211.55.9:9300",
"node_attributes": {
"ml.allocated_processors": "2",
"ml.allocated_processors_double": "2.0",
"ml.max_jvm_size": "1073741824",
"ml.config_version": "11.0.0",
"xpack.installed": "true",
"transform.config_version": "10.0.0",
"ml.machine_memory": "3760881664"
},
"roles": [
"data",
"data_cold",
"data_content",
"data_frozen",
"data_hot",
"data_warm",
"ingest",
"master",
"ml",
"remote_cluster_client",
"transform"
],
"node_decision": "no",
"deciders": [
{
"decider": "same_shard",
"decision": "NO",
"explanation": "a copy of this shard is already allocated to this node [[.security-7][0], node[7x2VkrDxSCaqrCnVdPPETQ], [P], s[STARTED], a[id=bzL4580kRsOiNxoxuyciIA], failed_attempts[0]]"
}
]
},
{
"node_id": "YmghqXcQSxS4DUGGmLSaAA",
"node_name": "mac01",
"transport_address": "10.211.55.9:9301",
"node_attributes": {
"ml.allocated_processors": "2",
"ml.allocated_processors_double": "2.0",
"ml.max_jvm_size": "536870912",
"ml.config_version": "11.0.0",
"transform.config_version": "10.0.0",
"xpack.installed": "true",
"ml.machine_memory": "1073741824"
},
"roles": [
"data",
"data_cold",
"data_content",
"data_frozen",
"data_hot",
"data_warm",
"ingest",
"master",
"ml",
"remote_cluster_client",
"transform"
],
"node_decision": "no",
"deciders": [
{
"decider": "same_shard",
"decision": "NO",
"explanation": "cannot allocate to node [YmghqXcQSxS4DUGGmLSaAA] because a copy of this shard is already allocated to node [7x2VkrDxSCaqrCnVdPPETQ] with the same host address [10.211.55.9] and [cluster.routing.allocation.same_shard.host] is [true] which forbids more than one node on each host from holding a copy of this shard"
}
]
}
]
}
The important part is the explanation:
"cannot allocate to node [YmghqXcQSxS4DUGGmLSaAA] because a copy of this shard is already allocated to node [7x2VkrDxSCaqrCnVdPPETQ] with the same host address [10.211.55.9] and [cluster.routing.allocation.same_shard.host] is [true] which forbids more than one node on each host from holding a copy of this shard"
which proves the point. Now you can check the current settings of the .security index.
curl -k -u elastic:123456 -XGET "https://localhost:9200/.security/_settings?pretty"
The response tells you that the replica count is 1. A reasonable fix is to update this setting somehow and bring it down to 0.
{
".security-7": {
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"hidden": "true",
"number_of_shards": "1",
"auto_expand_replicas": "0-1",
"provided_name": ".security-7",
"format": "6",
"creation_date": "1701308345949",
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": "true",
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)"
]
}
},
"analyzer": {
"email": {
"filter": [
"email",
"lowercase",
"unique"
],
"tokenizer": "uax_url_email"
}
}
},
"priority": "1000",
"number_of_replicas": "1",
"uuid": "Iz73k0DlTFeH3va3L_s69Q",
"version": {
"created": "8500003"
}
}
}
}
}
4. Updating a system index directly fails
To update or delete system indices you cannot use the elastic user, because its role does not set the allow_restricted_indices flag on its index privileges.
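If you want to verify this yourself, the has-privileges API accepts an allow_restricted_indices flag in the index check; a sketch run as the elastic user (on recent 8.x versions you should get "has_all_requested": false back):
curl -k -u elastic:123456 -XPOST "https://localhost:9200/_security/user/_has_privileges" -H 'Content-Type: application/json' -d'
{
  "index": [
    {
      "names": [ ".security-7" ],
      "privileges": [ "manage" ],
      "allow_restricted_indices": true
    }
  ]
}'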
4.1 Creating a role that can update system indices
curl -k -XPOST -u elastic:123456 "https://localhost:9200/_security/role/sec" -H 'Content-Type: application/json' -d'
{
"indices": [
{
"names": [
"*",".*"
],
"privileges": [
"all"
],
"allow_restricted_indices": true
}
]
}'
4.2 Creating a user with the new role
curl -k -XPOST -u elastic:123456 "https://localhost:9200/_security/user/sec" -H 'Content-Type: application/json' -d'
{
"password":"123456",
"roles": ["sec"]
}'
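You can confirm that both were created with the get role and get user APIs:
curl -k -u elastic:123456 -XGET "https://localhost:9200/_security/role/sec?pretty"
curl -k -u elastic:123456 -XGET "https://localhost:9200/_security/user/sec?pretty"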
4.3 Trying to update system index settings directly
curl -k -u sec:123456 -XPUT "https://localhost:9200/.security-7/_settings" -H 'Content-Type: application/json' -d'
{
"index" : {
"auto_expand_replicas": "0-0"
}
}'
This will give you an error:
{
"error": {
"root_cause": [
{
"type": "illegal_state_exception",
"reason": "Cannot override settings on system indices: [.security-[0-9]+*] -> [index.auto_expand_replicas]"
}
],
"type": "illegal_state_exception",
"reason": "Cannot override settings on system indices: [.security-[0-9]+*] -> [index.auto_expand_replicas]"
},
"status": 500
}
This does not work even with the allow_restricted_indices role: Elasticsearch refuses direct overrides of certain settings on system indices, so it is time to call the dedicated API for changing system index settings.
5. Using the dedicated security settings API
curl -k -u elastic:123456 -XPUT "https://localhost:9200/_security/settings" -H 'Content-Type: application/json' -d'
{
"security": {
"index.auto_expand_replicas": "0-0"
}
}'
You can run the settings command again to confirm that auto_expand_replicas is now "0-0":
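curl -k -u elastic:123456 -XGET "https://localhost:9200/.security/_settings?pretty"
response: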
{
".security-7": {
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"hidden": "true",
"number_of_shards": "1",
"auto_expand_replicas": "0-0",
"provided_name": ".security-7",
"format": "6",
"creation_date": "1701308345949",
"analysis": {
"filter": {
"email": {
"type": "pattern_capture",
"preserve_original": "true",
"patterns": [
"([^@]+)",
"(\\p{L}+)",
"(\\d+)",
"@(.+)"
]
}
},
"analyzer": {
"email": {
"filter": [
"email",
"lowercase",
"unique"
],
"tokenizer": "uax_url_email"
}
}
},
"priority": "1000",
"number_of_replicas": "0",
"uuid": "Iz73k0DlTFeH3va3L_s69Q",
"version": {
"created": "8500003"
}
}
}
}
}
6. Update additional indices
You may also need to update additional indices such as .ds-ilm-history-* and .ds-.logs-deprecation.elasticsearch-default-*, but these can be updated with the standard settings API.
First, list all shards to see which ones are not allocated:
curl -k -u elastic:123456 -XGET "https://localhost:9200/_cat/shards?v&s=state:asc"
example response:
index shard prirep state docs store dataset ip node
.ds-ilm-history-5-2023.11.30-000001 0 r UNASSIGNED
.ds-.logs-deprecation.elasticsearch-default-2023.11.30-000001 0 r UNASSIGNED
.ds-ilm-history-5-2023.11.30-000001 0 p STARTED 6 18.9kb 18.9kb 10.211.55.9 mac01
.security-7 0 p STARTED 4 23.4kb 23.4kb 10.211.55.9 mac01
.ds-.logs-deprecation.elasticsearch-default-2023.11.30-000001 0 p STARTED 1 10.5kb 10.5kb 10.211.55.9 mac01
Update settings:
curl -k -u sec:123456 -XPUT "https://localhost:9200/.ds-ilm-history-5-2023.11.30-000001/_settings" -H 'Content-Type: application/json' -d'
{
"index" : {
"auto_expand_replicas": "0-0"
}
}'
curl -k -u sec:123456 -XPUT "https://localhost:9200/.ds-.logs-deprecation.elasticsearch-default-2023.11.30-000001/_settings" -H 'Content-Type: application/json' -d'
{
"index" : {
"auto_expand_replicas": "0-0"
}
}'
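To confirm the change, you can read the setting back (shown here with the sec user and a response filter; the index name is the one listed above and will differ in your cluster):
curl -k -u sec:123456 -XGET "https://localhost:9200/.ds-ilm-history-5-2023.11.30-000001/_settings?filter_path=*.settings.index.auto_expand_replicas&pretty"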
7. Elasticsearch cluster is green again
After all the steps above, the cluster is green again.
You can check the shard allocation:
curl -k -u elastic:123456 -XGET "https://localhost:9200/_cat/shards?v&s=state:asc"
Example response:
index shard prirep state docs store dataset ip node
.ds-ilm-history-5-2023.11.30-000001 0 p STARTED 6 18.9kb 18.9kb 10.211.55.9 mac01
.ds-.logs-deprecation.elasticsearch-default-2023.11.30-000001 0 p STARTED 1 10.5kb 10.5kb 10.211.55.9 mac01
.security-7 0 p STARTED 4 23.4kb 23.4kb 10.211.55.9 mac01
Finally, check the Elasticsearch cluster health:
curl -k -u elastic:123456 -XGET "https://localhost:9200/_cluster/health?pretty"
response:
{
"cluster_name" : "docker-cluster",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 2,
"active_primary_shards" : 3,
"active_shards" : 3,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}