Search Exact Text Values with Elasticsearch

FREE course

This FREE tutorial is part of the video course 18+ Secrets From Elasticsearch Golden Contributor, where I share the best tips and tricks about the ELK stack that I have collected during my professional career. Check it out in the Course tab.

1. Introduction

Imagine you want to find documents by an exact value, but the field is mapped as text. Technically you can use either a term query or a match query. Which one works depends on the mapping and on the data you want to search.

In this tutorial I want to show you how to find an exact match in text fields.

2. Start ELK cluster

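Execute the commands below in a terminal to start an Elasticsearch cluster. They require jq, and the password reset step only works once Elasticsearch has finished starting, so give the container a moment before running it.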

				
					docker network create elkai

docker run --rm \
--name elk \
--net elkai \
-v "es-config:/usr/share/elasticsearch/config:rw" \
-e ES_JAVA_OPTS="-Xms4g -Xmx4g" \
-d \
-p 9200:9200 \
docker.elastic.co/elasticsearch/elasticsearch:8.15.2

docker exec -it elk bash -c "(mkfifo pipe1); ( (elasticsearch-reset-password -u elastic -i < pipe1) & ( echo $'y\n123456\n123456' > pipe1) );sleep 5;rm pipe1"

RESPONSE_JSON=`curl -k -XGET -u elastic:123456 "https://localhost:9200/_security/enroll/kibana"`
kibanatoken=$(echo "$RESPONSE_JSON" | jq -r '.token.value')

docker run --rm \
--name kibana \
--net elkai \
--volume "es-config:/es-config:ro" \
-d \
-p 5601:5601 \
-e ELASTICSEARCH_SSL_VERIFICATIONMODE=certificate \
-e ELASTICSEARCH_HOSTS=https://elk:9200 \
-e ELASTICSEARCH_SERVICEACCOUNTTOKEN=$kibanatoken \
-e ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=/es-config/certs/http_ca.crt \
docker.elastic.co/kibana/kibana:8.15.2
				
			

Note that the http_ca certificate created by the Elasticsearch container is reused by the Kibana container through the shared es-config volume.

3. Load example data

Open Kibana and go to the Dev Console, where you can execute the commands below.

				
					PUT products
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text"
      },
      "product_id": {
        "type": "text"
      }
    }
  }
}
				
			
				
					POST _bulk
{"index":{"_index":"products","_id":1}}
{"description":"Running Shoes, Red","product_id":"RS-001"}
{"index":{"_index":"products","_id":2}}
{"description":"Red Running Shoes","product_id":"RS-002"}
{"index":{"_index":"products","_id":3}}
{"description":"Blue Hiking Boots","product_id":"HB-001"}
{"index":{"_index":"products","_id":4}}
{"description":"Hiking Boots, Leather, Brown","product_id":"HB-002"}
{"index":{"_index":"products","_id":5}}
{"description":"Running-Shoes Special Edition","product_id":"RS-003"}
				
			

4. Using term search

People say you should avoid the term query on text fields. The reason is that during indexing the analyzer tokenizes the field value, so the original value is hard to find by one particular term.

What if the field always contains a single word?

It can still be tokenized. A value like `id-1111` is turned into `[id, 1111]`, so a term query for `id` will match both `id-1111` and, for example, `id-2222`.
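To make the tokenization concrete, here is a small Python sketch that approximates what the standard analyzer does to such values. The `analyze` helper is hypothetical and deliberately simplified (split on non-alphanumeric characters, then lowercase). The real analyzer follows Unicode segmentation rules, so for anything beyond these simple IDs you should check the actual tokens with the `_analyze` API.

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer (an assumption, not the
    # real implementation): split on non-alphanumerics, then lowercase.
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

print(analyze("id-1111"))  # ['id', '1111']
print(analyze("id-2222"))  # ['id', '2222']
print(analyze("RS-002"))   # ['rs', '002']
```

Both `id-1111` and `id-2222` end up sharing the token `id`, which is why a single term can match more documents than you expect.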

Let’s say you want to find the product with ID RS-002. Running a term query with “RS-002” will return zero matches.

				
					GET products/_search?pretty
{
  "query": {
    "term": {
      "product_id": "RS-002"
    }
  }
}
				
			

This is because during indexing the default standard analyzer removed the punctuation, split the value into tokens and lowercased them, so the index holds `rs` and `002` rather than `RS-002`.

Therefore even the query below will not work, because it looks for the uppercase term.

				
					GET products/_search?pretty
{
  "query": {
    "term": {
      "product_id": "RS"
    }
  }
}
				
			

Calling the same request with lowercase, on the other hand, will return all 3 products whose ID starts with “rs”.

				
					GET products/_search?pretty
{
  "query": {
    "term": {
      "product_id": "rs"
    }
  }
}
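The three term queries above can be replayed against a toy inverted index. This is only a sketch under stated assumptions: `analyze` is a hypothetical, simplified stand-in for the standard analyzer, and `term_query` models nothing more than "look the term up exactly as given".

```python
import re
from collections import defaultdict

def analyze(text):
    # Simplified stand-in for the standard analyzer (assumption).
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

PRODUCT_IDS = {1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002", 5: "RS-003"}

# Toy inverted index: token -> set of document ids containing it.
index = defaultdict(set)
for doc_id, value in PRODUCT_IDS.items():
    for token in analyze(value):
        index[token].add(doc_id)

def term_query(term):
    # A term query does not analyze its input; it looks the term up as-is.
    return sorted(index.get(term, set()))

print(term_query("RS-002"))  # []
print(term_query("RS"))      # []
print(term_query("rs"))      # [1, 2, 5]
```

Only the lowercase token exists in the index, so only the lowercase lookup finds anything.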
				
			

Instead you can use a terms_set query and predict the possible tokens:

				
					GET /products/_search
{
  "query": {
    "terms_set": {
      "product_id": {
        "terms": [ "rs","002"],
        "minimum_should_match": 2
      }
    }
  }
}
				
			

The query requires both tokens, rs and 002, to be present. It returns 1 record, although there is a catch which I will explain later in this tutorial.
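A rough way to picture the terms_set behaviour is to count how many of the requested terms each document contains. The sketch below assumes a simplified `analyze` helper and a hypothetical `terms_set_query` function; it only models the counting rule, not scoring.

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer (assumption).
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

PRODUCT_IDS = {1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002", 5: "RS-003"}

def terms_set_query(terms, minimum_should_match):
    # Keep documents whose tokens contain at least the required number of terms.
    hits = []
    for doc_id, value in PRODUCT_IDS.items():
        tokens = set(analyze(value))
        matched = sum(1 for term in terms if term in tokens)
        if matched >= minimum_should_match:
            hits.append(doc_id)
    return hits

print(terms_set_query(["rs", "002"], 2))  # [2]
print(terms_set_query(["rs", "002"], 1))  # [1, 2, 4, 5]
```

Lowering the minimum to 1 shows why the value 2 matters here: with it, any document sharing a single token would be returned.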

5. Using match queries (for text)

5.1. match

A simple match query will not work, because it searches for the existence of either of the 2 tokens, rs or 002. This is because the standard analyzer is applied to “RS-002” and the text field “product_id” is then searched for the resulting tokens instead of the exact phrase.

 

				
					GET /products/_search
{
  "query": {
    "match": {
      "product_id": "RS-002"
    }
  }
}
				
			

4 documents will be returned (IDs 1, 2, 4 and 5).

To fix that, you can change the query operator from the default ‘or’ to ‘and’:
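Here is a minimal sketch of why the default match query is so generous. It assumes a simplified `analyze` helper and a hypothetical `match_or` function that mimics the default ‘or’ behaviour: the query text is analyzed too, and a document qualifies as soon as it shares any token.

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer (assumption).
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

PRODUCT_IDS = {1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002", 5: "RS-003"}

def match_or(query):
    # Unlike term, match analyzes the query text before searching.
    wanted = analyze(query)
    return [doc_id for doc_id, value in PRODUCT_IDS.items()
            if any(token in analyze(value) for token in wanted)]

print(analyze("RS-002"))   # ['rs', '002']
print(match_or("RS-002"))  # [1, 2, 4, 5]
```

Documents 1 and 5 match on `rs`, document 4 matches on `002`, and only document 2 contains both tokens.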

				
					GET /products/_search
{
  "query": {
    "match": {
      "product_id": {
        "query": "RS-002",
        "operator": "and"
      }
    }
  }
}
				
			

or write the boolean query explicitly:

				
					GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "product_id": "RS"
          }
        },
        {
          "match": {
            "product_id": "002"
          }
        }
      ]
    }
  }
}
				
			

5.2. catchy product

As mentioned before, there is a catch. Just insert a new record:

				
					POST _bulk
{"index":{"_index":"products","_id":6}}
{"description":"Catchy Product to make you fail","product_id":"002-RS"}
				
			

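Now all your previous “good” queries (terms_set, match with ‘and’, and the bool query) will return 2 records, even though you are asking exactly for “RS-002”. They only check that both tokens are present, not their order.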
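The effect of the catchy product can be reproduced with a small sketch. It assumes a simplified `analyze` helper and a hypothetical `match_and` function that requires every query token to be present, which is how I model the ‘and’ operator here.

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer (assumption).
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

PRODUCT_IDS = {
    1: "RS-001", 2: "RS-002", 3: "HB-001",
    4: "HB-002", 5: "RS-003", 6: "002-RS",
}

def match_and(query):
    # Every analyzed query token must appear in the document, in any order.
    wanted = analyze(query)
    return [doc_id for doc_id, value in PRODUCT_IDS.items()
            if all(token in analyze(value) for token in wanted)]

print(analyze("002-RS"))    # ['002', 'rs']
print(match_and("RS-002"))  # [2, 6]
```

“002-RS” produces the same two tokens as “RS-002”, just in a different order, and a plain token check cannot tell them apart.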

What to do?

5.3. match_phrase

Phrase matching is more suitable for matching an exact string value, because it also checks that the tokens appear in the given order.
With the data indexed so far, this returns only the exact match.

				
					GET /products/_search
{
  "query": {
    "match_phrase": {
      "product_id": "RS-002"
    }
  }
}
				
			

But this is not the final solution …

5.4. catchy product continue

If you insert a new record

				
					POST _bulk
{"index":{"_index":"products","_id":7}}
{"description":"Is there a hope","product_id":"dd RS-002 aaaa"}
				
			

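the previous query no longer gives an exact match, because a phrase query does not exclude surrounding tokens. The phrase only has to exist somewhere within the text, that’s it. Setting slop to 0 explicitly (which is already the default) does not help: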

				
					GET /products/_search
{
  "query": {
    "match_phrase": {
      "product_id": {
        "query": "RS-002",
        "slop": 0
      }
    }
  }
}
				
			

It returns both documents:

				
					    "hits": [
      {
        "_index": "products",
        "_id": "2",
        "_score": 1.0012583,
        "_source": {
          "description": "Red Running Shoes",
          "product_id": "RS-002"
        }
      },
      {
        "_index": "products",
        "_id": "7",
        "_score": 0.7270006,
        "_source": {
          "description": "Is there a hope",
          "product_id": "dd RS-002 aaaa"
        }
      }
    ]
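Why document 7 still matches can be sketched by treating a phrase query as a search for the query tokens as a consecutive run inside the document tokens. The `analyze` and `match_phrase` helpers below are hypothetical simplifications (slop 0 only, no scoring).

```python
import re

def analyze(text):
    # Simplified stand-in for the standard analyzer (assumption).
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

PRODUCT_IDS = {
    1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002",
    5: "RS-003", 6: "002-RS", 7: "dd RS-002 aaaa",
}

def match_phrase(query):
    # The query tokens must appear next to each other and in the same order,
    # but nothing forbids extra tokens before or after them.
    wanted = analyze(query)
    n = len(wanted)
    hits = []
    for doc_id, value in PRODUCT_IDS.items():
        tokens = analyze(value)
        if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits

print(match_phrase("RS-002"))  # [2, 7]
```

The order check removes “002-RS”, but “dd RS-002 aaaa” contains the phrase in the middle, so it stays in the results.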
				
			

6. Solutions of the problem

6.1. keyword

The keyword type is not analyzed, so the value stays as it is.
If you do not want to reindex, use a runtime mapping. A runtime field with the same name shadows the indexed text field at query time.

				
					PUT products/_mapping
{
    "runtime": {
      "product_id": {
        "type": "keyword"
      }
    }
}
				
			

Now both term and match_phrase queries will return the exact match.

				
					GET products/_search?pretty
{
  "query": {
    "term": {
      "product_id": "RS-002"
    }
  },
  "fields": ["*"
  ],
  "_source": false
}
				
			
				
					GET /products/_search
{
  "query": {
    "match_phrase": {
      "product_id": {
        "query": "RS-002",
        "slop": 0
      }
    }
  },
  "fields": [
    "*"
  ],
  "_source": false
}
				
			

will return (the product_id.raw entry is listed only if the multi-field from section 6.2 is already mapped)

				
					    "hits": [
      {
        "_index": "products",
        "_id": "2",
        "_score": 1,
        "fields": {
          "description": [
            "Red Running Shoes"
          ],
          "product_id": [
            "RS-002"
          ],
          "product_id.raw": [
            "RS-002"
          ]
        }
      }
    ]
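The difference to the text field can be sketched as follows. This toy model only illustrates keyword semantics (the whole value is one term, compared as-is); it does not model how a runtime field is computed at query time, and the `term_query` helper is hypothetical.

```python
from collections import defaultdict

PRODUCT_IDS = {
    1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002",
    5: "RS-003", 6: "002-RS", 7: "dd RS-002 aaaa",
}

# Keyword semantics: no analysis, the complete value is the only term.
keyword_index = defaultdict(set)
for doc_id, value in PRODUCT_IDS.items():
    keyword_index[value].add(doc_id)

def term_query(value):
    # Exact, case-sensitive lookup of the whole value.
    return sorted(keyword_index.get(value, set()))

print(term_query("RS-002"))  # [2]
print(term_query("rs-002"))  # []
print(term_query("RS"))      # []
```

Because nothing is split or lowercased, neither the catchy products nor partial values can match, and the lookup is case-sensitive.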
				
			

6.2. multi fields

product_id can be mapped as both text and keyword at the same time, each with independent settings.

				
					PUT products/_mapping
{
  "runtime": {
    "product_id": null
  },
  "properties": {
    "description": {
      "type": "text"
    },
    "product_id": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword",
          "doc_values": true,
          "index": true
        }
      }
    }
  }
}
				
			

Note that the runtime mapping is removed by assigning null, as it won’t be needed anymore.

Adding the sub-field does not touch documents that are already indexed, so reindex them in place to populate it:

				
					POST products/_update_by_query?refresh&conflicts=proceed
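Why this extra step is needed can be pictured with a toy model. It rests on one assumption I want to state loudly: an index entry is only written at the moment a document is indexed, using the mapping that exists at that time. The `index_all` and `term_query` helpers are hypothetical.

```python
from collections import defaultdict

# _source of documents that were indexed before the mapping change.
SOURCE = {2: "RS-002", 6: "002-RS", 7: "dd RS-002 aaaa"}

def index_all(source, mapping_has_raw):
    # Toy index for product_id.raw: a keyword sub-field stores the whole
    # value as one term, but only if the mapping knows it at index time.
    raw_index = defaultdict(set)
    if mapping_has_raw:
        for doc_id, value in source.items():
            raw_index[value].add(doc_id)
    return raw_index

def term_query(raw_index, value):
    return sorted(raw_index.get(value, set()))

# Documents were indexed while the mapping had no "raw" sub-field.
before = index_all(SOURCE, mapping_has_raw=False)
print(term_query(before, "RS-002"))  # []

# Re-indexing every document from _source (the role _update_by_query plays
# here) runs them through the new mapping and fills the sub-field.
after = index_all(SOURCE, mapping_has_raw=True)
print(term_query(after, "RS-002"))   # [2]
```

Without the reindexing step, a query on the new sub-field would simply find nothing for the existing documents.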
				
			

Finally, both queries below will return exactly one document:

				
					GET products/_search?pretty
{
  "query": {
    "term": {
      "product_id.raw": "RS-002"
    }
  }
}

GET /products/_search
{
  "query": {
    "match_phrase": {
      "product_id.raw": "RS-002"
    }
  }
}
				
			

Notice that you are referring to the ‘.raw’ field, which is of keyword type.

7. Summary

In this tutorial you have learned how to perform an exact match over text fields. You have practiced creating a proper mapping and you have seen the hidden issues that can confuse you.
