FREE course
This FREE tutorial is part of the video course 18+ Secrets From Elasticsearch Golden Contributor, where I share the best tips and tricks about the ELK stack that I have collected during my professional career. Check it out in the Course tab.
1. Introduction
Imagine you have a text field and you want to find documents by an exact value. Technically you can use either term or match queries; which one works depends on the mapping and on the data you want to search.
In this tutorial I want to show you how to find an exact match in text fields.
2. Start ELK cluster
Please execute the commands below in a terminal to start the Elasticsearch cluster (they rely on docker, curl and jq):
docker network create elkai
docker run --rm \
--name elk \
--net elkai \
-v "es-config:/usr/share/elasticsearch/config:rw" \
-e ES_JAVA_OPTS="-Xms4g -Xmx4g" \
-d \
-p 9200:9200 \
docker.elastic.co/elasticsearch/elasticsearch:8.15.2
docker exec -it elk bash -c "(mkfifo pipe1); ( (elasticsearch-reset-password -u elastic -i < pipe1) & ( echo $'y\n123456\n123456' > pipe1) );sleep 5;rm pipe1"
RESPONSE_JSON=`curl -k -XGET -u elastic:123456 "https://localhost:9200/_security/enroll/kibana"`
kibanatoken=$(echo "$RESPONSE_JSON" | jq -r '.token.value')
docker run --rm \
--name kibana \
--net elkai \
--volume "es-config:/es-config:ro" \
-d \
-p 5601:5601 \
-e ELASTICSEARCH_SSL_VERIFICATIONMODE=certificate \
-e ELASTICSEARCH_HOSTS=https://elk:9200 \
-e ELASTICSEARCH_SERVICEACCOUNTTOKEN=$kibanatoken \
-e ELASTICSEARCH_SSL_CERTIFICATEAUTHORITIES=/es-config/certs/http_ca.crt \
docker.elastic.co/kibana/kibana:8.15.2
Note that the http_ca.crt certificate created by the Elasticsearch container is reused by the Kibana container through the shared es-config volume.
3. Load example data
Open Kibana and go to the Dev Tools Console, where you can execute the commands below.
PUT products
{
"mappings": {
"properties": {
"description": {
"type": "text"
},
"product_id": {
"type": "text"
}
}
}
}
POST _bulk
{"index":{"_index":"products","_id":1}}
{"description":"Running Shoes, Red","product_id":"RS-001"}
{"index":{"_index":"products","_id":2}}
{"description":"Red Running Shoes","product_id":"RS-002"}
{"index":{"_index":"products","_id":3}}
{"description":"Blue Hiking Boots","product_id":"HB-001"}
{"index":{"_index":"products","_id":4}}
{"description":"Hiking Boots, Leather, Brown","product_id":"HB-002"}
{"index":{"_index":"products","_id":5}}
{"description":"Running-Shoes Special Edition","product_id":"RS-003"}
4. Using term search
The usual advice is to avoid the term query on text fields, because during indexing the analyzer tokenizes the field value, so the stored tokens rarely equal the exact term you are searching for.
What if the field always contains a single word?
It can still be tokenized: a value like `id-1111` is changed into `[id, 1111]`, so a search for the token `id` will return both `id-1111` and, for example, `id-2222`.
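The tokenization described above can be imitated with a few lines of Python. This is only a rough sketch: `analyze` is a hypothetical helper, and its regex is a simplification of the standard analyzer (the real one follows Unicode text segmentation rules), but it behaves the same way for the IDs used in this tutorial.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer:
    lowercase the text, then keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

print(analyze("id-1111"))  # ['id', '1111']
print(analyze("RS-002"))   # ['rs', '002']
```

Even a value that looks like a single word ends up as two lowercase tokens, and those tokens are what the index actually stores.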
Let’s say you want to find the product with ID RS-002. Running a term query with “RS-002” will return zero matches:
GET products/_search?pretty
{
"query": {
"term": {
"product_id": "RS-002"
}
}
}
This is because, during indexing, the default standard analyzer removed the punctuation, split the value into tokens and lowercased them, so the index contains the tokens rs and 002, not RS-002.
For the same reason, even the query below will not work, because it looks for an uppercase term.
GET products/_search?pretty
{
"query": {
"term": {
"product_id": "RS"
}
}
}
Calling the same request with a lowercase term, on the other hand, will return all 3 products whose ID starts with “rs”:
GET products/_search?pretty
{
"query": {
"term": {
"product_id": "rs"
}
}
}
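The behaviour of the three term queries above can be reproduced with a toy inverted index. This is a sketch under simplifying assumptions, not Elasticsearch's implementation: `analyze`, `build_index` and `term_query` are hypothetical helpers, and the regex only approximates the standard analyzer.

```python
import re
from collections import defaultdict

def analyze(text):
    # Simplified standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

# product_id values of the five example documents, keyed by _id.
PRODUCT_IDS = {1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002", 5: "RS-003"}

def build_index(docs):
    # Map every analyzed token to the set of documents containing it.
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return index

def term_query(index, term):
    # A term query does NOT analyze its input: it looks the term up as-is.
    return sorted(index.get(term, set()))

index = build_index(PRODUCT_IDS)
print(term_query(index, "RS-002"))  # []
print(term_query(index, "RS"))      # []
print(term_query(index, "rs"))      # [1, 2, 5]
```

The lookup is case-sensitive and sees only the stored tokens, which is why just the lowercase `rs` finds anything.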
Instead, you can use a terms_set query and predict the possible tokens:
GET /products/_search
{
"query": {
"terms_set": {
"product_id": {
"terms": [ "rs","002"],
"minimum_should_match": 2
}
}
}
}
Here minimum_should_match is 2 because both tokens, rs and 002, must be present. This returns 1 record. There is a catch, though, which I will explain later in this tutorial.
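Conceptually, the terms_set query counts how many of the supplied terms occur among a document's tokens and keeps the document when the count reaches the threshold. A minimal Python model of that idea, where `analyze` and `terms_set_query` are hypothetical helpers and the analyzer is a simplification:

```python
import re

def analyze(text):
    # Simplified standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

PRODUCT_IDS = {1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002", 5: "RS-003"}

def terms_set_query(docs, terms, minimum_should_match):
    hits = []
    for doc_id, value in docs.items():
        tokens = set(analyze(value))
        # Count how many of the requested terms this document contains.
        matched = sum(1 for term in terms if term in tokens)
        if matched >= minimum_should_match:
            hits.append(doc_id)
    return sorted(hits)

print(terms_set_query(PRODUCT_IDS, ["rs", "002"], 2))  # [2]
print(terms_set_query(PRODUCT_IDS, ["rs", "002"], 1))  # [1, 2, 4, 5]
```

Note that the model looks only at which tokens are present, not at their order.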
5. Using match queries (for text)
5.1. match
A simple match query will not work, because it searches for the existence of either of the 2 tokens, rs or 002. This is because the standard analyzer is applied to “RS-002”, and the text field “product_id” is then searched for the resulting tokens instead of the exact phrase.
GET /products/_search
{
"query": {
"match": {
"product_id": "RS-002"
}
}
}
4 documents will be returned.
To fix that, you can change the query operator from the default ‘or’ to ‘and’:
GET /products/_search
{
"query": {
"match": {
"product_id": {
"query": "RS-002",
"operator": "and"
}
}
}
}
or write an explicit boolean query:
GET /products/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"product_id": "RS"
}
},
{
"match": {
"product_id": "002"
}
}
]
}
}
}
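The difference between the two operators in this section can be sketched in Python. The query string is analyzed first, then the operator decides whether any or all of the resulting tokens must be present. `analyze` and `match_query` are hypothetical helpers, and the analyzer is a simplification of the standard one.

```python
import re

def analyze(text):
    # Simplified standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

PRODUCT_IDS = {1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002", 5: "RS-003"}

def match_query(docs, query, operator="or"):
    wanted = analyze(query)  # unlike term, match analyzes the query text
    combine = all if operator == "and" else any
    hits = []
    for doc_id, value in docs.items():
        tokens = set(analyze(value))
        if combine(token in tokens for token in wanted):
            hits.append(doc_id)
    return sorted(hits)

print(match_query(PRODUCT_IDS, "RS-002"))                  # [1, 2, 4, 5]
print(match_query(PRODUCT_IDS, "RS-002", operator="and"))  # [2]
```

The explicit bool query with two must clauses corresponds to the `and` branch of this model.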
5.2. catchy product
As mentioned before, there is a catch. Just insert a new record:
POST _bulk
{"index":{"_index":"products","_id":6}}
{"description":"Catchy Product to make you fail","product_id":"002-RS"}
And now all your previous “good” queries will return 2 records, even though you are asking exactly for “RS-002”, because “002-RS” is analyzed into the same two tokens, only in a different order.
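A quick way to convince yourself is to compare the tokens of both IDs. The snippet below is a sketch that uses a simplified, hypothetical `analyze` helper in place of the real standard analyzer:

```python
import re

def analyze(text):
    # Simplified standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

good = analyze("RS-002")
catchy = analyze("002-RS")

print(good)                      # ['rs', '002']
print(catchy)                    # ['002', 'rs']
print(set(good) == set(catchy))  # True
print(good == catchy)            # False
```

Queries that only check which tokens are present cannot tell the two documents apart; only the token order differs.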
What to do?
5.3. match_phrase
A phrase query (match_phrase) is more suitable for matching an exact string value, because it also requires the tokens to appear in the same order as in the query.
This will let you get the exact match:
GET /products/_search
{
"query": {
"match_phrase": {
"product_id": "RS-002"
}
}
}
But this is not the final solution …
5.4. catchy product continued
If you insert a new record:
POST _bulk
{"index":{"_index":"products","_id":7}}
{"description":"Is there a hope","product_id":"dd RS-002 aaaa"}
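Now the previous query matches this record too, because match_phrase does not exclude surrounding tokens. The phrase only has to exist somewhere within the text, that’s it. Setting slop to 0 (the default) explicitly does not change that: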
GET /products/_search
{
"query": {
"match_phrase": {
"product_id": {
"query": "RS-002",
"slop": 0
}
}
}
}
And it returns:
"hits": [
{
"_index": "products",
"_id": "2",
"_score": 1.0012583,
"_source": {
"description": "Red Running Shoes",
"product_id": "RS-002"
}
},
{
"_index": "products",
"_id": "7",
"_score": 0.7270006,
"_source": {
"description": "Is there a hope",
"product_id": "dd RS-002 aaaa"
}
}
]
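Why both documents match can be modelled as a search for a contiguous run of tokens. The following Python sketch assumes a simplified analyzer and ignores slop entirely; `analyze` and `match_phrase` are hypothetical helpers, not Elasticsearch code.

```python
import re

def analyze(text):
    # Simplified standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

PRODUCT_IDS = {
    1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002",
    5: "RS-003", 6: "002-RS", 7: "dd RS-002 aaaa",
}

def match_phrase(docs, query):
    phrase = analyze(query)
    n = len(phrase)
    hits = []
    for doc_id, value in docs.items():
        tokens = analyze(value)
        # The phrase matches if its tokens appear consecutively, in order,
        # anywhere in the field; surrounding tokens are allowed.
        if any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return sorted(hits)

print(match_phrase(PRODUCT_IDS, "RS-002"))  # [2, 7]
```

Document 6 is now excluded because its tokens are in the wrong order, but document 7 still contains the phrase in the middle of its value.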
6. Solutions of the problem
6.1. keyword
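The keyword type is not analyzed; the value is indexed exactly as it is.
If you do not want to reindex, use a runtime mapping. A runtime field with the same name shadows the mapped text field at query time and takes its value from _source: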
PUT products/_mapping
{
"runtime": {
"product_id": {
"type": "keyword"
}
}
}
Now both term and match_phrase queries will return the exact match.
GET products/_search?pretty
{
"query": {
"term": {
"product_id": "RS-002"
}
},
"fields": [
"*"
],
"_source": false
}
GET /products/_search
{
"query": {
"match_phrase": {
"product_id": {
"query": "RS-002",
"slop": 0
}
}
},
"fields": [
"*"
],
"_source": false
}
Both queries will return:
"hits": [
{
"_index": "products",
"_id": "2",
"_score": 1,
"fields": {
"description": [
"Red Running Shoes"
],
"product_id": [
"RS-002"
],
"product_id.raw": [
"RS-002"
]
}
}
]
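With a keyword field there is no tokenization step at all, so matching boils down to comparing whole values. A minimal Python sketch of that behaviour, assuming no normalizer is configured; `keyword_term_query` is a hypothetical helper:

```python
PRODUCT_IDS = {
    1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002",
    5: "RS-003", 6: "002-RS", 7: "dd RS-002 aaaa",
}

def keyword_term_query(docs, value):
    # Keyword values are stored untouched, so only an identical
    # string (same case, same punctuation) counts as a match.
    return sorted(doc_id for doc_id, stored in docs.items() if stored == value)

print(keyword_term_query(PRODUCT_IDS, "RS-002"))  # [2]
print(keyword_term_query(PRODUCT_IDS, "rs-002"))  # []
print(keyword_term_query(PRODUCT_IDS, "002-RS"))  # [6]
```

Both catchy products stop interfering, at the cost of the search becoming case-sensitive.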
6.2. multi fields
With multi-fields, product_id can be mapped as both text and keyword types, each with independent settings.
PUT products/_mapping
{
"runtime": {
"product_id": null
},
"properties": {
"description": {
"type": "text"
},
"product_id": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"doc_values": true,
"index": true
}
}
}
}
}
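Note that you removed the runtime mapping by assigning null, as it won’t be needed anymore.
The new raw sub-field is only populated for documents indexed after the mapping change, so update the existing documents in place: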
POST products/_update_by_query?refresh&conflicts=proceed
Finally, both queries below will return exactly one document:
GET products/_search?pretty
{
"query": {
"term": {
"product_id.raw": "RS-002"
}
}
}
GET /products/_search
{
"query": {
"match_phrase": {
"product_id.raw": "RS-002"
}
}
}
Notice that you are referring to the ‘.raw’ sub-field, which is of keyword type.
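The multi-field idea can be pictured as indexing the same source value twice: once analyzed for full-text search and once untouched for exact lookups. The Python sketch below is only a model of that idea; `analyze` and `index_multi_field` are hypothetical helpers, and the analyzer is a simplification of the standard one.

```python
import re
from collections import defaultdict

def analyze(text):
    # Simplified standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

PRODUCT_IDS = {
    1: "RS-001", 2: "RS-002", 3: "HB-001", 4: "HB-002",
    5: "RS-003", 6: "002-RS", 7: "dd RS-002 aaaa",
}

def index_multi_field(docs):
    text_index = defaultdict(set)  # models product_id (text)
    raw_index = defaultdict(set)   # models product_id.raw (keyword)
    for doc_id, value in docs.items():
        for token in analyze(value):
            text_index[token].add(doc_id)
        raw_index[value].add(doc_id)  # whole value, not analyzed
    return {"product_id": text_index, "product_id.raw": raw_index}

fields = index_multi_field(PRODUCT_IDS)
print(sorted(fields["product_id.raw"].get("RS-002", set())))  # [2]
print(sorted(fields["product_id"].get("rs", set())))          # [1, 2, 5, 6, 7]
```

The text field keeps working for token-based searches, while the raw sub-field answers the exact-match question.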
7. Summary
In this tutorial you have learned how to perform an exact match over text fields. You have practiced creating a proper mapping, and you saw hidden issues that can confuse you.