Painless JSON parsing in Elasticsearch

January 30, 2024
Tomasz Dzierżanowski

1. Introduction

When one of your input fields contains a JSON document stored as a string, you can parse it and store it in structured form, all with the help of ingest processors that are accessible from Painless.
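To make the idea concrete before touching Elasticsearch, here is a small local analogy in Python. It is not Elasticsearch code: the standard json module simply plays the role that Processors.json plays inside the pipeline, and the shortened document is an illustrative assumption modeled on the example used later in this article.

```python
import json

# A document as it might arrive: "details" holds JSON encoded as a plain string.
doc = {
    "animal": "Penguin",
    "details": '{"classification": {"class": "Aves", "order": "Sphenisciformes"}}',
}

# Before parsing, the field is opaque text; nested keys are not addressable.
assert isinstance(doc["details"], str)

# Parsing replaces the string with a structured object, analogous to
# ctx.details = Processors.json(ctx.details) in the ingest pipeline.
doc["details"] = json.loads(doc["details"])

print(doc["details"]["classification"]["class"])  # prints: Aves
```

After parsing, nested keys such as details.classification.class can be addressed individually, which is the point of the pipeline built in the following sections.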

2. Start Elasticsearch

Start your Elasticsearch instance by running

docker run --rm \
--name animals \
-d \
-p 9200:9200 \
docker.elastic.co/elasticsearch/elasticsearch:8.12.0

Then set the password for the ‘elastic’ user

docker exec -it animals bash -c "(mkfifo pipe1); ( (elasticsearch-reset-password -u elastic -i < pipe1) & ( echo $'y\n123456\n123456' > pipe1) );sleep 5;rm pipe1"

3. Dry run of Painless script

As described in the earlier article, you can test a Painless script by simulating ingestion. This helps with debugging and confirms that you will get the expected results.

You can use the following document

{
  "classification": {
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Aves",
    "order": "Sphenisciformes",
    "family": "Spheniscidae",
    "genus": "Aptenodytes",
    "species": "Aptenodytes forsteri",
    "commonName": "Emperor Penguin"
  }
}
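In a request, this document has to travel as an escaped string inside the details field. Escaping by hand is error-prone, so one option is to generate the body programmatically. The sketch below uses Python's standard json module; building the body in Python and the variable names are illustrative assumptions, not part of the original workflow.

```python
import json

classification_doc = {
    "classification": {
        "kingdom": "Animalia",
        "phylum": "Chordata",
        "class": "Aves",
        "order": "Sphenisciformes",
        "family": "Spheniscidae",
        "genus": "Aptenodytes",
        "species": "Aptenodytes forsteri",
        "commonName": "Emperor Penguin",
    }
}

# The first json.dumps turns the nested document into a string. Dumping the
# whole request afterwards escapes the inner quotes as \" automatically.
simulate_body = {
    "pipeline": {
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": "ctx.details = Processors.json(ctx.details);",
                }
            }
        ]
    },
    "docs": [
        {"_source": {"animal": "Penguin", "details": json.dumps(classification_doc)}}
    ],
}

payload = json.dumps(simulate_body, indent=2)
print(payload)
```

The printed payload could be saved to a file and sent to the simulate endpoint with curl's -d @filename option.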

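The request below tests the pipeline. Note that the escaped details string in this request nests the remaining fields under "kingdom" instead of setting "kingdom" to "Animalia", which is why the result that follows shows the same nesting.

curl -k -u elastic:123456 -XPOST "https://localhost:9200/_ingest/pipeline/_simulate?pretty" \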
-H 'content-type: application/json' -d'
{
    "pipeline": {
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    "source": "ctx.details = Processors.json(ctx.details);"
                }
            }
        ]
    },
    "docs": [
        {
            "_source": { "animal":"Penguin",
                "details": "{\"classification\": {\"kingdom\": {\"phylum\":\"Chordata\",\"class\":\"Aves\",\"order\":\"Sphenisciformes\",\"family\":\"Spheniscidae\",\"genus\":\"Aptenodytes\",\"species\":\"Aptenodytes forsteri\",\"commonName\":\"Emperor Penguin\"}}}"
            }
        }
    ]
}'

The result of the execution will be

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_version" : "-3",
        "_id" : "_id",
        "_source" : {
          "animal" : "Penguin",
          "details" : {
            "classification" : {
              "kingdom" : {
                "commonName" : "Emperor Penguin",
                "phylum" : "Chordata",
                "genus" : "Aptenodytes",
                "species" : "Aptenodytes forsteri",
                "family" : "Spheniscidae",
                "class" : "Aves",
                "order" : "Sphenisciformes"
              }
            }
          }
        },
        "_ingest" : {
          "timestamp" : "2024-01-30T01:54:31.389949888Z"
        }
      }
    }
  ]
}

4. Create Pipeline

Your pipeline is rather simple. You could optionally save the script as a stored script, but it is short enough to keep inline.

curl -k -u elastic:123456 -XPUT "https://localhost:9200/_ingest/pipeline/parsejson" \
-H 'content-type: application/json' -d'
{
    "description": "Parse JSON document",
    "processors": [
        {
            "script": {
                "source": "ctx.details = Processors.json(ctx.details);"
            }
        }
    ]
}'
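If you did want to store the script, the general shape would be a stored-script request followed by a pipeline that references the script by id. The Python sketch below only builds and prints the two request bodies. The script id parse_details_json is a hypothetical name, and the bodies are based on the stored script API (PUT _scripts/<id>) and the script processor's id parameter, so check them against the documentation for your version before relying on them.

```python
import json

# Hypothetical id for the stored script; any valid id would do.
script_id = "parse_details_json"

# Body for: PUT _scripts/parse_details_json
stored_script_body = {
    "script": {
        "lang": "painless",
        "source": "ctx.details = Processors.json(ctx.details);",
    }
}

# Body for: PUT _ingest/pipeline/parsejson
# The script processor refers to the stored script by id instead of
# carrying the source inline.
pipeline_body = {
    "description": "Parse JSON document",
    "processors": [{"script": {"id": script_id}}],
}

print(json.dumps(stored_script_body, indent=2))
print(json.dumps(pipeline_body, indent=2))
```

For a one-line script like this one the inline form is easier to read; a stored script mainly pays off when several pipelines share the same logic.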

5. Load test data through pipeline

Here are 5 animals to load, together with their classification

curl -k -u elastic:123456 -XPOST "https://localhost:9200/animals/_bulk?pipeline=parsejson" \
-H 'content-type: application/json' -d'
{"index":{"_id":"1"}}
{"animal": "Polar Bear","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Mammalia\",\"order\":\"Carnivora\",\"family\":\"Ursidae\",\"genus\":\"Ursus\",\"species\":\"maritimus\"}}"}
{"index":{"_id":"2"}}
{"animal": "Arctic Fox","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Mammalia\",\"order\":\"Carnivora\",\"family\":\"Canidae\",\"genus\":\"Vulpes\",\"species\":\"lagopus\"}}"}
{"index":{"_id":"3"}}
{"animal": "Snowy Owl","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Aves\",\"order\":\"Strigiformes\",\"family\":\"Strigidae\",\"genus\":\"Bubo\",\"species\":\"scandiacus\"}}"}
{"index":{"_id":"4"}}
{"animal": "Narwhal","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Mammalia\",\"order\":\"Cetacea\",\"family\":\"Monodontidae\",\"genus\":\"Monodon\",\"species\":\"monoceros\"}}"}
{"index":{"_id":"5"}}
{"animal": "Arctic Hare","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Mammalia\",\"order\":\"Lagomorpha\",\"family\":\"Leporidae\",\"genus\":\"Lepus\",\"species\":\"arcticus\"}}"}
'
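The bulk body is newline-delimited JSON: an action line followed by a source line for each document, and the last line must end with a newline. As a hedged sketch of how such a body could be generated rather than typed, the Python below builds it for two of the animals. The animals list and the build_bulk_body helper are hypothetical names introduced for illustration.

```python
import json

# Two of the five animals from the request above.
animals = [
    ("Polar Bear", {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
                    "order": "Carnivora", "family": "Ursidae", "genus": "Ursus",
                    "species": "maritimus"}),
    ("Arctic Fox", {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
                    "order": "Carnivora", "family": "Canidae", "genus": "Vulpes",
                    "species": "lagopus"}),
]


def build_bulk_body(rows):
    """Return an NDJSON bulk body with document ids starting at 1."""
    lines = []
    for doc_id, (name, classification) in enumerate(rows, start=1):
        # Action line: index the next document under this id.
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        # Source line: details is deliberately a JSON string, so the
        # pipeline has something to parse.
        details = json.dumps({"classification": classification})
        lines.append(json.dumps({"animal": name, "details": details}))
    # The bulk API requires the final line to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(animals)
print(bulk_body, end="")
```

If you save the output to a file, send it with curl's --data-binary @filename, because -d @filename strips newlines and would break the NDJSON format.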

6. Check results

After a successful load, you can check the results by running a search that returns only the _source of each document.

curl -k -u elastic:123456 -XPOST "https://localhost:9200/animals/_search?pretty&filter_path=hits.hits._source"

After executing the command above, you should get the following result

{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "animal" : "Polar Bear",
          "details" : {
            "classification" : {
              "phylum" : "Chordata",
              "genus" : "Ursus",
              "species" : "maritimus",
              "family" : "Ursidae",
              "kingdom" : "Animalia",
              "class" : "Mammalia",
              "order" : "Carnivora"
            }
          }
        }
      },
      {
        "_source" : {
          "animal" : "Arctic Fox",
          "details" : {
            "classification" : {
              "phylum" : "Chordata",
              "genus" : "Vulpes",
              "species" : "lagopus",
              "family" : "Canidae",
              "kingdom" : "Animalia",
              "class" : "Mammalia",
              "order" : "Carnivora"
            }
          }
        }
      },
      {
        "_source" : {
          "animal" : "Snowy Owl",
          "details" : {
            "classification" : {
              "phylum" : "Chordata",
              "genus" : "Bubo",
              "species" : "scandiacus",
              "family" : "Strigidae",
              "kingdom" : "Animalia",
              "class" : "Aves",
              "order" : "Strigiformes"
            }
          }
        }
      },
      {
        "_source" : {
          "animal" : "Narwhal",
          "details" : {
            "classification" : {
              "phylum" : "Chordata",
              "genus" : "Monodon",
              "species" : "monoceros",
              "family" : "Monodontidae",
              "kingdom" : "Animalia",
              "class" : "Mammalia",
              "order" : "Cetacea"
            }
          }
        }
      },
      {
        "_source" : {
          "animal" : "Arctic Hare",
          "details" : {
            "classification" : {
              "phylum" : "Chordata",
              "genus" : "Lepus",
              "species" : "arcticus",
              "family" : "Leporidae",
              "kingdom" : "Animalia",
              "class" : "Mammalia",
              "order" : "Lagomorpha"
            }
          }
        }
      }
    ]
  }
}
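A quick way to confirm that the pipeline did its job is to check that details comes back as an object rather than a string. The sketch below does this in Python against a trimmed copy of the response above (two hits, two fields each). In practice the text would come from the search request, and the helper name classes_by_animal is hypothetical.

```python
import json

# Trimmed copy of the search response shown above.
response_text = """
{
  "hits": {"hits": [
    {"_source": {"animal": "Polar Bear",
                 "details": {"classification": {"class": "Mammalia", "order": "Carnivora"}}}},
    {"_source": {"animal": "Snowy Owl",
                 "details": {"classification": {"class": "Aves", "order": "Strigiformes"}}}}
  ]}
}
"""


def classes_by_animal(text):
    """Map each animal to its class, failing if details was not parsed."""
    result = {}
    for hit in json.loads(text)["hits"]["hits"]:
        source = hit["_source"]
        details = source["details"]
        if not isinstance(details, dict):
            # A string here would mean the document bypassed the pipeline.
            raise ValueError(f"details for {source['animal']} is still a string")
        result[source["animal"]] = details["classification"]["class"]
    return result


print(classes_by_animal(response_text))
# prints: {'Polar Bear': 'Mammalia', 'Snowy Owl': 'Aves'}
```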

7. Conclusion

You have practiced parsing a JSON object that arrives as a string in your input data. Notice that the final result is structured JSON instead of a plain string. Now you can use this functionality in your own project.