How to delete multiple documents that match a specific condition in Elasticsearch? - Big Data In Real World



Deleting a single document is pretty straightforward in Elasticsearch. We simply issue a DELETE request with the document's ID, and the document is removed from the index. A follow-up GET on the same ID confirms it is gone ("found" : false).

 $ curl -X DELETE "localhost:9200/account/_doc/10?pretty"
 {
   "_index" : "account",
   "_type" : "_doc",
   "_id" : "10",
   "_version" : 2,
   "result" : "deleted",
   "_shards" : {
     "total" : 2,
     "successful" : 1,
     "failed" : 0
   },
   "_seq_no" : 1000,
   "_primary_term" : 1
 }
 $ curl -X GET "localhost:9200/account/_doc/10?pretty"
 {
   "_index" : "account",
   "_type" : "_doc",
   "_id" : "10",
   "found" : false
 } 
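When the IDs of the documents are already known, one alternative to sending a separate DELETE per document is a single _bulk request with one delete action per ID. The sketch below assumes a hypothetical helper, build_bulk_delete (not part of Elasticsearch), that prints the newline-delimited (NDJSON) body; the index name and IDs are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Elasticsearch): prints an NDJSON body for the
# _bulk API containing one delete action per document ID.
# Usage: build_bulk_delete <index> <id>...
build_bulk_delete() {
  local index="$1"
  shift
  local id
  for id in "$@"; do
    # Each action goes on its own line; delete actions have no source line.
    # The trailing \n also gives the body the final newline _bulk requires.
    printf '{"delete":{"_index":"%s","_id":"%s"}}\n' "$index" "$id"
  done
}
```

The body could then be sent with something like `build_bulk_delete account 10 11 12 | curl -X POST "localhost:9200/_bulk?pretty" -H 'Content-Type: application/x-ndjson' --data-binary @-`. Here `--data-binary @-` reads the body from standard input without stripping newlines, which `-d @-` would do.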

What if we want to delete all documents in the account index whose account_number is greater than or equal to 15 and less than or equal to 20?

Delete by query

Before deleting anything, let's check what matches. A range search shows a total of 6 documents in the account index with account_number greater than or equal to 15 and less than or equal to 20.

 curl -X GET "localhost:9200/account/_search?pretty" -H 'Content-Type: application/json' -d'
 {
   "query": {
     "range": {
       "account_number": {
         "gte": 15,
         "lte": 20
       }
     }
   }
 }'
 {
   "took" : 3,
   "timed_out" : false,
   "_shards" : {
     "total" : 1,
     "successful" : 1,
     "skipped" : 0,
     "failed" : 0
   },
   "hits" : {
     "total" : {
       "value" : 6,
       "relation" : "eq"
     } 
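If we only need the number of matching documents, the _count API accepts the same query and returns just a count. The sketch below assumes two hypothetical helpers: range_query, which prints the range-query body for a field and inclusive bounds, and count_range, which sends that body to _count on an Elasticsearch node assumed to be at localhost:9200.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a request body with an inclusive range query
# (gte/lte) on one numeric field.
# Usage: range_query <field> <gte> <lte>
range_query() {
  printf '{"query":{"range":{"%s":{"gte":%s,"lte":%s}}}}\n' "$1" "$2" "$3"
}

# Hypothetical helper: counts matching documents with the _count API.
# Assumes Elasticsearch is reachable on localhost:9200.
# Usage: count_range <index> <field> <gte> <lte>
count_range() {
  curl -s -X GET "localhost:9200/$1/_count?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(range_query "$2" "$3" "$4")"
}
```

Running `count_range account account_number 15 20` before the deletion should report a "count" of 6 for the data used in this post.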

To delete them, we issue a POST to the _delete_by_query API with the same range query on account_number (gte 15, lte 20). The output shows that all 6 matching documents were deleted from the index ("deleted" : 6).

 curl -X POST "localhost:9200/account/_delete_by_query?pretty" -H 'Content-Type: application/json' -d'
 {
   "query": {
     "range": {
       "account_number": {
         "gte": 15,
         "lte": 20
       }
     }
   }
 }'
 {
   "took" : 80,
   "timed_out" : false,
   "total" : 6,
   "deleted" : 6,
   "batches" : 1,
   "version_conflicts" : 0,
   "noops" : 0,
   "retries" : {
     "bulk" : 0,
     "search" : 0
   },
   "throttled_millis" : 0,
   "requests_per_second" : -1.0,
   "throttled_until_millis" : 0,
   "failures" : [ ]
 } 
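_delete_by_query also accepts query-string parameters that can help on larger indices. Roughly: conflicts=proceed makes the request count version conflicts and keep going instead of aborting, and refresh=true refreshes the affected shards once the request completes. The helper names below (range_query, delete_range_url, delete_range) and the choice of parameters are illustrative assumptions, and the node is assumed to be at localhost:9200.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an inclusive range-query body (gte/lte).
# Usage: range_query <field> <gte> <lte>
range_query() {
  printf '{"query":{"range":{"%s":{"gte":%s,"lte":%s}}}}\n' "$1" "$2" "$3"
}

# Hypothetical helper: builds the _delete_by_query URL for an index.
#   conflicts=proceed -> count version conflicts instead of aborting
#   refresh=true      -> refresh the affected shards when the request completes
delete_range_url() {
  printf 'localhost:9200/%s/_delete_by_query?conflicts=proceed&refresh=true&pretty\n' "$1"
}

# Hypothetical helper: deletes documents in <index> whose <field> is in [gte, lte].
# Usage: delete_range <index> <field> <gte> <lte>
delete_range() {
  curl -s -X POST "$(delete_range_url "$1")" \
    -H 'Content-Type: application/json' \
    -d "$(range_query "$2" "$3" "$4")"
}
```

For long-running deletions, adding wait_for_completion=false to the URL makes Elasticsearch return a task ID instead of waiting; its progress can then be checked with `curl "localhost:9200/_tasks/<task_id>?pretty"`.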

Let’s make sure the documents are truly gone by running the same search again. This time it returns 0 hits, which is what we expect.

 curl -X GET "localhost:9200/account/_search?pretty" -H 'Content-Type: application/json' -d'
 {
   "query": {
     "range": {
       "account_number": {
         "gte": 15,
         "lte": 20
       }
     }
   }
 }'
 {
   "took" : 3,
   "timed_out" : false,
   "_shards" : {
     "total" : 1,
     "successful" : 1,
     "skipped" : 0,
     "failed" : 0
   },
   "hits" : {
     "total" : {
       "value" : 0,
       "relation" : "eq"
     },
     "max_score" : null,
     "hits" : [ ]
   }
 } 
Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.
