Elasticsearch 7.3.0 translation - "Aggregations": metrics aggregations, top hits aggregation

Link to the original text: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html#_supported_per_hit_features

Top hits aggregation


The top_hits metric aggregator keeps track of the most relevant documents being aggregated. It is intended to be used as a sub-aggregator, so that the top matching documents can be aggregated per bucket.
The top_hits aggregator can effectively be used to group result sets by certain fields via a bucket aggregator. One or more bucket aggregators determine the properties by which the result set is sliced.

Options

  • from - The offset from the first result you want to fetch.
  • size - The maximum number of top matching hits to return per bucket. By default, the top three matching hits are returned.
  • sort - How the top matching hits should be sorted. By default, the hits are sorted by the score of the main query.
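
The three options can be collected in a small request-building helper. The following is only a sketch in Python: build_top_hits is a hypothetical name that does not belong to any Elasticsearch client, and it only assembles the JSON body as a dict. Options that are not passed are left out, so the server-side defaults described above apply.

```python
from typing import Any, Dict, List, Optional


def build_top_hits(
    from_: Optional[int] = None,
    size: Optional[int] = None,
    sort: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Return a top_hits aggregation body (hypothetical helper, not a client API)."""
    body: Dict[str, Any] = {}
    if from_ is not None:
        body["from"] = from_  # offset from the first result
    if size is not None:
        body["size"] = size  # maximum number of hits per bucket
    if sort is not None:
        body["sort"] = sort  # omitted: hits are sorted by the main query's score
    return {"top_hits": body}


# Empty body: rely on the defaults (top three hits, sorted by score).
default_agg = build_top_hits()

# One hit per bucket, newest first, assuming the documents have a "date" field.
latest_agg = build_top_hits(size=1, sort=[{"date": {"order": "desc"}}])
```

The resulting dict can be placed under an aggregation name inside the "aggs" section of a bucket aggregation.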

Supported per-hit features

The top_hits aggregation returns regular search hits, which is why many per-hit features can be supported, such as highlighting, explain, and source filtering.

Example

In the following example, we group the sales by type and, for each type, show the most recent sale. For each sale, only the date and price fields are included in the source.

POST /sales/_search?size=0
{
    "aggs": {
        "top_tags": {
            "terms": {
                "field": "type",
                "size": 3
            },
            "aggs": {
                "top_sales_hits": {
                    "top_hits": {
                        "sort": [
                            {
                                "date": {
                                    "order": "desc"
                                }
                            }
                        ],
                        "_source": {
                            "includes": [ "date", "price" ]
                        },
                        "size" : 1
                    }
                }
            }
        }
    }
}

Possible response:

{
  ...
  "aggregations": {
    "top_tags": {
       "doc_count_error_upper_bound": 0,
       "sum_other_doc_count": 0,
       "buckets": [
          {
             "key": "hat",
             "doc_count": 3,
             "top_sales_hits": {
                "hits": {
                   "total" : {
                       "value": 3,
                       "relation": "eq"
                   },
                   "max_score": null,
                   "hits": [
                      {
                         "_index": "sales",
                         "_type": "_doc",
                         "_id": "AVnNBmauCQpcRyxw6ChK",
                         "_source": {
                            "date": "2015/03/01 00:00:00",
                            "price": 200
                         },
                         "sort": [
                            1425168000000
                         ],
                         "_score": null
                      }
                   ]
                }
             }
          },
          {
             "key": "t-shirt",
             "doc_count": 3,
             "top_sales_hits": {
                "hits": {
                   "total" : {
                       "value": 3,
                       "relation": "eq"
                   },
                   "max_score": null,
                   "hits": [
                      {
                         "_index": "sales",
                         "_type": "_doc",
                         "_id": "AVnNBmauCQpcRyxw6ChL",
                         "_source": {
                            "date": "2015/03/01 00:00:00",
                            "price": 175
                         },
                         "sort": [
                            1425168000000
                         ],
                         "_score": null
                      }
                   ]
                }
             }
          },
          {
             "key": "bag",
             "doc_count": 1,
             "top_sales_hits": {
                "hits": {
                   "total" : {
                       "value": 1,
                       "relation": "eq"
                   },
                   "max_score": null,
                   "hits": [
                      {
                         "_index": "sales",
                         "_type": "_doc",
                         "_id": "AVnNBmatCQpcRyxw6ChH",
                         "_source": {
                            "date": "2015/01/01 00:00:00",
                            "price": 150
                         },
                         "sort": [
                            1420070400000
                         ],
                         "_score": null
                      }
                   ]
                }
             }
          }
       ]
    }
  }
}
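
On the client side, the top hit of each bucket is found under buckets[i].top_sales_hits.hits.hits. The Python sketch below shows one way to pull it out of an already parsed response. The function name latest_sale_per_type is hypothetical, and sample_response is a trimmed copy of the response above that keeps only the fields the function reads.

```python
from typing import Any, Dict


def latest_sale_per_type(response: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each bucket key to the _source of its first top hit."""
    result: Dict[str, Dict[str, Any]] = {}
    for bucket in response["aggregations"]["top_tags"]["buckets"]:
        hits = bucket["top_sales_hits"]["hits"]["hits"]
        if hits:  # guard against a bucket that returned no hits
            result[bucket["key"]] = hits[0]["_source"]
    return result


def _bucket(key: str, date: str, price: int) -> Dict[str, Any]:
    """Build a trimmed bucket with a single top hit (test data only)."""
    hit = {"_source": {"date": date, "price": price}}
    return {"key": key, "top_sales_hits": {"hits": {"hits": [hit]}}}


# Trimmed copy of the response shown above.
sample_response = {
    "aggregations": {
        "top_tags": {
            "buckets": [
                _bucket("hat", "2015/03/01 00:00:00", 200),
                _bucket("t-shirt", "2015/03/01 00:00:00", 175),
                _bucket("bag", "2015/01/01 00:00:00", 150),
            ]
        }
    }
}

latest = latest_sale_per_type(sample_response)
```

Because the request used "size": 1 with a descending sort on date, the first hit in each bucket is the most recent sale of that type.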

Field collapse example

Field collapsing, or result grouping, is a feature that logically groups a result set and returns the top documents for each group. The groups are ordered by the relevance of the first document in each group. In Elasticsearch, this can be implemented with a bucket aggregator that wraps a top_hits aggregator as a sub-aggregator.

In the following example, we search across crawled web pages. For each web page we store the body and the domain the page belongs to. By defining a terms aggregator on the domain field, we group the result set of web pages by domain. The top_hits aggregator is then defined as a sub-aggregator, so that the top matching hits are collected per bucket.

A max aggregator is also defined. The order feature of the terms aggregator uses it to return the buckets ordered by the relevance of the most relevant document in each bucket.

POST /sales/_search
{
  "query": {
    "match": {
      "body": "elections"
    }
  },
  "aggs": {
    "top_sites": {
      "terms": {
        "field": "domain",
        "order": {
          "top_hit": "desc"
        }
      },
      "aggs": {
        "top_tags_hits": {
          "top_hits": {}
        },
        "top_hit" : {
          "max": {
            "script": {
              "source": "_score"
            }
          }
        }
      }
    }
  }
}

At the moment, the max (or min) aggregator is needed to make sure that the buckets of the terms aggregator are ordered by the score of the most relevant web page per domain. Unfortunately, the top_hits aggregator cannot yet be used in the order option of the terms aggregator.
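
To make the resulting ordering concrete, here is a plain Python emulation of what the request above asks for: group scored pages by domain, keep the best-scoring pages per domain, and order the domains by their single best score. It illustrates the expected result only and says nothing about how Elasticsearch computes it internally. The domains and scores are made up, and order_domains_by_best_score is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Made-up (domain, _score) pairs standing in for pages that matched the query.
scored_pages: List[Tuple[str, float]] = [
    ("a.example", 0.4),
    ("b.example", 1.2),
    ("a.example", 0.9),
    ("c.example", 0.7),
    ("b.example", 0.3),
]


def order_domains_by_best_score(
    pages: List[Tuple[str, float]], top_n: int = 3
) -> List[Dict[str, Any]]:
    """Emulate terms(domain) ordered by max(_score), with top_n hits per bucket."""
    scores_by_domain: Dict[str, List[float]] = defaultdict(list)
    for domain, score in pages:
        scores_by_domain[domain].append(score)

    buckets = []
    for domain, scores in scores_by_domain.items():
        ranked = sorted(scores, reverse=True)
        buckets.append(
            {
                "key": domain,
                "top_hit": ranked[0],  # plays the role of the max aggregator
                "top_tags_hits": ranked[:top_n],  # plays the role of top_hits
            }
        )
    # "order": {"top_hit": "desc"} in the request corresponds to this sort.
    buckets.sort(key=lambda bucket: bucket["top_hit"], reverse=True)
    return buckets


ordered = order_domains_by_best_score(scored_pages)
```

With this data the domains come out as b.example, a.example, c.example, because their best scores are 1.2, 0.9 and 0.7.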

top_hits support in a nested or reverse_nested aggregator

If the top_hits aggregator is wrapped in a nested or reverse_nested aggregator, nested hits are returned. Nested hits are, in a sense, hidden mini-documents: they are part of a regular document whose mapping configures a nested field type. Wrapping the top_hits aggregator in a nested or reverse_nested aggregator makes it possible to un-hide these documents. Read more about nesting in the nested type mapping documentation.

If a nested type has been configured, a single document is actually indexed as multiple Lucene documents that share the same ID. The ID alone is therefore not enough to identify a nested hit, which is why nested hits also include their nested identity. The nested identity is kept under the _nested field in the search hit and includes the array field and the offset in that array field to which the nested hit belongs. The offset is zero-based.

Let's see how it works with a real sample. Consider the following mapping:

PUT /sales
{
    "mappings": {
        "properties" : {
            "tags" : { "type" : "keyword" },
            "comments" : {  <1>
                "type" : "nested",
                "properties" : {
                    "username" : { "type" : "keyword" },
                    "comment" : { "type" : "text" }
                }
            }
        }
    }
}
  1. comments is an array that holds nested documents under the product object.

There are also some documents:

PUT /sales/_doc/1?refresh
{
    "tags": ["car", "auto"],
    "comments": [
        {"username": "baddriver007", "comment": "This car could have better brakes"},
        {"username": "dr_who", "comment": "Where's the autopilot? Can't find it"},
        {"username": "ilovemotorbikes", "comment": "This car has two extra wheels"}
    ]
}

It is now possible to execute the following top_hits aggregation (wrapped in a nested aggregation):

POST /sales/_search
{
    "query": {
        "term": { "tags": "car" }
    },
    "aggs": {
        "by_sale": {
            "nested" : {
                "path" : "comments"
            },
            "aggs": {
                "by_user": {
                    "terms": {
                        "field": "comments.username",
                        "size": 1
                    },
                    "aggs": {
                        "by_nested": {
                            "top_hits":{}
                        }
                    }
                }
            }
        }
    }
}

Top hits response snippet with a nested hit, which resides in the first slot of the array field comments:

{
  ...
  "aggregations": {
    "by_sale": {
      "by_user": {
        "buckets": [
          {
            "key": "baddriver007",
            "doc_count": 1,
            "by_nested": {
              "hits": {
                "total" : {
                   "value": 1,
                   "relation": "eq"
                },
                "max_score": 0.3616575,
                "hits": [
                  {
                    "_index": "sales",
                    "_type" : "_doc",
                    "_id": "1",
                    "_nested": {
                      "field": "comments",  <1>
                      "offset": 0  <2>
                    },
                    "_score": 0.3616575,
                    "_source": {
                      "comment": "This car could have better brakes", <3>
                      "username": "baddriver007"
                    }
                  }
                ]
              }
            }
          }
          ...
        ]
      }
    }
  }
}
  1. Name of the array field containing the nested hit
  2. Position of the nested hit in the containing array
  3. Source of the nested hit
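
The nested identity in the response above can be reproduced by hand from the indexed document: every element of the comments array becomes one nested hit, identified by the array field name and its zero-based offset. The Python sketch below is an illustration only and not Elasticsearch code. The function nested_hits is a hypothetical name, and scoring and bucketing are left out.

```python
from typing import Any, Dict, Iterator


def nested_hits(
    doc_id: str, source: Dict[str, Any], field: str
) -> Iterator[Dict[str, Any]]:
    """Yield one hit-like dict per nested object stored under `field`."""
    for offset, nested_source in enumerate(source[field]):
        yield {
            "_id": doc_id,  # all nested hits share the parent document's ID
            "_nested": {"field": field, "offset": offset},  # zero-based offset
            "_source": nested_source,  # only the nested object's own source
        }


# Document 1 from the example above.
doc = {
    "tags": ["car", "auto"],
    "comments": [
        {"username": "baddriver007", "comment": "This car could have better brakes"},
        {"username": "dr_who", "comment": "Where's the autopilot? Can't find it"},
        {"username": "ilovemotorbikes", "comment": "This car has two extra wheels"},
    ],
}

hits = list(nested_hits("1", doc, "comments"))
```

The first element of hits carries the same _id, _nested and _source values as the hit in the baddriver007 bucket of the response.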

If _source is requested, just the part of the source belonging to the nested object is returned, not the entire source of the document. Stored fields at the nested inner object level are also accessible through a top_hits aggregator residing in a nested or reverse_nested aggregator.

Only nested hits have a _nested field in the hit; non-nested (regular) hits do not.

If _source is not enabled, the information in _nested can also be used to parse the original source somewhere else.

If multiple levels of nested object types are defined in the mapping, the _nested information can also be hierarchical in order to express the identity of nested hits that are two or more levels deep.

In the example below, the nested hit resides in the first slot of the field nested_grand_child_field, which in turn resides in the second slot of the field nested_child_field:

...
"hits": {
 "total" : {
     "value": 2565,
     "relation": "eq"
 },
 "max_score": 1,
 "hits": [
   {
     "_index": "a",
     "_type": "b",
     "_id": "1",
     "_score": 1,
     "_nested" : {
       "field" : "nested_child_field",
       "offset" : 1,
       "_nested" : {
         "field" : "nested_grand_child_field",
         "offset" : 0
       }
     },
     "_source": ...
   },
   ...
 ]
}
...
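
When the original document is kept somewhere else, a hierarchical _nested identity like the one above can be followed step by step to find the nested object it points to. The Python sketch below shows one possible way. The function resolve_nested and the sample document are hypothetical, and the code assumes that each "field" value is relative to the enclosing nested object, as in the snippet above. A dotted field path would need extra handling.

```python
from typing import Any, Dict, Optional


def resolve_nested(source: Dict[str, Any], nested: Dict[str, Any]) -> Any:
    """Follow a (possibly hierarchical) _nested identity into a parent source."""
    current: Any = source
    step: Optional[Dict[str, Any]] = nested
    while step is not None:
        # "field" names the array, "offset" is the zero-based slot within it.
        current = current[step["field"]][step["offset"]]
        step = step.get("_nested")  # descend one level, if there is one
    return current


# Hypothetical parent document with two levels of nesting.
parent_source = {
    "nested_child_field": [
        {"nested_grand_child_field": [{"value": "first child, first slot"}]},
        {
            "nested_grand_child_field": [
                {"value": "second child, first slot"},
                {"value": "second child, second slot"},
            ]
        },
    ]
}

# The identity from the snippet above.
identity = {
    "field": "nested_child_field",
    "offset": 1,
    "_nested": {"field": "nested_grand_child_field", "offset": 0},
}

target = resolve_nested(parent_source, identity)
```

Here target is the object in the first slot of nested_grand_child_field inside the second slot of nested_child_field. A single-level identity such as {"field": "comments", "offset": 0} is resolved by the same loop.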

Tags: ElasticSearch Attribute Fragment

Posted on Thu, 29 Aug 2019 01:33:04 -0700 by simulant