Elasticsearch multi_match: results for "A and B" are not equal to results for "B and A"

I have a products index with many fields. In particular, all of them are analyzed with morphology and synonym filters. The index, simplified to just two fields, is here:

https://gist.github.com/anonymous/6e287d328a72df07bc491312820ffdef

The first query:

GET /products/nms/_search 
{ 
    "size": 40, 
    "_source": { 
     "include": [ 
     "_id" 
     ] 
    }, 
    "query": { 
     "multi_match": { 
     "fields": [ 
      "subject.value^2", 
      "colors" 
     ], 
     "minimum_should_match": "30%", 
     "operator": "and", 
     "query": "футболка белая", 
     "type": "cross_fields" 
     } 
    } 
}

Results:

"hits": { 
     "total": 6615, 
     "max_score": 9.118673,

And they are pretty much correct.

But when I swap the words in the second query:

GET /products/nms/_search 
{ 
    "size": 40, 
    "_source": { 
     "include": [ 
     "_id" 
     ] 
    }, 
    "query": { 
     "multi_match": { 
     "fields": [ 
      "subject.value^2", 
      "colors" 
     ], 
     "minimum_should_match": "30%", 
     "operator": "and", 
     "query": "белая футболка", 
     "type": "cross_fields" 
     } 
    } 
}

I get:

"hits": { 
     "total": 145434, 
     "max_score": 10.683464,

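And nothing here resembles the first results: there is not a single matching document among the top 100 hits.

I have spent a lot of time on this and still have no solution. I am forced to use cross_fields because of the document structure (more than 15 fields). It seems that in this case Elastic counts a synonym hit in every field, even though "белая" (white) and "футболка" (t-shirt) have nothing in common.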
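One way to check this suspicion is to look at the tokens the analyzer actually emits for the query words. The request below is only a sketch: it assumes the cluster is an Elasticsearch 2.x release that accepts a JSON body for `_analyze`, and that `colors` in the `products` index is mapped to the morphology-plus-synonym analyzer described above.

```
# Sketch only: the index and field names come from the queries above;
# that "colors" uses the synonym analyzer is an assumption.
GET /products/_analyze
{
    "field": "colors",
    "text": "белая футболка"
}
```

If the synonym filter expands terms, I would expect the response to list more than one token at the same `position` for "белая". That would be consistent with the synonym being counted in every field.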

For example, we have 4 documents:

PUT products_color_test/nms/1 
{ 
    "colors": "белая", //white 
    "subject" : { 
     "id" :1, 
     "value": "футболка"} //t-shirt 
} 
PUT products_color_test/nms/2 
{ 
    "colors": "черная", //black 
    "subject" : { 
     "id" :1, 
     "value": "футболка"} //t-shirt 
} 
PUT products_color_test/nms/3 
{ 
    "colors": "молочная", //synonym to white 
    "subject" : { 
     "id" :1, 
     "value": "футболка"} //t-shirt 
} 
PUT products_color_test/nms/4 
{ 
    "colors": "молочная", //synonym to white 
    "subject" : { 
     "id" :2, 
     "value": "куртка"} //jacket 
}

Let's test it:

GET /products_color_test/nms/_search 
{ 
    "size": 40, 
    "query": { 
     "multi_match": { 
     "fields": [ 
      "subject.value^2", 
      "colors" 
     ], 
     "minimum_should_match": "30%", 
     "operator": "and", 
     "query": "футболка белая", 
     "type": "cross_fields" 
     } 
    } 
}

The results are:

{ 
    "took": 2, 
    "timed_out": false, 
    "_shards": { 
     "total": 1, 
     "successful": 1, 
     "failed": 0 
    }, 
    "hits": { 
     "total": 2, 
     "max_score": 0.58422226, 
     "hits": [ 
     { 
      "_index": "products_color_test", 
      "_type": "nms", 
      "_id": "3", 
      "_score": 0.58422226, 
      "_source": { 
       "colors": "молочная", 
       "subject": { 
        "id": 1, 
        "value": "футболка" 
       } 
      } 
     }, 
     { 
      "_index": "products_color_test", 
      "_type": "nms", 
      "_id": "1", 
      "_score": 0.568724, 
      "_source": { 
       "colors": "белая", 
       "subject": { 
        "id": 1, 
        "value": "футболка" 
       } 
      } 
     } 
     ] 
    } 
}

Almost correct, except that the synonym hit (document 3) gets a higher score than the exact hit (document 1).

But after the swap:

GET /products_color_test/nms/_search 
{ 
    "size": 40, 
    "query": { 
     "multi_match": { 
     "fields": [ 
      "subject.value^2", 
      "colors" 
     ], 
     "minimum_should_match": "30%", 
     "operator": "and", 
     "query": "белая футболка", 
     "type": "cross_fields" 
     } 
    } 
} 


    "hits": { 
    "total": 3, 
    "max_score": 0.58422226, 
    "hits": [ 
    { 
     "_index": "products_color_test", 
     "_type": "nms", 
     "_id": "3", 
     "_score": 0.58422226, 
     "_source": { 
      "colors": "молочная", 
      "subject": { 
       "id": 1, 
       "value": "футболка" 
      } 
     } 
    }, 
    { 
     "_index": "products_color_test", 
     "_type": "nms", 
     "_id": "1", 
     "_score": 0.568724, 
     "_source": { 
      "colors": "белая", 
      "subject": { 
       "id": 1, 
       "value": "футболка" 
      } 
     } 
    }, 
    { 
     "_index": "products_color_test", 
     "_type": "nms", 
     "_id": "4", 
     "_score": 0.46449086, 
     "_source": { 
      "colors": "молочная", 
      "subject": { 
       "id": 2, 
       "value": "куртка" // jacket ----!!!!!---- 
      } 
     } 
    } 
    ] 
    } 
}

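Questions:

OK, a synonym counts the same as an exact match. But why does the scoring differ depending on which position in the query the synonym candidate takes?
Is there a way to keep the cross_fields document structure and the multi_match query, and still count a synonym hit only once?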

Thanks!

P.S. Sorry for my English.

Source

2016-07-18 Michael M

It seems that adding

"expand": false

to the synonym filter solves the mystery. As I understand it, ES takes only the first synonym at index time, but uses the whole expanded set at search time.
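For illustration, here is a minimal sketch of where that option goes. It assumes the test index is created from scratch on Elasticsearch 2.x. The filter name `color_synonyms`, the analyzer name `text_with_synonyms` and the synonym list are hypothetical. The real setup also has a morphology filter, which is left out here because its name does not appear in the question.

```
# Sketch only: filter/analyzer names and the synonym list are made up
# for illustration; the morphology filter from the real index is omitted.
PUT /products_color_test
{
    "settings": {
        "analysis": {
            "filter": {
                "color_synonyms": {
                    "type": "synonym",
                    "expand": false,
                    "synonyms": [
                        "белая, молочная"
                    ]
                }
            },
            "analyzer": {
                "text_with_synonyms": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "color_synonyms"]
                }
            }
        }
    },
    "mappings": {
        "nms": {
            "properties": {
                "colors": {"type": "string", "analyzer": "text_with_synonyms"},
                "subject": {
                    "properties": {
                        "id": {"type": "integer"},
                        "value": {"type": "string", "analyzer": "text_with_synonyms"}
                    }
                }
            }
        }
    }
}
```

With `expand` set to `false`, the documented behaviour of the synonym filter is that a comma-separated group such as `a, b, c` is treated like `a, b, c => a`, so every member of the group is reduced to the first term wherever the filter runs. Documents indexed before the analyzer change need to be reindexed to pick it up.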

The results are now similar for the two swapped queries, and ES counts the synonym hit only once:

 "_explanation": { 
      "value": 0.5622277, 
      "description": "sum of:", 
      "details": [ 
       { 
       "value": 0.5622277, 
       "description": "sum of:", 
       "details": [ 
        { 
         "value": 0.37481847, 
         "description": "max of:", 
         "details": [ 
          { 
          "value": 0.37481847, 
          "description": "weight(subject.value:футболка in 0) [PerFieldSimilarity], result of:", 
          "details": [ 
           { 
            "value": 0.37481847, 
            "description": "score(doc=0,freq=1.0), product of:", 
            "details": [ 
             { 
             "value": 0.37481847, 
             "description": "queryWeight, product of:", 
             "details": [ 
              { 
               "value": 2, 
               "description": "boost", 
               "details": [] 
              }, 
              { 
               "value": 1, 
               "description": "idf(docFreq=3, maxDocs=4)", 
               "details": [] 
              }, 
              { 
               "value": 0.18740924, 
               "description": "queryNorm", 
               "details": [] 
              } 
             ] 
             }, 
             { 
             "value": 1, 
             "description": "fieldWeight in 0, product of:", 
             "details": [ 
              { 
               "value": 1, 
               "description": "tf(freq=1.0), with freq of:", 
               "details": [ 
                { 
                "value": 1, 
                "description": "termFreq=1.0", 
                "details": [] 
                } 
               ] 
              }, 
              { 
               "value": 1, 
               "description": "idf(docFreq=3, maxDocs=4)", 
               "details": [] 
              }, 
              { 
               "value": 1, 
               "description": "fieldNorm(doc=0)", 
               "details": [] 
              } 
             ] 
             } 
            ] 
           } 
          ] 
          } 
         ] 
        }, 
        { 
         "value": 0.18740924, 
         "description": "max of:", 
         "details": [ 
          { 
          "value": 0.18740924, 
          "description": "weight(colors:белый in 0) [PerFieldSimilarity], result of:", 
          "details": [ 
           { 
            "value": 0.18740924, 
            "description": "score(doc=0,freq=1.0), product of:", 
            "details": [ 
             { 
             "value": 0.18740924, 
             "description": "queryWeight, product of:", 
             "details": [ 
              { 
               "value": 1, 
               "description": "idf(docFreq=3, maxDocs=4)", 
               "details": [] 
              }, 
              { 
               "value": 0.18740924, 
               "description": "queryNorm", 
               "details": [] 
              } 
             ] 
             }, 
             { 
             "value": 1, 
             "description": "fieldWeight in 0, product of:", 
             "details": [ 
              { 
               "value": 1, 
               "description": "tf(freq=1.0), with freq of:", 
               "details": [ 
                { 
                "value": 1, 
                "description": "termFreq=1.0", 
                "details": [] 
                } 
               ] 
              }, 
              { 
               "value": 1, 
               "description": "idf(docFreq=3, maxDocs=4)", 
               "details": [] 
              }, 
              { 
               "value": 1, 
               "description": "fieldNorm(doc=0)", 
               "details": [] 
              } 
             ] 
             } 
            ] 
           } 
          ] 
          } 
         ] 
        } 
       ] 
       }, 
       { 
       "value": 0, 
       "description": "match on required clause, product of:", 
       "details": [ 
        { 
         "value": 0, 
         "description": "# clause", 
         "details": [] 
        }, 
        { 
         "value": 0.18740924, 
         "description": "_type:nms, product of:", 
         "details": [ 
          { 
          "value": 1, 
          "description": "boost", 
          "details": [] 
          }, 
          { 
          "value": 0.18740924, 
          "description": "queryNorm", 
          "details": [] 
          } 
         ] 
        } 
       ] 
       } 
      ] 
     } 
    },
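An `_explanation` block like the one above is returned for each hit when `"explain": true` is set in the search body. The request below is a sketch that reuses the swapped query from the question against the test index; it was not part of the original answer.

```
# Sketch only: same query as in the question, with "explain" switched on.
GET /products_color_test/nms/_search
{
    "size": 40,
    "explain": true,
    "query": {
        "multi_match": {
            "fields": [
                "subject.value^2",
                "colors"
            ],
            "minimum_should_match": "30%",
            "operator": "and",
            "query": "белая футболка",
            "type": "cross_fields"
        }
    }
}
```

Running it once per word order and comparing the `weight(...)` lines should show whether the two queries are scored against the same terms.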


Source

2016-07-19 10:29:53
