Find the last event for each entity in Lucene

I have events (documents) stored in a Lucene document store (version 6.2.1). Each document has an EntityId and a Timestamp.

There are many documents with the same EntityId.

For each EntityId, I want to retrieve the document with the latest Timestamp.

Do I need to pull out all the events and do this in Java? I have looked at faceting, but as far as I can see it is only for counts, not for max/min type aggregations.
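For comparison, the brute-force route mentioned above (load every event, then reduce in Java) might look like the sketch below. It uses no Lucene API at all: `LatestEvents`, `Event`, and `latestPerEntity` are hypothetical names made up for illustration, and the sketch assumes every event has already been read into memory.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LatestEvents {

    // Hypothetical in-memory stand-in for one event read from the index.
    static final class Event {
        final String entityId;
        final long timestamp;

        Event(String entityId, long timestamp) {
            this.entityId = entityId;
            this.timestamp = timestamp;
        }
    }

    // Keeps, for every entityId, the event with the greatest timestamp.
    static Map<String, Event> latestPerEntity(Iterable<Event> events) {
        Map<String, Event> latest = new HashMap<>();
        for (Event e : events) {
            // merge() stores e if the key is new; otherwise it keeps the
            // existing event unless e is strictly newer.
            latest.merge(e.entityId, e,
                (old, cur) -> cur.timestamp > old.timestamp ? cur : old);
        }
        return latest;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("a", 10L), new Event("b", 5L),
            new Event("a", 30L), new Event("a", 20L));
        // HashMap iteration order is unspecified, so the print order may vary.
        for (Event e : latestPerEntity(events).values()) {
            System.out.printf("EntityId = %s Timestamp = %d%n", e.entityId, e.timestamp);
        }
    }
}
```

This is a single pass with one map entry per entity, but it still requires reading every document out of the index, which is the cost I would like to avoid.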

2016-11-11 Cheetah

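What you are trying to do can be done with GroupingSearch, available from lucene-grouping.

GroupingSearch groups your documents by the provided group field (EntityId in our case), which needs to be indexed as a sorted doc values field. Otherwise you will get an error of this type:

java.lang.IllegalStateException: unexpected docvalues type NONE for field '${field-name}' (expected=SORTED).

You also need Timestamp as a sort field to be able to get the latest document for a given EntityId.

For example, with the documents indexed as described in the details below, the search looks roughly like this: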

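import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.util.BytesRef;

IndexSearcher searcher = ...
// Some random query; here I get all docs
Query query = new MatchAllDocsQuery();
// Group the docs by EntityId
GroupingSearch groupingSearch = new GroupingSearch("EntityId");
// Sort the docs of the same group by Timestamp in reversed order to get
// the most recent first
groupingSearch.setSortWithinGroup(
    new Sort(new SortField("Timestamp", SortField.Type.LONG, true))
);
// Set the limit of docs for a given group to 1 as we only want the latest
// NB: This is the default value so it is not required
groupingSearch.setGroupDocsLimit(1);
// Get the 10 first matching groups
TopGroups<BytesRef> result = groupingSearch.search(searcher, query, 0, 10);
// Iterate over the groups found
for (GroupDocs<BytesRef> groupDocs : result.groups) {
    // Iterate over the docs of a given group
    for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
        // Get the related doc
        Document doc = searcher.doc(scoreDoc.doc);
        // Print the stored values of EntityId and Timestamp
        // (stored under the field names "Id" and "Tsp")
        System.out.printf(
            "EntityId = %s Timestamp = %s%n", doc.get("Id"), doc.get("Tsp")
        );
    }
}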

Details: the documents are indexed as follows:

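import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.util.BytesRef;

String id = ...
long timestamp = ...
Document doc = new Document();
// The sorted version of my EntityId (doc values, used for grouping)
doc.add(new SortedDocValuesField("EntityId", new BytesRef(id)));
// The stored version of my EntityId to be able to get its value later if needed
doc.add(new StringField("Id", id, Field.Store.YES));
// The sorted version of my timestamp (doc values, used for sorting within a group)
doc.add(new NumericDocValuesField("Timestamp", timestamp));
// The stored version of my timestamp to be able to get its value later if needed
doc.add(new StringField("Tsp", Long.toString(timestamp), Field.Store.YES));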

With the documents indexed this way, grouping lets me get the latest document for a given EntityId.

2017-02-11 09:42:07

Ah! I think the key piece of information I missed when reading the docs is the 'SortedDocValuesField' bit. I need to reindex, but I will give this a go and come back to mark the answer when it works. Thanks! – Cheetah

Yep, I'm pretty sure this does what I want. I want to group all the entities, so I'm currently trying to work out how to use 'getAllMatchingGroups', but I'm not sure how to handle the BytesRef Collection it returns. – Cheetah

You could try (untested) using the Collapsing query parser, like this:

fq={!collapse field=EntityId max=Timestamp}

Or perhaps Grouping could achieve the same thing.

2016-11-15 22:23:01 Persimmonium

Unless I'm mistaken, these are specific to Solr. I'm using Lucene, not Solr. – Cheetah

Oops, sorry, I had just been looking at another Solr question and didn't realize this was plain Lucene. – Persimmonium
