How does TensorFlow handle categorical features with multiple inputs within one column?

For example, I have data in the following CSV format:

col0 col1 col2 col3 
1  A  E|A|C 3 
0  B  D|F 2 
2  C  |  2

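Each comma-separated column represents one feature. Normally a feature holds a single value per row and can be one-hot encoded (e.g. col0, col1, col3), but here the col2 feature holds multiple inputs per row (separated by |).

I am sure TensorFlow can handle one-hot features with sparse tensors, but I don't know whether it can handle a feature with multiple inputs like col2.

How should such a feature be represented in a TensorFlow sparse tensor?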

I am using the code below, but I don't know how to feed col2 as input:

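import tensorflow as tf

# Single-valued features; col2 (the multi-valued one) is not handled yet.
col0 = tf.feature_column.numeric_column('ID')
col1 = tf.feature_column.categorical_column_with_hash_bucket('Title', hash_bucket_size=1000)
col3 = tf.feature_column.numeric_column('Score')

# Note: DNNClassifier expects dense columns, so col1 would need to be
# wrapped in indicator_column or embedding_column before being used here.
columns = [col0, col1, col3]

tf.estimator.DNNClassifier(
    model_dir=None,
    feature_columns=columns,
    hidden_units=[10, 10],
    n_classes=4
)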

Thanks for your help.

Source

2017-10-31 Park.BJ

First of all, the problem is that the .csv you posted contains no commas. – alex

Of course, the csv contains commas. –

OK, it looks like writing a custom feature column worked for me on the same task.

I based it on HashedCategoricalColumn and cleaned it up so that it handles strings only. Type checking still needs to be added, though.

import collections

import tensorflow as tf
# These are TensorFlow 1.x internal modules; the paths may differ between versions.
from tensorflow.python.feature_column.feature_column import _CategoricalColumn
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import sparse_tensor as sparse_tensor_lib
from tensorflow.python.framework import tensor_shape
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import parsing_ops
from tensorflow.python.ops import string_ops


class _SparseArrayCategoricalColumn(
        _CategoricalColumn,
        collections.namedtuple('_SparseArrayCategoricalColumn',
                               ['key', 'num_buckets', 'category_delimiter'])):

    @property
    def name(self):
        return self.key

    @property
    def _parse_example_spec(self):
        return {self.key: parsing_ops.VarLenFeature(dtypes.string)}

    def _transform_feature(self, inputs):
        input_tensor = inputs.get(self.key)
        # Flatten to 1-D, then split each string on the delimiter into a SparseTensor.
        flat_input = array_ops.reshape(input_tensor, (-1,))
        input_tensor = tf.string_split(flat_input, self.category_delimiter)

        if not isinstance(input_tensor, sparse_tensor_lib.SparseTensor):
            raise ValueError('SparseColumn input must be a SparseTensor.')

        sparse_values = input_tensor.values
        # tf.summary.text(self.key, flat_input)
        # Hash every token into one of num_buckets integer ids.
        sparse_id_values = string_ops.string_to_hash_bucket_fast(
            sparse_values, self.num_buckets, name='lookup')

        return sparse_tensor_lib.SparseTensor(
            input_tensor.indices, sparse_id_values, input_tensor.dense_shape)

    @property
    def _variable_shape(self):
        if not hasattr(self, '_shape'):
            self._shape = tensor_shape.vector(self.num_buckets)
        return self._shape

    @property
    def _num_buckets(self):
        """Returns number of buckets in this sparse feature."""
        return self.num_buckets

    def _get_sparse_tensors(self, inputs, weight_collections=None,
                            trainable=None):
        return _CategoricalColumn.IdWeightPair(inputs.get(self), None)


def categorical_column_with_array_input(key,
                                        num_buckets, category_delimiter="|"):
    if (num_buckets is None) or (num_buckets < 1):
        raise ValueError('Invalid num_buckets {}.'.format(num_buckets))

    return _SparseArrayCategoricalColumn(key, num_buckets, category_delimiter)
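
To make the split step in _transform_feature easier to picture, here is a rough pure-Python sketch of the (indices, values, dense_shape) triple that a sparse tensor would hold for the col2 values from the question. The helper name split_to_coo is made up for illustration, and dropping empty tokens is an assumption chosen to mirror the default skip_empty behaviour of tf.string_split; it is not the TensorFlow implementation.

```python
def split_to_coo(rows, delimiter="|"):
    """Split each string and return (indices, values, dense_shape) in COO form.

    split_to_coo is a hypothetical helper for illustration only.
    Empty tokens are dropped (assumption: mirrors tf.string_split defaults).
    """
    indices, values = [], []
    max_tokens = 0
    for row_idx, text in enumerate(rows):
        tokens = [tok for tok in text.split(delimiter) if tok]
        for col_idx, tok in enumerate(tokens):
            indices.append((row_idx, col_idx))  # (row, position within row)
            values.append(tok)
        max_tokens = max(max_tokens, len(tokens))
    # Dense shape is (number of rows, longest token list).
    return indices, values, (len(rows), max_tokens)


col2 = ["E|A|C", "D|F", "|"]
indices, values, dense_shape = split_to_coo(col2)
print(indices)      # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
print(values)       # ['E', 'A', 'C', 'D', 'F']
print(dense_shape)  # (3, 3)
```

The third row contributes no entries at all, which is how a sparse representation expresses "no categories for this example".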

Then you can wrap it in an indicator or embedding column, which seems to be what you need. For me this was only a first step: I still need to handle columns with values like "str:float|str:float...".

Source
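
As a rough idea of what the split, hash and indicator steps add up to, here is a pure-Python sketch that turns each col2 string into a multi-hot vector. The function names bucket_id and multi_hot are hypothetical, and hashlib.md5 is only a stand-in for TensorFlow's string_to_hash_bucket_fast, so the bucket ids will not match what TensorFlow produces. In this sketch, repeated tokens or hash collisions simply accumulate as counts.

```python
import hashlib


def bucket_id(token, num_buckets):
    """Map a token to a bucket id (md5 is a stand-in, not TensorFlow's hash)."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def multi_hot(rows, num_buckets, delimiter="|"):
    """Encode each delimited string as a vector of per-bucket counts."""
    encoded = []
    for text in rows:
        vec = [0] * num_buckets
        for tok in text.split(delimiter):
            if tok:  # skip empty tokens such as those produced by "|"
                vec[bucket_id(tok, num_buckets)] += 1
        encoded.append(vec)
    return encoded


col2 = ["E|A|C", "D|F", "|"]
vectors = multi_hot(col2, num_buckets=16)
for row, vec in zip(col2, vectors):
    print(row, vec)
```

A small num_buckets makes collisions likely, so in practice the bucket count is usually chosen well above the number of distinct categories.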

2018-01-13 02:55:23
