1, Some tools
1. Three ways to locally print the converted values of a non-sequence feature_column
Applies to TensorFlow 1.x.
```python
import tensorflow as tf
from tensorflow.python.feature_column import feature_column_v2 as fc_v2
from tensorflow.python.feature_column import feature_column as fc

# Note: only method 2 checks whether the input data conforms to the
# feature_column definition.
def numeric_column():
    column = tf.feature_column.numeric_column(
        key="feature",
        shape=(3, 2, 1,),
        default_value=100,
        dtype=tf.float32,
        normalizer_fn=lambda x: x / 2)
    features = {"feature": tf.constant(value=[
        [[1, 2], [3, 4], [5, 6]],
        [[7, 8], [9, 10], [11, 12]]
    ])}
    # feature_column processing method 1
    feature_cache = fc_v2.FeatureTransformationCache(features=features)
    rs_1 = column.get_dense_tensor(transformation_cache=feature_cache,
                                   state_manager=None)
    # feature_column processing method 2
    net = tf.feature_column.input_layer(features, column)
    # feature_column processing method 3
    builder = fc._LazyBuilder(features)
    rs_3 = column._get_dense_tensor(builder, None)
    with tf.Session() as sess:
        print(sess.run(rs_1))
        print(sess.run(net))
        print(sess.run(rs_3))

numeric_column()
```
2. Three ways to print the converted values of a sequence feature_column
Applies to TensorFlow 1.x.
2.1. The role of sequence features
Reference: *Deep Learning with TensorFlow: Engineering Project Practice*
2.2. How to use a sequence feature_column
```python
import tensorflow as tf
from tensorflow.python.feature_column import feature_column_v2 as fc_v2
from tensorflow.python.feature_column import feature_column as fc
from tensorflow.python.feature_column import feature_column_lib as fcl
from tensorflow.python.feature_column import sequence_feature_column as sqfc

def sequence_numeric_column():
    # Usage is basically the same as numeric_column
    column = tf.feature_column.sequence_numeric_column(
        key="feature",
        # shape specifies the shape of each element in the sequence.
        # The shape of the returned dense tensor is
        # [batch_size, element_count / prod(shape), shape].
        # Setting this value only affects the dense_tensor;
        # sequence_length depends only on the actual input data.
        shape=(3,),
        default_value=60,
        dtype=tf.float32,
        normalizer_fn=lambda x: x / 2)
    column2 = tf.contrib.feature_column.sequence_numeric_column(
        key="feature",
        shape=(3,),
        default_value=60,
        dtype=tf.float32,
        normalizer_fn=lambda x: x / 2)
    features = {
        # The value of a sequence feature must be a SparseTensor
        "feature": tf.SparseTensor(
            # indices must be written in order
            indices=[[0, 0, 1], [0, 1, 0], [0, 5, 0], [0, 5, 1],
                     [1, 2, 1], [1, 3, 0], [1, 3, 1]],
            values=[4, 1, 7, 9, 3, 4., 4],
            dense_shape=[2, 6, 2])
    }
    # Method 1
    feature_cache = fcl.FeatureTransformationCache(features=features)
    rs_1 = column.get_sequence_dense_tensor(
        transformation_cache=feature_cache, state_manager=None)
    # Method 2
    rs_2 = tf.contrib.feature_column.sequence_input_layer(features, column2)
    # Method 3
    builder = fc._LazyBuilder(features)
    rs_3 = column2._get_sequence_dense_tensor(builder, None)
    with tf.Session() as sess:
        print(sess.run(rs_1))
        print("111" * 20)
        print(sess.run(rs_2))
        print("222" * 20)
        print(sess.run(rs_3))

sequence_numeric_column()
```
3. Notes
Input requirements of input_layer:
All items should be instances of classes derived from `_DenseColumn` such as `numeric_column`, `embedding_column`, `bucketized_column`, `indicator_column`. If you have categorical features, you can wrap them with an `embedding_column` or `indicator_column`.
Simply put, the input to input_layer must be dense data.
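As a quick illustration (a minimal TF 1.x sketch; the feature name "color" is made up), a categorical column must be wrapped with indicator_column or embedding_column before input_layer accepts it:

```python
import tensorflow as tf

# Hypothetical feature "color": a raw categorical column is not dense,
# so wrap it with indicator_column before handing it to input_layer.
features = {"color": tf.constant([["red"], ["green"]])}
cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key="color", vocabulary_list=["red", "green", "blue"])
net = tf.feature_column.input_layer(
    features, tf.feature_column.indicator_column(cat))

with tf.Session() as sess:
    sess.run(tf.tables_initializer())  # the vocabulary lookup table
    print(sess.run(net))  # [[1. 0. 0.] [0. 1. 0.]]
```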
2, feature_column introduction
1. What is feature_column
tf.feature_column is a set of tools for preprocessing input data; I generally treat it as TensorFlow's "feature engineering" layer.
2. What kinds of data processing it can do
3. It can handle fixed-length continuous real-valued (int or float) features
3.1. Examples of data that can be processed
"fea_1":0.123
"fea_2":[0.123,0.222]
"fea_3":[1,3,5]
"fea_4":10
"fea_0":[ [[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]] ]
"fea_sparse_1" : tf. Sparsetensor (# indexes should be written in order: indexes = [[0, 0, 1], [0, 1, 0], [0, 5, 0], [0, 5, 1], [1, 2, 1], [1, 3, 0], [1, 3, 1], values = [4, 1, 7, 9, 3, 4, 4], deny_shape = [2, 6, 2])
3.2. The normalizer_fn argument can preprocess the input data
```python
normalizer_fn=lambda x: x / 2
```
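A minimal sketch of the effect (TF 1.x; the key "fea_2" follows the examples above): normalizer_fn is applied to the raw values before the column's output is produced.

```python
import tensorflow as tf

# normalizer_fn halves every raw input value before it is returned
features = {"fea_2": tf.constant([[0.123, 0.222]])}
column = tf.feature_column.numeric_column(
    key="fea_2", shape=(2,), normalizer_fn=lambda x: x / 2)
net = tf.feature_column.input_layer(features, column)

with tf.Session() as sess:
    print(sess.run(net))  # [[0.0615 0.111 ]]
```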
3.3. How to write a non-sequence feature_column
```python
# numeric_column only supports int and float types
tf.feature_column.numeric_column(key="fea_1", shape=(1,), default_value=0,
                                 dtype=tf.float32, normalizer_fn=lambda x: ...)
tf.feature_column.numeric_column(key="fea_2", shape=(2,), default_value=0,
                                 dtype=tf.float32, normalizer_fn=lambda x: ...)
tf.feature_column.numeric_column(key="fea_3", shape=(3,), default_value=0,
                                 dtype=tf.int64, normalizer_fn=lambda x: ...)
```
3.4. Tips for using fixed-length real-valued features
Features such as fea_1, fea_2, fea_3, and fea_4 can be packed together into a single feature "fea_num", so that the generated TFRecord contains fewer keys and takes up less space (see the sketch below).
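A minimal sketch of the idea (TF 1.x; the key "fea_num" and the value layout are assumptions for illustration):

```python
import tensorflow as tf

# fea_1 (1 value) + fea_2 (2) + fea_3 (3) + fea_4 (1) concatenated in a
# fixed order into a single 7-element feature "fea_num"
features = {"fea_num": tf.constant(
    [[0.123, 0.123, 0.222, 1.0, 3.0, 5.0, 10.0]])}
column = tf.feature_column.numeric_column(
    key="fea_num", shape=(7,), dtype=tf.float32)
net = tf.feature_column.input_layer(features, column)

with tf.Session() as sess:
    print(sess.run(net))  # [[ 0.123  0.123  0.222  1.  3.  5.  10. ]]
```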
3.5. How to write a sequence feature_column
```python
column = tf.feature_column.sequence_numeric_column(
    key="feature",
    shape=(6,),
    default_value=60,
    dtype=tf.float32,
    normalizer_fn=lambda x: x / 2)
# For tf.contrib.feature_column.sequence_input_layer
column2 = tf.contrib.feature_column.sequence_numeric_column(
    key="feature",
    shape=(6,),
    default_value=60,
    dtype=tf.float32,
    normalizer_fn=lambda x: x / 2)

# Input: the sparse feature "fea_sparse_1"
# Result:
# TensorSequenceLengthPair(
#     dense_tensor=array(
#         [[[60. ,  2. ,  0.5, 60. , 60. , 60. ],
#           [60. , 60. , 60. , 60. ,  3.5,  4.5]],
#          [[60. , 60. , 60. , 60. , 60. ,  1.5],
#           [ 2. ,  2. , 60. , 60. , 60. , 60. ]]], dtype=float32),
#     sequence_length=array([6, 4], dtype=int64))
```
4. It can handle fixed-length discrete features: categorical_column
4.1. Examples of non-sequence and sequence categorical feature data (int and string types) that can be processed
```python
# 3 samples of 2-row x 2-column data
"fea_5": [
    [["value1", "value2"], ["value3", "value3"]],
    [["value3", "value5"], ["value4", "value4"]],
    [["value4", "value5"], ["value2", "value4"]]
]
# The following two are 1-D data
"fea_6": ["value1", "value2"]
"fea_7": [["value1"], ["value2"]]
# One 1-D sample
"fea_8": ["value1"]
# 2 samples of 1-row x 2-column data
"fea_9": [["value1", "value3"], ["value2", "value4"]]
# 2 samples of 2-row x 2-column data
"fea_10": [
    [["value1", "value2"], ["value3", "value3"]],
    [["value3", "value5"], ["value4", "value4"]]
]
# 3 dense samples of 2-row x 3-column data
"fea_11": [
    [[1, 2, 3], [4, 5, 6]],
    [[5, 6, 7], [8, 9, 10]],
    [[8, 9, 10], [11, 12, 13]]
]
# 3 dense samples of 1-row x 6-column data
"fea_12": [
    [1, 2, 3, 4, 5, 6],
    [5, 6, 7, 8, 9, 10],
    [8, 9, 10, 11, 12, 13]
]
# Weights for categorical feature values (the weights must match the
# dimensions of the corresponding data, with at most 2 dimensions)
"fea_weight_1": [
    [1.1, 2.2, 3.3, 4.4, 5.5, 6.6],
    [9.9, 8.8, 7.7, 6.6, 5.5, 4.4]
]
# Weight data: 3 samples of 1-row x 4-column data
"fea_weight_2": [
    [1.1, 2.2, 3.3, 4.4],
    [9.9, 8.8, 7.7, 6.6],
    [3.4, 8.8, 2.2, 6.6]
]
# 3 categorical feature samples of 1-row x 4-column data
"fea_13": [
    ["value1", "value2", "value3", "value3"],
    ["value3", "value5", "value4", "value4"],
    ["value4", "value5", "value2", "value4"]
]
# 2 feature samples of 3-row x 2-column data
"fea_14": [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 7], [9, 10], [11, 12]]
]
```
Int data follows the same patterns as above.
4.2. When a feature has few distinct values: categorical_column_with_vocabulary_list
There are four ways to use it:
- Represent category features and sequence category features as integer IDs
- Represent category features and sequence category features as multi_hot encodings
- Represent category features and sequence category features as weighted multi_hot encodings
- Represent category features and sequence category features as embedding vectors
Note: for dense tensor features, the input data dimensions must be consistent.
- Example processing results for non-sequence feature data:
```python
# Category values of int type
column = tf.feature_column.categorical_column_with_vocabulary_list(
    key="feature",
    vocabulary_list=[1, 2, 3, 4],
    dtype=tf.int64,
    default_value=-1,
    # Similar in purpose to default_value, but the two cannot both be in
    # effect at the same time.
    # Out-of-vocabulary values are mapped into
    # [len(vocabulary_list), len(vocabulary_list) + num_oov_buckets).
    # The default is 0.
    # When this value is non-zero, default_value must be set to -1.
    # When both default_value and num_oov_buckets take their defaults,
    # unknown values are mapped to -1.
    num_oov_buckets=4)

# Category values of string type
column = tf.feature_column.categorical_column_with_vocabulary_list(
    key="feature",
    vocabulary_list=["value1", "value2", "value3", "value4"],
    dtype=tf.string,
    default_value=-1,
    num_oov_buckets=4)

# Sparse tensor produced for input data "fea_5":
# SparseTensorValue(
#     indices=array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
#                    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
#                    [2, 0, 0], [2, 0, 1], [2, 1, 0], [2, 1, 1]], dtype=int64),
#     values=array([0, 1, 2, 2, 2, 6, 3, 3, 3, 6, 1, 3], dtype=int64),
#     dense_shape=array([3, 2, 2], dtype=int64))

# Usage 1: dense tensor of integer IDs
# [[[0 1] [2 2]]
#  [[2 6] [3 3]]
#  [[3 6] [1 3]]]

# Usage 2: multi_hot encoding
# (8 columns = len(vocabulary_list) + num_oov_buckets)
# [[[1. 1. 0. 0. 0. 0. 0. 0.] [0. 0. 2. 0. 0. 0. 0. 0.]]
#  [[0. 0. 1. 0. 0. 0. 1. 0.] [0. 0. 0. 2. 0. 0. 0. 0.]]
#  [[0. 0. 0. 1. 0. 0. 1. 0.] [0. 1. 0. 1. 0. 0. 0. 0.]]]

# Usage 3: embedding (3 columns = the embedding dimension you choose)
# [[[-0.36440656  0.1924808   0.1217252 ]   # represents ["value1", "value2"]
#   [ 0.71263236 -0.45157978 -0.3456324 ]]  # represents ["value3", "value3"]
#  [[-0.18493024 -0.20456922 -0.3947454 ]   # represents ["value3", "value5"]
#   [-0.19874108  0.6833139  -0.56441975]]  # represents ["value4", "value4"]
#  [[-0.64061695  0.3628776  -0.50413907]   # represents ["value4", "value5"]
#   [-0.28863966  0.14901578  0.16483489]]] # represents ["value2", "value4"]

# Caution:
# With input_layer, "fea_5" does not work while "fea_9" does;
# it seems dimensions that are too high are not supported.

# Weighted results for inputs "fea_weight_2" and "fea_13" (usages 4 and 5):

# Usage 4: weighted multi_hot (a plain dense integer representation is not
# useful here, but the embedding and weighted multi_hot features are)
# IdWeightPair(
#     id_tensor=SparseTensorValue(
#         indices=array([[0, 0], [0, 1], [0, 2], [0, 3],
#                        [1, 0], [1, 1], [1, 2], [1, 3],
#                        [2, 0], [2, 1], [2, 2], [2, 3]], dtype=int64),
#         values=array([0, 1, 2, 2, 2, 6, 3, 3, 3, 6, 1, 3], dtype=int64),
#         dense_shape=array([3, 4], dtype=int64)),
#     weight_tensor=SparseTensorValue(
#         indices=array([[0, 0], [0, 1], [0, 2], [0, 3],
#                        [1, 0], [1, 1], [1, 2], [1, 3],
#                        [2, 0], [2, 1], [2, 2], [2, 3]], dtype=int64),
#         values=array([1.1, 2.2, 3.3, 4.4, 9.9, 8.8, 7.7, 6.6,
#                       3.4, 8.8, 2.2, 6.6], dtype=float32),
#         dense_shape=array([3, 4], dtype=int64)))
# [[ 1.1       2.2       7.7       0.        0.        0.        0.   0. ]
#  [ 0.        0.        9.9      14.299999  0.        0.        8.8  0. ]
#  [ 0.        2.2       0.       10.        0.        0.        8.8  0. ]]

# Usage 5: weighted embedding
# [[ 0.16342753 -0.07898534 -0.33816564  0.2438156 ]
#  [ 0.04507026  0.30109608  0.08584949  0.28742552]
#  [ 0.00048126  0.315775    0.1192891   0.21302155]]
```
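For reference, a runnable sketch of usages 1-3 (TF 1.x), using the 2-D input "fea_9" since input_layer cannot handle "fea_5"; the embedding dimension 3 is an arbitrary choice:

```python
import tensorflow as tf
from tensorflow.python.feature_column import feature_column as fc

features = {"feature": tf.constant([["value1", "value3"],
                                    ["value2", "value4"]])}
column = tf.feature_column.categorical_column_with_vocabulary_list(
    key="feature", vocabulary_list=["value1", "value2", "value3", "value4"],
    dtype=tf.string, num_oov_buckets=4)

# Usage 1: integer ids via the internal _LazyBuilder
ids = column._get_sparse_tensors(fc._LazyBuilder(features)).id_tensor
# Usage 2: multi_hot via indicator_column + input_layer
multi_hot = tf.feature_column.input_layer(
    features, tf.feature_column.indicator_column(column))
# Usage 3: embedding via embedding_column + input_layer
emb = tf.feature_column.input_layer(
    features, tf.feature_column.embedding_column(column, dimension=3))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # embedding weights
    sess.run(tf.tables_initializer())            # vocabulary table
    print(sess.run(tf.sparse_tensor_to_dense(ids, -1)))
    print(sess.run(multi_hot))
    print(sess.run(emb))
```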
- Example processing results for sequence feature data:
```python
column = tf.feature_column.sequence_categorical_column_with_vocabulary_list(
    key="feature",
    vocabulary_list=["value1", "value2", "value3"],
    dtype=tf.string,
    default_value=-1,
    num_oov_buckets=2)

# Sparse tensor produced for the sequence feature "fea_10":
# SparseTensorValue(
#     indices=array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
#                    [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=int64),
#     values=array([0, 1, 2, 2, 2, 3, 3, 3], dtype=int64),
#     dense_shape=array([2, 2, 2], dtype=int64))

# Usage 1: integer sequence representation
# [[[0 1] [2 2]]
#  [[2 3] [3 3]]]

# Usage 2: multi_hot representation
# (dimension is 5 = len(vocabulary_list) + num_oov_buckets)
# (array([[[1., 1., 0., 0., 0.],
#          [0., 0., 2., 0., 0.]],
#         [[0., 0., 1., 1., 0.],
#          [0., 0., 0., 2., 0.]]], dtype=float32),
#  array([2, 2], dtype=int64))

# Usage 3: embedding representation (dimension set to 3)
# (array([[[ 0.54921925,  0.039222  , -0.20265868],   # ["value1", "value2"]
#          [ 0.3889632 ,  0.43282962, -0.2105029 ]],  # ["value3", "value3"]
#         [[ 0.20231032, -0.11117572, -0.14481466],   # ["value3", "value5"]
#          [ 0.01565746, -0.65518105, -0.07912641]]], # ["value4", "value4"]
#        dtype=float32),
#  array([2, 2], dtype=int64))

# Note: instead of input_layer, use the following API:
#   tf.contrib.feature_column.sequence_input_layer
# It can handle features such as "fea_5" that the non-sequence
# input_layer cannot handle.
```
4.3. When a feature has a medium number of distinct values: categorical_column_with_vocabulary_file
There are four ways to use it:
- Represent category features and sequence category features as integer IDs
- Represent category features and sequence category features as multi_hot encodings
- Represent category features and sequence category features as weighted multi_hot encodings
- Represent category features and sequence category features as embedding vectors
Same as 4.2: if the input is a dense tensor, its feature dimensions must be consistent.
- Example processing results for non-sequence feature data:
Same as 4.2 categorical_column_with_vocabulary_list:
```python
column = tf.feature_column.categorical_column_with_vocabulary_file(
    key="feature",
    vocabulary_file="valuelist",
    dtype=tf.string,
    default_value=None,
    num_oov_buckets=3)

# Caution:
# input_layer cannot process multidimensional feature data such as "fea_5".
#
# The contents of the file "valuelist" are:
# value1
# value2
# value3
```
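A runnable sketch (TF 1.x) that writes the vocabulary file first; "value9" is out of vocabulary, so it lands in one of the three OOV buckets (ids 3..5):

```python
import tensorflow as tf
from tensorflow.python.feature_column import feature_column as fc

# Write the vocabulary file used by the column below
with open("valuelist", "w") as f:
    f.write("value1\nvalue2\nvalue3\n")

features = {"feature": tf.constant([["value1"], ["value9"]])}
column = tf.feature_column.categorical_column_with_vocabulary_file(
    key="feature", vocabulary_file="valuelist",
    dtype=tf.string, default_value=None, num_oov_buckets=3)
ids = column._get_sparse_tensors(fc._LazyBuilder(features)).id_tensor

with tf.Session() as sess:
    sess.run(tf.tables_initializer())
    print(sess.run(tf.sparse_tensor_to_dense(ids, -1)))  # e.g. [[0] [4]]
```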
- Example processing results for sequence feature data:
Same as 4.2 sequence_categorical_column_with_vocabulary_list:
```python
column = tf.feature_column.sequence_categorical_column_with_vocabulary_file(
    key="feature",
    vocabulary_file="valuelist",
    dtype=tf.string,
    default_value=None,
    num_oov_buckets=3)
# Results and caveats are the same as the sequence case in 4.2
```
4.4. Using int features as categorical features: categorical_column_with_identity
There are four ways to use it:
- Represent category features and sequence category features as integer IDs
- Represent category features and sequence category features as multi_hot encodings
- Represent category features and sequence category features as weighted multi_hot encodings
- Represent category features and sequence category features as embedding vectors
Same as 4.2: if the input is a dense tensor (int type), the input feature dimensions must be consistent.
- Example processing results for non-sequence features:
```python
column = tf.feature_column.categorical_column_with_identity(
    key='feature',
    # Valid values lie in [0, num_buckets)
    num_buckets=10,
    # The value to map to when the data is outside [0, num_buckets).
    # Defaults to None, in which case out-of-range data raises an error.
    # default_value itself must lie within [0, num_buckets).
    default_value=3)
# Results and caveats are exactly the same as in 4.2
```
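A small sketch (TF 1.x): the out-of-range value 11 is mapped to default_value=3, while in-range values pass through unchanged:

```python
import tensorflow as tf
from tensorflow.python.feature_column import feature_column as fc

features = {"feature": tf.constant([[1, 11], [5, 9]], dtype=tf.int64)}
column = tf.feature_column.categorical_column_with_identity(
    key='feature', num_buckets=10, default_value=3)
ids = column._get_sparse_tensors(fc._LazyBuilder(features)).id_tensor

with tf.Session() as sess:
    print(sess.run(tf.sparse_tensor_to_dense(ids, -1)))  # [[1 3] [5 9]]
```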
- Examples of processing results for sequence features:
```python
column = tf.feature_column.sequence_categorical_column_with_identity(
    key='feature',
    num_buckets=10,
    default_value=3)
# Results and caveats are exactly the same as in 4.2
```
4.5. When string or int categorical features have too many distinct values: categorical_column_with_hash_bucket
There are four ways to use it:
- Represent category features and sequence category features as integer IDs
- Represent category features and sequence category features as multi_hot encodings
- Represent category features and sequence category features as weighted multi_hot encodings
- Represent category features and sequence category features as embedding vectors
Same as 4.2: if the input is dense tensor data, the input feature dimensions must be consistent.
- Example processing results for non-sequence features:
```python
# string type
column = tf.feature_column.categorical_column_with_hash_bucket(
    key="feature",
    # Size of the hash space
    hash_bucket_size=10,
    # Only string and integer types are supported;
    # integer values are also hash-mapped
    dtype=tf.string)

# int type
column = tf.feature_column.categorical_column_with_hash_bucket(
    key="feature",
    hash_bucket_size=10,
    dtype=tf.int64)
# Results and caveats are exactly the same as in 4.2
```
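A small sketch (TF 1.x): each string is hashed into one of hash_bucket_size buckets, so no vocabulary needs to be maintained; the exact bucket ids depend on the hash function:

```python
import tensorflow as tf
from tensorflow.python.feature_column import feature_column as fc

features = {"feature": tf.constant([["value1"], ["value2"]])}
column = tf.feature_column.categorical_column_with_hash_bucket(
    key="feature", hash_bucket_size=10, dtype=tf.string)
ids = column._get_sparse_tensors(fc._LazyBuilder(features)).id_tensor

with tf.Session() as sess:
    # Bucket ids lie in [0, 10); the exact values are hash-dependent
    print(sess.run(tf.sparse_tensor_to_dense(ids, -1)))
```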
- Examples of processing results for sequence features:
```python
# string type
column = tf.feature_column.sequence_categorical_column_with_hash_bucket(
    key="feature",
    hash_bucket_size=10,
    dtype=tf.string)

# int type
column = tf.feature_column.sequence_categorical_column_with_hash_bucket(
    key="feature",
    hash_bucket_size=10,
    dtype=tf.int64)
# Results and caveats are exactly the same as in 4.2
```
4.6. Crossing string or int features: crossed_column
There are four ways to use it:
- Represent category features and sequence category features as integer IDs
- Represent category features and sequence category features as multi_hot encodings
- Represent category features and sequence category features as weighted multi_hot encodings
- Represent category features and sequence category features as embedding vectors
Same as 4.2: if the input is dense tensor data, the input feature dimensions must be consistent.
```python
# When keys are raw input feature names:
column = tf.feature_column.crossed_column(
    # keys may also be CategoricalColumns
    # (hash-type categorical columns cannot be used)
    keys=["fea_9", "fea_12"],
    hash_bucket_size=100,
    hash_key=None)

# When keys are non-hash categorical columns:
column_voc = tf.feature_column.categorical_column_with_vocabulary_file(
    key="fea_9",
    vocabulary_file="valuelist",
    dtype=tf.string,
    default_value=None,
    num_oov_buckets=3)
column_iden = tf.feature_column.categorical_column_with_identity(
    key='fea_12',
    num_buckets=10,
    default_value=3)
column_cro = tf.feature_column.crossed_column(
    keys=[column_voc, column_iden],
    hash_bucket_size=10,
    hash_key=None)
# Results and caveats are exactly the same as the non-sequence case in 4.2
```
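A runnable sketch (TF 1.x) of crossing two raw features; the batch here is trimmed to two examples so the shapes of the "fea_9"/"fea_12" inputs match:

```python
import tensorflow as tf

features = {
    "fea_9": tf.constant([["value1", "value3"], ["value2", "value4"]]),
    "fea_12": tf.constant([[1, 2], [5, 6]], dtype=tf.int64),
}
column = tf.feature_column.crossed_column(
    keys=["fea_9", "fea_12"], hash_bucket_size=10, hash_key=None)
# A crossed column is categorical, so wrap it for input_layer
net = tf.feature_column.input_layer(
    features, tf.feature_column.indicator_column(column))

with tf.Session() as sess:
    print(sess.run(net))  # one multi_hot row of width 10 per example
```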
4.7. Bucketing int-type features into one_hot features by value boundaries: bucketized_column
How to use it:
- Represent category features and sequence category features as one_hot encodings
The input is a dense feature tensor.
```python
numeric_column = tf.feature_column.numeric_column(
    key="feature",
    shape=6,
    default_value=0,
    dtype=tf.float32)
column = tf.feature_column.bucketized_column(
    # 1-D numeric column
    source_column=numeric_column,
    # boundaries must be a list in ascending order
    boundaries=[3, 5, 7, 10])

# Input: the numeric feature "fea_14"

# Output mode 1: input_layer output
# [[1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1.]]

# Output mode 2: get_dense_tensor output
# [[[[1. 0. 0. 0. 0.] [1. 0. 0. 0. 0.]]
#   [[0. 1. 0. 0. 0.] [0. 1. 0. 0. 0.]]
#   [[0. 0. 1. 0. 0.] [0. 0. 1. 0. 0.]]]
#  [[[0. 0. 0. 1. 0.] [0. 0. 0. 1. 0.]]
#   [[0. 0. 0. 1. 0.] [0. 0. 0. 0. 1.]]
#   [[0. 0. 0. 0. 1.] [0. 0. 0. 0. 1.]]]]
```
5. Usage examples of multi_hot, one_hot, embedding, and shared embedding
5.1. How to make multi_hot features: indicator_column
```python
# column can be any of:
#   categorical_column_with_vocabulary_list / sequence_categorical_column_with_vocabulary_list
#   categorical_column_with_vocabulary_file / sequence_categorical_column_with_vocabulary_file
#   categorical_column_with_identity / sequence_categorical_column_with_identity
#   categorical_column_with_hash_bucket / sequence_categorical_column_with_hash_bucket
#   crossed_column
#   weighted_categorical_column
tf.feature_column.indicator_column(column)
```
5.2. How to make embedding features: embedding_column
```python
# column can be any of:
#   categorical_column_with_vocabulary_list / sequence_categorical_column_with_vocabulary_list
#   categorical_column_with_vocabulary_file / sequence_categorical_column_with_vocabulary_file
#   categorical_column_with_identity / sequence_categorical_column_with_identity
#   categorical_column_with_hash_bucket / sequence_categorical_column_with_hash_bucket
#   crossed_column
#   weighted_categorical_column
# dimension (the embedding size) is a required argument
tf.feature_column.embedding_column(column, dimension=...)
```
5.3. How to make one_hot features: bucketized_column
```python
numeric_column = tf.feature_column.numeric_column(
    key="feature",
    shape=6,
    default_value=0,
    dtype=tf.float32)
column = tf.feature_column.bucketized_column(
    # 1-D numeric column
    source_column=numeric_column,
    # boundaries must be a list in ascending order
    boundaries=[3, 5, 7, 10])
```
5.4. How to make shared embedding features: shared_embeddings
```python
# column can be any of:
#   categorical_column_with_vocabulary_list / sequence_categorical_column_with_vocabulary_list
#   categorical_column_with_vocabulary_file / sequence_categorical_column_with_vocabulary_file
#   categorical_column_with_identity / sequence_categorical_column_with_identity
#   categorical_column_with_hash_bucket / sequence_categorical_column_with_hash_bucket
#   crossed_column
#   weighted_categorical_column
# The columns are passed as a list, and dimension is required
tf.feature_column.shared_embeddings([column, column], dimension=...)
```
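A minimal sketch (TF 1.x; the keys "query_word" and "doc_word" are made up), using the equivalent stable API tf.feature_column.shared_embedding_columns, where both columns share one embedding table:

```python
import tensorflow as tf

features = {
    "query_word": tf.constant([["value1"], ["value2"]]),
    "doc_word": tf.constant([["value2"], ["value3"]]),
}
col_a = tf.feature_column.categorical_column_with_vocabulary_list(
    key="query_word", vocabulary_list=["value1", "value2", "value3"])
col_b = tf.feature_column.categorical_column_with_vocabulary_list(
    key="doc_word", vocabulary_list=["value1", "value2", "value3"])
# Both columns look up the same shared embedding table
shared = tf.feature_column.shared_embedding_columns([col_a, col_b],
                                                    dimension=3)
net = tf.feature_column.input_layer(features, shared)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    print(sess.run(net))  # shape [2, 6]: two 3-d embeddings side by side
```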
6. Some problems and explanations
In TensorFlow 1.x:
- from tensorflow.python.feature_column import feature_column as fc
- from tensorflow.python.feature_column import feature_column_v2 as fc_v2
- fc_v2.FeatureTransformationCache caches the input data (Tensor or SparseTensor)
- column.get_dense_tensor(transformation_cache=feature_cache, state_manager=None) reads the cached data from the FeatureTransformationCache; here column is a tf.feature_column.numeric_column
- fc._LazyBuilder likewise caches the input data (Tensor or SparseTensor)
- column._get_dense_tensor(builder, None) reads the cached data from the _LazyBuilder; here column is a tf.feature_column.numeric_column
- tf.feature_column.input_layer takes the input data and the feature_column directly and performs the transformation; here the feature_column is a tf.feature_column.numeric_column
- column2._get_sequence_dense_tensor(builder, None), where column2 is a tf.contrib.feature_column.sequence_numeric_column and builder is a _LazyBuilder
- column.get_sequence_dense_tensor(feature_cache, None), where column is a tf.feature_column.sequence_numeric_column and feature_cache is a FeatureTransformationCache
- tf.contrib.feature_column.sequence_input_layer(features, column2), where column2 is a tf.contrib.feature_column.sequence_numeric_column
- fc_v2._StateManagerImpl(layer=tf.keras.layers.Layer(), trainable=True) creates the weights used to generate embedding features
- weigthed_col_emb.create_state(state_manager), where state_manager is a _StateManagerImpl and weigthed_col_emb is a tf.feature_column.embedding_column
- rs_w_3 = weigthed_col_emb.get_dense_tensor(feature_weight_cache, state_manager), where feature_weight_cache is a FeatureTransformationCache and state_manager is a _StateManagerImpl
- Feature values (Tensor or SparseTensor) can be printed as follows:
```python
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    print(sess.run(rs_1))
    print(rs_1.eval())
    print(tf.sparse_tensor_to_dense(rs_2.id_tensor, -1).eval())
```
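Putting the pieces above together, a sketch (TF 1.x; the feature keys and values are made up) of building a weighted embedding with the internal _StateManagerImpl / FeatureTransformationCache machinery described in this section:

```python
import tensorflow as tf
from tensorflow.python.feature_column import feature_column_v2 as fc_v2

features = {
    "feature": tf.constant([["value1", "value2"]]),
    "weight": tf.constant([[2.0, 0.5]]),
}
cat = tf.feature_column.categorical_column_with_vocabulary_list(
    key="feature", vocabulary_list=["value1", "value2", "value3"])
weighted = tf.feature_column.weighted_categorical_column(cat, "weight")
weigthed_col_emb = tf.feature_column.embedding_column(weighted, dimension=3)

# Create the embedding weights, then resolve the dense tensor via the cache
state_manager = fc_v2._StateManagerImpl(layer=tf.keras.layers.Layer(),
                                        trainable=True)
weigthed_col_emb.create_state(state_manager)
feature_weight_cache = fc_v2.FeatureTransformationCache(features=features)
rs_w_3 = weigthed_col_emb.get_dense_tensor(feature_weight_cache, state_manager)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.tables_initializer())
    print(sess.run(rs_w_3))
```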