How to use tf.data.TextLineDataset in TensorFlow | tf.data

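tf.data.TextLineDataset is a TensorFlow class that creates a Dataset whose elements are the lines of one or more text files. Each element is a scalar string tensor holding one line, with the trailing newline removed. Let's see how tf.data.TextLineDataset works with an example.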

This post covers the following modules and functions:

  • %%writefile cell magic for creating a file from a Jupyter or Colab notebook.
  • tf.constant for creating a tensor of file names.
  • tf.data.TextLineDataset for creating a dataset from the given files.

Load TensorFlow and create sample files with %%writefile (each %%writefile block below goes in its own notebook cell, with %%writefile on the first line)


import tensorflow as tf


%%writefile sample1.txt
0,male,22.0,1,0,7.25,Third,unknown,Southampton,n
1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
1,female,26.0,0,0,7.925,Third,unknown,Southampton,y
1,female,35.0,1,0,53.1,First,C,Southampton,n
0,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y
0,male,2.0,3,1,21.075,Third,unknown,Southampton,n


%%writefile sample2.txt
0,male,28.0,0,0,7.225,Third,unknown,Cherbourg,y
0,male,19.0,3,2,263.0,First,C,Southampton,n
1,female,28.0,0,0,7.8792,Third,unknown,Queenstown,y
0,male,28.0,0,0,7.8958,Third,unknown,Southampton,y
0,male,40.0,0,0,27.7208,First,unknown,Cherbourg,y
1,female,28.0,1,0,146.5208,First,B,Cherbourg,n


%%writefile sample3.txt
0,male,28.5,0,0,7.2292,Third,unknown,Cherbourg,y
0,male,11.0,5,2,46.9,Third,unknown,Southampton,n
0,male,22.0,0,0,7.2292,Third,unknown,Cherbourg,y
1,female,38.0,0,0,80.0,First,B,unknown,y
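
%%writefile only works inside a Jupyter or Colab notebook. If you run this example as a plain Python script, the same three files can be created with the standard library. The snippet below is a minimal sketch of that alternative; the `samples` dictionary is just a name chosen here for illustration and is not part of the original post.

```python
from pathlib import Path

# File name -> lines, copied from the %%writefile cells above.
samples = {
    "sample1.txt": [
        "0,male,22.0,1,0,7.25,Third,unknown,Southampton,n",
        "1,female,38.0,1,0,71.2833,First,C,Cherbourg,n",
        "1,female,26.0,0,0,7.925,Third,unknown,Southampton,y",
        "1,female,35.0,1,0,53.1,First,C,Southampton,n",
        "0,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y",
        "0,male,2.0,3,1,21.075,Third,unknown,Southampton,n",
    ],
    "sample2.txt": [
        "0,male,28.0,0,0,7.225,Third,unknown,Cherbourg,y",
        "0,male,19.0,3,2,263.0,First,C,Southampton,n",
        "1,female,28.0,0,0,7.8792,Third,unknown,Queenstown,y",
        "0,male,28.0,0,0,7.8958,Third,unknown,Southampton,y",
        "0,male,40.0,0,0,27.7208,First,unknown,Cherbourg,y",
        "1,female,28.0,1,0,146.5208,First,B,Cherbourg,n",
    ],
    "sample3.txt": [
        "0,male,28.5,0,0,7.2292,Third,unknown,Cherbourg,y",
        "0,male,11.0,5,2,46.9,Third,unknown,Southampton,n",
        "0,male,22.0,0,0,7.2292,Third,unknown,Cherbourg,y",
        "1,female,38.0,0,0,80.0,First,B,unknown,y",
    ],
}

# Write each file into the current working directory, one record per line.
for name, lines in samples.items():
    Path(name).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

The files land in the current working directory, which is also where the relative names passed to tf.data.TextLineDataset later in this post are resolved.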

Create a tensor of the file names generated in the previous step


filenames = tf.constant(value=["sample1.txt", "sample2.txt", "sample3.txt"], dtype=tf.string)

Iterate over the dataset generated with tf.data.TextLineDataset


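# Build a dataset that yields one element per line across all listed files.
dataset = tf.data.TextLineDataset(filenames)
# Each element is a scalar tf.string tensor containing one line of text.
for line in dataset:
    print(line)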

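Example output: note that the dataset contains the records from all three files, in the order the files were listed. Each record is a scalar string tensor (shape=()) holding one line as bytes.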


tf.Tensor(b'0,male,22.0,1,0,7.25,Third,unknown,Southampton,n', shape=(), dtype=string)
tf.Tensor(b'1,female,38.0,1,0,71.2833,First,C,Cherbourg,n', shape=(), dtype=string)
tf.Tensor(b'1,female,26.0,0,0,7.925,Third,unknown,Southampton,y', shape=(), dtype=string)
tf.Tensor(b'1,female,35.0,1,0,53.1,First,C,Southampton,n', shape=(), dtype=string)
tf.Tensor(b'0,male,28.0,0,0,8.4583,Third,unknown,Queenstown,y', shape=(), dtype=string)
tf.Tensor(b'0,male,2.0,3,1,21.075,Third,unknown,Southampton,n', shape=(), dtype=string)
tf.Tensor(b'0,male,28.0,0,0,7.225,Third,unknown,Cherbourg,y', shape=(), dtype=string)
tf.Tensor(b'0,male,19.0,3,2,263.0,First,C,Southampton,n', shape=(), dtype=string)
tf.Tensor(b'1,female,28.0,0,0,7.8792,Third,unknown,Queenstown,y', shape=(), dtype=string)
tf.Tensor(b'0,male,28.0,0,0,7.8958,Third,unknown,Southampton,y', shape=(), dtype=string)
tf.Tensor(b'0,male,40.0,0,0,27.7208,First,unknown,Cherbourg,y', shape=(), dtype=string)
tf.Tensor(b'1,female,28.0,1,0,146.5208,First,B,Cherbourg,n', shape=(), dtype=string)
tf.Tensor(b'0,male,28.5,0,0,7.2292,Third,unknown,Cherbourg,y', shape=(), dtype=string)
tf.Tensor(b'0,male,11.0,5,2,46.9,Third,unknown,Southampton,n', shape=(), dtype=string)
tf.Tensor(b'0,male,22.0,0,0,7.2292,Third,unknown,Cherbourg,y', shape=(), dtype=string)
tf.Tensor(b'1,female,38.0,0,0,80.0,First,B,unknown,y', shape=(), dtype=string)
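
The output above shows each record as a tf.Tensor wrapping a byte string. If you want ordinary Python strings instead, one option is to call .numpy() on each element and decode the bytes. The snippet below is a minimal sketch of that idea, assuming TensorFlow 2.x with eager execution; the temporary file demo.txt and the `rows` list are stand-ins introduced here so the example runs on its own.

```python
import os
import tempfile

import tensorflow as tf

# Write a tiny stand-in file (two records from sample1.txt) so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("0,male,22.0,1,0,7.25,Third,unknown,Southampton,n\n")
    f.write("1,female,38.0,1,0,71.2833,First,C,Cherbourg,n\n")

dataset = tf.data.TextLineDataset([path])

rows = []
for line in dataset:
    # .numpy() returns the element as Python bytes; decode it to a str.
    text = line.numpy().decode("utf-8")
    # Split the comma-separated record into its individual fields.
    rows.append(text.split(","))

for row in rows:
    print(row)
```

Splitting on commas is enough for these sample records because no field contains a quoted comma; for real CSV data a proper CSV parser is the safer choice.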