[D] Can I calculate new features as part of a pipeline, or should this be done before using the module?

I am just curious about how much of the data-processing workflow I can refactor into a pipeline for feeding data into my model. My source data is used to calculate various features to build a dataset, and this dataset is then processed further before being fed into my models. So the process is basically like this:

Source Data (structured JSON which just has text fields for data parsed from a raw document) —>
Dataset (these fields are used to calculate numerical features, categorical features, and sequence features) —>
Processed Dataset (standard techniques – scaling, encoding, tokenization, padding, etc.)

And then I have my input data for the model. I am wondering whether I can refactor this entire process into a pipeline, or whether the pipeline can only handle the processing done in the second step described above. I am using TF 2.0 Beta, by the way.
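To make the question concrete, here is a minimal, framework-agnostic sketch of the two transformation stages described above, written in plain Python. The field names (`"text"`, `"category"`) and the specific features are illustrative assumptions, not from the original post; in TF 2.0 each stage would typically become a `map` transformation on a `tf.data.Dataset` (stateless feature calculation usually fits there directly, while stateful steps like fitting a vocabulary or scaler generally need a pass over the data first):

```python
def compute_features(record):
    """Stage 1: derive numerical, categorical, and sequence features
    from the text fields of one parsed source record.
    (Field names and features are illustrative assumptions.)"""
    text = record["text"]
    return {
        "length": len(text),             # numerical feature
        "category": record["category"],  # categorical feature
        "tokens": text.lower().split(),  # sequence feature
    }

def process_features(feats, cat_vocab, tok_vocab, max_len, max_length_seen):
    """Stage 2: standard processing -- scaling, encoding, padding.
    Vocabularies are built on the fly here; in practice they would be
    fitted beforehand on the training data."""
    # Scale the numerical feature by a precomputed maximum.
    scaled = feats["length"] / max_length_seen
    # Integer-encode the categorical feature (ids start at 1; 0 is padding).
    encoded = cat_vocab.setdefault(feats["category"], len(cat_vocab) + 1)
    # Tokenize -> integer ids, then pad/truncate the sequence to max_len.
    ids = [tok_vocab.setdefault(t, len(tok_vocab) + 1) for t in feats["tokens"]]
    padded = (ids + [0] * max_len)[:max_len]
    return scaled, encoded, padded

# Usage: run one record through both stages.
record = {"text": "Hello World", "category": "invoice"}
feats = compute_features(record)
scaled, encoded, padded = process_features(
    feats, cat_vocab={}, tok_vocab={}, max_len=4, max_length_seen=20
)
```

Splitting the two stages into separate functions like this mirrors the dataset layout in the post, and keeps the stateless stage one easy to move into the pipeline independently of the stateful stage two.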

Any insights or help will be greatly appreciated.

submitted by /u/that_one_ai_nerd
