The Colab notebook used in the session is here: colab.research.google.com/gist/rafiqhasan/2164304ede002f4a8bfe56e5434e1a34/dl-e2e-taxi-dataset-tfx-e2e.ipynb
Thank you so much for providing such quality content. I followed the notebook along with the video myself and was able to create the model using the full pipeline. Looking forward to trying to deploy my own model using this tutorial as well as watching the other tutorials on Spark!
The reason to split the data first is to avoid any leakage from the test set. Any transformation applied to the training data, e.g. normalization, relies on statistics such as the mean and standard deviation, and those statistics should be derived only from the training set. The same training-set statistics are then reused to transform the test set. So yes, we generally want to split the dataset into training and test sets from the start. The test set is then used to estimate the generalization error of the trained model.
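To make the point concrete, here is a minimal sketch in plain NumPy rather than the TFX Transform component used in the notebook. The helper names (`train_test_split_indices`, `fit_standardizer`, `apply_standardizer`), the 80/20 split, and the synthetic data are all assumptions made up for illustration; the only thing it is meant to show is that the mean and standard deviation come from the training rows alone.

```python
import numpy as np

def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train and test index arrays."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_standardizer(train_features):
    """Compute per-column mean and std from the TRAINING rows only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    # Guard against constant columns to avoid division by zero.
    std = np.where(std == 0, 1.0, std)
    return mean, std

def apply_standardizer(features, mean, std):
    """Apply previously fitted statistics to any split."""
    return (features - mean) / std

# Synthetic stand-in data (hypothetical, not the taxi dataset).
data_rng = np.random.default_rng(42)
features = data_rng.normal(loc=10.0, scale=3.0, size=(1000, 2))

# 1. Split first.
train_idx, test_idx = train_test_split_indices(len(features))
train_x, test_x = features[train_idx], features[test_idx]

# 2. Fit statistics on the training split only.
mean, std = fit_standardizer(train_x)

# 3. Reuse the same statistics for both splits.
train_scaled = apply_standardizer(train_x, mean, std)
test_scaled = apply_standardizer(test_x, mean, std)
```

The scaled test set will usually not have exactly zero mean and unit variance, and that is expected: it is being measured against statistics it never contributed to.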
Thank you very much for such good material. Can I find more TFX tutorials made by you, or can you point me to other useful material like this video?
Thank you so much, team! Really great sessions. I hope you continue them, and could you please upload the NLP and time series playlists when you get the time? 😃
@AIEngineeringLife In NLP, I actually wanted a theory lecture/video on the architecture of BERT and its variations, RoBERTa and ALBERT, if that would be possible. I tried a couple of other resources, but ultimately your explanations are the ones I fully understand.
Thank you for such an awesome resource, sir. I am getting an error when importing the components below:
from tfx.components import ResolverNode
from tfx.utils.dsl_utils import external_input
ImportError: cannot import name 'ResolverNode' from 'tfx.components' (/usr/local/lib/python3.7/dist-packages/tfx/components/__init__.py)
Please help.