[Discussion] What hardware and computational time is reasonable for training GPT 117M parameter model from scratch?
I see that a replica of the 40GB WebText dataset has been released. I am wondering how long it would take to train the smallest GPT model on this from scratch. I haven't done much training on datasets this large, which is why I'm asking. Hopefully someone with some experience can help me.
Would it be reasonable for an individual to do this on Google Cloud Platform with the £250 credits?
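For context, here is my own rough back-of-envelope attempt using the common ~6·N·D FLOPs rule of thumb for transformer training. All of the numbers here are assumptions on my part: the token count for a 40GB WebText replica, the GPU throughput, and the utilization factor are guesses, so treat this as a sanity check rather than a real answer.

```python
# Back-of-envelope single-epoch training-time estimate.
# Every constant below is an assumption, not a measured value.
n_params = 117e6        # GPT "small" parameter count
n_tokens = 9e9          # guessed token count for a ~40GB WebText replica
flops_total = 6 * n_params * n_tokens  # ~6*N*D rule of thumb for training FLOPs

peak_flops = 15e12      # assumed mixed-precision throughput of one V100, FLOP/s
utilization = 0.3       # assumed fraction of peak actually achieved
seconds = flops_total / (peak_flops * utilization)
days = seconds / 86400
print(f"~{days:.0f} days on one GPU for one pass over the data")
```

Under these assumptions that comes out to a couple of weeks on a single GPU for one epoch, which is why I'm skeptical the free credits would cover it, but I'd appreciate numbers from anyone who has actually done it.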