[D] BERT “pooled” output? What kind of pooling?
Quick question about the outputs listed at https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1:
pooled_output: pooled output of the entire sequence with shape [batch_size, hidden_size]
What kind of pooling are they talking about here? I don’t see it mentioned in the paper. Thanks.
submitted by /u/ME_PhD