Blog

[D] Why softmax+CE over sigmoid+BCE?

Most popular neural network language models are trained with softmax+cross-entropy loss, which assumes that only the target label is true and every other label is false. But isn't language modeling a multilabel classification task? Why isn't sigmoid+BCE used more often?
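To make the two losses in the question concrete, here is a minimal sketch in plain Python. The logits, the three-word vocabulary, and the function names are illustrative assumptions, not taken from any particular model or library. It computes softmax+CE for a single target index and sigmoid+BCE against a one-hot target vector, and shows that softmax outputs are coupled through normalization while sigmoid outputs are scored independently.

```python
import math

def softmax(logits):
    """Normalize logits into one probability distribution over all classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Independent per-class probability; no normalization across classes."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax_cross_entropy(logits, target):
    """Negative log-probability of the target class under the softmax."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def sigmoid_bce(logits, targets):
    """Sum of independent binary cross-entropies, one per class."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of
        # -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total

# Hypothetical scores for a 3-word vocabulary; index 0 is the observed next word.
logits = [2.0, 0.5, -1.0]
target = 0
one_hot = [1.0 if i == target else 0.0 for i in range(len(logits))]

ce = softmax_cross_entropy(logits, target)
bce = sigmoid_bce(logits, one_hot)

print("softmax probabilities sum to:", sum(softmax(logits)))
print("sigmoid probabilities sum to:", sum(sigmoid(z) for z in logits))
print("softmax + CE loss:", ce)
print("sigmoid + BCE loss:", bce)
```

Under these assumptions, the softmax probabilities always sum to one, so raising the probability of one word necessarily lowers the others. The sigmoid outputs carry no such constraint, and the BCE loss adds an explicit penalty term for every non-target word.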

submitted by /u/DeMorrr


Toronto AI is a social and collaborative hub that unites AI innovators from Toronto and the surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, VR, robotics, and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.