
[P] A BertSum (BERT extractive summarizer) model trained on research papers. Access to the datasets is also included.

A few months ago, I released several datasets built from ~7 million papers, totaling ~12 million datapoints. I think the most exciting part was the set of datasets built with a methodology similar to that of Alexios Gidiotis and Grigorios Tsoumakas [], who found that many papers have structured abstracts whose sections correspond to entire sections of the paper.

As far as I know, a dataset that pairs these abstract sections with the corresponding full paper sections is probably the best one available for research paper summarization.

Using some of the text processing methods from Gidiotis and Tsoumakas, along with Semantic Scholar’s Science Parse, I created a dataset from arXiv and the Semantic Scholar Corpus.
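The core pairing idea can be illustrated with a small sketch: split a structured abstract into its labeled parts, then match each part to a paper section whose heading looks related, giving (source section, target summary) pairs. This assumes the abstract uses inline labels such as "BACKGROUND:" and that the paper has already been parsed into (heading, text) pairs by some PDF parser. The label set, the keyword map `SECTION_KEYWORDS`, and both function names are hypothetical, and this is not the exact procedure used by Gidiotis and Tsoumakas or for the released dataset.

```python
import re

# Hypothetical label-to-heading keyword map (an assumption for illustration,
# not the rules from the original work or the released dataset).
SECTION_KEYWORDS = {
    "background": ("introduction", "background"),
    "methods": ("method", "methods", "approach", "materials"),
    "results": ("result", "results", "experiments", "evaluation"),
    "conclusions": ("conclusion", "conclusions", "discussion"),
}

# Matches an inline abstract label such as "METHODS:" (case-insensitive).
LABEL_PATTERN = re.compile(
    r"\b(" + "|".join(SECTION_KEYWORDS) + r")\s*:", re.IGNORECASE
)


def split_structured_abstract(abstract):
    """Split 'BACKGROUND: ... METHODS: ...' into {label: text}."""
    # With one capturing group, re.split returns
    # [preamble, label1, text1, label2, text2, ...].
    parts = LABEL_PATTERN.split(abstract)
    return {
        label.lower(): text.strip()
        for label, text in zip(parts[1::2], parts[2::2])
    }


def pair_with_paper_sections(abstract, paper_sections):
    """Return (paper_section_text, abstract_section_text) pairs.

    paper_sections is a list of (heading, text) tuples. Each abstract
    part is paired with the first section whose heading contains one of
    its keywords; parts with no matching section are dropped.
    """
    pairs = []
    for label, abstract_text in split_structured_abstract(abstract).items():
        keywords = SECTION_KEYWORDS[label]
        for heading, body in paper_sections:
            heading_words = re.findall(r"[a-z]+", heading.lower())
            if any(k in heading_words for k in keywords):
                pairs.append((body, abstract_text))  # (source, target)
                break
    return pairs


if __name__ == "__main__":
    abstract = "BACKGROUND: Summaries help. METHODS: We fine-tune BERT. RESULTS: It works."
    paper = [
        ("1 Introduction", "intro text"),
        ("2 Methods", "method text"),
        ("3 Experiments and Results", "results text"),
    ]
    for source, target in pair_with_paper_sections(abstract, paper):
        print(repr(source), "->", repr(target))
```

Matching on heading keywords keeps the sketch short; a real pipeline would likely need fuzzier matching and filtering for papers whose section structure does not line up with the abstract.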

I have now released a model trained with a slightly modified version of the BertSum repo [ ]. The model was trained with a batch size of 1024 for 5,000 steps, then with a batch size of 4096 for 25,000 steps.
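BertSum-style extractive summarizers score every sentence of the input and keep the highest-scoring ones; the original BertSum work also describes trigram blocking, which skips a candidate sentence if it shares a trigram with the summary built so far. The sketch here illustrates only that selection step, assuming per-sentence scores have already been produced by a trained model. The function names, whitespace tokenization, and the default `k=3` are illustrative choices, and this is not code from the BertSum repo or from the modified version used for this release.

```python
def trigrams(sentence):
    """Set of word trigrams, using simple lowercase whitespace tokenization."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def select_sentences(sentences, scores, k=3):
    """Greedy top-k sentence selection with trigram blocking.

    sentences: list of str; scores: one float per sentence (in a real
    system these come from the trained extractive model).
    Returns the chosen sentences in their original document order.
    """
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    summary_trigrams = set()
    for i in ranked:
        candidate = trigrams(sentences[i])
        if candidate & summary_trigrams:
            continue  # shares a trigram with the summary so far: treat as redundant
        chosen.append(i)
        summary_trigrams |= candidate
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    sentences = [
        "we train a bert model on papers",
        "we train a bert model on arxiv papers",
        "the dataset has twelve million datapoints",
        "results improve over the baseline",
    ]
    scores = [0.9, 0.85, 0.7, 0.4]  # made-up scores for illustration
    print(select_sentences(sentences, scores))
```

With these made-up scores, the second sentence is skipped despite its high score because it repeats "we train a" from the first, so the summary keeps the first, third, and fourth sentences.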

The datasets and model are all available here.

I also included text processing and training setups for the Pointer-Generator and Tensor2Tensor Transformer abstractive summarizers. At the time, these were the best options for abstractive summarization, but my upcoming project needed the most accurate summarizer available, which meant using an extractive method.

submitted by /u/BatmantoshReturns



Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, VR, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.