
[P] A BertSum (Bert extractive summarizer) model trained on research papers. Access to datasets also included.

https://github.com/Santosh-Gupta/ScientificSummarizationDataSets

A few months ago, I released several datasets built from ~7 million papers, totalling ~12 million datapoints. I think the most exciting part was the datasets designed with a methodology similar to that of Alexios Gidiotis and Grigorios Tsoumakas [https://arxiv.org/abs/1905.07695], who observed that many papers have structured abstracts whose sections correspond to entire sections within the paper.

Having a dataset of these abstract sections paired with the corresponding full paper sections is, as far as I know, probably the best training data available for research paper summarization.
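To make the idea concrete, here is a rough sketch (not the actual code from the repo) of how a structured abstract could be split into labeled sections and matched to full-paper sections by heading keywords, producing (section text, abstract section) training pairs. The label-to-keyword mapping and the matching rule are my own illustrative assumptions.

```python
import re

# Illustrative mapping from structured-abstract labels to keywords that often
# appear in the corresponding full-paper section headings. These keywords are
# assumptions for the sketch, not the repo's exact rules.
LABEL_KEYWORDS = {
    "background": ["introduction", "background"],
    "methods": ["method", "materials", "approach"],
    "results": ["result", "experiment", "evaluation"],
    "conclusions": ["conclusion", "discussion"],
}

def split_structured_abstract(abstract_text):
    """Split an abstract like 'BACKGROUND: ... METHODS: ...' into labeled parts."""
    parts = re.split(r"([A-Z][A-Z ]+):", abstract_text)
    return {label.strip().lower(): body.strip()
            for label, body in zip(parts[1::2], parts[2::2])}

def match_sections(labeled_abstract, paper_sections):
    """Pair each abstract section with paper sections whose headings share a keyword.

    paper_sections: list of (heading, text) tuples from the parsed paper.
    Returns (source_text, target_summary) pairs for a summarization dataset.
    """
    pairs = []
    for label, summary in labeled_abstract.items():
        keywords = LABEL_KEYWORDS.get(label, [])
        source = " ".join(
            text for heading, text in paper_sections
            if any(k in heading.lower() for k in keywords)
        )
        if source:
            pairs.append((source, summary))
    return pairs
```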

Using some of the text processing methods from Gidiotis and Tsoumakas, along with Semantic Scholar's Science Parse, I was able to create a dataset from arXiv and the Semantic Scholar Corpus.
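For context, Science Parse can be run as a local HTTP service that takes a PDF and returns structured JSON (title, abstract, sections). The sketch below shows one way to call it from Python; the endpoint URL and JSON field names are based on my reading of the allenai/science-parse docs, so treat them as assumptions and check them against the version you run.

```python
import requests

def parse_pdf(pdf_path, server_url="http://localhost:8080/v1"):
    """Send a PDF to a locally running Science Parse server and return its JSON.

    The URL and field names are assumptions based on the allenai/science-parse
    README; verify them for the release you are using.
    """
    with open(pdf_path, "rb") as f:
        resp = requests.post(
            server_url,
            data=f.read(),
            headers={"Content-Type": "application/pdf"},
        )
    resp.raise_for_status()
    return resp.json()

paper = parse_pdf("example_paper.pdf")
abstract = paper.get("abstractText", "")
sections = [(s.get("heading", ""), s.get("text", ""))
            for s in paper.get("sections", [])]
```

The (heading, text) tuples produced here are the shape of input assumed by the pairing sketch above.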

I have now released a model trained with a slightly modified version of the BertSum repo [https://github.com/nlpyang/BertSum, https://arxiv.org/abs/1903.10318]. The model was trained with a batch size of 1024 for 5000 steps, and then with a batch size of 4096 for 25000 steps.
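For readers unfamiliar with BertSum: the extractive model inserts a [CLS] token in front of every sentence, runs BERT over the document, and feeds each per-sentence [CLS] vector to a small classifier that scores whether that sentence belongs in the summary. The sketch below illustrates that scoring idea with the Hugging Face transformers library; it is a simplified illustration (no interval segment embeddings or inter-sentence Transformer layers, and the classifier head is randomly initialized rather than loaded from the released checkpoint), not the repo's actual code.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
scorer = torch.nn.Linear(bert.config.hidden_size, 1)  # per-sentence classifier head

def score_sentences(sentences):
    """Score each sentence for inclusion in an extractive summary (simplified BertSum)."""
    # Put a [CLS] before and a [SEP] after every sentence, as in the BertSum paper.
    text = "".join(f"[CLS] {s} [SEP]" for s in sentences)
    inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state[0]        # (seq_len, hidden)
    # Each [CLS] position represents the sentence that follows it.
    cls_positions = (inputs["input_ids"][0] == tokenizer.cls_token_id).nonzero(as_tuple=True)[0]
    cls_vectors = hidden[cls_positions]                      # one vector per sentence
    return torch.sigmoid(scorer(cls_vectors)).squeeze(-1)    # inclusion probabilities

sents = ["We propose a new summarizer.",
         "The weather was nice.",
         "It outperforms prior work."]
print(score_sentences(sents))
```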

The datasets and model are all available here.

https://github.com/Santosh-Gupta/ScientificSummarizationDataSets

I also included text processing and training setups for the Pointer-Generator and Tensor2Tensor Transformer abstractive summarizers. At the time, those were among the best models for abstractive summarization, but for my future project I needed the most accurate summaries possible, which called for an extractive method.

submitted by /u/BatmantoshReturns