Learn About Our Meetup

5000+ Members



Join our meetup, learn, connect, share, and get to know your Toronto AI community. 



Browse through the latest deep learning, ai, machine learning postings from Indeed for the GTA.



Are you looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

[P] Computer Science Summarization Dataset

This is a dataset of 5.6 million title / abstract data points, about 75% of which are from computer science papers (I tried my best to filter all non-CS papers (perhaps the non-CS papers add a bit of a “regularization” effect . . . ?) ) .

Title/Abstract pairs have been used to train biomedical summarizers [] , but I am doing a project on CS/ML papers so I made my own.

The dataset is basically a filtered version of the Semantic Scholar Corpus

But it took some effort to produce it and I figure I may save some people time if they wanted the same.

This is a zip file containing 12 parquet files

it’s ~2.5 gb zipped, I think like 6 something gigs unzipped

This is the sqlite database version, 1 file

it’s 2.5 gb zipped, 7.5 gb unzipped

If anyone is interested, this a part of an ongoing project to use deep learning models to better search through research papers, started with ML/CS papers. If anyone is interested in being involved, feel free to reach out. We also have a public page if anyone wants to keep updated.

submitted by /u/BatmantoshReturns
[link] [comments]

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.