[N] Interview with Hamel Husain on semantic code search research at GitHub

“We hope that the community can use this dataset to improve developer tools generally, which may include semantic code search. We hope that the state of the art with regards to representation learning of code is advanced because researchers and practitioners now have a common dataset and a forum in which to discuss results. We also hope that the uniqueness of the dataset will inspire the community to uncover new approaches and techniques for code and natural language understanding.”

That’s a quote from one of the authors of CodeSearchNet, a collection of datasets, tools, and benchmarks for representation learning of code. This research on semantic code search has been posted here before as news, but I thought some people here might be interested in the details of what goes into a project like this at a big company. I interviewed Hamel Husain, a machine learning engineer at GitHub, about how the project started and evolved into a wider open source effort to involve the ML research community. Hope there are useful takeaways for people here.

Here’s a link to the interview:

And here’s a link to the original paper on arXiv:

submitted by /u/Jefro118