Author: torontoai

ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots

Learning-based methods for solving robotic control problems have recently seen significant momentum, driven by the widening availability of simulated benchmarks (like dm_control or OpenAI-Gym) and advancements in flexible and scalable reinforcement learning techniques (DDPG, QT-Opt, or Soft Actor-Critic). While learning through simulation is effective, policies trained in these simulated environments often encounter difficulty when deployed to real-world robots due to factors such as inaccurate modeling of physical phenomena and system delays. This motivates the need to develop robotic control solutions directly in the real world, on real physical hardware.

The majority of current robotics research on physical hardware is conducted on high-cost, industrial-quality robots (PR2, Kuka-arms, ShadowHand, Baxter, etc.) intended for precise, monitored operation in controlled environments. Furthermore, these robots are designed around traditional control methods that focus on precision, repeatability, and ease of characterization. This stands in sharp contrast with the learning-based methods that are robust to imperfect sensing and actuation, and demand (a) a high degree of resilience to allow real-world trial-and-error learning, (b) low cost and ease of maintenance to enable scalability through replication and (c) a reliable reset mechanism to alleviate strict human monitoring requirements.

In “ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots”, to be presented at CoRL 2019, we introduce an open-source platform of cost-effective robots and curated benchmarks designed primarily to facilitate research and development on physical hardware in the real world. Analogous to an optical table in the field of optics, ROBEL serves as a rapid experimentation platform, supporting a wide range of experimental needs and the development of new reinforcement learning and control methods. ROBEL consists of D’Claw, a three-fingered hand robot that facilitates learning of dexterous manipulation tasks and D’Kitty, a four-legged robot that enables the learning of agile legged locomotion tasks. The robotic platforms are low-cost, modular, easy to maintain, and are robust enough to sustain on-hardware reinforcement learning from scratch.

Left: The 12 DoF D’Kitty; Middle: The 9 DoF D’Claw; Right: A functional D’Claw setup D’Lantern.

In order to make the robots relatively inexpensive and easy to build, we based ROBEL’s designs on off-the-shelf components and commonly-available prototyping tools (3D-printed or laser cut). Designs are easy to assemble and require only a few hours to build. Detailed part lists (with CAD details), assembly instructions, and software instructions for getting started are available here.

ROBEL Benchmarks
We devised a set of tasks suitable for each platform, D’Claw and D’Kitty, which can be used for benchmarking real-world robotic learning. ROBEL’s task definitions include both dense and sparse task objectives, and introduce metrics for hardware safety in the task definition, which, for example, indicate whether joints are exceeding “safe” operating bounds or force thresholds. ROBEL also supports a simulator for all tasks to facilitate algorithmic development and rapid prototyping. D’Claw tasks are centered around three commonly observed manipulation behaviors: Pose, Turn, and Screw.
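To make the distinction between dense objectives, sparse objectives, and hardware-safety metrics concrete, here is a minimal sketch for a Turn-style task. It is not ROBEL's actual task code: the names `TurnTaskConfig`, `angle_error`, and `evaluate_step`, as well as every threshold value, are assumptions made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnTaskConfig:
    # Illustrative values only; these are NOT ROBEL's real settings.
    success_threshold_rad: float = 0.1  # sparse objective: "close enough" to target
    joint_limit_rad: float = 1.0        # hypothetical safe joint range [-limit, +limit]


def angle_error(current, target):
    """Smallest absolute difference between two angles, in radians."""
    diff = (current - target + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def evaluate_step(object_angle, target_angle, joint_positions, cfg=None):
    """Return a dense reward, a sparse success flag, and a safety count."""
    cfg = cfg or TurnTaskConfig()
    err = angle_error(object_angle, target_angle)
    return {
        # Dense objective: a graded signal available at every step.
        "dense_reward": -err,
        # Sparse objective: only says whether the goal has been reached.
        "success": err < cfg.success_threshold_rad,
        # Safety metric: how many joints are outside the assumed safe range.
        "unsafe_joints": sum(1 for q in joint_positions if abs(q) > cfg.joint_limit_rad),
    }


result = evaluate_step(0.05, 0.0, [0.2, -1.5, 0.9])
print(result)
```

Reporting the safety count separately from the reward, as sketched here, lets the same rollout be scored on task progress and on hardware wear; whether a real benchmark folds the safety term into the reward is a separate design choice.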

Left: Pose (conform to the shape of the environment). Center: Turn (turn the object to a specified angle). Right: Screw (continuously rotate the object).

D’Kitty tasks are centered around three commonly observed locomotion behaviors — Stand, Orient, and Walk.

Left: Stand (stand upright). Center: Orient (align heading with the target). Right: Walk (move to the target).

We evaluated several classes (on-policy, off-policy, demo-accelerated, supervised) of deep reinforcement learning methods on each of these benchmark tasks. The evaluation results and the final policies are included as baselines in the software package for comparison. Full task details and baseline performances are available in the technical report.

Reproducibility & Robustness
ROBEL platforms are robust enough to sustain direct hardware training, and have clocked over 14,000 hours of real-world experience to date. The platforms have matured significantly over the year. Owing to the modularity of the design, repairs are trivial and require minimal to no domain expertise, making the overall system easy to maintain.

To establish the replicability of the platforms and reproducibility of the benchmarks, ROBEL was studied in isolation by two different research labs. Only the software distribution and documentation were used in this study. No in-person visits were allowed. Using ROBEL’s design files and assembly instructions, both sites were able to replicate both hardware platforms. Benchmark tasks were trained on robots built at both sites. In the figure below we see that two D’Claw robots built at two different sites not only exhibit similar training progress but also converge to the same final performance, establishing reproducibility of the ROBEL benchmarks.

SAC training performance of a task on two real D’Claw robots developed at different laboratory locations.

Results Gallery
ROBEL has been useful in a variety of reinforcement learning studies so far. Below we highlight a few of the key results, and you can find all our results in this comprehensive gallery. D’Claw platforms are completely autonomous and can sustain reliable experimentation for an extended period of time, and have facilitated experimentation with a wide variety of reinforcement learning paradigms and tasks using both rigid and flexible objects.

Left: Flexible Objects — On-hardware training with DAPG effectively learns to turn flexible objects. We observe manipulation targeting the center of the valve where there is more rigidity. D’Claw is robust to on-hardware training, facilitating successful outcomes on hard to simulate tasks. Center: Disturbance Rejection — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with object perturbations (amongst others) being tested on hardware. We observe fingers working together to resist external disturbances. Right: Obstructed Finger — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with external perturbations (amongst others) being tested on hardware. We observe that free fingers fill in for the missing finger.

Importantly, D’Claw platforms are modular and easy to replicate, which facilitates scalable experimentation. With our scaled setup, we find that multiple D’Claws can collectively learn tasks faster by sharing experience.

On-hardware training with a distributed version of SAC learning to turn multiple objects to arbitrary angles in conjunction by sharing experience. Five tasks need only twice the experience of a single task, thanks to the multi-task formulation. In the video we observe five D’Claws turning different objects to 180 degrees (picked for visual effectiveness; the actual policy can turn to any angle).
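The idea of several robots learning faster by sharing experience can be sketched as a single replay buffer that every robot writes into and one learner samples from. This is only an illustration of the data flow, not the distributed SAC setup used in the experiments; `SharedReplayBuffer` and the dummy transitions are hypothetical.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """One buffer that several robots write into (hypothetical helper)."""

    def __init__(self, capacity, seed=0):
        # A bounded deque drops the oldest transitions once capacity is reached.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, task_id, obs, action, reward, next_obs):
        # Tagging each transition with its task lets one policy condition on it.
        self._storage.append((task_id, obs, action, reward, next_obs))

    def sample(self, batch_size):
        # Uniform sampling mixes experience from all robots in each batch.
        k = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), k)

    def __len__(self):
        return len(self._storage)


buffer = SharedReplayBuffer(capacity=1000)
for step in range(20):
    for task_id in range(5):  # five robots, one task each
        buffer.add(task_id, obs=[0.0], action=[0.0], reward=0.0, next_obs=[0.0])

batch = buffer.sample(32)
tasks_in_batch = {transition[0] for transition in batch}
print(len(buffer), len(batch), sorted(tasks_in_batch))
```

In a real system the writers would be separate processes or machines and the learner would run an off-policy update on each sampled batch; both are omitted here.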

We have also been successful in deploying robust locomotion policies on the D’Kitty platform. Below we show a blind D’Kitty walking over indoor and outdoor terrains, exhibiting the robustness of its gait in the presence of unseen disturbances.

Left: Indoor – Walking in Clutter — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized perturbations learns to walk in clutter and step over objects. Center: Outdoor – Gravel and Branches — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized height field learns to walk outdoors over gravel and branches. Right: Outdoor – Slope and Grass — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized height field learns to handle moderate slopes.

When presented with information about its torso and objects present in the scene, D’Kitty can learn to interact with these objects exhibiting complex behaviors.

Left: Avoid Moving Obstacles — Policy trained via Hierarchical Sim2Real learns to avoid a moving block and reach the target (marked by the controller on the floor). Center: Push to Moving Goal — Policy trained via Hierarchical Sim2Real learns to push block towards a moving target (marked by the controller in the hand). Right: Co-ordinate — Policy trained via Hierarchical Sim2Real learns to coordinate two D’Kitties to push a heavy block towards a target (marked by two + signs on the floor).

In conclusion, ROBEL platforms are low cost, robust, reliable and are designed to accommodate the needs of the emerging learning-based paradigms that need scalability and resilience. We are proud to announce the release of ROBEL to the open source community and are excited to learn about the diversity of research and experimentation they will enable. For getting started on ROBEL platforms and ROBEL benchmarks refer to roboticsbenchmarks.org.

Acknowledgments
Google’s ROBEL D’Claw evolved from earlier designs Vikash Kumar developed at the Universities of Washington and Berkeley. Multiple people across organizations have contributed towards the ROBEL projects. We thank our co-authors Henry Zhu (UC Berkeley), Kristian Hartikainen (UC Berkeley), Abhishek Gupta (UC Berkeley) and Sergey Levine (Google and UC Berkeley) for their contributions and extensive feedback throughout the project. We would like to acknowledge Matt Neiss (Google) and Chad Richards (Google) for their significant contribution to the platform designs. We would also like to thank Aravind Rajeshwaran (U-Washington), Emo Todorov (U-Washington), and Vincent Vanhoucke (Google) for their helpful discussions and comments throughout the project.

[D] Transfer learning on GANs?

Sorry if this question has been asked a million times before but I failed to find a good explanation so far.

Transfer learning is common for image classification tasks, using models pre-trained on ImageNet. But how do you do that for image generation? Given the recent amazing results in GAN research on generating high-quality images, such as BigGAN, StyleGAN, etc., it would be ideal if I could leverage these pre-trained weights for my own small dataset.

submitted by /u/worldconcepts

[D] Are NeurIPS workshop authors reserved tickets for the main conference?

Does anyone know if workshop authors are reserved a registration slot at the main conference in addition to the workshops? I’ve asked several organizers but can’t seem to get a consistent answer. I’d have to pay my own way to Vancouver, so I want to be sure I can attend the whole event before committing.

Additional info: In this blog post, the organizers say, “Registrations and tickets will be withheld from the lottery for the following content creators: authors of accepted papers, workshop presenters, …” This seems to imply that tickets are set aside for workshop presenters. But the website says “Workshop organizers will have a limited number of reserve tickets to give to workshop presenters”, which seems to imply the opposite, i.e. that only a select few workshop authors are given tickets.

Any workshop authors out there who have gone through the process and know how it works?

submitted by /u/mitare

[D] Machine Learning : Explaining Uncertainty Bias in Machine Learning

I am interested in this topic, where one attempts to extract a meaningful interpretation of uncertainty bias in machine learning. Does anyone know of any related papers on this topic?

I have already read several papers, such as:

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Why should i trust you?: Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, 2016.

Lipton, Zachary C. “The mythos of model interpretability.” arXiv preprint arXiv:1606.03490 (2016).

These papers try to interpret why a given model produces its predictions, while I am interested in explaining why a model is uncertain about particular data points.

Thank you very much for your help.

submitted by /u/rmfajri

[Discussion] SOTA of ES-based RL algorithms

(A previous version of this post was removed because of a missing tag. I am sorry for this and hope to have fixed it. A message would have been nice, though, since I can’t add tags afterwards.)

Since people recognized that ES can solve RL tasks, which the ES community knew more than 10 years ago, we have a crazy amount of RL algorithms based on ES. However, the ML/RL field is not looking at what the ES community is doing, and is basically repeating the same mistakes that community made more than 20 years ago. The OpenAI paper would not pass any review in an ES track at GECCO because the algorithm would not even be considered a valid baseline anymore. While it is okay for the first paper reintroducing this to not know stuff, it is not okay for the follow-up work. This ignorance of SOTA in the field while knowing that the field exists is worrying.

To make this a bit more productive, here are a few references:

1. Most importantly, the original ES-based RL paper:

Heidrich-Meisner, Verena, and Christian Igel. “Neuroevolution strategies for episodic reinforcement learning.” Journal of Algorithms 64.4 (2009): 152-168.

2. CMA-ES and NES:

Hansen, N., Müller, S. D., & Koumoutsakos, P. (2003). Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary computation, 11(1), 1-18.

Krause, O., Arbonès, D. R., & Igel, C. (2016). CMA-ES with optimal covariance update and storage complexity. In Advances in Neural Information Processing Systems (pp. 370-378).

Wierstra, D., Schaul, T., Peters, J., & Schmidhuber, J. (2008, June). Natural evolution strategies. In 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence) (pp. 3381-3387). IEEE.

3. Review of SOTA in large-scale ES:

Varelas, K., Auger, A., Brockhoff, D., Hansen, N., ElHara, O. A., Semet, Y., … & Barbaresco, F. (2018, September). A comparative study of large-scale variants of CMA-ES. In International Conference on Parallel Problem Solving from Nature (pp. 3-15). Springer, Cham.

4. Recent developments for noisy functions (also references other relevant algorithms with noise-handling):

Krause, O. (2019, July). Large-scale noise-resilient evolution-strategies. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 682-690). ACM.
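For readers unfamiliar with evolution strategies, the sketch below shows the basic loop of treating a policy's parameters as a search point and its episodic return as a black-box fitness. It is a textbook (1+1)-ES with a one-fifth success rule for step-size adaptation, chosen for brevity; it is not CMA-ES, NES, or any algorithm from the papers listed above, and it has no noise handling. The function `episodic_return` is a toy stand-in for a real rollout, and all constants are assumptions.

```python
import random


def episodic_return(params):
    """Toy stand-in for an RL rollout: higher is better, optimum at (1, -2)."""
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)


def one_plus_one_es(fitness, x0, sigma=1.0, iterations=500, seed=0):
    """(1+1)-ES: mutate the parent, keep the child only if it is no worse."""
    rng = random.Random(seed)
    parent, parent_fit = list(x0), fitness(x0)
    grow = 1.5               # step-size factor applied after a success
    shrink = grow ** -0.25   # applied after a failure; balances at a 1/5 success rate
    for _ in range(iterations):
        child = [x + sigma * rng.gauss(0.0, 1.0) for x in parent]
        child_fit = fitness(child)
        if child_fit >= parent_fit:
            parent, parent_fit = child, child_fit
            sigma *= grow
        else:
            sigma *= shrink
    return parent, parent_fit, sigma


best, best_fit, final_sigma = one_plus_one_es(episodic_return, [0.0, 0.0])
print(best, best_fit, final_sigma)
```

The step size adapts online instead of being fixed, which is one of the simplest examples of the machinery the ES literature has developed; covariance adaptation and noise handling, as in the references above, go considerably further.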

submitted by /u/Ulfgardleo

[Project] Projell.com – Simple APIs for synthetic data generation

Hi, I’m Sumit Srivastava, founder of Projell.com. We made this after dealing with data hell: low data availability, high data procurement costs, the huge time sink of data collection, and privacy concerns over user data.

This prompted me to build an easy way to generate synthetic data for machine learning models. It primarily uses GANs, but we use whichever techniques are most efficient for the specific use case.

Areas where we’ve found it useful are biomedical, drone imagery, satellite imagery, retail, and autonomous mobility.

As is already evident in the ImageNet challenge, state-of-the-art entries use synthetic data to gain higher accuracy. [ https://paperswithcode.com/sota/image-classification-on-imagenet ]

Google, for their autonomous vehicles, used millions of miles of real driving data and billions of miles of synthetic data. It is clear where the world is heading.

I would be happy to share the tools with everyone, since dealing with data is something we struggled with and we don’t want anyone else to struggle with it anymore. This is probably only the first step towards building something robust that can remove as many data hassles as possible, if not all.

submitted by /u/sum2it

[D] word2vec architecture

I was trying to understand the skip-gram model of word2vec, and I had some problems understanding the details. I’m clear about the high-level idea: given a word, predict the context of the word. However, when you actually train the model, what are the input and output of the model for a particular training instance? To be more concrete with an example, disregarding all sophisticated techniques like negative sampling etc., if I have the sentence “it is a beautiful day today”, the input to the CBOW version would be the average of the one-hot encodings of “it”, “is”, “a”, “day”, “today”, and the output should ideally be the one-hot encoding of “beautiful”. For skip-gram, I’m confused: given the one-hot encoding of “beautiful” as input, what should the output be? Should it be the average of the one-hot encodings of “it”, “is”, “a”, “day”, “today” in a single training instance, or “it”, “is”, “a”, “day”, “today” in 5 separate training instances? I tried to go through the gensim codebase to understand what they do, but it’s not clear.
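As commonly described, skip-gram treats every (center, context) pair as its own training instance, so “beautiful” would yield five instances here, whereas CBOW combines the context words into one instance. The sketch below only enumerates the instances for the example sentence; `skipgram_pairs` and `cbow_examples` are hypothetical helpers, not gensim functions, and they leave out details such as frequent-word subsampling and randomly shrunk windows that real implementations may add.

```python
def skipgram_pairs(tokens, window):
    """Return one (center, context) pair per context word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """Return one (context_words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


sentence = "it is a beautiful day today".split()
pairs = skipgram_pairs(sentence, window=5)
beautiful_pairs = [p for p in pairs if p[0] == "beautiful"]
cbow = cbow_examples(sentence, window=5)

print(beautiful_pairs)  # five separate skip-gram instances
print(cbow[3])          # one CBOW instance for the same position
```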

As an extension to this question, I also wanted to know what happens in negative sampling. The way I have understood it is that instead of forcing determinate values in the output vector to say that we want each element to match precisely to the expected one-hot encoding of the output, we say that we want to enforce 1s and 0s at only a select few places in the vectors (corresponding to positive and negative samples), which reduces the amount of back-propagation. Is this correct?
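One way to check this understanding is to write out the per-instance loss. In the usual formulation, negative sampling replaces the full softmax with one binary logistic term for the true context word and one for each of k sampled negative words, so only those k+1 output vectors (plus the center word's input vector) receive gradients. The sketch below computes that loss with plain Python lists; the vectors are made-up numbers and the function names are assumptions, not any library's API.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def negative_sampling_loss(center_vec, positive_out_vec, negative_out_vecs):
    """Loss for one (center, context) pair with k sampled negatives."""
    # Push the true context word's score towards "label 1".
    loss = -math.log(sigmoid(dot(positive_out_vec, center_vec)))
    # Push each sampled negative word's score towards "label 0".
    for neg in negative_out_vecs:
        loss -= math.log(sigmoid(-dot(neg, center_vec)))
    return loss


# With a zero center vector every score is 0, so each term equals log(2).
untrained = negative_sampling_loss([0.0, 0.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])

# A center vector aligned with the positive and opposed to the negative scores better.
trained = negative_sampling_loss([1.0, 1.0], [1.0, 1.0], [[-1.0, -1.0]])

print(untrained, trained)
```

Words that are neither the positive nor a sampled negative do not appear in the loss at all, which is where the saving in back-propagation comes from.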

submitted by /u/alexsolanki