[D] Are there any theoretical connections between dropout regularization and ensemble learning?

I’ve seen the connection between dropout and ensemble learning mentioned informally in several places: each training step with dropout updates a randomly sampled subnetwork, so over the course of training the procedure resembles training an ensemble of subnetworks that share weights. However, I couldn’t find any theoretical references. Are there any known theoretical results that make this connection more concrete?
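
To make the informal picture concrete, here is a minimal sketch of the simplest case I can think of: a single linear unit with dropout applied to its inputs. It enumerates every possible dropout mask (every "subnetwork"), averages their outputs weighted by the probability of each mask, and compares that to one forward pass through the full unit with weights scaled by the keep probability. All names (`subnetwork_output`, `ensemble_average`, `weight_scaled_output`) and the example numbers are made up for illustration and are not taken from any library or paper.

```python
import itertools


def subnetwork_output(weights, inputs, mask):
    """Output of one thinned linear unit: inputs whose mask entry is 0 are dropped."""
    return sum(w * x * m for w, x, m in zip(weights, inputs, mask))


def ensemble_average(weights, inputs, keep_prob):
    """Probability-weighted average over all 2^n dropout masks (subnetworks)."""
    n = len(inputs)
    total = 0.0
    for mask in itertools.product([0, 1], repeat=n):
        kept = sum(mask)
        # Probability of this mask under independent Bernoulli(keep_prob) units.
        prob = keep_prob ** kept * (1 - keep_prob) ** (n - kept)
        total += prob * subnetwork_output(weights, inputs, mask)
    return total


def weight_scaled_output(weights, inputs, keep_prob):
    """One forward pass through the full unit with weights scaled by keep_prob."""
    return sum(keep_prob * w * x for w, x in zip(weights, inputs))


# Hypothetical example values, chosen arbitrarily.
weights = [0.5, -1.2, 2.0, 0.3]
inputs = [1.0, 2.0, -0.5, 4.0]
keep_prob = 0.8

avg = ensemble_average(weights, inputs, keep_prob)
scaled = weight_scaled_output(weights, inputs, keep_prob)
print("ensemble average:", avg)
print("weight-scaled pass:", scaled)
```

In this linear case the two numbers agree up to floating-point error, because the expectation of each mask entry is just the keep probability. Once there are nonlinearities between layers, the same equality no longer holds in general, which is where I’m hoping the theory comes in.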

