Best way to train an RNN to play an instrument in a specific performer's style

I think building a model to learn a specific musician’s style could be a fun side project and was hoping to get some thoughts. I have complete multi-track MIDI discographies for a few bands, but I’m struggling to figure out the best way to structure the input/output of the model.

The general idea is to feed in some set of instrument/vocal tracks (which tracks to include will basically be a hyperparameter) and generate the target track as the output. An RNN seems like the obvious approach, and I was planning to start with a bidirectional LSTM (BLSTM) using Keras and music21 in Python. To generate the training data, I’ll randomly sample 10-second clips from the songs in the training set (the clip length is another hyperparameter). The same track will always be the target, and I am planning to use a consistent subset of the other tracks as the input. Eventually, though, it would be nice to support a varying set of input tracks.
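
To make the input/output structure concrete, here is a minimal sketch of the clip-sampling step using only the standard library. It assumes each track has already been reduced to (onset_sec, duration_sec, pitch) tuples (for example, extracted with music21 beforehand) and that a binary piano roll is an acceptable encoding. The piano-roll representation, the grid of 4 steps per second, and the names `piano_roll` and `sample_clip` are illustrative assumptions, not settled design choices.

```python
import math
import random

N_PITCHES = 128  # MIDI pitches 0-127


def piano_roll(notes, start, clip_seconds, steps_per_second):
    """Rasterize (onset_sec, duration_sec, pitch) notes into a binary piano roll.

    Returns a list of time steps, each a list of N_PITCHES zeros and ones,
    covering the window [start, start + clip_seconds).
    """
    n_steps = int(round(clip_seconds * steps_per_second))
    roll = [[0] * N_PITCHES for _ in range(n_steps)]
    end = start + clip_seconds
    for onset, duration, pitch in notes:
        note_end = onset + duration
        if note_end <= start or onset >= end:
            continue  # note lies entirely outside the clip
        first = max(0, int((onset - start) * steps_per_second))
        last = min(n_steps, math.ceil((note_end - start) * steps_per_second))
        for t in range(first, last):
            roll[t][pitch] = 1
    return roll


def sample_clip(song, input_tracks, target_track,
                clip_seconds=10.0, steps_per_second=4, rng=random):
    """Draw one random (x, y) training pair from a song.

    `song` maps track names to note lists (hypothetical format, see above).
    x has shape [time][128 * len(input_tracks)]; y has shape [time][128].
    Assumes `input_tracks` is non-empty and the song contains at least one note.
    """
    song_length = max(
        onset + duration
        for notes in song.values()
        for onset, duration, _ in notes
    )
    # Pick a random window that fits inside the song where possible.
    start = rng.uniform(0.0, max(0.0, song_length - clip_seconds))
    rolls = [
        piano_roll(song[name], start, clip_seconds, steps_per_second)
        for name in input_tracks
    ]
    # Concatenate the input tracks' pitch vectors at every time step.
    x = [sum((roll[t] for roll in rolls), []) for t in range(len(rolls[0]))]
    y = piano_roll(song[target_track], start, clip_seconds, steps_per_second)
    return x, y
```

With this layout, x and y share the same time axis, so the pair maps naturally onto a sequence-to-sequence setup where the network predicts the target track's pitch vector at every step. Concatenating input tracks along the feature axis ties the model to a fixed set of inputs; supporting a varying set would need a different scheme, such as zero-filling tracks that are absent.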

Does anyone have experience doing something similar to this, or have any relevant papers/libraries to recommend?

