Posts

4.1 - Redefining overall structure

As seen in the updated mindmap below, there is a lot going on internally when making music, but the maker also exerts an emotional influence on the listener, whether intended or not. This makes for a more complete structure of the research field and includes all the different parts that make music an interesting thing to study in itself.

Looking at the technical details of building an artificial music generator, there is a part that analyses the data, a part that implements the learned details (which melodic and song structures are made up of), and the actual generator, which uses those learned details and rules to produce music. In a set-up with GANs there is the possibility to generate more data with the generator, while the discriminator is fed this output and has to tell it apart from the real examples. The generator is therefore attuned to producing different kinds of material and learns ever better what distinguishes it from the original data set. The sequential aspect of music makes it less su…
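To make that set-up a bit more concrete, here is a minimal sketch of such a GAN, assuming a PyTorch implementation and a flat piano-roll representation of the music (neither of which is fixed for the project yet); it is only meant to illustrate the generator/discriminator roles, not the actual architecture I will use.

```python
# Minimal GAN sketch (assumed PyTorch set-up, not the final thesis code).
# The generator maps random noise to a small "piano roll" (pitches x time steps);
# the discriminator scores how real a roll looks.
import torch
import torch.nn as nn

N_PITCHES, N_STEPS, NOISE_DIM = 128, 16, 100

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 256), nn.ReLU(),
    nn.Linear(256, N_PITCHES * N_STEPS), nn.Sigmoid(),  # note-on probabilities
)
discriminator = nn.Sequential(
    nn.Linear(N_PITCHES * N_STEPS, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),  # real/fake logit
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_rolls):
    """One adversarial update on a batch of real piano rolls, shape (batch, 128*16)."""
    batch = real_rolls.size(0)
    noise = torch.randn(batch, NOISE_DIM)
    fake_rolls = generator(noise)

    # Discriminator: push real rolls towards 1, generated rolls towards 0.
    opt_d.zero_grad()
    d_loss = bce(discriminator(real_rolls), torch.ones(batch, 1)) + \
             bce(discriminator(fake_rolls.detach()), torch.zeros(batch, 1))
    d_loss.backward()
    opt_d.step()

    # Generator: try to fool the discriminator into predicting "real".
    opt_g.zero_grad()
    g_loss = bce(discriminator(fake_rolls), torch.ones(batch, 1))
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```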

4.0 - More data, more data!

Good gracious, I just found some more data: the Nottingham Database, a collection of ABC-formatted music files. This format can be converted into MIDI and vice versa. Of course I'm facing a problem: the specific data I want, MIDI spanning a lot of different genres, is not widely available. Therefore I have a couple of options:

- Train on actual MP3/OGG/WAV/FLAC music files, which is going to take forever, although the FMA data set offers 30-second samples of its whole collection of songs.
- NSynth, a collection of single instruments, which is mostly suitable for synthesizing instruments and not especially for generating songs/music.
- The Nottingham Database, an ABC-formatted database.

The most suitable solution comes in the ABC-formatted database; there is more to find and I'm currently tracking down more data. However, there are some caveats along the way. I have found papers using all of these data sets, therefore it is very likely that th…
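As a quick sanity check that ABC really does convert to MIDI, a minimal sketch using the music21 library (one candidate tool among several; the file names are placeholders):

```python
# Hedged sketch: converting an ABC tune to a MIDI file with music21.
# 'nottingham_tune.abc' is a placeholder file name, not an actual file from the data set.
from music21 import converter

score = converter.parse('nottingham_tune.abc')   # parse the ABC notation into a Stream
score.write('midi', fp='nottingham_tune.mid')    # write the same tune out as MIDI
```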

3.1 - Theoretical background

I'm currently working hard on finishing my theoretical part while building my model, which is why I haven't been updating the blog for a while. The deadline is not too far away, and it is key that I fully focus in these last days, given all the work that goes into reading up on the state of the art while already implementing models that might turn out to be a bad bet in the end.

My next update will hold the main part of my theoretical background, and I'll start giving details of my implementation as well. As this is currently a private GitHub repo, I haven't decided yet on open-sourcing the software that comes out of the project; that depends fully on the final product.

[Figure: Lens model as adapted from (Aljanaki, 2017)]

During my digging sessions I found a composer/performer and listener model that caught my interest. Leaving the inner math aside, it is quite a nice model of how certain aspects of music convey emotion to a listener. One could thin…
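Just to illustrate the intuition (my own toy simplification, not the formulation from Aljanaki, 2017): the performer encodes an intended emotion into acoustic cues, the listener decodes those cues back into a perceived emotion, and the lens idea is to ask how well the two ends line up.

```python
# Toy illustration of the lens idea (my own simplification, not the model from the paper):
# the performer turns an intended emotion value into cues, the listener maps cues back.
import numpy as np

rng = np.random.default_rng(0)

intended = rng.uniform(-1, 1, size=200)        # e.g. the valence the performer wants to convey
encode_w = np.array([0.8, -0.5, 0.3])          # performer-side cue weights (say tempo, mode, dynamics)
cues = np.outer(intended, encode_w) + 0.1 * rng.normal(size=(200, 3))

decode_w, *_ = np.linalg.lstsq(cues, intended, rcond=None)  # listener-side linear decoding
perceived = cues @ decode_w

# How well perception matches intention, i.e. how much emotion actually gets through.
print(np.corrcoef(intended, perceived)[0, 1])
```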

3.0 - First version introduction

Click on the title to see a preview PDF of the introduction.

2.1 - Integrating knowledge into a model

I have looked into several directions; the CNN & GAN combination is still the most interesting to me. Referring to the MidiNet paper, I want to see whether I can impose certain restrictions on the generation model to improve how much the output sounds like music humans would actually want to hear. The so-called Rencon experience is an event where researchers gather to evaluate their results on music generation algorithms. This paper describes the different Rencon events, where mainly classical music is generated and evaluated. Their findings do not provide the golden standard for evaluating generative models that I hoped for, so I'll have to look in another direction for that. Searching further, I found another paper describing a Turing test for generative music algorithms, also called Rencon. It might prove useful, but for now I'll let it rest. Another approach was the human hearing model; I haven't really found any mathematics that describ…
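If I do end up running such a Turing-test style evaluation, the bookkeeping itself is simple. A rough sketch with a hypothetical response format (not taken from the paper): count how often listeners label a machine-generated piece as human-made.

```python
# Sketch of a Turing-test style tally for a listening experiment.
# Hypothetical response format: one (piece_source, listener_guess) pair per answer.
responses = [
    ("machine", "human"), ("machine", "machine"),
    ("human", "human"), ("machine", "human"),
]

machine_answers = [guess for source, guess in responses if source == "machine"]
fooled = sum(guess == "human" for guess in machine_answers)
print(f"{fooled}/{len(machine_answers)} machine-generated pieces judged human")
```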

2.0 - Literature draft

I've been reading about a lot of different things and it is time to focus. Right now there are multiple ways I could solve the music generation problem: Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and a combined approach in which a CNN is paired with a Generative Adversarial Network (GAN). The latter amounts to a state-of-the-art approach. RNNs are easier to train and capable of generating good MIDI files (as judged by listeners) while maintaining structural cohesion in an Encoder-Decoder model (EDM). This method passes the hidden node states of the Encoder to the Decoder to obtain a level of music generation that is high-level enough to be interesting for my project. However, since it has been shown that RNNs can handle 2D data as well, the only reason to choose the CNN & GAN combination is that I have found it in only one paper so far. While RNNs have been w…
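To make the Encoder-Decoder idea concrete, a minimal sketch of such a model, assuming PyTorch and a token-per-note-event representation (neither of which is final):

```python
# Minimal Encoder-Decoder RNN sketch (assumed PyTorch, one token per note event).
# The encoder's final hidden state is handed to the decoder, as described above.
import torch
import torch.nn as nn

VOCAB, EMB, HID = 130, 64, 256   # e.g. 128 MIDI pitches + rest + end-of-sequence tokens

class Seq2SeqMusic(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src_tokens, tgt_tokens):
        # Encode the conditioning phrase; keep only the final hidden state.
        _, hidden = self.encoder(self.embed(src_tokens))
        # Decode the continuation, initialised with the encoder's hidden state.
        dec_out, _ = self.decoder(self.embed(tgt_tokens), hidden)
        return self.out(dec_out)   # logits over the next token at every step

model = Seq2SeqMusic()
src = torch.randint(0, VOCAB, (8, 32))   # batch of 8 phrases, 32 events each
tgt = torch.randint(0, VOCAB, (8, 32))
logits = model(src, tgt)                 # shape (8, 32, VOCAB)
```

The point of the sketch is only that the encoder's final hidden state is the single thing handed over to the decoder, which then generates the continuation step by step.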