Posts

3.1 - Theoretical background

I'm currently working hard on finishing my theoretical part while building my model, so I haven't updated the blog for a while. The deadline is not far away, and it is key that I focus fully in these last days. A lot of work goes into reading up on the state of the art while already implementing models that might turn out to be a bad bet in the end. My next update will hold the main part of my theoretical background, and I'll start giving details of my implementation as well. The code currently lives in a private GitHub repo; I haven't decided yet on open-sourcing the software that comes out of the project, as that depends fully on the final product.
Lens model as adapted from (Aljanaki, 2017).
During my digging sessions I found a composer/performer and listener model that caught my interest. Even without the inner math, it is quite a nice model of how emotion is conveyed to a listener through certain aspects of music. One could thin...

3.0 - First version introduction

Click on the title to see a preview pdf of the introduction

2.1 - Integrating knowledge into a model

I have looked into several directions, and the CNN & GAN combination is still the most interesting to me. Referring to the MidiNet paper, I want to see whether I can find restrictions on the generation model that bring its output closer to what humans think of as music they would want to hear. The so-called Rencon experience is an event where researchers gather to evaluate the results of their music generation algorithms. This paper describes the different Rencon events, where mainly classical music is generated and evaluated. Its findings do not provide the gold standard for evaluating generative models that I had hoped for, so I'll have to look in another direction for a solution. After searching the internet I found another paper describing a Turing test for generative music algorithms, also called Rencon. It might prove useful, but for now I'll let it rest. Another approach was the human hearing model. I haven't really found any mathematics that describ...

2.0 - Literature draft

I've been reading about a lot of different things and it is time to focus. Right now there are multiple ways I can solve the music generator problem: Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and a combined approach in which I pair a CNN with a Generative Adversarial Network (GAN). The combined approach is state of the art. RNNs are easier to train and capable of generating good MIDI files (when listeners confirm it) while maintaining structural cohesion in an Encoder Decoder model (EDM). This method passes the hidden node states of the Encoder model to the Decoder model, which gives music generation at a level high enough to be interesting for my project. However, since it has been shown that RNNs can use 2D data as well, the only reason to choose a CNN & GAN combination is that it has been done in only one paper I have found so far. While RNNs have been w...

1.1 - Oversight of current sources

This first part of my sources is here to remind me of what I'm doing and trying to do in comparison to other projects. It contains some interesting works on which I will ground the further development of my own neuralmusic. Or actually just computer-generated music.
[1] Li-Chia Yang, Szu-Yu Chou, and Yi-Hsuan Yang. MidiNet: A convolutional generative adversarial network for symbolic-domain music generation using 1D and 2D conditions. arXiv preprint arXiv:1703.10847, 2017.
[2] Gabriel Sargent, Frédéric Bimbot, and Emmanuel Vincent. Estimating the structural segmentation of popular music pieces under regularity constraints. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(2):344-358, 2017.
[3] Pieter Adriaans, Henning Fernau, and Menno van Zaanen. Grammatical Inference: Algorithms and Applications: 6th International Colloquium: ICGI 2002, Amsterdam, The Netherlands, September 23-25, ...

1.0 - Further digging

I dug further into the MidiNet paper and found an interesting image depicting how music is generated by a reversed convolutional neural network with noise as input.
Illustration of the generator model's CNN and the use of 1D and 2D conditions of MidiNet.
Besides that, I dug into getting CNNs to work inside TensorFlow (TF) and found a wonderful tutorial by Martin Gorner on how to get CNNs working. The tutorial takes approximately 150 minutes to go through, but with some experience I expect to be done a bit faster. I will come back to my findings on building a neural network in TF. Python also offers MIDI interpreters, though not in its standard library. The one I found can do some interesting things: writing, reading and modifying MIDI files. It is only available for Python 2.x, but another one I found is available for Python 3.x. I will report back on all my findings once I'm closer to finding all the tools and prerequisites I need.
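To get a feel for what a "reversed" convolution does with noise, here is a minimal sketch in plain Python. It assumes that "reversed convolution" means a transposed convolution, works in 1D only, and uses made-up kernel weights instead of learned ones, so it is an illustration of the upsampling idea and not the MidiNet generator. The names `transposed_conv1d` and `upsample_noise` are my own.

```python
import random


def transposed_conv1d(signal, kernel, stride=2):
    """Spread every input value over the output through the kernel.

    Output length is (len(signal) - 1) * stride + len(kernel), so the
    signal grows instead of shrinking as in a normal convolution.
    """
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, value in enumerate(signal):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out


def upsample_noise(n_noise, seed=0):
    """Grow a short noise vector into a longer sequence in two layers."""
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n_noise)]
    kernel = [0.5, 1.0, 0.5]  # hypothetical weights; a real model learns these
    hidden = transposed_conv1d(noise, kernel)
    return transposed_conv1d(hidden, kernel)


print(len(upsample_noise(4)))  # 4 values -> 9 values -> 19 values
```

Each layer roughly doubles the length, which is how a small random vector can be stretched into something the size of a bar of music.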

0.1 - Literature

My first thought about the whole CNN structure is that it doesn't necessarily represent how the ears take in information; CNNs are very good at mimicking a visual retina. Therefore, after looking in different directions for optimisations that might help CNNs make sense of music, I found a recent paper about MidiNet. It states that, to guide the process of music generation, a GAN-coupled CNN needs a temporal constraint under which it generates the following note based on the note(s) that came before, so that the output has more structure. I am still looking into how to replicate the inner workings of the human ear, so that the model better suits the interpretative part of the music generation process.
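The idea of generating each note based on what came before can be sketched without any neural network at all. The toy below assumes a bar is a 128 x 16 grid (128 MIDI pitches, 16 time steps, one note per step), and `toy_generator` is a hypothetical stand-in for a trained generator: it only adds small random intervals to the last note it has seen. It shows the conditioning loop, nothing more.

```python
import random

N_PITCHES = 128  # MIDI pitch numbers 0..127
N_STEPS = 16     # time steps per bar (an assumption for this sketch)


def bar_to_pitches(bar):
    """Return the active pitch at each time step (None if the step is silent)."""
    pitches = []
    for step in range(N_STEPS):
        active = [p for p in range(N_PITCHES) if bar[p][step]]
        pitches.append(active[0] if active else None)
    return pitches


def toy_generator(noise, previous_bar):
    """Hypothetical stand-in for a trained generator.

    Every new note is the previous note plus an interval from the noise
    vector, so the output depends on both the noise and the condition.
    """
    previous = [p for p in bar_to_pitches(previous_bar) if p is not None]
    current = previous[-1] if previous else 60  # middle C if nothing came before
    bar = [[0] * N_STEPS for _ in range(N_PITCHES)]
    for step in range(N_STEPS):
        current = min(N_PITCHES - 1, max(0, current + noise[step]))
        bar[current][step] = 1
    return bar


def generate_melody(n_bars, seed=0):
    """Generate bars one at a time, feeding each bar in as the next condition."""
    rng = random.Random(seed)
    bars = [[[0] * N_STEPS for _ in range(N_PITCHES)]]  # empty priming bar
    for _ in range(n_bars):
        noise = [rng.randint(-2, 2) for _ in range(N_STEPS)]
        bars.append(toy_generator(noise, bars[-1]))
    return bars[1:]


melody = generate_melody(3)
print([bar_to_pitches(bar) for bar in melody])
```

Because every bar starts from the last note of the previous one, the melody never jumps more than two semitones across a bar line, which is a very crude version of the "more structure" the constraint is meant to give.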