2.0 - Literature draft

I've been reading about a lot of different things and it is time to focus. Right now there are several ways I could tackle the music generation problem: Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), or a combined approach in which a CNN is paired with a Generative Adversarial Network (GAN), which would amount to a state-of-the-art approach. RNNs are easier to train and, according to listener evaluations, capable of generating good MIDI files while maintaining structural cohesion when used in an Encoder-Decoder model (EDM). In this setup the hidden states of the encoder are handed over to the decoder, which yields music generation that is high-level enough to be interesting for my project. Since it has been shown that RNNs can also handle 2D data, the only remaining argument for the CNN & GAN combination is that it has so far been done in only one paper I found, whereas RNNs have been widely adopted across many Machine Learning (ML) projects where music was generated. I added some more literature to the growing list below. For now I will include every paper I found, but this week I will cut away the unnecessary papers and add other useful ones.
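
To make the encoder-to-decoder hand-over concrete, here is a minimal sketch in PyTorch (my own choice of framework here, not prescribed by any of the papers below) of an encoder-decoder RNN where the encoder's final hidden state seeds the decoder. The note-as-integer-token representation and all names (NOTE_VOCAB, EncoderDecoder, etc.) are purely illustrative assumptions, not taken from any specific paper.

```python
import torch
import torch.nn as nn

NOTE_VOCAB = 128   # MIDI pitch range, used here as a toy token vocabulary (assumption)
HIDDEN = 256

class EncoderDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NOTE_VOCAB, HIDDEN)
        self.encoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.decoder = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, NOTE_VOCAB)

    def forward(self, src_notes, tgt_notes):
        # Encode the source phrase; keep only the final hidden state.
        _, hidden = self.encoder(self.embed(src_notes))
        # The decoder is initialised with the encoder's hidden state: this is
        # the "hand over the hidden states" step described in the text.
        dec_out, _ = self.decoder(self.embed(tgt_notes), hidden)
        return self.out(dec_out)  # logits over the next note at each step

# Toy usage: one batch with an 8-note source phrase and an 8-note target phrase.
model = EncoderDecoder()
src = torch.randint(0, NOTE_VOCAB, (1, 8))
tgt = torch.randint(0, NOTE_VOCAB, (1, 8))
logits = model(src, tgt)  # shape: (1, 8, 128)
```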


[1] Li-Chia Yang, Szu-Yu Chou, and Yi-Hsuan Yang. MidiNet: A convolutional generative adversarial network for symbolic-domain music generation using 1D and 2D conditions. arXiv preprint arXiv:1703.10847, 2017.
[2] Gabriel Sargent, Frédéric Bimbot, and Emmanuel Vincent. Estimating the structural segmentation of popular music pieces under regularity constraints. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(2):344-358, 2017.
[3] Pieter Adriaans, Henning Fernau, and Menno van Zaanen. Grammatical Inference: Algorithms and Applications: 6th International Colloquium, ICGI 2002, Amsterdam, The Netherlands, September 23-25, 2002, Proceedings, volume 2484. Springer Science & Business Media, 2002.
[4] Ian Goodfellow. NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160, 2016.
[5] James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, page 201611835, 2017.
[6] Nima Mesgarani, Connie Cheung, Keith Johnson, and Edward F. Chang. Phonetic feature encoding in human superior temporal gyrus. Science, 343(6174):1006-1010, 2014.
[7] Dana L. Strait and Nina Kraus. Biological impact of auditory expertise across the life span: Musicians as a model of auditory learning. Hearing Research, 308:109-121, 2014.
[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
[9] Nipun Agarwala, Yuki Inoue, and Axel Sly. Music composition using recurrent neural networks.
[10] Matthias Weing, Amrei Röhlig, Katja Rogers, Jan Gugenheimer, Florian Schaub, Bastian Könings, Enrico Rukzio, and Michael Weber. P.I.A.N.O.: Enhancing instrument learning via interactive projected augmentation. In Proceedings of the 2013 ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication, UbiComp '13 Adjunct, pages 75-78, New York, NY, USA, 2013. ACM.
[11] Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, and Mohammad Norouzi. Neural audio synthesis of musical notes with WaveNet autoencoders, 2017.
[12] Sergio Canazza, Giovanni De Poli, and Antonio Roda. How do people assess computer generated expressive music performances? Director, 5:0, 2013.
[13] Ender Özcan and Türker Erçal. A genetic algorithm for generating improvised music. In International Conference on Artificial Evolution (Evolution Artificielle), pages 266-277. Springer, 2007.
[14] Arne Eigenfeldt, Adam Burnett, and Philippe Pasquier. Evaluating musical metacreation in a live performance context. In Proceedings of the Third International Conference on Computational Creativity, pages 140-144. Citeseer, 2012.
[15] Haruhiro Katayose, Mitsuyo Hashida, Giovanni De Poli, and Keiji Hirata. On evaluating systems for generating expressive music performance: The Rencon experience. Journal of New Music Research, 41(4):299-310, 2012.
[16] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 6645-6649. IEEE, 2013.
