This post is a replica of my OpenAI Scholar final project proposal, also available here.
UPDATE 8/31/18: “deephypebot: an overview” is a revamped, more comprehensive version of this post. Check it out!
tl;dr: auto-generating conditioned music commentary on Twitter.
The theme of my summer as an OpenAI Scholar has been explorations around music + text. I find the language around music - manifested by hundreds of “nice, small blogs” on the internet - to be a deep and unique well of creative writing.
As such, my final project will pay homage to these labors of love on the web and attempt to auto-generate consistently good and entertaining new writing about songs, based on a set of characteristics about the song and knowledge of past human music writing.
The project will culminate in a Twitter bot (@deephypebot) that will monitor other music feeds for songs and automatically generate thoughts/opinions/writing about the songs.
My training data consists of ~20,000 blog posts with writing about individual songs. The count started at about 80K post links from 5 years of popular songs on the music blog aggregator Hype Machine - then I filtered for English, non-aggregated (i.e., excluding “round up”-style posts about multiple songs) posts about songs that can be found on Spotify. There was some additional attrition due to many post links no longer existing. I did some additional manual cleanup of symbols, markdown, and writing that I deemed non-commentary.
From there, I split the commentary into sentences, which are a good length for a variational autoencoder (VAE) model to encode.
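As a rough illustration of the sentence-splitting step, here is a minimal sketch using only a regular expression. The original post does not say which tool did the splitting, so `split_sentences` and its boundary rule are assumptions made for illustration; a simple rule like this will mis-split abbreviations such as "feat." and would need refinement on real blog text.

```python
import re

# Hypothetical boundary rule (an assumption, not the project's actual splitter):
# split on whitespace that follows ., ! or ? and precedes a capital letter,
# a digit, or an opening double quote.
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9"])')


def split_sentences(commentary):
    """Return the non-empty sentences found in one post's commentary."""
    parts = _SENTENCE_BOUNDARY.split(commentary.strip())
    return [part.strip() for part in parts if part.strip()]


post = "This track is a slow burn. The synths shimmer! Who could resist?"
sentences = split_sentences(post)
for sentence in sentences:
    print(sentence)
```

Each resulting sentence would then be one training example for the VAE.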
A language model (LM) is an approach to generating text by estimating the probability distribution over sequences of linguistic units (characters, words, sentences). This project centers around a sequence-to-sequence conditional variational autoencoder (seq2seq CVAE) model that generates text conditioned on a thought vector z + attributes of the referenced music v (simply concatenated together as cat(z, v)). The conditional fed into the CVAE is provided by an additional latent constraints generative adversarial network (LC-GAN) model that helps control aspects of the text generated.
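The concatenation cat(z, v) mentioned above can be sketched in a few lines. The dimensions and attribute values below are hypothetical placeholders; the post does not specify the size of z or which music attributes make up v.

```python
import random


def cat(z, v):
    """Concatenate thought vector z and music-attribute vector v."""
    return list(z) + list(v)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

latent_dim = 4  # hypothetical size; the real latent dimension is not stated
# Draw z from a standard normal prior, as is typical for a VAE.
z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]

# Hypothetical attribute values, e.g. scaled audio features for one song.
v = [0.8, 0.3, 0.6]

cond = cat(z, v)
print(len(cond))  # length is latent_dim + len(v)
```

The decoder would receive `cond` as its conditioning input, so changing only `v` asks for text about a different song while keeping the same "thought".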
The CVAE consists of an LSTM-based encoder and decoder, and once trained, the decoder can be used independently as a language model conditioned on latent space cat(z, v) (more on seq2seq VAEs here). The conditional input is fed into the decoder only.
The LC-GAN is used to determine which conditional inputs cat(z, v) to this LM tend to generate samples with particular attributes (more on the LC-GAN here). This project uses LDA topic modeling as its automatic reward function for encouraging samples of a descriptive, almost flowery style (more on LDA topic modeling here). The generator is trained to fool the discriminator with “fake” (i.e., not from training data) samples, ostensibly from the desired topic set. Once trained, the generator can be used independently to provide conditional inputs to the CVAE for inference.
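To make the idea of a topic-based reward concrete, here is one way it might be scored. The post does not say exactly how the LDA output becomes a reward, so this is an assumption: `topic_probs` is taken to be the per-sample topic distribution from an already-fitted LDA model, and `desired_topics` and `threshold` are hypothetical choices.

```python
def topic_reward(topic_probs, desired_topics):
    """Probability mass that a sample places on the desired topics."""
    return sum(topic_probs[t] for t in desired_topics)


def has_desired_style(topic_probs, desired_topics, threshold=0.5):
    """Binary label a discriminator could be trained against (assumed rule)."""
    return topic_reward(topic_probs, desired_topics) >= threshold


# Hypothetical 4-topic distribution for one generated sentence.
topic_probs = [0.10, 0.60, 0.05, 0.25]
desired_topics = {1, 3}  # assumed indices of the "descriptive" topics

reward = topic_reward(topic_probs, desired_topics)
label = has_desired_style(topic_probs, desired_topics)
print(reward, label)
```

Labels like these would mark which conditional inputs produced text in the desired style, which is the signal the LC-GAN needs.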
Once the neural network is trained and deployed, this project will use it to generate new writing conditioned on either audio features or genre information pulled from the Spotify API (depending on which conditioning seems to work better).
This will require detecting the song and artist discussed in tweets that show up on @deephypebot’s timeline and then sending this information to Spotify. Then Spotify’s response will be sent to the neural network.
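The detection step could look something like the sketch below. It assumes tweets follow an "Artist - Title" shape, optionally followed by a link, which is a guess about the monitored feeds rather than something the post specifies. `parse_tweet` and `build_search_url` are hypothetical helpers; the URL targets the Spotify Web API search endpoint, and no request is actually sent here (a real call would also need an OAuth access token).

```python
import re
from urllib.parse import urlencode

SPOTIFY_SEARCH_URL = "https://api.spotify.com/v1/search"

# Assumed tweet shape: "Artist - Title", optionally followed by a link.
_TWEET_PATTERN = re.compile(
    r"^\s*(?P<artist>.+?)\s+-\s+(?P<title>.+?)(?:\s+https?://\S+.*)?\s*$"
)


def parse_tweet(text):
    """Return (artist, title) if the tweet matches the assumed shape, else None."""
    match = _TWEET_PATTERN.match(text)
    if match is None:
        return None
    return match.group("artist"), match.group("title")


def build_search_url(artist, title):
    """Build a track-search URL using Spotify's track: and artist: field filters."""
    query = f"track:{title} artist:{artist}"
    params = {"q": query, "type": "track", "limit": 1}
    return SPOTIFY_SEARCH_URL + "?" + urlencode(params)


parsed = parse_tweet("Robyn - Honey https://example.com/abc")
if parsed is not None:
    url = build_search_url(*parsed)
    print(url)
```

The track ID in the search response could then be used to request audio features or the artist's genres, and that result becomes the attribute vector v passed to the network.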
Text generation is a notoriously messy affair where “you will not get quality generated text 100% of the time, even with a heavily-trained neural network.” While much effort will be put into having as automated and clean a pipeline as possible, some human supervision is prudent.
Once generations for a new proposed tweet are available, an email will be sent to the human curator (me), who will select and lightly edit for grammar and such before releasing to @deephypebot for tweeting.
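The curation email could be assembled with Python's standard `email` package, roughly as follows. The subject format, the addresses, and the `build_curation_email` helper are all hypothetical, and this sketch only builds the message; sending it (for example via `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_curation_email(song, candidates, sender, recipient):
    """Build (but do not send) an email listing candidate tweets for review."""
    msg = EmailMessage()
    msg["Subject"] = f"[deephypebot] {len(candidates)} candidates for {song}"
    msg["From"] = sender
    msg["To"] = recipient
    # Number the candidates so the curator can reply with a choice.
    body = "\n".join(
        f"{i}. {text}" for i, text in enumerate(candidates, start=1)
    )
    msg.set_content(body)
    return msg


# Placeholder data for illustration only.
candidates = [
    "A shimmering slow burn of a track.",
    "Synths that bloom and then dissolve.",
]
msg = build_curation_email(
    "Robyn - Honey", candidates, "bot@example.com", "curator@example.com"
)
print(msg["Subject"])
```

Keeping a human in the loop this way means nothing reaches @deephypebot without being chosen and lightly edited first.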
Reading…
Software…
August 3: Project dev spec + preliminary tasks; LC-GAN experiments towards better/controllable samples
August 10: Twitter bot + production pipeline ready
August 17: More sophisticated modeling
August 24: End-to-end integrations
August 31: Project due
Mentor: Natasha Jaques