YU News

Student’s Musical DNA Project Cracks the Code Behind Music We Can’t Stop Playing

Benji Morris' project examined everything from chord progressions and background vocals to syllable complexity and repeated lyrical phrases. He even added contextual features, such as whether a song was performed on tour or played recently as a surprise song during a concert.

By Dave DeFusco

For most listeners, music streaming feels simple: press play, hear a song and let an app recommend what comes next. Behind every recommendation, however, is a mountain of data trying to predict what people want to hear. Benji Morris, a student in the Katz School’s M.S. in Data Analytics and Visualization, believes today’s music platforms are only scratching the surface.

His project, “Musical DNA,” aims to build a much deeper understanding of music by analyzing songs through hundreds of characteristics instead of the limited set commonly used by streaming services like Spotify.

“Spotify publishes a dataset on audio descriptors like tempo, valence and energy,” said Morris, who recently presented his work at a forum hosted by the Department of Graduate Computer Science and Engineering. “I looked at it and thought they were interesting but not specific. What do you mean when you’re looking at energy? I really wanted to break it down and get a much broader description of audio features.”

Spotify currently describes songs using a small collection of traits such as danceability, loudness and tempo. Morris found those measurements useful, but too shallow to explain why some songs become massive hits while others do not.

“The focus of the project was not necessarily a recommender system,” said Morris. “It was a prediction model. Could you predict how well a song is going to stream on Spotify?”

To answer that question, Morris built a large data pipeline that treated songs almost like scientific samples. Instead of relying on about 15 or 16 audio measurements, his system analyzed hundreds of features connected to lyrics, harmony, production style, instrumentation, structure and cultural context.

The project examined everything from chord progressions and background vocals to syllable complexity and repeated lyrical phrases. Morris even added contextual features, such as whether a song was performed on tour or played recently as a surprise song during a concert.

“There were features that I put in there that I never would have thought would meaningfully change the way the model predicted,” said Morris. “One of the last features I added was track number, whether a song was track one, track five or whatever. That made the prediction meaningfully closer.”

He also discovered that tour-related information unexpectedly mattered. “I added a tracker for whether a song had been played in the last 15 days as a surprise song,” said Morris. “I remember it improved the prediction somewhat, and I was surprised.”
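A contextual feature like this can be represented as a simple binary flag. The sketch below is only an illustration under stated assumptions, not Morris's actual code: the function name, the input format (a list of performance dates) and the sample dates are all hypothetical, and only the 15-day window comes from his description.

```python
from datetime import date, timedelta

def played_recently_as_surprise(surprise_dates, as_of, window_days=15):
    """Return 1 if the song was a surprise song within the window ending at as_of, else 0.

    surprise_dates: dates on which the song was performed as a surprise song
    (hypothetical input format, assumed for this sketch).
    """
    window_start = as_of - timedelta(days=window_days)
    return int(any(window_start <= d <= as_of for d in surprise_dates))

# Hypothetical performance dates, for illustration only.
performances = [date(2024, 5, 1), date(2024, 6, 10)]

flag_recent = played_recently_as_surprise(performances, as_of=date(2024, 6, 20))
flag_stale = played_recently_as_surprise(performances, as_of=date(2024, 7, 15))
flag_never = played_recently_as_surprise([], as_of=date(2024, 6, 20))
```

A flag like this would sit alongside the other columns for a song, so a prediction model can be trained with and without it to see whether the prediction moves.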

To test the system, Morris used Taylor Swift as a case study because her catalog offered a rare combination of scale, diversity and streaming data. Her catalog of more than 300 songs spans country, pop and folk music, and her re-recorded albums created a unique opportunity to study how fans respond to older music released again as something new.

“Some of these re-releases were outperforming like a new album would,” said Morris. “It says that when you’re an artist that big, with a career that long, fans will stream a song they’ve heard for 12 years as if it’s brand-new music.”

One of the project’s biggest findings was that lyrics mattered more than Morris expected, especially for Swift’s audience. “She’s known as being a lyricist before anything else,” he said, “and those were the types of features I commonly would see in the top five.”

Morris describes the project as treating songs “like a genome instead of a flat row of numbers.” In simple terms, that means recognizing music as the result of hundreds of creative decisions shaped by time, culture and audience behavior.

“A hit from the 1960s wouldn’t necessarily resonate the same way today,” said Morris. “Contextually, in the current cultural climate, that becomes such a key point.”

Beyond predicting streams, Morris believes systems like Musical DNA could eventually help artists make creative decisions. His model grouped songs into “archetypes,” such as “The Opener,” “Emotional Core” and “The Pop Hit,” based on shared musical traits.

“That’s where I felt like the project becomes much more meaningful,” said Morris. “Once you can predict how well a song is going to stream, you can start making creative decisions around that.”

In the future, Morris envisions streaming platforms evolving from passive music libraries into active creative advisors capable of helping artists decide which songs should become singles, where collaborations fit best or how albums should be sequenced. He also believes deeper musical analysis could transform how listeners understand their own tastes.

“Spotify Wrapped right now tells you your top songs and artists,” said Morris, “but something like this could tell you that you actually prefer guitar as a primary instrument or songs within a certain tempo range. Once you add more descriptive features, you can give people a much deeper understanding of what they really respond to in music.”
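As a rough illustration of that idea, a listener summary could be derived from descriptive features attached to each play. The sketch below is hypothetical throughout: the field names `primary_instrument` and `tempo_bpm`, the sample listening history and the choice of the middle half of tempos as the "preferred range" are assumptions made for this example, not part of Morris's project.

```python
from collections import Counter
from statistics import quantiles

def taste_profile(plays):
    """Summarize a listening history as a top instrument and a typical tempo range.

    plays: list of dicts with hypothetical keys 'primary_instrument' and 'tempo_bpm'.
    Needs at least two plays so that quartiles can be computed.
    """
    instrument_counts = Counter(p["primary_instrument"] for p in plays)
    top_instrument, _ = instrument_counts.most_common(1)[0]

    # Use the middle half of tempos (first to third quartile) as the preferred range.
    q1, _, q3 = quantiles([p["tempo_bpm"] for p in plays], n=4)
    return {"top_instrument": top_instrument, "tempo_range_bpm": (q1, q3)}

# Hypothetical listening history, for illustration only.
history = [
    {"primary_instrument": "guitar", "tempo_bpm": 96},
    {"primary_instrument": "guitar", "tempo_bpm": 104},
    {"primary_instrument": "piano", "tempo_bpm": 72},
    {"primary_instrument": "guitar", "tempo_bpm": 120},
    {"primary_instrument": "synth", "tempo_bpm": 128},
]

profile = taste_profile(history)
```

With richer per-song features, the same kind of summary could extend to lyrics, harmony or production style.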