Music’s complicated tryst with generative AI tools is only just beginning

‘A trance track, ideal for a Scandinavian night, looking at the northern lights’. That’s the prompt we punched into Stability AI’s new Stable Audio 2.0 generative artificial intelligence (AI) tool. The resulting three-minute track sounded so on point that you likely wouldn’t have pegged it as an AI generation had it shown up as a recommendation from your regularly followed musicians on Apple Music or Spotify.

Not only does Stable Audio 2.0 work with a single text input (the more detailed you are with it, the more detailed the generated audio), but you can optionally point it to an audio track of your own for inspiration, along with a bunch of add-ons to dictate the tempo and structure of the track being generated. “The architecture of the Stable Audio 2.0 latent diffusion model is specifically designed to enable the generation of full tracks with coherent structures,” the company says in a statement.

Though we are watching a fast-moving text-to-music generation ecosystem, the first true AI song generation happened at the Sony Computer Science Laboratory in 2016. The song, ‘Daddy’s Car’, sounded reminiscent of something The Beatles may have created. Except it wasn’t. It was the wizardry of an AI system called Flow Machines, which analysed a database of songs to pick a musical style.

Tech companies, now more than ever, are building tools that are quick to learn too. You can experiment with a few taps on the keyboard, without any prior knowledge of music creation or editing. Often, these generative AI tools are free to use, which helps adoption.

Tech giant Adobe’s latest generative AI experiment, Project Music GenAI Control, is being built at pace for eventual addition to the company’s Premiere Pro and Audition tools later this year. Academia, including the University of California and Carnegie Mellon University, is helping Adobe develop this generative AI tool. It is currently at the prototype stage, and functionality will include allowing users to generate music using text prompts, adjust tempo, structure and audio intensity, or remix a section.

“One of the exciting things about these new tools is that they aren’t just generating audio—they’re taking it to the level of Photoshop by giving creatives the same kind of deep control to shape, tweak, and edit their audio. It’s a kind of pixel-level control for music,” says Nicholas Bryan, Senior Research Scientist at Adobe Research.

Models and music

Crucial to the quality of the music these tools generate are the music libraries that act as datasets to train the underlying models. Stability AI is relying on the AudioSparx music library, from which around 800,000 audio files were used. “All of AudioSparx’s artists were given the option to ‘opt out’ of the Stable Audio model training,” the company confirms. This is a sensible move by Stability AI, considering how in January, OpenAI and Microsoft were sued by authors Nicholas Basbanes and Nicholas Gage for using their work to train artificial intelligence (AI) models without their permission.

A new autoencoder and a diffusion transformer, similar to the text-to-image generator Stable Diffusion 3, work together for what Stability AI calls “large-scale structures”, which in other words means very detailed musical compositions. Even though the free version is capable, there is also a Pro plan ($11.99 per month, or around ₹1,000) that gives you 500 monthly track generations; the free plan is limited to 20 generations a month.

Meta’s open-source AudioCraft generative AI relies on two distinct datasets. “MusicGen, which was trained with Meta-owned and specifically licensed music, generates music from text-based user inputs, while AudioGen, which was trained on public sound effects, generates audio from text-based user inputs,” says Meta. More than 20,000 hours of music were used to train MusicGen, which the company specifies is a combination of an internal dataset (around 10,000 high-quality music tracks), Shutterstock (25,000 tracks) and the Pond5 music data collection (about 365,000 tracks).
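Because AudioCraft is open source, the MusicGen side of it can be tried locally with a few lines of Python. What follows is a minimal sketch, assuming the audiocraft package is installed and the ‘facebook/musicgen-small’ checkpoint is used; the prompt text and output file names are purely illustrative.

# A minimal MusicGen sketch using Meta's open-source audiocraft package.
# Assumes: pip install audiocraft, and enough memory to run the small model.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('facebook/musicgen-small')  # pretrained checkpoint
model.set_generation_params(duration=15)                    # seconds of audio per prompt

# Text prompts drive the generation, much like Stable Audio's text input
prompts = ['a trance track for a Scandinavian night under the northern lights']
wav = model.generate(prompts)                                # batch of generated waveforms

# Write each generated clip to disk with loudness normalisation
for idx, clip in enumerate(wav):
    audio_write(f'generated_{idx}', clip.cpu(), model.sample_rate, strategy="loudness")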

In November, The Beatles released what is presumed to be their last “new” song. Titled ‘Now and Then’, it exists because AI came to the rescue when Paul McCartney and Ringo Starr re-attempted to put together a track from an old lo-fi demo recording John Lennon had made on tape many years ago. The band first tried in the 1990s, but the technology didn’t exist to extract Lennon’s vocals and distinguish them from other sounds, such as the piano. The song “just kind of languished”, as McCartney described it, in an official documentary about the track.

AI is also helping music producers with the more ordinary tasks in the workflow, such as equalising vocal pitch. One example is Landr, software that can master a release-ready track, using algorithms to analyse and enhance quality as well as match loudness levels for streaming platforms.
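That loudness matching is conceptually straightforward: measure a track’s integrated loudness (in LUFS) and apply gain towards a platform target, commonly around -14 LUFS for streaming services. Here is a minimal sketch of the idea, not Landr’s actual pipeline, using the open-source pyloudnorm library; the file names are placeholders.

# Illustrative loudness matching, not Landr's proprietary mastering chain.
# Assumes: pip install soundfile pyloudnorm, and a local 'mix.wav' to process.
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("mix.wav")               # load the finished mix
meter = pyln.Meter(rate)                      # ITU-R BS.1770 loudness meter
loudness = meter.integrated_loudness(data)    # measured loudness in LUFS

# Apply gain so the track sits at roughly -14 LUFS, a common streaming target
matched = pyln.normalize.loudness(data, loudness, -14.0)
sf.write("mix_streaming.wav", matched, rate)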

Everyone’s an artist?

“Founders rise, with dreams that pierce the night. Challenging the dark, with visions burning bright,” goes a glimpse of the lyrics of a song and video created by Mini Yohei, a custom GPT tool made by venture capitalist Yohei Nakajima. Based on what this generative artificial intelligence tool had learnt, it generated a 1 minute and 34 second track from just the input “anthem”. Nakajima built Mini Yohei late last year; using it requires OpenAI’s ChatGPT Plus subscription. With the right commands, can everyone become an artist?

As with most things AI, there are two sides to this coin. Some popular musicians are perplexed by the use of cloned voices. At the same time, AI has utility in completing tracks that producers may be scrambling to finish.

“Shit of a song” is how Puerto Rican artist Bad Bunny reacted on his WhatsApp channel to an AI-generated song featuring his recreated voice, alongside Justin Bieber and Daddy Yankee. AI advancements have been so significant that it is now difficult to distinguish between an artist’s real voice and one created by an AI tool.

Generated voices of artists Drake and The Weeknd were the basis of a viral track called ‘Heart On My Sleeve’, with lyrics referencing pop star and actress Selena Gomez, created by a TikTok account, @ghostwriter977. Google-owned video-sharing platform YouTube’s takedown attempts didn’t achieve much success (versions of this track are still easily available), though it seems legal pressure ensured the track was removed from music streaming platforms.

“We’re working closely with our music partners, including Universal Music Group, to develop an AI framework to help us work toward our common goals,” says YouTube in a statement. This comes as the tech giant focuses on three fundamental AI principles: embracing AI responsibly together with music partners, including appropriate protections for them, and scaling content policies to meet AI’s challenges.

Another not-so-real track overlaid Frank Sinatra’s voice on the profanity-laced lyrics of artist Lil Jon. Drake had had enough, with a post on Instagram late last year that simply read “This is the final straw AI”. It came after someone posted an AI-generated voice of Drake rapping a track by another artist, Ice Spice.

These examples are the result of a change in the landscape, which Fabio Morreale, Senior Lecturer and Coordinator of Music Technology and Director of Research at the School of Music of the University of Auckland, illustrates in his research paper ‘Where Does the Buck Stop? Ethical and Political Issues with AI in Music Creation’. He writes, “AI applications for music creation have been available since the last century but, until recently, their adoption has been limited to a small niche of researchers and engineers and their ontology limited to experimentation in computational creativity.”

Late last year, Google DeepMind’s experimental Lyria AI music generation model was unveiled to the world. A specific tool, called Dream Track, allows users to input keywords and generate a 30-second track in the styles of participating artists including Alec Benjamin, Charlie Puth, Charli XCX, Demi Lovato, John Legend and Sia. Google hopes it’ll help generate content for the YouTube Shorts platform.

Tech for all, but how do we write the next chapter?

There is, of course, the element of democratisation of music by technology available to everyone with a smartphone or a computer, and with it, questions about ethics. AI’s dramatic advancement (tools that can automatically organise your music library now seem a minor aspect) has created music generators that non-musicians can use to create songs mimicking other artists, with some input commands and a few clicks.

“Artificial intelligence is both embodied and material, made from natural resources, fuel, human labour, infrastructures, logistics, histories, and classifications. AI systems are not autonomous, rational, or able to discern anything without extensive, computationally intensive training with large datasets or predefined rules and rewards,” writes author Kate Crawford in her thought-provoking book, The Atlas of AI.

This is something Edward Newton-Rex alluded to in a public letter as he resigned from his post as vice-president of audio at Stability AI last November. “Companies worth billions of dollars are, without permission, training generative AI models on creators’ works, which are then being used to create new content that in many cases can compete with the original works,” he wrote in a note that was extensively shared on social media.

For music and the creative arts, the approach to ethics will be determined by how the creators of AI tools understand and practise self-regulation. The likes of Adobe and Google DeepMind are taking steps to watermark AI music creations too, but not all platforms do. “Reclaiming a broad and foundational understanding of ethics in the AI domain, with radical implications for the re-ordering of social power, will be an important task of the arts and humanities,” warns John Tasioulas, Professor of Ethics and Legal Philosophy at the Faculty of Philosophy, University of Oxford, and Director of the Institute for Ethics in AI.

Amid AI and music’s complicated tryst, the need is to write a new chapter. Things must evolve beyond mimicking the voices of singers to generate artificial music.

Perhaps AI company Bronze has the answer. It has worked with artist Jai Paul on a track called ‘Jasmine’, for which the Bronze AI engine “performs a unique and infinite playback of the piece” on each listen. Artist Arca’s piece ‘Riquiqui’ becomes a “dynamic, ever-transforming representation”. Evolving sound, it seems, is the real answer.


