Two undergraduate students have created an advanced open-access AI speech model aimed at competing directly with Google’s NotebookLM. Despite having little prior experience in artificial intelligence, the pair built Dia, a model that lets users generate realistic podcast-style audio clips.
The market for synthetic speech technology is growing rapidly, with established players such as ElevenLabs and a wave of new startups competing for share. Last year alone, voice AI startups attracted approximately $398 million in venture capital, underscoring the level of investor interest in the space.
Co-founder Toby Kim, part of the Korea-based team behind Nari Labs, says he and his partner set out just three months ago to learn speech-generation technology. Inspired by NotebookLM, they wanted to deliver an accessible alternative with more creative control and greater scripting flexibility.
Leveraging Google’s TPU Research Cloud, a program that gives researchers free access to Google’s TPU AI chips, the team trained Dia, a model with roughly 1.6 billion parameters. Parameters are the internal variables a model learns during training; as a rough rule, a higher parameter count indicates a more capable model.
Dia is available on the AI development platform Hugging Face and on GitHub, and it can run on most modern PCs with at least 10GB of VRAM. Users can supply custom scripts, tailor voice tones, and insert non-verbal cues such as laughter, coughing, or conversational pauses. Absent a specific voice description, Dia generates a voice at random, but it can also clone a particular speaker’s voice on request.
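For a sense of what script-driven generation looks like in practice, here is a minimal Python sketch of how a tool like Dia might be invoked. This is a hypothetical example, not the project’s documented interface: the `dia.model` import path, the `Dia.from_pretrained` entry point, the `nari-labs/Dia-1.6B` checkpoint name, the `[S1]`/`[S2]` speaker tags, the parenthesized non-verbal cues, and the output sample rate are all assumptions that should be checked against the README on GitHub.

```python
# Hypothetical usage sketch; verify the import path, class names,
# speaker tags, and cue syntax against Nari Labs' documentation.
import soundfile as sf  # writes the generated waveform to disk

from dia.model import Dia  # assumed import path for the Dia package

# Load the published checkpoint from Hugging Face (identifier assumed).
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# A two-speaker script with a non-verbal cue; the [S1]/[S2] tags and
# "(laughs)" are assumed conventions for marking dialogue and cues.
script = (
    "[S1] Welcome back to the show. "
    "[S2] Glad to be here. (laughs) "
    "[S1] Let's get into it."
)

audio = model.generate(script)              # assumed: returns a waveform array
sf.write("episode_clip.wav", audio, 44100)  # sample rate assumed
```

The appeal of this kind of interface is that the script itself carries the creative control Kim describes: speaker turns, tone, and non-verbal moments are all authored as plain text rather than fixed by the model.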
Brief tests of Dia’s online demo confirmed smooth, high-quality voice rendering and easy voice cloning. However, like other generative voice tools currently available, Dia offers minimal safeguards against misuse: it could plainly be used to produce misinformation or to run scams. While the project’s documentation explicitly discourages unethical uses such as impersonation and fraud, it places responsibility for misuse on users rather than on the developers.
Furthermore, Nari Labs has not revealed which datasets were used to train Dia, fueling speculation that copyrighted material may be involved. Some users have noted, for instance, that certain voice outputs strongly resemble the hosts of NPR’s “Planet Money” podcast, a resemblance that points to the generative AI industry’s broader unresolved questions about the legality of training data.
Looking ahead, the Nari Labs team plans to upgrade Dia with support for more languages, larger and more powerful versions of the model, and new platform features designed to encourage user interaction and collaboration. The team says it will also soon release a technical report detailing Dia’s design and specifications.