Cartesia claims its AI is efficient enough to run almost anywhere

AI Development Faces Mounting Costs

The expenses associated with creating and maintaining AI systems are skyrocketing. OpenAI’s operational costs could reach $7 billion this year, while Anthropic’s CEO anticipates models with development costs exceeding $10 billion in the near future.

This has sparked a race to reduce AI costs.

Some researchers are refining existing model architectures to optimize performance, while others are innovating entirely new architectures designed to scale more affordably.

A New Approach: State Space Models

Karan Goel, co-founder of the startup Cartesia, is among those exploring new territory. His focus is on state space models (SSMs), a highly efficient architecture that can handle large volumes of data, such as text and images, at once.


“New model architectures are essential for building truly effective AI systems,” Goel shared with TechCrunch. “In such a competitive industry, creating superior models is critical to success.”

Academic Foundations

Before launching Cartesia, Goel pursued a PhD at Stanford under computer scientist Christopher Ré. During this time, he collaborated with fellow PhD candidate Albert Gu to conceptualize what would later become SSMs.

Goel held part-time roles at Snorkel AI and Salesforce, while Gu became an assistant professor at Carnegie Mellon. Together, they continued researching SSMs, publishing key papers on the topic.

In 2023, Goel and Gu teamed up with Stanford colleagues Arjun Desai and Brandon Yang to establish Cartesia, aiming to commercialize their work. Christopher Ré also joined as a co-founder.

Cartesia builds on Mamba, one of the most prominent SSM variants, which Gu and Princeton professor Tri Dao originally released as an open research project. Beyond extending Mamba, Cartesia is developing its own SSMs, which it says are faster and more efficient.

Transformers vs. SSMs

Most modern AI applications, like ChatGPT, rely on transformer-based architectures. A transformer keeps what is often called a "hidden state": in effect, a growing list with an entry for every piece of data it has processed. This makes transformers resource-intensive, because each new step has to consult the entire list, so the computation required grows as the input gets longer.
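A minimal, hypothetical Python sketch makes the transformer's cost pattern concrete. `ToyAttentionCache` is an invented name, and the model is deliberately simplified (one attention head, no learned projections, so each input serves as its own key and value). It shows the growing list at work: every token adds a cache entry, and each new token is compared against all stored entries, so total work grows roughly with the square of the sequence length.

```python
import math


class ToyAttentionCache:
    """Hypothetical toy: single-head attention that caches one entry per token."""

    def __init__(self):
        self.keys = []          # one cached key per token processed so far
        self.values = []        # one cached value per token processed so far
        self.comparisons = 0    # running count of query-key dot products

    def step(self, x):
        # Assumption: keys and values are the raw input (no learned projections).
        self.keys.append(x)
        self.values.append(x)
        # The new token, used as the query, is scored against EVERY cached key.
        scale = math.sqrt(len(x))
        scores = [sum(q * k for q, k in zip(x, key)) / scale for key in self.keys]
        self.comparisons += len(scores)
        # Softmax over the scores (subtract the max for numerical stability).
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Output is the softmax-weighted average of all cached values.
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(len(x))
        ]


cache = ToyAttentionCache()
for t in range(1000):
    out = cache.step([math.sin(t), math.cos(t)])

print(len(cache.keys), cache.comparisons)  # 1000 500500
```

After 1,000 tokens the cache holds 1,000 entries and the loop has performed 500,500 query-key comparisons (the sum of 1 through 1,000). Doubling the input length roughly quadruples that total.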

SSMs, on the other hand, compress past data points into a condensed, fixed-size "state" that is updated as new information arrives, discarding most of the prior data. Because this state does not grow with the length of the input, SSMs can process very large amounts of data more cheaply and can outperform transformers on certain tasks, making them a cost-effective alternative for AI development.
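The SSM side can be sketched with a toy linear state space recurrence in Python. This is a hypothetical illustration, not Cartesia's or Mamba's code: `ToyLinearSSM` and its fixed parameters `a`, `b`, `c` are invented for the example, and real SSMs such as Mamba add learned, input-dependent ("selective") parameters and a discretization step that are omitted here. What it shows is that the state has a fixed size, so memory and per-step work stay constant no matter how many inputs have been processed.

```python
class ToyLinearSSM:
    """Hypothetical toy diagonal linear SSM:
    h_t = a * h_{t-1} + b * x_t  (elementwise),  y_t = sum(c * h_t).
    """

    def __init__(self, a, b, c):
        # a, b, c are fixed per-dimension parameters here; real models learn them.
        assert len(a) == len(b) == len(c)
        self.a, self.b, self.c = a, b, c
        self.h = [0.0] * len(a)  # the fixed-size state

    def step(self, x):
        # Fold the new scalar input into the state. Older inputs survive only as a
        # decayed summary; they are not stored individually.
        self.h = [ai * hi + bi * x for ai, hi, bi in zip(self.a, self.h, self.b)]
        # Read the output from the compressed state.
        return sum(ci * hi for ci, hi in zip(self.c, self.h))


ssm = ToyLinearSSM(a=[0.9, 0.5, 0.1], b=[1.0, 1.0, 1.0], c=[1.0, 1.0, 1.0])
ys = [ssm.step(1.0) for _ in range(1000)]

print(len(ssm.h))  # 3
```

Processing 2,000 inputs instead of 1,000 doubles the total work but leaves the state at three numbers. The trade-off is that the model cannot look back at any individual earlier input; it only has the running summary.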

Ethical Challenges

Cartesia positions itself as a community-focused research lab, collaborating with partners to develop innovative SSMs. One of its projects, Sonic, is a state-of-the-art voice cloning tool that can replicate voices or create entirely new ones, while adjusting tone and cadence.

Goel claims Sonic is the fastest model of its kind, noting that SSMs are particularly good at handling long-context data like audio. However, the tool raises ethical concerns. Cartesia trained some of its models on datasets such as The Pile, which includes unlicensed copyrighted material, and training on such data is at the center of several copyright lawsuits against AI companies.

Additionally, Sonic's voice cloning feature requires minimal user verification, allowing for potential misuse. Goel acknowledged these issues, stating that Cartesia is enhancing its moderation processes with voice verification and watermarking systems.

Business Momentum

Cartesia’s Sonic API is gaining traction, with “thousands” of paying customers, including automated calling service Goodcall. The company’s API is free for limited use, with premium plans starting at $299 per month. While the platform uses customer data to improve its models, privacy-conscious users can opt out.

Despite these challenges, Cartesia's technical advantages are attracting clients. Goodcall, for instance, chose Sonic for its low latency. Sonic is currently used for gaming, voice dubbing, and more, but Goel envisions even broader applications.

The Road Ahead

Cartesia aims to create multimodal AI models capable of processing text, images, and videos in real time across any device. The company recently launched Sonic On-Device, optimized for mobile platforms, and Edge, a software library for hardware-specific SSM optimization.

Cartesia also faces competition from startups like Zyphra and AI21 Labs, which are exploring hybrid architectures. However, Goel believes Cartesia's 26-person team is well positioned for success.

Recently, the company secured $22 million in funding led by Index Ventures, bringing its total raised to $27 million. Shardul Shah of Index Ventures praised Cartesia for pushing the boundaries of AI with faster, more scalable models, positioning the company to drive the next wave of AI innovation.
