Preparing for tomorrow’s agentic workforce

To compete effectively, companies must take a hard look at what they can do to support AI infrastructure. On this episode of the At the Edge podcast, SambaNova Systems cofounder and CEO Rodrigo Liang joins host and McKinsey Senior Partner Lareina Yee to discuss agentic AI, the S-curve of AI value, and why businesses must adopt a hybrid AI model.

The following transcript has been edited for clarity and length.

Rethinking AI infrastructure

Lareina Yee: SambaNova is an ambitious, genuinely exciting AI company addressing an enormous market. Can you tell us what you originally saw in the marketplace that inspired you to start SambaNova?

Rodrigo Liang: I have two amazing cofounders, Stanford Professors Kunle Olukotun and Christopher Ré, and the three of us got together and really started thinking about this worldwide transformation we’re going through. If you think about this AI-first, AI-centric world we’re building, it’s ultimately driving a scale of transformation we’ve only seen a few times over the last two or three decades.

So the genesis of SambaNova came from this brainstorming process to see if the computing infrastructure we’re running on was really the most efficient. And the conclusion, based on Stanford research, is that there are significantly better ways to enable AI. That’s when we decided to embark on this journey seven and a half years ago.

Lareina Yee: Seven years ago, there were some of us, myself included, who were super interested in data centers. Today, it’s become a hot topic, and everyone’s talking about infrastructure.

McKinsey calculates a roughly $5 trillion investment needed over the next five years to build all the data center infrastructure, including buildings, software, cooling systems, and energy plants, to power AI’s voracious appetite. How do you think about the dynamics of the cost, the innovation, and the moment we’re in?

Rodrigo Liang: There are three things I think are incredibly important for us to think about as we build out at this scale.

In the last three years, we’ve already seen an incredible build-out of GPUs [graphics processing units], AI infrastructure, and teraflops [trillions of floating-point operations per second]. Most of this build-out has been for pretraining large models and is really dominated by the largest players in the world. But as you move forward, you’re seeing a world that wants to do inference, test-time computing, and all these different things that require the models we’ve trained.

But as we scale up, we’re now seeing other constraints start to appear, like a lack of sufficient power for these data centers. So people are talking about nuclear power plants and other sources of energy. But then you have to figure out how to get the cooling done as well.

And as you think about energy, you’ll also need to figure out how to update your entire grid to power those gigawatt data centers. And eventually, you’ve got to get all of that back-connected to where the users are, which is mainly in these large metropolitan areas—which is not where you’re going to put your gigawatt data center.

So, there are a lot of infrastructure challenges we have to figure out, and at SambaNova, we’re very focused on making it all easier. We’re dedicated to figuring out how to deliver the capacity you need at a fraction of the cost and a fraction of the power.

We all need to contribute to the solution because the answer can’t be, “Just build more power plants and build more data centers.” It’s too hard. You will need those, but the core tech also needs to be significantly more efficient.

Bookending the technology stack

Lareina Yee: Tell us about some of that magic sauce around efficiency that SambaNova is working on. If I’m an average layperson thinking about this, how do I understand the important role you play in this ecosystem?

Rodrigo Liang: Think of SambaNova as bookends on the technology stack. On the one hand, we build chips, and on the other, we create API services that allow the best open-source models to be accessed without having to actually invest in all of this custom-model work.

With SambaNova, you can go to cloud.sambanova.ai and use all the best open-source models, with all the benefits and full accuracy, with world-record speeds at a very efficient cost. Because as soon as you actually deploy AI, the cost of infrastructure acquisition, power, networking, and all the things that are required starts adding up.
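To make that concrete, here is a minimal sketch of what calling a hosted open-source model can look like, assuming an OpenAI-compatible chat-completions endpoint; the base URL, model name, and environment variable below are illustrative assumptions, not confirmed details of SambaNova’s service.

```python
# Minimal sketch: calling a hosted open-source model through an
# OpenAI-compatible chat-completions API. The base URL, model name,
# and environment variable are illustrative assumptions.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.sambanova.ai/v1",   # assumed endpoint
    api_key=os.environ["SAMBANOVA_API_KEY"],  # hypothetical env var
)

response = client.chat.completions.create(
    model="Meta-Llama-3.1-8B-Instruct",  # illustrative open-source model
    messages=[{"role": "user", "content": "Summarize this contract clause."}],
)
print(response.choices[0].message.content)
```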

And if you’re going to go from this world of training to what I think is going to be a tenfold increase in investment for inferencing, you have to be more efficient. You have to make the cost come down. Otherwise, it won’t scale.

Planning for a hybrid model

Lareina Yee: Let’s just fast-forward and assume businesses will figure out how to scale AI. So, if I’m a business leader, how do I plan?

Rodrigo Liang: The companies that win will use AI to provide better services in the market, engaging with customers faster and better and making customization easier. They will also change their operations so AI can give them a significantly better time to market and a significantly better customer experience.

So, your AI solution is going to be a hybrid model. Just like you have cloud and on-premises, you’re going to have large language models [LLMs] and custom LLMs. You’re also going to have text, vision, language, and voice models.

When you run a company, you have your own custom methods to accomplish your various operational needs. But to fully embrace hybrid, let the data anchor where your AI models run, whether that’s in cloud A, cloud B, or on-premises.

That’s how we think about how you should deploy infrastructure. Let the data reign and drive the solution you need because you’re going to be hybrid anyway.
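As a hedged sketch of that “let the data drive the solution” idea, the routine below routes each request to the deployment where its data is allowed to live; the endpoints, regions, and rules are invented for illustration, not part of any SambaNova product.

```python
# Illustrative sketch: route each request to the deployment where its
# data is allowed to reside, per the "data anchors the model" idea.
from dataclasses import dataclass


@dataclass
class Request:
    user_id: str
    contains_regulated_data: bool
    data_region: str  # e.g. "eu", "us"


# Hypothetical endpoints for a hybrid estate: two clouds plus on-prem.
ENDPOINTS = {
    "cloud_a": "https://cloud-a.example.com/v1",
    "cloud_b": "https://cloud-b.example.com/v1",
    "on_prem": "https://llm.internal.example.com/v1",
}


def route(request: Request) -> str:
    """Pick a deployment based on where the data must reside."""
    if request.contains_regulated_data:
        return ENDPOINTS["on_prem"]  # private data stays in-house
    if request.data_region == "eu":
        return ENDPOINTS["cloud_b"]  # assume cloud B offers EU regions
    return ENDPOINTS["cloud_a"]      # default: cheapest cloud capacity


print(route(Request(user_id="u1", contains_regulated_data=True, data_region="us")))
```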

The beauty of small language models

Lareina Yee: I’ve had conversations with large businesses that are enthusiastic about huge LLMs but say the secret sauce is in the user experience [UX] with small models that only tap into their internal data. They may not always need the internet at their fingertips, but there are other times when they actually do. Another thing on people’s minds is agentic AI, and you have some really interesting ways to make that happen for businesses. Tell us about that.

Rodrigo Liang: Whether the customer wants to use a very large, trillion-parameter model on the cloud, or they want to bring it on-premises because they have their own private data, they’re not restricted to small models on-premises.

In fact, we’re actually quite proud of the fact that we can give you a 400- or 600-billion-parameter model on-premises, trained and fine-tuned on all your data. That gives you your own large-class, air-gapped LLM, which you can access completely privately and securely. It’s valuable because you never know what you’re going to ask next.

There is a challenge associated with smaller models, though. If you ask them questions they were tuned for—say, legal, finance, or HR—they are very accurate. But the smaller the model, the more brittle it is: as soon as that prompt drifts a little bit, it breaks. That said, I think small language models are fantastic for agentic AI.

I actually think the agentic workflow will transition into production in the enterprise faster than LLMs because in a production environment, most businesses have to verify and certify the output. When the models are smaller, it’s significantly easier for me to say, “Input A to output A didn’t do what I expected.” So it’s a much easier thing for me to prove.

What I do then is connect dozens of these little agents to create a workflow, and some of them do that really well. Why? Because of two factors: speed and cost. In the world of agentic AI, the key parameter for people to remember is time to first token [TTFT]. What that means is, when I prompt a model, how long does it take for that agent to respond to me?

You want to be in a position like some of our Llama 8B agents, where the time to first token is 0.03 seconds. And when you connect 20 of those agents sequentially, it’s 0.6 seconds, which feels like it’s real time. That’s the beauty of using these agents in these very specialized environments because they allow you to create a very sophisticated workflow that feels like a real-time experience to the end user.
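To make the latency arithmetic concrete, here is a toy simulation of that sequential chain. The agents are stand-ins that only sleep for the quoted TTFT, so the point is the addition, not the models themselves.

```python
# Toy illustration of the latency arithmetic above: when agents run
# sequentially, their times to first token add up, so 20 agents at
# 0.03 s each is still only about 0.6 s end to end. The agent behavior
# is simulated; the numbers are the ones quoted in the conversation.
import time

TTFT_SECONDS = 0.03  # quoted time to first token for a small Llama 8B agent


def run_agent(step: int, payload: str) -> str:
    time.sleep(TTFT_SECONDS)  # stand-in for the model's TTFT
    return f"{payload} -> agent{step}"


payload = "start"
t0 = time.perf_counter()
for step in range(20):  # 20 specialized agents, chained sequentially
    payload = run_agent(step, payload)
elapsed = time.perf_counter() - t0
print(f"20 chained agents responded in {elapsed:.2f} s")  # ~0.6 s
```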

Maintaining workflow security

Lareina Yee: Those are two huge factors. One is the speed, and the other piece is the cost. There is also kind of a third dimension I’d love you to elaborate on a bit. If I’m a business leader, I care about security, which will enable me to stay ahead in a world of evolving regulatory and consumer expectations.

Rodrigo Liang: That’s right. Take a bank: as you build these agents, each one is going to have secured access. Even though an agent may be shared within an agentic workflow, its access needs to be reconciled with the user writing that prompt. Sometimes, that’s easy to do in a generic workflow because it’s just passing public information.

But as soon as you’re going into specific client information or regulated information, that workflow needs to reconcile the fact that the user you’ve been chatting with in an agentic workflow may or may not have access to that information.

You have to have a way for the workflow to say, “I’m serving this agentic workflow, but this particular user does not have access to this agent, so I need to provide a different answer without disclosing information they don’t have access to.”
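A minimal sketch of that reconciliation step, with invented names and a toy access-control store: before a shared agent answers, the workflow checks whether the person behind the session holds the required scope, and degrades the answer rather than disclose restricted information.

```python
# Hypothetical sketch of the reconciliation step described above.
# The users, scopes, and ACL store are invented for illustration.
ACL = {
    "alice": {"public", "client_records"},
    "bob": {"public"},
}


def answer(user: str, agent_scope: str, full_answer: str) -> str:
    """Return the agent's answer only if the user holds the required scope."""
    if agent_scope in ACL.get(user, set()):
        return full_answer
    # Refuse without revealing what the restricted agent knows.
    return "I can't help with that part of the request."


print(answer("alice", "client_records", "Client balance is $1.2M"))  # allowed
print(answer("bob", "client_records", "Client balance is $1.2M"))    # refused
```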

The S-curve of AI value

Lareina Yee: Are there other factors you think are really important as companies think about changing workflows, not tasks but actual full workflows, into an agentic AI world?

Rodrigo Liang: We’re treating static tasks as step one, which allows us to deploy these things. Why? Because it’s easier. But when you deploy AI in very thin slivers like that, no single sliver generates enough value for the corporation.

If I look at most enterprises, the value of AI follows an S-curve, and we’re in the pre-scaling period, doing the simple, low-risk things. That’s why, as an industry, people ask, “Where’s the ROI for AI? It’s not in the chatbot, because we don’t save enough money.”

Well, that may be true. But, again, let’s return to the banking example. Every time there’s a money laundering event at a bank, you have to write a 400-page report once the investigation’s done. How many people are involved in actually generating that document, making sure it’s correct, and submitting it to the regulatory agency? Think about that.

This is not something people love to do, making sure everything is documented correctly. But what if that report could be generated correctly with a bot in five seconds? Now you’re starting to drive real value, because all of these things that you have to do as part of the business can be done significantly cheaper and much more accurately.

I think, as an industry, we just have to cross this S-curve, where we have enough infrastructure that convinces us that models are behaving correctly and the data and the outputs are being securely managed. Then we’ll get through our change management in terms of how we actually integrate the technology into the workflows, the regulatory requirements, and the proving.

Basic versus transformational change

Lareina Yee: Let’s pause on this concept of task automation versus an actual workflow transformation. If we say it very simply, there’s basic and then there is transformational. I think a lot of people are spending time on basic, feel this frustration, watch costs going through the roof, and don’t see the business value. What are examples of things that are basic versus transformational?

Rodrigo Liang: Almost every enterprise today has a significant amount of software engineering happening in their company. I mean, there is no reason not to just lean into that.

AI can scan 100,000 SKUs across all your products and learn every single product spec, down to the most technical ones. It does a much, much better job than any engineer could do by crawling through 100,000 different product specs and figuring out what each one does.

Lareina Yee: I love your point. Why would you live in the analog world at all for some of these ideas? Particularly marketing, as well as sales, software, and product development. So then, by contrast, tell me what excites you in terms of that next step beyond the S-curve? What do you get excited about seeing companies do?

Rodrigo Liang: I would say the ability to take very complex things in the analog world, such as the whole process of drug discovery, where it typically takes seven years just to parse through the initial data, and find more efficient ways to do them.

This is something that SambaNova did during COVID. We worked with the US government trying to create a surrogate AI scientist to be able to significantly increase the throughput of discovering the next drug. Because you can experiment significantly faster in the virtual world.

And then there’s the process of mapping subterranean gas deposits in the energy sector. Today, we’re using seismic signals, combined with history and gut feelings. But locating these underground deposits is still incredibly difficult. If you let AI disrupt the process, it can locate hydrocarbon deposits and tell you where to drill with much, much higher accuracy. So there’s a significant reduction in cost and environmental impact.

AI inferencing versus AI training

Lareina Yee: One thing you mentioned at the top of the interview was AI inferencing and how important it is. Tell us about AI inferencing, and how it differs from traditional model training.

Rodrigo Liang: There are two basic pieces of artificial intelligence: your training models and your inferencing models. The parallel I like to draw is that training is like creating a search algorithm. And inferencing is like the Google search we all do every day.

You’re going to see fewer and fewer people training these models because they are already pretty good. And as the open-source models continue to get better, you won’t have to invest $100 million, $200 million, or $300 million to train your own models anymore. You can actually take open-source models out there already and just customize them.

Lareina Yee: It’s pretty remarkable because two years ago, some of our AI data scientists were spending a lot of time training. And now it’s possible to spend more of your time on the inference part. So when you look a year or two ahead, what do you think will be normalized?

Rodrigo Liang: Agents. I think every single one of us will have a custom set of agents. All the business things that you would do, you’re going to have agentic workflows or agents that are customized, ready for you to pull off the shelf.

To me, it’s kind of like templates. You don’t go into PowerPoint and start designing your slide background by yourself. Basically, everybody’s going to be able to take those agents, create whatever workflows they need, and create their own customized experience.

That’s just how it’s going to be. Two years from now, every single one of us will have a suite of our favorite agents that we use every day, integrated into everything that we do. Because doing everything ourselves and not using the machine is just nuts.

So, I think that’s going to be the world we’ll live in, and everybody is going to be so dependent on those capabilities.

AI and robotics

Lareina Yee: Looking to the future, what about robots?

Rodrigo Liang: I think the robots we picture in our heads are walking around like humans. But the first set of robots will be in production environments in manufacturing. You already see robots restocking shelves in stores, and you’ve got robots assembling PC boards and automobiles in factories. Those are going to become pervasive.

I think the technology has reached a point where it already allows you to deploy these things for business use cases. And again, just as with software agents, these very specialized robots will deploy faster, because you can prove that given an input A, they produce an output B very consistently and much more efficiently. Those things will go into production first.

As for the robots we all imagine and think, “Hey, I want to try out this humanoid-looking thing to do all my chores at home,” those are coming too. I think it’s just going to take a little more time. But real-life use cases are starting to increase.

Next best steps for business

Lareina Yee: What near-term steps do you think businesses should take now, given a normalization or a mainstreaming of AI, AI agents, and robotics in the workplace?

Rodrigo Liang: Take an inventory of the business, including the back office, the storefront, and the operational things. You can do it by geography or by function, but businesses should figure it out very quickly and just embrace hybrid. Because if you’re a Fortune 50 company, you have rules around customer data residing within certain countries.

So embrace hybrid and, depending on how your business is segmented in terms of operations, geography, and product lines, pick initial starting points. It’s not all going to be on-premises at first. In fact, you should probably be thinking about both.

Because every location, every business, and every function is going to likely have both of those use cases, on cloud and on-premises. You will want something on the cloud because it’s more efficient, but you’ll also need some things that you can do securely and privately on-premises.

If you start there, it allows the corporation to start learning—because you have to learn, and most companies don’t know what they don’t know. So you have to start deploying some things in certain locations that allow you to create that institutional learning.

Because behind the tech, the bigger thing is change management, which has to happen before anything goes into production. The faster you can get through that curve, the more effectively you’ll be able to take advantage of the technology.

Your ability to use the tech is going to differentiate you in the market. And every single one of your competitors in the market is also trying to do the same. The one that gets there first by using the tech most effectively will gain a competitive edge.

AI access for everyone

Lareina Yee: We’ve talked a lot about AI, but I’d love to just talk a little bit about you. You’re from Brazil and have had an incredible career. Tell us a little bit about your wish list if you were bringing AI to Brazil.

Rodrigo Liang: AI’s going to be pervasive and should not be available only to those who can afford it. I think everybody should have access to this technology, regardless of where on the planet they live. Also, in every market we enter, SambaNova is very invested in linguistics, because most countries don’t want to be English-first; they want everything translated into their own language.

So when SambaNova enters a market, we come prepared for the native language, or we work with locals to help us operate in their language. Because whether it’s for customer support, interpreting documents, or translating audio and video, it needs to be native.

Lareina Yee: Final question. Tell us the origins of the SambaNova name.

Rodrigo Liang: My cofounder, Kunle, is of Nigerian background, and we had a company that was named Afara, which in his language meant “bridge.” So this time around, we were talking about something Brazilian because of my background.

And if you really want a word that immediately makes you think of Brazil, it’s down to either samba or rio. One thing led to another, and SambaNova came together. It’s a new dance.

Since SambaNova’s technology is built around dataflow, it’s all about allowing these models to operate on their own without having to parse them, cut them up, or do all the legacy things we do to workflows.

So the name stuck because it’s at the essence of what we do. Let the technology flow out there and see how it goes.