With more enterprises looking to build AI applications and even AI agents, it’s becoming increasingly clear that organizations should use different language models and databases to get the best results.
However, switching an application from Llama 3 to Mistral on the fly takes a bit of infrastructure finesse. This is where the context and orchestration layer comes in: this so-called middle layer connects foundation models to applications and, ideally, directs the traffic of API calls to models to execute tasks.
The middle layer mainly consists of software like LangChain or LlamaIndex that helps bridge applications and databases. The question is whether the middle layer will consist solely of software, or whether hardware still has a role to play here beyond powering the models that drive AI applications in the first place.
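To make the routing idea concrete, here is a minimal, hypothetical sketch of what a middle-layer model router does. The class and function names are illustrative (not from LangChain or LlamaIndex), and the model backends are stubs standing in for real API clients:

```python
# Hypothetical sketch of a middle-layer "model router": the orchestration
# layer picks a model backend per task, so application code never
# hard-codes a specific model. Backends are stubbed; a real framework
# would wrap actual API clients here.
from typing import Callable, Dict

def call_llama3(prompt: str) -> str:
    # Stub standing in for a real Llama 3 API call.
    return f"[llama-3] {prompt}"

def call_mistral(prompt: str) -> str:
    # Stub standing in for a real Mistral API call.
    return f"[mistral] {prompt}"

class ModelRouter:
    """Maps task types to model backends; swapping models is a config change."""

    def __init__(self) -> None:
        self._routes: Dict[str, Callable[[str], str]] = {}

    def register(self, task: str, backend: Callable[[str], str]) -> None:
        self._routes[task] = backend

    def run(self, task: str, prompt: str) -> str:
        if task not in self._routes:
            raise KeyError(f"no model registered for task: {task}")
        return self._routes[task](prompt)

router = ModelRouter()
router.register("summarize", call_llama3)
router.register("code", call_mistral)

print(router.run("summarize", "Summarize the quarterly report."))
# Switching a task from Llama 3 to Mistral is one registration call,
# not an application rewrite:
router.register("summarize", call_mistral)
print(router.run("summarize", "Summarize the quarterly report."))
```

The point of the sketch is the indirection: because applications call the router rather than a model directly, the orchestration layer can swap or mix models without touching application code.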
The answer is that hardware’s role is to support frameworks like LangChain and the databases that bring applications to life. Enterprises need hardware stacks that can handle massive data flows, and should even consider devices that can handle much of a data center’s work on-device.
“While it’s true that the AI middle layer is primarily a software concern, hardware providers can significantly impact its performance and efficiency,” said Scott Gnau, head of data platforms at data management company InterSystems.
Many AI infrastructure experts told VentureBeat that while software underpins AI orchestration, none of it would work if the servers and GPUs could not handle massive data movement.
In other words, for the software AI orchestration layer to work, the hardware layer needs to be smart and efficient, focusing on high-bandwidth, low-latency connections to data and models to handle heavy workloads.
“This model orchestration layer needs to be backed with fast chips,” said Matt Candy, managing partner of generative AI at IBM Consulting, in an interview. “I could see a world where the silicon/chips/servers are able to optimize based on the type and size of the model being used for different tasks as the orchestration layer is switching between them.”
Current GPUs, if you have access, will already work
John Roese, global CTO and chief AI officer at Dell, told VentureBeat that hardware like the kind Dell makes still has a role in this middle layer.
“It’s both a hardware and software issue because the thing people forget about AI is that it appears as software,” Roese said. “Software always runs on hardware, and AI software is the most demanding we’ve ever built, so you have to understand the performance layer of where are the MIPs, where is the compute to make these things work properly.”
This AI middle layer may need fast, powerful hardware, but there is no need for new specialized hardware beyond the GPUs and other chips currently available.
“Certainly, hardware is a key enabler, but I don’t know that there’s specialized hardware that would really move it forward, other than the GPUs that make the models run faster,” Gnau said. “I think software and architecture are where you can optimize, in a kind of fabric-y way, the ability to minimize data movement.”
AI agents make AI orchestration even more important
The rise of AI agents has made strengthening the middle layer even more critical. When AI agents start talking to other agents and making multiple API calls, the orchestration layer directs that traffic, and fast servers become crucial.
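A rough, hypothetical sketch of that traffic pattern: every agent-to-agent message passes through a broker in the orchestration layer, which routes it and keeps a record of each hop. The names here are illustrative and not tied to any specific framework; the agents are simple stubs.

```python
# Hypothetical sketch of agent-to-agent traffic routed through an
# orchestration layer. Every message between agents goes through the
# broker, which is why the layer's latency and throughput matter.
from typing import Callable, Dict, List

class AgentBroker:
    def __init__(self) -> None:
        self._agents: Dict[str, Callable[[str], str]] = {}
        self.log: List[str] = []  # record of every routed hop

    def register(self, name: str, handler: Callable[[str], str]) -> None:
        self._agents[name] = handler

    def send(self, sender: str, recipient: str, message: str) -> str:
        # The broker sees (and could load-balance or throttle) every call.
        self.log.append(f"{sender} -> {recipient}")
        return self._agents[recipient](message)

broker = AgentBroker()
broker.register("researcher", lambda m: f"findings for: {m}")
# The writer agent itself calls another agent, again through the broker.
broker.register(
    "writer",
    lambda m: "draft based on " + broker.send("writer", "researcher", m),
)

print(broker.send("user", "writer", "GPU market trends"))
print(broker.log)  # both hops passed through the orchestration layer
```

Even in this toy version, one user request fans out into multiple routed calls, which is why the quote-unquote traffic cop in the middle needs fast hardware underneath it.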
“This layer also provides seamless API access to all of the different types of AI models and technology and a seamless user experience layer that wraps around them all,” said IBM’s Candy. “I call it an AI controller in this middleware stack.”
AI agents are the current hot topic for the industry, and they will likely influence how enterprises build a lot of their AI infrastructure going forward.
Roese added another thing enterprises need to consider: on-device AI, another hot topic in the space. He said companies will want to plan for when their AI agents need to run locally, in case the internet goes down.
“The second thing to consider is where do you run?” Roese said. “That’s where things like the AI PC come into play, because the minute I have a collection of agents working on my behalf and they can talk to each other, do they all have to be in the same place?”
He added that Dell has explored the possibility of adding “concierge” agents on device, “so if you’re ever disconnected from the internet, you can continue doing your job.”
Explosion of the tech stack now, but not always
Generative AI has expanded the tech stack: as more tasks became abstracted, new service providers emerged offering GPU capacity, new databases or AIOps services. This won’t be the case forever, said Uniphore CEO Umesh Sachdev, and enterprises must remember that.
“The tech stack has exploded, but I do think we’re going to see it normalize,” said Sachdev. “Eventually, people will bring things in-house and the capacity demand in GPUs will ease out. The layer and vendor explosion always happens with new technologies and we’re going to see the same with AI.”
For enterprises, it’s clear that thinking about the entire AI ecosystem, from software to hardware, is the best practice for AI workflows that make sense.