The Hollow Middle in the AI Data Promise

8 Oct 2024

As we continue exploring the latest AI innovations and startup activity, there is a noticeable “hollow middle” in this expanding space.

⬇ At the bottom are the big mainstream language models (OpenAI's GPT, Claude, Gemini, Llama), most of which are already owned by or intertwined with Big Tech (Google, Meta, Microsoft, etc.). Few will disrupt this layer without multi-billion-dollar investments.

⬆ At the top are what I imagine to be thousands of new single-purpose agent tools sprouting up every week. They claim to deliver highly role-specific outcomes for marketers, creatives, support, finance, sales, and engineers, or to serve specific sectors such as health, banking, and e-commerce. VCs love this, but few will likely survive to scale or own their category.

🔄 But where does the data curation that feeds all these agents come from?

📦 Out of the box, these foundation models don’t solve your company-specific data problems, especially if the important data is buried in silos (databases, CRMs, legacy systems, files, third-party feeds).

And out of the box, most of these shiny tools assume you will feed them the right data, because they need that context to deliver company-specific outcomes.

Sure, you can copy and paste data into the shiny AI tools each time you need a new output. But that doesn’t scale.

And sure, you can have your dev team weave one-shot calls to your LLM API into the core tech stack. But that becomes expensive, slow, and hard to rapidly iterate on given the current state of the tooling.

And data engineering teams everywhere are underfunded and overstressed just dealing with BI requests, long before they can take on data curation for AI.

🧠 So, perhaps you can build “intermediate” AI tactics like vector storage and RAG (retrieval-augmented generation) into core competencies of your data/engineering teams, provided those teams are large and advanced enough to own this alongside other roadmap priorities, but:

1. We can’t see a world where the majority of startups, let alone enterprises, become native experts in RAG or vector storage.

2. We are reasonably certain those concepts will be fully commoditised at some point in the future, not unlike how the cloud has commoditised Linux and network administration knowledge.
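
For readers who haven’t met these terms, here is a rough sketch of what “vector storage plus RAG” boils down to: store your documents as vectors, find the ones closest to a question, and paste them into the prompt as context. Everything in it is an illustrative assumption rather than any vendor's API: the `embed` function is a toy word-count stand-in for a real embedding model, `VectorStore` and `build_prompt` are hypothetical names, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store; real ones persist and index the vectors."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=2):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, store, k=2):
    """The 'augmented' part of RAG: retrieved text becomes prompt context."""
    context = "\n".join(f"- {doc}" for doc in store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up company documents standing in for siloed data.
store = VectorStore()
store.add("Refund policy: customers can return items within 30 days.")
store.add("Shipping: orders ship within 2 business days.")
store.add("Support hours: weekdays 9am to 5pm.")

question = "How many days do customers have to return items?"
prompt = build_prompt(question, store, k=1)
print(prompt)  # this string is what you would send to the LLM of your choice
```

The mechanics fit in a few dozen lines. The hard part, and the point of this post, is everything the sketch skips: getting clean, current, company-specific data out of the silos and into that store in the first place.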

There are dozens of established players in the data tool space that have spent years making basic user analytics and data aggregation work at scale for companies with fragmented customer data. But I’m still waiting for their next-gen AI offerings.

The ground has been shifting beneath our feet for 2 years and counting, but beware of the sinkholes that might come from trying to reinvent data curation if you haven’t already developed it as a core competency at your business. By that I mean you already have good BI data warehousing and ETL pipelines (both into AND out of your warehouse).

🚰 For the rest of us, let's see how that hollow middle gets filled. If you know great companies innovating in this space, leave them for me in the comments!