The past decade has seen explosive growth in the world of data tools and infrastructure. As the founder of a cloud data infrastructure company in the early days of cloud computing in 2009, and then as the founder of a meetup community for the early data engineering crowd in 2013, before “data engineer” was even a job title, I have been at the heart of this community ever since. From this seat, I can reflect on lessons learned from the recent past of data tooling and how they should guide how we build in the new AI era.
In the anthropology of technology, 2013 sat between the era of “big data” and the era of the “modern data stack.” In the big data era, as the name suggests, more data was better. Data, it was claimed, contained analytical secrets that could unlock new value for businesses.
As a strategy consultant for a major Internet company, I was once tasked with combing through the data emitted by billions of DNS queries a day and creating a plan to find the magic insight buried within it that could become a $100 million line of business for the company. Did I find that insight? Not in the relatively short time (a few months) we had to spend on the project. After all, storing big data is relatively easy, but generating big insights takes far more effort.
Not everyone noticed this, however. All they knew was that they couldn’t play the insights game without getting their data house in order. As a result, businesses of all shapes and sizes rushed to build up their data stacks, producing an explosion in the number of data tools on offer, with each vendor claiming that its solution was the missing piece of a truly comprehensive data stack that could generate the magical insights businesses were looking for.
Note that I do not use the word “explosion” lightly. In his 2024 MAD (Machine Learning, AI, Data) Landscape, Matt Turck notes that in 2012, the year he started creating the market map, it featured 139 companies selling data infrastructure tools and products. This year’s edition features 2,011, a 14.5x increase.
Several things happened along the way to shape the current data landscape. Enterprises began moving many of their on-premises workloads to the cloud, and modern data stack (MDS) vendors offered managed services as composable cloud products that promised customers high reliability, greater system flexibility, and the convenience of on-demand scaling.
But as companies loaded up on data tool vendors through the Zero Interest Rate Policy (ZIRP) period, cracks began to appear in the MDS façade. System complexity (created by many disparate tools), integration challenges (many disparate point solutions that need to communicate with one another), and underutilized cloud services left some wondering whether the promised MDS panacea would ever be achieved.
Many Fortune 500 companies invested heavily in data infrastructure without a clear strategy for how to create value from their data (remember, insights are hard to find!), which sent costs soaring without adding value. Still, collecting tools was fashionable. I often heard reports of different teams within the same company using multiple, overlapping tools. In business intelligence (BI), for example, a company might have deployed Tableau, Looker, and perhaps even a third tool, all serving essentially the same business purpose while running up the bill three times over.
Of course, this kind of excess eventually ended with the bursting of the ZIRP bubble. Yet the MAD landscape continues to grow, not shrink. Why?
What is the new “AI stack”?
Clearly, many data tools companies were so well capitalized during ZIRP that they can continue operating even in the face of tight corporate budgets and reduced market demand for their services. This is one reason the number of logos has not yet thinned much through startup failures or consolidation.
But the main reason is the rise of the next wave of data tools, fueled by a boom in interest in AI. What is somewhat unique is that this new wave of AI gained momentum and spawned even more new data tools companies before any real market shakeout or consolidation from the last wave (MDS) was complete.
But if you believe, as I do, that the “AI stack” is a fundamentally new paradigm, this makes some sense. At a high level, AI is driven by large amounts of unstructured data (think Internet-sized mountains of text, images, and videos), whereas the MDS is driven by smaller amounts of structured data (such as the tabular data found in spreadsheets and databases).
Furthermore, the non-deterministic or “generative” nature of AI models is completely different from the deterministic approaches designed into more traditional machine learning (ML) models. Those older models were typically designed to predict outcomes based on a limited set of training data; the new generative AI models are designed to synthesize summaries and generate insights, which means their output can differ each time you run them, even when the inputs have not changed. To see this for yourself, notice the difference in what you get back when you ask ChatGPT the same question more than once.
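As a rough illustration, here is a minimal sketch of that repeat-the-question experiment, assuming the official OpenAI Python SDK and an `OPENAI_API_KEY` environment variable; the model name and prompt are placeholder assumptions, not recommendations.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "In one sentence, why did the modern data stack become popular?"

# Ask the same question twice with sampling enabled (temperature > 0).
# Because generative models sample from a probability distribution over
# tokens, the two answers will usually differ even though the input is
# identical.
for run in (1, 2):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; any chat model works
        messages=[{"role": "user", "content": question}],
        temperature=1.0,
    )
    print(f"Run {run}: {response.choices[0].message.content}")
```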
The architecture and output of AI models are fundamentally different, requiring developers to adopt new paradigms to test and evaluate such responses against the original intent of the user or application, not to mention to ensure the ethical safety, governance, and oversight of AI systems. Additional areas of the new AI stack that call for further exploration include agent orchestration (where AI models communicate with other models); opportunities for small, purpose-built models for vertical use cases in traditional industries that have been too expensive and complex to automate; and workflow tools that enable the collection and curation of fine-tuning datasets, which companies can use to “insert” their own private data to create customized models.
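To make the testing point concrete, here is one minimal sketch of such a paradigm: rather than asserting a single exact output, sample the model several times and measure how often the responses satisfy the user’s intent. The `generate` callable, the prompt, and the acceptance check below are hypothetical stand-ins, not a real library API.

```python
from typing import Callable

def pass_rate(
    generate: Callable[[str], str],   # hypothetical wrapper around any model
    prompt: str,
    is_acceptable: Callable[[str], bool],
    samples: int = 5,
) -> float:
    """Sample a non-deterministic model several times and return the
    fraction of outputs that meet the acceptance criterion."""
    passes = sum(is_acceptable(generate(prompt)) for _ in range(samples))
    return passes / samples

if __name__ == "__main__":
    # Stand-in for a real model: five canned, slightly different answers.
    canned = iter([
        "Snowflake is a cloud data warehouse.",
        "It is a data warehouse that runs in the cloud.",
        "I am not sure.",
        "A cloud-native data warehouse.",
        "Snowflake stores and queries data in the cloud.",
    ])
    fake_generate = lambda _prompt: next(canned)

    rate = pass_rate(
        fake_generate,
        "What is Snowflake?",
        is_acceptable=lambda out: "cloud" in out.lower(),
    )
    print(f"pass rate: {rate:.0%}")  # 4 of 5 answers pass -> 80%
```

The same pattern extends to richer checks (embedding similarity to a reference answer, policy or safety filters), which is exactly the kind of evaluation tooling the new stack needs.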
As new developer platforms emerge, all of these opportunities and more will be addressed as part of the new AI stack. Hundreds of startups are already tackling these challenges by building new batches of, you guessed it, cutting-edge tools.
How can we build better and smarter this time?
As we enter this new “AI era,” I think it’s important to recognize where we’re coming from. After all, data is the mother of AI, and the countless data tools of recent history have at least given winning businesses a solid education, putting us on a firm path toward treating data as a first-class citizen. But I keep asking myself: “As we continue to build toward the future of AI, how can we avoid the over-tooling of the past?”
One suggestion is that companies strive to articulate the specific value that a given data or AI tool is expected to bring to their business. Over-investing in technology trends for the wrong reasons is never a good business strategy. While AI is currently sucking all the air out of the room, and money out of companies’ IT and software budgets, it’s important to focus on implementing tools that can demonstrate clear value and real ROI.
Another appeal is to founders: stop building “me too” data and AI tools. If the market you’re considering entering already has multiple tools, take the time to ask yourself: “Are we the absolute best founding team, with unique and differentiated experience that gives us important insight into how to approach this problem?” If the answer is not a resounding “yes,” then don’t pursue building that tool, no matter how much money your VCs are willing to put into it.
Finally, before investing in early-stage companies, investors should carefully consider where the value may lie across the various layers of the data and AI tool stack. I often see VCs with a single checkbox criterion: if a tool-building founder has a certain pedigree or comes from a certain technology company, they’re quick to cut a check. Not only is this lazy, it is part of why there are too many undifferentiated data tools on the market. No wonder you need a magnifying glass to read the MAD 2024 landscape.
A speaker at a recent conference suggested that companies ask themselves, “What does it cost my business if one line of data is inaccurate?” In other words, can you establish a clear framework for quantifying the value of data, or of a data tool, to your business?
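As a back-of-the-envelope illustration of what such a framework might look like, consider the arithmetic below; every number is an assumed placeholder, not data from any real deployment.

```python
# Assumed inputs: error rate, volume, and downstream cost of a bad row.
bad_row_rate = 0.001          # 0.1% of rows are inaccurate (assumption)
rows_per_year = 50_000_000    # annual row volume (assumption)
cost_per_bad_row = 1.00       # rework, bad decisions, lost trust (assumption)

annual_exposure = bad_row_rate * rows_per_year * cost_per_bad_row
print(f"Annual exposure from inaccurate data: ${annual_exposure:,.0f}")  # $50,000

# A data-quality tool is only worth buying if the exposure it removes
# exceeds what it costs.
tool_cost = 8_000             # annual cost of the tool (assumption)
error_reduction = 0.6         # fraction of bad rows it catches (assumption)

net_benefit = annual_exposure * error_reduction - tool_cost
print(f"Net annual benefit of the tool: ${net_benefit:,.0f}")  # $22,000
```

If the numbers come out negative, the honest conclusion is that the tool does not belong in the stack, which is precisely the discipline the MDS era lacked.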
If we don't get there, no amount of money or venture capital investment in data or AI tools will solve our mess.
Pete Soderling is the founder and general partner of Zero Prime Ventures.