Generative AI hype took over 2023, but GenAI won't be the only trend impacting data operations in 2024. As data continues to be a core element of business operations, analytics, machine learning, and AI, there is a growing need to improve data observability and governance.
Several data management trends from 2022 continued to evolve in 2023, including the move to cloud data lakes and data lakehouse architectures. Macroeconomic conditions, including inflation, continued to put pressure on organizations seeking to maximize the potential value of their data. Despite the economic challenges, some vendors were still able to raise funding, but the amounts raised in 2023 pale in comparison to the numbers seen in 2021 and 2022.
Generative AI rules data
It’s no surprise that generative AI was a major trend in data management, just as it was across the rest of the IT industry and in other industries.
Almost every major database and data platform vendor announced some form of generative AI news in 2023. Some vendors have incorporated generative AI as tools that act as assistants to help users perform various tasks. Managing data platforms and creating different types of data queries has long been a complex task, but generative AI can simplify it.
Among the many vendors that have integrated some form of AI assistant, Dremio released an AI-powered text-to-SQL tool in June that lets users generate SQL queries more easily. In August, Couchbase announced Capella iQ, a generative AI tool that helps developers write database application code. Also in August, SnapLogic unveiled its SnapGPT AI tool, which allows users to build data pipelines using natural language. Alation announced its Allie AI tool in October to help improve productivity across its suite of data catalog and governance tools.
In addition to integrating AI-powered assistants, database vendors have added new features to help enable large language models (LLMs). A database can act as a knowledge base for retrieval-augmented generation (RAG) by providing functionality typically associated with vector databases. This functionality typically includes supporting vector embeddings as a data type and providing vector search. Many database vendors added vector search support in 2023, including Rockset, Neo4j, Oracle Database 23c, MongoDB, and SingleStore.
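As a rough illustration of what vector search means in this context, the following minimal Python sketch ranks stored embeddings by cosine similarity to a query embedding. Everything in it is an assumption made for illustration: the document IDs, the three-dimensional embeddings, and the function names are hypothetical, and no vendor mentioned above is claimed to implement search this way. Production systems typically use approximate nearest-neighbor indexes rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, documents, k=2):
    """Return the ids of the k documents most similar to the query embedding."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, doc["embedding"]),
        reverse=True,
    )
    return [doc["id"] for doc in ranked[:k]]

# Hypothetical toy data: real embeddings come from an embedding model
# and usually have hundreds or thousands of dimensions.
DOCUMENTS = [
    {"id": "pricing-faq", "embedding": [0.9, 0.1, 0.0]},
    {"id": "api-reference", "embedding": [0.1, 0.9, 0.2]},
    {"id": "billing-guide", "embedding": [0.8, 0.2, 0.1]},
]

query_embedding = [1.0, 0.0, 0.0]  # stand-in for an embedded user question
top_ids = vector_search(query_embedding, DOCUMENTS, k=2)
print(top_ids)
```

In a RAG setup, the text of the top-ranked documents would then be added to the prompt sent to the LLM, so that the model answers from retrieved data rather than from its training data alone.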
Data lakehouse momentum continues to grow
Data lakehouses are growing in popularity. They use cloud object storage as a data lake while supporting data analytics uses similar to those of a data warehouse.
Databricks pioneered the basic concept of a data lakehouse in 2020, and other companies have entered the market in the years since. Databricks advanced its data lakehouse efforts with multiple updates during 2023, the most notable of which was the release of Delta Lake 3.0 in June. Delta Lake is one of the three major open source data lake table formats, along with Apache Iceberg and Apache Hudi.
To limit potential disruption and lock-in risks across the three open source data lake table formats, the OneTable open source project announced an interoperable metadata layer across Hudi, Delta Lake, and Iceberg. Apache Hudi vendor Onehouse launched OneTable with support from Google and Microsoft.
Oracle got into the lakehouse act in July with the launch of the MySQL HeatWave Lakehouse service. MySQL HeatWave is a service that combines operational and analytical database capabilities in a single integrated database, an approach that is itself a growing trend across the industry.
Data governance and observability remain top priorities
Whether it's for AI, data operations, or analytics, data governance is becoming increasingly important.
Being able to understand where your data comes from and how it is made available and used is important for security, privacy, accuracy, and reliability. Throughout 2023, multiple vendors expanded and enhanced their data governance capabilities to help manage data.
Seeking to strengthen data governance, Informatica acquired startup Privitar in June to help cloud data platform vendors improve their capabilities. Collibra, meanwhile, improved its data quality, lineage, and detection capabilities.
In November, Starburst updated its Galaxy cloud service with automated data governance powered in part by GenAI.
Observability is part of being able to manage data effectively. The rise of generative AI and vector databases in 2023 increases the importance of being able to observe and manage the data used for AI going forward. In November, Monte Carlo launched new data observability features specifically focused on vector databases.
Investment capital growth slows down
One of the many indicators of the health of the data management industry is the pace of fundraising activity for emerging vendors.
Although the volume of funding events was lower than the previous two years, several data platform vendors secured large funding rounds throughout 2023 to drive expansion and innovation.
Early in the year, InfluxData, creator of the InfluxDB time series database, secured $81 million in a February funding round. The company released InfluxDB 3.0 in April and added new deployment options, including InfluxDB Clustered for private cloud and on-premises environments.
Onehouse raised $25 million in February for its OneTable initiative to accelerate data lakehouse interoperability. Databricks raised $500 million in September and plans to use the funding for generative AI-focused research and development and for geographic growth. Databricks has introduced new tools to build generative AI applications that leverage customers' own data, including vector search and RAG pipelines.
Also in September, Denodo received a $336 million equity investment from private equity firm TPG Growth. Denodo recently added new data governance features, including data lineage, and launched a free tier to attract new users.
Data management should continue to be the foundation of data analytics, operations, and AI efforts in 2024 and beyond. Further integrating generative AI into data platforms, including data lakehouse efforts, makes sense for both vendors and users to improve efficiency and achieve more with less.
Sean Michael Kerner is an IT consultant, technology enthusiast, and tinkerer. He has pulled Token Ring, configured NetWare, and been known to compile his own Linux kernel. He consults with industry and media organizations on technology issues.