Data modeling with Rich Fox
Analytics consultant and educator Rich Fox explains the ins and outs of data modeling, including why it remains an essential practice in the age of AI.
In this “Speaking of Data” podcast, Rich Fox explains the ins and outs of data modeling and why it's still an essential practice in the age of AI. Fox has nearly 25 years of experience in data and analytics as a practitioner, consultant, and educator, and he is on the faculty of TDWI. [Editor’s note: Speaker quotations have been edited for length and clarity.]
“I come from the business side,” Fox said. “So answering business questions is still important, and that requires data. What I tell people is that you can do all kinds of interesting analysis with data, but data is so important because you can't do anything without it.” He defined data modeling simply as collecting disparate data sources from within and outside the company and combining them for business and analytical purposes.
Fox noted that a key recent change has been the increased use of external third-party data to mix with internal first-party data. “The reason this is important is because a lot of the data on the source systems is not clean,” he said. “This does not mean the data is inaccurate, but rather that it does not match fields from other sources.
“A good example of this is dates. Some sources may have month/day/year and others may have day/month/year, and you can't simply merge them. Some sources may have leading zeros in the customer ID field, while others may not. We run into these issues all the time, so one of the main purposes of data modeling is to create multidimensional data models for various purposes such as analysis.”
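The two mismatches Fox describes can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a method described in the podcast: the field names (`customer_id`, `order_date`, `visit_date`), the sample rows, and the helper functions are all hypothetical, and it assumes each source system's date convention is already known.

```python
from datetime import datetime


def normalize_date(raw: str, day_first: bool) -> str:
    """Convert a slash-separated date string to ISO 8601 (YYYY-MM-DD)."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(raw.strip(), fmt).date().isoformat()


def normalize_customer_id(raw: str) -> str:
    """Drop leading zeros so that '00123' and '123' compare equal."""
    stripped = raw.strip().lstrip("0")
    return stripped or "0"  # an all-zero ID collapses to a single "0"


def conform(rows, date_field, day_first):
    """Map each normalized customer ID to its normalized date."""
    return {
        normalize_customer_id(row["customer_id"]): normalize_date(
            row[date_field], day_first
        )
        for row in rows
    }


# Hypothetical rows from two source systems (assumed for illustration only).
source_a = [{"customer_id": "00123", "order_date": "03/04/2024"}]  # month/day/year
source_b = [{"customer_id": "123", "visit_date": "04/03/2024"}]  # day/month/year

a = conform(source_a, "order_date", day_first=False)
b = conform(source_b, "visit_date", day_first=True)

# Only after both fields are conformed can the two sources be matched on customer ID.
matched = {cid: (a[cid], b[cid]) for cid in a.keys() & b.keys()}
print(matched)
```

Without the normalization step, the customer IDs would not match at all, and the two date strings would appear to refer to different days even though both mean March 4, 2024.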
Fox said new competitive threats to businesses always exist and usually need to be addressed using some new data source. He gave the example of working for a restaurant group in 2020 during the coronavirus outbreak. The company ultimately obtained epidemiological data from Johns Hopkins University to assess the pandemic's impact on 750 restaurants and determine how best to respond, down to specific regions.
He added that many companies still struggle with modeling their data, especially since so many organizations are moving to the cloud. “Many companies simply lifted their existing data warehouse architecture and migrated it when moving to the cloud,” he said. “They didn't take advantage of the opportunity to optimize their models for the cloud. Now they want to revisit and upgrade their models and take advantage of cloud technology.”
When asked about the potential impact of generative AI on data modeling, Fox gave a measured answer. “AI has the potential to have the same impact as the industrial revolution 100 years ago, but I don't think it will happen anytime soon,” he said. “About 12 years ago, I was lucky enough to work on the team that was deploying IBM's Watson, the AI that won Jeopardy, and was able to see the behind-the-scenes operations. One of the areas was healthcare, looking at how Watson could help doctors diagnose cancer or pre-cancerous conditions faster. Even for an effort that large, developing and training the algorithm would take years.”
He explained that AI will have a significant impact on the way business is done and on daily tasks and processes. After all, the cost of doing business continues to rise, he noted, but companies can only raise prices so far before customers go elsewhere. As a result, the only solution is to make operations more efficient and lower costs, and that's where he said improvements such as AI and robotics can help.
For more information about Rich's upcoming TDWI Data Modeling Seminar (April 8-10, 2024), visit TDWI.org.