Testing machine learning approaches using simulation
Simulation is a powerful tool in your data science toolbox. This is the first part of a multi-part series about the different ways simulations can help in data science and machine learning. This article describes how to use simulation to test machine learning approaches.
Specifically, we'll show you how to use simulation in three ways:
- Test machine learning approaches
- Compare the performance of different machine learning models
- Evaluate model behavior in different situations
Before getting into this particular application of data simulation, let's define what data simulation is.
What is data simulation?
The definition of data simulation is very simple. It is the creation of fictitious data that mimics the characteristics of real-world data.
When would you want to simulate data?
- When you want a known “answer” that cannot be observed in the real world. With real-world data, you can only infer the relationship between X and y. With simulated data, you create the relationship between X and y yourself, so you can use this known “answer” to test whether your machine learning and analytics approaches discover the relationship you simulated.
- When you don't have actual data or have very limited data
- When you want to simulate something that has never happened before
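The first case, testing against a known “answer”, can be sketched in a few lines. This is a minimal illustration rather than a prescribed method: the linear relationship, the coefficients (`TRUE_SLOPE`, `TRUE_INTERCEPT`), the noise level, and the choice of a least-squares line fit are all assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable

# Hypothetical "true" relationship that we choose ourselves: y = 3x + 2 + noise
TRUE_SLOPE = 3.0
TRUE_INTERCEPT = 2.0

n_samples = 1_000
X = rng.uniform(0, 10, size=n_samples)      # simulated feature
noise = rng.normal(0, 1, size=n_samples)    # random noise around the line
y = TRUE_SLOPE * X + TRUE_INTERCEPT + noise

# Fit a straight line by least squares and compare with the known answer.
# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
est_slope, est_intercept = np.polyfit(X, y, deg=1)

print(f"true slope {TRUE_SLOPE}, estimated {est_slope:.3f}")
print(f"true intercept {TRUE_INTERCEPT}, estimated {est_intercept:.3f}")
```

Because the relationship was created rather than inferred, any gap between the estimated and true values reflects the approach and the noise, not uncertainty about the underlying truth.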
Simulated data is usually created with some degree of randomness. Typically, that randomness is drawn from probability distributions based on observed data or domain knowledge. For example, to simulate the productivity of orange trees, you can randomly sample from a distribution of orange tree productivity. You can build that distribution empirically from observations (if you have a data set of productivity for many orange trees), or you can use a statistical distribution that describes productivity. For instance, you might assume that orange tree productivity is normally distributed with a mean of 150 pounds and a standard deviation of 24 pounds (this is completely made up, please don't fact check!).
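As a rough sketch of the orange tree example, both options might look like this in NumPy. The mean of 150 pounds comes from the made-up figures above, and the 24 pounds is treated here as a standard deviation. The `observed_yield` values are placeholders invented for illustration, not real measurements.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the run is repeatable

# Option 1: sample from an assumed statistical distribution.
MEAN_LBS = 150  # made-up mean from the example above
SD_LBS = 24     # assumption: the stated "deviation" is a standard deviation
simulated_yield = rng.normal(loc=MEAN_LBS, scale=SD_LBS, size=10_000)

# Option 2: resample with replacement from observed data.
# These observations are hypothetical placeholders, not real measurements.
observed_yield = np.array([128.0, 141.5, 150.2, 163.8, 155.1, 172.4, 137.9, 149.0])
resampled_yield = rng.choice(observed_yield, size=10_000, replace=True)

print(f"parametric: mean {simulated_yield.mean():.1f}, sd {simulated_yield.std():.1f}")
print(f"resampled:  mean {resampled_yield.mean():.1f}, sd {resampled_yield.std():.1f}")
```

The parametric option can produce values that were never observed, while resampling only ever returns values already in the data set, so the choice depends on how much you trust the assumed distribution versus the observations.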