Generative AI is a key feature of nearly every new software and hardware project, so it's no surprise that Microsoft is developing its own machine learning models. VASA-1 is one such example: a model capable of converting a single image and an audio track of a person into a convincing video clip of that person speaking the recording.
Just a few years ago, anything created by generative AI was instantly identifiable. Still images gave themselves away through simple tells such as the wrong number of fingers on a hand or the wrong number of feet on a body. AI-generated video was even worse, though at least it was meme-worthy.
However, Microsoft's research report shows that this instantly recognizable quality of generative AI output is rapidly disappearing. VASA-1 is a machine learning model that converts a single still image of a person's face into a short, realistic video driven by a voice audio track. The model analyzes changes in the tone and pacing of the audio and generates a series of new frames in which the face is modified to match it.
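To make that pipeline concrete, here is a deliberately simplified sketch of audio-driven face animation. This is not VASA-1's actual method (the real model uses a far more sophisticated learned motion generator); it is a toy illustration of the general idea described above: slice the audio into one window per video frame, extract a simple loudness feature, and map it to a facial parameter (here, a hypothetical "mouth openness" value) applied to the single source portrait.

```python
# Toy illustration only; VASA-1's real architecture is far more complex.
# The idea: each video frame is the source portrait plus facial motion
# parameters derived from the matching slice of the audio track.

def rms(samples):
    """Root-mean-square loudness of one audio window."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def audio_to_windows(audio, sample_rate=16000, fps=25):
    """Split raw audio samples into one window per video frame."""
    window = sample_rate // fps
    return [audio[i:i + window]
            for i in range(0, len(audio) - window + 1, window)]

def animate(portrait, audio, sample_rate=16000, fps=25):
    """Return one frame descriptor per audio window.

    `portrait` stands in for the single source image; each frame pairs
    it with a mouth-openness value (0.0 to 1.0) driven by the audio's
    loudness envelope, normalized against the loudest window.
    """
    windows = audio_to_windows(audio, sample_rate, fps)
    peak = max(rms(w) for w in windows) or 1.0  # guard all-silent audio
    return [{"image": portrait, "mouth_open": rms(w) / peak}
            for w in windows]

# One second of fake audio: half a second of silence, then "speech".
audio = [0.0] * 8000 + [0.5] * 8000
frames = animate("portrait.png", audio)
print(len(frames))               # 25 frames for one second at 25 fps
print(frames[0]["mouth_open"])   # silent window -> 0.0
print(frames[-1]["mouth_open"])  # loudest window -> 1.0
```

A real system would replace the loudness feature with learned audio embeddings capturing phonemes, tone, and pacing, and would render photorealistic frames rather than parameter dictionaries, but the frame-by-frame audio-to-motion mapping is the same basic shape.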
Some of the examples Microsoft has posted are surprisingly good; that description doesn't do them justice. Others, however, are less convincing, and it's clear the researchers selected their best results to showcase the work. In particular, a short video demonstrating real-time use of the model shows that we still have some way to go before physical reality and computer-generated reality become indistinguishable.
Still, the fact that all of this was done on a desktop PC, albeit one with an RTX 4090, rather than a massive supercomputer, shows that just about anyone with access to such software could create convincing deepfakes. The researchers acknowledge this in their report.
"It is not our intention to create content that could be used to mislead or deceive. However, like other related content generation techniques, it could still be misused to impersonate humans. We oppose any behavior that creates misleading or harmful content about real people, and we are interested in applying our technique to advance forgery detection."
Perhaps this is why Microsoft's research is currently being kept behind closed doors. That said, I can't imagine it will be long before someone succeeds not only in replicating the work but in improving on it, and possibly using it for nefarious purposes. On the other hand, if VASA-1 could be turned to detecting deepfakes and packaged as a simple desktop application, that would be a huge step forward. A step away, in fact, from a world where AI dooms us all. Yay!