It would be easy to think that Apple is late to the AI game. Since late 2022, when ChatGPT took the world by storm, most of Apple's competitors have been falling over themselves to catch up. Apple has certainly talked about AI, and has even released some products with it in mind, but it seemed to be dabbling rather than tackling it head-on.
But rumors and reports over the past few months suggest that Apple was actually just waiting for the right moment to make its move. In recent weeks, there have been reports that Apple is in talks with both OpenAI and Google about powering some of its AI features, and that the company is also working on its own model, called Ajax.
A look through Apple's published AI research provides a glimpse into how its approach to AI might come to fruition. Of course, making product predictions based on research papers is a deeply inexact science — the line from research to store shelves is windy and full of potholes. But you can at least get a sense of what the company is thinking about, and of how its AI features might work when Apple starts talking about them at its annual developer conference, WWDC, in June.
Smaller, more efficient models
I'd bet you and I want the same thing: a better Siri. And it sounds like a better Siri is coming. There's an assumption in a lot of Apple's research (and in a lot of the tech industry, the world, and everywhere) that large language models will immediately make virtual assistants better and smarter. For Apple, getting to a better Siri means making those models as fast as possible — and making sure they're everywhere.
In iOS 18, Apple plans to have all of its AI features running in an on-device, fully offline model, Bloomberg recently reported. Building a good multipurpose model is tough even when you have a network of data centers and thousands of state-of-the-art GPUs; it's drastically harder to do with just the guts inside your smartphone. So Apple is having to get creative.
In a paper called “LLM in a flash: Efficient Large Language Model Inference with Limited Memory” (all of these papers have really boring titles but are genuinely interesting, I promise!), researchers devised a system for storing a model's data on the device's SSD rather than in RAM. “We have demonstrated the ability to run LLMs up to twice the size of available DRAM [on the SSD],” the researchers wrote, “achieving an acceleration in inference speed by 4-5x compared to traditional loading methods in CPU, and 20-25x in GPU.” By taking advantage of the storage already available on the device, they found, models can run faster and more efficiently.
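To get a feel for the general idea — keep the big weight matrix on storage and pull in only the rows you need, instead of loading everything into RAM — here's a toy sketch of my own in Python using memory-mapped files. It is not Apple's implementation; the shapes and row indices are invented for illustration.

```python
import os
import tempfile
import numpy as np

# Pretend this matrix is too big to keep resident in RAM.
rows, cols = 10_000, 512
path = os.path.join(tempfile.mkdtemp(), "weights.npy")
weights = np.random.default_rng(0).standard_normal((rows, cols)).astype(np.float32)
np.save(path, weights)
del weights  # simulate limited DRAM: the full matrix is only on "flash" now

# mmap_mode="r" maps the file into memory lazily; pages are read from
# storage only when the corresponding rows are actually accessed.
mapped = np.load(path, mmap_mode="r")
active_rows = [3, 42, 999]               # e.g. rows needed for the current token
chunk = np.asarray(mapped[active_rows])  # only these pages come off the disk
print(chunk.shape)  # (3, 512)
```

The real paper is about much more than memory mapping — it reorders and batches flash reads to match how LLM inference actually accesses weights — but the core bet is the same: storage is plentiful, RAM is not.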
Apple researchers also created a system called EELBERT that can essentially compress an LLM to a much smaller size without making it meaningfully worse. Their compressed version of Google's Bert model was 15 times smaller — only 1.2 megabytes — and saw only a 4 percent reduction in quality. It did come with some latency tradeoffs, though.
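For flavor, here's a hypothetical sketch of one standard way to shrink a model's embedding tables — the generic “hashing trick,” where a huge per-token lookup table is replaced by a small table indexed by a hash. This is not necessarily EELBERT's exact scheme, and every number below is made up.

```python
import numpy as np

VOCAB = 30_000   # tokens a full embedding table would need rows for
BUCKETS = 512    # rows we actually store
DIM = 64

rng = np.random.default_rng(0)
table = rng.standard_normal((BUCKETS, DIM)).astype(np.float32)

def embed(token_id: int) -> np.ndarray:
    # Many tokens share each bucket; the model learns to live with collisions.
    return table[hash(token_id) % BUCKETS]

full_bytes = VOCAB * DIM * 4    # float32 table for the whole vocabulary
small_bytes = BUCKETS * DIM * 4
print(full_bytes / small_bytes)  # roughly 58x smaller in this toy setup
```

The tradeoff is exactly the one the paper describes: you give up a little quality (collisions blur the distinctions between tokens) in exchange for a dramatically smaller model.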
In general, Apple is working to solve a core tension in the model world: the bigger a model gets, the better and more useful it can be, but also the more unwieldy, power-hungry, and slow it becomes. Like so many other companies, Apple is trying to find the right balance between all those things — while also looking for a way to have it all.
Siri, but good
We talk a lot about virtual assistants when we talk about AI products — assistants that understand things, remind us of things, answer questions, and get stuff done on our behalf. So it's not exactly shocking that a lot of Apple's AI research boils down to a single question: what if Siri was really, really, really good?
One group of Apple researchers has been working on a way to use Siri without needing a wake word at all; instead of listening for “Hey Siri” or “Siri,” the device might be able to simply intuit whether you're talking to it. “This problem is significantly more challenging than voice trigger detection,” the researchers acknowledged, “since there might not be a leading trigger phrase that marks the beginning of a voice command.” That might be why another group of researchers developed a system to detect wake words more accurately, and why another paper trained a model to better understand the rare words that assistants often mishear.
In both cases, the appeal of an LLM is that it can, in theory, process much more information much more quickly. In the wake-word paper, for example, the researchers found that rather than trying to discard all unnecessary sound, they could feed everything to the model and let it sort out what mattered and what didn't — and the wake word worked far more reliably.
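You can sketch the difference between those two pipelines in a few lines of toy Python. Everything here — the energy gate, the template “model,” the thresholds — is invented for illustration; the point is only that a pre-filter can throw away a quiet wake word before the model ever sees it.

```python
import numpy as np

def detect_prefiltered(frames, scorer, energy_gate=0.5):
    """Old-style pipeline: drop 'unnecessary' quiet frames, then score the rest."""
    kept = [f for f in frames if np.abs(f).mean() >= energy_gate]
    return any(scorer(f) > 0.9 for f in kept)

def detect_end_to_end(frames, scorer):
    """Feed-everything pipeline: the model sees all frames and weighs them itself."""
    return any(scorer(f) > 0.9 for f in frames)

# Stand-in "model": cosine similarity against a stored wake-word template.
template = np.array([0.2, -0.1, 0.3, -0.2], dtype=np.float32)

def scorer(frame):
    t = template / np.linalg.norm(template)
    f = frame / (np.linalg.norm(frame) + 1e-9)
    return float(np.dot(t, f))

noise = np.array([1.0, 1.0, -1.0, 1.0], dtype=np.float32)  # loud, not the wake word
quiet_wake = 0.05 * template                               # the wake word, whispered
frames = [noise, quiet_wake, noise]

print(detect_prefiltered(frames, scorer))  # False: the gate discarded the whisper
print(detect_end_to_end(frames, scorer))   # True: the model got to hear everything
```

Cosine similarity is scale-invariant, so the whispered wake word still scores a perfect match — as long as something is allowed to look at it.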
And once Siri does hear you, Apple has done a bunch of work to make sure it understands you and communicates better. In one paper, researchers developed a system called STEER (which stands for Semantic Turn Extension-Expansion Recognition, so we'll go with STEER) that aims to improve your back-and-forth with an assistant by trying to figure out when you're asking a follow-up question and when you're asking a new one. In another, researchers use LLMs to better handle ambiguous queries and figure out what you mean no matter how you say it. “In uncertain circumstances,” they wrote, “intelligent conversational agents may need to take the initiative to reduce their uncertainty by asking good questions proactively, thereby solving problems more effectively.” Yet another paper aims to help with that, too: its researchers used LLMs to make assistants' answers less verbose and easier to understand.
AI in health, image editors, and Memoji
When Apple does talk publicly about AI, it tends to focus less on raw technological might and more on the day-to-day things AI can actually do for you. So while there's a lot of focus on Siri — especially as Apple looks to compete with devices like the Humane AI Pin and the Rabbit R1, and as Google continues to roll Gemini into Android — Apple seems to see plenty of other ways AI can be useful.
One obvious area of focus for Apple is health: LLMs could, in theory, help wade through the oceans of biometric data collected by your various devices and help you make sense of it all. So Apple has been researching how to collect and collate all of your motion data, how to use gait recognition and your headphones to identify you, and how to track and understand heart rate data. Apple also created and released “the largest multi-device multi-location sensor-based human activity dataset” available, after collecting data from 50 participants wearing multiple on-body sensors.
Apple also seems to imagine AI as a creative tool. For one paper, researchers interviewed a bunch of animators, designers, and engineers and built a system called Keyframer that “enable[s] users to iteratively construct and refine generated designs.” Instead of typing a prompt to get an image, then typing another prompt to get another image, you start with a prompt and then get a toolkit for tweaking and refining parts of the image to your liking. You could imagine this kind of back-and-forth artistic process showing up anywhere from the Memoji creator to some of Apple's more professional artistic tools.
In another paper, Apple describes a tool called MGIE that lets you edit an image just by describing the edits you want to make. (“Make the sky more blue,” “make my face less weird,” “add some rocks,” and so on.) “Instead of brief but ambiguous guidance, MGIE derives explicit visual-aware intention and leads to reasonable image editing,” the researchers wrote. Its initial experiments weren't perfect, but they were impressive.
We might even get some AI in Apple Music: for a paper called “Resource-Constrained Stereo Singing Voice Cancellation,” researchers explored ways to separate voices from instruments in songs — which could come in handy if Apple wants to give people tools to, say, remix songs the way you can on TikTok or Instagram.
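The paper's method is far more sophisticated, but there's a classic, much simpler trick in the same spirit worth sketching: center-channel cancellation. Vocals are usually mixed dead-center in a stereo track, so subtracting one channel from the other cancels them while side-panned instruments survive. This toy example (not Apple's technique) builds such a mix and strips the “vocal” out of it.

```python
import numpy as np

def cancel_center(stereo):
    """stereo: array of shape (n_samples, 2). Returns a mono 'karaoke' track
    with center-panned content (typically the vocal) cancelled out."""
    return stereo[:, 0] - stereo[:, 1]

# Build a toy mix: a center-panned "vocal" plus a left-panned "instrument".
t = np.linspace(0, 1, 8000, endpoint=False)
vocal = np.sin(2 * np.pi * 220 * t)    # identical in both channels (center)
guitar = np.sin(2 * np.pi * 330 * t)   # left channel only
left = vocal + guitar
right = vocal.copy()
mix = np.stack([left, right], axis=1)

karaoke = cancel_center(mix)  # (vocal + guitar) - vocal == just the guitar
```

The catch — and the reason real systems like the one in the paper use learned models instead — is that this trick also cancels anything else mixed to the center (bass, kick drum) and falls apart the moment the vocal isn't panned perfectly.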
Over time, I'd bet, this is the kind of thing we'll see Apple lean into, especially on iOS. Some of it Apple will build into its own apps; some it will offer to third-party developers as APIs. (The recent Journaling Suggestions feature is probably a good guide to how that might work.) Apple has always touted its hardware capabilities, particularly compared to your average Android device; pairing all that horsepower with on-device, privacy-focused AI could be a big differentiator.
But if you want to see the biggest, most ambitious AI thing happening at Apple, you need to know about Ferret. Ferret is a multimodal large language model that can take instructions, focus on something specific you've circled or otherwise selected, and understand the world around it. It's designed for the now-normal AI use case of asking a device about the world around you, but it might also be able to understand what's on your screen. In the Ferret paper, researchers show that it could help you navigate apps, answer questions about App Store ratings, describe what you're looking at, and more. This has really exciting implications for accessibility, but it could also someday completely change the way you use your phone — and your Vision Pro and smart glasses, too.
We're getting way ahead of ourselves here, but you can imagine how this works with some of the other things Apple is working on. A Siri that can understand what you want, paired with a device that can see and understand everything happening on your display, is a phone that can literally use itself. Apple wouldn't need deep integrations with everything; it could simply run the apps and tap the right buttons automatically.
Again, all this is just research, and for all of it to work well starting this spring would be a legitimately unheard-of technical achievement. (I mean, you've used chatbots — you know they're not great.) But I'd bet you anything we're going to get some big AI announcements at WWDC. Apple CEO Tim Cook teased as much in February and basically promised it on this week's earnings call. And two things are very clear: Apple is very much in the AI race, and it might amount to a total overhaul of the iPhone. Heck, AI might even make you actually want to use Siri! And that would be quite an accomplishment.