Here is an example: Where should my child file taxes if she goes to college out of state? When I asked, TurboTax's “Intuit Assist” bot offered unrelated advice about tax credits and extensions. H&R Block's “AI Tax Assist” bot gave me the false impression that she had to file in both places. (Correct answer: She files in another state only if she has income there.)
Question after question, I got many of the same nonsensical, misleading or inaccurate AI answers.
What happened? We've all heard about the incredible potential of generative AI. But now, as companies pack still-experimental AI into everyday things, we have to navigate a parade of terrible AI products. As consumers, it is up to us to decide how to evaluate each new AI we encounter. (See an AI that needs investigating? Send us an email.)
The good news is that you can ignore chatbots completely and still pay your taxes. My experience shows us that we need to be especially wary when generative AI has real-world consequences if it goes wrong. And we can't necessarily trust companies experimenting with AI to make the right decisions to protect our interests.
I tested TurboTax and H&R Block's AI with the help of real tax experts hired from EP Wealth Advisors, an independent wealth management firm. TurboTax's self-help AI (accessed by clicking the question mark in a circle in the top right corner of the screen) got more than half of the 16 test questions wrong. Most of the time, I got a totally irrelevant response. After I shared the results with TurboTax's maker, Intuit, the company changed some of the ways the bot selects answers. But that new version of Intuit Assist still didn't help with a quarter of the questions.
H&R Block's AI returned unhelpful answers to more than 30% of questions. Although it performed well on 529 plans and mortgage deductions, it confidently recommended incorrect filing statuses and misrepresented IRS guidance regarding cryptocurrency.
“I feel very secure in my job as a tax professional,” said Beverly Goodman, a tax manager who helped analyze the AI advice.
Both companies include text below their chatbots explaining that they are in development. “Intuit Assist is still in development and we will continue to improve it with your help,” says TurboTax's. “AI Tax Assist is a digital helper that is still learning, so check all the answers,” says H&R Block's.
That much is clear. If the fine print on a product says “Do not trust us,” then you shouldn't trust it.
Neither company uses AI to actually calculate people's taxes, only to answer questions and help people understand their tax liability, so the tax preparers say the risk is limited. Both companies disputed my testing, saying the AI's answers should not be viewed in isolation from their broader tax preparation services, such as traditional software that checks returns. Both companies offer a way to ask a human agent questions, though with TurboTax that usually costs extra.
But imagine if someone relied on incorrect advice from a chatbot to make important decisions about what to report, or even which filing status to use. Getting audited is scary. So, to a lesser extent, is wasting even more time filing your taxes. Wrong and unhelpful answers are wrong and unhelpful, period.
My results are anecdotal. Both companies declined to disclose their own data on the accuracy and relevance of their chatbots. But if a journalist and a handful of tax experts could so easily spot holes in their AI, one wonders why these companies didn't do so themselves. Or worse, maybe they just didn't care.
How did tax bots go wild?
Both Intuit and H&R Block say the goal of integrating AI into their products is to help users who might otherwise turn to Google with their questions and get answers that may be unreliable. That might save you time and the cost of talking to an expert (though I suspect these companies are also trying to cut their own costs).
However, the performance of these chatbots shows the limitations of the current state of the art in AI, which should be a red flag for any product where accuracy is paramount.
Companies are using generative AI in a variety of ways. Intuit has integrated it into TurboTax for several purposes, including providing self-help for your questions, providing additional feedback during the filing process, providing customized instructions for your completed filing, and translating the service into Spanish. My test focused on the first part, Q&A, because it seemed the most useful.
The answers you receive are a mix of written answers and links to pages created by TurboTax staff and a self-help community of experts. Intuit said the bot is a hybrid of long-standing self-help systems and new generative AI technology designed to help people with less common questions.
When I first tested it, the AI frequently directed me to results written by the TurboTax community. However, it was very bad at finding relevant answers, and sometimes linked to pages that left a false impression of what the answer should be. It reminded me of the useless old Clippy from Microsoft Office.
Why did the AI miss the mark so much? “Our legacy technology was positioned as the first place for Intuit Assist to pull answers,” said Intuit spokesperson Karen Nolan. “Now, we've updated this so that Intuit Assist self-help answers are retrieved first through the more advanced multifaceted search feature.”
After Intuit updated the software, my second test yielded more accurate and clearer answers, including to questions about education tax credits and claiming a new air conditioner. However, the chatbot still returned an answer and a link to irrelevant information a quarter of the time.
H&R Block's AI is a more visible part of the website and is more like a typical chatbot in the vein of ChatGPT. H&R Block told me that it uses foundational technology from Microsoft and has specially trained it to get answers from its in-house tax lab.
Its answers to my questions were more on-topic, but it was also more likely to make up incorrect answers. For example, at the recommendation of a tax professional, I asked a specific question about cryptocurrency: Do I need to report so-called wash transactions that net out to zero? H&R Block's bot said the IRS hasn't addressed this question. That is not true: The IRS has addressed it, and the wash sale rule does not apply to cryptocurrencies.
The company says AI Tax Assist answers are hand-picked based on the most common questions asked by customers over the past year, so it “may not be able to fully deal with” “niche” questions such as those about cryptocurrency sale rules.
H&R Block also suggested that my questions lacked the “specificity and clarity” that AI needs to be effective.
Imagine being given such an excuse by a human tax agent.
Users are not guinea pigs
Clearly, both bots needed more work. So why include them in a product used by tens of millions of people who are stressed about paying actual taxes?
The companies seem to have a different standard of responsibility than I do. They think it's okay to give bad answers to some people as long as other people get good ones and the system keeps improving. In other words, we are their guinea pigs.
The Q&A feature I tested is still in the “very early stages” and is “just one part of the broader use of AI in TurboTax,” says Intuit's Nolan. “It's only the first year, but we continue to innovate with Intuit Assist and how we can better serve our customers,” she said.
My take: If AI is built into a product, it needs to work.
H&R Block spokeswoman Terry Daly wrote in an email: “Will it happen? Probably not. The main reason for this is that our country's tax system is extremely complex and the wording is ambiguous.”
She continued: “We believe we are being very careful by placing language that prompts individuals to confirm their answers directly above the prompt area. We feel strongly that other features, such as the guided interviews built into our DIY software and allowing individuals to connect to a tax pro for free, will ultimately ensure correct filing.”
Including a disclaimer is not being careful. It shifts the blame onto users who are probably not in a position to assess the situation. Taking ownership means working behind the scenes to test and improve your product until it can give the right answers to questions and say “I don't know” when it doesn't know.
H&R Block also touted the fact that it provides access to human agents at no additional cost. That's all well and good, but for complex tax topics, you often need to already be a tax expert to know that one of the AI's answers is questionable.
Shouldn't companies that market AI as an information source be held liable for harms that result from it? Unfortunately, without stronger laws, both Intuit and H&R Block may get away with giving you wrong advice. A disclaimer in a company's fine print may be enough to protect it from liability.
At least in one respect, both companies are doing the right thing. Both told me they guarantee the accuracy of their tax returns and will help customers who are audited because they listened to bad AI advice. Let's hope that help comes from humans and not chatbots.