Each sample in the TorNet dataset contains six types of radar images representing different radar data products. Shown here are two of those products, reflectivity and radial velocity, from an example tornado in the dataset. Credit: Massachusetts Institute of Technology
As spring arrives in the Northern Hemisphere, tornado season begins. A tornado's funnel of dust and debris seems like an unmistakable sight. But that sight can be hidden from radar, the meteorologist's main tool. It is difficult to know exactly when a tornado has formed, or even why.
A new dataset may hold answers. It contains radar returns from thousands of tornadoes that have struck the United States over the past decade. The storms that produced tornadoes are flanked by other severe storms, including some with nearly identical conditions that never did. Researchers at MIT Lincoln Laboratory, who curated the dataset, called TorNet, have released it as open source. They hope to enable breakthroughs in detecting one of nature's most mysterious and violent phenomena.
“Many advances are being driven by readily available benchmark datasets, and we expect TorNet to lay the foundation for machine learning algorithms for both tornado detection and prediction,” said Mark Veillette, co-principal investigator on the project with James Kurdzo. Both researchers are part of the Air Traffic Control Systems Group.
Along with the dataset, the team is releasing a model trained on it. The model shows promise for machine learning's ability to detect twisters. This research could break new ground for forecasters, helping them provide more accurate warnings that could save lives.
Swirling uncertainty
About 1,200 tornadoes occur in the United States each year, causing millions to billions of dollars in economic damage and killing an average of 71 people. Last year, an unusually long tornado killed 17 people and injured at least 165 others along a 59-mile path in Mississippi.
However, tornadoes are notoriously difficult to predict because scientists don't know exactly why they form. “We can see two storms that look identical, and one will produce a tornado and one won't. We don't fully understand it,” Kurdzo said.
Tornadoes need two basic ingredients: a thunderstorm with instability caused by rapidly rising warm air, and wind shear that causes rotation. Weather radar is the main tool used to monitor these conditions. But tornadoes sit too low to be detected, even when reasonably close to the radar. As a radar beam at a given tilt angle travels farther from the antenna, it rises higher above the ground, so it mostly sees reflections from rain and hail carried in the “mesocyclone,” the storm's broad, rotating updraft. A mesocyclone does not necessarily produce a tornado.
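The geometry behind that limitation can be sketched with the 4/3-effective-Earth-radius approximation commonly used in radar meteorology to estimate beam height under standard atmospheric refraction. The 0.5-degree tilt, the ranges, and the function name below are illustrative assumptions, not values taken from TorNet.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius
K_EFFECTIVE = 4.0 / 3.0       # standard-refraction correction (an approximation)


def beam_height_m(range_m: float, elevation_deg: float,
                  antenna_height_m: float = 0.0) -> float:
    """Approximate height of the beam centre above the radar site, in metres."""
    effective_radius = K_EFFECTIVE * EARTH_RADIUS_M
    theta = math.radians(elevation_deg)
    slant = math.sqrt(
        range_m ** 2
        + effective_radius ** 2
        + 2.0 * range_m * effective_radius * math.sin(theta)
    )
    return slant - effective_radius + antenna_height_m


if __name__ == "__main__":
    # Assumed tilt of 0.5 degrees; ranges chosen only for illustration.
    for range_km in (20, 60, 100, 150):
        height = beam_height_m(range_km * 1000.0, elevation_deg=0.5)
        print(f"{range_km:>4} km from the radar: beam centre about {height:,.0f} m up")
```

Under these assumptions the beam centre is already roughly 1.5 km above the ground at 100 km range, well above where a tornado's near-surface circulation would be.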
With this limited view, forecasters must decide whether to issue a tornado warning. They often err on the side of caution. As a result, the false alarm rate for tornado warnings exceeds 70%.
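A figure like this is commonly computed as a false alarm ratio: the share of issued warnings that were not followed by a tornado. The sketch below shows that calculation alongside the complementary probability of detection. The counts are made up for illustration and are not real verification statistics.

```python
def false_alarm_ratio(hits: int, false_alarms: int) -> float:
    """Fraction of issued warnings that were not followed by a tornado."""
    issued = hits + false_alarms
    if issued == 0:
        raise ValueError("no warnings were issued")
    return false_alarms / issued


def probability_of_detection(hits: int, misses: int) -> float:
    """Fraction of observed tornadoes that were covered by a warning."""
    observed = hits + misses
    if observed == 0:
        raise ValueError("no tornadoes were observed")
    return hits / observed


if __name__ == "__main__":
    # Hypothetical counts: 30 verified warnings, 70 unverified, 10 unwarned tornadoes.
    print(f"False alarm ratio: {false_alarm_ratio(30, 70):.2f}")
    print(f"Probability of detection: {probability_of_detection(30, 10):.2f}")
```

The two numbers pull against each other: warning more liberally tends to raise detection, but it also raises the false alarm ratio.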
“It can lead to boy-who-cried-wolf syndrome,” Kurdzo says.
In recent years, researchers have turned to machine learning to improve tornado detection and prediction. However, raw datasets and models are not always accessible to the broader community, hindering progress. TorNet fills this gap.
The dataset contains over 200,000 radar images, 13,587 of which depict tornadoes. The remaining images are non-tornadic, taken from storms that fall into one of two categories: randomly selected severe storms or false alarm storms (storms for which forecasters issued a warning but no tornado occurred).
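Those counts imply a heavily imbalanced classification problem. The arithmetic can be sketched as follows, taking 200,000 as an approximate total because the article gives only "over 200,000"; the function names are illustrative.

```python
TOTAL_IMAGES = 200_000    # approximate: the article says "over 200,000"
TORNADIC_IMAGES = 13_587  # from the article


def positive_fraction(positives: int, total: int) -> float:
    """Share of samples that belong to the tornado class."""
    return positives / total


def always_negative_accuracy(positives: int, total: int) -> float:
    """Accuracy of a trivial classifier that never predicts a tornado."""
    return (total - positives) / total


if __name__ == "__main__":
    share = positive_fraction(TORNADIC_IMAGES, TOTAL_IMAGES)
    trivial = always_negative_accuracy(TORNADIC_IMAGES, TOTAL_IMAGES)
    print(f"Tornadic share of images: about {share:.1%}")
    print(f"Accuracy of 'never a tornado': about {trivial:.1%}")
```

With fewer than 7 in 100 images showing a tornado, a model that never predicts one would still score above 93% accuracy while detecting nothing, which is why plain accuracy is a poor yardstick for a dataset like this.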
Each storm or tornado sample consists of two sets of six radar images. The two sets correspond to different radar sweep angles. The six images represent a variety of radar data products, including reflectivity (indicating precipitation intensity) and radial velocity (indicating whether the wind is moving toward or away from the radar).
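One way to picture a sample with this structure is as a four-dimensional array indexed by sweep, product, and two image axes. The sketch below is only an illustration: TorNet's actual file format is not described here, the 64-by-64 image size is hypothetical, and only the first two product names come from the article (the other four are placeholders).

```python
import numpy as np

SWEEPS = 2  # two radar sweep angles per sample, per the article
PRODUCTS = (
    "reflectivity",     # named in the article
    "radial_velocity",  # named in the article
    "product_3",        # placeholder name
    "product_4",        # placeholder name
    "product_5",        # placeholder name
    "product_6",        # placeholder name
)
HEIGHT, WIDTH = 64, 64  # hypothetical image size


def empty_sample() -> np.ndarray:
    """Allocate one sample with shape (sweep, product, height, width)."""
    return np.zeros((SWEEPS, len(PRODUCTS), HEIGHT, WIDTH), dtype=np.float32)


def get_product(sample: np.ndarray, name: str, sweep: int) -> np.ndarray:
    """Return the 2-D image for one product at one sweep angle."""
    return sample[sweep, PRODUCTS.index(name)]


if __name__ == "__main__":
    sample = empty_sample()
    velocity = get_product(sample, "radial_velocity", sweep=0)
    print(sample.shape, velocity.shape)
```

Stacking products along one axis, much like color channels in a photograph, is a common way to feed multi-product imagery to an image model, though a real loader may organize the data differently.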
The first challenge in curating the dataset was finding tornadoes, which are very rare events within the corpus of weather radar data. The team then had to balance those tornado samples with difficult non-tornado samples. If the dataset were too easy, for example by comparing tornadoes with snowstorms, an algorithm trained on the data would likely over-classify storms as tornadic.
“The great thing about a true benchmark dataset is that we can all work with the same data at the same level of difficulty and compare the results,” Veillette says. “It also makes meteorology more accessible to data scientists and vice versa, making it easier for both parties to work on common problems.”
Both researchers represent the progress that can come from cross-disciplinary collaboration. Veillette is a mathematician and algorithm developer who has long been fascinated by tornadoes. Kurdzo is a meteorologist by training and a signal processing expert. In graduate school, he tracked tornadoes with a custom-built mobile radar and collected data to analyze in new ways.
“This dataset also means graduate students don't have to spend a year or two building a dataset; they can start working on their research right away,” Kurdzo says.
Pursue answers with deep learning
The researchers used the dataset to develop a baseline artificial intelligence (AI) model. They were particularly keen on applying deep learning, a form of machine learning that excels at processing visual data. On its own, deep learning can extract features (key observations that an algorithm uses to make a decision) from images across a dataset. Other machine learning approaches require humans to manually label features first.
“We wanted to see if deep learning could rediscover what people typically look for in tornadoes, and even identify new things that forecasters don't typically look for,” says Veillette.
The results are promising. Their deep learning model performed as well as or better than all known tornado detection algorithms in the literature. The trained algorithm correctly classified 50% of weak EF-1 tornadoes and more than 85% of EF-2 or higher tornadoes, which constitute the most destructive and costly occurrences of these storms.
They also evaluated two other machine learning models and one traditional model for comparison. The source code and parameters for all these models are freely available. The models and dataset are also described in a paper submitted to a journal of the American Meteorological Society (AMS). Veillette presented this work at the AMS Annual Meeting in January.
“The biggest reason we make our models public is so the community can improve them and do other cool things with them,” Kurdzo says. “The optimal solution may be a deep learning model, but some may decide that a non-deep learning model is actually better.”
TorNet may also be useful in other applications in the weather community, such as conducting large-scale case studies on storms. It can also be enriched with other data sources such as satellite imagery and lightning maps. Fusing multiple types of data can improve the accuracy of machine learning models.
Step up towards operation
In addition to detecting tornadoes, Kurdzo hopes the model may help scientists understand why tornadoes occur.
“As scientists, we see all the warning signs of a tornado, such as increased low-level rotation, hook echoes in reflectivity data, a specific differential phase (KDP) foot, and differential reflectivity (ZDR) arcs. But how do they all go together? And are there physical manifestations that we don't know about?” he asks.
With explainable AI, it may be possible to tease out these answers. Explainable AI refers to methods that let a model explain, in a human-understandable format, why it reached a certain decision. In this case, these explanations may reveal the physical processes that occur before a tornado. This knowledge could help train forecasters and models to recognize warning signs sooner.
“None of this technology will replace forecasters, but it may one day be able to guide forecasters' eyes in complex situations and give a visual warning for areas where tornado activity is expected,” Kurdzo said.
Such assistance could be particularly useful as radar technology improves and future networks potentially grow denser. Data update rates for next-generation radar networks are expected to increase from every five minutes to about every minute, perhaps faster than forecasters can interpret new information. Because deep learning can process large amounts of data quickly, it may be well suited to monitoring radar returns in real time, alongside humans. Tornadoes can form and dissipate within minutes.
But the road to operational algorithms is a long one, especially in situations where safety is critical, Veillette said. “Understandably, I think the forecaster community remains skeptical of machine learning. One way to establish trust and transparency is to have public benchmark datasets like this. This is the first step.”
The team hopes the next steps will be taken by researchers around the world who are inspired by the dataset and eager to build their own algorithms. Those algorithms would in turn go into test beds and eventually be shown to forecasters, starting the process of moving them into operations.
Ultimately, the path can lead back to trust.
“Even with these tools, we may never get a tornado warning of more than 10 to 15 minutes. But if we can lower the false alarm rate, we can start to make headway with public perception,” Kurdzo said. “People will use these warnings to take the necessary actions to save lives.”
This article is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site covering news about MIT research, innovation, and education.