Atmospheric and Oceanic Science Assistant Professor Maria Molina Answers Questions About Machine Learning and Extreme Weather

The College of Computer, Mathematical, and Natural Sciences hosted a Reddit Ask-Me-Anything spotlighting research on climate and weather extremes.

Atmospheric and Oceanic Science Assistant Professor Maria Molina promoting her Reddit AMA on Tuesday, August 20. Photo by Katie Bemb.

University of Maryland Atmospheric and Oceanic Science Assistant Professor Maria Molina participated in an Ask-Me-Anything (AMA) user-led discussion on Reddit to answer questions about using machine learning to understand climate and weather extremes.

Molina’s research focuses on the application of machine learning tools, such as neural networks, and numerical modeling systems to answer pressing questions in the domains of climate and extremes. She leads the PARETO (Predictability and Applied Research for the Earth-system with Training and Optimization) group. Examples of problems the group is tackling include extending understanding of Earth system predictability, parameterizing subgrid-scale processes in Earth system models and uncovering multi-scale patterns in the climate system.

Visiting postdoctoral researcher Manuel Titos and Ph.D. students Dean Calhoun, Jhayron Steven Perez Carrasquilla, Kyle Hall, Jonathan David Starfeldt and Emily Faith Wisinski—members of Molina’s research group—joined Molina to answer questions on Reddit.

This Reddit AMA has been edited for length and clarity.


Just how useful is making predictions based on past data when the climate is changing beyond anything previously seen?

(Molina) We can modify our architectures so they're more robust, or we can develop climate-invariant and/or physics-informed neural networks. We can then embed that physical knowledge in the neural network so that when it's predicting in a future, warmer climate, it doesn't mess up. The extent to which data-driven models can extrapolate to unseen events is still an open question.
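
To make the physics-informed idea concrete, here is a minimal, generic sketch (illustrative only, not the PARETO group's models): a small network is fit to a few data points for a toy equation, du/dt = -u, while an extra loss term penalizes violations of that equation, including at times outside the training data. The equation, network size and training settings are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

# Toy physics-informed fit: the network learns u(t) from three noise-free data
# points while an extra loss term enforces the governing equation du/dt = -u
# at "collocation" times, including times beyond the observed range.
torch.manual_seed(0)

net = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)

# Sparse "observations" (stand-ins for reanalysis or station data).
t_obs = torch.tensor([[0.0], [0.5], [1.0]])
u_obs = torch.exp(-t_obs)

# Collocation points where only the physics residual is penalized.
t_col = torch.linspace(0.0, 3.0, 60).reshape(-1, 1).requires_grad_(True)

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(3000):
    optimizer.zero_grad()
    loss_data = ((net(t_obs) - u_obs) ** 2).mean()            # misfit to data
    u_col = net(t_col)
    du_dt = torch.autograd.grad(u_col.sum(), t_col, create_graph=True)[0]
    loss_phys = ((du_dt + u_col) ** 2).mean()                 # residual of du/dt = -u
    (loss_data + loss_phys).backward()
    optimizer.step()

# Prediction outside the data range vs. the exact solution exp(-2).
print(float(net(torch.tensor([[2.0]]))), float(torch.exp(torch.tensor(-2.0))))
```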

Weather data has quietly become one of the largest continuously collected datasets around the world—but with a dataset that large, how do you make sure the data you're getting is accurate? Do you add confidence intervals to individual stations? Does your model "learn" if a station is reliably 10 degrees different than its neighbors?

(Molina) Garbage-in, garbage-out applies to physics-based models too, and is commonly referenced in weather forecasting. Your forecast is highly sensitive to how good the initial state is, so if you have a bad initial state, then your later states will be bad too (most likely).

(Hall) Every dataset has different sources of uncertainty. Part of the art of creating a reanalysis dataset is identifying those, and those sources might change over the time period the dataset covers. This interesting study out of Harvard on the history of ocean temperatures is worth checking out.

(Perez Carrasquilla) Both in machine learning models and in numerical models, we use ensembles that result from perturbing the initial conditions so we can mimic the uncertainty from measurements and its evolution throughout the forecast.

(Hall and Molina) Part of developing forecast models is ensuring their ensemble spread reflects the amount of uncertainty and variability coming from all these different sources. How to propagate uncertainty from different sources, including observations, through a machine learning model is not trivial and still an open question.
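
To illustrate the ensemble idea, here is a toy sketch (not an operational forecast system): the Lorenz-63 equations, a standard stand-in for chaotic atmospheric dynamics, are integrated from many slightly perturbed initial states, and the ensemble spread is tracked as a rough proxy for forecast uncertainty. The perturbation size, member count and integration scheme are arbitrary illustrative choices.

```python
import numpy as np

# Toy ensemble forecast: many slightly perturbed initial states of the
# Lorenz-63 system are integrated forward, and the ensemble spread is
# tracked as it grows with lead time.
def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

rng = np.random.default_rng(0)
analysis = np.array([1.0, 1.0, 1.0])                     # best estimate of the initial state
members = analysis + 1e-3 * rng.normal(size=(50, 3))     # 50 perturbed ensemble members

spread = []
for step in range(1000):
    members = np.array([lorenz63_step(m) for m in members])
    spread.append(members.std(axis=0).mean())            # average spread across the 3 variables

print(f"spread after 100 steps:  {spread[99]:.4f}")
print(f"spread after 1000 steps: {spread[-1]:.4f}")
```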

(Wisinski) I would start with data from a trusted source, like NOAA, which conducts quality assurance/quality control before they publish their data.
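
For the "station reliably 10 degrees off from its neighbors" part of the question, a very simple offset check might look like the sketch below, run here on synthetic data; real quality control, such as NOAA's, is far more involved, and the stations and bias are invented for illustration.

```python
import numpy as np

# Synthetic example of flagging a station that is persistently offset from
# the average of its neighbors. The data and the 10-degree bias are made up.
rng = np.random.default_rng(0)
n_days = 365
seasonal = 20.0 + 5.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, n_days))
neighbors = seasonal + rng.normal(0.0, 1.0, size=(4, n_days))   # 4 nearby stations
station = seasonal + 10.0 + rng.normal(0.0, 1.0, size=n_days)   # suspect station

offset = station - neighbors.mean(axis=0)
print(f"mean offset vs. neighbors: {offset.mean():.1f} deg (std {offset.std():.1f})")
if abs(offset.mean()) > 3.0 * offset.std():
    print("station flagged: systematically warmer/colder than its neighbors")
```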

Weather and the Earth's atmosphere form a naturally chaotic system, whereas machine learning is often better suited to predicting regular patterns. Is machine learning really applicable to chaotic systems?

(Calhoun) Maria sent me a really interesting paper investigating this question. What they found was that some of these machine learning models cannot successfully replicate the upscale growth of small errors (also known as chaos) that is present in traditional physics-based models. This is a great question, and really an open one. These AI systems that make predictions from data do have good forecasting skill, but the question is, are they encoding the fundamental behavior of the system? Or are they focusing on reducing error as defined in the loss function? And the answer at this point is, we don't really know yet.
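
The "upscale growth of small errors" can be seen in a short twin experiment (again using the toy Lorenz-63 system, purely for illustration): two runs that start almost identically drift apart at a roughly exponential rate, and one diagnostic for a data-driven emulator is whether it reproduces this kind of error-growth curve.

```python
import numpy as np

# Twin experiment in the Lorenz-63 system: two runs start almost identically
# and the difference between them grows roughly exponentially, which is the
# "growth of small errors" a faithful emulator would need to reproduce.
def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])      # tiny initial-condition error

for step in range(1, 2001):
    a, b = lorenz63_step(a), lorenz63_step(b)
    if step % 400 == 0:
        print(f"step {step:4d}  error = {np.linalg.norm(a - b):.2e}")
```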

Growing up, I've felt like the rains are getting shorter and more intense. Is this true? Is there some example you can give us that helps quantify this trend?

(Perez Carrasquilla) Models do show that the climate is changing: in some regions, the characteristics of extreme events/rainfall/periods with certain temperatures are changing. However, despite our current knowledge, there is still a range of outcomes regarding the specific spatial characteristics of these changes, meaning which regions they occur in. There's also a range of possible outcomes about how strong these changes will be. Though there is a lot of literature about which physical processes are responsible for some of these changes, it's still an open field.

Based on the latest data, when would the Gulf Stream slow to a halt and disrupt our Earth systems? What are the projected direct consequences of this? They've taught us about this since my undergrad 10 years ago, but I have no idea about the latest data/projections.

(Hall) The Gulf Stream is often conflated with the northward branch of the Atlantic Meridional Overturning Circulation (AMOC). The Gulf Stream is at least partially the result of the conservation of potential vorticity (a quantity tied to the Earth's rotation), which is related to the North Atlantic gyre rather than the AMOC. There's also an aspect that's driven by salt and heat gradients, which would be the AMOC, and that's the contentious part where there's a range of outcomes.

Assuming you use a black-box model for machine learning, after the machine has detected some kind of pattern, how do you go about trying to understand it? Is there a process to use that can help derive equations, relationships between variables, etc.?

(Molina) Explainable AI (XAI) offers a range of methods that can be used to explain deep learning models after they have been trained. These include variable importance analyses, heat maps that show where a neural network is looking when it makes a prediction (e.g., layer-wise relevance propagation) and symbolic regression (which can provide a mathematical equation describing a relationship among the data). All of these methods are available open source.
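
As a concrete example of the variable-importance idea, here is a short permutation-importance sketch on synthetic data (a generic illustration, not the group's workflow): each predictor is shuffled in turn, and the drop in skill indicates how much the model relied on it. The random-forest model and the data are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Permutation importance on synthetic data: shuffle one predictor at a time
# and measure how much the validation error increases. Only the first two
# of the four predictors actually influence the target here.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=2000)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:1500], y[:1500])
X_val, y_val = X[1500:], y[1500:]

def mse(X, y):
    return float(np.mean((model.predict(X) - y) ** 2))

baseline = mse(X_val, y_val)
for j in range(X_val.shape[1]):
    X_perm = X_val.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # destroy predictor j's information
    print(f"predictor {j}: skill lost = {mse(X_perm, y_val) - baseline:.3f}")
```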

How do the last two years of global weather compare to the last Intergovernmental Panel on Climate Change (IPCC) report forecasts? 

(Group) The IPCC report does not issue forecasts, so there's not really a way for us to evaluate the last two years against it. The report just reflects our current best scientific understanding of the processes and effects of climate change. They're taking climate model simulations run out to 2100 under different emissions scenarios from different modeling centers and referencing the latest research. You can take a look at the Community Earth System Model or other models to see different projected climate trajectories for the future.

About the College of Computer, Mathematical, and Natural Sciences

The College of Computer, Mathematical, and Natural Sciences at the University of Maryland educates more than 8,000 future scientific leaders in its undergraduate and graduate programs each year. The college's 10 departments and nine interdisciplinary research centers foster scientific discovery with annual sponsored research funding exceeding $250 million.