My last post, on the connection between the equatorial Pacific Ocean and the floods in Eastern Australia, closed with a graph of the Southern Oscillation Index (SOI) over the past 40 years, and with an assertion that major rains and flooding in Queensland were associated with this index. (As a reminder, it is derived from the difference in air pressure between Tahiti and Darwin, Australia, and is one indicator of El Niño and La Niña.) At least one Australian columnist says “we should have seen it coming”. Should they have? As LeVar Burton would say, don’t take my word for it. Let’s check out that connection with some quick-and-dirty statistics.
First, we will need rainfall records to line up against the SOI. These are available from the Australian Bureau of Meteorology here. Let’s look at the period from 1970 to the present:
Very spiky. Looking closer, the spikes are all occurring mid-summer down under (that is, in December, January, and February). Rainfall in Queensland is extremely seasonal. A few of the spikes are higher than the rest, most notably in 1974, when there were also major floods in Queensland. On further reflection, it doesn’t make much sense to compare the SOI directly with rainfall, since the rain changes more from summer to winter than it does between years. We’ll want to remove the average seasonal cycle from the data to leave us with anomalies, or departures from that average.
The graph below shows each year of rainfall since 1900 plotted as a separate line over the twelve months of the year, giving us a sense of how the annual cycle of rainfall varies from year to year. From these we can calculate the average monthly rainfalls, shown by the black line.
This graph shows us several things. January is the rainiest month, and August the driest. Rainfall is also much more variable during the Austral summer (November-March) than it is during the rest of the year. We can also compare particular years to the mean. The last year of disastrous flooding, 1973-74, is shown in red (the line starts in the middle in July 1973, then runs off the right edge and wraps around to start 1974 on the left). November and December 1973 were among the rainiest on record. January 1974 set the all-time record, half again as rainy as the next highest. Rainfall in 2010 has been above average since July, and well above average since September. Last month was the rainiest December on record.
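For readers who want to see the mechanics, here is a minimal sketch of the two steps described above: averaging each calendar month across all years to get the climatology, then subtracting that average from every observation to get anomalies. It is written in plain Python rather than the R used for this post, the function names are my own, and the sample numbers are invented purely for illustration.

```python
from collections import defaultdict


def monthly_climatology(records):
    """Mean value for each calendar month (1-12).

    records: iterable of (year, month, value) tuples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _year, month, value in records:
        sums[month] += value
        counts[month] += 1
    return {month: sums[month] / counts[month] for month in sums}


def anomalies(records, climatology):
    """Subtract each month's long-term mean from the observed value."""
    return [(year, month, value - climatology[month])
            for year, month, value in records]


# Invented numbers for illustration only (not Bureau of Meteorology data).
sample = [
    (2000, 1, 120.0), (2000, 8, 10.0),
    (2001, 1, 180.0), (2001, 8, 20.0),
]
clim = monthly_climatology(sample)
anom = anomalies(sample, clim)
print(clim)
print(anom)
```

Because each month is compared only with its own long-term mean, a wet January and a wet August end up on the same footing, which is the whole point of removing the seasonal cycle.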
Now that we have the average rainfall for each month (a.k.a. the “climatology”), we can subtract it from the data to get anomalies, which then look like this (again, just plotted from 1970-present):
Still spiky, but with no immediately obvious periodicity. Let’s go ahead and do a simple regression of Queensland rainfall on the SOI:
Aha! There is a positive correlation: higher values of the SOI are associated with above-average rainfall in Queensland. This result is highly significant (p << 0.001), meaning that a relationship this strong would be very, very unlikely to arise by chance if the two series were unrelated. Still, it isn't a particularly strong relationship. There is a lot of scatter around the best-fit line, and our simple regression model explains only about 13% of the variability in rainfall. There are evidently (and not surprisingly) other, more complicated dynamics going on.
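To make the "variability explained" figure concrete, here is a rough sketch of a one-variable least-squares fit and its R², written by hand in Python rather than taken from the R analysis behind this post. The function name and the sample values are hypothetical. For a simple regression like this one, R² is just the square of the correlation coefficient, so an R² of about 13% corresponds to a correlation of roughly 0.36.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Squared Pearson correlation = fraction of variance explained.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Invented values for illustration only (not the real SOI or rainfall data).
soi_sample = [-10.0, -5.0, 0.0, 5.0, 10.0]
rain_sample = [-20.0, 5.0, -5.0, 10.0, 15.0]
slope, intercept, r_squared = linear_fit(soi_sample, rain_sample)
print(slope, intercept, round(r_squared, 3))
```

This sketch does not compute a p-value; that takes a t-distribution, which is where a statistics package earns its keep.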
Finally, let's dig a little deeper into this correlation. It would be reasonable to wonder if rain perhaps lags the SOI by some amount of time. Maybe it takes a month or two for the oceanic and atmospheric conditions expressed in the SOI to manifest themselves as increased rain in Queensland. To check this idea, we use a statistical tool called the cross-correlation function.
The graph above shows the strength of the correlation of rainfall with SOI as a function of the time lag between the two series. When the lag is negative, rainfall is regressed on previous values of SOI. When the lag is positive, SOI is in effect regressed on previous values of Queensland rainfall. When the bars stick out past the dotted blue lines, they are significant at the 5% level.
So what does this show? SOI and rainfall are positively correlated for about seven months in either direction. The correlations look a bit stronger on the left side—that is, the SOI is a better advance predictor of rain than vice-versa. But generally, SOI doesn’t look like a super-precise predictor of rainfall from month to month, even if it is a good indicator at a seasonal or yearly scale.
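For the curious, a cross-correlation function is not hard to sketch by hand. The version below is a plain-Python approximation, not the code used for this post: it uses the same sign convention as the graph (a negative lag pairs rainfall with earlier SOI values) and a common rule of thumb for the 5% bounds, 1.96 divided by the square root of the series length. I am assuming that rule is what the dotted lines represent. The function names are mine, and the demo uses random numbers rather than real data.

```python
import math
import random


def cross_correlation(x, y, max_lag):
    """Sample cross-correlation of x[t + lag] with y[t] for each lag.

    A negative lag pairs each y value with an earlier x value,
    so it measures how well x leads y.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("series must have the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    # Normalise by the full-series variances so that lag 0 of a series
    # with itself is exactly 1.
    scale = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            if 0 <= t + lag < n:
                total += dx[t + lag] * dy[t]
        result[lag] = total / scale
    return result


def significance_bound(n):
    """Approximate 5% bound for two unrelated white-noise series of length n."""
    return 1.96 / math.sqrt(n)


# Synthetic demo: "rain" is simply "SOI" delayed by two steps.
rng = random.Random(0)
soi_like = [rng.gauss(0.0, 1.0) for _ in range(200)]
rain_like = [0.0, 0.0] + soi_like[:-2]
ccf = cross_correlation(soi_like, rain_like, max_lag=6)
peak_lag = max(ccf, key=ccf.get)
print(peak_lag, round(ccf[peak_lag], 2), round(significance_bound(200), 3))
```

In the synthetic demo the peak lands at a lag of minus two, because the second series was built as a two-step-delayed copy of the first. Real monthly climate series are autocorrelated, so the white-noise bounds are only a rough guide.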
I’ll close out with a snippet from the Australian poet Dorothea Mackellar. I’m told that little Aussies learn this in school the same way I learned Longfellow growing up in Boston… Listen, my children, and you shall hear, of the midnight ride of Paul Revere… er, ahem.
I love a sunburnt country,
A land of sweeping plains,
Of ragged mountain ranges,
Of droughts and flooding rains.
First, I should note that this really is quick-and-dirty climatology. I’m not an expert in this, nor am I familiar with this part of the world, and I didn’t go that far beneath the surface here, statistically speaking. I could dig deeper, but I have other things I should be doing, such as my thesis and laundry. Take it with a grain of salt.
All data analyzed here came from the Australian Bureau of Meteorology, as mentioned above. With the help of some Python magic, I compiled them into two data files, soi.csv and qlnd_rain.csv. Each has three columns for the year, month, and data value. All graphs and analysis in this post were done using R. The code, for those interested, is here.
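If you would rather poke at the files without R, here is a hedged sketch of reading the three-column layout described above (year, month, value) and lining up the two series by month. This is not the code linked above. I am assuming comma-separated text, and the parser simply skips any row it cannot read as numbers, which covers a header line or a missing-value marker if the files have them. The function names and the inline sample text are invented for illustration.

```python
import csv
import io


def read_series(lines):
    """Parse rows of year, month, value into {(year, month): value}.

    `lines` is any iterable of text lines, such as an open file.
    Rows that do not parse as int, int, float are skipped.
    """
    series = {}
    for row in csv.reader(lines):
        if len(row) < 3:
            continue
        try:
            year, month = int(row[0]), int(row[1])
            value = float(row[2])
        except ValueError:
            continue  # header line or unparseable value
        series[(year, month)] = value
    return series


def align(a, b):
    """Return the months present in both series, with paired values in date order."""
    keys = sorted(set(a) & set(b))
    return keys, [a[k] for k in keys], [b[k] for k in keys]


# Invented sample text standing in for the two files.
soi = read_series(io.StringIO("year,month,value\n2000,1,5.0\n2000,2,-3.0\n2000,3,NA\n"))
rain = read_series(io.StringIO("year,month,value\n2000,2,40.0\n2000,1,90.0\n2000,4,10.0\n"))
keys, soi_vals, rain_vals = align(soi, rain)
print(keys, soi_vals, rain_vals)
```

With the real files, the same function should accept an open file handle, for example `read_series(open("soi.csv", newline=""))`, provided the layout matches the assumptions above.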