DIY Scale Dependence

What’s the big deal about “scale”? It’s a word that I’ve written about before here, and one that certain types of ecologists can’t seem to stop talking about. But it can be an infuriatingly vague word to pin down, given that it can have more than one meaning, even in technical usage. And the fact that scale-dependent thinking applies to such a staggeringly wide range of phenomena, while a testament to its relevance, hardly helps in nailing down what it means. As a grad-student friend of mine, no slouch when it comes to quantitative reasoning, asked me recently, “What’s the deal with ‘scale’? It all just seems kinda theoretical to me.”

Let’s take a concrete example. Say you’re interested in how seabirds forage for small fish, and so you put small GPS recorders on some birds which record their position every second. Examining the tracks, one of them looks like this:

Since I don’t have any real data, I had to make some up. For simplicity’s sake, I used a random walk: at each step, the bird moves a random distance from its current position.[1] The path starts on the right side near the middle, and then wanders around in a counterclockwise loop, ending up in the top right. This isn’t a terrible approximation of a bird swooping around its foraging grounds, searching for fish. If you use R, you can generate a track like it using these commands (the steps are random, so your path and the numbers below will differ from mine).

x <- cumsum(rnorm(500, 0, 5))
y <- cumsum(rnorm(500, 0, 5))
plot(x, y, type='o', cex=0.2, xlab='x (meters)', ylab='y (meters)')

An important question in foraging ecology is whether the animal is taking in more energy from eating than it is spending on running/swimming/flying around looking for food. To get an estimate of how many calories the bird has burned, you calculate how far it has flown. Figure out how much x and y change between each time step, and use the Pythagorean theorem to get the distance traveled.

distance <- sum(sqrt(diff(x)^2 + diff(y)^2))

This gives a total distance flown of 3114.4 meters.
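
Since the same calculation comes up again below, it can be handy to wrap it in a function. The following is a minimal sketch: the name `path_length` and the seed value are assumptions for illustration, and a different seed gives a different track and a different total than the 3114.4 meters quoted above.

```r
# Total length of a path given by coordinate vectors x and y.
# path_length is a hypothetical helper name, not from any package.
path_length <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

set.seed(42)  # arbitrary seed, only so the track is reproducible
x <- cumsum(rnorm(500, 0, 5))
y <- cumsum(rnorm(500, 0, 5))
distance <- path_length(x, y)
print(distance)
```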

But suppose that because of budget cuts to the National Science Foundation, your research grant was actually 30% smaller. As a result, you could only afford a cheaper version of the GPS tag, which records the bird's position every 2 seconds instead of every one second. No big deal, right? Calculate the distance, carry on:

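# Keep every other fix, starting from the first (R indices start at 1,
# so an index of 0 would be silently dropped)
subsample <- seq(1, 500, by=2)
x.2 <- x[subsample]
y.2 <- y[subsample]
distance.2 <- sum(sqrt(diff(x.2)^2 + diff(y.2)^2))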

Whoa! That gives a total distance of only 2256.0 meters. What happened? The bird flew the exact same path, but when you sample every two seconds instead of one, it appears to have flown 858 meters less. How does that work? Looking at the new path overlaid on the old one helps:

plot(x, y, col='grey', type='o', cex=0.2, xlab='x (meters)', ylab='y (meters)')
lines(x.2, y.2, type='o', cex=0.2)

Because it skips every other observation, the new path (in black) avoids some of the random squiggliness in the old one (grey). The total measured length is 28% shorter. This is important, because if you were calculating the energy use by the bird, you would get a very different answer at 1-second resolution than at 2-second resolution. Your conclusions would be incomplete at best, and wrong at worst.
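
The shortening is not a fluke of this particular track. For any three consecutive positions, the triangle inequality says the straight line from the first to the third is no longer than the two-leg route through the middle one (the symbols p_1, p_2, p_3 are just labels for consecutive GPS fixes, introduced here for illustration):

```latex
\lVert p_3 - p_1 \rVert \;\le\; \lVert p_2 - p_1 \rVert + \lVert p_3 - p_2 \rVert
```

Equality holds only when the middle fix lies on the straight segment between the other two, so dropping fixes can never make the measured path longer, and on a wiggly track it almost always makes it shorter.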

But if you're a curious person, you might start wondering what the calculated distance would be at other resolutions. Instead of waiting around passively for the next round of corporate tax cuts before enacting your experiment, you decide to be proactive, and rush to your computer. After dashing off an angry email to your representative, you run a simple simulation, resampling the bird's track at different resolutions, and then plotting the total distance measured as a function of sampling scale.

scales <- 1:200
distances <- rep(0, length(scales))
for (i in seq_along(scales)) {
  # keep every scales[i]-th fix, starting from the first
  subsample <- seq(1, 500, by=scales[i])
  x.sub <- x[subsample]
  y.sub <- y[subsample]
  distances[i] <- sum(sqrt(diff(x.sub)^2 + diff(y.sub)^2))
}
plot(scales, distances)

This graph shows very clearly what we mean when we talk about "scale-dependence." The distance measured depends, in a very real sense, on the scale of measurement. Fortunately, this relationship appears quite regular and predictable. It is, in fact, an example of a power law. Power laws are one of those mathematical relationships that show up all over the place in nature. Relationship between heart rate and body size? Velocity spectrum in a turbulent flow? Distribution of income among the richest Americans? Gravity? All power laws. In our case, the relationship looks like this:
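
A quick visual check for a power law is to redraw the plot with both axes on a log scale, where a power law shows up as a straight line. The sketch below regenerates a track so that it runs on its own; the seed is arbitrary and the helper name `resampled_length` is hypothetical.

```r
set.seed(42)  # arbitrary; other seeds give a broadly similar picture
x <- cumsum(rnorm(500, 0, 5))
y <- cumsum(rnorm(500, 0, 5))

# Measured path length when only every `step`-th fix is kept.
# resampled_length is a hypothetical helper name.
resampled_length <- function(x, y, step) {
  keep <- seq(1, length(x), by = step)
  sum(sqrt(diff(x[keep])^2 + diff(y[keep])^2))
}

scales <- 1:200
distances <- sapply(scales, function(s) resampled_length(x, y, s))

# log = 'xy' puts both axes on a log scale.
plot(scales, distances, log = 'xy',
     xlab = 'sampling interval (seconds)', ylab = 'measured distance (meters)')
```

The points tend to scatter more at large sampling intervals, where only a handful of segments are left to sum.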

D = a \, s^b

...which means, simply, that the distance measured is proportional to the scale of measurement raised to a power b. These constants, a and b, are easy to estimate. Taking the log of both sides of the equation, we can transform it from a power law to a straight line:

\ln(D) = \ln(a) + b \ln(s)

and from there, we can do a regular linear regression using R's "lm" function.

reg.distance <- lm(log(distances) ~ log(scales))

In this fit, because it is a log-transform of the real relationship, the slope (-0.528) is actually the value of the exponent in the power law, and the y-intercept (8.260) is the natural log of the coefficient a. We transform it back to get the true power law relation.

D = e^{8.260} \, s^{-0.528} \approx 3867.331 \times s^{-0.528}
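
For reference, here is one way to pull those two numbers out of the fitted model rather than copying them by hand. This is a sketch on a freshly simulated track with an arbitrary seed, so the estimates will differ somewhat from the -0.528 and 8.260 reported above.

```r
set.seed(42)  # arbitrary seed; estimates vary from track to track
x <- cumsum(rnorm(500, 0, 5))
y <- cumsum(rnorm(500, 0, 5))

scales <- 1:200
distances <- sapply(scales, function(s) {
  keep <- seq(1, length(x), by = s)
  sum(sqrt(diff(x[keep])^2 + diff(y[keep])^2))
})

reg.distance <- lm(log(distances) ~ log(scales))

# coef() returns the intercept first, then the slope.
b <- unname(coef(reg.distance)[2])       # exponent of the power law
a <- exp(unname(coef(reg.distance)[1]))  # coefficient, back-transformed from ln(a)

cat('a =', a, ' b =', b, '\n')
```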

This relationship can be generalized a bit further as follows:

\frac{D_1}{D_2} = \left( \frac{s_1}{s_2} \right) ^ {-0.528}

What this says is that if you change the sampling interval by some factor, the measured distance does not change by the same factor. Instead, it changes by that factor raised to the -0.528 power. Assuming the relationship is the same for the other bird tracks (which we need to check), we now have a general expression for how the measured distance changes with the scale of measurement. So no matter what scale we measure at, we can predict what the distance would have been had we measured at another scale.
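
As a worked example of that prediction, the sketch below converts a distance measured at one sampling interval into the distance expected at another. The function name `rescale_distance` is hypothetical, and the default exponent is simply the value fitted above for this one simulated track.

```r
# Predict the distance that would be measured at interval s_new, given a
# distance d_old measured at interval s_old, assuming D = a * s^b holds.
# rescale_distance is a hypothetical helper; b defaults to the fitted -0.528.
rescale_distance <- function(d_old, s_old, s_new, b = -0.528) {
  d_old * (s_new / s_old)^b
}

# From the 1-second total (3114.4 m) to a 2-second sampling interval:
predicted <- rescale_distance(3114.4, s_old = 1, s_new = 2)
print(round(predicted))  # about 2160 m
```

That is in the same ballpark as the 2256.0 meters actually measured at 2-second resolution, though not identical, since the fitted exponent is only an estimate.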

In the real world, scaling relationships like this won't always appear, and when they do, they will usually only apply over a particular range of scales. Still, if you can deduce a scaling relationship, it's a powerful tool for reasoning about your problem, and may clarify what used to look like inconsistencies. Moral of the (somewhat artificial) story? Sample at multiple scales, or at high enough resolution that you can resample your data at a lower resolution, as I did here. And keep your mind open to the idea of different patterns and processes at different scales—what you see at first is not always the whole story!

[1] In physics, this kind of random walk is known as Brownian motion, and has all kinds of interesting properties.
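
One of those properties gives a rough expectation for the exponent fitted above. Assuming, as in the simulation, that each step adds independent Gaussian noise with standard deviation \sigma to x and to y, the straight-line displacement over s steps has expected length \sigma \sqrt{\pi s / 2}, and a track of N fixes sampled every s steps has roughly (N-1)/s segments. The symbols N and \sigma are introduced here for illustration:

```latex
E[D(s)] \approx \frac{N-1}{s} \, \sigma \sqrt{\frac{\pi s}{2}}
        = (N-1) \, \sigma \sqrt{\frac{\pi}{2}} \; s^{-1/2}
```

So the expected exponent is b = -1/2, close to the fitted -0.528, and with N = 500 and \sigma = 5 the expected distance at s = 1 is about 3127 meters, close to the 3114.4 meters measured.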
