1 Design and Analysis of BACI studies
The suggested citation for this chapter of notes is:
Schwarz, C. J. (2025). Design and Analysis of BACI studies. Available at https://cschwarz-stat-sfu-ca.github.io/. Version updated: 2025-12-14.
1.1 Introduction
Environmental-impact assessment is often required as part of large projects. A very common design is the Before-After-Control-Impact (BACI) design. In this design, measurements are taken at an Impact site and at a Control site both Before and After the impact occurs.
This design is preferred over a simple Before-After comparison as a change in the response may occur independently of any impact event because of temporal effects. For example, precipitation levels may change between the Before and After periods and the change in response may be related to changes in precipitation rather than the impact event. Or measurement devices may improve over time and the observed difference is simply an artefact of the measurement process.
By establishing a Control site (where presumably no effect of the impact will be felt), the temporal change that occurs in the absence of the impact can be measured. A differential change between the Control and Impact sites over the Before and After periods is then evidence of an environmental impact.
There are many variants of the BACI design – this chapter will review some of the most common. This chapter looks at detecting changes in the mean, but similar designs can be used to detect changes in slopes or proportions or any other response measure. The analysis of these more complex experiments can be difficult – please contact me for details.
A key assumption of BACI designs is that the system is in equilibrium before and after the impact. We assume that the overall mean is stable within the Before and After periods and that any change in the mean is quick and immediate. Schematically, we are assuming a system like:
where there is a step change in the mean immediately after the impact.
Consequently, BACI designs are good for:
- Detecting large changes after impact.
- Detecting permanent changes after impact.
- Monitoring to protect against disasters.
- Monitoring for changes in the MEAN.
BACI designs are poor for:
- Small potential changes after impact.
- Gradual changes after impact (i.e., not a step change).
- Long term monitoring.
- Monitoring for changes in VARIABILITY.
It is important to recognize in advance that there will be some change over time and because of the impact. It is simply not scientifically defensible to believe that the mean response will be identical to 20 decimal places over time or in the absence of an impact. Rather, the key question is more than simply hypothesis testing for evidence of any change – regardless of the final P-value from the statistical hypothesis test, an estimate of the effect size, i.e., an estimate of what is known as the BACI contrast, is required along with a measure of precision. This can then be examined to see if the effect is detected and if it is biologically important. A conceptual diagram of the possible outcomes from the analysis of an experiment is:
In the left three experiments, the effects are all statistically significant (i.e., the P-value will be less than the \(\alpha\) level, usually 0.05). However, only the first outcome implies that the effect is also biologically important. In the second outcome, the biological importance is in doubt. In this case, more data needs to be collected or precautionary management actions should be implemented. In the third outcome, despite the effect being statistically significant, the actual effect is small and not of biological importance.
Conversely, in the right two experiments, the effect is not statistically significant. However, the fourth case really is no different than the third case: the effect was not statistically significant, and the 95% confidence interval indicates that even if the effect is different from zero, it is small. In both the third and fourth cases, no management action is needed. In the last experiment, the estimated effect size has such poor precision (i.e., a large standard error and wide confidence limits) that nothing can be said about the outcome. The failure of this experiment could be due to inadequate sample size, unexpectedly large variation in the data, or other reasons, but it provides no information on how to proceed.
The most difficult part of the above conceptual diagram is the determination of the biologically important effect size! This is NOT a trivial exercise. For example, if you are looking at the impact of warm water from a power plant upon local crab populations, how much of a change in density is biologically important?
An important part of the study design is the question of power and sample size determination, i.e., how many samples or how many years of monitoring are required to detect the biologically important difference as noted above? Far too many studies have inadequate sample sizes and end up with results as in the far right entry in the previous diagram. The proper time for a determination of power and sample size is at the start of the study – the practice of retrospective power analysis provides no new information (see the chapter on the introduction to ANOVA for details). A key (and most difficult) part of the power/sample size determination is the question of biologically important effect sizes.
Nowadays, computer packages are used to analyze these complex designs. As noted in other chapters, simple spreadsheet programs (such as Excel) are NOT recommended. Even with sophisticated packages such as R, JMP, or SAS, care needs to be taken to ensure that the analysis requested matches the actual design of the experiment. Always draw a diagram of the experimental protocol. Make sure that your brain is engaged before putting the package in gear!
A review article on the design and analysis of BACI designs is available at:
Smith, E. P. (2002). BACI design. Encyclopedia of Environmetrics, 1, 141-148. http://www.web-e.stat.vt.edu/vining/smith/B001-_o.pdf
The standard reference for the analysis of complex BACI designs is:
Underwood, A. J. (1993). The mechanics of spatially replicated sampling programs to detect environmental impacts in a variable world. Australian Journal of Ecology, 18, 99-116. http://dx.doi.org/10.1111/j.1442-9993.1993.tb00437.x
In this chapter, we will look at the design and analysis of four standard BACI designs:
- Single Impact site; single Control site; one year Before; one year After.
- Single/multiple Impact sites; multiple Control sites; one year Before; one year After.
- Single Impact site; single Control site; multiple years Before; multiple years After.
- Single/multiple Impact sites; multiple Control sites; multiple years Before; multiple years After.
1.2 Analyze on log() scale
Let \(Y\) be the response of interest. It is recommended that all analyses be done on the \(log(Y)\) scale (unless \(Y\) is a temperature measurement) because
- the effect size estimates do not depend on the unit of measurement
- the ease of interpretation.
Consider a Before-After study. Suppose that the response of interest is the mean mass of fish in a reservoir before and after construction of an industrial plant that discharges into the reservoir. If the mean mass of fish before construction is 2.0 kg, and the mean mass of the fish after construction is 2.1 kg, then the effect is \(2.1-2.0 = .1\) kg. If the mass of fish is measured in grams, the effect size is \(2100-2000=100\) g.
On the logarithmic scale, the effect size is \(\log{2.1}-\log{2.0}=.048\) or about a 5% increase. The effect size is the same when fish are measured in grams, i.e., \(\log{2100}-\log{2000}=.048\) or again about a 5% increase.
As will be shown later, the problem with a Before-After study is that the mean mass of fish could have changed over time due to factors other than the construction of the industrial plant (e.g., climate change). For this reason, a Before-After-Control-Impact or BACI design is preferred.
Suppose that in the control lake the mean mass of fish changed from \(1.8\) kg to \(2.0\) kg. Here the change in the mean mass of fish between the After and Before periods is \(2.0-1.8=0.2\) kg. Again, on the logarithmic scale, the change is \(\log{2.0}-\log{1.8}=.105\) or about a 10% increase.
The BACI contrast is the difference in the differences or
\[(\log{2.1}-\log{2.0}) -(\log{2.0}-\log{1.8}) = .048-.105=-.056\]
or about a 5 percentage point decline in the difference between the Before and After periods. This value also does not depend on the unit of measurement, i.e., kg or g. Notice that we use the term “percentage point” to avoid confusion of percents of percents.
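The arithmetic above is easy to reproduce in R. The numbers below are taken directly from the fish-mass example, so the snippet is purely illustrative.

# Effect sizes on the log scale do not depend on the unit of measurement
log(2.1)  - log(2.0)    # Impact site change measured in kg (about +5%)
log(2100) - log(2000)   # identical value when measured in g
# Control site changed from 1.8 kg to 2.0 kg
log(2.0) - log(1.8)     # about +10%
# BACI contrast: the difference of the differences (about -0.056)
(log(2.1) - log(2.0)) - (log(2.0) - log(1.8))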
What if the response variable is temperature? Temperature measurements are unlike weight or length measurements in that there is no natural zero. Consequently, it makes no sense to say that 10 C is “twice as hot” as 5 C; this certainly would not be true if the measurements were converted to the Fahrenheit scale. In technical terms, temperature is an interval-scaled measurement, while weight and length are ratio-scaled measurements. There are other interval-scaled variables (e.g., scaled test scores such as IQ; time of day; compass directions), but temperature is the most commonly seen in ecological contexts.
Finally, note that \(\log{Y}\) refers to the natural logarithmic transformation.
1.3 Before-After Experiments - prelude to BACI designs
As a prelude to the analysis of BACI designs, let us consider a simple before-after experiment, where measurements are taken over time both before and then after an environmental impact.
For example, consider an experiment where two streams were measured at a small hydroelectric power project. At each stream, multiple quadrats were taken, with a new set of quadrats chosen each year, and the number of invertebrates was counted in each quadrat. After 5 years, the hydroelectric plant started up, and an additional 5 years of data were collected. Is there evidence of a change in mean abundance after the plant starts?
Note that this is NOT a BACI design, and is a VERY poor substitute for such. The key problem is that changes may have occurred over time unrelated to the impact, so a “change” may simply be natural (e.g., unrelated to the hydro project). A nice discussion of the perils of using a simple before-after design in non-ecological contexts is found at http://ssmon.chb.kth.se/safebk/Chp_3.pdf.
Also, these designs typically involve a long time series. As such, problems related to autocorrelation can creep in. Refer to the chapter on the analysis of trend data for more details. In this section, we will ignore autocorrelation, which can be a serious problem in long-term studies.
This data will be analyzed several ways. First, we will look at only stream 1, and analyze the yearly averages or the individual data. In these cases, inference is limited to that particular stream and you cannot generalize to other streams in the study. If the same number of quadrats was measured in all years, the two analyses will give identical results.
Second, we will analyze both streams together and analyze the yearly averages or the individual values. Now, because of multiple streams, the results can be generalized to all streams in the project (assuming that the two streams were selected at random from all potentially affected streams). Again, if the number of quadrats was the same for all streams in all years, then the analysis of the yearly averages and the individual values will be identical.
Finally, we will demonstrate how to compute a power analysis of these designs to see how many years of monitoring would be required. The limiting feature for these designs is the year-to-year variation and no amount of subsampling (i.e., sampling more quadrats each year) will reduce this variation.
The raw data are available in the invert.csv file at the Sample Program Library.
The data are imported into R in the usual way and part of the data are shown below:
setwd(file.path("../../Stat-Ecology-Datasets-Revised/BACI/BeforeAfter"))
invert <- read.csv('invert.csv', header=TRUE, as.is=TRUE, strip.white=TRUE)
# Declare Period as an ordered factor so that sorts in proper order
invert$Period <- factor(invert$Period, levels=c("Before","After"), ordered=TRUE)
# We also want Year to be categorical variable so make a new variable
invert$YearF <- factor(invert$Year)
# Stream should also be defined as a factor
invert$StreamF <- factor(invert$Stream)
# Compute the log(density)
invert$logCount <- log(invert$count)
# It turns out that missing values cause subtle problems in some R
# functions that deal with the fitted object. Sigh... Usually, R
# is sensible enough, but not all package authors did a good job.
cat("Dimensions of full dataset :",dim(invert), "\n")
invert <- invert[complete.cases(invert),]
cat("Dimensions of reduced dataset after removing missing values: ", dim(invert), "\n")
invert[1:12,]
Dimensions of full dataset : 100 8
Dimensions of reduced dataset after removing missing values: 90 8
Year Period Stream quadrat count YearF StreamF logCount
1 1 Before 1 1 58 1 1 4.060443
2 1 Before 1 2 62 1 1 4.127134
3 1 Before 1 3 48 1 1 3.871201
4 1 Before 1 4 72 1 1 4.276666
5 1 Before 1 5 41 1 1 3.713572
6 1 Before 2 1 76 1 2 4.330733
7 1 Before 2 2 73 1 2 4.290459
9 1 Before 2 4 69 1 2 4.234107
10 1 Before 2 5 66 1 2 4.189655
11 2 Before 1 1 61 2 1 4.110874
12 2 Before 1 2 48 2 1 3.871201
13 2 Before 1 3 60 2 1 4.094345
As recommended in Section 1.2, the analysis is performed on the \(\log{count}\).
We begin by computing some basic summary statistics in the usual fashion
mean.invert <- plyr::ddply(invert, c("Year","YearF","Stream","StreamF","Period"), function(x){
# compute sample size, missing values, mean, and sd
n <- nrow(x)
n.miss <- sum(is.na(x$count))
count.mean <- mean(x$logCount, na.rm=TRUE)
count.sd <- sd (x$logCount, na.rm=TRUE)
res <- c(n=n, n.miss=n.miss, count.mean=count.mean, count.sd=count.sd) # put results together
res
})
mean.invert
   Year YearF Stream StreamF Period n n.miss count.mean count.sd
1 1 1 1 1 Before 5 0 4.009803 0.22051659
2 1 1 2 2 Before 4 0 4.261239 0.06203165
3 2 2 1 1 Before 5 0 4.025361 0.09549140
4 2 2 2 2 Before 5 0 4.089866 0.10701567
5 3 3 1 1 Before 4 0 3.896376 0.17340781
6 3 3 2 2 Before 5 0 4.028166 0.14173288
7 4 4 1 1 Before 3 0 3.900741 0.18704927
8 4 4 2 2 Before 5 0 4.063751 0.20839008
9 5 5 1 1 Before 4 0 3.716527 0.09116709
10 5 5 2 2 Before 4 0 3.931119 0.26369443
11 6 6 1 1 After 5 0 4.223324 0.13948434
12 6 6 2 2 After 5 0 4.251269 0.11819981
13 7 7 1 1 After 5 0 3.943078 0.14406935
14 7 7 2 2 After 5 0 4.110708 0.09235990
15 8 8 1 1 After 5 0 4.068924 0.11383272
16 8 8 2 2 After 3 0 4.277968 0.19448642
17 9 9 1 1 After 5 0 4.214481 0.07192020
18 9 9 2 2 After 5 0 4.385247 0.10377546
19 10 10 1 1 After 4 0 4.024802 0.11645040
20 10 10 2 2 After 4 0 4.158137 0.11104182
We plot the \(\ln(s)\) vs \(\ln(\overline{Y})\) for each stream and period
# Plot the log(std) vs. log(mean)
# Because there is no relationship between the mean and the sd,
# this plot indicates that no transformation is needed.
sd.plot <- ggplot(data=mean.invert, aes(x=log(count.mean), y=log(count.sd),
shape=StreamF, color=StreamF))+
ggtitle("SD vs. Mean")+
geom_point(size=4)
sd.plot
The standard deviations appear to be approximately equal over time, and the plot shows no evidence of a relationship between the mean and the variance, which is often found when dealing with count data, especially if the counts are smallish.
A plot of the time trend
plot.mean <- ggplot(data=invert, aes(x=Year, y=logCount,
shape=StreamF, group=StreamF, color=StreamF))+
ggtitle("Preliminary plot of data and means")+
geom_point(size=2, position=position_dodge(w=0.2))+
geom_vline(xintercept=.5+max(invert$Year[as.character(invert$Period)=="Before"],na.rm=TRUE))+
stat_summary(fun="mean", geom="line") # , position=position_dodge(w=0.2))
plot.mean
The plot appears to show a shift in the mean starting in year 6, but there is large year-to-year variation in both streams. However, the trend lines in both streams appear to be parallel (indicating little evidence of a stream-year interaction).
1.3.1 Analysis of stream 1 - yearly averages
We start with the analysis of a single stream as it is quite common for Before-After studies to have only one “experimental” unit. Of course, inference will be limited to that specific stream chosen, and the results from this stream cannot be extrapolated to other streams in the study.
We start by subsetting the data to include the first stream only.
# Subset the data
stream1 <- invert[ invert$Stream ==1,]
head(stream1)
# The analysis of the mean(logCount) for each year
mean.invert1 <- mean.invert[ mean.invert$Stream ==1, ]
mean.invert1
  Year Period Stream quadrat count YearF StreamF logCount
1 1 Before 1 1 58 1 1 4.060443
2 1 Before 1 2 62 1 1 4.127134
3 1 Before 1 3 48 1 1 3.871201
4 1 Before 1 4 72 1 1 4.276666
5 1 Before 1 5 41 1 1 3.713572
11 2 Before 1 1 61 2 1 4.110874
Year YearF Stream StreamF Period n n.miss count.mean count.sd
1 1 1 1 1 Before 5 0 4.009803 0.22051659
3 2 2 1 1 Before 5 0 4.025361 0.09549140
5 3 3 1 1 Before 4 0 3.896376 0.17340781
7 4 4 1 1 Before 3 0 3.900741 0.18704927
9 5 5 1 1 Before 4 0 3.716527 0.09116709
11 6 6 1 1 After 5 0 4.223324 0.13948434
13 7 7 1 1 After 5 0 3.943078 0.14406935
15 8 8 1 1 After 5 0 4.068924 0.11383272
17 9 9 1 1 After 5 0 4.214481 0.07192020
19 10 10 1 1 After 4 0 4.024802 0.11645040
The multiple quadrats measured every year are pseudo-replicates and cannot be treated as independent observations. As in most cases of pseudo-replication, one approach is to take the average over the pseudo-replicates and analyze the averages. If the number of quadrats taken each year was the same, the analysis of the averages doesn’t lose any information and is perfectly valid. If the number of quadrats varies from year to year, the analysis of the averages is only approximate, but is usually sufficient.
The averages are computed in the usual way and the actual code to do this is not shown. After the averages are taken, the data has been reduced to 10 values (one for each year in the study) with five measurements taken before the impact and five measurements taken after the impact. Consequently, the simple two-sample t-test can be used to analyze the data. As noted earlier, autocorrelation in the measurements over time has been ignored.
We use the lm() function to fit the model. The lm() function assumes equal variance for both periods. This could be examined and the t.test() function could be used for the unequal variance t-test but is not shown here.
The ANOVA table is shown below:
result100 <- lm( count.mean ~ Period, data=mean.invert1)
anova(result100)
Analysis of Variance Table
Response: count.mean
Df Sum Sq Mean Sq F value Pr(>F)
Period 1 0.085711 0.085711 5.6949 0.0441 *
Residuals 8 0.120402 0.015050
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is weak evidence of an effect with a two-sided P-value just under 0.05.
A multiple comparison procedure is used to estimate the effect size:
# Estimate the marginal means and the differences.
result100.emmo <- emmeans::emmeans(result100, ~Period)
cat("\nEstimated marginal means for each Period \n")
summary(result100.emmo)
cat("Estimated difference between means in Before and After periods\n")
summary(pairs(result100.emmo), infer=TRUE)
Estimated marginal means for each Period
Period emmean SE df lower.CL upper.CL
Before 3.91 0.0549 8 3.78 4.04
After 4.09 0.0549 8 3.97 4.22
Confidence level used: 0.95
Estimated difference between means in Before and After periods
contrast estimate SE df lower.CL upper.CL t.ratio p.value
Before - After -0.185 0.0776 8 -0.364 -0.00624 -2.386 0.0441
Confidence level used: 0.95
As expected, the confidence interval for the effect size just barely excludes the value of zero. A power analysis could be conducted to see how many years of monitoring would be required to detect a reasonably sized effect.
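As a rough sketch of such a power calculation, the two-sample t-test on the yearly averages can be examined with power.t.test(), using the year-to-year standard deviation on the log scale implied by the residual mean square above (about sqrt(0.015) = 0.12). The biologically important difference of 0.10 on the log scale (roughly a 10% change) used below is purely illustrative.

# Rough power sketch for the two-sample t-test on the yearly averages.
# sd = between-year SD on the log scale; delta = biologically important
# difference on the log scale (illustrative values only).
power.t.test(n=5, delta=0.10, sd=0.12, sig.level=0.05)
# or solve for the number of years per period needed for 80% power
power.t.test(power=0.80, delta=0.10, sd=0.12, sig.level=0.05)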
1.3.2 Analysis of Stream 1 - individual values
The number of quadrats does vary a bit over time, and so the analysis on the averages for each year is only approximate. A more refined analysis that also provides estimates of the year-to-year and the quadrat-to-quadrat variation is possible.
Again note that quadrats are pseudo-replicates taken in each year. This analysis is functionally equivalent to the analysis of means with sub-sampling discussed in earlier chapters.
A mixed-model analysis is performed. The statistical model is
\[\log{Y} =\mathit{Period}~~\mathit{Year(Period)(R)}\]
If each year is labelled with a separate value (e.g., using the actual calendar year as the label) rather than year 1 within the Before period and year 1 within the After period, the nesting of year within period can be “dropped” and the model can be written as \[Y=\mathit{Period}~~\mathit{Year(R)}\] Note that these two models are equivalent – we are just letting the software deal with the nesting of years within each period. The Period term refers to the Before vs. After periods and represents the effect of interest.
This model can be fit using the lmer() function from the lmerTest package. Be sure to verify that Year is defined as factor prior to the fit.
# Don't forget that you need a Year as a factor.
# We created YearF as a factor when we read in the data
# As recommended in Section 1.2, the analysis is done on the log(count) scale.
result200 <- lmerTest::lmer(log(count) ~ Period + (1|YearF), data=stream1)
anova(result200, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Period 0.10665 0.10665 1 7.9837 5.4241 0.0483 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA table presents the test for no period effect (i.e., is there evidence of a difference in the mean response between the Before and After periods?). The P-value indicates weak evidence of an effect, similar to that found with the analysis of the averages. Don’t live and die by the 0.05 rule – surely the two analyses are saying similar things!
The estimated means in the two periods are found:
# Estimate the marginal means.
result200.emmo <- emmeans::emmeans(result200, ~Period)
cat("Estimated marginal means\n")
summary(result200.emmo)
Estimated marginal means
Period emmean SE df lower.CL upper.CL
Before 3.91 0.0562 8.31 3.78 4.04
After 4.10 0.0549 7.66 3.97 4.22
Degrees-of-freedom method: kenward-roger
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The estimated difference in the means between the two periods can also be estimated, and usually is of most interest.
# Estimate the period effect along with a se
cat("Estimated difference between means in Before and After periods\n")
summary(pairs(result200.emmo), infer=TRUE)
Estimated difference between means in Before and After periods
contrast estimate SE df lower.CL upper.CL t.ratio p.value
Before - After -0.183 0.0786 7.98 -0.364 -0.00174 -2.329 0.0483
Degrees-of-freedom method: kenward-roger
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Both results are similar to those from the analysis of the averages.
There are two sources of variation in this experiment. First, the year-to-year variation represents the effects of year-specific factors such as total rainfall or other random events. Second, the residual (quadrat-to-quadrat) variation measures the variation among quadrats in different parts of the stream.
The variance components are extracted using the VarCorr() function:
# Extract the variance components.
vc <- VarCorr(result200)
vc
 Groups   Name        Std.Dev.
YearF (Intercept) 0.10473
Residual 0.14022
As you will see later when the power analysis is discussed, you can reduce the impact of the quadrat-to-quadrat variation on the estimates by sampling more quadrats each year. However, sampling more quadrats has no impact on the year-to-year random variation, and this is often the limiting feature of simple Before-After designs. As a heads up, the year-to-year variation can be controlled by conducting a paired BACI experiment where both Impact and Control sites are measured in each year of the study.
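As a rough illustration (not the formal power analysis discussed later), the approximate standard error of the Before-After difference in mean log(count) can be sketched from the two variance components estimated above; the sample sizes plugged in below are purely illustrative.

# Sketch: approximate SE of the Before-After difference in mean log(count)
# under the two variance components estimated above. Adding quadrats only
# shrinks the quadrat term; the year-to-year term is untouched.
sd.year <- 0.105   # year-to-year SD (from VarCorr above)
sd.quad <- 0.140   # quadrat-to-quadrat (residual) SD
se.diff <- function(n.years, n.quad){
   sqrt(2*(sd.year^2/n.years + sd.quad^2/(n.years*n.quad)))
}
se.diff(n.years=5,  n.quad=5)    # roughly the design used here
se.diff(n.years=5,  n.quad=50)   # 10x the quadrats -- little improvement
se.diff(n.years=10, n.quad=5)    # doubling the years helps much more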
It should be noted that in the case of a single stream, the year-to-year variation is really a combination of the actual year-specific effects and a year-stream interaction, where the time trend of this particular stream may not be completely parallel with other streams in the study. Of course, with only a single stream measured, these two sources cannot be disentangled.
Diagnostic plots (not shown) show no evidence of problems.
1.3.3 Analysis of all streams - yearly averages
We now move to the analysis of both streams. By sampling more than one stream, the inference is greatly strengthened as now the conclusions are more general. Rather than referring to one specific stream, the results now refer to all streams in the population of streams on this project.
As before, we start by averaging over the pseudo-replicates (the multiple-quadrats) measured in each stream-year combination. The statistical model is
\[MeanLogYearlyAvg = \mathit{Period}~~\mathit{Stream(R)}~~\mathit{Year(Period)(R)}\]
Once again, if each year is uniquely labelled, the model can be written as: \[MeanLogYearlyAvg = \mathit{Period}~~\mathit{Stream(R)}~~\mathit{Year(R)}\]
As before, the Period term represents the difference in the mean between the Before and After periods and the other terms represent the random effects of streams and years. The residual variation represents a combination of the stream-year interaction and the quadrat-noise terms, which cannot be separated because averages over the quadrats were taken.
The model can be fit using the lmer() in the lmerTest package. Be sure to verify that both Year and Stream are defined as factors prior to the fit.
# Model for all streams together
# Note that Stream, Period, and Year all need to be factors
# These were defined as such when the data were read in.
result300 <- lmerTest::lmer(count.mean~ Period + (1|Stream) + (1|YearF), data=mean.invert)
anova(result300, ddf='Kenward-Roger')
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Period 0.01361 0.01361 1 8 5.8949 0.04133 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The P-value for the test for no period effect (i.e., is there evidence of a difference in the mean response between the Before and After periods) indicates weak evidence of an effect.
The estimated marginal means in the two periods are estimated:
# Estimate the marginal means
result300.emmo <- emmeans::emmeans(result300, ~Period)
cat("Estimated marginal means for each Period \n")
summary(result300.emmo)
Estimated marginal means for each Period
Period emmean SE df lower.CL upper.CL
Before 3.99 0.0912 1.95 3.59 4.39
After 4.17 0.0912 1.95 3.76 4.57
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
The estimated difference in the marginal means between the two periods is usually of most interest.
# Estimate the Period effect
cat("Estimated difference between means in Before and After periods\n")
summary(pairs(result300.emmo), infer=TRUE)
Estimated difference between means in Before and After periods
contrast estimate SE df lower.CL upper.CL t.ratio p.value
Before - After -0.173 0.0715 8 -0.338 -0.00871 -2.428 0.0413
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
The confidence interval just barely excludes the value of zero.
There are four sources of variation in this experiment, and not all of them can be separated when the analysis is done on the yearly averages. First, the year-to-year variation represents the effects of year-specific factors such as total rainfall or other random events. Second, the stream-to-stream variation measures the effect of the different streams on the mean response. The plot produced earlier shows that there is some stream effect, as the lines for the streams are consistently separated. Third is the stream-year interaction, where the individual streams do not have exactly parallel trend lines. Lastly, the quadrat-to-quadrat variation measures the variation among quadrats in different parts of a stream. Unfortunately, when the yearly averages are analyzed, the last two variance components cannot be separated.
The estimated variance components are:
# Extract the variance components
vc <-VarCorr(result300)
vc
 Groups   Name        Std.Dev.
YearF (Intercept) 0.10776
Stream (Intercept) 0.10740
Residual 0.04805
Notice that the residual variance component represents a combination of the stream-year interaction and the quadrat-to-quadrat variation (divided by the average number of quadrats sampled per year). The stream and year variance components are substantially larger than the residual variance!
Diagnostic plots (not shown) show no evidence of problems.
1.3.4 Analysis of all streams - individual values
Finally, we do the analysis of both streams using the individual values. By sampling more than one stream, the inference is greatly strengthened as now the conclusions are more general. Rather than referring to one specific stream, the results now refer to all streams in the population of streams on this project.
The statistical model is
\[\log{Y} = \mathit{Period}~~\mathit{Stream(R)}~~\mathit{Year(Period)(R)}~~ \mathit{Stream:Year(Period)(R)}\]
Once again, if each year is uniquely labelled, the model can be written as:
\[\log{Y} = \mathit{Period}~~\mathit{Stream(R)}~~\mathit{Year(R)}~~\mathit{Stream:Year(R)}\]
As before, the Period term represents the difference in the mean between the Before and After periods and the other terms represent the random effects of streams and years. We now can separate out the stream-year interaction from the quadrat-to-quadrat variation.
This model can be fit using the lmer() function in the lmerTest package. Be sure to verify that both Year and Stream are defined as factors prior to the fit.
# Year, Period, Stream must all be defined as factors.
# This was done when the data were read in.
result400 <- lmerTest::lmer(log(count) ~ Period + (1|StreamF)+(1|YearF)+(1|StreamF:YearF), data=invert)
anova(result400, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Period 0.11534 0.11534 1 7.9984 5.87 0.04167 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The P-value for the test for no period effect (i.e., is there evidence of a difference in the mean response between the Before and After periods?) indicates weak evidence of an effect, and the result is similar to that found when the yearly averages were analyzed.
The estimated marginal means in the two periods are:
# Estimate the marginal means
result400.emmo <- emmeans::emmeans(result400, ~Period)
cat("Estimated marginal means for each Period \n")
summary(result400.emmo)
Estimated marginal means for each Period
Period emmean SE df lower.CL upper.CL
Before 3.99 0.0887 1.96 3.60 4.38
After 4.16 0.0886 1.95 3.77 4.56
Degrees-of-freedom method: kenward-roger
Results are given on the log (not the response) scale.
Confidence level used: 0.95
and the estimated difference in the marginal means between the two periods is:
# Estimate the Period effect
cat("Estimated difference between means in Before and After periods\n")
summary(pairs(result400.emmo), infer=TRUE)
Estimated difference between means in Before and After periods
contrast estimate SE df lower.CL upper.CL t.ratio p.value
Before - After -0.172 0.0711 8 -0.336 -0.0083 -2.423 0.0417
Degrees-of-freedom method: kenward-roger
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The confidence interval just barely excludes the value of zero.
There are four sources of variation in this experiment, and all of them can be separated when the analysis is done on the individual values. First, the year-to-year variation represents the effects of year-specific factors such as total rainfall or other random events. Second, the stream-to-stream variation measures the effect of the different streams on the mean response. The plot produced earlier shows that there is some stream effect, as the lines for the streams are consistently separated. Third is the stream-year interaction, where the individual streams do not have exactly parallel trend lines. Lastly, the quadrat-to-quadrat variation measures the variation among quadrats in different parts of a stream. These can now be separated:
The estimated variance components are:
# Extract the variance components
vc <- VarCorr(result400)
vc
 Groups        Name        Std.Dev.
StreamF:YearF (Intercept) 0.00000
YearF (Intercept) 0.10205
StreamF (Intercept) 0.10320
Residual 0.14018
We see that the stream-year interaction variance component is estimated to be zero. That is the source of the warning message from lmer() about a singular fit. R does not allow unbounded variance component estimates, i.e., it does not allow estimates of variance components to be negative.
We see that the stream-year interaction is very small – this is not surprising given the parallel responses of the two streams over the years of the study seen in the preliminary plots.
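If you want to confirm that the boundary estimate is what triggers the message, lme4 provides a helper function. A minimal sketch follows; dropping the near-zero component, as shown, is only one possible option.

# Check whether the fit is singular (a variance component on the boundary)
lme4::isSingular(result400)
# One option is to drop the near-zero Stream:Year component and refit
result400b <- lmerTest::lmer(log(count) ~ Period + (1|StreamF)+(1|YearF), data=invert)
VarCorr(result400b)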
The usual diagnostic plots (not shown) fail to show any problem with the model fit.
1.4 Simple BACI - One Before/After measurement; one Impact/Control site
1.4.1 Overview
The very simplest BACI design has a single Impact and a single Control site, each measured once Before and once After the impact occurs. As noted earlier, the evidence of an environmental impact is the non-parallelism of the response between the Control and the Impact site. Several examples of possible responses are shown below:
The results in the first row above both show no environmental impact (responses are parallel over time); the results in the bottom row all show evidence of an environmental impact (responses are not parallel over time).
The number of replicates at each site in each year does not have to be equal, however, power to detect non-parallelism is maximized when the sample sizes are equal in each site-year combination.
The usual assumptions for ANOVA are made:
- the response variable is interval or ratio scaled (i.e., a continuous variable). This assumption can be modified, e.g., responses are categorical, but this is not covered in this document. Please contact me for details.
- the standard deviations at each site-year combination should be equal. This is checked by looking at the sample standard deviations in each site-year combination (see the sketch after this list). If these are reasonably equal (a rough rule of thumb is that the ratio of the largest to smallest standard deviation should be less than 5:1), then the analysis can proceed. If the standard deviations are grossly unequal, then consider a transformation.
- the observations are independent within each site-year combination and across years. For example, if the measurements at each site are from quadrats, then the quadrats should be sufficiently far apart and the identical quadrat locations should not be measured again after impact. [If the same spot is measured before and after impact, this is an example of blocking and will be discussed later.]
- the distribution of responses within each site has an approximate normal distribution. This assumption is not crucial – it is far more important that the standard deviations be approximately equal and the observations are independent. In any case, most impact studies have too small of a sample size to detect anything but gross violations of Normality.
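A minimal sketch of the rough check on the standard deviations mentioned in the list above; the data frame dat and the column names logY, SiteClass, and Period are placeholders for your own data.

# Compute the sd of the (log) response in each SiteClass-Period cell and
# compare the largest to the smallest (rule of thumb: ratio under about 5).
# 'dat', 'logY', 'SiteClass', and 'Period' are placeholder names.
cell.sd <- aggregate(logY ~ SiteClass + Period, data=dat, FUN=sd)
max(cell.sd$logY) / min(cell.sd$logY)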
The analysis is straightforward. This is a classical two-way ANOVA, a.k.a. a two-factor completely randomized design (CRD). The model to be fit (in the standard simplified syntax) is: \[\log{Y} = \mathit{SiteClass} ~~ \mathit{Period} ~~ \mathit{SiteClass:Period}\] where
- \(\log{Y}\) is the LOGARITHM of the response variable;
- SiteClass takes the values of Control or Impact;
- Period takes the value of Before or After.
- SiteClass:Period interaction is the BACI effect and is of most interest.
The random variation among the multiple measurements is the only source of random variation and is implicit at the lowest level of the design.
Most standard statistical packages can analyze this design without problems.
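In R, the model in the simplified syntax above corresponds to a call along the following lines. This is only a sketch with a placeholder data frame dat and response Y; the full workflow is shown in the crab example that follows.

# Sketch only: 'dat' is a placeholder data frame with columns Y (response),
# SiteClass (Control/Impact) and Period (Before/After).
fit <- lm(log(Y) ~ SiteClass + Period + SiteClass:Period, data=dat)
anova(fit)   # the SiteClass:Period line tests for a BACI effect
             # (see the notes on Type III tests later for unbalanced data)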
1.4.2 Example: Change in density in crabs near a power plant
1.4.2.1 Introduction
A simple BACI design was used to assess the impact of cooling water discharge on the density of shore crabs. The beach near the outlet of the cooling water (the Impact beach) was sampled using several quadrats before and after the plant started operation, as was a Control beach on the other side of the body of water.
A schematic of the BACI design is:
The raw data are available in the baci-crabs.csv file at the Sample Program Library. The data are imported in the usual way and part of the data is shown below:
setwd(file.path("../../Stat-Ecology-Datasets-Revised/BACI/Crabs"))
crabs <- read.csv("baci-crabs.csv", header=TRUE, as.is=TRUE, strip.white=TRUE)
crabs$trt <- interaction(crabs$SiteClass, crabs$Period)
crabs$SiteClass <- factor(crabs$SiteClass)
crabs$Period <- factor(crabs$Period, levels=c("Before","After"), ordered=TRUE) # sort correctly
crabs$trt <- factor(crabs$trt)
crabs$logDensity <- log(crabs$Density) # we want to analyze on the log scale
cat("Listing of part of the raw data \n")
crabs[1:10,]
Listing of part of the raw data
SiteClass Site Period Quadrat Density trt logDensity
1 Impact I1 Before Q1 23 Impact.Before 3.135494
2 Impact I1 Before Q2 30 Impact.Before 3.401197
3 Impact I1 Before Q3 27 Impact.Before 3.295837
4 Impact I1 Before Q4 25 Impact.Before 3.218876
5 Impact I1 After Q5 17 Impact.After 2.833213
6 Impact I1 After Q6 19 Impact.After 2.944439
7 Impact I1 After Q7 23 Impact.After 3.135494
8 Impact I1 After Q8 25 Impact.After 3.218876
9 Impact I1 After Q9 17 Impact.After 2.833213
10 Control C1 Before Q10 32 Control.Before 3.465736
Notice that only one site was measured in each of the Impact and Control areas, and multiple, independent quadrats were measured at each site. We have labelled the quadrats using separate identifiers to remind ourselves that new quadrats were measured in both the Before and After periods of the experiment. A common notation, which is not recommended, is to “reuse” the labels, e.g., you would have a Q1 in each site-year combination. This notation is NOT recommended because it is not clear that separate quadrats were measured at each opportunity.
The variability in observations from a simple BACI (and more complex BACI as well) can be partitioned into several components, as will be reported in the ANOVA table.
First is the main effect of time (the Period effect). This is simply the difference in the mean (over all sites and quadrats) before and after the impact took place:
This temporal (Period) effect is not of interest as it may represent the effects of changes in the global conditions unrelated to the impact.
Next, not all sites have exactly the same mean even before the impact takes place. The SiteClass main effect represents the difference in the mean among the sites when averaged over all times and quadrats:
The SiteClass effect is not of interest as it represents the effects of site-specific factors that we can neither measure nor control.
Finally, and of greatest interest, is the BACI effect, which is the DIFFERENTIAL CHANGE, i.e., the difference of the two changes in the means between the Before and After periods:
The BACI effect, i.e., the interaction between Period and SiteClass, represents the potential environmental impact, and so a test for no interaction is equivalent to a test for no environmental impact.
1.4.2.2 Analysis
As recommended in Section 1.2, the analysis is performed on the \(\log{Density}\).
A table of means and standard deviations (on the logarithmic scale) can be computed in the usual way.
report <- plyr::ddply(crabs, c("Period","SiteClass"), function(x){
res <- sf.simple.summary(x, "logDensity", crd=TRUE)
return(res)
})
report
  Period SiteClass n nmiss     mean        sd         se      lcl      ucl
1 Before Control 6 0 3.412531 0.1119179 0.04569030 3.295081 3.529982
2 Before Impact 4 0 3.262851 0.1131096 0.05655481 3.082868 3.442834
3 After Control 4 0 3.309636 0.1087065 0.05435326 3.136660 3.482613
4 After Impact 5 0 2.993047 0.1765971 0.07897664 2.773773 3.212321
The sample standard deviations are approximately equal indicating that the assumption of equal population standard deviations is tenable. The quadrats were sampled far enough apart that they can be considered to be independent. Separate quadrats were sampled in the two time periods.
This is a simple two-factor CRD experiment because no quadrat is repeatedly measured over time, and all observations are independent of each other. The model is \[\mathit{log(Density)} = \mathit{SiteClass}~~\mathit{Period}~~\mathit{SiteClass:Period}\] where
- SiteClass represents the effect of the Control vs. Impact sites;
- Period represents the Before vs. After contrast; and
- SiteClass:Period interaction represents the BACI effect where as noted earlier the non-parallelism (the interaction) represent evidence of an environmental impact.
The lm() function can be used to fit this model. The anova() function is applied to the result for the effect tests. The lm() function can be used for balanced and unbalanced data, but GREAT care is needed in getting the correct ANOVA tables with unbalanced data – the anova() function does NOT give the correct sums-of-squares (it gives what are known as incremental or Type I sums of squares). Refer to http://r-eco-evo.blogspot.com/2007/10/infamous-type-iii-ss-before-i-started_20.html for more details.
The Type III tests from the Anova() function in the car package require that you set the factor contrasts to the sum-to-zero contrasts rather than the default treatment contrasts BEFORE fitting the lm() model! This is done using the contrasts argument in the lm() call as noted in the code. Refer to http://r.789695.n4.nabble.com/Type-I-v-s-Type-III-Sum-Of-Squares-in-ANOVA-td1573657.html for further information.
So use the lm() with the default contrasts for everything EXCEPT the ANOVA table! Note that if you specify the interaction term as the LAST term in the model, then all the various types of tests (I, II, III) will give identical results for the BACI contrast which is really the only term of interest.
The model is fit:
cat("The sum of squares and F-tests from the anova() below are INCORRECT\n")
cat("in unbalanced data because they are sequential and only adjust for effects\n")
cat("that enter the model prior to the term in question.")
result.lm <- lm( log(Density) ~ SiteClass + Period + SiteClass:Period, data=crabs)
cat("\nAnalysis of variance -- this is NOT CORRECT because design is unbalanced \n")
anova(result.lm)
cat("\n\nUse the Type III tests from the Anova() function from the car package")
cat("\nbut you need to set the treatment contrasts to sum rather than treatment")
cat("\nSee http://r.789695.n4.nabble.com/Type-I-v-s-Type-III-Sum-Of-Squares-in-ANOVA-td1573657.html\n")
result.lm2 <- lm( log(Density) ~ SiteClass + Period + SiteClass:Period, data=crabs,
                  contrasts=list(SiteClass="contr.sum", Period='contr.sum'))
The sum of squares and F-tests from the anova() below are INCORRECT
in unbalanced data because they are sequential and only adjust for effects
that enter the model prior to the term in question.
Analysis of variance -- this is NOT CORRECT because design is unbalanced
Analysis of Variance Table
Response: log(Density)
Df Sum Sq Mean Sq F value Pr(>F)
SiteClass 1 0.31631 0.31631 18.1646 0.000682 ***
Period 1 0.15503 0.15503 8.9027 0.009275 **
SiteClass:Period 1 0.03214 0.03214 1.8459 0.194340
Residuals 15 0.26121 0.01741
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Use the Type III tests from the Anova() function from the car package
but you need to set the treatment contrasts to sum rather than treatment
See http://r.789695.n4.nabble.com/Type-I-v-s-Type-III-Sum-Of-Squares-in-ANOVA-td1573657.html
The order of the terms in the model is not important.
The output from most packages is voluminous and only part of it will be explored here.
First are the effect tests for each term in the model:
car::Anova(result.lm2,type=3)
Anova Table (Type III tests)
Response: log(Density)
Sum Sq Df F value Pr(>F)
(Intercept) 194.343 1 11160.2604 < 2.2e-16 ***
SiteClass 0.251 1 14.4055 0.001759 **
Period 0.160 1 9.2039 0.008376 **
SiteClass:Period 0.032 1 1.8459 0.194340
Residuals 0.261 15
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These indicate evidence that the mean \(\log{density}\) varied between the SiteClasses regardless of the potential impact; the mean response varied between Periods (the mean density changed between the two periods due to temporal effects unrelated to the impact); but there was no evidence of an environmental impact as the test for an interaction effect had a large P-value (\(p = 0.194\)).
Because the simple BACI is a two-factor CRD, it is relatively straightforward to construct the profile plot and add confidence limits for each of the 4 means on the logarithmic scale:
profileplot <- ggplot(data=report, aes(x=Period, y=mean, group=SiteClass, color=SiteClass))+
ggtitle("Profile plot of crab density")+
ylab("logDensity with mean and 95% ci")+
geom_point(position=position_dodge(w=0.1))+
geom_line(position=position_dodge(w=0.1))+
geom_errorbar(aes(ymax=ucl, ymin=lcl), width=0.1, position=position_dodge(w=0.1))+
geom_point(data=crabs, aes(y=logDensity), position=position_dodge(w=0.2))
profileplot
The profile plot shows two approximately parallel lines, consistent with the lack of evidence of non-parallelism (i.e., no evidence of an interaction).
The residual and other diagnostic plots don’t show any problems.
diagplot <- autoplot(result.lm2)
diagplot
What is of primary interest is the difference of the differences, i.e., how much the change differed between the Control and Impact sites. This is called the BACI contrast and is the estimated environmental impact. We first estimate the expected marginal means. In simple designs, the expected marginal means are equal to the simple raw means, but in more complex designs they will differ. The expected marginal means adjust for missing data and imbalance in the experimental design.
The estimates of the marginal means (and standard errors) are requested using the emmeans package.
result.emmo.SP <- emmeans::emmeans(result.lm2, ~SiteClass:Period)
cat("Estimated marginal means \n\n")
summary(result.emmo.SP, infer=TRUE)
Estimated marginal means
SiteClass Period emmean SE df lower.CL upper.CL t.ratio p.value
Control Before 3.41 0.0539 15 3.30 3.53 63.344 <.0001
Impact Before 3.26 0.0660 15 3.12 3.40 49.452 <.0001
Control After 3.31 0.0660 15 3.17 3.45 50.161 <.0001
Impact After 2.99 0.0590 15 2.87 3.12 50.717 <.0001
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The expected marginal mean estimates are equal to the raw sample means (on the logarithmic scale) – this will be true ONLY in balanced data. In the case of unbalanced data (see later), the expected marginal means may differ from the raw sample means, but they remain sensible estimates of the marginal means.
The estimated mean \(\log{density}\) in the Control site changed by about \(-0.10\) between the Before and After periods; the mean in the Impact site changed by about \(-0.27\). The BACI contrast is the difference-in-the-differences, i.e., \(-0.10-(-0.27)=0.17\), but we would also like a standard error for this estimate. This is obtained by specifying a contrast among the means.
The BACI contrast is: \[(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB}\]
The estimate of the BACI contrast is available from the summary() function applied to a contrast among the expected marginal means:
summary(contrast(result.emmo.SP, list(baci=c(1,-1,-1,1))), infer=TRUE)
 contrast estimate    SE df lower.CL upper.CL t.ratio p.value
baci -0.167 0.123 15 -0.429 0.0949 -1.359 0.1943
Results are given on the log (not the response) scale.
Confidence level used: 0.95
There could be a sign difference in the estimate depending on the order in which R sorts the levels of each term in the model.
The P-value (\(p = 0.194\)) is the same as the test for interaction (as it must). It is pretty clear that the 95% confidence interval for the difference-in-the-difference will include 0, indicating no evidence of a BACI effect.
The key advantage of the estimate and 95% confidence interval over the P-value is that it enables the analyst to see if the test for interaction was not statistically significant (largish P-value) because the effect was small (good news) or because the experiment has so much noise that the standard error for the effect is enormous (bad news).
There is no statistical rule of thumb to distinguish between these two cases. The analyst must rely on biological knowledge. In this case, is a difference-in-the-difference of 0.17 (or 17 percentage points) of biological importance when the base densities are around 20-30 crabs per quadrat?
1.4.2.3 Limitations
The simple BACI design is less than satisfactory for most impact studies for several reasons.
First, the experimental and observational units are of different sizes. The impact is applied at the site level, but the measurements are taken within each site at the quadrat level, i.e., the quadrats are pseudo-replicates as outlined by Hurlbert (1984). This means that the scope of inference is limited – if you do detect an effect, you really can only conclude that this particular Impact site behaved differently than this particular Control site. This leaves the study open to the criticism that the Control site was accidentally badly chosen and so the effect detected was an artefact of the study.
Second, the measurements are taken at a single point in time before and after the impact. Again, one could argue that the times chosen were (accidentally) poorly chosen and the results were specific to these time points. Multiple time points before and after the impact could be measured.
Third, there is only one Control and one Impact site. While impacts are normally applied to a single site, there is no reason why a single Control site must be measured. Again, if only a single Control site was chosen, one could argue that this single Control site was poorly chosen and that the results are again specific to this single Control site. Multiple Control sites could be measured to ensure that the results are generalizable to more than this specific Control site.
Fourth, if separate quadrats are measured at each site-year combination, additional variability is introduced, making it harder to detect effects. Following the principles of experimental design, blocking, where the same quadrat is measured over time, improves the design at the cost of a more complex analysis.
1.5 BACI with Multiple sites; One year before/after
1.5.1 Overview
As noted in the previous section, a pointed critique of the simple BACI design is the use of a single Control site. The main criticism is that with a single Control site, the replicated measurements within each site are pseudo-replicates: the SiteClass (Control or Impact) operates at the site level, but measurements are taken at the quadrat-within-site level. This implies that inference is limited to these specific Impact and Control sites. While limiting inference to the single Impact site seems reasonable (e.g., only one dam is built), limiting inference to the single Control site seems too narrow.
Consequently, it is almost always advisable to have multiple Control sites in the impact-assessment design. The theory presented below is general enough to allow replicated Impact sites as well.
A conceptual diagram of the BACI design with multiple Control sites is:
We will assume that all the sites will be measured both before and after the impact, but only one year of monitoring is done before or after impact. In later sections, we will extend this approach to allow for multiple years of monitoring as well.
All sites are usually measured in both years, i.e., both before and after the impact occurs. The analysis below can also deal with sites that are measured only before the impact or only after the impact (i.e., are missing some data). However, for these cases, it is important that good software be used (e.g., SAS, JMP, R) because sophisticated algorithms are required. These incomplete sites do have information that should be integrated into the analysis. In particular, sites that are measured only before or only after impact provide information about the variance components, which are needed when the formal tests of hypotheses are done.
The use of multiple controls avoids the criticism that the results of the BACI experiment are solely due to a poor choice of Control sites. For example, in the above diagram, if only a single Control site was monitored, one could conclude that the impact was positive, neutral, or negative depending on which Control site was selected.
There are several, mathematically equivalent, ways to analyze these types of designs. The choice among the analyses is often dictated by the availability of software. However, if you want to use this assessment to investigate design alternatives, such as increasing the subsampling, increasing the number of years of monitoring, or increasing the number of sites, then the “full” analysis that provides estimates of the individual variance components is needed.
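As a preview of what the “full” analysis looks like, one way to write the mixed model for this design (using the variable names from the crab example read in below) is sketched here; the fixed SiteClass:Period term is the BACI effect, and the random Site and Site:Period terms supply the extra variance components described in the next section.

# Sketch of the full mixed-model analysis for a BACI design with multiple
# sites and one year Before/After. Site and Site:Period are random effects.
result.full <- lmerTest::lmer(logDensity ~ SiteClass + Period + SiteClass:Period +
                              (1|Site) + (1|Site:Period), data=crabs)
anova(result.full, ddf="Kenward-Roger")
VarCorr(result.full)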
1.5.2 Example: Density of crabs - revisited
1.5.2.1 Introduction
Let us extend the density of crabs example seen in an earlier section.
The beach near the outlet of the cooling water (the Impact beach) was sampled using several quadrats before and after the plant started operation. Two Control beaches at other locations in the inlet were also sampled.
A schematic of the experimental design is:
The raw data are available in the baci-crabs-mod.csv file at the Sample Program Library. The data are imported in the usual way and part of the data is shown below:
setwd(file.path("../../Stat-Ecology-Datasets-Revised/BACI/Crabs-mod"))
crabs <- read.csv("baci-crabs-mod.csv", header=TRUE, as.is=TRUE, strip.white=TRUE)
crabs$trt <- interaction(crabs$SiteClass, crabs$Site, crabs$Period)
crabs$Site <- factor(crabs$Site)
crabs$SiteClass <- factor(crabs$SiteClass)
crabs$Period <- factor(crabs$Period, levels=c("Before","After"), ordered=TRUE)
crabs$trt <- factor(crabs$trt)
crabs$logDensity<- log(crabs$Density)
cat("Listing of part of the raw data \n")
crabs[1:10,]
Listing of part of the raw data
SiteClass Site Period Quadrat Density trt logDensity
1 Impact I1 Before Q1 23 Impact.I1.Before 3.135494
2 Impact I1 Before Q2 30 Impact.I1.Before 3.401197
3 Impact I1 Before Q3 27 Impact.I1.Before 3.295837
4 Impact I1 Before Q4 25 Impact.I1.Before 3.218876
5 Impact I1 After Q5 17 Impact.I1.After 2.833213
6 Impact I1 After Q6 19 Impact.I1.After 2.944439
7 Impact I1 After Q7 23 Impact.I1.After 3.135494
8 Impact I1 After Q8 25 Impact.I1.After 3.218876
9 Impact I1 After Q9 17 Impact.I1.After 2.833213
10 Control C1 Before Q10 32 Control.C1.Before 3.465736
There was one Impact beach monitored (Site I1 before and after the impact) and two Control beaches (C1 and C2 both measured before and after impact). In each site-year combination, between four and six quadrats were measured. It is not necessary that the same number of quadrats be measured in each site-year combination and most good software can deal with this automatically. Notice that each site has a unique label as does each quadrat. This is a good data management practice as it makes it clearer that different sites are used and that all of the quadrats were located and measured independently.
We can again partition the variation in the data into that due to Period (Before vs. After), SiteClass (Impact vs. Control), and the BACI contrast (the differential change). In addition, with multiple sites, there are now additional sources of random variation that must be accounted for:
The quadrat-to-quadrat variation (within a beach) represents micro-habitat effects that affect the very localized density of the crabs. The site-to-site variation within each site class (i.e., among the multiple sites within the Control site class) represents the effect of site-specific factors that cause the means of the individual sites (all within the Control class) to vary. Finally, the Site-Period interaction represents the INconsistency in the response of the sites between the Before and After periods. For example, not all of the Control sites will have exactly the same change in the mean between the Before and After periods.
Note that if only one quadrat is measured per site in each year, then the quadrat-to-quadrat variation and the site-year variation are confounded together and cannot be separated.
The analyses presented below start with the analysis of a highly “distilled” dataset and progressively move to the analysis of the raw data.
As recommended in Section 1.2, the analysis is performed on the \(\log{Density}\).
1.5.2.2 Converting to an analysis of differences
The measurements taken in each site-year combination are pseudo-replicates. Different quadrats were measured in each year, so there is no “linking” among the quadrats. This idea is reinforced in the dataset by using separate quadrat labels for each quadrat.
We first find the average count on the logarithmic scale over the four to six quadrats in each site-year combination. If the number of quadrats varied among sites, this will give an “approximate” analysis because the design is unbalanced, but is likely “good-enough”.
Use the plyr package to compute the mean density for each combination of SiteClass, Site, and Period:
report <- plyr::ddply(crabs, c("Period","SiteClass","Site"), function(x){
res <- sf.simple.summary(x, "logDensity", crd=TRUE)
return(res)
})
report
  Period SiteClass Site n nmiss mean sd se lcl
1 Before Control C1 6 0 3.412531 0.11191792 0.04569030 3.295081
2 Before Control C2 6 0 3.634335 0.08851389 0.03613565 3.541445
3 Before Impact I1 4 0 3.262851 0.11310962 0.05655481 3.082868
4 After Control C1 4 0 3.309636 0.10870652 0.05435326 3.136660
5 After Control C2 5 0 3.436145 0.10386916 0.04645170 3.307174
6 After Impact I1 5 0 2.993047 0.17659714 0.07897664 2.773773
ucl
1 3.529982
2 3.727224
3 3.442834
4 3.482613
5 3.565115
6 3.212321
We then extract the averages on the logarithmic scale for each site-year combination:
# The averages are available in the report dataframe
crabs.avg <- report[,c("Site","SiteClass","Period","mean")]
crabs.avg
  Site SiteClass Period mean
1 C1 Control Before 3.412531
2 C2 Control Before 3.634335
3 I1 Impact Before 3.262851
4 C1 Control After 3.309636
5 C2 Control After 3.436145
6 I1 Impact After 2.993047
Next, you need to compute the difference in the sample means on the logarithmic scale between the Before and After period within each site.
Use the pivot_wider() function in the tidyr package to bring the two means for the same site on the same record and then compute the difference between the two periods.
# Reshape the file to get the Before and After measurements on the same line
#
crabs.site.wide <- tidyr::pivot_wider(crabs.avg,
id_cols=c("Site","SiteClass"),
names_from="Period",
values_from="mean"
)
crabs.site.wide$diff <- crabs.site.wide$After - crabs.site.wide$Before
crabs.site.wide
# A tibble: 3 × 5
Site SiteClass Before After diff
<fct> <fct> <dbl> <dbl> <dbl>
1 C1 Control 3.41 3.31 -0.103
2 C2 Control 3.63 3.44 -0.198
3 I1 Impact 3.26 2.99 -0.270
Finally, we test if the mean difference (After minus Before) is the same for the Control and Impact groups. This is a simple two-sample t-test.
Use the t.test() function to test if the mean difference is the same for the Control and Impact site classes. Note that because of the very small effective sample size, only the equal-variance t-test can be performed.
result <- t.test(diff ~ SiteClass, data=crabs.site.wide, var.equal=TRUE)
result$diff.in.means <- sum(result$estimate*c(1,-1))
names(result$diff.in.means)<- "diff.in.means"
result
cat("Estimated difference in the means on log scale: ",result$diff.in.means,
"( SE ", result$stderr, ") \n")
Two Sample t-test
data: diff by SiteClass
t = 1.4451, df = 1, p-value = 0.3854
alternative hypothesis: true difference in means between group Control and group Impact is not equal to 0
95 percent confidence interval:
-0.9293537 1.1678764
sample estimates:
mean in group Control mean in group Impact
-0.1505426 -0.2698039
Estimated difference in the means on log scale: 0.1192614 ( SE 0.0825278 )
The P-value for the test of the hypothesis of no difference in the mean difference between Control and Impact sites is p = 0.385, indicating no evidence of a difference in the means, i.e., no evidence of an impact. The output also includes the estimated difference in the mean differences along with a 95% confidence interval. Not surprisingly, the 95% confidence interval contains the value of 0.
At first glance, it looks as if we have thrown away much information – the multiple quadrats were averaged for each site-year combination, and only the difference between the averages before and after impact was used. In fact, all of the relevant information has been captured. The multiple quadrats within each site-year combination contain information only on the quadrat-to-quadrat variation within each site-year combination. The mean of these values automatically reflects whether 5 or 50 quadrats were measured: the more quadrats measured, the smaller the variation of the sample mean, which in turn reduces the variation of the differences and improves the power of the test to detect differences in the mean difference.
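To make the variance argument explicit, under the usual assumption of independent quadrats, the quadrat-to-quadrat contribution to the variance of a site-period average based on \(n\) quadrats is
\[Var(\bar{y}) = \frac{\sigma^2_{\mathit{quadrat}}}{n}\]
so measuring more quadrats shrinks this component, while the other variance components (e.g., the site effects) are unaffected by the number of quadrats.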
Similarly, a BACI design finds evidence of an impact if the change between the Before and After periods differs between the Control and Impact groups; the null hypothesis is that the mean difference between the Before and After periods is the same for the Impact and Control groups, which is exactly what is tested by the two-sample t-test above.
This was a very small experiment so it is not surprising that no effect was detected. Later on, the power of this design will be examined.
1.5.2.3 Using ANOVA on the averages
Rather than going through the contortions of converting from the long to the wide format and finding differences, it is possible to do an ANOVA on the averages directly.
This design is an example of a Split-Plot-in-time design which is covered in more detail elsewhere in my course notes on my web pages.
In split-plot designs, there are two levels to the experiment. The first level is at the Site level. Here sites are chosen (at random) from each of the Impact and Control areas. Each site is then “split” by taking measurements at two time periods (Before and After).
Split-plot designs are more complex examples of two-factor designs. They differ from two-factor completely randomized designs where there is only a single level in the experiment. The model used to analyze data of this form must include terms for both the two factors (and their interaction which is the BACI effect of interest) and also for the variation at the two levels in the experiment. The experimental unit at the lowest level (in this case the time period within each site) is usually not specified in the model and is implicit. This corresponds to the Error line in the ANOVA table.
The model to be fit is: \[\mathit{Mean log(Density)} = \mathit{SiteClass}~~ \mathit{Site(R)}~~\mathit{Period}~~\mathit{SiteClass:Period}\] where
- SiteClass is the classification of sites as either Impact or Control;
- Site(R) is the random site effect within SiteClass. As noted previously, if you label each site with a unique label, it is not necessary to use the nesting notation of Site(SiteClass)(R). The sites are considered a random effect because we wish to make inference about the impact over all sites in the study area.
- Period is the classification of time as Before or After, and
- SiteClass:Period term represents the BACI contrast of interest.
The lmer() function in the lmerTest package is used to fit the model.
Notice in the code below how the function distinguishes between the fixed effects and the random effects. Be sure that the random effects are defined as factors rather than continuous numbers or you will get a very unusual (and WRONG) model fit.
result.avg.lmer <- lmerTest::lmer(mean ~ SiteClass+Period+SiteClass:Period +
                                     (1|Site), data=crabs.avg)
Many packages will use a technique called REML (Restricted Maximum Likelihood estimation) to obtain the test statistics and estimates. REML is a variant of maximum likelihood estimation and extracts the maximal information from the data. With balanced data, the F-tests from REML are identical to an older method called the Expected Mean Squares (EMS) method. Modern statistical theory prefers REML over the EMS methods.
The REML method comes with two variants. In one variant, estimated variance components are allowed to go negative. This might seem to be problematic as variances must be non-negative! However, the same problem can occur with the EMS method, and usually only occurs when sample sizes are small and variance components are close to 0. It turns out that the F-tests are “unbiased” when the unbounded variance component option is chosen even if some of the variance estimates are negative. As noted in the JMP help files,
“If you remain uncomfortable about negative estimates of variances, please consider that the random effects model is statistically equivalent to the model where the variance components are really covariances across errors within a whole plot. It is not hard to think of situations in which the covariance estimate can be negative, either by random happenstance, or by a real process in which deviations in some observations in one direction would lead to deviations in the other direction in other observations. When random effects are modeled this way, the covariance structure is called compound symmetry.
So, consider negative variance estimates as useful information. If the negative value is small, it can be considered happenstance in the case of a small true variance. If the negative value is larger (the variance ratio can get as big as 0.5), it is a troubleshooting sign that the rows are not as independent as you had assumed, and some process worth investigating is happening within blocks.”
So if a variance component is negative and has a large absolute value, this is an indication that your model may not be appropriate for the data at hand. Please contact me for assistance. The default for SAS and R is to constrain the variance components to be non-negative and so the F-tests may vary slightly from those of JMP.
The anova() function applied to the result from the fit gives the effect tests. Note that when using the lmer() function, you no longer have to worry about Type I, II, or III tests.
anova(result.avg.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
SiteClass 0.010233 0.010233 1 1 4.5075 0.2802
Period 0.058897 0.058897 1 1 25.9427 0.1234
SiteClass:Period 0.004741 0.004741 1 1 2.0883 0.3854
The test for no interaction between the Period and SiteClass effects is the test of no BACI effect, i.e., no impact. The P-value of p = 0.385 is identical to that found in the previous analyses. The other F-tests represent the tests for the main effects of SiteClass or Period and are usually not of interest.
The estimates of the variance components provide some information on the relative sizes of variation found in this experiment:
The VarCorr() function applied to the result from the fit gives the variance components:
# Extract the variance components
vc.avg <- VarCorr(result.avg.lmer)
vc.avg
 Groups   Name        Std.Dev.
Site (Intercept) 0.118448
Residual 0.047647
The estimated variance of the site-to-site effects is 0.014 (the square of 0.1184), while the residual variation (the variation of the site-period averages) is 0.0023 (the square of 0.0476).
Note that the latter must be interpreted carefully as it represents the variance of the AVERAGE of the replicated quadrats and not the quadrat-to-quadrat variation. It also includes the site-year interaction. This variance component represents the potential for different changes between the Before and After periods depending on the site. For example, in some of the sites, the effect of Period may be to increase the response, while in other sites, the effect of Period is to decrease the response (this is after accounting for random error). If this occurs, then you have serious problems in detecting an environmental impact as it is not even clear how to properly define an impact when the direction of the impact is not consistent across sites.
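If you prefer to see the variances directly, the standard deviations reported by VarCorr() can be squared; a small sketch using the as.data.frame() method for VarCorr objects (the vcov column holds the variances):
# Convert the reported standard deviations to variances
vc.avg.df <- as.data.frame(vc.avg)
vc.avg.df[, c("grp","vcov","sdcor")]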
A naked P-value should never be reported. Regardless of whether the test of interaction (the BACI effect) is statistically significant or not, an estimate of the effect should also be reported.
The estimated marginal means for each combination of SiteClass and Period are:
result.avg.lmer.emmo.SP <- emmeans::emmeans(result.avg.lmer, ~SiteClass:Period)
cat("\nEstimated marginal means for SiteClass:Period \n\n")
summary(result.avg.lmer.emmo.SP)
Estimated marginal means for SiteClass:Period
SiteClass Period emmean SE df lower.CL upper.CL
Control Before 3.52 0.0903 1.15 2.67 4.37
Impact Before 3.26 0.1280 1.15 2.06 4.46
Control After 3.37 0.0903 1.15 2.52 4.22
Impact After 2.99 0.1280 1.15 1.79 4.19
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
An estimate of the BACI contrast that is independent of the internal parameterization used can be obtained by looking at the Period:SiteClass marginal mean estimates above. The BACI effect is the difference in the differences, i.e.,
\[BACI=(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB}\]
The estimate is found by substituting in the estimated marginal means or \[\widehat{BACI}=3.37 - 3.52 - 2.99 + 3.26\]
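As a quick arithmetic check, here is a minimal sketch that substitutes the rounded marginal means from the table above. Note that the sign of the contrast reported by emmeans below depends on the ordering of the coefficient vector, so it is the magnitude of the estimate that matters.
# Hand computation of the BACI contrast from the rounded marginal means
mu.CB <- 3.52; mu.IB <- 3.26; mu.CA <- 3.37; mu.IA <- 2.99
mu.CA - mu.CB - mu.IA + mu.IB   # approximately 0.12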
The estimate of the BACI contrast is available from the summary() function applied to the expected marginal means:
# Estimate the BACI contrast along with a se
summary(contrast(result.avg.lmer.emmo.SP, list(baci=c(1,-1,-1,1))), infer=TRUE)
 contrast estimate SE df lower.CL upper.CL t.ratio p.value
baci -0.119 0.0825 1 -1.17 0.929 -1.445 0.3854
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Note that the P-values for the test of the interaction (the BACI effect) are all identical (as they must be) regardless of the parameterization used.
We can also get the standard profile plot on the logarithmic scale:
profileplot <- ggplot(data=report, aes(x=Period, y=mean,
group=Site, color=SiteClass, shape=Site))+
ggtitle("Profile plot of crab density")+
ylab("Density with mean and 95% ci")+
geom_point(position=position_dodge(w=0.1))+
geom_line(position=position_dodge(w=0.1))+
geom_errorbar(aes(ymax=ucl, ymin=lcl), width=0.1, position=position_dodge(w=0.1))+
geom_point(data=crabs, aes(y=logDensity), position=position_dodge(w=0.2))
profileplot
The profile plot shows that the mean responses are almost parallel – it is no wonder no BACI effect was detected.
1.5.2.4 Using ANOVA on the individual data values
Finally, it is possible to do an ANOVA using the individual values. This will correctly account for the imbalance in the number of quadrats sampled at each site-year combination (and so the P-values will be slightly different from the above analyses based on averages). The advantage of this analysis method is that it also produces estimates of all of the variance components that can then be used for a power analysis.
The analysis of the raw data is a variant of a split-plot design with sub-sampling (the quadrats). The statistical model will include terms for the site level (the random effects of site) and also a term for the possible site-year interaction that was subsumed into the residual error in the previous analysis.
The model is
\[\mathit{log(Density)} = \mathit{SiteClass}~~ \mathit{Site(R)}~~\mathit{Period}~~\mathit{SiteClass:Period}~~\mathit{Site:Period(R)}\] where
- SiteClass is the classification of sites as either Impact or Control;
- Site(R) is the random site effect within SiteClass. As noted previously, if you label each site with a unique label, it is not necessary to use the nesting notation of Site(SiteClass)(R). The sites are considered a random effect because we wish to make inference about the impact over all sites in the study area.
- Period is the classification of time as Before or After,
- SiteClass:Period term represents the BACI contrast of interest, and the
- Site:Period term represents non-parallel changes in the SITES over the two periods.
We hope that this variance component will be negligible – otherwise, it indicates that the effect of Before and After varies among the sites as well as between the Control and Impact classifications. This is generally not good news.
We use the lmer() function from the lmerTest package.
result.all.lmer <- lmerTest::lmer(log(Density) ~ SiteClass+Period+SiteClass:Period +
                                     (1|Site) + (1|Site:Period), data=crabs)
The anova() function applied to the result from the fit gives the effect tests.
anova(result.all.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
SiteClass 0.058088 0.058088 1 1.0204 4.0892 0.2885
Period 0.275358 0.275358 1 1.2061 19.3844 0.1096
SiteClass:Period 0.021130 0.021130 1 1.2061 1.4875 0.4095
Because the number of quadrats was not identical in all site-year combinations, the degrees of freedom can be fractional. Also notice that the P-value is slightly different from the previous results (but very similar). As long as the imbalance in the number of quadrats is small, the analyses on the averages presented earlier should not be affected much. The P-value for the test of interaction is p = 0.409 and there is no evidence of a BACI effect.
Again, regardless of the P-value, an estimate of the BACI effect should be obtained. The reported parameter estimates or the indicator parameterization estimates should not be used directly (unless you are very sure of what they mean!); instead, construct the BACI contrast based on the expected marginal means:
We estimate the marginal means for the four combinations of SiteClass and Period.
result.all.lmer.emmo.SP <- emmeans::emmeans(result.all.lmer, ~SiteClass:Period)
cat("\nEstimated marginal means for SiteClass:Period \n\n")
summary(result.all.lmer.emmo.SP)
Estimated marginal means for SiteClass:Period
SiteClass Period emmean SE df lower.CL upper.CL
Control Before 3.52 0.0934 1.12 2.60 4.45
Impact Before 3.26 0.1360 1.28 2.21 4.32
Control After 3.37 0.0958 1.22 2.57 4.17
Impact After 2.99 0.1340 1.18 1.80 4.19
Degrees-of-freedom method: kenward-roger
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The BACI contrast is:
\[BACI=(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB}\]
and
\[\widehat{BACI}=3.37 - 3.52 - 2.99 + 3.26\]
The estimated BACI effect is:
# Estimate the BACI contrast along with a se
summary(contrast(result.all.lmer.emmo.SP, list(baci=c(1,-1,-1,1))), infer=TRUE)
 contrast estimate SE df lower.CL upper.CL t.ratio p.value
baci -0.117 0.096 1.21 -0.939 0.705 -1.220 0.4095
Degrees-of-freedom method: kenward-roger
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The VarCorr() function applied to the result from the fit gives the variance components:
# Extract the variance components
vc <- VarCorr(result.all.lmer)
vc
 Groups      Name        Std.Dev.
Site:Period (Intercept) 0.00000
Site (Intercept) 0.12274
Residual 0.11919
The estimated site-to-site variance is 0.02 (the square of 0.12); the estimated site-by-Period interaction variance is 0 (the square of 0). This is small, which is good, because it indicates that the response of the sites over time is fairly consistent. If this value were large, it would throw into doubt the whole BACI paradigm of this experiment. Finally, the residual variance component of 0.01 (the square of 0.12) measures the quadrat-to-quadrat variation within each site-year combination.
1.5.2.5 Model assessment
Regardless of the model chosen, you should carefully examine the data for outliers and unusual points. If there are outliers, and the data was incorrectly entered, then, of course, the data should be corrected. If the outlier is “real”, then you may wish to run the analysis with and without the outlier present to see what effect it has. If the final conclusions are the same, then there isn’t a problem. However, if inclusion of the outlier greatly distorts the results, then you have a problem, but it is not something that statistical theory can help with. If the data point is real, it should be included!
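For example, a sensitivity check could refit the model with a suspect observation removed and compare the BACI contrasts; a minimal sketch is below (the row number 17 is purely hypothetical and should be replaced by the row(s) flagged in the diagnostic plots):
# Refit the raw-data model without a suspect observation (row 17 is hypothetical)
crabs.drop <- crabs[-17, ]
result.drop.lmer <- lmerTest::lmer(log(Density) ~ SiteClass + Period + SiteClass:Period +
                                     (1|Site) + (1|Site:Period), data=crabs.drop)
# Compare the BACI contrast with and without the observation
summary(contrast(emmeans::emmeans(result.drop.lmer, ~SiteClass:Period),
                 list(baci=c(1,-1,-1,1))), infer=TRUE)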
Here are the model assessment plots from the final model on the raw data:
diagplot <- sf.autoplot.lmer(result.all.lmer)
plot(diagplot)
If you have a large number of sites, it is possible to estimate the site residuals, and plots of these may also be informative. In many cases, the number of sites is small, so these plots are not very informative.
The assumption of Normality is often made. Unfortunately, with small sample sizes, it is very difficult to assess this assumption. Fortunately, ANOVA is fairly robust to non-Normality if the standard deviations are roughly equal in each site-year combination. A transformation (e.g., the \(\ln()\) transformation) may be required.
1.6 BACI with one Impact/Control site; Multiple years before/after
1.6.1 Overview
The previous section looked at designs with only a single year of measurement before and after the impact. This design can be improved by monitoring for several years both before and after the impact. While the term “year” is used generally in these notes, the same methods can be used when sampling takes place more frequently than yearly. For example, samples could be taken every season, which would add additional structure to the data.
Once additional years of monitoring are used, there is now a concern about the potential type of responses that can occur following impact. For example, the impact could cause a permanent step change in the impact site(s) that persists for many years. Or, the impact could cause a temporary change in the response variable that slowly returns to pre-impact levels.
The simplest case is that of a permanent step change after impact that persists for several years. It is implicitly assumed that the relationship between the Control and Impact areas is consistent in the pre-impact years.
For these types of studies, the most sensible design is the BACI-paired design that was outlined earlier. Here, permanent monitoring sites are established in each of the Control and Impact areas. All sites are measured each year. The analyses presented below can deal with cases where not all sites are measured in all years as long as the missing data are missing completely at random (MCAR). This implies that the missingness of a measurement at a site is not related to the response or any other covariate in the study. The year effects are assumed to operate in parallel across all sites. This assumption can be relaxed if you have multiple sites and every site is measured in every year.
We will also allow for sub-sampling at each site-year combination and will assume that all sub-samples are independent both within and across years. A case where this is not true, would be the use of permanent sub-sampling locations within each impact and control site. Please contact me for more details on this design.
Conceptually this is diagrammed as:
Note that the BACI contrast looks at changes in the mean response between the Before and After periods.
As before, you should enter the data carefully using unique labels for every distinct experimental or observational unit. For example, if separate quadrats are measured within each site-year combination, then each quadrat should have a different label. So if five quadrats are measured at site I1 (the Impact site) and at the two Control sites (C1 and C2), for two years both before and after the impact, then there are \(5 \times 3 \times 2 \times 2 = 60\) quadrats and they should be labelled Q1, Q2, \(\ldots\) Q60, rather than Q1 to Q5 within each site-year combination. This latter (but common) labelling is confusing because the data appear to be coming from a design with permanent sub-sampling quadrats. Even more confusing, how can quadrat Q1 be sampled from two different sites?
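A small data-management sketch of generating study-wide unique quadrat labels (the data frame quad and its column name are hypothetical; one row per quadrat measurement is assumed):
# Assign a unique label to every quadrat in the study
quad$Quadrat <- paste0("Q", seq_len(nrow(quad)))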
The key to the analysis of these experiments is to recognize the different sizes of experimental and observational units in the impact designs. At the top level, the multiple sites within the Control and Impact areas are again the starting point of a split-plot type of design. The effect of year is what is known in statistical parlance as a strip as it “operates” across all sites, regardless of whether a site is an Impact or Control site. The quadrats measured in each site-year combination are again treated as pseudo-replicates.
1.6.2 Example: Counting fish
1.6.2.1 Introduction
This is the example that appears in Smith (2002, Table 6). Samples of fish were taken for a period of 12 months before and 13 months after a nuclear power plant began operations. The power plant is cooled by water that is drawn from a river. When the water exits the plant, its temperature is elevated. The concern is that the warmed water will adversely affect the abundance and composition of fish below the plant. It wasn’t clear from Smith (2002) what the response measure is, so I’ll arbitrarily assume that it is counts of fish.
A schematic of the design is:
except that for this example, there is only one reading per site per year rather than multiple readings.
As before, we can identify the Period effect (Before vs. After), SiteClass effect (Impact vs. Control) and the BACI effect (the interaction between Period and SiteClass). There are also several sources of variation:
The quadrat-to-quadrat variation represents localized effects within each site. The year-to-year variation represents year-specific factors that cause the response to move higher or lower at both site classes. For example, a very cool year could cause the productivity of a population to be lower than normal in both the Impact and Control sites. Again, the site-year interaction variance component represents the non-parallel behavior of sites over time. We expect year-specific factors to shift the means in the sites up and down in parallel, but not all sites respond in the same way.
Again note that if only a single measurement is taken in each site-year combination then the quadrat-to-quadrat and the site-year interaction variance components are completely confounded together and cannot be separated.
The raw data is available in the baci-fish.csv file available at the Sample Program Library at <../../Stat-Ecology-Datasets-Revised>. The data are imported in the usual way and part of the data is shown below:
setwd(file.path("../../Stat-Ecology-Datasets-Revised/BACI/Fish"))
fish <- read.csv("baci-fish.csv", header=TRUE, as.is=TRUE, strip.white=TRUE)
fish$trt <- interaction(fish$SiteClass, fish$Site, fish$Period)
fish$Period <- factor(fish$Period)
fish$SiteClass <- factor(fish$SiteClass)
fish$Site <- factor(fish$Site)
fish$SamplingTimeF <- factor(fish$SamplingTime)
fish$trt <- factor(fish$trt)
fish$logCount <- log(fish$Count+ .5) # add 1/2 of smallest positive count
fish[1:10,]
  SamplingTime Year Period SiteClass Site Count trt
1 1 75 Before Control C1 1 Control.C1.Before
2 1 75 Before Impact I1 1 Impact.I1.Before
3 2 75 Before Control C1 0 Control.C1.Before
4 2 75 Before Impact I1 1 Impact.I1.Before
5 3 75 Before Control C1 0 Control.C1.Before
6 3 75 Before Impact I1 0 Impact.I1.Before
7 4 75 Before Control C1 2 Control.C1.Before
8 4 75 Before Impact I1 6 Impact.I1.Before
9 5 75 Before Control C1 47 Control.C1.Before
10 5 75 Before Impact I1 103 Impact.I1.Before
SamplingTimeF logCount
1 1 0.4054651
2 1 0.4054651
3 2 -0.6931472
4 2 0.4054651
5 3 -0.6931472
6 3 -0.6931472
7 4 0.9162907
8 4 1.8718022
9 5 3.8607297
10 5 4.6395716
In this study there was one (permanent) site measured in each of the Control and Impact areas and the site is labelled as C1 and I1 respectively. At each site, monthly (paired) readings were taken at sampling times \(1, \ldots, 25\) denoted by the sampling time variable. Notice that in the pre-impact year, all months in one year were measured, while in the post-impact years sampling spans two years. The actual months of sampling were not given by Smith (2002), so without further assumptions it is not possible to use the calendar months as another covariate (e.g., counts in January are expected to be different than those in July in some systematic fashion).
No sub-sampling was conducted so it is not necessary to average over the sub-samples.
As only one site is measured for each of the Control and Impact areas, this experiment is an example of pseudo-replication and so conclusions will be limited to inference about these two sites only.
As recommended in Section 1.2, the analysis is performed on the \(\log{(Count+.5)}\). A value of 0.5 is added to avoid taking logarithms of zero. The usual rule is to add 1/2 of the smallest non-zero value.
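A small sketch of how this offset could be computed from the data rather than hard-coded (for these data the smallest positive count is 1, so the offset works out to 0.5, matching the code above):
# Offset = half of the smallest positive count
offset <- min(fish$Count[fish$Count > 0], na.rm=TRUE)/2
fish$logCount <- log(fish$Count + offset)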
We begin with a simple plot of the data over time.
prelimplot <- ggplot(data=fish, aes(x=SamplingTime, y=logCount,
group=Site, color=SiteClass, shape=Site))+
ggtitle("Fish counts over time")+
geom_point()+
geom_line()+
geom_vline(xintercept=-0.5+min(fish$SamplingTime[as.character(fish$Period) == "After"]))
prelimplot
Notice that the two curves track each other over time, indicating that the pairing by month will be effective in accounting for the natural variation over the seasons.
As in the previous section, there are several (equivalent) ways to analyze this data.
1.6.2.2 Analysis of the differences
This is an example of a simple paired design where the sampling times induce the pairing in the observations across the sites. We start by reshaping the data from the long- to the wide-format to create two columns - one for the Impact and one for the Control site.
Use the pivot_wider() function in the tidyr package to bring the two values taken in the same month onto the same record and then compute the difference in the log counts between the Impact and Control sites.
# Convert file from long- to wide-format
# to get the Control and Impact measurements on the same line
# and compute the difference
fish.month <- tidyr::pivot_wider(fish,
id_cols=c("SamplingTime","Period"),
names_from="SiteClass",
values_from="logCount")
fish.month$diff <- fish.month$Impact - fish.month$Control
fish.month[1:10,]
# A tibble: 10 × 5
SamplingTime Period Control Impact diff
<int> <fct> <dbl> <dbl> <dbl>
1 1 Before 0.405 0.405 0
2 2 Before -0.693 0.405 1.10
3 3 Before -0.693 -0.693 0
4 4 Before 0.916 1.87 0.956
5 5 Before 3.86 4.64 0.779
6 6 Before 4.15 3.60 -0.554
7 7 Before 4.36 1.87 -2.49
8 8 Before 2.80 4.97 2.16
9 9 Before 4.97 4.98 0.0138
10 10 Before 3.35 2.25 -1.10
We start with a plot of the differences on the logarithmic scale over the sampling times:
plotdiff <- ggplot(data=fish.month, aes(x=SamplingTime, y=diff))+
ggtitle("Plot of differences over time")+
ylab("Difference (Impact-Control)")+
geom_point()+
geom_line()+
geom_vline(xintercept=-0.5+min(fish$SamplingTime[as.character(fish$Period) == "After"]))
plotdiff
We notice that there are some very large swings in the differences in the Before period. This is “unfortunate” as it will give rise to a tremendous residual variation and make it difficult to detect effects.
A two-sample t-test is used to compare the mean difference (on the logarithmic scale) between the two Periods.
Use the t.test() function to test if the mean difference is the same in the Before and After Periods. This is the unequal-variance t-test which is preferred over the equal-variance t-test.
result <- try(t.test(diff ~ Period, data=fish.month),silent=TRUE)
if(class(result)=="try-error")
{cat("Unable to do unequal variance t-test because of small sample size\n")} else
{
result$diff.in.means <- sum(result$estimate*c(1,-1))
names(result$diff.in.means)<- "diff.in.means"
print(result)
cat("Mean difference in log(Count) :,",result$diff.in.means," (SE ", result$stderr,")\n")
}
Welch Two Sample t-test
data: diff by Period
t = 0.28689, df = 20.742, p-value = 0.777
alternative hypothesis: true difference in means between group After and group Before is not equal to 0
95 percent confidence interval:
-0.8284787 1.0934134
sample estimates:
mean in group After mean in group Before
0.4418295 0.3093621
Mean difference in log(Count) :, 0.1324674 (SE 0.4617297 )
If the equal-variance t-test is used, it replicates the results from the ANOVA in the next section exactly.
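If you wish to verify this claim, the equal-variance version is a one-line change:
# Equal-variance t-test on the monthly differences (should reproduce the ANOVA below)
t.test(diff ~ Period, data=fish.month, var.equal=TRUE)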
The P-value is p = 0.777 which provides no evidence of a difference in mean of the differences between the two periods (the BACI contrast). The estimate of the BACI contrast (with se) is also provided.
The large differences in some months are worrisome – this indicates perhaps a lack of fit of this model. If a non-parametric test is done, it also fails to detect a difference:
Use the wilcox.test() function for a non-parametric test (Wilcoxon Rank Sum Test) to test if the median difference is the same in the Before and After Periods.
result <- wilcox.test(diff ~ Period, data=fish.month, conf.int=TRUE)
result
Wilcoxon rank sum test with continuity correction
data: diff by Period
W = 81, p-value = 0.8916
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-0.8611464 1.0985632
sample estimates:
difference in location
0.02032091
1.6.2.3 ANOVA on the individual data values
The model for the raw data is
\[ \log{(Count+.5)} = \mathit{SiteClass}~~~\mathit{Period}~~~\mathit{SiteClass:Period}~~~\mathit{SampleTime(R)} \] where the terms
- SiteClass represents the Control vs. Impact effect;
- Period represents the Before vs. After effect;
- SiteClass:Period represents the BACI effect, i.e., the non-parallel response.
Because there is only one site measured in each of the Control or Impact areas, the site effects are completely confounded with the SiteClass effects (as happens with pseudo-replicated data). Consequently, it is not necessary to enter the Site effect into the model. If Site is entered as a random effect, it will be displayed with 0 degrees of freedom. We do have multiple times measured before and after the impact. If the sampling times use unique labels, we can once again enter a random effect for SamplingTime(R) directly into the model. Alternatively, you could enter SamplingTime(Period)(R) as the term in the model if the sampling times were not uniquely labelled.
We use the lmer() function from the lmerTest package. You must ensure that the numeric variable SamplingTime is defined as a factor and not as a continuous covariate.
# Because there is ONLY one measurement per site at each sampling time, the
# SamplingTime*Site and residual variances are totally confounded and cannot be
# separated. This is the residual term.
result.lmer <- lmerTest::lmer(log(Count+.5) ~ SiteClass+Period+SiteClass:Period +
(1|SamplingTimeF),
                                    data=fish)
The tests for the fixed effects are:
anova(result.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
SiteClass 1.76058 1.76058 1 23 2.7024 0.1138
Period 0.11746 0.11746 1 23 0.1803 0.6751
SiteClass:Period 0.05475 0.05475 1 23 0.0840 0.7745
The interaction term is the BACI contrast and the large P-value provides no evidence of an impact.
Never report a naked P-value. The expected marginal means for the SiteClass:Period combinations, which are used to estimate the BACI contrast, are:
result.lmer.emmo.SP <- emmeans::emmeans(result.lmer, ~SiteClass:Period)
cat("\nEstimated marginal means \n\n")
summary(result.lmer.emmo.SP)
Estimated marginal means
SiteClass Period emmean SE df lower.CL upper.CL
Control After 1.64 0.532 27.4 0.551 2.73
Impact After 2.08 0.532 27.4 0.993 3.18
Control Before 2.02 0.554 27.4 0.884 3.16
Impact Before 2.33 0.554 27.4 1.193 3.47
Degrees-of-freedom method: kenward-roger
Results are given on the log(mu + 0.5) (not the response) scale.
Confidence level used: 0.95
The BACI contrast is found as:
\[BACI=(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB}\]
The estimate of the BACI contrast is available from the summary() function applied to the expected marginal means corresponding to the interaction effect:
# Estimate the BACI contrast along with a se
summary(contrast(result.lmer.emmo.SP, list(baci=c(1,-1,-1,1))), infer=TRUE)
 contrast estimate SE df lower.CL upper.CL t.ratio p.value
baci -0.132 0.457 23 -1.08 0.813 -0.290 0.7745
Note: contrasts are still on the log(mu + 0.5) scale. Consider using
regrid() if you want contrasts of back-transformed estimates.
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
The VarCorr() function applied to the result from the fit gives the variance components:
# Extract the variance components
vc <- VarCorr(result.lmer)
vc
 Groups        Name        Std.Dev.
SamplingTimeF (Intercept) 1.74142
Residual 0.80714
As seen in the plot, there is huge residual variation over and above the variation among sampling times. This large residual variation is what makes it difficult to detect any impact. From the previous analysis, much of this huge residual variation comes from a single sampling time where a very large difference was seen.
Notice that the sampling times do have an additional structure. For example, it appears that counts of fish vary seasonally. It is possible to fit a model that accounts for this seasonality, e.g., if the actual month of the year was known, but this simple paired analysis will remove the effects of this structure, so this more complicated analysis is not needed.
1.6.2.4 Model assessment
Model assessment plots are:
diagplot <- sf.autoplot.lmer(result.lmer)
plot(diagplot)
If you extract the residuals and plot them, the impact of these large residuals is clear. There is some evidence of non-Normality in the residuals, which isn’t too surprising given the huge swings in the differences seen in the earlier analysis.
1.6.3 Example: Counting chironomids
This example comes from Krebs (1999), Ecological Methodology, 2nd Edition, Box 10.3.
Estimates of chironomid abundance (a type of invertebrate, an oligochaete worm, present in aquatic sediments) were taken in the sediments at one station above and one station below a pulp mill outflow pipe for three years before plant operation and six years after the plant started.
The raw data is available in the baci-chironomic.csv file available at the Sample Program Library at <../../Stat-Ecology-Datasets-Revised>. The data are imported in the usual way and part is shown below:
setwd(file.path("../../Stat-Ecology-Datasets-Revised/BACI/Chironomid"))
cat(" BACI design measuring chironomid counts with multiple (paired) yearly measurements before/after \n\n")
chironomid <- read.csv("baci-chironomid.csv", header=TRUE, as.is=TRUE, strip.white=TRUE)
cat("Listing of part of the raw data \n")
head(chironomid)
 BACI design measuring chironomid counts with multiple (paired) yearly measurements before/after
Listing of part of the raw data
Year Control.Site Impact.Site Period
1 1988 14 17 Before
2 1989 12 14 Before
3 1990 15 17 Before
4 1991 16 21 After
5 1992 13 24 After
6 1993 12 20 After
In this study there was one (permanent) site measured in each of the Control and Impact areas. At each site, yearly (paired) readings were taken with three baseline years and six post-baseline years. No sub-sampling was conducted in each year, so it is not necessary to average over the sub-samples.
As only one site is measured for each of the Control and Impact areas, this experiment is an example of pseudo-replication and so conclusions will be limited to inference about these two sites only.
As recommended in Section 1.2, the analysis is performed on the \(\log{(Count+.5)}\).
We begin with a simple plot of the data over time after converting from wide- to long-format:
# We change the data from wide to long format
chironomid.long <- tidyr::pivot_longer(chironomid,
cols=c("Control.Site","Impact.Site"),
names_to = "SiteClass",
values_to="Count")
chironomid.long$SiteClass <- factor(chironomid.long$SiteClass)
chironomid.long$YearF <- factor(chironomid.long$Year)
chironomid.long$Period <- factor(chironomid.long$Period)
chironomid.long$logCount <- log(chironomid.long$Count + .5)
head(chironomid.long)
# A tibble: 6 × 6
Year Period SiteClass Count YearF logCount
<int> <fct> <fct> <int> <fct> <dbl>
1 1988 Before Control.Site 14 1988 2.67
2 1988 Before Impact.Site 17 1988 2.86
3 1989 Before Control.Site 12 1989 2.53
4 1989 Before Impact.Site 14 1989 2.67
5 1990 Before Control.Site 15 1990 2.74
6 1990 Before Impact.Site 17 1990 2.86
prelimplot <- ggplot(data=chironomid.long,
aes(x=Year, y=logCount, group=SiteClass, color=SiteClass))+
ggtitle("Fish counts over time")+
geom_point()+
geom_line()+
geom_vline(xintercept=-0.5+min(chironomid.long$Year[as.character(chironomid.long$Period) == "After"]))
prelimplot
Notice that the two curves track each other over time prior to the impact event, and the shapes are generally the same after the impact event, indicating that the pairing by year will be effective in accounting for the natural variation over the years.
There are two (equivalent) ways to analyze these data. As recommended in Section 1.2, the analysis is performed on the \(\log{(Count+.5)}\).
1.6.3.1 Analysis of the differences
This is an example of a simple paired design where the sampling times (years) induce the pairing in the observations across the sites. We find the difference (on the logarithmic scale) between the Impact and Control readings and then plot the differences over time:
chironomid$diff <- log(chironomid$Impact.Site+.5) - log(chironomid$Control.Site+.5)
head(chironomid)
  Year Control.Site Impact.Site Period diff
1 1988 14 17 Before 0.1880522
2 1989 12 14 Before 0.1484200
3 1990 15 17 Before 0.1213609
4 1991 16 21 After 0.2646926
5 1992 13 24 After 0.5959834
6 1993 12 20 After 0.4946962
We plot the difference on the logarithmic scale over time:
plotdiff <- ggplot(data=chironomid, aes(x=Year, y=diff))+
ggtitle("Plot of differences over time")+
ylab("Difference (Impact-Control)")+
geom_point()+
geom_line()+
geom_vline(xintercept=-0.5+min(chironomid$Year[as.character(chironomid$Period) == "After"]))
plotdiff
A two-sample t-test is used to compare the mean difference between the two Periods.
The Welch test (unequal variance t-test) is:
result <- try(t.test(diff ~ Period, data=chironomid),silent=TRUE)
if(class(result)=="try-error")
{cat("Unable to do unequal variance t-test because of small sample size\n")} else
{
result$diff.in.means <- sum(result$estimate*c(1,-1))
names(result$diff.in.means)<- "diff.in.means"
print(result)
cat("Estimated difference in means :",result$diff.in.means, " (SE ",result$stderr, ")\n")
}
Welch Two Sample t-test
data: diff by Period
t = 5.2465, df = 6.2914, p-value = 0.001664
alternative hypothesis: true difference in means between group After and group Before is not equal to 0
95 percent confidence interval:
0.1494910 0.4054139
sample estimates:
mean in group After mean in group Before
0.4300635 0.1526110
Estimated difference in means : 0.2774524 (SE 0.05288317 )
and the traditional equal-variance t-test is:
result.ev <- t.test(diff ~ Period, data=chironomid, var.equal=TRUE)
result.ev$diff.in.means <- sum(result$estimate*c(1,-1))
names(result.ev$diff.in.means)<- "diff.in.means"
result.ev
cat("Estimated difference in means on log scale:",result.ev$diff.in.means, " (SE ",result.ev$stderr, "\n")
Two Sample t-test
data: diff by Period
t = 3.7933, df = 7, p-value = 0.006774
alternative hypothesis: true difference in means between group After and group Before is not equal to 0
95 percent confidence interval:
0.1044973 0.4504075
sample estimates:
mean in group After mean in group Before
0.4300635 0.1526110
Estimated difference in means on log scale: 0.2774524 (SE 0.07314273 )
If the equal-variance t-test is used, it replicates the results from the ANOVA in the next section exactly.
The P-value is small which provides strong evidence of a difference in the mean differences between the two Periods (the BACI contrast).
The estimate of the BACI contrast is \(0.28\) with SE \(0.073\) from the equal-variance analysis and SE \(0.053\) from the unequal-variance analysis.
1.6.3.2 ANOVA on the raw data
The model for the raw data is \[ \log{(Count+.5)} = \mathit{SiteClass}~~~\mathit{Period}~~~\mathit{SiteClass:Period}~~~\mathit{Year(R)} \] where
- SiteClass represents the Control vs. Impact effect;
- Period represents the Before vs. After effect;
- SiteClass:Period represents the BACI effect, i.e., the non-parallel response.
Because there is only one site measured in each of the Control or Impact areas, the site effects are completely confounded with the SiteClass effects (as happens with pseudo-replicated data). Consequently, it is not necessary to enter the Site effect into the model. If Site is entered as a random effect, it will be displayed with 0 degrees of freedom.
We do have multiple years measured before and after the impact. If the Year variable uses unique labels, we can once again enter a random effect for Year into the model. Alternatively, you could enter Year(Period)(R) as the term in the model if the years were not uniquely labelled.
We again use the lmer() function in the lmerTest package. You must ensure that the numeric variable Year is treated as a factor and not as a continuous covariate.
result.lmer <- lmerTest::lmer(log(Count+.5) ~ SiteClass+Period+SiteClass:Period
+ (1|YearF),
                                    data=chironomid.long)
The anova() function applied to the result from the fit gives the effect tests.
anova(result.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
SiteClass 0.33951 0.33951 1 7 63.4614 9.364e-05 ***
Period 0.04107 0.04107 1 7 7.6773 0.027662 *
SiteClass:Period 0.07698 0.07698 1 7 14.3891 0.006774 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction term is the BACI contrast and the small P-value (p = 0.007) indicates strong evidence of an impact. Notice that the P-value from the test for no interaction from the ANOVA matches the P-value of the Period effect in the t-test assuming that the variances were equal in the two Periods.
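The two tests are linked by the usual relationship \(F = t^2\); a quick check using the values reported above:
# The equal-variance t-statistic squared equals the interaction F-statistic
3.7933^2   # approximately 14.39, matching the SiteClass:Period F value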
Use the expected marginal means for the Period:SiteClass interaction to estimate the BACI contrast as in previous sections:
\[BACI=(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB}\]
We request the expected marginal means:
result.lmer.emmo.SP <- emmeans::emmeans(result.lmer, ~SiteClass:Period)
cat("\nEstimated marginal means \n\n")
summary(result.lmer.emmo.SP)
Estimated marginal means
SiteClass Period emmean SE df lower.CL upper.CL
Control.Site After 2.70 0.0455 10.6 2.60 2.80
Impact.Site After 3.13 0.0455 10.6 3.03 3.23
Control.Site Before 2.65 0.0643 10.6 2.50 2.79
Impact.Site Before 2.80 0.0643 10.6 2.66 2.94
Degrees-of-freedom method: kenward-roger
Results are given on the log(mu + 0.5) (not the response) scale.
Confidence level used: 0.95
These values are (indirectly) used to estimate the BACI contrast. The estimate of the BACI contrast is available from the summary() function applied to the contrast of the expected marginal means:
# Estimate the BACI contrast along with a se
summary(contrast(result.lmer.emmo.SP, list(baci=c(1,-1,-1,1))), infer=TRUE)
 contrast estimate SE df lower.CL upper.CL t.ratio p.value
baci -0.277 0.0731 7 -0.45 -0.104 -3.793 0.0068
Note: contrasts are still on the log(mu + 0.5) scale. Consider using
regrid() if you want contrasts of back-transformed estimates.
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
baci.chironomid.baci <- as.data.frame(summary(contrast(result.lmer.emmo.SP, list(baci=c(1,-1,-1,1))), infer=TRUE))
The estimated mean difference-in-the-difference is \(-0.28\) (SE \(0.07\)), which matches (up to sign) the estimate found in the t-test earlier.
The estimates of the variance components are:
# Extract the variance components
vc <- VarCorr(result.lmer)
vc
 Groups   Name        Std.Dev.
YearF (Intercept) 0.084026
Residual 0.073143
The year-to-year variance is comparable in size to the residual variation which implies that the year effects are about the same size as measurement error.
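A small sketch expressing the year-to-year variation as a proportion of the total variance (standard deviations taken from the VarCorr() output above):
# Proportion of the total variance attributable to year-to-year effects
var.year  <- 0.084026^2
var.resid <- 0.073143^2
var.year/(var.year + var.resid)   # roughly 0.57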
1.6.3.3 Model assessment
Here is the model assessment plots from the final model on the raw data:
diagplot <- sf.autoplot.lmer(result.lmer)
plot(diagplot)
There is no evidence of any problems in the diagnostic plots.
1.7 BACI with Multiple impact/control sites; Multiple years before/after
1.7.1 Introduction
This is the most complex BACI design with multiple sites and multiple years. In most cases, there is only one Impact site with multiple Control sites, but it is possible to also have multiple Impact sites.
The number of years of monitoring in the Before and After periods does not have to be equal.
The number of sub-samples taken at each site-year combination also does not have to be equal.
A major concern for these types of studies is proper replication and the avoidance of pseudo-replication (Hurlbert, 1984). Replication provides information about the variability of the collected data under identical experimental conditions so that differences in the response among experimental conditions can be compared to the variation in responses within experimental conditions. This is the fundamental principle of ANOVA.
Consider first a survey to investigate fry density at various locations on a river. A simple design may take a single sample (i.e., a single fry trap) at each of 4 locations:
These four values are insufficient for any comparison of the fry density across the four locations because the natural variation present in readings at a particular location is not known.
In many ecological field studies, the concepts of experimental units and randomization of treatments to experimental units are not directly applicable making “replication” somewhat problematic. Replication is consequently defined as the taking of multiple INDEPENDENT samples from a particular location. The replicated samples should be located sufficiently far from the first location so that local influences that are site specific do not operate in common on the two samples. The exact distance between samples depends upon the biological process. For example if the locations are tens of kilometers apart, then spacing the samples hundreds of meters apart will likely do for most situations. This gives rise to the following design where two SITES are sampled at each location, each with a single trap operating:
Now a statistical comparison can be performed to investigate if the mean response is equal at four locations. The key point is that the samples should be independent but still representative of that particular location. Hence, taking two samples from the exact same location, or splitting the sample in two and doing two analyses on the split sample will not provide true replication.
Consequently, a design where duplicate samples (e.g., multiple traps) are taken from the exact same location and time would be an example of pseudo-replication:
Note that the data from the last two designs looks “identical”, i.e., pairs of “replicated” observations from four locations. Consequently, it would be very tempting to analyze both experiments using exactly the same statistical model. However, there is a major difference in interpretation of the results.
The second design with real replicates enables statements to be made about differences in the mean response among those four general locations. However, the third design with pseudo-replication only allows statements to be made about differences among those four particular sampling sites which may not truly reflect differences among the broader locations.
Obviously the line between real and pseudo-replication is somewhat ill-defined. Exactly how far apart do sampling sites have to be before they can be considered independent? There is no hard and fast rule, and biological considerations and knowledge of the processes involved in the environmental impact must be used to make a judgment call.
Here is a schematic of this general BACI design:
Suppose that at each sampling point, the number of minnows captured in a trap is counted. As before, we can identify the Period (Before vs. After), SiteClass (Impact vs. Control) and the BACI interaction. As well, there are several sources of variation in the monitoring design that influence how much sampling should take place and when.
At the lowest level is trap-to-trap variation within a single site-year combination. Not all traps will catch the exact same number of minnows. The number of minnows captured is also a function of how long the traps soak with longer soak periods leading, in general, to more minnows being captured. However, by converting to a common soak-time, this extra variation has been reduced.
Next is site-to-site variation within the reaches of the river. This corresponds to natural variation in minnow density among sites due to habitat and other ecological variation.
Third is yearly variation. The number of minnows captured within a site varies over time (e.g., 2002 vs. 2003) because of uncontrollable ecological factors. For example, one year may have more precipitation or be warmer which may affect the fry density in all locations simultaneously.
Finally, there is the site-year interaction which represents the INconsistency in how the sites respond over time. For example, not all of the Control sites respond exactly the same way across the multiple years prior to impact. Note that if only one trap is measured each year, then the trap-to-trap and the site-year interaction variance components are completely confounded together and cannot be separated.
1.7.2 Example: Fry monitoring
This example is based (loosely) on a consulting project from an Independent Power Producer who was interested in monitoring the effects of an in-stream hydroelectric project. The response variable for the project was the minnow density at different locations in the stream.
The monitoring design has the river divided into six segments of which three are upstream of the diversion and three are downstream of the diversion. In each reach, several sites have been located where minnow fry congregate. In each of the sites, minnow traps are set for various lengths of time. At the end of the soaking period, the traps are removed and the number of minnows are counted and classified by species. The counts are standardized to a common period of time to adjust for the different soak-times. [This could be done directly in the analysis by using the soak-time as a covariate, but the soak-time data was not available.]
An initial pre-project monitoring study was run in 2000 to 2002. The project became operational in late 2002, and post-project monitoring continued in 2003 and 2004.
This design is a variant of a classic Before-After-Control-Impact (BACI) design where presumably the Control sites will be located in the upper reaches and the potentially impacted sites in the lower reaches.
The raw data is available in the baci-fry.csv file available at the Sample Program Library at
<../../Stat-Ecology-Datasets-Revised>. The data are imported in the usual way and part is shown below:
setwd(file.path("../../Stat-Ecology-Datasets-Revised/BACI/Fry"))
# sink('baci-fry-R-001.txt', split=TRUE)
fry <- read.csv('baci-fry.csv', header=TRUE, as.is=TRUE, strip.white=TRUE)
fry$logfry <- log(fry$Fry)
fry$Period <- factor(fry$Period)
fry$Site <- factor(fry$Site)
fry$SiteClass <- factor(fry$SiteClass)
fry$YearF <- factor(fry$Year)
fry[1:12,]
   SiteClass Site Sample Year Period Fry logfry YearF
1 Downstream D 1 2000 Before 185 5.220356 2000
2 Downstream E 1 2000 Before 258 5.552960 2000
3 Downstream E 2 2000 Before 630 6.445720 2000
4 Downstream E 3 2000 Before 116 4.753590 2000
5 Downstream F 1 2000 Before 208 5.337538 2000
6 Downstream F 2 2000 Before 261 5.564520 2000
7 Downstream D 1 2001 Before 321 5.771441 2001
8 Downstream E 1 2001 Before 65 4.174387 2001
9 Downstream E 2 2001 Before 755 6.626718 2001
10 Downstream F 1 2001 Before 476 6.165418 2001
11 Downstream F 2 2001 Before 521 6.255750 2001
12 Downstream E 1 2002 Before 198 5.288267 2002
As noted earlier (Section 1.2), there is a good reason to analyze the \(\log(fry)\) rather than the fry count directly.
It is always desirable to examine the data closely for unusual patterns, outliers, etc.
First, let us construct some summary statistics for each site-year combination:
We use the plyr package:
mean.fry <- plyr::ddply(fry, c("Year","YearF","Site","SiteClass","Period"), function(x){
n <- nrow(x)
nmiss <- sum(is.na(x$Fry))
mean.fry <- mean(x$Fry, na.rm=TRUE)
sd.fry <- sd( x$Fry, na.rm=TRUE)
mean.logfry <- mean(x$logfry, na.rm=TRUE)
sd.logfry <- sd (x$logfry, na.rm=TRUE)
res<- data.frame(n, nmiss, mean.fry, sd.fry, mean.logfry, sd.logfry)
})
mean.fry[1:5,]
  Year YearF Site SiteClass Period n nmiss mean.fry sd.fry mean.logfry
1 2000 2000 A Upstream Before 2 0 25.5000 30.40559 2.618221
2 2000 2000 B Upstream Before 2 0 306.5000 167.58431 5.644266
3 2000 2000 C Upstream Before 2 0 124.5000 105.35891 4.602664
4 2000 2000 D Downstream Before 1 0 185.0000 NA 5.220356
5 2000 2000 E Downstream Before 3 0 334.6667 265.43800 5.584090
sd.logfry
1 1.7422073
2 0.5767497
3 0.9767137
4 NA
5 0.8464942
The design is unbalanced with some sites having from 1 to 3 samples taken and not all sites being measured in all years (e.g., site D is only measured in 2000, 2001, and 2003).
We examine the standard deviation vs the mean on the logarithmic scale:
sdmeanplotlog <- ggplot(data=mean.fry, aes(x=mean.logfry, y=sd.logfry))+
ggtitle("SD vs Mean - logged data")+
geom_point()
sdmeanplotlog
The standard deviation does not appear to be dependent upon the mean.
Note that the average of values on the log-scale corresponds to the logarithm of the geometric mean on the ordinary scale. The geometric mean is less sensitive to outliers and will always be less than the arithmetic mean. Differences on the log-scale correspond to RATIOS on the anti-log scale. For example, if the counts of fry are reduced by a factor of 2 (i.e., halved), this corresponds to a subtraction on the log-scale by a value of 0.69.
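This correspondence is easy to check numerically. A minimal sketch in plain R, using a few of the fry counts shown above:
# Differences on the log scale correspond to ratios on the original scale
log(2)                        # about 0.69, so halving a count subtracts about 0.69 on the log scale
log(200) - log(100)           # the same 0.69 regardless of the size of the count
# The mean of logged values is the log of the geometric mean
x <- c(185, 258, 630, 116)    # a few of the fry counts shown above
exp(mean(log(x)))             # geometric mean (about 243)
mean(x)                       # arithmetic mean (about 297) is larger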
Next look at the plot of the mean log(fry) density over time.
meanplot <- ggplot(data=mean.fry, aes(x=Year, y=mean.logfry,
group=Site, color=SiteClass, shape=Site))+
ggtitle("Changes in mean over time")+
geom_point(position=position_dodge(w=0.2))+
geom_line(position=position_dodge(w=0.2))+
geom_vline(xintercept=-0.5+min(mean.fry$Year[as.character(mean.fry$Period) == "After"]))
meanplot
There is no obvious evidence of an impact of the diversion. There is some evidence of a year-to-year effect with the curves generally moving together (e.g., the downward movement in 2004 at all sites).
1.7.2.1 Analysis of the averages
We start first with the analysis of the average log(fry) count from each site-year. The number of traps monitored in each year varies from 1 to 3 so the results from this analysis will not match precisely those from the analysis of the raw data but the results should be “close enough.”
The model to be fit is:
\[\mathit{MeanLogFry} = \mathit{SiteClass}~~ \mathit{Site(R)}~~\mathit{Period}~~\mathit{Year(R)}~~~\mathit{SiteClass:Period}\] where
- SiteClass is the classification of sites as either Impact or Control;
- Site(R) is the random site effect within SiteClass. As noted in previous chapters, if you label each site with a unique label, it is not necessary to use the nesting notation of Site(SiteClass)(R). The sites are considered a random effect because we wish to make inference about the impact over all sites in the study area.
- Period is the classification of time as Before or After,
- Year(R) represents the multiple years within each Period. As noted in previous chapters, if you label each year with a unique label, it is not necessary to use the nesting notation of Year(Period)(R) The years are considered a random effect because we wish to make inference about the impact over all years in the study area.
- The SiteClass:Period term represents the BACI contrast of interest.
The response variable is the mean log(fry) in each site-year. The fixed effects are SiteClass and Period and their interaction (which is the BACI contrast of interest).
Whenever there is more than one random effect in a model, the usual practice is to enter the interaction of the random effects as well (the Site:Year interaction). This term allows the site effects to vary among years. A large site-by-year interaction is worrisome because it implies that the effects of years are not consistent across sites and vice versa. In this case, because we are dealing with averages, we do not need to add such a term; it is automatically fit as the residual variance component. If you do add this term, you get exactly the same results.
We use the lmer() function in the lmerTest package. You must ensure that the numeric variable Year is declared as a factor (YearF) and not used as a continuous covariate.
# Notice that we use the YearF rather than Year in the random statements
# Because we are analyzing the mean, the Site:Year term is the residual error
result.mean.lmer <- lmerTest::lmer(mean.logfry ~ Period + SiteClass +
Period:SiteClass+
(1|YearF)+(1|Site),
data=mean.fry)
The model is fit using REML, a variant of maximum likelihood estimation. With balanced data, the F-tests are identical to those from an older method based on Expected Mean Squares (EMS). Modern statistical theory prefers REML over the EMS method.
The REML method comes with two variants. In one variant (the default with JMP), estimated variance components are allowed to go negative. This might seem to be problematic as variances must be non-negative! However, the same problem can occur with the EMS method, and usually only occurs when sample sizes are small and variance components are close to 0. It turns out that the F-tests are “unbiased” when the unbounded variance component option is chosen even if some of the variance estimates are negative. As noted in the JMP help files,
“If you remain uncomfortable about negative estimates of variances, please consider that the random effects model is statistically equivalent to the model where the variance components are really covariances across errors within a whole plot. It is not hard to think of situations in which the covariance estimate can be negative, either by random happenstance, or by a real process in which deviations in some observations in one direction would lead to deviations in the other direction in other observations. When random effects are modeled this way, the covariance structure is called compound symmetry.
So, consider negative variance estimates as useful information. If the negative value is small, it can be considered happenstance in the case of a small true variance. If the negative value is larger (the variance ratio can get as big as 0.5), it is a troubleshooting sign that the rows are not as independent as you had assumed, and some process worth investigating is happening within blocks.”
So if a variance component is negative and has a large absolute value, this is an indication that your model may not be appropriate for the data at hand. Please contact me for assistance.
The Fixed Effect Tests are:
anova(result.mean.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Period 0.043595 0.043595 1 2.9797 0.2987 0.6230
SiteClass 0.232904 0.232904 1 4.0194 1.5957 0.2748
Period:SiteClass 0.019698 0.019698 1 17.1098 0.1350 0.7179
There is no evidence of a BACI effect as the P-value is very large.
The BACI effect is the difference in the differences, i.e.,
\[BACI=(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB}\] where
- \(C\) is the Control site classification,
- \(I\) is the Impact site classification,
- \(B\) is the Before Period, and
- \(A\) is the After Period.
We obtain the expected marginal means:
result.mean.lmer.emmo.SP <- emmeans::emmeans(result.mean.lmer, ~SiteClass:Period)
cat("\nEstimated marginal means \n\n")
summary(result.mean.lmer.emmo.SP)
Estimated marginal means
SiteClass Period emmean SE df lower.CL upper.CL
Downstream After 5.42 0.519 5.73 4.14 6.71
Upstream After 4.64 0.512 5.47 3.36 5.92
Downstream Before 5.62 0.497 4.98 4.34 6.90
Upstream Before 4.73 0.494 4.87 3.45 6.01
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
The BACI contrast can be estimated using:
# Estimate the BACI contrast along with a se
summary(contrast(result.mean.lmer.emmo.SP, list(baci=c(1,-1,-1,1))), infer=TRUE)
 contrast estimate    SE   df lower.CL upper.CL t.ratio p.value
baci -0.11 0.299 17.1 -0.739 0.52 -0.367 0.7179
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
The BACI estimate is small relative to the standard error which is not surprising given that the hypothesis of no BACI effect was not rejected.
The estimated variance components can be obtained.
# Extract the variance components
vc <- VarCorr(result.mean.lmer)
vc
 Groups   Name        Std.Dev.
Site (Intercept) 0.78987
YearF (Intercept) 0.24618
Residual 0.38205
The variance components show that site-to-site VARIANCEs (the square of the standard deviations reported above) are about \(10 \times\) larger than year-to-year VARIANCES (see the earlier plot) and about \(4 \times\) that of the residual variation (of the averages). Because we are dealing with the averages, the residual variation is a composite of the Site:Year interaction and cage-to-cage variation (see next section).
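As a quick check of the ratios quoted above, the variance components can be extracted and compared directly. A minimal sketch, assuming the result.mean.lmer fit from this section:
# Compare the estimated VARIANCES (the squares of the standard deviations shown above)
vc.df <- as.data.frame(VarCorr(result.mean.lmer))
site.var  <- vc.df$vcov[vc.df$grp == "Site"]
year.var  <- vc.df$vcov[vc.df$grp == "YearF"]
resid.var <- vc.df$vcov[vc.df$grp == "Residual"]
site.var / year.var    # roughly 10: site-to-site vs year-to-year variance
site.var / resid.var   # roughly 4: site-to-site vs residual variance of the averages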
As always, don’t forget to look at the diagnostic plots:
diagplot <- sf.autoplot.lmer(result.mean.lmer)
plot(diagplot)
1.7.2.2 Analysis of the raw data
The raw data can also be analyzed directly. The model to be fit is:
\[\mathit{LogFry} = \mathit{SiteClass}~~ \mathit{Site(R)}~~\mathit{Period}~~\mathit{Year(R)}~~~\mathit{SiteClass:Period}~~\mathit{Site:Year(R)}\] where
- SiteClass is the classification of sites as either Impact or Control;
- Site(R) is the random site effect within SiteClass. As noted previously, if you label each site with a unique label, it is not necessary to use the nesting notation of Site(SiteClass)(R). The sites are considered a random effect because we wish to make inference about the impact over all sites in the study area.
- Period is the classification of time as Before or After,
- Year(R) represents the multiple years within each Period. As noted in previous chapters, if you label each year with a unique label, it is not necessary to use the nesting notation of Year(Period)(R). The years are considered a random effect because we wish to make inference about the impact over all years and not just the years actually sampled.
- The SiteClass:Period term represents the BACI contrast of interest.
Because subsampling occurs at each combination of site and year, we need an "interaction" of the two random effects, i.e., Site:Year(R).
We use the lmer() function in the lmerTest package. You must ensure that the numeric variable Year is treated as a factor and not as a continuous covariate. Notice that we need three random effects and that the Site:Year term is NOT an interaction between the two random effects, but a separate random effect for each site-year combination to allow for the multiple cages (subsamples) within each site-year combination.
# Notice that we use the YearF variable in the random effects
result.lmer <- lmerTest::lmer(logfry ~ Period + SiteClass + Period:SiteClass +
(1|YearF)+ (1|Site)+(1|YearF:Site), data=fry)
The Fixed Effect Tests again show no evidence of a BACI effect:
anova(result.lmer, ddf="Kenward-Roger")
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Period 0.10071 0.10071 1 3.0855 0.1924 0.6898
SiteClass 0.90997 0.90997 1 4.0254 1.7386 0.2573
Period:SiteClass 0.00662 0.00662 1 16.1019 0.0126 0.9119
The BACI contrast is:
\[BACI=(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB}\] where
- \(C\) is the Control site classification,
- \(I\) is the Impact site classification,
- \(B\) is the Before Period, and
- \(A\) is the After Period.
The expected marginal means are found in the usual way.
result.lmer.emmo.SP <- emmeans::emmeans(result.lmer, ~SiteClass:Period)
cat("\nEstimated marginal means for SiteClass:Period \n\n")
summary(result.lmer.emmo.SP)
Estimated marginal means for SiteClass:Period
SiteClass Period emmean SE df lower.CL upper.CL
Downstream After 5.48 0.537 6.19 4.17 6.78
Upstream After 4.62 0.524 5.74 3.32 5.91
Downstream Before 5.63 0.506 5.05 4.33 6.92
Upstream Before 4.72 0.493 4.65 3.42 6.02
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
The estimated BACI effect is found as:
# Estimate the BACI contrast along with a se
summary(contrast(result.lmer.emmo.SP, list(baci=c(1,-1,-1,1))), infer=TRUE)
 contrast estimate    SE   df lower.CL upper.CL t.ratio p.value
baci -0.0476 0.423 16.1 -0.945 0.85 -0.112 0.9119
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Although the estimate appears to be "smaller" than that obtained from the analysis of the averages, the difference is just an artefact of the differing number of samples taken in some years.
The estimated variance components are found in the usual way.
# Extract the variance components
vc <- VarCorr(result.lmer)
vc
 Groups      Name        Std.Dev.
YearF:Site (Intercept) 0.00000
Site (Intercept) 0.76697
YearF (Intercept) 0.21415
Residual 0.72347
These variance components tell a similar story to the analysis of the averages. Note that the estimate of the Site:Year variance component is very small (and is actually reported as negative by SAS or JMP if the variance components are left unbounded). Of course variances must be non-negative, but a negative estimate simply indicates that the actual variance component is close to 0 and the negative value is an artefact of the data. You could re-run the model constraining the variance components to be non-negative and the results are similar.
Also note that because we have analyzed the individual measurements, we have separated the residual variance component (from the analysis of the averages in the previous section) into its Site:Year interaction and cage-to-cage components. Ignoring for now the problem of negative variance components, the residual variance from the analysis of the averages (\(0.38^2\)) is a composite of the Site:Year and cage-to-cage (residual) variation: it is roughly the Site:Year variance component from the fit to the individual values plus the variance of the cage averages (based on an average of 1.85 cages per site-year), or \(0.38^2 \approx 0^2 + 0.72^2/1.85 = 0.53^2\). The agreement is only rough because the design is unbalanced.
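The arithmetic can be checked with a couple of lines of R. A minimal sketch using the rounded values quoted above:
# Reconstruct the residual SD of the site-year averages from the raw-data fit
sd.siteyear <- 0       # Site:Year component (estimated as essentially zero)
sd.cage     <- 0.72    # cage-to-cage (residual) SD from the raw-data analysis
n.cage      <- 1.85    # average number of cages per site-year
sqrt(sd.siteyear^2 + sd.cage^2/n.cage)   # about 0.53, compared to the 0.38 observed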
1.8 Closing remarks on the analysis of BACI studies
The analysis of BACI experiments can be complex because of the various features of the design. A complete discussion is available in
Underwood, A. J. (1992).
On beyond BACI: the detection of environmental impacts on populations in the real, but variable, world. Journal of Experimental Marine Biology and Ecology, 161, 145-178.
There are numerous variants of BACI designs depending on the number of control/impact sites, the number of years sampling before or after the disturbance, and the number of samples taken in each site-year combination. A schematic of a general design is:
In this design, two sites in each of the control and impact areas will be sampled over time. It is not necessary that the number of sites be equal in the control and impacted areas. Each site will be measured for three years before the disturbance occurs at the impacted sites, and three years after the disturbance occurs. The number of years measured before and after disturbance also does not have to be equal. In each site-year combination, four measurements are taken (e.g., multiple quadrats; multiple fish density measurements, etc.) and these are assumed to be taken independently in each site-year combination.
Notice that the sites all have unique labels, rather than using the old-fashioned notation of site\(_1\) (control) and site\(_1\) (impact) where the numbering of the sites is no longer unique. Similarly, the years are also given unique labels for the same reason.
We are assuming that the response is a change in the mean that remains in effect after the environmental impact:
Here the effect of the disturbance is a step change in the mean response at the impacted sites between the Before and After periods. The control sites are also allowed to have a shift in the mean due to unexplained temporal effects (e.g., climate change), but the shift in the mean for the impacted sites is different from the shift in the mean for the control sites. This differential change is the BACI effect, which is the interaction between temporal change and site class in the statistical model. Both Underwood (1992) and Underwood (1994) considered situations where the response is not a step change, but might be, for example, a pulse change of one year immediately after impact. These are not explored in this section of the notes. Notice that all sites are measured in every year before and after the disturbance.
There are several sources of variation (site-to-site, year-to-year, the site-year interaction, and sampling variation within each site-year combination) which make the analysis more difficult than simpler experiments.
This general BACI design actually combines several designs with different sizes of experimental units. First consider the difference between the mean at the control and impacted sites. This could be due to a number of factors. If the BACI design looks at the impact of a power plant on the productivity of a stream, the control sites could be on different streams where the productivity is naturally different. We assume that the sites actually chosen are a random sample from all possible sites in the impacted or control areas; the case where sites are treated as fixed effects, so that inference is restricted to the actual sites measured, is beyond these notes. This is an example of a strip-plot design, where the sites form strips across the years:
Some care is needed when choosing sites so that the same spatial variability is present among sites within the control and impacted areas. For example, a watershed may have two separate streams selected to assess the effect of an in-river hydroelectric system and two separate control streams. Presumably, the spatial variation among the streams in the two site classes is then comparable. However, a design may have a single stream where a power plant will be built. Measuring two sites on this single stream and then using two separate streams as controls mixes different spatial scales in the impact and control site classes. It would be better in this case to treat the two sites measured on the single impacted stream as pseudo-replicate measurements on that single stream. Underwood (1992, 1994) considered designs where the spatial scale varied among the control sites; please contact me for help with such designs.
The analysis of variance table for the effect of SiteClass (i.e., Control vs. Impact) is:
| Source | df | F.ratio |
|---|---|---|
| SiteClass | 1 | MS(SiteClass)/MS(Site) |
| Site | 2 | |
| Total | 3 | |
The degrees of freedom for the Site term are found by summing (the number of sites in each site class - 1), or in this case \(2=(2-1)+(2-1)\). We are not normally interested in the difference in the mean between the impacted and control sites.
Consider the Period effects (i.e., Before vs. After). Here is where we run into the first problem with BACI designs (and many related designs). The Period factor operates on a block of years, and there are technically NO replicated experimental units for the Period effect:
Technically, the years sampled within each Period are sub-samples (pseudo-replicates). Consequently, there is no formal statistical test available for the effect of Period unless you are willing to make some strong assumptions. For example, you could imagine long term cycles in the mean response that just happen to coincide with the stated before and after periods which would show up as a Period effect, but really are just natural variation over time. Fortunately, interest does not lie in the Period effects, so the lack of a formal statistical test is not a concern. Regardless of the source of period effects, because every site is measured in every year, each year serves as a “block” and so Impact vs. Control comparisons are unaffected by the year/period effects.
This gives the following ANOVA table, which is also a strip-plot design. Notice that the Year effects are subsamples within Periods. Their degrees of freedom are computed similarly to those for sites, i.e., \(4=(3-1)+(3-1)\).
| Source | df | F.ratio |
|---|---|---|
| Period | 1 | none |
| Period EU | 0 | |
| Years | 4 | |
| Total | 5 | |
Period EU in the ANOVA table means the experimental unit for Period effects. Because there are no replicates of the Period experimental units, this term has 0 df and so no test statistic can be computed.
These two designs are applied orthogonally to each other, so the ANOVA table is extended by the "product" of the above tables. The discussion following the table explains how the \(F\)-ratios are created.
| Source | df | F.ratio |
|---|---|---|
| SiteClass | 1 | MS(SiteClass)/MS(Site) |
| Site | 2 | |
| Period | 1 | none |
| Period EU | 0 | |
| Years | 4 | |
| BACI=SiteClass:Period | 1 | TBA |
| SiteClass:Period EU | 0 | TBA |
| SiteClass:Year | 4 | TBA |
| Site:Period | 2 | TBA |
| Site:Period EU | 0 | TBA |
| Site:Year | 8 | TBA |
| Total | 23 | |
Whew… there are now two sizes of experimental units. The SiteClass:Year and Site:Year terms in the ANOVA table both represent the same (smallest) experimental unit and are typically pooled into a single term, giving the reduced ANOVA table:
| Source | df | F.ratio |
|---|---|---|
| SiteClass | 1 | MS(SiteClass)/MS(Site) |
| Site | 2 | |
| Period | 1 | none |
| Period EU | 0 | |
| Years | 4 | |
| BACI=SiteClass:Period | 1 | MS(BACI)/MS(Site:Period) |
| SiteClass:Period EU | 0 | |
| Site:Period | 2 | |
| Site:Period EU | 0 | |
| SiteClass:Year+Site:Year | 12 | |
| Total | 23 | |
The BACI effect is tested against the Site:Period experimental unit. Unfortunately, this \(F\)-ratio has only a few denominator degrees of freedom (two in this case) and likely has low power to detect effects. This test statistic "protects" against the possibility that the differences among the sites change from the Before to the After Period (i.e., that site effects are not consistent between the two periods). If you are willing to ASSUME that the differences among sites are consistent across periods, then it is also possible to pool this term with the final term in the ANOVA table. Dropping lines with 0 df gives the final (consolidated) ANOVA table:
| Source | df | F.ratio |
|---|---|---|
| SiteClass | 1 | MS(SiteClass)/MS(Site) |
| Site | 2 | |
| Period | 1 | none |
| Years | 4 | |
| BACI=SiteClass:Period | 1 | MS(BACI)/MS(Site:Year + SiteClass:Year + Site:Period) |
| Site:Year + SiteClass:Year + Site:Period | 14 | |
| Total | 23 | |
Underwood (1992) was also concerned about this type of pooling:
Technically, this post-hoc removal of a component of variance involves making the assumption that the component is zero. This raises the possibility of Type II error (accepting that a value is zero when, in fact, the test is not capable of detecting it). This is the rationale behind post-hoc pooling procedures to reduce the probability of Type II errors (there are detailed discussions in Winer, 1971; Underwood, 1981). In this paper, the problem will be ignored on the grounds that Type II errors leading to inappropriate pooling will have the effect of increasing the probability of Type I errors in the tests for impacts. Thus, impacts may be detected when, in fact, they are not present. This seems a more desirable error than concluding that no impact is present when there is one – a likely risk in some of the tests described below which have few degrees of freedom and, probably, not very great power.
Finally, the multiple samples taken within each site-year combination are pseudo-replicates. The BACI analysis can be done on the average of these values. If the design is balanced, i.e., the same number of sub-samples in each site-year combination, then the analysis of the average values is exactly equivalent to the analysis of the individual values. The analysis of the individual values does provide additional information on the within site-year variation, which is useful when designing the experiment: the tradeoffs between more/fewer sites, more/fewer years, and more/fewer samples within a site-year can be explored given the associated costs of new sites, new years, and new sub-samples.
The assumption of a step change between the before and after periods is crucial. In
Underwood, A. J. (1994).
On beyond BACI: Sampling designs that might reliably detect environmental disturbances.
Ecological Applications, 4, 3-15.
there is a discussion of, for example, how the data should be analyzed if the expected response is a pulse change that may occur only in the year after the impact, as illustrated in the last two figures of Underwood (1994)'s Figure 2:
As the examples above show, the analysis of complex BACI designs takes some care. It is inadvisable to simply bash the numbers through a statistical package without a good understanding of exactly how the data were collected!
The key to a successful analysis of a complex experiment is the recognition of multiple levels (or sizes of experimental units) in the study. For example, with multiple Control sites and multiple quadrats measured in each site-year combination, there are two sizes of experimental units – the site and quadrat level.
It is impossible to provide a “cookbook” to give examples of all the possible BACI analyses. Rather than relying on a “cookbook” approach, it is far better to use a “no-name” approach to model building.
The no-name approach starts by adding terms to the model corresponding to the SiteClass and Period effects:
\[\mathit{Y} = \mathit{SiteClass} ~~ \mathit{Period} ~~ \mathit{SiteClass:Period} ~~ \ldots\]
where
- SiteClass takes the values Impact and Control;
- Period represents the values Before and After.
These terms in the model represent the main effects (i.e., in the absence of an interaction, the site-class and temporal effects). The interaction term again represents the potential environmental impact, i.e., is the change in the mean between the Before and After Periods the same for the Control and Impact sites?
To this model, terms must be included representing the various levels in the experiment, i.e., sites, years, etc. These terms can be specified in one of two ways. For example, the site term can be added as;
\[M1: \mathit{Y} = \mathit{SiteClass} ~~ \mathit{Period} ~~ \mathit{SiteClass:Period} ~~ \mathit{Site(SiteClass)(R)} \ldots\] \[M2: \mathit{Y} = \mathit{SiteClass} ~~ \mathit{Period} ~~ \mathit{SiteClass:Period} ~~ \mathit{Site(R)} \ldots\]
Both models are equivalent. The additional term Site(SiteClass) is the traditional (nested) specification for the multiple sites within each SiteClass when the sites are NOT given unique labels, while the term \(Site\) can be used (with modern software) when the site labels are unique.
The \((R)\) notation indicates that you need to specify terms as random effects i.e., you wish to generalize the results to more than the specific sites or years used in this experiment.
Finally, if you have subsampling, you need to add a term to the model specifying how the sub-samples are grouped. This is often the combination of site and year, and so the Site:Year(R) effect is added to the model, as in the sketch below.
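For concreteness, here is a minimal sketch of how the unique-label specification (M2 above) translates into a call to lmerTest::lmer(). The data frame mydata and the variable names Y, SiteClass, Period, Site, and YearF are placeholders for your own data:
library(lmerTest)
# Y, SiteClass, Period, Site and YearF are placeholders; Site and YearF must be factors
fit <- lmerTest::lmer(Y ~ SiteClass + Period + SiteClass:Period +  # SiteClass:Period is the BACI contrast
                      (1|Site) + (1|YearF) +                       # random site and year effects
                      (1|Site:YearF),                              # grouping of sub-samples within each site-year
                      data=mydata)
anova(fit, ddf="Kenward-Roger")                                    # test of the BACI term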
Some textbooks recommend that additional interaction terms be added to the model (e.g., the Site:SiteClass interaction term). I find that unless you have a very large experiment, the data are often too sparse (especially if not all sites are measured in all years) to support these extra interaction terms. I haven't seen a real live experiment where they were useful. Refer back to the previous section on the statistical aspects of BACI designs.
After fitting the model, be sure to assess the fit of the model by looking at residual plots and the like.
Finally, I again suggest you do the analysis on \(\log{Y}\) for reasons explained earlier.
1.9 Power analysis and sample size determination
1.9.1 Introduction
A power analysis of these types of designs is very difficult to do by hand (!). In this section we will compute power using methods described in Stroup (1999) or Littell et al. (2006).
The basic idea of Stroup's method is to generate dummy data with NO variability, where each observation takes on the mean response for its combination of experimental factors. In the fry example, you would generate data corresponding to the 5 years of data, 6 sites, and multiple fry traps, with the log(fry) values taking one of the 4 means corresponding to the
- Impact-Before,
- Impact-After,
- Control-Before, and
- Control-After combinations.
The BACI contrast among these means should represent a biologically interesting difference. This "mean" data is then analyzed with the BACI model, using estimates of the unknown variance components that you think are appropriate for your study. These values could be obtained from real-life data (as above) or from expert opinion. Then a simple function of the F-statistic gives you the estimated power of your design.
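The "simple function of the F-statistic" is based on the non-central F distribution. A minimal sketch of this last step with illustrative placeholder values; for a 1 df BACI contrast, the non-centrality parameter is essentially the F-statistic obtained from the analysis of the noise-free "mean" data with the variance components held at their assumed values:
# Convert the F-statistic from the dummy-data analysis into power
alpha <- 0.05
ndf   <- 1      # numerator df (the BACI contrast)
ddf   <- 16     # denominator df from the dummy-data analysis (illustrative value)
ncp   <- 2.87   # non-centrality parameter = ndf * F from the dummy-data analysis (illustrative value)
Fcrit <- qf(1-alpha, ndf, ddf)                # critical value under the null hypothesis
power <- 1 - pf(Fcrit, ndf, ddf, ncp=ncp)     # probability of exceeding it under the alternative
power                                         # about 0.36 for these illustrative values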
Briefly, power is the ability to detect a difference when that difference exists. The power of a design depends on several features:
- The \(\alpha\) level. This is often set to 0.05 or 0.10. This represents the strength of evidence required before an effect is declared statistically significant.
- The variance components. You will need information on the variance components of all parts of your design. Depending upon the exact BACI design, there can be up to four variance components for which estimates will be needed: the site-to-site variance, the year-to-year variance, the site-year interaction variance, and the within site-year (typically quadrat or measurement) variance component. A preliminary study or literature review can be very helpful in getting these components. As you will see later, some of the variance terms have no impact on the power analysis. For example, measuring the same sites over time, or pairing measurements across sites by year, will allow you to "block" and remove these sources of variation from the analysis. This is the key advantage of "re-using" the sites in each year or taking paired measurements over time.
- The sample size. Here there are different levels to the sample size. First is the number of quadrats measured in each site-year combination. This could vary, but for most power analyses it is commonly assumed to be constant. Second is the number of sites in the Impact and Control areas. In many cases, there is only a single Impact site, but the theory below is completely general and allows for multiple sites in the impacted area. Lastly, how many years are measured before and after the impact?
- The effect size of interest. This is the most difficult part of the power analysis. You need to figure out how big a BACI effect is important to detect. This may seem rather obscure, but a suitable value can be obtained by drawing pictures of possible BACI outcomes (using hypothetical values) until people are sure that such an effect needs to be detected. This will be illustrated below.
What is a reasonable target for power? There are two “rules of thumb” commonly used. First, at \(\alpha=0.05\) a target power of \(0.80 = 80\%\) is required; second at \(\alpha=0.10\) a target power of \(0.90=90\%\) is required. These guidelines are somewhat arbitrary, but are widely used.
For many of the designs in this chapter, I have provided an Excel spreadsheet and JMP, R, or SAS code to help you determine power and sample size for various types of designs.
Let us consider the elements in more detail for general BACI designs, before considering power and sample size for a specific design.
Effect size of interest The effect size used in the power analysis is the BACI contrast. For many people this is difficult to ascertain. In the spreadsheet is an interactive graph that shows the BACI plot for choices of the means Before/After and Control/Impact. Modify the values in the table until the graph shows a biologically significant difference that is important to detect.
The spreadsheet baci-power-prep.xls has a section in the worksheet that can be used to determine the size of the BACI contrast that is of biological importance.
For example, in the crab example, we started with both sites having mean densities of about 35 crabs/quadrat. We suspect that both populations may decline over time, but if the difference in the declines is more than about 5 crabs, we will be highly interested in detecting this. We varied the means in the table until the corresponding graph showed an effect that appears to be biologically important:
The BACI contrast value is obtained then in the usual way as the “difference in the differences”:
\[BACI=(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB} =30 - 35 - 25 + 35 =5 \]
If a negative value is obtained for the BACI contrast, use the absolute value.
Variance components

The site-to-site variance component measures how much the mean response varies among sites because of local, site-specific effects (e.g., one site may have slightly better habitat than a second site). Fortunately, a good estimate of this variance component is not crucial because it "cancels": the same sites are repeatedly measured over time. A very rough rule of thumb is to take the range in the means over sites and use range/4 as the standard deviation.

The year-to-year variance component measures how year-specific factors (e.g., a year may be wetter than normal, which tends to increase productivity at all sites) affect the response. Again, a good estimate of this variance component is not crucial because, if all sites are measured in all years (i.e., a blocked design), the year effect is accounted for in the analysis and again "cancels" in the same way that blocking effects are removed.

The site-year interaction variance component measures how inconsistent the responses of the sites are over time. Hopefully, it is small. "Typical" values found in practice have the site-year standard deviation at about 20-50% of the site-to-site standard deviation; if it is smaller than this, you are doing well! This is typically the limiting factor in a BACI study because nothing in the design can be modified to reduce the influence of this variance component.

The sub-sampling (or quadrat-to-quadrat or trap-to-trap) variance component measures how variable the repeated measurements taken in the same site-year combination are. A very rough rule of thumb for count data is that the standard deviation is about \(\sqrt{\mathit{average~count}}\) under the assumption of a Poisson distribution. It can be, and often is, higher; this is known as over-dispersion.
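These rules of thumb are easily turned into rough numerical guesses. A minimal sketch with made-up values:
# Rough guesses at the standard deviations needed for a power analysis (illustrative values only)
site.means  <- c(20, 35, 50, 28)           # hypothetical mean counts at candidate sites
sd.site     <- diff(range(site.means))/4   # range/4 rule of thumb for the site-to-site SD
sd.siteyear <- 0.3*sd.site                 # site-year SD often 20-50% of the site-to-site SD
sd.quadrat  <- sqrt(mean(site.means))      # Poisson rule of thumb for count data (often larger in practice)
c(sd.site=sd.site, sd.siteyear=sd.siteyear, sd.quadrat=sd.quadrat)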
Estimates of these variance components can be obtained from a preliminary study, but notice that at least two years of preliminary data are required to estimate the year-to-year and site-year variance components (Section 1.11).
Sample sizes There are two levels to the sample size specification.
At the lowest level is the number of quadrats (sub-samples) measured in each site-year combination. This is usually kept the same for all sites, all years, and both site classes, but it is allowed to vary. However, all sites measured in the same site-class and year are assumed to have the same number of quadrats. If you wish different sites to have different numbers of sub-samples, please contact me.
At the upper level, you need to specify the number of sites measured in the Impact and Control areas. In many cases, a single site is measured in the Impact area, but there are cases where multiple Impact sites can be measured. It is assumed that you will have at least 2 Control sites measured for this design.
Similarly, at the upper level, you need to specify the number of years (before and after impact) for which monitoring will take place.
\(\alpha\) level
Specify the \(\alpha\) level (usually 0.05 or 0.10). In many cases, the choice of power and \(\alpha\) level are given externally. For example, the Canadian Department of Environment often specifies that a power of 0.90 is the target at \(\alpha=0.10\).
Next we will examine how to determine the power/sample size for many of the BACI designs considered in this chapter.
1.9.2 Before-After studies
In Before-After studies, there are three attributes that are under the control of the experimenter:
- The number of years of monitoring before and after the impact;
- The number of sites monitored (the number of streams in the earlier example);
- The number of measurements taken at each site-year combination (the number of quadrats measured in the earlier example).
Each of these attributes attempts to control a particular source of variation. In a Before-After study, there are four sources of variation:
- Year-to-Year variation. This is caused by year-specific factors such as increased rainfall in a year. A year-specific effect would impact all of the sites in the study.
- Site-to-Site variation (Stream-to-Stream variation in the earlier example). Not all sites have exactly the same mean response in a particular year. For example, some sites may be better habitat and generally have higher densities of organisms. We try to control for the site-specific effects by measuring the same sites in all years of the study. In this way, site is a blocking factor, and the effects of site cancel when considering Before vs. After contrasts.
- Site-Year interaction variation (Stream-Year interaction in the earlier example). Even though there is a systematic difference among the streams, they may not all respond the same to year-specific factors. For example, sites with better habitat may be better able to weather a downturn in rainfall and so the change in density of organisms may not be as great as in sites with poorer habitat. A large site-year interaction variance would indicate that the results of the Before-After study are not very reproducible in different sites.
- Residual variation (Quadrat-to-quadrat variation in the earlier example). Not all sampling locations in a site-year have identical responses. For example, if the responses are smallish counts, then the number of organisms in different sample units tends to follow a Poisson distribution.
In some cases, some of the variance components are confounded. For example, if only a single site (stream) is measured, then it is impossible to disentangle the year-to-year variation from the site-year interaction variation and the combined variation is simply labelled as “year-to-year” variation.
In the following sections, we will show how to compute the power for both single- and multiple-site Before-After studies.
1.9.2.1 Single site studies
In these studies, only a single site is measured for multiple years before and after the impact, but at several sampling locations (quadrats) in each year. The full statistical model is:
\[Y_{ijk} = \mu_i + \epsilon_{ij}^{year} + \epsilon_{ijk}^{quadrat}\]
where
- \(Y_{ijk}\) is the measured response in Period \(i\) (either Before or After), in Year \(j\) (of that Period), and in quadrat \(k\);
- \(\mu_i\) is the mean response in Period \(i\);
- \(\epsilon_{ij}^{year}\) is the year-specific (random) effect with variance \(\sigma_{year}^2\); and finally
- \(\epsilon_{ijk}^{quadrat}\) is the residual random error with variance \(\sigma^2\).
For simplicity, assume that the same number of quadrats (\(n_q\)) is measured in each site-year. Then, as seen in the analysis of subsampling designs, the analysis of the averages is identical to that of the original data. The statistical model for the average response is then:
\[\overline{Y}_{ij\cdot} = \mu_i + \epsilon_{ij}^{year} + \overline{\epsilon}_{ij\cdot}^{quadrat}\]
where now
- \(\overline{\epsilon}_{ij\cdot}^{quadrat}\) is the residual random error of the AVERAGES with variance \(\sigma^2/n_q\).
The variance of the yearly average is
\[V[\overline{Y}_{ij\cdot} ] = \sigma^2_{year} + \frac{\sigma^2}{n_q}\]
We are now in the situation of a simple two-sample t-test whose power depends on the above variance and the number of years before and after the impact. Standard power programs for a two-sample t-test can be used.
delta <- .15
vc <- as.data.frame(VarCorr(ba.invert.result200.fit))
single.stream.indiv.sdResid <- vc[ vc$grp=="Residual", "sdcor"]
single.stream.indiv.sdYear <- vc[ vc$grp=="YearF", "sdcor"]
stddev <- sqrt(single.stream.indiv.sdYear^2 + single.stream.indiv.sdResid^2/10)
In the invertebrate study, we found that the estimate of the combined year-to-year and stream-year interaction standard deviation (labelled as the year standard deviation) was 0.105 and the quadrat-to-quadrat (residual) standard deviation was 0.14. We had 5 years of monitoring before impact and want to know the power for various numbers of years of monitoring after impact, assuming that a proportional difference of 0.15 in the mean count is important to detect. We will assume that we can measure 10 quadrats each year.
Then the variance of the yearly average is \[V[\overline{Y}_{ij\cdot} ] = \sigma^2_{year} + \frac{\sigma^2}{n_q} = 0.105^2 + \frac{ 0.14^2}{10}= 0.114^2 \].
I have also written a specialized function that does the power computations once you specify the process and sampling standard deviations and the effect size to be detected. It also allows for multiple samples in each year in both balanced and unbalanced designs. Please consult the R code for more details.
The power is then computed using the pwr.t2n.test() function in the pwr package as:
delta <- .15 # proportional change to detect
vc <- as.data.frame(VarCorr(ba.invert.result200.fit))
single.stream.indiv.sdResid <- vc[ vc$grp=="Residual", "sdcor"]
single.stream.indiv.sdYear <- vc[ vc$grp=="YearF", "sdcor"]
cat("Single stream - individual values - sdYear: ", single.stream.indiv.sdYear , "\n")
cat("Single stream - individual values - sdResid: ", single.stream.indiv.sdResid , "\n")
stddev <- sqrt(single.stream.indiv.sdYear^2 + single.stream.indiv.sdResid^2/10)
power <- plyr::ldply(seq(5,20,1), function(n2){
# Note that d=effect size = difference to detect / standard deviation
# in the pwr.t2n.test() function
power1 <- pwr.t2n.test(n1=5, n2=n2, d=delta/stddev,
alternative="two.sided")$power
power2 <-before.after.power.stroup(Shift=delta,
X.before=rep(1:5, each=10),
X.after=5+rep(1:n2, each=10),
Process.SD =single.stream.indiv.sdYear,
Sampling.SD=single.stream.indiv.sdResid )
res <- c(n2=n2, power.pwr=power1, power.stroup=power2[1,"power.2s"])
res
})
cat("Power for detecting ", delta," proportional change using one stream with 5 years before and \n")
cat("different number of years after; 10 quadrats/year\n")
power
Single stream - individual values - sdYear:  0.1047314
Single stream - individual values - sdResid: 0.140223
Power for detecting 0.15 proportional change using one stream with 5 years before and
different number of years after; 10 quadrats/year
n2 power.pwr power.stroup
1 5 0.4503075 0.4503075
2 6 0.4938853 0.4938853
3 7 0.5297182 0.5297182
4 8 0.5594879 0.5594879
5 9 0.5844845 0.5844845
6 10 0.6056903 0.6056903
7 11 0.6238548 0.6238548
8 12 0.6395539 0.6395539
9 13 0.6532339 0.6532339
10 14 0.6652444 0.6652444
11 15 0.6758615 0.6758615
12 16 0.6853059 0.6853059
13 17 0.6937555 0.6937555
14 18 0.7013547 0.7013547
15 19 0.7082224 0.7082224
16 20 0.7144564 0.7144564
It turns out that even with 20 years of monitoring after impact, the power to detect a proportional difference of 0.15 is still below 80% (!).
Note that the limiting factor for the power of this design is the combined year-to-year and site-year interaction variance. Even if a very large number of quadrats (i.e., a large \(n_q\)) were measured each year, the repeated quadrats only drive down the quadrat-to-quadrat contribution to the final variance and have no effect on the year-specific and site-year interaction contributions.
1.9.2.2 Multiple sites
To improve the power, multiple sites can be measured in all years. As you will see in a minute, the multiple sites serve as blocks and so the site effects cancel when estimating the before-after contrast. Furthermore, the multiple sites (streams) reduce the contribution of the site-year (stream-year) interaction. But the year-to-year variation is still limiting unless you move to a full BACI design where years also serve as blocks and the year effects cancel when estimating the BACI contrast.
The full statistical model is:
\[Y_{ijkl} = \mu_i + \epsilon_{ij}^{year} + \epsilon_{k}^{site} + \epsilon_{ijk}^{site-year}+\epsilon_{ijkl}^{quadrat}\]
where
- \(Y_{ijkl}\) is the measured response in Period \(i\) (either Before or After), in year \(j\) (of that Period), site \(k\), and in quadrat \(l\);
- \(\mu_i\) is the mean response in Period \(i\);
- \(\epsilon_{ij}^{year}\) is the year-specific (random) effect with variance \(\sigma_{year}^2\);
- \(\epsilon_{k}^{site}\) is the site-specific effect with variance \(\sigma^2_{site}\);
- \(\epsilon_{ijk}^{site-year}\) is the site-year interaction effect with variance \(\sigma^2_{site-year}\); and finally
- \(\epsilon_{ijkl}^{quadrat}\) is the residual random error with variance \(\sigma^2\).
From the previous analysis of the raw data, the estimated variance components (expressed as standard deviations) are:
- \(\sigma_{year} = 0.1021\);
- \(\sigma_{stream} = 0.1032\);
- \(\sigma_{year-stream-interaction} = 0\);
- \(\sigma_{quadrat} = 0.1402\).
There is no simple way to use the Stroup method when not all sites are measured in all years or when the number of quadrats varies across site-years. However, in the special case when all sites are measured in all years, there are \(n_s\) sites (streams), and the same number of quadrats (\(n_q\)) is measured in each site-year, then rather simple computations can be done and the power analysis for a two-sample t-test can be used.

Because all sites are measured in all years, they play the same role as blocks and so the site (stream) effect cancels in the before-after contrast; the site (stream) variance component plays no part in the power determination. For example, with 4 streams and 5 quadrats per stream-year, compute an appropriate variance for the two-sample t-test power computation as: \[s^2 = \sigma^2_{year} + \frac{\sigma^2_{site-year}}{n_s} + \frac{\sigma^2_{quadrat}}{n_s \times n_q}\] Using the estimated variance components above gives \[s^2 = 0.1021^2 + \frac{0^2}{4} + \frac{0.1402^2}{4 \times 5} = 0.1068^2\]
This can now be used in the standard power programs.
The power is then computed using the pwr.t2n.test() function in the pwr package as:
vc
multiple.stream.indiv.sdResid <- vc[ vc$grp=="Residual", "sdcor"]
multiple.stream.indiv.sdYear <- vc[ vc$grp=="YearF", "sdcor"]
multiple.stream.indiv.sdSite <- vc[ vc$grp=="StreamF", "sdcor"]
multiple.stream.indiv.sdSiteYear <- vc[ vc$grp=="StreamF:YearF", "sdcor"]
# Note that d=effect size = difference to detect / standard deviation
scenarios <- expand.grid(n_after=5:20,
n_before =5,
sdYear = multiple.stream.indiv.sdYear,
sdSite = multiple.stream.indiv.sdSite ,
sdSiteYear=multiple.stream.indiv.sdSiteYear,
sdResid= multiple.stream.indiv.sdResid,
n_stream = 4,
n_quad = 5,
alpha = .05,
delta = .2)
power <- plyr::adply(scenarios,1,function(x){
# compute power for one row of the table
stddev <- sqrt( x$sdYear^2 + x$sdSiteYear^2/x$n_stream +
x$sdResid^2 /x$n_stream/x$n_quad)
power <- pwr.t2n.test(n1=x$n_before,
n2=x$n_after,
d =x$delta/stddev,
alternative="two.sided")$power
data.frame(power=power)
})
cat("Power for detecting ", power$delta[1],", proportional change in 4 streams with 5 years before and \n")
cat("different number of years after; 5 quadrats/year\n")
cat("\nNote that power is now for a larger scope of inference \n")
power
            grp        var1 var2       vcov     sdcor
1 StreamF:YearF (Intercept) <NA> 0.00000000 0.0000000
2 YearF (Intercept) <NA> 0.01041489 0.1020534
3 StreamF (Intercept) <NA> 0.01065083 0.1032029
4 Residual <NA> <NA> 0.01964983 0.1401778
Power for detecting 0.2 , proportional change in 4 streams with 5 years before and
different number of years after; 5 quadrats/year
Note that power is now for a larger scope of inference
n_after n_before sdYear sdSite sdSiteYear sdResid n_stream n_quad
1 5 5 0.1020534 0.1032029 0 0.1401778 4 5
2 6 5 0.1020534 0.1032029 0 0.1401778 4 5
3 7 5 0.1020534 0.1032029 0 0.1401778 4 5
4 8 5 0.1020534 0.1032029 0 0.1401778 4 5
5 9 5 0.1020534 0.1032029 0 0.1401778 4 5
6 10 5 0.1020534 0.1032029 0 0.1401778 4 5
7 11 5 0.1020534 0.1032029 0 0.1401778 4 5
8 12 5 0.1020534 0.1032029 0 0.1401778 4 5
9 13 5 0.1020534 0.1032029 0 0.1401778 4 5
10 14 5 0.1020534 0.1032029 0 0.1401778 4 5
11 15 5 0.1020534 0.1032029 0 0.1401778 4 5
12 16 5 0.1020534 0.1032029 0 0.1401778 4 5
13 17 5 0.1020534 0.1032029 0 0.1401778 4 5
14 18 5 0.1020534 0.1032029 0 0.1401778 4 5
15 19 5 0.1020534 0.1032029 0 0.1401778 4 5
16 20 5 0.1020534 0.1032029 0 0.1401778 4 5
alpha delta power
1 0.05 0.2 0.7374529
2 0.05 0.2 0.7859749
3 0.05 0.2 0.8215819
4 0.05 0.2 0.8482861
5 0.05 0.2 0.8687413
6 0.05 0.2 0.8847205
7 0.05 0.2 0.8974285
8 0.05 0.2 0.9076994
9 0.05 0.2 0.9161219
10 0.05 0.2 0.9231190
11 0.05 0.2 0.9290004
12 0.05 0.2 0.9339963
13 0.05 0.2 0.9382805
14 0.05 0.2 0.9419860
15 0.05 0.2 0.9452162
16 0.05 0.2 0.9480519
We have substantially increased power!
1.9.3 Simple BACI design
Let us now turn our mind to power/sample size determination for the simplest BACI design. As noted, the simple BACI design has one site measured in each of the Impact and Control areas; one year measured Before and one year After impact; and independent samples taken at each site-year combination. This design is equivalent to a simple two-factor completely randomized design (CRD). We are now specifically interested in detecting the interaction, i.e., the BACI effect.
If you have a program to determine power/sample size directly for the interaction in a two-factor CRD design, it can be used given the means and standard deviations as noted below. In some cases, you may not have access to such a program, but the computations for power and sample size are relatively straight-forward.
The full statistical model for a general BACI design is:
\[Y_{ijkyl} = \mu_{ik} + \epsilon_{ij}^{site} + \epsilon_{ky}^{year} + \epsilon_{ijky}^{site:year} + \epsilon_{ijkyl}^{quadrat}\]
where
\(Y_{ijkyl}\) is the measured response in SiteClass \(i\) (either Control or Impact), site \(j\) (in SiteClass \(i\)), at Period \(k\) (either Before or After), in year \(y\) (within Period \(k\)), and finally in subsample \(l\) (within each site-year combination \(ijky\));
\(\mu_{ik}\) is the mean response at SiteClass \(i\) and Period \(k\), i.e., the four means used to determine the BACI contrast being the means at
- Impact-Before,
- Impact-After,
- Control-Before,
- Control-After combinations;
\(\epsilon_{ij}^{site}\) is site-to-site variation with variance (the square of the standard deviation) of \(\sigma^2_{site}\);
\(\epsilon_{ky}^{year}\) is the year-to-year variation (within the same Period) that affects all sites simultaneously, with variance \(\sigma^2_{year}\);
\(\epsilon_{ijky}^{\mathit{site:year}}\) is the site-year interaction with variance of \(\sigma^2_{\mathit{Site:Year}}\); and
\(\epsilon_{ijkyl}^{quadrat}\) is the subsampling variation with variance of \(\sigma^2_{quadrat}\).
In the simplest BACI experiment, there is only 1 site in each of the Impact and Control SiteClasses (so the \(j\) subscript only takes the value of 1), and only one year measured in each of the two Periods (Before and After), so the \(y\) subscript also only takes the value of 1.
The variance of an individual observation is (by statistical theory):
\[V(Y_{ijkyl})= \sigma^2_{site} + \sigma^2_{year} + \sigma^2_{\mathit{Site:Year}} + \sigma^2_{quadrat}\]
We start by averaging over the \(n_{ijky}\) subsamples taken in each site-year combination. This gives the set of means:
\[\overline{Y}_{ijky\cdot } = \mu_{ik} + \epsilon_{ij}^{site} + \epsilon_{ky}^{year} + \epsilon_{ijky}^{\mathit{Site:Year}} + \overline{\epsilon}_{ijky\cdot}^{quadrat}\]
Notice that because the average is over the subsamples within each site-year combination, the only thing that varies is the quadrats, and so the average is over the quadrat-to-quadrat error. The variance of these averages is:
\[V(\overline{Y}_{ijky\cdot}) = \sigma^2_{site} + \sigma^2_{year} + \sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{quadrat}}{n_{ijky}}\]
Only the final variance term is affected by averaging over the sub-samples (quadrats).
Now for each site, we take the difference between the after and before (average) readings – recall that in the simple BACI design, there is only 1 year in each Period, i.e., the \(y\) subscript is always 1. This gives:
\[ \begin{align*} \overline{Y}_{ijA1\cdot}&-\overline{Y}_{ijB1\cdot} \\ = & \mu_{IA} + \epsilon_{ij}^{site} + \epsilon_{A1}^{year}+ \epsilon_{ijA1}^{site-year} + \overline{\epsilon}_{ijA1 \cdot}^{quadrat} - ( \mu_{IB} + \epsilon_{ij}^{site} + \epsilon_{B1}^{year} + \epsilon_{ijB1}^{site-year} + \overline{\epsilon}_{ijB1 \cdot}^{quadrat} ) \\ = & \mu_{IA} - \mu_{IB} + \epsilon_{A1}^{year} + \epsilon_{ijA1}^{site-year} + \overline{\epsilon}_{ijA1 \cdot}^{quadrat} - \epsilon_{B1}^{year} - \epsilon_{ijB1}^{site-year} - \overline{\epsilon}_{ijB1 \cdot}^{quadrat} \end{align*} \]
Notice that the site-to-site variance terms “cancel out”. This occurs because the same site is measured on both the Before and After periods and the extra site variation cancels in the before-after contrast much like in paired designs.
The variance of the site differences is:
\[ \begin{align*} V(\overline{Y}_{ijA\cdot}&-\overline{Y}_{ijB\cdot}) = \sigma^2_{year}+ \sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{quadrat}}{n_{IA}} + \sigma^2_{year}+ \sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{quadrat}}{n_{IB}}\\ &= 2\sigma^2_{year} + 2 \sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{quadrat}}{n_{IA}} + \frac{\sigma^2_{quadrat}}{n_{IB}} \end{align*} \]
Finally, the BACI contrast is found as the difference of the difference in the Impact and Control sites:
\[\begin{align*} BACI =& \overline{Y}_{I1A1\cdot}-\overline{Y}_{I1B1\cdot} - ( \overline{Y}_{C1A1\cdot}-\overline{Y}_{C1B1\cdot})\\ = & \mu_{IA} - \mu_{IB} - (\mu_{CA} - \mu_{CB} ) +\\ & \epsilon_{I1}^{site} + \epsilon_{A1}^{year}+ \epsilon_{I1A1}^{site:year} + \overline{\epsilon}_{I1A1 \cdot}^{quadrat} -\\ & ( \epsilon_{I1}^{site} + \epsilon_{B1}^{year} + \epsilon_{I1B1}^{site:year} + \overline{\epsilon}_{I1B1 \cdot}^{quadrat} ) -\\ & (\epsilon_{C1}^{site} + \epsilon_{A1}^{year}+ \epsilon_{C1A1}^{site:year} + \overline{\epsilon}_{C1A1 \cdot}^{quadrat} )+\\ & ( \epsilon_{C1}^{site} + \epsilon_{B1}^{year} + \epsilon_{C1B1}^{site:year} + \overline{\epsilon}_{C1B1 \cdot}^{quadrat} ) \\ = & \mu_{IA} - \mu_{IB} - \mu_{CA} + \mu_{CB} +\\ & \epsilon_{I1A1}^{site:year} - \epsilon_{I1B1}^{site:year} - \epsilon_{C1A1}^{site:year} + \epsilon_{C1B1}^{site:year} + \overline{\epsilon}_{I1A1 \cdot}^{quadrat} -\overline{\epsilon}_{I1B1 \cdot}^{quadrat} - \overline{\epsilon}_{C1A1 \cdot}^{quadrat} + \overline{\epsilon}_{C1B1 \cdot}^{quadrat} \\ \end{align*} \]
Now notice that both the Year and Site effects cancel out (as they must with any blocking variable) because all sites were measured in the same years before and after impact. The Site:Year interaction variance component still remains and this is problematic! The problem is that the Site:Year random effects are completely confounded with the respective means in the simple BACI design. This is exactly the problem that occurs with pseudo-replication, where it is impossible to distinguish treatment effects from random effects. Because you only measured one site in each of the Impact and Control site classes and one year in each of the Before and After periods, you cannot tell if the observed interaction effect is due to the underlying BACI effect (the difference in the means) or due to random, unknowable effects specific to each site-year combination. For example, suppose that your Control site tended to have lower than average counts in one year because of some hidden problem at that site in that year. Then your Control site would be lower than normal for one year and you might detect an "interaction effect" that is simply random chance.
Consequently, the best we can do in a simple BACI is define a pseudo-BACI effect: \[ \begin{align*} BACI^{*} = {}& \mu^{*}_{IA} - \mu^{*}_{IB} - \mu^{*}_{CA} + \mu^{*}_{CB} \\ & + \overline{\epsilon}_{I1A1 \cdot}^{quadrat} -\overline{\epsilon}_{I1B1 \cdot}^{quadrat} - \overline{\epsilon}_{C1A1 \cdot}^{quadrat} + \overline{\epsilon}_{C1B1 \cdot}^{quadrat} \end{align*} \] where the respective pseudo-means are specific to that choice of sites and years.
The final variance of the pseudo-BACI effect (where we must assume that the site-year variance is negligible and we are happy to restrict our findings to these particular years and particular sites) is found as:
\[V(BACI^{*})= \frac{\sigma^2_{quadrat}}{n_{IA}} + \frac{\sigma^2_{quadrat}}{n_{IB}} + \frac{\sigma^2_{quadrat}}{n_{CA}} + \frac{\sigma^2_{quadrat}}{n_{CB}}\] where \(n_{IA}\), \(n_{IB}\), \(n_{CA}\), and \(n_{CB}\) are the numbers of quadrats measured in the four SiteClass-Period combinations.
By taking multiple-subsamples, the contribution from the quadrat-to-quadrat variance can be reduced to a fairly small value, but no amount of sampling within that site-year combination will resolve the confounding of the site-year interaction terms with the means.
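For a balanced design (the same number of quadrats \(n_q\) in every SiteClass-Period cell), the power can also be computed by hand from this variance. A minimal sketch, assuming residual degrees of freedom of \(4(n_q-1)\) and using values that match the crab example and the baci.power() output below:
# Hand computation of power for the simple, balanced BACI design
alpha   <- 0.05
n_q     <- 5           # quadrats in each of the four SiteClass-Period cells
sd_quad <- 0.1319614   # quadrat-to-quadrat standard deviation
baci    <- 0.2         # size of BACI effect to detect

v_baci <- 4*sd_quad^2/n_q         # variance of the pseudo-BACI contrast
ncp    <- baci^2/v_baci           # non-centrality parameter
df     <- 4*(n_q-1)               # residual degrees of freedom
Fcrit  <- qf(1-alpha, 1, df)
1 - pf(Fcrit, 1, df, ncp=ncp)     # about 0.36, matching the first row of the output below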
For example, in the crab example, we started with both sites having mean densities of about 35 crabs/quadrat. In both cases, densities declined in the After Period. We are interested in a difference-in-the-difference of about 20 percentage points (i.e., a BACI effect of about 0.2).
The BACI contrast value is obtained then in the usual way as the “difference of the differences”: \[BACI=(\mu_{CA}-\mu_{CB}) -(\mu_{IA}-\mu_{IB})= \mu_{CA}-\mu_{CB} -\mu_{IA}+\mu_{IB}\] the actual values for the means are arbitrary, so use the values of \[BACI=0-0-0+0.2\] to specify the BACI effect.
We need information on the STANDARD DEVIATIONS for the quadrat-to-quadrat variation in the experiment. A very rough rule of thumb for count data is that the standard deviation is about \(\sqrt{\mathit{average~count}}\), assuming that the counts follow a Poisson distribution, but it can be considerably higher. If you have access to a preliminary survey, the average standard deviation among the quadrats measured in each site-year can be used; or, if you have a preliminary ANOVA, the Root Mean Square Error (RMSE) also measures the “average” standard deviation. Based on the results from the analysis presented earlier, the estimated standard deviation is approximately 0.13.
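As a quick arithmetic illustration of the rule of thumb (note that this is on the original count scale, not necessarily the scale used in the analysis):
# rough rule of thumb for count data: SD is roughly the square root of the mean count
mean_count <- 35     # e.g., about 35 crabs/quadrat as in the example below
sqrt(mean_count)     # roughly 5.9 crabs/quadrat on the count scale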
Refer to Section 1.11 on how to estimate the variance components needed for the power analysis.
I’ve created a general power-analysis function for BACI designs called baci.power.r available in the Sample Program Library.
In the case of the simple BACI design, the Site:Year random effect is confounded with the four SiteClass.Period means and so we simply specify that its value is 0 in the baci.power() function. Similarly, we only have 1 site in each SiteClass and 1 year in each Period, so the site-to-site and year-to-year standard deviations are both set to zero.
Here is some sample code to do the power analysis:
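# sdResid is the quadrat-to-quadrat standard deviation estimated from the earlier
# analysis (approximately 0.13); baci.power() is assumed to have been sourced from
# baci.power.r in the Sample Program Library.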
scenarios <- expand.grid(n_quad=seq(5,20,2),
baci_effect=0.2,
sdResid=sdResid)
scenarios
power <- plyr::adply(scenarios,1,function(x){
#browser()
power <- baci.power(
n_IA=x$n_quad, n_IB=x$n_quad, n_CA=x$n_quad, n_CB=x$n_quad,
ns_I=1, ns_C=1,
ny_B=1, ny_A=1,
mu_IA=x$baci_effect, mu_IB=0, mu_CA=0, mu_CB=0,
sdYear=0, sdSite=0, sdSiteYear=0, sdResid=x$sdResid)
power
})
power
cat("\n")
power[,c("alpha","n_IA","n_IB","n_CA","n_CB","baci","power")] n_quad baci_effect sdResid
1 5 0.2 0.1319614
2 7 0.2 0.1319614
3 9 0.2 0.1319614
4 11 0.2 0.1319614
5 13 0.2 0.1319614
6 15 0.2 0.1319614
7 17 0.2 0.1319614
8 19 0.2 0.1319614
n_quad baci_effect alpha sdSite sdYear sdSiteYear sdResid n_IA n_IB n_CA
1 5 0.2 0.05 0 0 0 0.1319614 5 5 5
2 7 0.2 0.05 0 0 0 0.1319614 7 7 7
3 9 0.2 0.05 0 0 0 0.1319614 9 9 9
4 11 0.2 0.05 0 0 0 0.1319614 11 11 11
5 13 0.2 0.05 0 0 0 0.1319614 13 13 13
6 15 0.2 0.05 0 0 0 0.1319614 15 15 15
7 17 0.2 0.05 0 0 0 0.1319614 17 17 17
8 19 0.2 0.05 0 0 0 0.1319614 19 19 19
n_CB ns_I ns_C ny_B ny_A mu_IA mu_IB mu_CA mu_CB baci baci.se dfdenom
1 5 1 1 1 1 0.2 0 0 0 0.2 0.11802982 16
2 7 1 1 1 1 0.2 0 0 0 0.2 0.09975341 24
3 9 1 1 1 1 0.2 0 0 0 0.2 0.08797424 32
4 11 1 1 1 1 0.2 0 0 0 0.2 0.07957569 40
5 13 1 1 1 1 0.2 0 0 0 0.2 0.07319899 48
6 15 1 1 1 1 0.2 0 0 0 0.2 0.06814455 56
7 17 1 1 1 1 0.2 0 0 0 0.2 0.06401066 64
8 19 1 1 1 1 0.2 0 0 0 0.2 0.06054802 72
ncp Fcrit power Tcrit os.power1 os.power2
1 2.871286 4.493998 0.3570512 1.745884 0.4910198 5.353837e-04
2 4.019801 4.259677 0.4859071 1.710882 0.6190894 1.619837e-04
3 5.168315 4.149097 0.5967341 1.693889 0.7190601 5.406011e-05
4 6.316830 4.084746 0.6888782 1.683851 0.7954833 1.921239e-05
5 7.465344 4.042652 0.7634033 1.677224 0.8528155 7.140875e-06
6 8.613858 4.012973 0.8223342 1.672522 0.8951393 2.746100e-06
7 9.762373 3.990924 0.8680675 1.669013 0.9259577 1.084973e-06
8 10.910887 3.973897 0.9029987 1.666294 0.9481344 4.382558e-07
alpha n_IA n_IB n_CA n_CB baci power
1 0.05 5 5 5 5 0.2 0.3570512
2 0.05 7 7 7 7 0.2 0.4859071
3 0.05 9 9 9 9 0.2 0.5967341
4 0.05 11 11 11 11 0.2 0.6888782
5 0.05 13 13 13 13 0.2 0.7634033
6 0.05 15 15 15 15 0.2 0.8223342
7 0.05 17 17 17 17 0.2 0.8680675
8 0.05 19 19 19 19 0.2 0.9029987
The power curve can be plotted:
power.plot <- ggplot(data=power, aes(x=n_quad, y=power))+
ggtitle("Estimated power",
subtitle=paste("alpha: ", power$alpha[1]))+
geom_point()+
geom_line()+
ylim(0,1)+
geom_hline(yintercept=0.8, color="blue")+
xlab("Number of quadrats at each site:year")
power.plot
For this example, if a BACI contrast of 0.2 is biologically important, then about 15 sub-samples are required in each site-year combination. Varying the number of subsamples among the site-years has some effect, but little compared to changing the total number of subsamples.
If the number of sub-samples in each site-year is large, use the “trick” explained in Section 1.10.
1.9.4 Multiple sites in Control/Impact; one year Before/After
As you saw in the previous section, the simple BACI design with one site measured in one year at Impact and Control has a fundamental problem of pseudo-replication. We now extend the design to having multiple sites (with still a single year of measurements in each period) in this section.
We start again with the full statistical model for a general BACI design:
\[Y_{ijkyl} = \mu_{ik} + \epsilon_{ij}^{site} + \epsilon_{ky}^{year} + \epsilon_{ijky}^{site:year} + \epsilon_{ijkyl}^{quadrat}\]
where
\(Y_{ijkyl}\) is the measured response in SiteClass \(i\) (either Control or Impact), site \(j\) (in SiteClass \(i\)), at Period \(k\) (either Before or After), in year \(y\) (within Period \(k\)), and finally in subsample \(l\) (within each site-year combination \(ijky\));
\(\mu_{ik}\) is the mean response at SiteClass \(i\) and Period \(k\), i.e., the four means used to determine the BACI contrast being the means at
- Impact-Before,
- Impact-After,
- Control-Before, and
- Control-After combinations;
\(\epsilon_{ij}^{site}\) is site-to-site variation with variance (the square of the standard deviation) of \(\sigma^2_{site}\);
\(\epsilon_{ky}^{year}\) is the year-to-year variation within the same period that affects all sites simultaneously;
\(\epsilon_{ijky}^{site:year}\) is the site-year interaction with variance of \(\sigma^2_{\mathit{Site:Year}}\); and
\(\epsilon_{ijkyl}^{quadrat}\) is the subsampling variation with variance of \(\sigma^2_{quadrat}\).
In this BACI experiment, there are now (possibly) multiple sites in either or both of the Impact or Control SiteClasses, but still only one year measured in each of the two periods (Before or After) so the \(y\) subscript also only takes the value of 1.
The variance of an individual observation is (by statistical theory):
\[V(Y_{ijkyl})= \sigma^2_{site} + \sigma^2_{year} + \sigma^2_{\mathit{Site:Year}} + \sigma^2_{quadrat}\]
We start by averaging over the \(n_{ijky}\) subsamples taken in each site. This gives the sets of means:
\[\overline{Y}_{ijky\cdot } = \mu_{ik} + \epsilon_{ij}^{site} + \epsilon_{ky}^{year} + \epsilon_{ijky}^{\mathit{Site:Year}} + \overline{\epsilon}_{ijky\cdot}^{quadrat}\]
Notice that because the average is over the subsamples within each site-year combination, the only thing that varies is the quadrats, and so the average affects the quadrat-to-quadrat error. The variance of these averages is:
\[V(\overline{Y}_{ijky\cdot}) = \sigma^2_{site} + \sigma^2_{year} + \sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{quadrat}}{n_{ik}}\]
Only the final variance term is affected by averaging.
Now for each site, we take the difference between the After and Before (average) readings – recall that in this BACI design, there is still only 1 year in each period, i.e., the \(y\) subscript is always 1. This gives:
\[ \begin{align*} \overline{Y}_{ijA1\cdot}&-\overline{Y}_{ijB1\cdot} \\ = & \mu_{IA} + \epsilon_{ij}^{site} + \epsilon_{A1}^{year}+ \epsilon_{ijA1}^{site:year} + \overline{\epsilon}_{ijA1 \cdot}^{quadrat} - ( \mu_{IB} + \epsilon_{ij}^{site} + \epsilon_{B1}^{year} + \epsilon_{ijB1}^{site:year} + \overline{\epsilon}_{ijB1 \cdot}^{quadrat} ) \\ = & \mu_{IA} - \mu_{IB} + \epsilon_{A1}^{year} + \epsilon_{ijA1}^{site:year} + \overline{\epsilon}_{ijA1 \cdot}^{quadrat} - \epsilon_{B1}^{year} - \epsilon_{ijB1}^{site:year} - \overline{\epsilon}_{ijB1 \cdot}^{quadrat} \end{align*} \]
Notice that the site-to-site variance terms “cancel out”. This occurs because the same site is measured in both the Before and After periods, which removes the extra site-to-site variation much like pairing does in paired designs.
The variance of the site differences is:
\[ \begin{align*} V(\overline{Y}_{ijA\cdot}&-\overline{Y}_{ijB\cdot}) = \sigma^2_{year}+ \sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{quadrat}}{n_{IA}} + \sigma^2_{year}+ \sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{quadrat}}{n_{IB}}\\ &= 2\sigma^2_{year} + 2 \sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{quadrat}}{n_{IA}} + \frac{\sigma^2_{quadrat}}{n_{IB}} \end{align*} \]
We now average over the (possible multiple) sites in the Control areas: \[\overline{Y}_{C\cdot A1\cdot}-\overline{Y}_{C\cdot B1\cdot} = \mu_{CA} - \mu_{CB} + \epsilon_{A1}^{year}+ \overline{\epsilon}_{C\cdot A1}^{\mathit{Site:Year}} + \overline{\epsilon}_{C\cdot A1\cdot}^{quadrat} - \epsilon_{B1}^{year}- \overline{\epsilon}_{C\cdot B1}^{\mathit{Site:Year}} - \overline{\epsilon}_{C\cdot B1\cdot}^{quadrat} \]
which has variance of: \[V(\overline{Y}_{C\cdot A\cdot}-\overline{Y}_{C\cdot B\cdot}) = 2\sigma^2_{year} + 2 \frac{\sigma^2_{\mathit{Site:Year}}}{n^{sites}_C} + \frac{\sigma^2_{quadrat}}{n_{CA}n_C^{sites}} + \frac{\sigma^2_{quadrat}}{n_{CB}n_C^{sites}}\]
A similar expression can be obtained for the average of (possibly) replicated treatment sites.
Finally, the BACI contrast is found as the difference of the difference in the Impact and Control sites:
\[\begin{align*} BACI =& \overline{Y}_{I\cdot A1\cdot}-\overline{Y}_{I\cdot B1\cdot} - ( \overline{Y}_{C\cdot A1\cdot}-\overline{Y}_{C\cdot B1\cdot})\\ = & \mu_{IA} - \mu_{IB} - (\mu_{CA} - \mu_{CB} ) +\\ & \overline{\epsilon}_{I\cdot}^{site} + \epsilon_{A1}^{year}+ \overline{\epsilon}_{I\cdot A1}^{site:year} + \overline{\epsilon}_{I\cdot A1 \cdot}^{quadrat} -\\ & ( \overline{\epsilon}_{I\cdot}^{site} + \epsilon_{B1}^{year} + \overline{\epsilon}_{I\cdot B1}^{site:year} + \overline{\epsilon}_{I\cdot B1 \cdot}^{quadrat} ) -\\ & (\overline{\epsilon}_{C\cdot}^{site} + \epsilon_{A1}^{year}+ \overline{\epsilon}_{C\cdot A1}^{site:year} + \overline{\epsilon}_{C\cdot A1 \cdot}^{quadrat} )+\\ & ( \overline{\epsilon}_{C\cdot}^{site} + \epsilon_{B1}^{year} + \overline{\epsilon}_{C\cdot B1}^{site:year} + \overline{\epsilon}_{C\cdot B1 \cdot}^{quadrat} ) \\ = & \mu_{IA} - \mu_{IB} - \mu_{CA} + \mu_{CB} +\\ & \overline{\epsilon}_{I\cdot A1}^{site:year} - \overline{\epsilon}_{I\cdot B1}^{site:year} - \overline{\epsilon}_{C\cdot A1}^{site:year} + \overline{\epsilon}_{C\cdot B1}^{site:year} + \overline{\epsilon}_{I\cdot A1 \cdot}^{quadrat} -\overline{\epsilon}_{I\cdot B1 \cdot}^{quadrat} - \overline{\epsilon}_{C\cdot A1 \cdot}^{quadrat} + \overline{\epsilon}_{C\cdot B1 \cdot}^{quadrat} \\ \end{align*} \]
Now notice that the Year and Site effects cancel out (as they must with any blocking variable) because ALL sites were measured in the same year before and after the impact. The Site:Year interaction variance component still remains, but it is no longer problematic because it is no longer confounded with the means. This is the primary reason why multiple sites (at least multiple Control sites) should be used in any BACI experiment.
The variance of the BACI contrast is found to be:
\[V(BACI)= 2 \frac{\sigma^2_{\mathit{Site:Year}}}{n^{sites}_C} + 2 \frac{\sigma^2_{\mathit{Site:Year}}}{n^{sites}_I} + \frac{\sigma^2_{quadrat}}{n_{CA}n_C^{sites}} + \frac{\sigma^2_{quadrat}}{n_{CB}n_C^{sites}} + \frac{\sigma^2_{quadrat}}{n_{IA}n_I^{sites}} + \frac{\sigma^2_{quadrat}}{n_{IB}n_I^{sites}}\]
The standard deviation of the BACI contrast is the square root of this value. Finally, the BACI value and the standard deviation of the BACI contrast can be used with any one-sample power analysis program to determine the power of the design.
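As a rough illustration of how this formula can be used outside of baci.power(), here is a minimal sketch that computes the standard error of the BACI contrast from assumed (hypothetical) variance components and sample sizes and then approximates the power with a simple normal approximation (this will be slightly optimistic compared to the t/F-based computation in baci.power() when the degrees of freedom are small):
# hypothetical values for illustration only
sd_site_year <- 0.10                 # site-year interaction SD (assumed)
sd_quad      <- 0.12                 # quadrat-to-quadrat SD (assumed)
ns_I <- 1; ns_C <- 4                 # number of Impact and Control sites
n    <- 10                           # quadrats in every site-year combination
v_baci <- 2*sd_site_year^2/ns_C + 2*sd_site_year^2/ns_I +
          2*sd_quad^2/(n*ns_C)     + 2*sd_quad^2/(n*ns_I)
baci  <- 0.2                         # biologically important BACI contrast
power <- pnorm(abs(baci)/sqrt(v_baci) - qnorm(1-0.05/2))  # two-sided alpha of 0.05
round(c(baci.se=sqrt(v_baci), approx.power=power), 3)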
Refer to Section 1.11 on how to estimate the variance components needed for the power analysis. The previous analysis of this data gave the following estimates of the variance components:
Groups Name Std.Dev.
Site:Period (Intercept) 0.00000
Site (Intercept) 0.12274
Residual 0.11919
The BACI effect size is determined in the usual way. Again assume that a BACI contrast of 0.2 is of biological interest.
As in the previous section, we use the baci.power.r routine. Here is some sample code to do the power analysis based on the baseline values and several other scenarios:
cat("Estimated variance components are \nsdSiteYear ", round(sdSiteYear,2),
";\n sdSite ", round(sdSite,2),
"; \nsdResid ",round(sdResid,2),"\n")
cat("\n")
scenarios <- expand.grid(n_quad=seq(5,40,5),
ns_I = c(1,2),
ns_C = seq(2,10,2),
baci_effect=0.2,
sdSiteYear=sdSiteYear,
sdSite =sdSite,
sdResid=sdResid)
cat("Some scenarios\n")
head(scenarios)
power <- plyr::adply(scenarios,1,function(x){
#browser()
power <- baci.power(
n_IA=x$n_quad, n_IB=x$n_quad, n_CA=x$n_quad, n_CB=x$n_quad,
ns_I=x$ns_I, ns_C=x$ns_C,
ny_B=1, ny_A=1,
mu_IA=x$baci_effect, mu_IB=0, mu_CA=0, mu_CB=0,
sdYear=0, sdSite=x$sdSite, sdSiteYear=x$sdSiteYear, sdResid=x$sdResid)
power
})
cat("\nSome of the power computations\n")
head(power)
cat("\n")
head(power[,c("alpha","ns_I","ns_C","n_IA","n_IB","n_CA","n_CB","baci","power")])Estimated variance components are
sdSiteYear 0 ;
sdSite 0.12 ;
sdResid 0.12
Some scenarios
n_quad ns_I ns_C baci_effect sdSiteYear sdSite sdResid
1 5 1 2 0.2 0 0.1227443 0.1191853
2 10 1 2 0.2 0 0.1227443 0.1191853
3 15 1 2 0.2 0 0.1227443 0.1191853
4 20 1 2 0.2 0 0.1227443 0.1191853
5 25 1 2 0.2 0 0.1227443 0.1191853
6 30 1 2 0.2 0 0.1227443 0.1191853
Some of the power computations
n_quad baci_effect alpha sdSite sdYear sdSiteYear sdResid n_IA n_IB n_CA
1 5 0.2 0.05 0.1227443 0 0 0.1191853 5 5 5
2 10 0.2 0.05 0.1227443 0 0 0.1191853 10 10 10
3 15 0.2 0.05 0.1227443 0 0 0.1191853 15 15 15
4 20 0.2 0.05 0.1227443 0 0 0.1191853 20 20 20
5 25 0.2 0.05 0.1227443 0 0 0.1191853 25 25 25
6 30 0.2 0.05 0.1227443 0 0 0.1191853 30 30 30
n_CB ns_I ns_C ny_B ny_A mu_IA mu_IB mu_CA mu_CB baci baci.se dfdenom
1 5 1 2 1 1 0.2 0 0 0 0.2 0.09232057 1
2 10 1 2 1 1 0.2 0 0 0 0.2 0.06528050 1
3 15 1 2 1 1 0.2 0 0 0 0.2 0.05330131 1
4 20 1 2 1 1 0.2 0 0 0 0.2 0.04616029 1
5 25 1 2 1 1 0.2 0 0 0 0.2 0.04128701 1
6 30 1 2 1 1 0.2 0 0 0 0.2 0.03768972 1
ncp Fcrit power Tcrit os.power1 os.power2
1 4.693135 161.4476 0.1356417 6.313752 0.2659881 6.776132e-04
2 9.386270 161.4476 0.1899989 6.313752 0.3682909 3.843098e-05
3 14.079405 161.4476 0.2315491 6.313752 0.4427867 2.629404e-06
4 18.772539 161.4476 0.2661000 6.313752 0.5020942 1.961622e-07
5 23.465674 161.4476 0.2961041 6.313752 0.5514234 1.539306e-08
6 28.158809 161.4476 0.3228404 6.313752 0.5935277 1.248884e-09
alpha ns_I ns_C n_IA n_IB n_CA n_CB baci power
1 0.05 1 2 5 5 5 5 0.2 0.1356417
2 0.05 1 2 10 10 10 10 0.2 0.1899989
3 0.05 1 2 15 15 15 15 0.2 0.2315491
4 0.05 1 2 20 20 20 20 0.2 0.2661000
5 0.05 1 2 25 25 25 25 0.2 0.2961041
6 0.05 1 2 30 30 30 30 0.2 0.3228404
A graph of the power results is
power.plot <- ggplot(data=power, aes(x=n_quad, y=power, color=as.factor(ns_C)))+
ggtitle("Estimated power",
subtitle=paste("alpha: ", power$alpha[1]))+
geom_point()+
geom_line()+
ylim(0,1)+
geom_hline(yintercept=0.8, color="blue")+
xlab("Number of quadrats at each site:year")+
scale_color_discrete(name="# Site\nControl")+
facet_wrap(~ns_I, ncol=2, labeller=label_both)
power.plot
Notice that increasing the sub-sampling from 5 to 40 in each site-year combination has little effect on the power. The problem is that these are pseudo-replicates and not true replicates. Consequently, replication of the number of sites is much more effective.
Now, because the number of Impact sites is limited (there is only 1), it may be sensible to increase sampling at the Impact site at the quadrat level as well.
# you could also have different sample sizes in impact and control areas.
results <- baci.power(n_IA=20, n_IB=20, n_CA=5, n_CB=5,
ns_I=1, ns_C=4,
ny_B=1, ny_A=1,
mu_IA=0.2, mu_IB=0, mu_CA=0, mu_CB=0,
sdYear=0, sdSite=sdSite, sdSiteYear=sdSiteYear, sdResid=sdResid)
results
alpha sdSite sdYear sdSiteYear sdResid n_IA n_IB n_CA n_CB ns_I ns_C
1 0.05 0.1227443 0 0 0.1191853 20 20 5 5 1 4
ny_B ny_A mu_IA mu_IB mu_CA mu_CB baci baci.se dfdenom ncp Fcrit
1 1 1 0.2 0 0 0 0.2 0.05330131 3 14.0794 10.12796
power Tcrit os.power1 os.power2
1 0.7061564 2.353363 0.876833 5.764102e-07
For this particular experiment, quite a few additional sites would be needed to have a reasonable chance of detecting this size of a biologically important difference.
If the number of sub-samples in each site-year is large, use the “trick” explained in Section 1.10.
1.9.5 One site in Control/Impact; multiple years before/after; no subsampling
The derivation for the power analysis for this design is relatively straightforward because there is no subsampling in each site.
We start again with the full statistical model for a general BACI design:
\[Y_{ijkyl} = \mu_{ik} + \epsilon_{ij}^{site} + \epsilon_{ky}^{year} + \epsilon_{ijky}^{site:year} + \epsilon_{ijkyl}^{quadrat}\]
where
\(Y_{ijkyl}\) is the measured response in SiteClass \(i\) (either Control or Impact), site \(j\) (in SiteClass \(i\)), at Period \(k\) (either Before or After), in year \(y\) (within Period \(k\)), and finally in subsample \(l\) (within site-year combination \(ijky\));
\(\mu_{ik}\) is the mean response at SiteClass \(i\) and Period \(k\), i.e., the four means used to determine the BACI contrast being the means at
- Impact-Before,
- Impact-After,
- Control-Before, and
- Control-After combinations;
\(\epsilon_{ij}^{site}\) is site-to-site variation with variance (the square of the standard deviation) of \(\sigma^2_{site}\);
\(\epsilon_{ky}^{year}\) is the year-to-year variation within the same period that affects all sites simultaneously;
\(\epsilon_{ijky}^{site:year}\) is the site-year interaction with variance of \(\sigma^2_{\mathit{Site-Year}}\); and
\(\epsilon_{ijkyl}^{quadrat}\) is the subsampling variation with variance of \(\sigma^2_{quadrat}\).
In this BACI experiment, there is now only one site in each of the Impact and Control SiteClasses, so the \(j\) subscript only takes the value of 1, but multiple years are measured in each of the two periods (Before and After), so the \(y\) subscript now takes many values. There is only one measurement at each site-year combination, so the \(l\) subscript also only takes the value of 1.
The variance of an individual observation is (by statistical theory):
\[V(Y_{ijkyl})= \sigma^2_{site} + \sigma^2_{year} + \sigma^2_{\mathit{Site-Year}} + \sigma^2_{quadrat}\]
Because both sites are measured at every sampling event (a paired design), we first take the difference in response between the Impact and Control sites at each time point (in the expressions below, the subscript \(T\) denotes the Treatment/Impact site and \(C\) the Control site):
\[ \begin{align*} d_{ky1}= &Y_{T1ky1} -Y_{C1ky1} = \mu_{Tk} + \epsilon_{T1}^{site} + \epsilon_{ky}^{year} + \epsilon_{T1ky}^{site:year} + \epsilon_{T1kyl}^{quadrat} - ( \mu_{Ck} + \epsilon_{C1}^{site} + \epsilon_{ky}^{year} + \epsilon_{C1ky}^{site:year} + \epsilon_{C1kyl}^{quadrat})\\ =& \mu_{Tk} + \epsilon_{T1}^{site} + \epsilon_{T1ky}^{site:year} + \epsilon_{T1kyl}^{quadrat} - ( \mu_{Ck} + \epsilon_{C1}^{site} + \epsilon_{C1ky}^{site:year} + \epsilon_{C1kyl}^{quadrat})\\ \end{align*} \]
Notice that the year-to-year variation, which is common to both sites at a given sampling time, cancels – this is exactly what happens with paired designs.
The average differences in the Before and After periods are then found:
\[ \begin{align*} \overline{d}_{B\cdot 1} =& \mu_{IB} + \epsilon_{T1}^{site} + \overline{\epsilon}_{I1B\cdot}^{site:year} + \overline{\epsilon}_{I1B\cdot 1}^{quadrat} - ( \mu_{CB} + \epsilon_{C1}^{site} + \overline{\epsilon}_{C1B\cdot}^{site:year} + \overline{\epsilon}_{C1B\cdot 1}^{quadrat})\\ \overline{d}_{A\cdot 1} =& \mu_{IA} + \epsilon_{T1}^{site} + \overline{\epsilon}_{I1A\cdot}^{site:year} + \overline{\epsilon}_{I1A\cdot 1}^{quadrat} - ( \mu_{CA} + \epsilon_{C1}^{site} + \overline{\epsilon}_{C1A\cdot}^{site:year} + \overline{\epsilon}_{C1A\cdot 1}^{quadrat}) \end{align*} \]
Finally the BACI contrast is the difference of these differences
\[ \begin{align*} BACI= & \overline{d}_{B\cdot 1} -\overline{d}_{A\cdot 1}= \mu_{IB} - \mu_{CB} - \mu_{IA} + \mu_{CA} +\\ & \overline{\epsilon}_{I1B\cdot}^{site:year} - \overline{\epsilon}_{C1B\cdot}^{site:year} -\overline{\epsilon}_{I1A\cdot}^{site:year} +\overline{\epsilon}_{C1A\cdot}^{site:year} + \\ & \overline{\epsilon}_{I1B\cdot 1}^{quadrat} -\overline{\epsilon}_{C1B\cdot 1}^{quadrat} - \overline{\epsilon}_{I1A\cdot 1}^{quadrat} + \overline{\epsilon}_{C1A\cdot 1}^{quadrat} \end{align*} \]
Notice how the site-effects again cancel out because the same sites are measured in both periods (Before and After).
The BACI contrast has variance:
\[V(BACI)= \frac{2 \sigma^2_{\mathit{Site-Year}}}{n_B} + \frac{2 \sigma^2_{\mathit{Site-Year}}}{n_A} + \frac{2 \sigma^2}{n_B} + \frac{2 \sigma^2}{n_A}\]
where \(n_B\) and \(n_A\) are the number of years (sampling times) monitored in the Before and After periods. This can then be used with any power program for a single mean.
Refer to Section 1.11 on how to estimate the variance components needed for the power analysis. The estimated variance components for the fish example seen earlier are:
Groups Name Std.Dev.
SamplingTimeF (Intercept) 1.74142
Residual 0.80714
Note, however, that with only one measurement per year, the Site:Year and Residual (subsampling) variances cannot be disentangled and are completely confounded. Consequently, the estimated residual variation is a combination of the Site:Year and residual variation and is used as is. From the previous analysis of the fish data, we found that the year-to-year variance component (the SamplingTime variance component) had a standard deviation of 1.74 and the residual variation (the combination of the Site:Year and subsampling variation) had a standard deviation of 0.81.
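As a quick check of the variance formula above, here is a minimal sketch using these estimates and the first scenario below (12 months Before and 13 months After); the resulting standard error agrees with the baci.se column reported by baci.power() in the output that follows:
# combined Site:Year + subsampling SD (fish example) and months monitored
sd_comb <- 0.807
n_B <- 12; n_A <- 13
sqrt(2*sd_comb^2/n_B + 2*sd_comb^2/n_A)   # about 0.46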
As in the previous section, we use the baci.power() function. Here is some sample code to do the power analysis based on the baseline values and several other scenarios. Note that we set the number of sub-samples to 1 and the sub-sampling variance component to zero.
# Note that because there is only 1 site, there is NO variance component
# associated with site. Use any value here. In general, the site variance
# component is not important because when you measure the same sites over time
# the site effects cancel out (like blocking) and so site-to-site variability
# is not important. The site-year variation really is what drives the BACI power.
sdResid = vc[ vc$grp=="Residual", "sdcor"]
sdYear = vc[ vc$grp=="SamplingTimeF", "sdcor"]
scenarios <- expand.grid(n_quad=1,
n.months_B =seq(12,20,2),
n.months_A =seq(13,20,1),
baci_effect=0.5,
sdYear = sdYear, sdResid=sdResid)
head(scenarios)
power <- plyr::adply(scenarios,1,function(x){
#browser()
power <- baci.power(
n_IA=x$n_quad, n_IB=x$n_quad, n_CA=x$n_quad, n_CB=x$n_quad,
ns_I=1, ns_C=1,
ny_B=x$n.months_B, ny_A=x$n.months_A,
mu_IA=x$baci_effect, mu_IB=0, mu_CA=0, mu_CB=0,
sdYear=x$sdYear, sdSite=0, sdSiteYear=x$sdResid, sdResid=0)
power
})
head(power)
n_quad n.months_B n.months_A baci_effect sdYear sdResid
1 1 12 13 0.5 1.741419 0.807142
2 1 14 13 0.5 1.741419 0.807142
3 1 16 13 0.5 1.741419 0.807142
4 1 18 13 0.5 1.741419 0.807142
5 1 20 13 0.5 1.741419 0.807142
6 1 12 14 0.5 1.741419 0.807142
n_quad n.months_B n.months_A baci_effect alpha sdSite sdYear sdSiteYear
1 1 12 13 0.5 0.05 0 1.741419 0.807142
2 1 14 13 0.5 0.05 0 1.741419 0.807142
3 1 16 13 0.5 0.05 0 1.741419 0.807142
4 1 18 13 0.5 0.05 0 1.741419 0.807142
5 1 20 13 0.5 0.05 0 1.741419 0.807142
6 1 12 14 0.5 0.05 0 1.741419 0.807142
sdResid n_IA n_IB n_CA n_CB ns_I ns_C ny_B ny_A mu_IA mu_IB mu_CA mu_CB baci
1 0 1 1 1 1 1 1 12 13 0.5 0 0 0 0.5
2 0 1 1 1 1 1 1 14 13 0.5 0 0 0 0.5
3 0 1 1 1 1 1 1 16 13 0.5 0 0 0 0.5
4 0 1 1 1 1 1 1 18 13 0.5 0 0 0 0.5
5 0 1 1 1 1 1 1 20 13 0.5 0 0 0 0.5
6 0 1 1 1 1 1 1 12 14 0.5 0 0 0 0.5
baci.se dfdenom ncp Fcrit power Tcrit os.power1 os.power2
1 0.4569542 23 1.197277 4.279344 0.1824510 1.713872 0.2800207 0.003387034
2 0.4396541 25 1.293355 4.241699 0.1943614 1.708141 0.2951721 0.002960494
3 0.4262185 27 1.376181 4.210008 0.2047466 1.703288 0.3081427 0.002641309
4 0.4154683 29 1.448319 4.182964 0.2138748 1.699127 0.3193705 0.002394624
5 0.4066636 31 1.511714 4.159615 0.2219566 1.695519 0.3291837 0.002198952
6 0.4490524 24 1.239784 4.259677 0.1877758 1.710882 0.2868063 0.003188366
A plot of the power values is:
power.plot <- ggplot(data=power, aes(x=ny_A, y=power, color=as.factor(ny_B)))+
ggtitle("Estimated power",
subtitle=paste("alpha: ", power$alpha[1],"; baci_effect:", power$baci_effect[1]))+
geom_point()+
geom_line()+
ylim(0,1)+
geom_hline(yintercept=0.8, color="blue")+
xlab("Number of months of monitoring after impact")+
scale_color_discrete(name="# months\nbefore")
power.plot
Note that the power is extremely small for all but very large differences even with monthly samples taken before and after impact. The key problem is the large variation in the differences among the different months. This may indicate that the design should be modified to incorporate sub-sampling to try and reduce some of the variation seen across months.
The power and sample size for the chironomid example is very similar and not repeated here.
If the number of sub-samples in each site-year is large, use the “trick” explained in Section 1.10.
1.9.6 Multiple sites in Control/Impact; multiple years Before/After; subsampling
Whew! Now for a power/sample size analysis for the general BACI design with multiple sites in the SiteClasses (typically only the Control sites are replicated), multiple years in each period (Before/After), and multiple subsamples.
The derivation of the BACI contrast and its variance follows the examples shown in the previous sections and only the final results will be shown.
The BACI contrast is the difference of the differences after averaging over subsamples, multiple sites in each SiteClass, and multiple years in each period:
\[ \begin{align*} BACI= & \overline{d}_{B\cdot \cdot} -\overline{d}_{A\cdot \cdot}= \mu_{IB} - \mu_{CB} - \mu_{IA} + \mu_{CA} +\\ & \overline{\epsilon}_{I\cdot B\cdot}^{site:year} - \overline{\epsilon}_{C\cdot B\cdot}^{site:year} -\overline{\epsilon}_{I\cdot A\cdot}^{site:year} +\overline{\epsilon}_{C\cdot A\cdot}^{site:year} + \\ & \overline{\epsilon}_{I\cdot B\cdot \cdot }^{quadrat} -\overline{\epsilon}_{C\cdot B\cdot \cdot }^{quadrat} - \overline{\epsilon}_{I\cdot A\cdot \cdot }^{quadrat} + \overline{\epsilon}_{C\cdot A\cdot \cdot }^{quadrat} \end{align*} \]
Notice how the site-effects and year-effects again cancel out because the same sites are measured in all years (before and after impact).
The general BACI contrast has variance:
\[V(BACI)= \frac{\sigma^2_{\mathit{Site:Year}}}{ny_B ns_I} + \frac{\sigma^2_{\mathit{Site:Year}}}{ny_B ns_C} + \frac{\sigma^2_{\mathit{Site:Year}}}{ny_A ns_I} + \frac{\sigma^2_{\mathit{Site:Year}}}{ny_A ns_C} + \frac{\sigma^2_{quadrat}}{ny_B ns_I n_{IB}} + \frac{\sigma^2_{quadrat}}{ny_B ns_C n_{CB}} + \frac{\sigma^2_{quadrat}}{ny_A ns_I n_{IA}} + \frac{\sigma^2_{quadrat}}{ny_A ns_C n_{CA}} \]
where
- \(ny_B\) is the number of years monitored before impact;
- \(ny_A\) is the number of years monitored after impact;
- \(ns_I\) is the number of sites measured in the Impact SiteClass;
- \(ns_C\) is the number of sites measured in the Control SiteClass;
- \(n_{IB}\), \(n_{IA}\), \(n_{CB}\), and \(n_{CA}\) are the numbers of subsamples taken in each site-year combination.
This can then be used with any power program for a single mean in the same fashion as before.
Refer to Section 1.11 on how to estimate the variance components needed for the power analysis.
Consider the fry example. Here the variance components are:
- site-to-site standard deviation 0.77 (irrelevant because all sites are measured in all years);
- year-to-year standard deviation 0.21 (also irrelevant because all sites are measured in all years);
- site-year standard deviation 0; and
- sub-sampling standard deviation 0.72.
A BACI contrast of 0.5 is of interest, so the four means are chosen to give the relevant contrast value. Finally, various combinations of the number of cages, number of sites, and number of years are tried to obtain the following power.
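Before running the full set of scenarios, the general variance formula can be checked by hand. Here is a minimal sketch using the fry values above for the smallest scenario below (3 cages in each site-year, 3 Impact and 3 Control sites, 3 years Before and 2 years After); it reproduces the baci.se of about 0.31 reported by baci.power() in the output that follows:
# variance of the BACI contrast from the formula in the previous section
sd_site_year <- 0                    # site-year SD (estimated as 0 for the fry example)
sd_quad      <- 0.72                 # cage-to-cage (sub-sampling) SD
ns_I <- 3; ns_C <- 3; ny_B <- 3; ny_A <- 2; n_quad <- 3
v_baci <- sd_site_year^2*(1/(ny_B*ns_I) + 1/(ny_B*ns_C) + 1/(ny_A*ns_I) + 1/(ny_A*ns_C)) +
          sd_quad^2*(1/(ny_B*ns_I*n_quad) + 1/(ny_B*ns_C*n_quad) +
                     1/(ny_A*ns_I*n_quad) + 1/(ny_A*ns_C*n_quad))
sqrt(v_baci)                         # about 0.31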
As in the previous section, we use the baci.power() function. Here is some sample code to do the power analysis based on the baseline values and several other scenarios.
sdSiteYear= vc[ vc$grp=="YearF:Site", "sdcor"]
sdResid = vc[ vc$grp=="Residual", "sdcor"]
sdYear = vc[ vc$grp=="YearF", "sdcor"]
sdSite = vc[ vc$grp=="Site", "sdcor"]
cat("Estimated variance components are \n",
"sdSiteYear ", round(sdSiteYear,2),
";\n sdSite ", round(sdSite,2),
";\n sdYear ", round(sdYear,2),
"; \nsdResid ", round(sdResid,2),"\n")
cat("\n")
scenarios <- expand.grid(n_quad=seq(3,9,3),
ns_I = 3, ns_C=3,
ny_B =3,
ny_A =c(2,3,4),
baci_effect=seq(0,0.8,.1),
sdYear = sdYear, sdSite=sdSite, sdSiteYear=sdSiteYear, sdResid=sdResid)
head(scenarios)
power <- plyr::adply(scenarios,1,function(x){
#browser()
power <- baci.power(
n_IA=x$n_quad, n_IB=x$n_quad, n_CA=x$n_quad, n_CB=x$n_quad,
ns_I=x$ns_I, ns_C=x$ns_C,
ny_B=x$ny_B, ny_A=x$ny_A,
mu_IA=x$baci_effect, mu_IB=0, mu_CA=0, mu_CB=0,
sdYear=x$sdYear, sdSite=x$sdSite, sdSiteYear=x$sdSiteYear, sdResid=x$sdResid)
power
})
head(power)
cat("\n")
head(power[,c("alpha","n_IA","n_IB","n_CA","n_CB","ny_B","ny_A","ns_I","ns_C","baci","power")])Estimated variance components are
sdSiteYear 0 ;
sdSite 0.77 ;
sdYear 0.21 ;
sdResid 0.72
n_quad ns_I ns_C ny_B ny_A baci_effect sdYear sdSite sdSiteYear
1 3 3 3 3 2 0 0.2141545 0.7669679 0
2 6 3 3 3 2 0 0.2141545 0.7669679 0
3 9 3 3 3 2 0 0.2141545 0.7669679 0
4 3 3 3 3 3 0 0.2141545 0.7669679 0
5 6 3 3 3 3 0 0.2141545 0.7669679 0
6 9 3 3 3 3 0 0.2141545 0.7669679 0
sdResid
1 0.723465
2 0.723465
3 0.723465
4 0.723465
5 0.723465
6 0.723465
n_quad baci_effect alpha sdSite sdYear sdSiteYear sdResid n_IA n_IB
1 3 0 0.05 0.7669679 0.2141545 0 0.723465 3 3
2 6 0 0.05 0.7669679 0.2141545 0 0.723465 6 6
3 9 0 0.05 0.7669679 0.2141545 0 0.723465 9 9
4 3 0 0.05 0.7669679 0.2141545 0 0.723465 3 3
5 6 0 0.05 0.7669679 0.2141545 0 0.723465 6 6
6 9 0 0.05 0.7669679 0.2141545 0 0.723465 9 9
n_CA n_CB ns_I ns_C ny_B ny_A mu_IA mu_IB mu_CA mu_CB baci baci.se dfdenom
1 3 3 3 3 3 2 0 0 0 0 0 0.3113298 19
2 6 6 3 3 3 2 0 0 0 0 0 0.2201434 19
3 9 9 3 3 3 2 0 0 0 0 0 0.1797463 19
4 3 3 3 3 3 3 0 0 0 0 0 0.2784618 24
5 6 6 3 3 3 3 0 0 0 0 0 0.1969022 24
6 9 9 3 3 3 3 0 0 0 0 0 0.1607700 24
ncp Fcrit power Tcrit os.power1 os.power2
1 0 4.380750 0.05 1.729133 0.05 0.05
2 0 4.380750 0.05 1.729133 0.05 0.05
3 0 4.380750 0.05 1.729133 0.05 0.05
4 0 4.259677 0.05 1.710882 0.05 0.05
5 0 4.259677 0.05 1.710882 0.05 0.05
6 0 4.259677 0.05 1.710882 0.05 0.05
alpha n_IA n_IB n_CA n_CB ny_B ny_A ns_I ns_C baci power
1 0.05 3 3 3 3 3 2 3 3 0 0.05
2 0.05 6 6 6 6 3 2 3 3 0 0.05
3 0.05 9 9 9 9 3 2 3 3 0 0.05
4 0.05 3 3 3 3 3 3 3 3 0 0.05
5 0.05 6 6 6 6 3 3 3 3 0 0.05
6 0.05 9 9 9 9 3 3 3 3 0 0.05
A plot of the power function is:
power.plot <- ggplot(data=power, aes(x=baci_effect, y=power, color=as.factor(n_quad)))+
ggtitle("Estimated power",
subtitle=paste("alpha: ", power$alpha[1]))+
geom_point()+
geom_line()+
ylim(0,1)+
geom_hline(yintercept=0.8, color="blue")+
xlab("BACI effect size (log scale)")+
scale_color_discrete(name="# quadrats")+
facet_wrap(~ny_A, ncol=2, labeller=label_both)
power.plot
You see that about 9 cages need to be employed in each site-year combination to get reasonable power.
If the number of sub-samples in each site-year is large, use the “trick” explained in Section 1.10.
1.10 Power shortcut when number of sub-samples in each site.year is large
In some cases, the sub-sampling variation is very large, requiring a large number of sub-samples in each site-year. This can make computing the power numerically difficult.
Specifically, the total number of observations in the design is \(n_{quad} \times (ns_I + ns_C) \times (ny_B + ny_A)\). For example, if the design has 10 quadrats measured in 1 impact and 3 control sites with 3 years before and 3 years after, the total number of observations is \(10 \times 4 \times 6=240\). This, in turn, implies that the variance-covariance matrix of the observations is a \(240 \times 240\) matrix which must be inverted. This can be numerically unstable and can take a very long time for a single scenario.
The power for these types of designs can be found by considering a BACI analysis on the “averages” (as seen in the previous sections). In this case, the number of sub-samples is set to 1, and the site-year and residual standard deviations are modified as
\[\sigma_{\mathit{Site:Year},avg}= \sqrt{\sigma^2_{\mathit{Site:Year}} + \frac{\sigma^2_{\mathit{Residual}}}{n_{quad}}}\]
\[\sigma_{\mathit{Resid},avg} = 0\]
i.e., the Site:Year standard deviation used for the analysis of the averages absorbs the averaged residual variation.
For example consider the following two computations of power:
cat("Estimated variance components are \n",
"sdSiteYear ", round(sdSiteYear,2),
";\n sdSite ", round(sdSite,2),
";\n sdYear ", round(sdYear,2),
"; \nsdResid ", round(sdResid,2),"\n")
cat("\n")
# We can make an equivalent baci design with 1 observation/site-year (the average)
# and revised sdSiteYear and sdResid
# for example consider a baci design with n_quad=10 in all site.years
cat("\nPower computed using individual observations\n")
power.indiv <- baci.power(n_IA=10, n_IB=10, n_CA=10, n_CB=10,
ns_I=3, ns_C=3,
ny_B=3, ny_A=2,
mu_IA=.4, mu_IB=0, mu_CA=0, mu_CB=0,
sdYear=sdYear, sdSite=sdSite, sdSiteYear=sdSiteYear, sdResid=sdResid)
power.indiv
# this has equivalent power when you analyze the "averages" and create revised sdSiteYear and sdResid
sdSiteYear.avg <- sqrt(sdSiteYear^2 + sdResid^2/10)
sdResid.avg <- 0
cat("\nPower computed using averages\n")
power.avg <- baci.power(n_IA=1, n_IB=1, n_CA=1, n_CB=1,
ns_I=3, ns_C=3,
ny_B=3, ny_A=2,
mu_IA=.4, mu_IB=0, mu_CA=0, mu_CB=0,
sdYear=sdYear, sdSite=sdSite, sdSiteYear=sdSiteYear.avg, sdResid=sdResid.avg)
power.avg
Estimated variance components are
sdSiteYear 0 ;
sdSite 0.77 ;
sdYear 0.21 ;
sdResid 0.72
Power computed using individual observations
alpha sdSite sdYear sdSiteYear sdResid n_IA n_IB n_CA n_CB ns_I ns_C
1 0.05 0.7669679 0.2141545 0 0.723465 10 10 10 10 3 3
ny_B ny_A mu_IA mu_IB mu_CA mu_CB baci baci.se dfdenom ncp Fcrit
1 3 2 0.4 0 0 0 0.4 0.1705223 19 5.502466 4.38075
power Tcrit os.power1 os.power2
1 0.6049787 1.729133 0.7310784 4.589899e-05
Power computed using averages
alpha sdSite sdYear sdSiteYear sdResid n_IA n_IB n_CA n_CB ns_I ns_C
1 0.05 0.7669679 0.2141545 0.2287797 0 1 1 1 1 3 3
ny_B ny_A mu_IA mu_IB mu_CA mu_CB baci baci.se dfdenom ncp Fcrit
1 3 2 0.4 0 0 0 0.4 0.1705223 19 5.502466 4.38075
power Tcrit os.power1 os.power2
1 0.6049787 1.729133 0.7310784 4.589899e-05
This can be embedded in the scenario loop used before to give:
scenarios <- expand.grid(n_quad=seq(3,9,3),
ns_I = 3, ns_C=3,
ny_B =3,
ny_A =c(2,3,4),
baci_effect=seq(0,0.8,.1),
sdYear = sdYear, sdSite=sdSite, sdSiteYear=sdSiteYear, sdResid=sdResid)
cat("Part of the scenarios dataset\n")
head(scenarios)
power.avg <- plyr::adply(scenarios,1,function(x){
#browser()
# do the short cut
x$sdSiteYear.avg <- sqrt(x$sdSiteYear^2 + x$sdResid^2/x$n_quad) # Note change here
x$sdResid.avg <- 0
power <- baci.power(
n_IA=1, n_IB=1, n_CA=1, n_CB=1, # notice change here
ns_I=x$ns_I, ns_C=x$ns_C,
ny_B=x$ny_B, ny_A=x$ny_A,
mu_IA=x$baci_effect, mu_IB=0, mu_CA=0, mu_CB=0,
sdYear=x$sdYear, sdSite=x$sdSite,
sdSiteYear=x$sdSiteYear.avg, sdResid=x$sdResid.avg) # note change here
power
})
cat("\nPart of the power dataset computed on averages\n")
head(power.avg)
Part of the scenarios dataset
n_quad ns_I ns_C ny_B ny_A baci_effect sdYear sdSite sdSiteYear
1 3 3 3 3 2 0 0.2141545 0.7669679 0
2 6 3 3 3 2 0 0.2141545 0.7669679 0
3 9 3 3 3 2 0 0.2141545 0.7669679 0
4 3 3 3 3 3 0 0.2141545 0.7669679 0
5 6 3 3 3 3 0 0.2141545 0.7669679 0
6 9 3 3 3 3 0 0.2141545 0.7669679 0
sdResid
1 0.723465
2 0.723465
3 0.723465
4 0.723465
5 0.723465
6 0.723465
Part of the power dataset computed on averages
n_quad baci_effect alpha sdSite sdYear sdSiteYear sdResid n_IA n_IB
1 3 0 0.05 0.7669679 0.2141545 0.4176927 0 1 1
2 6 0 0.05 0.7669679 0.2141545 0.2953534 0 1 1
3 9 0 0.05 0.7669679 0.2141545 0.2411550 0 1 1
4 3 0 0.05 0.7669679 0.2141545 0.4176927 0 1 1
5 6 0 0.05 0.7669679 0.2141545 0.2953534 0 1 1
6 9 0 0.05 0.7669679 0.2141545 0.2411550 0 1 1
n_CA n_CB ns_I ns_C ny_B ny_A mu_IA mu_IB mu_CA mu_CB baci baci.se dfdenom
1 1 1 3 3 3 2 0 0 0 0 0 0.3113298 19
2 1 1 3 3 3 2 0 0 0 0 0 0.2201434 19
3 1 1 3 3 3 2 0 0 0 0 0 0.1797463 19
4 1 1 3 3 3 3 0 0 0 0 0 0.2784618 24
5 1 1 3 3 3 3 0 0 0 0 0 0.1969022 24
6 1 1 3 3 3 3 0 0 0 0 0 0.1607700 24
ncp Fcrit power Tcrit os.power1 os.power2
1 0 4.380750 0.05 1.729133 0.05 0.05
2 0 4.380750 0.05 1.729133 0.05 0.05
3 0 4.380750 0.05 1.729133 0.05 0.05
4 0 4.259677 0.05 1.710882 0.05 0.05
5 0 4.259677 0.05 1.710882 0.05 0.05
6 0 4.259677 0.05 1.710882 0.05 0.05
This gives the same power plot as in the previous section:
power.plot <- ggplot(data=power.avg, aes(x=baci_effect, y=power, color=as.factor(n_quad)))+
ggtitle("Estimated power computed using short cut",
subtitle=paste("alpha: ", power$alpha[1]))+
geom_point()+
geom_line()+
ylim(0,1)+
geom_hline(yintercept=0.8, color="blue")+
xlab("BACI effect size (log scale)")+
scale_color_discrete(name="# quadrats")+
facet_wrap(~ny_A, ncol=2, labeller=label_both)
power.plot
1.11 Obtaining estimates of variance components from preliminary data
In the examples used previously, the variance components were obtained after the project was initiated and monitoring was ongoing. This might be useful when a sister project (already completed) is used as an example to obtain estimates of the variance components.
However, in most cases, the power/sample size determinations are performed before the project is initiated. In this case, you would have some preliminary data (e.g., from the “before” period) and wish to use this to plan your BACI study.
In most cases, the site.year interaction variance component (e.g., sdSiteYear) is the “limiting” factor that determines power. The sub-sampling variance (i.e., within a site.year) is easily “overcome” by doing more sampling within a year. However, the site.year interaction variance component determines how many years before/after project initiation and how many Impact/Control sites are needed. It turns out that you need at least two years of data over at least two sites to estimate the Site.Year interaction variance component.
The Site and Year variance components “cancel” when estimating the BACI contrast because Site and Year serve as blocking factors, i.e., all years (before/after) are measured in each site, and all sites are measured in all years.
The model for this preliminary data is
\[\log{Y} \sim \mathit{Site(R)} + \mathit{Year(R)} + \mathit{Site:Year(R)} + \mathit{other-covariates}\]
Notice that no SiteClass term is needed because, at this point, the distinction between Impact and Control sites is moot. Similarly, no Period term is needed because the project has not yet started when this preliminary data is collected.
This model will give you estimates of the variance components that can be fed into the power/sample size projections.
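For example, here is a minimal sketch of fitting this model with lmerTest::lmer(), assuming a hypothetical preliminary data frame prelim with a response Y and with Site and Year coded as factors (any other covariates would be added as fixed effects):
# fit the preliminary variance-components model on the log scale
prelim.fit <- lmerTest::lmer(log(Y) ~ (1|Site) + (1|Year) + (1|Site:Year), data=prelim)
vc <- as.data.frame(lme4::VarCorr(prelim.fit))
sdSiteYear <- vc[ vc$grp=="Site:Year", "sdcor"]
sdSite     <- vc[ vc$grp=="Site",      "sdcor"]
sdYear     <- vc[ vc$grp=="Year",      "sdcor"]
sdResid    <- vc[ vc$grp=="Residual",  "sdcor"]
These values can then be passed to baci.power() as in the earlier sections.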
With limited data, it is possible that the estimate of the Site.Year variance component is zero. While you could use this in the power/sample size estimates, it is highly unlikely that the site.year interaction variance is zero. I suggest that you use some small value (e.g., 0.1) for the Site.Year variance component in this case.
1.12 Multi-Period BACI designs
1.12.1 Introduction
In some cases, designs with more than two periods are of interest. For example, the study might divide the project period into Before, During and After construction.
One could approach this study by choosing two periods at a time and doing a standard BACI analysis. However, this would “ignore” useful data from the other period, and each pairing would have its own estimates of the variance components that may not be consistent across the periods of interest.
Instead, we can fit an omnibus analysis that uses all of the data and a single set of variance components, and then extract the pairs of periods of interest for comparison.
1.12.2 Simulated data
Simulated data will be used with 2 Impact and 2 Control sites and 3, 3, and 4 years in the Before, During, and After periods (labelled as Before, P2, and P3) so that the periods sort properly.
# generate some test data
set.seed(324234)
mydata <- expand.grid(Site=1:4, Year=1:10)
mydata$SiteClass <- car::recode(mydata$Site,
" 1:2='Impact'; 3:4='Control' ")
mydata$Period <- car::recode(mydata$Year,
" 1:3='Before'; 4:6='P2'; 7:10='P3'")
site.effect <- data.frame(Site=unique(mydata$Site), site.effect=rnorm(length(unique(mydata$Site)), sd=1))
year.effect <- data.frame(Year=unique(mydata$Year), year.effect=rnorm(length(unique(mydata$Year)), sd=2))
mydata <- merge(mydata, site.effect)
mydata <- merge(mydata, year.effect)
mydata$Y <- 10 + mydata$site.effect + mydata$year.effect + rnorm(nrow(mydata))
xtabs(~Period+Year, data=mydata)
Year
Period 1 2 3 4 5 6 7 8 9 10
Before 4 4 4 0 0 0 0 0 0 0
P2 0 0 0 4 4 4 0 0 0 0
P3 0 0 0 0 0 0 4 4 4 4
xtabs(~SiteClass+Site, data=mydata)
Site
SiteClass 1 2 3 4
Control 0 0 10 10
Impact 10 10 0 0
1.12.3 Analysis
The same BACI model is fit to the yearly values.
\[ \log{Y}= \mathit{SiteClass} + \mathit{Period} + \mathit{SiteClass:Period} + \mathit{Site(R)} + \mathit{Year(R)}\] If subsampling was involved (e.g., multiple observations in each site.year ) then the Site:Year(R) interaction would also be needed.
This is fit using the lmer() function from the lmerTest package.
# fit a model with more than 1 period effect
fit <- lmerTest::lmer( log(Y)~ SiteClass+Period+SiteClass:Period + (1|Site) +(1|Year), data=mydata)
The effect tests are:
anova(fit, ddf='Kenward-Roger')
Type III Analysis of Variance Table with Kenward-Roger's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
SiteClass 0.024964 0.024964 1 2.0094 1.3514 0.3645
Period 0.029813 0.014907 2 7.0000 0.8069 0.4838
SiteClass:Period 0.073184 0.036592 2 25.0000 1.9808 0.1590
The hypothesis test for the SiteClass:Period effect (the BACI effect) is a test for parallelism across all 3 periods. If this has a small P-value, there may be non-parallelism somewhere among the pairs of periods, but it is unknown where.
The expected marginal means are:
# expected marginal means
fit.inter.emmo <- emmeans::emmeans(fit, ~SiteClass:Period)
# Look at the ordering of terms in the interaction contrast in the summary()
summary(fit.inter.emmo, infer=c(TRUE,FALSE))
SiteClass Period emmean SE df lower.CL upper.CL
Control Before 2.28 0.163 9.25 1.91 2.64
Impact Before 2.02 0.163 9.25 1.65 2.39
Control P2 2.45 0.163 9.25 2.08 2.82
Impact P2 2.33 0.163 9.25 1.96 2.70
Control P3 2.33 0.147 8.49 1.99 2.66
Impact P3 2.28 0.147 8.49 1.94 2.61
Degrees-of-freedom method: kenward-roger
Results are given on the log (not the response) scale.
Confidence level used: 0.95
The comparisons among pairs of periods can be isolated using a contrast statement. Presumably, we are interested in comparing P2 vs Before and P3 vs Before, but the P2 vs P3 contrast could also be added.
Notice that you need to know the ordering of the SiteClass:Period terms as shown in the previous table:
# Set up the contrast. Refer to ordering in previous table
baci.list <- list(
p2.vs.before=c(1, -1, -1, 1, 0 ,0),
p3.vs.before=c(1, -1, 0, 0,-1 ,1))
We check that the joint pair of contrasts matches the interaction effect test above:
# check that df match up,i.e. this is a partition of the interaction contrasts
test(contrast(fit.inter.emmo, baci.list), joint=TRUE, verbose=TRUE)
Joint test of the following linear predictions
contrast equals
1 p2.vs.before 0
2 p3.vs.before 0
df1 df2 F.ratio p.value
2 25 1.981 0.1590
Notice that the P-value of the joint tests matches that from the test of no interaction (as it must).
We can now get the individual contrasts and estimates of the BACI effect for each paired comparison.
# now get the individual contrasts
summary(contrast(fit.inter.emmo, baci.list), joint=FALSE, infer=TRUE)
contrast estimate SE df lower.CL upper.CL t.ratio p.value
p2.vs.before 0.131 0.111 25 -0.09733 0.36 1.182 0.2482
p3.vs.before 0.206 0.104 25 -0.00772 0.42 1.985 0.0582
Degrees-of-freedom method: kenward-roger
Results are given on the log (not the response) scale.
Confidence level used: 0.95
1.12.4 Variance components
The estimated variance components are found in the usual way
VarCorr(fit)
Groups Name Std.Dev.
Year (Intercept) 0.22742
Site (Intercept) 0.11306
Residual 0.13592
A power analysis is more difficult because there are two (or more) BACI contrasts of interest.
2 Bayesian analysis of BACI designs.
The previous sections used the traditional frequentist approach in the analysis of the BACI design.
Consider the fry example analyzed earlier. We will illustrate the traditional hypothesis testing method of analysis and then repeat the analysis using a Bayesian approach.
All model fitting should also include model assessment (e.g., residual plots and other diagnostic plots in the case of likelihood-based fitting methods). Similar assessment tools exist for Bayesian models, but are not discussed here.
2.1 Study design
This example is based (loosely) on a consulting project from an Independent Power Producer who was interested in monitoring the effects of an in-stream hydroelectric project.
The response variable for the project was the minnow density at different locations in the stream.
2.1.1 Sampling protocol
The monitoring design has the river divided into six segments of which three are upstream of the diversion and three are downstream of the diversion. In each segment, several sites have been located where minnow fry congregate. In each of the sites, minnow traps are set for various lengths of time.
At the end of the soaking period, the traps are removed and the number of minnows are counted and classified by species. The counts are standardized to a common period of time to adjust for the different soak-time. [This could be done directly in the analysis by using the soak-time as a covariate, but the soak-time data was not available.]
An initial pre-project monitoring study was run in 2000 to 2002. The project became operational in late 2002, and post-project monitoring continued in 2003 and 2004.
There are 5 years of data, with 3 years measured before the project and with 2 years after the project was implemented.
Year
Period 2000 2001 2002 2003 2004
After 0 0 0 10 9
Before 12 11 10 0 0
There are 6 Sites measured, with 3 sites measured upstream of the project (Controls) with 3 sites measured downstream of the project (Impact).
Site
SiteClass A B C D E F
Downstream 0 0 0 3 11 10
Upstream 11 11 6 0 0 0
At each site.year combination, a number of fry traps were set and the number of fry collected in a standardized amount of time was recorded:
Year
Site 2000 2001 2002 2003 2004
A 2 2 3 2 2
B 2 3 2 2 2
C 2 1 1 1 1
D 1 1 0 1 0
E 3 2 2 2 2
F 2 2 2 2 2
Notice that site D was not measured in 2002 and 2004 and that the number of traps is not equal at each site.year.
2.1.2 Preliminary plot
A plot of the mean \(\log{fry}\) over time is shown in Figure 1:
2.2 Analyses
We will fit the BACI model using the standard linear mixed model and then repeat the analysis using a Bayesian approach to contrast the two results.
2.2.1 Standard (frequentist) linear mixed model
An extended-BACI model was fit:
\[log(fry) \sim SiteClass + Period + SiteClass:Period + Site(R) + Year(R) + Year:Site(R)\] where
- \(SiteClass\) is the effect of impact (downstream) vs control (upstream)
- \(Period\) is the effect of the period (before vs after)
- \(SiteClass:Period\) is the BACI effect, i.e. the differential change between before and after for impact vs control.
- \(Site\) is the (random) effect of site
- \(Year\) is the (random) effect of year
- \(Year:Site\) is the (random) site.year interaction
The residual error is implicit.
We use the lmerTest package (Kuznetsova et al. 2017) in R (R Core Team, 2025) to fit the BACI model using REML.
# Notice that we use the YearF variable in the random effects
baci.fit <- lmerTest::lmer(logfry ~ Period + SiteClass + Period:SiteClass +
(1|YearF)+ (1|Site)+(1|YearF:Site), data=fry)
The ANOVA table is shown in Table 1.
| term | sumsq | meansq | NumDF | DenDF | statistic | p.value |
|---|---|---|---|---|---|---|
| Period | 0.101 | 0.101 | 1 | 3.1 | 0.19 | 0.690 |
| SiteClass | 0.910 | 0.910 | 1 | 4.0 | 1.74 | 0.257 |
| Period:SiteClass | 0.007 | 0.007 | 1 | 16.1 | 0.01 | 0.912 |
There is no evidence of an impact (Period:SiteClass, BACI, interaction p.value is very large at 0.912).
The estimated marginal means for each SiteClass.Period combination are shown in Table 2.
| SiteClass | Period | emmean | SE | df | lower.CL | upper.CL |
|---|---|---|---|---|---|---|
| Downstream | After | 5.48 | 0.54 | 6.2 | 4.17 | 6.78 |
| Upstream | After | 4.62 | 0.52 | 5.7 | 3.32 | 5.91 |
| Downstream | Before | 5.63 | 0.51 | 5.1 | 4.33 | 6.92 |
| Upstream | Before | 4.72 | 0.49 | 4.7 | 3.42 | 6.02 |
The estimated BACI effect (the differential change between before vs after for upstream vs downstream) is computed as \((5.48 - 5.63) - (4.62 - 4.72) = -0.15 - (-0.10) \approx -0.05\) and is shown in Table 3.
| contrast | estimate | SE | df | lower.CL | upper.CL | t.ratio | p.value |
|---|---|---|---|---|---|---|---|
| baci | -0.048 | 0.423 | 16.1 | -0.945 | 0.850 | -0.112 | 0.912 |
Notice that the p-value of the BACI contrast matches the p-value from the SiteClass:Period interaction (as it must).
The estimated differential change in the mean is around -0.048 with a 95% confidence interval ranging from -0.945 to 0.85 which is very wide.
Because the 95% confidence interval includes 0, there is no evidence of a BACI effect. However, the 95% confidence interval also includes the value of 0.50 (a 50% change), so a project effect of this magnitude cannot be ruled out.
The estimated variance components (used in the power analysis) are shown in Table 4
| grp | sdcor |
|---|---|
| YearF:Site | 0.00 |
| Site | 0.77 |
| YearF | 0.21 |
| Residual | 0.72 |
The Year.Site interaction is very small (estimated value of 0), which indicates that the trends across the sites are close to parallel.
2.2.2 Bayesian analysis
A primer on the use of Bayesian models for (linear) models is available in Kruschke and Liddell (2018).
A Bayesian analysis melds two components in the analysis
- the prior distribution on the parameters of the model
- the likelihood function describing the data
into a posterior belief about the parameters.
The actual computer code to fit the Bayesian model is:
baci.bayesian <- brms::brm(logfry ~ Period + SiteClass + Period:SiteClass +
(1|YearF)+ (1|Site)+(1|YearF:Site),
data=fry
,sample_prior=TRUE
,seed=23432432)
The call compiles the underlying Stan model and then samples four MCMC chains, each with 2,000 iterations (1,000 warmup and 1,000 sampling), for a total of 4,000 retained posterior draws.
2.2.2.1 The likelihood function
The likelihood function is the basic BACI model:
\[log(fry) \sim SiteClass + Period + SiteClass:Period + Site(R) + Year(R) + Year:Site(R)\] We assume a normal distribution for the random effects and residual noise, i.e.,
- the Site random effects follow a Normal distribution with mean 0 and standard deviation \(\sigma_{site}\)
- the Year random effects follow a Normal distribution with mean 0 and standard deviation \(\sigma_{year}\)
- the Site.Year random effects follow a Normal distribution with mean 0 and standard deviation \(\sigma_{site.year}\)
- the Residual random effects follow a Normal distribution with mean 0 and standard deviation \(\sigma_{residual}\)
2.2.2.2 Prior distributions
In a Bayesian framework, all parameters of the statistical model must have prior distributions, and in the case of the BACI linear mixed model, the prior distributions fall into one of two categories:
- distributions for the standard deviation of random effect of Year, Site, Year.Site, and Residual; and
- distributions for the fixed-effect parameters, i.e., the SiteClass and Period effects and the SiteClass:Period interaction.
The choice of priors is somewhat arbitrary and a full Bayesian analysis must consider the sensitivity of the final results to the choice of prior distributions.
One choice of prior distribution for the standard deviations of the random effects is a gamma distribution (a skewed distribution that takes only positive values), as illustrated in Figure 2.
# extract draws from the prior distributions (requires the model to be fit with sample_prior)
vc.priorsamps.wide <- brms::prior_draws(baci.bayesian)
# convert to long format for plotting: one row per (variance component, draw)
vc.priorsamps.long <- tidyr::pivot_longer(vc.priorsamps.wide,
                         cols=names(vc.priorsamps.wide)[-1],
                         names_to="vc",
                         values_to="prior")
ggplot(data=vc.priorsamps.long, aes(x=prior, y=after_stat(density)))+
   geom_histogram(alpha=0.5)+
   geom_density()+
   facet_wrap(~vc, ncol=2, scales="free")+
   xlim(0,20)
We see that the prior distributions have higher densities towards 0 but allow quite large values (the right tails). In particular, the prior distribution for the standard deviation of the Year.Site interaction will pull the final estimate away from 0.
A common choice for the prior distribution of the fixed effects is a "non-informative" (flat) prior, as shown in Figure 3.
# Generate draws from a vague normal prior (mean 0, sd 50) for each fixed-effect parameter
vc.priorsamp.fixed <- plyr::ldply(c("Intercept","SiteClass","Period","SiteClass:Period"), function(x){
   data.frame(fixed=x, prior=rnorm(50000, mean=0, sd=50))
})
ggplot(data=vc.priorsamp.fixed, aes(x=prior, y=after_stat(density)))+
   geom_histogram(alpha=0.5)+
   geom_density()+
   facet_wrap(~fixed, ncol=2, scales="free")+
   xlim(-20,20)
These flat priors indicate that virtually any value between -20 and 20 is plausible for each fixed effect.
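The priors that brms actually assigned in this fit can be listed directly; a minimal check (this simply tabulates the prior for each parameter class and coefficient):
# list the prior assigned to every parameter class/coefficient in the fit
brms::prior_summary(baci.bayesian)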
Here is where the Bayesian paradigm provides an advantage over pure likelihood approaches. For example, you may have a strong prior belief that the project has no effect. Consequently, you may decide that the prior distribution for the BACI contrast should have more weight around 0 (e.g., peaked around 0). The danger is that if your prior is incorrect, it will take much more data to “overwhelm” this (strong) incorrect prior and provide a realistic assessment of the project effects.
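For illustration, such an informative prior could be supplied when fitting the model; a minimal sketch, assuming the same formula and data as the fit above, with a purely illustrative prior standard deviation of 0.25:
# an informative prior that concentrates belief for the BACI contrast near 0
# (the value 0.25 is illustrative only, not a recommendation)
informative.prior <- brms::set_prior("normal(0, 0.25)",
                        class="b", coef="PeriodBefore:SiteClassUpstream")
# refit with this prior; other arguments as in the original fit
# baci.bayesian.inf <- brms::brm(logfry ~ Period + SiteClass + Period:SiteClass +
#                          (1 | YearF) + (1 | Site) + (1 | YearF:Site),
#                          data=fry, prior=informative.prior)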
2.2.2.3 Posterior belief - combining the prior and likelihood distribution
The prior distribution is combined with the likelihood of the BACI model to produce posterior belief distributions. The posterior distribution is a summary of how the prior distribution is updated with information from the actual data.
Ideally, the data will overwhelm the prior and the choice of prior will be immaterial. In the case of sparse data, the prior distribution can have a large impact on the posterior-belief distribution. In a full Bayesian analysis the sensitivity of the posterior-belief distribution to the choice of priors needs to be investigated.
In all but trivial cases, the combination of prior and data is performed using Markov Chain Monte Carlo (MCMC) sampling. This can be implemented for linear mixed models using the brms package (Bürkner, 2017) in R. There are many other packages that can be used as well.
The result is a sample from the posterior-belief distribution for all relevant parts of the model. Summary statistics on the various posterior-belief distributions provide information on “effect sizes”, and the uncertainty of the “effect estimates”. In most cases, a sample of 2000-5000 values from the posterior-belief distribution is adequate.
For example, Figure 4 shows the posterior-belief distribution for the BACI contrast, the prime term of interest.
post.baci <- brms::as_draws_df(baci.bayesian, variable="b_PeriodBefore:SiteClassUpstream")
post.baci <- plyr::rename(post.baci, c("b_PeriodBefore:SiteClassUpstream"="BACI"))
cat("Samples from the posterior\n")
Samples from the posterior
head(post.baci)
# A draws_df: 6 iterations, 1 chains, and 1 variables
BACI
1 -0.358
2 -0.044
3 0.235
4 -0.136
5 0.018
6 -0.089
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}
ggplot(data=post.baci, aes(x=BACI, y=after_stat(density)))+
   ggtitle(#"Posterior distribution of BACI contrast"
           " ", subtitle="Red lines indicate 95% credible interval")+
   geom_histogram(alpha=0.5)+
   geom_density()+
   geom_vline(xintercept=0)+
   geom_vline(xintercept=quantile(post.baci$BACI, prob=c(0.025,0.975)), color="red")
The posterior-belief distribution appears to be slightly peaked just below 0 (the value of 0 indicates no project effect). The red lines indicate the 95% credible interval for the BACI effect, i.e., they bracket a region that contains 95% of our belief.
The wide distribution indicates that the posterior belief is quite uncertain. We can summarize the posterior-belief distribution using summary statistics (mean, SD, and 95% credible interval):
c(mean=mean(post.baci$BACI),
  sd  =sd(post.baci$BACI),
  lci =quantile(post.baci$BACI, prob=0.025),
  uci =quantile(post.baci$BACI, prob=0.975))
       mean          sd    lci.2.5%   uci.97.5%
-0.04670305 0.47413064 -0.96217880 0.89319357
We see that the
- mean of the posterior-belief distribution is close to the estimate of the BACI contrast from the standard linear mixed-model fit,
- the SD of the posterior is close to the SE of the BACI contrast from that fit, and
- the lower and upper credible limits are comparable to the 95% confidence limits.
This is no accident: it is a consequence of using non-informative (flat) priors together with a fairly large dataset, so the data "overwhelm" the prior.
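The same fixed-effect summaries (including the BACI contrast) can also be pulled directly from the fit; a minimal check:
# posterior mean, SD, and 95% credible limits for all fixed effects
brms::fixef(baci.bayesian)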
So far, the Bayesian analysis has not presented anything new. However, we can now make statements about our posterior belief in an effect by looking at the previous histogram and computing the proportion of posterior samples falling in various regions.
For example, we can say that there is a posterior belief of 0.29 that the absolute BACI effect is 0.5 (a 50% differential in the means) or greater (Figure 5).
post.baci$BACI.g5 <- abs(post.baci$BACI) >= .5
belief <- mean(abs(post.baci$BACI) >= .5)
ggplot(data=post.baci, aes(x=BACI))+
   ggtitle("Posterior belief that abs(baci contrast) > .5 (corresponding to a large effect)",
           subtitle="Blue area is posterior belief that abs(baci contrast) > .5")+
   geom_histogram(alpha=0.5, aes(fill=as.factor(BACI.g5)), breaks=seq(-2,2,.1))+
   #geom_density()+
   geom_vline(xintercept=0)+
   geom_vline(xintercept=c(-.5,.5), color="red")+
   scale_fill_discrete(name="abs(baci) > .5")+
   annotate("text", label=paste0("Posterior belief abs(baci)>.5: ", round(belief,2)),
            x=-Inf, y=Inf, hjust=0, vjust=1.5)
Similarly, there is a 0.18 posterior belief that the absolute BACI effect is 0.1 or less (Figure 6).
post.baci$BACI.l1 <- abs(post.baci$BACI) <= .1
belief <- mean(abs(post.baci$BACI) <= .1)
ggplot(data=post.baci, aes(x=BACI))+
   ggtitle("Posterior belief that abs(baci contrast) < .1 (corresponding to a small effect)",
           subtitle="Blue area is posterior belief that abs(baci contrast) < .1")+
   geom_histogram(alpha=0.5, aes(fill=as.factor(BACI.l1)), breaks=seq(-2,2,.1))+
   #geom_density()+
   geom_vline(xintercept=0)+
   geom_vline(xintercept=c(-.1,.1), color="red")+
   scale_fill_discrete(name="abs(baci) < .1")+
   annotate("text", label=paste0("Posterior belief abs(baci)<.1: ", round(belief,2)),
            x=-Inf, y=Inf, hjust=0, vjust=1.5)
So unlike the classical BACI analysis, you can quantify the belief in a large or a small impact in a natural way. This is the prime advantage of a Bayesian analysis.
Confidence intervals cannot be used in this way. As indicated by Kruschke and Liddell (2018):
Highest density interval vs. confidence interval Here we reiterate some essential differences between a Bayesian credible interval (HDI) and a frequentist confidence interval (CI). An important quality of the posterior 95% HDI is that it really does indicate the 95% most probable values of the parameter, given the data. The posterior distribution depends only on the actually observed data (and the prior), and does not depend on the stopping or testing intentions of the analyst. The frequentist CI is often misinterpreted as if it were a posterior distribution, because what analysts intuitively want from their analysis is the Bayesian posterior distribution, as we discuss more later.
The Bayesian credible interval does not depend on how many hypothesis tests are being done (i.e., there are no corrections for multiple testing such as Tukey’s method).
However, it makes no sense to ask "What is the posterior belief of no effect?" because this asks for the probability that the BACI effect equals 0 exactly, which is 0 for a continuous posterior. Similarly, it makes no sense to ask "What is the posterior belief of an effect?" because this asks for the probability that the BACI effect \(\ne\) 0, which is 1. Only posterior beliefs of intervals are sensible.
We can get a complete summary of a Bayesian fit for all parameters of the model (not shown) in similar ways.
# summarize the bayesian fit
temp <- summary(baci.bayesian)
temp
 Family: gaussian
Links: mu = identity
Formula: logfry ~ Period + SiteClass + Period:SiteClass + (1 | YearF) + (1 | Site) + (1 | YearF:Site)
Data: fry (Number of observations: 52)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Site (Number of levels: 6)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.07 0.52 0.45 2.42 1.00 1598 1774
~YearF (Number of levels: 5)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.41 0.38 0.02 1.45 1.00 1200 1682
~YearF:Site (Number of levels: 28)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.15 0.12 0.01 0.43 1.00 2003 1554
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat
Intercept 5.44 0.80 3.76 7.01 1.00
PeriodBefore 0.17 0.58 -0.93 1.41 1.00
SiteClassUpstream -0.85 1.03 -2.82 1.39 1.00
PeriodBefore:SiteClassUpstream -0.05 0.47 -0.96 0.89 1.00
Bulk_ESS Tail_ESS
Intercept 1497 1395
PeriodBefore 2094 1136
SiteClassUpstream 1506 1736
PeriodBefore:SiteClassUpstream 3380 2239
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.75 0.09 0.60 0.94 1.00 3617 2980
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
2.2.2.4 Variance components
We can also obtain the posterior-belief distributions of the variance components and summarize them in Table 5.
vc.bayesian.1 <- plyr::ldply(temp$random, function(x){
   x
})
vc.bayesian.2 <- temp$spec_pars
vc.bayesian.2$.id <- "Residual"
vc.bayesian <- plyr::rbind.fill(vc.bayesian.1, vc.bayesian.2)
#vc.bayesian
| .id | Estimate | Est.Error | l-95% CI | u-95% CI | Rhat | Bulk_ESS | Tail_ESS |
|---|---|---|---|---|---|---|---|
| Site | 1.069 | 0.519 | 0.446 | 2.417 | 1.00 | 1,598.03 | 1,774.17 |
| YearF | 0.413 | 0.384 | 0.017 | 1.450 | 1.00 | 1,200.39 | 1,682.21 |
| YearF:Site | 0.147 | 0.117 | 0.006 | 0.426 | 1.00 | 2,002.67 | 1,553.78 |
| Residual | 0.750 | 0.086 | 0.602 | 0.937 | 1.00 | 3,617.34 | 2,979.77 |
The means of the posteriors are somewhat different from the classical BACI fit. This is due, in part, to the influence of the prior distributions for the standard deviations seen earlier. In particular, notice how the estimated variance component for the Year.Site interaction has been pulled away from 0.
Fortunately, the Site and Year variance components “cancel” when estimating the BACI contrast and so have no impact on the result of interest.
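To see why (a sketch for the balanced case), write the estimated BACI contrast as a double difference of cell means:
\[\widehat{BACI} = (\bar{Y}_{I,After}-\bar{Y}_{C,After}) - (\bar{Y}_{I,Before}-\bar{Y}_{C,Before})\]
A Year effect is shared by the Impact and Control means within the same year, so it cancels in each within-year difference; a Site effect is shared by the Before and After means for the same site, so it cancels when the contrast is regrouped as \((\bar{Y}_{I,After}-\bar{Y}_{I,Before}) - (\bar{Y}_{C,After}-\bar{Y}_{C,Before})\). Only the Site.Year and residual components contribute to the uncertainty of the contrast.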
A fuller Bayesian analysis would investigate the sensitivity of the results to different choices of prior distributions.
The uncertainty of the variance components (the Est.Error in the above table) is NOT available directly from a classical BACI fit (but could be obtained with some work, as sketched below).
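For example, if the classical fit were obtained with lmer()/lmerTest, profile-likelihood intervals for the random-effect standard deviations could be computed; a minimal sketch, where baci.lmer is a hypothetical name for that fit object:
# profile-likelihood 95% CIs for the random-effect SDs and the residual SD
# (baci.lmer is a hypothetical name for the classical lmer()/lmerTest fit)
confint(baci.lmer, parm="theta_", method="profile", oldNames=FALSE)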
2.2.3 Summary
As seen above, the Bayesian analysis gives results similar to those from the standard linear mixed model. This is not surprising because, with sufficient data, the prior distributions' contribution to the posterior belief is negligible.
The key advantage of the Bayesian analysis is the ability to quantify belief about the size of the BACI effect in a natural way.
2.3 More on power and sample size
2.3.1 Based on Power in Standard linear mixed model
When using the standard linear mixed model, sample size is determined through a power analysis. In the power analysis, you must specify the effect size of interest (this is hard). Then you consider which combinations of design parameters (e.g., number of sites, number of years, number of fry traps) lead to a high power (typically 80%) of detecting this effect. Often graphs are displayed from which one can consider the different design parameters and their influence on power (Figure 7).
The Year.Site variance component was estimated as 0 in the linear mixed model analysis; this is unrealistic, so a value of .1 was used.
Here, for example, an 80% power to detect a 0.50 differential change is obtained with 9 years of monitoring after project implementation (bottom facet) using 3 traps per site.year with 6 sites upstream and 6 sites downstream, or 6 traps per site.year with 3 sites upstream and 3 sites downstream.
One consideration in choosing among these options is their relative cost, e.g., adding more traps per site.year is relatively inexpensive.
Many of the parameters that affect the sample-size computation are somewhat arbitrary. For example, what is the critical effect size; why was 0.5 used rather than 0.6 or 0.4? Why choose \(\alpha=0.05\) and a power of 80%? Why not use \(\alpha=0.10\) and a power of 90%?
2.3.2 Based on size of confidence interval for BACI effect in Standard linear mixed model
The choice of effect size in the power analysis and the target power are somewhat arbitrary. An alternative approach is to find sample sizes that give a desired degree of precision in the BACI estimate. For example, our goal might be that the 95% confidence interval has a width less than some specified value.
This goal implies that regardless of the actual effect size, the width of the confidence interval is usually narrow, so that we have attained a suitably high precision in the estimate.
Because the 95% confidence interval is directly related to the SE of the estimate, we can specify a target value in terms of the SE. For example, if the 95% confidence interval on the BACI contrast is \(\pm .4\), then the SE of the BACI contrast is about 0.2. Because the BACI analysis is done on the log(response), this corresponds to a “relative” standard error of .2 on the anti-log scale, similar to the target for the individual year.site estimates used in the BACI analysis.
The SE of the BACI contrast can be shown to be: \[[SE(BACI)]^2= V(BACI)= \frac{\sigma^2_{\textit{site-year}}}{ny_B ns_I} + \frac{\sigma^2_{\textit{site-year}}}{ny_B ns_C} + \frac{\sigma^2_{\textit{site-year}}}{ny_A ns_I} + \frac{\sigma^2_{\textit{site-year}}}{ny_A ns_C} + \frac{\sigma^2}{ny_B ns_I n_{IB}} + \frac{\sigma^2}{ny_B ns_C n_{CB}} + \frac{\sigma^2}{ny_A ns_I n_{IA}} + \frac{\sigma^2}{ny_A ns_C n_{CA}}\]
where
- \(ny_B\) is the number of years monitored before impact;
- \(ny_A\) is the number of years monitored after impact;
- \(ns_I\) is the number of sites measured in the Treatment SiteClass;
- \(ns_C\) is the number of sites measured in the Control SiteClass;
- \(n_{IB}\), \(n_{IA}\), \(n_{CB}\), and \(n_{CA}\) are the numbers of subsamples taken at each site in each year in the Impact-Before, Impact-After, Control-Before, and Control-After combinations, respectively.
Notice that the Site and Year variances do not influence the SE of the BACI contrast because every year serves as a block and every site serves as a block.
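The formula is easy to evaluate directly; a small sketch for a balanced design (equal numbers of traps in every cell), using the illustrative variance components noted above (\(\sigma_{\textit{site-year}}=0.1\)) and a residual standard deviation of about 0.75 (roughly the value reported in the Bayesian summary):
# SE of the BACI contrast for a balanced design, using the formula above
# (the variance-component values are illustrative assumptions, not outputs of a fit)
se.baci <- function(ny.b, ny.a, ns.i, ns.c, n.traps,
                    sd.site.year=0.1, sd.resid=0.75){
   v <- sd.site.year^2 * (1/(ny.b*ns.i) + 1/(ny.b*ns.c) + 1/(ny.a*ns.i) + 1/(ny.a*ns.c)) +
        sd.resid^2     * (1/(ny.b*ns.i*n.traps) + 1/(ny.b*ns.c*n.traps) +
                          1/(ny.a*ns.i*n.traps) + 1/(ny.a*ns.c*n.traps))
   sqrt(v)
}
# e.g. 3 years before, 9 years after, 6 sites per SiteClass, 3 traps per site.year
se.baci(ny.b=3, ny.a=9, ns.i=6, ns.c=6, n.traps=3)   # approximately 0.17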
We can then evaluate the SE(BACI) under various design parameters (Figure 8).
Notice that the SE of the BACI contrast does NOT depend on the effect size, the \(\alpha\) level, etc.
Again referring to the bottom facet of the plot, a design with 9 years of monitoring after project implementation using 3 traps per site.year with 6 sites upstream and 6 sites downstream would have a SE of about 0.17, as would a design with 6 traps per site.year with 3 sites upstream and 3 sites downstream. Many designs would have a SE below 0.2 (corresponding to a relative SE of 0.2 on the anti-log scale).
The fact that the design endpoints for 80% power at an effect size of 0.5 and for a SE of the BACI contrast of about 0.2 are similar is not a coincidence. If the effect size is 0.5, the BACI estimates will have a sampling distribution centred on 0.5. At each value of the sampling distribution, a confidence interval for the BACI effect will be computed (about \(\pm\) 2 SE). We want 80% of these confidence intervals to exclude 0 (to obtain an 80% probability of rejecting the null hypothesis), i.e., the part of the sampling distribution that lies no more than 1 SE below the effect size of 0.5 [this would actually give roughly 83% of the sampling distribution]. At 1 SE below 0.5, the lower bound of the 95% confidence interval is another 2 SE lower, and we want this lower bound to be just above 0. This implies that 0.5 must be about 3 SE away from 0, i.e., the SE of the BACI contrast should be around 0.17, as seen above.
So, as a general rule, the design parameters that lead to 80% power in a BACI design conducted on the log scale correspond to a SE for the BACI effect of around 0.17, or a relative SE of about 0.17 on the anti-log scale.
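A quick check of this rule using a normal approximation (the power calculations behind Figure 7 would use the appropriate t/F reference distribution, so this is only a rough approximation):
# approximate power of a two-sided test of the BACI contrast (normal approximation)
approx.power <- function(effect, se, alpha=0.05){
   z.crit <- qnorm(1 - alpha/2)
   pnorm(effect/se - z.crit) + pnorm(-effect/se - z.crit)
}
approx.power(effect=0.5, se=0.17)   # approximately 0.84, i.e. close to the 80% target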
These results assume that the variance components used in the power/sample-size computations will hold when the actual data are collected, so, in some sense, they represent "average" outcomes.
2.3.3 Bayesian sample size determination
In the Bayesian context, we set as our goal a desired degree of precision in the posterior estimate. For example, our goal might be that the 95% credible interval has a width less than some value, either on average or at least 80% of the time. This goal implies that regardless of which effect-size values happen to be emphasized by the posterior distribution, the width of the posterior is usually narrow, so that we have attained a suitably high precision in the estimate.
If interest lies in controlling the average width of the credible interval, then the results of the previous section are directly applicable because of the close correspondence between the endpoints of credible intervals and the endpoints of confidence intervals, and between the SD of the (Bayesian) estimate of the BACI contrast and the SE of the BACI contrast.
In the section targeting the SE of the BACI contrast, no allowance was made for variation in the estimated variance components from individual data sets, even if, on average, the variance components are correct. This implies that the widths of the credible intervals will vary around the mean width, and so additional analysis is required to ensure that the width is less than some value at least xx% of the time.
This further analysis requires simulating many datasets using the specified variance components and study-design parameters (number of traps, number of years, number of sites, etc.). For each simulated dataset, the credible interval for the BACI contrast and the SD of the BACI contrast are computed. Various statistics about the credible-interval widths or the SD of the BACI contrast can then be examined, as sketched below.
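A rough sketch of such a simulation is given below. The variance components (\(\sigma_{site}=1\), \(\sigma_{year}=0.4\), \(\sigma_{site.year}=0.1\), \(\sigma_{residual}=0.75\)) are illustrative values in the spirit of the estimates above, the true BACI effect is set to 0 (the precision does not depend on the effect size), and n.sites is the number of sites per SiteClass:
# Simulate balanced BACI data on the log scale and summarize the precision of the
# posterior for the BACI contrast. All settings below are illustrative assumptions.
library(brms)

sim.baci.data <- function(n.sites, n.years.b, n.years.a, n.traps,
                          sd.site=1, sd.year=0.4, sd.site.year=0.1, sd.resid=0.75){
   Site  <- paste0("S", 1:(2*n.sites))               # half Upstream, half Downstream
   YearF <- paste0("Y", 1:(n.years.b + n.years.a))   # Before years listed first
   dat <- expand.grid(Site=Site, YearF=YearF, Trap=1:n.traps, stringsAsFactors=FALSE)
   dat$SiteClass <- ifelse(dat$Site  %in% paste0("S", 1:n.sites),   "Upstream", "Downstream")
   dat$Period    <- ifelse(dat$YearF %in% paste0("Y", 1:n.years.b), "Before",   "After")
   site.eff <- rnorm(length(Site),  sd=sd.site);  names(site.eff) <- Site
   year.eff <- rnorm(length(YearF), sd=sd.year);  names(year.eff) <- YearF
   sy       <- interaction(dat$YearF, dat$Site)
   sy.eff   <- rnorm(nlevels(sy), sd=sd.site.year); names(sy.eff) <- levels(sy)
   dat$logfry <- 5 + site.eff[dat$Site] + year.eff[dat$YearF] +
                 sy.eff[as.character(sy)] + rnorm(nrow(dat), sd=sd.resid)  # true BACI effect = 0
   dat
}

# Fit the model once to compile it, then update() with each simulated data set (slow).
fit0 <- brm(logfry ~ Period + SiteClass + Period:SiteClass +
               (1 | YearF) + (1 | Site) + (1 | YearF:Site),
            data=sim.baci.data(n.sites=6, n.years.b=3, n.years.a=9, n.traps=3),
            chains=4, iter=2000, warmup=1000, refresh=0)

res <- plyr::ldply(1:20, function(i){          # 20 simulations, as in Table 6
   fit  <- update(fit0, newdata=sim.baci.data(n.sites=6, n.years.b=3, n.years.a=9, n.traps=3),
                  refresh=0)
   post <- brms::as_draws_df(fit, variable="b_PeriodBefore:SiteClassUpstream")
   baci <- post[["b_PeriodBefore:SiteClassUpstream"]]
   data.frame(sd.baci  = sd(baci),
              ci.width = as.numeric(diff(quantile(baci, c(0.025, 0.975)))))
})
c(mean.baci.sd  = mean(res$sd.baci),
  prop.sd.lt.20 = mean(res$sd.baci < 0.20),
  mean.ci.width = mean(res$ci.width))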
For example, consider the following design options examined previously:
- a design with 9 years of monitoring after project implementation using 3 traps per site.year with 6 sites upstream and 6 sites downstream;
- a design with 9 years of monitoring after project implementation using 6 traps per site.year and 3 sites upstream and 3 sites downstream.
Summary statistics for the SD of the BACI contrast across the simulated datasets are shown in Table 6. Note that the number of simulations was set to a relatively small value to avoid excessive computation when rendering, but would normally be 100+.
| scenario | n.sims | Mean BACI SD | Prop of BACI SD < .20 | Mean BACI CI width |
|---|---|---|---|---|
| n.Sites: 3; n.Years.b: 3; n.Years.a: 9; n.traps: 6 | 20 | 0.176 | 1.00 | 0.69 |
| n.Sites: 6; n.Years.b: 3; n.Years.a: 9; n.traps: 3 | 20 | 0.169 | 1.00 | 0.66 |
We see that the mean SD is around 0.17 (as predicted in the previous two sections). This implies that the mean width of the 95% credible intervals will be about 4x larger, as shown in the table.
The SD of the BACI contrast is not that variable, with about 90% of the estimated SDs across the simulated datasets less than 0.20. This implies a high probability (assuming the variance components etc. are realistic) that the relative SD on the anti-log scale will also be < 0.20.
Other design scenarios can be examined in a similar fashion. The message from this example is that the various approaches (obtaining a sample size adequate to detect an effect, ensuring that the mean SE is at a target level, or ensuring that most of the credible intervals have a small enough width) lead to similar design choices (as they must).
3 References
Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80(1), 1-28. doi:10.18637/jss.v080.i01
Gelman, A., and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
Kruschke, J.K., and Liddell, T.M. (2018). The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review, 25, 178-206. https://doi.org/10.3758/s13423-016-1221-4
Kuznetsova, A., Brockhoff, P.B., and Christensen, R.H.B. (2017). lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software, 82, 1-26. doi:10.18637/jss.v082.i13
Littell, R.C., Milliken, G.A., Stroup, W.W., Wolfinger, R.D., and Schabenberger, O. (2006). SAS for Mixed Models, 2nd Edition. Chapter 12. Cary, NC: SAS Institute Inc.
R Core Team (2025). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Stroup, W. W. (1999). Mixed model procedures to assess power, precision, and sample size in the design of experiments. Pages 15-24 in Proceedings of the Biopharmaceutical Section, American Statistical Association, Baltimore, MD. Available at: http://www.stat.sfu.ca/~cschwarz/Stat-650/Notes/MyPrograms/Power/stroup-1999-power.pdf