Why less complexity produces better forecasts: an independent data evaluation of kelp habitat models
Understanding how species are distributed in the environment is increasingly important for natural resource management, particularly for keystone and habitat‐forming species, and those of conservation concern. Habitat suitability models are fundamental to developing this understanding; however, their use in management continues to be limited by often‐vague model objectives and inadequate evaluation methods. Along the Northeast Pacific coast, canopy kelps (Macrocystis pyrifera and Nereocystis luetkeana) provide biogenic habitat and considerable primary production to nearshore ecosystems. We investigated the distribution of these species by examining a series of increasingly complex habitat suitability models, ranging from process‐based models grounded in species’ ecology to complex generalised additive models applied to purpose‐collected survey data. Seeking empirical limits to model complexity, we explored the relationship between model complexity and forecast skill, measured using both cross‐validation and independent data evaluation. Our analysis confirmed the importance of predictors used in models of coastal kelp distributions developed elsewhere (i.e. depth, bottom type, bottom slope, and exposure); it also identified additional important factors, including salinity and potential interactions between exposure and salinity and between slope and tidal energy. Comparative results showed how cross‐validation can lead to over‐fitting, while independent data evaluation clearly identified the appropriate model complexity for generating habitat forecasts. Our results also illustrate that, depending on the evaluation data, predictions from simpler models can out‐perform those from more complex models. Collectively, the insights from evaluating multiple models with multiple data sets contribute to the holistic assessment of model forecast skill.
The continued development of methods and metrics for evaluating model forecasts with independent data, and the explicit consideration of model objectives and assumptions, promise to increase the utility of model forecasts to decision makers.