Balance of predictor variables in partial least squares regression

April 19, 2018, Ranae Dietzel (@Ranae)

@Ranae wrote:

I am using partial least squares (PLS) regression to model the relative effects of soil and weather variables on the magnitude of an annual phenomenon: nitrous oxide emissions. I am modeling this on an annual basis (one value per site per year) across many sites.

Soils at each site are different, but soils within each site stay the same every year.

Weather is different at every site and different from year to year.

So I have 56 responses (annual nitrous oxide values) and 56 corresponding sets of weather predictors (one per site-year), but only 10 sets of soil predictors (one per soil type).
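A minimal sketch of how this layout usually arises, using hypothetical toy tables (the site names, `clay`, `soc`, `precip`, and `n2o` columns and all values are invented for illustration, not the poster's data): a site-level soil table is joined onto a site-year weather table, so each site's soil row is copied onto every year observed at that site.

```r
# Hypothetical toy tables (not the real data): one soil row per site,
# one weather row per site-year.
soil <- data.frame(
  site = c("A", "B", "C"),
  clay = c(22, 31, 18),    # hypothetical soil property
  soc  = c(2.1, 3.4, 1.7)  # hypothetical soil organic carbon
)
weather <- data.frame(
  site   = c("A", "A", "B", "B", "B", "C", "C"),
  year   = c(2014, 2015, 2014, 2015, 2016, 2015, 2016),
  precip = c(610, 720, 540, 800, 655, 700, 590),  # hypothetical weather variable
  n2o    = c(3.2, 4.1, 2.8, 5.0, 3.6, 4.4, 2.9)   # hypothetical response
)

# Many-to-one join: each site's soil row is repeated for every year at that site
all_data <- merge(weather, soil, by = "site")

nrow(all_data)                              # 7 site-years
nrow(unique(all_data[, c("clay", "soc")]))  # only 3 distinct soil profiles
```

The same pattern at full size gives 56 rows but only 10 distinct soil profiles.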

I am using the pls package in R. I think I did an okay job preprocessing my data (Box-Cox transformation, centering, scaling). I end up with 17 columns of soil properties (the values repeat across years at the same site, so these rows are not unique), 5 columns of weather variables (every row unique), and my dependent variable, nitrous oxide (every row unique).
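The three preprocessing steps can be sketched in base R for a single strictly positive column. This is an illustration under stated assumptions, not the poster's code: `box_cox`, `box_cox_loglik`, and `preprocess_column` are hypothetical helper names, and lambda is chosen by maximizing the Box-Cox profile log-likelihood over [-2, 2]. In the Applied Predictive Modeling workflow the same steps are typically done with `caret::preProcess(x, method = c("BoxCox", "center", "scale"))` followed by `predict()`, whose lambda estimation details may differ from this sketch.

```r
# Box-Cox transform of a strictly positive vector y at a given lambda
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to a constant): normal model for the
# transformed values plus the Jacobian term (lambda - 1) * sum(log(y))
box_cox_loglik <- function(lambda, y) {
  z <- box_cox(y, lambda)
  n <- length(y)
  sigma2 <- mean((z - mean(z))^2)  # MLE variance of the transformed values
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(y))
}

# Hypothetical helper: estimate lambda, transform, then center and scale
preprocess_column <- function(y) {
  stopifnot(all(y > 0))  # Box-Cox needs strictly positive values
  lambda <- optimize(box_cox_loglik, interval = c(-2, 2),
                     y = y, maximum = TRUE)$maximum
  z <- box_cox(y, lambda)
  list(lambda = lambda, values = as.vector(scale(z)))  # mean 0, sd 1
}

set.seed(1)
x <- rlnorm(56, meanlog = 1, sdlog = 0.8)  # hypothetical right-skewed predictor
res <- preprocess_column(x)
res$lambda  # for log-normal data this typically lands near 0 (a log transform)
```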

The model itself takes just one line of code once the package is loaded:

library(pls)
plsFit <- plsr(nitrous_oxide ~ ., validation = "LOO", data = all_data)
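A sketch of fitting the same model and reading off the leave-one-out results with functions from the pls package (`plsr`, `RMSEP`, `summary`). The data here are simulated to mimic the described shape (56 site-years, 10 sites, 17 soil columns constant within site, 5 weather columns); the coefficients, noise level, and `ncomp = 10` are arbitrary assumptions for illustration.

```r
library(pls)

# Hypothetical simulated data with the described shape
set.seed(42)
site    <- rep(1:10, length.out = 56)                 # 56 site-years over 10 sites
soil    <- matrix(rnorm(10 * 17), nrow = 10)[site, ]  # each site's soil row repeated
weather <- matrix(rnorm(56 * 5), nrow = 56)           # unique weather per site-year
colnames(soil)    <- paste0("soil", 1:17)
colnames(weather) <- paste0("wx", 1:5)
nitrous_oxide <- drop(soil[, 1:3] %*% c(1, -0.5, 0.3) +
                      weather[, 1:2] %*% c(0.8, 0.4)) + rnorm(56, sd = 0.5)
all_data <- data.frame(nitrous_oxide, soil, weather)

plsFit <- plsr(nitrous_oxide ~ ., ncomp = 10, data = all_data, validation = "LOO")

# Leave-one-out RMSEP for models with 0..10 components;
# the first element is the intercept-only model
cv_rmsep   <- drop(RMSEP(plsFit, estimate = "CV")$val)
best_ncomp <- which.min(cv_rmsep) - 1

summary(plsFit)  # % variance explained and CV RMSEP per number of components
```

`validationplot(plsFit)` draws the same cross-validated RMSEP curve against the number of components.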

Everything runs fine, but I can't shake the feeling that the two groups of predictors are unbalanced.

Should I really be treating soil and weather variables in the same way?
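One way to put numbers on the imbalance, sketched on simulated predictors with the described shape (the data are hypothetical): after centering and scaling, every column has variance 1, so the soil block supplies 17 of the 22 units of total predictor variance (about 77%). Yet because its 56 rows contain only 10 distinct profiles, the centered soil block spans at most 10 - 1 = 9 independent directions, however many soil columns it has.

```r
# Hypothetical simulated predictors with the described shape
set.seed(7)
site    <- rep(1:10, length.out = 56)
soil    <- scale(matrix(rnorm(10 * 17), nrow = 10)[site, ])  # 17 cols, 10 distinct rows
weather <- scale(matrix(rnorm(56 * 5), nrow = 56))           # 5 cols, all rows distinct

# Each autoscaled column has variance 1, so a block's share of the total
# predictor variance equals its share of the columns: 17 / 22 for soil
block_var  <- c(soil    = sum(apply(soil, 2, var)),
                weather = sum(apply(weather, 2, var)))
soil_share <- unname(block_var["soil"] / sum(block_var))  # about 0.77

# Centered rows of 10 distinct profiles satisfy one linear constraint
# (their count-weighted sum is zero), so the rank is at most 9
soil_rank <- qr(soil)$rank
```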

Applied Predictive Modeling has been extremely helpful for what I'm trying to do, but boy, do I have a lot of questions specific to my dataset.
