Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size
2024

Comparing Algorithms for Filling in Missing Weather Data

Sample size: 45 weather stations. Evidence: moderate

Author Information

Author(s): de Souza Valter Cesar, Rodrigues Sergio Augusto, Filho Luís Roberto Almeida Gabriel

Primary Institution: São Paulo State University (Unesp)

Hypothesis

The study evaluates how well alternative principal component analysis (PCA) algorithms impute missing values in meteorological time series.

Conclusion

The NIPALS-PCA and EM-PCA methods are effective for imputing missing reference evapotranspiration data, especially in scenarios with lower percentages of missing data.

Supporting Evidence

  • NIPALS-PCA had the lowest mean absolute percentage error (MAPE), 15.4%, in the 10% missing data scenario.
  • EM-PCA performed best in the 50% missing data scenario, with a MAPE of 19.1%.
  • Both NIPALS-PCA and EM-PCA gave good imputation results, with normalized root mean square error (nRMSE) between 10% and 20%.
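The two error measures quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the sample values are made up, and the nRMSE is normalized by the mean of the actual values, which is one common convention and may differ from the one used in the study.

```python
import math

def mape(actual, imputed):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (division by the actual value).
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, imputed))
    return 100.0 * total / len(actual)

def nrmse(actual, imputed):
    """Root mean square error divided by the mean of the actual values, in percent.

    Assumption: normalisation by the mean; other conventions use the range
    or the standard deviation instead.
    """
    mse = sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual)
    return 100.0 * math.sqrt(mse) / (sum(actual) / len(actual))

# Made-up reference evapotranspiration values (mm/day), for illustration only.
actual = [4.0, 5.0, 2.0, 5.0]
imputed = [4.4, 4.5, 2.2, 5.0]

print(f"MAPE  = {mape(actual, imputed):.1f}%")   # MAPE  = 7.5%
print(f"nRMSE = {nrmse(actual, imputed):.1f}%")  # nRMSE = 8.4%
```

Both measures are computed only over the cells that were hidden and then imputed, so they describe imputation error rather than overall model fit.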

Takeaway

This study compared ways of filling in missing weather data and found that the best method depends on how much data is missing: NIPALS-PCA was most accurate with 10% of the data missing, and EM-PCA with 50% missing.

Methodology

The study used simulation to create scenarios of missing data and compared the performance of NIPALS-PCA, EM-PCA, and simple mean imputation across different percentages of missing data.
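The simulation design described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the study's implementation: the data matrix is synthetic, all helper names are hypothetical, and the PCA step uses the common NIPALS formulation that skips missing cells when updating scores and loadings. EM-PCA is omitted to keep the sketch short.

```python
import math
import random

def mask_at_random(X, fraction, seed=0):
    """Copy X and hide `fraction` of its cells (set to None) uniformly at random."""
    cells = [(i, j) for i in range(len(X)) for j in range(len(X[0]))]
    hidden = random.Random(seed).sample(cells, round(fraction * len(cells)))
    masked = [row[:] for row in X]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden

def column_means(X):
    """Mean of each column over observed (non-None) cells only."""
    means = []
    for j in range(len(X[0])):
        seen = [row[j] for row in X if row[j] is not None]
        means.append(sum(seen) / len(seen))
    return means

def mean_impute(X):
    """Baseline: replace every missing cell with its column mean."""
    means = column_means(X)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in X]

def _safe_div(num, den):
    return num / den if den > 0 else 0.0

def nipals_impute(X, n_components=1, max_iter=500, tol=1e-12):
    """Fill missing cells with a low-rank NIPALS-style PCA reconstruction."""
    n, m = len(X), len(X[0])
    obs = [[v is not None for v in row] for row in X]
    means = column_means(X)
    # Centered residuals; missing cells hold 0.0 but are always skipped below.
    resid = [[X[i][j] - means[j] if obs[i][j] else 0.0 for j in range(m)] for i in range(n)]
    fitted = [means[:] for _ in range(n)]
    for _component in range(n_components):
        # Start the scores from the residual column with the largest sum of squares.
        start = max(range(m), key=lambda j: sum(resid[i][j] ** 2 for i in range(n)))
        t = [resid[i][start] for i in range(n)]
        p = [0.0] * m
        for _iteration in range(max_iter):
            # Loadings: regress each column on the scores, observed cells only.
            p = [_safe_div(sum(resid[i][j] * t[i] for i in range(n) if obs[i][j]),
                           sum(t[i] ** 2 for i in range(n) if obs[i][j]))
                 for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            if norm == 0.0:
                break
            p = [v / norm for v in p]
            # Scores: regress each row on the loadings, observed cells only.
            t_new = [_safe_div(sum(resid[i][j] * p[j] for j in range(m) if obs[i][j]),
                               sum(p[j] ** 2 for j in range(m) if obs[i][j]))
                     for i in range(n)]
            change = sum((u - v) ** 2 for u, v in zip(t_new, t))
            t = t_new
            if change < tol:
                break
        # Add this component to the reconstruction and deflate observed residuals.
        for i in range(n):
            for j in range(m):
                fitted[i][j] += t[i] * p[j]
                if obs[i][j]:
                    resid[i][j] -= t[i] * p[j]
    return [[X[i][j] if obs[i][j] else fitted[i][j] for j in range(m)] for i in range(n)]

# Synthetic, strongly correlated matrix (20 rows x 5 columns); not study data.
loadings = [1.0, 2.0, -1.0, 0.5, 1.5]
truth = [[10 + 3 * math.sin(0.7 * i) * w for w in loadings] for i in range(20)]

masked, hidden = mask_at_random(truth, 0.10)
for name, method in [("mean imputation", mean_impute), ("NIPALS-PCA", nipals_impute)]:
    filled = method(masked)
    mae = sum(abs(filled[i][j] - truth[i][j]) for i, j in hidden) / len(hidden)
    print(f"{name}: mean absolute error on hidden cells = {mae:.3f}")
```

Repeating the masking step with different fractions and seeds gives the kind of scenario comparison the study describes. Because the true values of the hidden cells are known, each method can be scored directly against them.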

Limitations

The results may not be generalizable to situations where missing values occur in a non-random manner, and the initial data completion method may have influenced the outcomes.
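To make the first limitation concrete, the sketch below contrasts randomly scattered gaps with a single contiguous outage, such as a sensor failure lasting several days. It is a hypothetical illustration: the function names and the outage pattern are assumptions, and the study itself does not describe this scenario.

```python
import random

def random_gaps(n_days, n_missing, seed=0):
    """Days hidden uniformly at random."""
    return sorted(random.Random(seed).sample(range(n_days), n_missing))

def outage_gap(n_days, n_missing, start):
    """Days hidden by one contiguous outage (hypothetical non-random pattern)."""
    return list(range(start, min(start + n_missing, n_days)))

def longest_run(hidden_days):
    """Length of the longest stretch of consecutive hidden days (input sorted)."""
    best = run = 0
    previous = None
    for day in hidden_days:
        run = run + 1 if previous is not None and day == previous + 1 else 1
        best = max(best, run)
        previous = day
    return best

# Same number of missing days, very different structure.
scattered = random_gaps(100, 10)
outage = outage_gap(100, 10, start=40)
print("longest gap, random deletion:", longest_run(scattered))
print("longest gap, single outage:  ", longest_run(outage))
```

With scattered gaps, most missing days have observed neighbours in time. A long outage removes that local information, which is one reason results obtained under random deletion may not carry over.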

Participant Demographics

Data collected from 45 automatic weather stations in the São Paulo region, Brazil.

Digital Object Identifier (DOI)

10.1371/journal.pone.0315574
