A COMPARATIVE ANALYSIS OF MACHINE LEARNING MODELS FOR PREDICTING PM2.5 CONCENTRATION IN BEIJING: DATASET CHARACTERISTICS, HYPERPARAMETERS, AND TEMPORAL VARIABILITY

ALKISHRI MOOSA, NUR SYUFIZA AHMED SHUKOR, JABAR H. YOUSIF

DOI Number:

DOI:10.5281/zenodo.10153164

Published: 2023-11-10

About the author(s)

1. ALKISHRI MOOSA - PhD Student, Faculty of Communication, Visual Art and Computing, Universiti Selangor, Malaysia.
2. NUR SYUFIZA AHMED SHUKOR - Associate Professor, Faculty of Communication, Visual Art and Computing, Universiti Selangor, Malaysia.
3. JABAR H. YOUSIF - Associate Professor, Faculty of Computing and IT, Sohar University, Oman.

Abstract

Researchers continue to develop novel and more accurate machine learning (ML) models for air pollution prediction based on area characteristics. However, comparing their performance is difficult: as the number of proposed models grows, and as each is evaluated under different conditions and on different datasets, it becomes almost impossible to compare every model against all the others. In addition, results cannot be generalized to all future time periods, because the characteristics of an area and its sources of pollution may change over time. In this paper, we provide a periodic review of the state of the art in applying ML techniques to PM2.5 concentration prediction in Beijing, focusing on dataset size, hyperparameters, and preprocessing techniques. We collected and reviewed seven articles published between 2015 and 2023, covering 42 prediction models, all addressing the same geographical area and the same dependent variable, PM2.5. In particular, we examined the hyperparameters of the models to describe the differences in model architecture. We also examined how applying the same predictive model to the same geographic area and pollutant at different times can yield different performance indices. The results show that one predictive model cannot be preferred over another on the basis of its performance at different times, even when both are applied at exactly the same location and with the same output.

Keywords

Air Pollution Prediction, Beijing, Machine Learning, PM2.5, Time Series Patterns.