An investigation about pretrainings for the multi-modal sensor data

Mashita, Tomohiro; Tamata, Kenshiro; Ioka, Ryota; Itoh, Ryo; Matsuzaki, Hiroki; Miyake, Toshihide

doi:10.22323/1.458.0027

Abstract

This paper investigates the effect of pretraining and fine-tuning on a multi-modal dataset. The dataset used in this study was collected at a garbage disposal facility for facility control and consists of 25,000 sequential images with corresponding sensor values. The main task on this dataset is to classify the state of garbage incineration from an input image, for use in combustion state control. For this kind of task, pretraining on unlabeled data and then fine-tuning on a small labeled dataset is a typical and effective way to reduce the cost of producing supervised data. To find an effective pretraining method, we investigated and compared several pretraining approaches based on the sensor values and on an autoencoder. We also compared several sensor selection methods for the sensor-based pretraining. We report the performance of fine-tuned models with frozen and unfrozen pretrained parameters, and discuss the results together with the effect of sensor selection.
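The distinction between frozen and unfrozen pretrained parameters can be illustrated with a toy sketch. This is not the authors' model or training code: a single scalar weight stands in for the pretrained encoder, a second scalar stands in for the classification head, and the data, learning rate, and all names (`fine_tune`, `encoder_w`, `freeze_encoder`) are assumptions made for illustration only.

```python
# Toy illustration (hypothetical, not the paper's implementation):
# the model is y_hat = head_w * (encoder_w * x), trained by SGD on squared error.

def fine_tune(encoder_w, data, freeze_encoder, lr=0.1, epochs=200):
    """Fine-tune a scalar 'encoder' and 'head' on (x, y) pairs.

    encoder_w      -- weight assumed to come from some pretraining stage
    freeze_encoder -- if True, only the head is updated during fine-tuning
    """
    w_enc = encoder_w
    w_head = 0.0  # the head is always trained from scratch
    for _ in range(epochs):
        for x, y in data:
            feat = w_enc * x               # encoder output
            err = w_head * feat - y        # prediction error
            grad_head = err * feat         # d(0.5 * err^2) / d w_head
            grad_enc = err * w_head * x    # d(0.5 * err^2) / d w_enc
            w_head -= lr * grad_head
            if not freeze_encoder:
                w_enc -= lr * grad_enc     # unfrozen: encoder also adapts
    return w_enc, w_head


# Small labeled set for the downstream task (assumed target: y = 2x).
labeled = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]
pretrained_w = 0.5  # placeholder for a pretrained encoder weight

frozen = fine_tune(pretrained_w, labeled, freeze_encoder=True)
unfrozen = fine_tune(pretrained_w, labeled, freeze_encoder=False)

print("frozen   (encoder, head):", frozen)
print("unfrozen (encoder, head):", unfrozen)
```

In the frozen case the encoder weight is returned unchanged and only the head compensates; in the unfrozen case both weights move. In a real network the same choice is usually made per parameter group, and which option performs better depends on the pretraining method and the amount of labeled data, which is the comparison the paper reports.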