Speaker
Description
Machine learning makes it possible to build quick and relatively accurate computer vision solutions. Even so, these solutions are often imperfect and depend heavily on the quality of the datasets they are trained on, since the final model can reproduce flaws present in those datasets in its outputs. We propose a method for correcting both model outputs and dataset annotations using an iterative model known as a neural cellular automaton (NCA).
NCAs combine convolutional neural networks with the principles of cellular automata, producing a small model that uses only convolutional layers and is at most three layers deep. We chose this model for the task because of its demonstrated ability to restore objects within an image to a desired state, and because its small size makes it easy to attach to any prebuilt solution or model.
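To make the size claim concrete, here is a minimal NumPy sketch of a single NCA update built from three convolutional layers. The layer layout (a 3x3 "perception" layer followed by two 1x1 layers with 32 hidden channels) and the residual update `state + delta` are assumptions modelled on common NCA designs, not the authors' exact architecture, and the weights are random rather than trained. The function names `conv2d`, `init_nca` and `nca_step` are hypothetical.

```python
import numpy as np

def conv2d(x, w, b):
    """Same-padded 2D convolution (cross-correlation, as in CNN layers).

    x: (H, W, C_in), w: (k, k, C_in, C_out), b: (C_out,) -> (H, W, C_out)
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, wd = x.shape[:2]
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) for every cell at once
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out + b

def relu(x):
    return np.maximum(x, 0.0)

def init_nca(channels=4, hidden=32, seed=0):
    """Hypothetical three-layer NCA with random (untrained) weights."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0, 0.1, (3, 3, channels, hidden)), "b1": np.zeros(hidden),
        "w2": rng.normal(0, 0.1, (1, 1, hidden, hidden)), "b2": np.zeros(hidden),
        "w3": rng.normal(0, 0.01, (1, 1, hidden, channels)), "b3": np.zeros(channels),
    }

def nca_step(state, p):
    """One NCA update: three conv layers compute a per-cell residual change."""
    h = relu(conv2d(state, p["w1"], p["b1"]))  # 3x3: each cell sees its neighbours
    h = relu(conv2d(h, p["w2"], p["b2"]))      # 1x1: per-cell processing
    delta = conv2d(h, p["w3"], p["b3"])        # 1x1: proposed change per channel
    return state + delta

params = init_nca()
state = np.random.default_rng(1).random((16, 16, 4))
new_state = nca_step(state, params)
print(new_state.shape)                          # (16, 16, 4)
print(sum(v.size for v in params.values()))     # 2372
```

Because every cell applies the same small rule to its local neighbourhood, the parameter count stays in the low thousands regardless of image size, which is what makes such a model cheap to bolt onto an existing network.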
Our experiments focused on improving dataset annotations and model outputs for segmentation and depth estimation. We used two medical imaging datasets, Kvasir-SEG for segmentation and EndoSLAM for depth estimation, and tested whether the NCA could improve the outputs of several variants of YOLOv5, YOLOv8, MiDaS, and Depth Anything. The goal was to bring the results of the smaller variants of these models closer to those of the larger ones without retraining the base models. The NCA was able to improve the models' initial results using a 4-channel input formed by concatenating the original image with the model output; this output channel is updated at every NCA iteration.
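The input scheme described above can be sketched as a refinement loop: stack a three-channel image with a single-channel prediction, then repeatedly update only the prediction channel while the image channels stay fixed as context. This is a minimal sketch under stated assumptions: `refine`, `placeholder_update` and `neighbourhood_mean` are hypothetical names, the output is assumed to lie in [0, 1], and the update rule is an untrained stand-in (pulling each cell toward its 3x3 neighbourhood mean) in place of a trained NCA, which would also read the image channels.

```python
import numpy as np

def refine(image, model_output, update_fn, steps=20):
    """Iteratively refine a single-channel prediction with an NCA-style update.

    image: (H, W, 3) array; model_output: (H, W) mask or depth map in [0, 1].
    update_fn: maps the (H, W, 4) state to an (H, W) change for channel 3.
    """
    # 4-channel state: original image plus the model's initial prediction (copied)
    state = np.concatenate([image, model_output[..., None]], axis=-1)
    for _ in range(steps):
        delta = update_fn(state)
        # Only the prediction channel evolves; image channels stay fixed as context
        state[..., 3] = np.clip(state[..., 3] + delta, 0.0, 1.0)
    return state[..., 3]

def neighbourhood_mean(channel):
    """Mean over each cell's 3x3 neighbourhood, with edge padding."""
    p = np.pad(channel, 1, mode="edge")
    h, w = channel.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def placeholder_update(state, rate=0.5):
    # Hypothetical stand-in for a trained NCA: nudge each cell toward its neighbours.
    # It ignores the image channels, which a real NCA would use.
    pred = state[..., 3]
    return rate * (neighbourhood_mean(pred) - pred)

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
mask = np.zeros((32, 32))
mask[8:24, 8:24] = 1.0
mask[16, 16] = 0.0  # an isolated "hole" in the initial prediction
refined = refine(image, mask, placeholder_update, steps=10)
print(refined.shape)  # (32, 32)
```

Keeping the image channels frozen means the automaton always has the original evidence available, while the prediction channel acts as the evolving state that the base model only had to initialise.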