Description
Starting from the hypothesis “less is more”, this research proposes an automated sewer damage detection method that uses automated data filtering to reduce the quantity and maximize the utility of existing datasets. The study uses the Sewer-ML dataset, which contains more than 1.3 million images; however, many of these are low-quality or misrepresented, which can negatively impact training outcomes. We propose a data optimisation strategy that reduces the dataset to high-quality, relevant images in two ways: CLIP-IQA (Contrastive Language-Image Pre-training - Image Quality Assessment) filters images by their quality, and prediction reliability and label checks automatically remove unreliable samples. A damage detection model based on Residual Networks (ResNets) is trained on the refined dataset, achieving a significant increase in performance. Our results demonstrate that using less than 2% of the dataset, carefully selected through automated filtering, can yield performance comparable to models trained on the full dataset, validating the effectiveness of the proposed method.
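The filtering stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an image-quality score (for example from CLIP-IQA) and a preliminary model's prediction have already been computed for each image, and the `Sample` fields, the `filter_samples` helper, and both thresholds are hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    path: str         # image file name
    label: int        # annotated label (1 = defect, 0 = no defect)
    quality: float    # precomputed image-quality score in [0, 1] (assumed)
    pred_label: int   # label predicted by a preliminary model (assumed)
    pred_conf: float  # that model's confidence in pred_label (assumed)


def filter_samples(
    samples: List[Sample],
    min_quality: float = 0.6,  # hypothetical threshold
    min_conf: float = 0.9,     # hypothetical threshold
) -> List[Sample]:
    """Keep only samples that pass all three checks."""
    kept = []
    for s in samples:
        if s.quality < min_quality:
            continue  # image-quality check: drop low-quality images
        if s.pred_conf < min_conf:
            continue  # prediction-reliability check: drop uncertain predictions
        if s.pred_label != s.label:
            continue  # label check: drop samples whose annotation disagrees
        kept.append(s)
    return kept


# Toy data, invented for illustration; not taken from Sewer-ML.
demo = [
    Sample("a.png", 1, 0.80, 1, 0.95),  # passes every check
    Sample("b.png", 1, 0.30, 1, 0.97),  # fails the quality check
    Sample("c.png", 0, 0.90, 1, 0.92),  # fails the label check
    Sample("d.png", 0, 0.70, 0, 0.55),  # fails the reliability check
]
kept = filter_samples(demo)
print([s.path for s in kept])
```

In this sketch the three checks are applied as a simple conjunction, so a sample survives only if it passes all of them; how the checks are actually combined and tuned in the study is not stated in the abstract.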