Accurate medical image segmentation is vital for disease diagnosis and aids physicians in closely examining relevant regions. This creates a need for AI models that streamline diagnosis and reduce errors.
Current segmentation networks often suffer from excessive parameter counts, high computational cost (measured in GFLOPs), and limited accuracy. To address these challenges, this research proposes a transformer-based architecture that uses a Swin transformer as the encoder.
The Swin transformer encoder improves segmentation by deeply extracting image features with its shifted-window self-attention mechanism (the "Swin" in the name stands for shifted windows), enabling precise identification of key structures.
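The core idea of window-based attention is to compute self-attention only within small local windows rather than across the whole image, which is what keeps the computation tractable. A minimal, parameter-free sketch of attention within non-overlapping windows is shown below; a real Swin block additionally uses learned QKV projections, relative position bias, and windows shifted between successive layers, none of which are reproduced here.

```python
import numpy as np

def window_attention(x, window=4):
    """Self-attention computed independently within non-overlapping windows
    (Swin-style sketch). x: (H, W, C) feature map; `window`: side length of
    each square window. Assumes H and W are divisible by `window`.
    Illustrative only: learned projections and shifted windows are omitted."""
    H, W, C = x.shape
    out = np.empty_like(x)
    for i in range(0, H, window):
        for j in range(0, W, window):
            win = x[i:i+window, j:j+window].reshape(-1, C)  # (window*window, C)
            scores = win @ win.T / np.sqrt(C)               # scaled dot-product
            scores -= scores.max(axis=-1, keepdims=True)    # numeric stability
            attn = np.exp(scores)
            attn /= attn.sum(axis=-1, keepdims=True)        # row-wise softmax
            out[i:i+window, j:j+window] = (attn @ win).reshape(window, window, C)
    return out
```

Because each window of size w attends only to its own w² tokens, the cost scales linearly with image area instead of quadratically, which is the efficiency argument behind the encoder choice.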
The main objective is to optimize the segmentation architecture by balancing high accuracy with fewer parameters and reduced computational cost.
Within the decoder, a dynamic feature fusion block (DFFB) was designed to strengthen feature extraction by capturing multi-scale details. This allows the model to interpret structural information at various levels and better segment complex areas.
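The multi-scale idea behind the DFFB can be illustrated with a simple pooling pyramid: the same feature map is summarized at several spatial scales and the branches are fused. The internals of the paper's DFFB are not specified here, so the choice of scales, average pooling, nearest-neighbour upsampling, and fusion by averaging are all assumptions for illustration.

```python
import numpy as np

def multi_scale_fusion(x, scales=(1, 2, 4)):
    """Hedged sketch of multi-scale feature fusion. x: (H, W, C).
    For each scale s, average-pool over s x s blocks, upsample back by
    nearest-neighbour repetition, then average the branches. Assumes H and W
    are divisible by every scale. The real DFFB likely uses learned
    convolutions; none are modelled here."""
    H, W, C = x.shape
    branches = []
    for s in scales:
        # block average-pool via a reshape: (H//s, s, W//s, s, C) -> mean over s-axes
        pooled = x.reshape(H // s, s, W // s, s, C).mean(axis=(1, 3))
        up = pooled.repeat(s, axis=0).repeat(s, axis=1)  # nearest upsample
        branches.append(up)
    return np.mean(branches, axis=0)  # fuse scale branches by averaging
```

The fine-scale branch preserves boundary detail while the coarse branches contribute regional context, which is the mechanism that lets a decoder reason about structures at several levels at once.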
Additionally, a dynamic attention enhancement block further refines features from the DFFB output. It applies both spatial and channel attention mechanisms to highlight critical regions in the images, boosting overall segmentation accuracy.
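Combined spatial and channel attention of this kind is commonly structured like CBAM: a channel gate reweights feature channels, then a spatial gate reweights locations. The paper's exact block design is not given here, so the sketch below replaces any learned MLP/convolution weights with parameter-free pooling; it shows only the gating structure, not the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_attention(x):
    """Hedged sketch of sequential channel-then-spatial attention (CBAM-style
    structure assumed). x: (H, W, C). Each gate lies in (0, 1), so the block
    suppresses uninformative responses rather than amplifying them."""
    # channel attention: gate each channel by its global average activation
    ch_gate = sigmoid(x.mean(axis=(0, 1)))             # (C,)
    x = x * ch_gate                                    # reweight channels
    # spatial attention: gate each location by its mean over channels
    sp_gate = sigmoid(x.mean(axis=-1, keepdims=True))  # (H, W, 1)
    return x * sp_gate                                 # highlight key regions
```

Because both gates are bounded in (0, 1), the output magnitude never exceeds the input at any position, which makes the block a selective re-weighting of the DFFB features rather than a transformation of them.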
"This model can extract key image features more accurately by employing sliding windows and an attention mechanism."
"The dynamic feature fusion block enables the model to analyze the structural information of medical images at various levels."
"The dynamic attention enhancement block utilizes spatial and channel attention mechanisms to emphasize key areas, thereby enhancing the model’s overall accuracy."
Author's summary: This study proposes a Swin transformer-based model with dynamic multi-scale attention blocks that enhances medical image segmentation accuracy while optimizing computational efficiency.