Introduction
Lung cancer remains the leading cause of cancer-related mortality worldwide. According to the latest estimates, approximately 2.48 million new lung cancer cases are diagnosed globally each year, accounting for nearly 12.4% of all cancer cases, and the disease causes more than 1.8 million deaths annually.1 Early detection of pulmonary nodules, small lesions that may indicate early-stage lung cancer, significantly improves patient prognosis. However, manual interpretation of computed tomography (CT) scans is challenging, time-consuming, and prone to inter-observer variability, particularly for small or ground-glass nodules.2 Nodule size is an important diagnostic indicator in lung cancer screening: the reported malignancy rate is 1% for nodules smaller than 5 mm, 24% for nodules of 6–10 mm, 33% for nodules of 11–20 mm, and 80% for nodules larger than 20 mm.3,4 The risk of malignancy therefore increases with nodule size.
Owing to improvements in CT technology, pulmonary nodules can now be characterized more precisely by density as solid, part-solid, or pure ground-glass opacities. This precision is particularly useful for small nodules (<1 cm), as it helps distinguish benign from malignant lesions. In the literature, the reported proportion of malignant pure ground-glass opacities varies widely, from 18% to nearly 60%. For sub-centimeter nodules, the probability of malignancy is also high in part-solid lesions but much lower (10%) in solid nodules.5 The growth rate, commonly expressed as the volume doubling time (the time required for a nodule to double in volume), is a reliable criterion for differentiating benign from malignant lesions. Usually, if the volume of a nodule has not changed over a two-year period, the lesion may be considered benign and does not require further diagnostic assessment.6
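To make the growth-rate criterion concrete, the volume doubling time (VDT) is often computed with the exponential-growth formula VDT = Δt · ln 2 / ln(V2/V1). The sketch below is illustrative only: it assumes spherical nodules (V = πd³/6) and exponential growth between two scans, and the function names and example diameters are hypothetical rather than taken from this study.

```python
import math


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of a sphere with the given diameter (spherical-nodule assumption)."""
    return math.pi * diameter_mm ** 3 / 6.0


def volume_doubling_time(v1: float, v2: float, interval_days: float) -> float:
    """Volume doubling time in days, assuming exponential growth between two scans.

    Returns math.inf when the volume did not increase (no measurable growth).
    """
    if v2 <= v1:
        return math.inf
    return interval_days * math.log(2) / math.log(v2 / v1)


# Hypothetical example: multiplying the diameter by 2**(1/3) doubles the volume,
# so a nodule growing from 6.0 mm to about 7.56 mm in 365 days has a VDT of ~365 days.
v_first = sphere_volume_mm3(6.0)
v_second = sphere_volume_mm3(6.0 * 2 ** (1 / 3))
print(round(volume_doubling_time(v_first, v_second, 365)))
```

In this sketch a nodule with unchanged volume yields an infinite VDT, which matches the intuition behind the two-year stability rule.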
Advances in artificial intelligence (AI) and deep learning have enabled the development of computer-aided diagnosis (CAD) systems that assist radiologists in detecting and characterizing pulmonary nodules. Convolutional neural networks (CNNs), in particular, have demonstrated high performance in medical image classification and segmentation because they automatically extract hierarchical features.7,8 For example, one of these studies proposed a CNN-based framework that uses dual-time-point 18F-fluorodeoxyglucose positron emission tomography/CT data to predict nodule malignancy risk, achieving better classification performance than radiomics-based models. Similarly, Ji et al.9 demonstrated the diagnostic value of 3D CT reconstruction in differentiating benign from malignant nodules, emphasizing the role of spatial context in improving specificity. Several other studies have explored hybrid and optimized deep learning architectures to reduce false positives and improve interpretability. Xue et al.10 introduced an AI-assisted diagnostic system that integrates CNNs with feature visualization to aid radiologists in clinical decision-making, and recent work by Wang et al.11 and Gupta et al.8 has focused on improving generalization through data augmentation, transfer learning, and multimodal feature fusion.
Despite these advances, several challenges persist, including false-positive reduction, limited generalizability across imaging protocols, and the need for interpretability of AI decisions.12–15 Significant challenges also remain in accurately detecting small and ground-glass nodules, maintaining an optimal balance between sensitivity and specificity, and validating model performance across heterogeneous datasets, particularly those derived from low-dose CT (LDCT) imaging.
In this context, there is a growing need for reliable, transparent, and high-performing CAD systems that complement clinical workflows and improve diagnostic confidence. The Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) provides a robust, annotated benchmark dataset for developing and validating such models.15–17 The present study therefore aimed to develop and evaluate an automated system for detecting and classifying pulmonary nodules using the LIDC-IDRI dataset in the MATLAB environment. The proposed framework integrates image preprocessing, Sobel-based candidate detection, and CNN-based classification, combined with data augmentation and the Synthetic Minority Oversampling Technique (SMOTE) to reduce class imbalance.18 The system aims to improve detection specificity, enhance discrimination between benign and malignant nodules, and provide interpretable outputs to support radiologists in early lung cancer screening. Its performance is evaluated quantitatively and compared with existing CAD approaches to assess its clinical relevance and potential for integration into lung cancer screening programs.
Materials and methods
Dataset and justification
This retrospective study used the LIDC-IDRI dataset, a publicly available collection of 1,018 thoracic CT scans with detailed annotations of pulmonary nodules. Each scan was reviewed by four experienced thoracic radiologists in a two-phase process: in a blinded phase, each radiologist independently marked nodules ≥3 mm and assigned malignancy likelihood scores, and in a subsequent unblinded phase, each radiologist reviewed their own marks alongside the anonymized marks of the others, without a forced consensus. The dataset includes varied nodule types (solid, part-solid, ground-glass), sizes, and locations, providing a diverse sample for training and evaluating AI-based detection and classification systems. LIDC-IDRI has been widely used for benchmarking CAD algorithms because of its high-quality annotations, multi-center acquisition, and standardized metadata, which enable reproducibility and meaningful comparison across studies.15 This diversity of nodule characteristics helps the proposed CNN-based system learn discriminative features and generalize when classifying nodules as benign or malignant.
Ethical considerations
This study used data from the publicly available LIDC-IDRI database. Because this dataset is fully anonymized and was collected with prior institutional review board approval at all participating centers, our retrospective analysis of these data did not require additional approval from our institution’s institutional review board.
Sample selection and group definitions
For this study, we included CT scans with annotated nodules measuring ≥3 mm in diameter, as defined by the LIDC-IDRI guidelines. Nodules were categorized into two groups, benign and malignant, based on the radiologists’ malignancy ratings. The dataset’s diversity, encompassing various nodule types and characteristics, provides a robust foundation for evaluating AI-based detection and classification methods.
Nodules ranging from 3 to 30 mm in diameter were selected for analysis. Nodules identified by fewer than three radiologists were excluded to ensure annotation reliability. To assign a diagnostic label (benign or malignant) to each nodule, we calculated the average of the radiologists’ malignancy scores, which range from 1 (highly unlikely to be malignant) to 5 (highly suspicious). Nodules were classified as:
Benign: average score from 1 to 2.5 (inclusive);
Malignant: average score from 3.5 to 5 (inclusive).
Nodules with average scores strictly between 2.5 and 3.5 were excluded from the study to avoid ambiguous classification.
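The labeling rule above can be written as a small function. This is a sketch, assuming one malignancy rating per radiologist who marked the nodule (so the number of ratings equals the number of readers) and treating averages of exactly 2.5 and 3.5 as benign and malignant, respectively; the function name is hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def assign_label(scores: Sequence[int], min_readers: int = 3) -> Optional[str]:
    """Map per-radiologist malignancy ratings (1-5) to 'benign', 'malignant', or None.

    None means the nodule is excluded from the study.
    """
    if len(scores) < min_readers:
        return None                  # marked by fewer than three radiologists
    avg = mean(scores)
    if avg <= 2.5:
        return "benign"
    if avg >= 3.5:
        return "malignant"
    return None                      # ambiguous average between 2.5 and 3.5


print(assign_label([1, 2, 2]))       # benign (mean ~1.67)
print(assign_label([4, 5, 3, 4]))    # malignant (mean 4.0)
print(assign_label([3, 3, 3]))       # None (mean 3.0, excluded)
print(assign_label([5, 5]))          # None (only two readers)
```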
After applying these selection criteria, 82 patients were selected for classification, yielding 10,496 slices (6,912 malignant and 3,584 benign). The DICOM files containing the axial slices of the selected nodules were extracted from the CT scans and stored in a dedicated folder; the slice numbers were obtained from an Excel file.
Image preprocessing
The preprocessing stage aims to enhance the quality of CT images and isolate lung parenchyma for subsequent analysis. Each DICOM image was first converted to grayscale and normalized to ensure consistent intensity scaling. Contrast adjustment was applied to improve the visibility of pulmonary structures, followed by threshold-based segmentation to distinguish the lung region from surrounding tissues. Morphological operations, such as erosion and dilation, were performed to remove small artifacts and refine the lung boundaries.
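As an illustration of the grayscale normalization step, the sketch below converts stored DICOM pixel values to Hounsfield units (HU) with a rescale slope and intercept (the DICOM Rescale Slope and Rescale Intercept attributes), then clips them to a display window and rescales the result to [0, 1]. The default intercept and the lung window (center −600 HU, width 1500 HU) are common example values assumed here; the study does not report the values it used.

```python
from typing import List

Image = List[List[float]]


def to_hu(raw: List[List[int]], slope: float = 1.0, intercept: float = -1024.0) -> Image:
    """Stored pixel values -> Hounsfield units: HU = raw * slope + intercept."""
    return [[v * slope + intercept for v in row] for row in raw]


def window_normalize(hu: Image, center: float = -600.0, width: float = 1500.0) -> Image:
    """Clip HU values to [center - width/2, center + width/2] and rescale to [0, 1]."""
    lo, hi = center - width / 2, center + width / 2
    return [[(min(max(v, lo), hi) - lo) / (hi - lo) for v in row] for row in hu]


raw = [[0, 424], [1024, 2000]]       # hypothetical stored values of a 2x2 patch
img = window_normalize(to_hu(raw))
print(img)  # air-like, lung-like, water-like, and dense (clipped) pixels mapped into [0, 1]
```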
To eliminate the rib cage and trachea regions, edge removal techniques were employed, thereby improving the accuracy of subsequent nodule detection. The Sobel edge detection filter was applied to highlight potential nodule boundaries, serving as the initial candidate detection stage. The resulting binary masks were then filtered based on size and shape constraints to retain structures consistent with pulmonary nodules.19
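The Sobel operator estimates horizontal and vertical intensity derivatives with two 3×3 kernels and combines them into a gradient magnitude; thresholding that magnitude yields a binary edge map. A minimal pure-Python sketch follows; the threshold is a hypothetical parameter, and border pixels are simply set to zero rather than padded.

```python
import math
from typing import List

GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal derivative kernel
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical derivative kernel


def sobel_magnitude(img: List[List[float]]) -> List[List[float]]:
    """Gradient magnitude sqrt(gx^2 + gy^2); border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(GX[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            gy = sum(GY[i][j] * img[r + i - 1][c + j - 1] for i in range(3) for j in range(3))
            out[r][c] = math.hypot(gx, gy)
    return out


def edge_mask(img: List[List[float]], threshold: float) -> List[List[int]]:
    """Binary edge map: 1 where the Sobel magnitude exceeds the threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in sobel_magnitude(img)]


# Hypothetical 5x5 patch with a vertical step edge between columns 1 and 2.
patch = [[0, 0, 1, 1, 1] for _ in range(5)]
print(edge_mask(patch, threshold=2.0)[2])   # [0, 1, 1, 0, 0]
```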
Contrast adjustment and threshold selection were tuned to balance sensitivity and specificity; this balance is particularly delicate for ground-glass nodules, whose subtle intensity variations can complicate detection. These settings were adjusted iteratively to optimize nodule visibility while limiting false positives.20
Nodule detection
The proposed detection algorithm automatically identified pulmonary nodules in CT images from the LIDC-IDRI database. Building on the classification framework, the system used annotated reference data to guide nodule localization, CNN-based classification, and performance evaluation.
The overall workflow consisted of four main stages:
Image acquisition and preprocessing, including normalization and enhancement.
Segmentation of the pulmonary parenchyma to isolate the lung regions.
Detection of candidate nodules through morphological and edge-based operations.
Storage of the detected candidate regions for the classification step.
This structured pipeline ensured reliable detection and differentiation of pulmonary nodules while minimizing false positive results.
Lung segmentation
To reduce the nodule search space and focus the analysis on relevant areas, the lung parenchyma was segmented using Otsu’s automatic thresholding method combined with mathematical morphology operations. The Otsu algorithm, one of the most widely used automatic thresholding techniques, assumes that the image consists of two distinct classes of pixels (foreground and background) and selects the threshold that minimizes the intra-class variance, which is equivalent to maximizing the inter-class variance.17 The optimal threshold was then applied to the grayscale image to generate a binary image. Because aerated lung tissue has low attenuation on CT, pixels above the threshold correspond mainly to the denser surrounding tissues (foreground), whereas the lung regions fall below it (background).
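Otsu’s criterion can be evaluated exhaustively over a gray-level histogram. The sketch below assumes 8-bit gray levels (0 to 255) and returns the level t that maximizes the between-class variance of the classes {p ≤ t} and {p > t}; ties keep the first maximizing level. It is a generic illustration of the method, not the study’s MATLAB code.

```python
from typing import Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Gray level t maximizing the between-class variance of {p <= t} and {p > t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0                      # pixel count and intensity sum of the class p <= t
    best_t, best_score = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                     # one class is empty: no valid split at this t
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2   # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical bimodal data: 50 dark pixels (level 10) and 50 bright pixels (level 200).
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, sum(binary))   # 10 50
```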
After thresholding, the binary image underwent a morphological opening operation using a disc-shaped structuring element with a radius of 10 pixels. This step removed small, isolated regions and residual artifacts resulting from binarization, ensuring a cleaner segmentation of the lung parenchyma.21
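Morphological opening is an erosion followed by a dilation with the same structuring element, so it removes foreground structures that cannot contain the element while largely preserving larger ones. The pure-Python sketch below builds a disc-shaped element and treats pixels outside the image as background; for readability it uses a radius of 1 on a tiny hypothetical mask, whereas the study used a radius of 10 pixels.

```python
from typing import List, Tuple

Mask = List[List[int]]


def disk_offsets(radius: int) -> List[Tuple[int, int]]:
    """Offsets (dr, dc) of a disc-shaped structuring element."""
    return [(dr, dc) for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1) if dr * dr + dc * dc <= radius * radius]


def erode(mask: Mask, se: List[Tuple[int, int]]) -> Mask:
    h, w = len(mask), len(mask[0])
    # A pixel survives only if every element position is inside the image and foreground.
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr, dc in se)) for c in range(w)] for r in range(h)]


def dilate(mask: Mask, se: List[Tuple[int, int]]) -> Mask:
    h, w = len(mask), len(mask[0])
    # A pixel is set if the (reflected) element placed on it hits any foreground pixel.
    return [[int(any(0 <= r - dr < h and 0 <= c - dc < w and mask[r - dr][c - dc]
                     for dr, dc in se)) for c in range(w)] for r in range(h)]


def open_mask(mask: Mask, radius: int) -> Mask:
    se = disk_offsets(radius)
    return dilate(erode(mask, se), se)


# Hypothetical 8x8 mask: a 5x5 block (rows/cols 1-5) plus an isolated noise pixel at (7, 7).
mask = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(8)] for r in range(8)]
mask[7][7] = 1
opened = open_mask(mask, radius=1)
print(opened[7][7], sum(map(sum, opened)))   # 0 21: noise removed, block kept minus its corners
```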
Detection of candidate lung nodules
This step aims to automatically identify regions within the pulmonary parenchyma that may correspond to potential pulmonary nodules, referred to as nodule candidates. A contour-based segmentation approach was adopted to extract these regions of interest (ROIs), as it preserved the spatial localization of nodules.
Proposed classification model
Several studies have demonstrated the effectiveness of CNNs for pulmonary nodule classification because they extract features automatically from medical images. Building on this approach, the proposed classification algorithm follows the workflow described below, which provides a systematic pipeline for distinguishing benign from malignant nodules.
CNN workflow
The workflow of the proposed CNN-based nodule classification consisted of the following steps:
Load examples from the database: Test and validation images were loaded separately from the folders containing the preprocessed ROIs. Corresponding class labels were stored as categorical vectors.
Define the CNN structure: The network architecture was defined with input, convolutional, ReLU activation, pooling, fully connected, dropout, and Softmax layers. Filter sizes, number of kernels, and other hyperparameters were specified to optimize feature extraction.
CNN learning: The network was trained using the Stochastic Gradient Descent with Momentum (SGDM) optimizer with mini-batches of size 32, a learning rate of 0.001, a momentum of 0.9, and early stopping based on validation loss. Data augmentation was applied to improve generalization.
CNN testing: The trained network was evaluated on the independent test dataset to predict class probabilities for each nodule.
Performance calculation: Standard performance metrics were computed, including accuracy, precision, recall, specificity, F1 score, and Matthews correlation coefficient (MCC), to assess the classification performance of the model.
CNN architecture
The proposed CNN consists of the following layers: an input layer; a first convolutional layer, ReLU activation layer, and pooling layer; a second convolutional layer, ReLU activation layer, and pooling layer; fully connected and dropout layers; a Softmax layer; and a classification layer. This architecture is illustrated in Figure 1.
Lung nodule classification
As illustrated in Figure 1, the proposed CNN is designed to automatically learn discriminative features from CT images for lung nodule classification. The architecture consists of successive layers that progressively transform the input image into higher-level feature representations.
The input layer receives lung patch images obtained after preprocessing, equalization, and normalization. Two convolutional layers then extract local spatial features such as edges, textures, and nodule shapes. Each convolutional layer applies multiple 3×3 filters (20 in the first layer and 30 in the second) and is followed by a ReLU activation, which introduces non-linearity, accelerates convergence, and enables the network to model complex patterns.
Two max-pooling layers (2×2) reduce the spatial dimensions while retaining the most informative features, thereby limiting overfitting and computational complexity. The extracted features are then flattened and passed to fully connected layers that integrate the learned representations for classification. Dropout regularization is applied to further prevent overfitting. Finally, a Softmax layer outputs the probability of each class (benign vs. malignant).
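The feature-map sizes and trainable-parameter counts of this architecture can be checked with simple arithmetic. Because the input patch size, padding, pooling stride, and fully connected layer width are not reported, the sketch below assumes a hypothetical 64×64 single-channel input, unpadded ("valid") 3×3 convolutions with stride 1, 2×2 max pooling with stride 2, and a single fully connected layer mapping directly to the two classes. ReLU, dropout, and Softmax layers have no trainable parameters.

```python
from typing import List, Tuple


def out_size(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


def describe_cnn(input_hw: int = 64, in_ch: int = 1, n_classes: int = 2) -> Tuple[List[str], int]:
    """Layer-by-layer output shapes and trainable parameter counts (assumed settings)."""
    rows: List[str] = []
    total = 0
    size, ch = input_hw, in_ch
    for filters in (20, 30):                        # the two 3x3 conv layers described in the text
        size = out_size(size, kernel=3)             # 'valid' padding, stride 1 (assumed)
        params = 3 * 3 * ch * filters + filters     # 3x3xC weights per filter + one bias each
        ch, total = filters, total + params
        rows.append(f"conv3x3x{filters}: {size}x{size}x{ch}, {params} params")
        size = out_size(size, kernel=2, stride=2)   # 2x2 max pooling, stride 2 (assumed)
        rows.append(f"maxpool2x2: {size}x{size}x{ch}, 0 params")
    flat = size * size * ch
    fc_params = flat * n_classes + n_classes        # single fully connected layer (assumed)
    total += fc_params
    rows.append(f"fc: {flat} -> {n_classes}, {fc_params} params")
    return rows, total


layers, n_params = describe_cnn()
for line in layers:
    print(line)
print("total trainable parameters:", n_params)
```

Under these assumptions the two convolutional layers contribute 200 and 5,430 parameters regardless of the input size, while the fully connected layer, whose size does depend on the input, holds most of the roughly 17,000 parameters.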
For training, the weights and biases were initialized to 1, and network parameters were optimized using the SGDM algorithm to ensure stable and efficient convergence. The learning rate was set to 0.001 with a momentum factor of 0.9, a mini-batch size of 32, and 50 training epochs. Early stopping was applied based on validation loss to prevent overfitting and improve generalization.
Learning process
The training process divides the dataset into smaller subsets called mini-batches. Each mini-batch is fed into the network, which updates its parameters (weights and biases) according to the selected optimization algorithm. In this study, the SGDM optimizer was used, defined by the following update rule22,23:
$$\theta_{\ell+1} = \theta_{\ell} - \alpha \nabla E(\theta_{\ell}) + \gamma\,(\theta_{\ell} - \theta_{\ell-1})$$
where $\theta$ is the vector of network parameters (weights and biases), $\ell$ is the iteration index corresponding to the current mini-batch, $\alpha$ is the learning rate (0.001 in this study), $E(\theta_{\ell})$ is the error (loss) function evaluated at iteration $\ell$, $\nabla E(\theta_{\ell})$ is its gradient with respect to the parameters, and $\gamma$ is the momentum coefficient (0.9 in this study), which determines how much of the previous update is carried into the current iteration. Momentum accelerates convergence by reinforcing updates along consistent gradient directions and damping oscillations in regions of high curvature. Although SGDM is commonly used to train CNNs for medical image analysis, comparing it with alternative optimizers such as Adam would further justify this choice.
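The SGDM update can be implemented directly in its "previous step" form. The sketch below is a toy illustration: the gradient comes from a hypothetical quadratic loss with its minimum at (3, 3), the previous parameter vector is initialized to the starting point so the first step has no momentum, and the learning rate of 0.1 is chosen only so that the toy problem converges quickly (the study used a learning rate of 0.001 and a momentum of 0.9).

```python
from typing import Callable, List

Vector = List[float]


def sgdm(grad: Callable[[Vector], Vector], theta0: Vector,
         lr: float = 0.001, momentum: float = 0.9, steps: int = 100) -> Vector:
    """theta_next = theta - lr * grad(theta) + momentum * (theta - theta_prev)."""
    theta_prev = list(theta0)   # previous iterate; equal to theta0, so step 1 has no momentum
    theta = list(theta0)
    for _ in range(steps):
        g = grad(theta)
        theta_next = [t - lr * gi + momentum * (t - tp)
                      for t, gi, tp in zip(theta, g, theta_prev)]
        theta_prev, theta = theta, theta_next
    return theta


def quadratic_grad(theta: Vector) -> Vector:
    """Gradient of the toy loss E(theta) = 0.5 * sum((theta_i - 3)^2)."""
    return [t - 3.0 for t in theta]


result = sgdm(quadratic_grad, [0.0, 10.0], lr=0.1, momentum=0.9, steps=500)
print([round(v, 4) for v in result])   # [3.0, 3.0]
```

This form is equivalent to the more common velocity formulation (v ← γv − α∇E, θ ← θ + v), since θ − θ_prev is exactly the previous velocity.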
Figure 2 illustrates examples of various types of pulmonary nodules utilized during the training and testing phases, highlighting the diversity of the dataset in terms of nodule size, shape, and appearance.
Evaluation of the classification model: Statistical analysis
Evaluation metrics are used to assess the performance of a classification model. They are derived from the confusion matrix obtained after training and testing the model. The most common metrics are accuracy, precision, recall, F1 score, and MCC; specificity and negative predictive value (NPV) are also defined below. From the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), these metrics are calculated as follows24:
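$$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{NPV} = \frac{TN}{TN + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$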
The recall (also called sensitivity or true positive rate) measures the ability of the classifier to identify positive instances: it is the proportion of actual positive instances that were correctly classified as positive.
The specificity (or true negative rate) measures the ability of the classifier to identify negative instances: it is the proportion of actual negative instances that were correctly classified as negative. The NPV is the proportion of negative predictions that are actually negative.
Accuracy measures the overall correctness of the classifier, that is, the proportion of correctly classified instances among all instances.
Precision quantifies the reliability of positive predictions: it is the proportion of instances classified as positive that were actually positive. The F1 score is the harmonic mean of precision and recall and provides a balance between the two metrics; it is particularly useful when the positive and negative classes are imbalanced.
The MCC is a balanced metric that considers all four cells of the confusion matrix. It ranges from −1 to +1, where +1 represents perfect classification, 0 indicates performance no better than random guessing, and −1 indicates total disagreement between predictions and ground truth. A higher MCC indicates a better classifier.
Explainable AI (XAI) visualization
To enhance model interpretability and facilitate clinical validation, we incorporated XAI techniques that visualize the internal decision-making process of the deep learning classifier. Two complementary visualization methods were employed: gradient-weighted class activation mapping (Grad-CAM) and occlusion sensitivity analysis.
Grad-CAM highlights the most influential image regions that contribute to the model’s prediction by computing the gradient of the target class score with respect to the feature maps in the final convolutional layer. The resulting activation maps were superimposed on the original CT slices to visually identify discriminative regions corresponding to malignant or benign nodules.25
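The Grad-CAM combination step is short once the final-layer feature maps and the gradients of the class score with respect to them are available (obtaining those gradients requires a deep learning framework and is not shown). Each feature map is weighted by the spatial average of its gradient, the weighted maps are summed, and a ReLU keeps only positive evidence. In this pure-Python sketch the 2×2 inputs are hypothetical, and scaling to [0, 1] is an added convenience for display.

```python
from typing import List

Map = List[List[float]]


def grad_cam(feature_maps: List[Map], gradients: List[Map]) -> Map:
    """ReLU(sum_k alpha_k * A_k), with alpha_k = spatial mean of dScore/dA_k, scaled to [0, 1]."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    alphas = [sum(map(sum, g)) / (h * w) for g in gradients]      # global average pooling
    cam = [[max(0.0, sum(a * fm[r][c] for a, fm in zip(alphas, feature_maps)))  # ReLU
            for c in range(w)] for r in range(h)]
    peak = max(max(row) for row in cam)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Toy example: map 1 supports the class (positive gradients), map 2 opposes it.
A = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
dA = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(A, dA))   # [[1.0, 0.0], [0.0, 0.0]]
```

In practice the coarse map is upsampled to the CT slice size before being superimposed on the image.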
Occlusion sensitivity was used to assess the robustness of model predictions by systematically occluding portions of the input image and observing the corresponding change in classification probability. Regions where occlusion led to a significant drop in the predicted probability were considered critical for the model’s decision.26
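Occlusion sensitivity needs only forward passes: each patch is replaced by a fill value and the drop in the class score is recorded. The sketch below uses a hypothetical toy scoring function in place of the trained CNN; the patch size, stride, and fill value are illustrative choices, not the study’s settings.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(img: Image, score: Callable[[Image], float],
                  patch: int = 2, stride: int = 2, fill: float = 0.0) -> List[List[float]]:
    """Drop in `score` when each patch x patch window is replaced by `fill`."""
    base = score(img)
    h, w = len(img), len(img[0])
    drops = []
    for r in range(0, h - patch + 1, stride):
        row = []
        for c in range(0, w - patch + 1, stride):
            occluded = [list(line) for line in img]        # copy, then mask one window
            for i in range(r, r + patch):
                for j in range(c, c + patch):
                    occluded[i][j] = fill
            row.append(base - score(occluded))             # large drop = important region
        drops.append(row)
    return drops


def toy_score(im: Image) -> float:
    """Hypothetical stand-in for P(malignant): mean of the top-left 2x2 block."""
    return (im[0][0] + im[0][1] + im[1][0] + im[1][1]) / 4.0


img = [[1.0] * 4 for _ in range(4)]
print(occlusion_map(img, toy_score))   # [[1.0, 0.0], [0.0, 0.0]]
```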
Both visualization methods were applied to randomly selected malignant and benign cases from the LIDC-IDRI dataset. The resulting attention maps were normalized and color-coded (red indicating high importance, blue indicating low importance) to aid visual interpretation.
Results
Data augmentation
When data are imbalanced, an AI model tends to favor the majority class, which can bias the results and lead to inaccurate predictions for the minority class. Balancing the dataset helps train models that treat all classes fairly, yielding more reliable and less biased predictions. For data augmentation, we applied geometric transformations such as vertical flipping, horizontal flipping, and rotation by 25 degrees. The data were then oversampled with the Synthetic Minority Oversampling Technique (SMOTE), a widely used method that balances the classes by generating synthetic minority-class samples through interpolation between each minority sample and its nearest minority-class neighbors, rather than simply replicating existing samples. To further increase the size of the dataset, we used the Coarse Dropout technique. Figure 3 shows the distribution of the database elements before and after augmentation and balancing.
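SMOTE differs from simple duplication: each synthetic sample lies on the line segment between a randomly chosen minority sample and one of its k nearest minority-class neighbors. The pure-Python sketch below illustrates that interpolation on generic feature vectors; k = 5 follows the commonly used default, and the toy data, seed, and function name are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)                   # cannot use more neighbours than exist
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        # k nearest minority-class neighbours of x (Euclidean distance)
        neighbours = sorted(others, key=lambda m, x=x: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()                            # random position on the segment x -> nn
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Toy 2-D feature vectors for a minority class (hypothetical values).
benign = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(benign, n_new=6, k=2, seed=1)
print(len(benign) + len(new_samples))   # 10
```

For the counts in Table 1, balancing the benign class this way would require 6,912 − 3,584 = 3,328 synthetic slices. Treating each slice as a flattened pixel vector is one possible mapping to this sketch, but it is an assumption, since the text does not state how SMOTE was applied to the images.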
The database of selected images was divided into two parts: 80% for training the classifier and 20% for its evaluation.
The composition of each subset is summarized in Table 1.
Table 1 Composition of the database distribution
| Database | Malignant slices | Benign slices |
|---|---|---|
| Database before SMOTE | 6,912 | 3,584 |
| Database after SMOTE | 6,912 | 6,912 |
| Training database | 5,530 | 5,530 |
| Test database | 1,382 | 1,382 |
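The per-class counts in Table 1 follow from an 80/20 split of the 6,912 slices in each balanced class (0.8 × 6,912 = 5,529.6, rounded to 5,530). The sketch below reproduces these counts with a seeded shuffle applied separately to each class; the stratified, slice-level split and the slice identifiers are assumptions, since the text does not describe how the split was drawn.

```python
import random
from typing import List, Sequence, Tuple


def split_class(items: Sequence[str], train_frac: float = 0.8,
                seed: int = 42) -> Tuple[List[str], List[str]]:
    """Shuffle one class with a fixed seed and split it into training and test subsets."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    n_train = round(train_frac * len(items))        # 0.8 * 6912 = 5529.6 -> 5530
    return [items[i] for i in order[:n_train]], [items[i] for i in order[n_train:]]


# Hypothetical slice identifiers for the balanced classes of Table 1.
malignant = [f"mal_{i}" for i in range(6912)]
benign = [f"ben_{i}" for i in range(6912)]
mal_train, mal_test = split_class(malignant)
ben_train, ben_test = split_class(benign)
print(len(mal_train), len(mal_test), len(ben_train), len(ben_test))   # 5530 1382 5530 1382
```

Splitting at the patient level instead, so that all slices from one patient fall in the same subset, is a common way to keep closely related slices out of both the training and test sets.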
Image preprocessing
The preprocessing stage produced images containing only the enhanced thoracic region with improved visual quality, as illustrated in Figure 4.
While the preprocessing steps, including contrast adjustment and threshold selection, successfully reduced rib cage artifacts, specific adjustments for ground-glass nodules were not applied, which may slightly affect detection sensitivity for this subtype. Contrast adjustment and threshold selection are essential tools to balance sensitivity and specificity in the interpretation of ground-glass nodules. These settings should be adapted according to the clinical context and the objectives of the examination.
Segmentation and detection
For segmentation, a median filter was applied to the segmented lung image to reduce noise and smooth intensity variations, thereby improving contour clarity. Edge detection was then performed with a derivative-based (Sobel) operator to delineate the contours of potential nodules, as illustrated in Figure 4.
Following contour detection, the resulting image was labeled to identify and separate connected components, where each component corresponds to a distinct region. These regions were then isolated into individual binary masks. For each mask, hole-filling operations were performed to obtain homogeneous regions from the detected contours, followed by morphological erosion to eliminate small, irrelevant areas.
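The labeling, per-region hole filling, and size filtering described above can be sketched in pure Python. The sketch below labels 8-connected contour components, fills each component’s enclosed holes by flood-filling the background from the image border, and keeps regions whose filled area lies within hypothetical bounds; the final erosion step is omitted for brevity.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]
Pixels = List[Tuple[int, int]]

NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def label_components(mask: Mask) -> List[Pixels]:
    """Pixel lists of the 8-connected foreground components (breadth-first search)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components: List[Pixels] = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def fill_holes(mask: Mask) -> Mask:
    """Fill background pixels that are not 4-connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque((r, c) for r in range(h) for c in range(w)
                  if (r in (0, h - 1) or c in (0, w - 1)) and not mask[r][c])
    for r, c in queue:
        outside[r][c] = True
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx] and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[1 if mask[r][c] or not outside[r][c] else 0 for c in range(w)] for r in range(h)]


def candidate_regions(contours: Mask, min_area: int, max_area: int) -> List[Mask]:
    """Label contours, fill each region separately, and keep plausible sizes."""
    h, w = len(contours), len(contours[0])
    regions = []
    for pixels in label_components(contours):
        single = [[0] * w for _ in range(h)]
        for y, x in pixels:
            single[y][x] = 1
        filled = fill_holes(single)
        if min_area <= sum(map(sum, filled)) <= max_area:
            regions.append(filled)
    return regions


# Hypothetical 8x8 contour image: a closed 3x3 ring (hole at (2, 2)) and a 1-pixel speck.
contours = [[0] * 8 for _ in range(8)]
for r in range(1, 4):
    for c in range(1, 4):
        contours[r][c] = 0 if (r, c) == (2, 2) else 1
contours[6][6] = 1
regions = candidate_regions(contours, min_area=4, max_area=50)
print(len(label_components(contours)), len(regions))   # 2 1
```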
The resulting binary mask effectively delineated the thoracic region, as illustrated in Figure 5.
In this study, the background was first removed from the CT images to prepare them for accurate thresholding. The initial threshold was set to −950 Hounsfield units (HU), as most lung parenchymal regions typically have intensity values between −950 HU and −500 HU. To improve segmentation accuracy, the threshold was then recalculated iteratively using an error function based on gray-level variations in the image histogram. This adaptive process was applied independently to each CT slice, because the optimal threshold for one slice is not necessarily valid for another owing to inter-slice intensity variations.
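The text does not specify the error function used to update the threshold. As a stand-in, the sketch below uses the classic iterative intermeans (Ridler–Calvard) update, which moves the threshold to the midpoint of the mean intensities below and above it until the change is small. Only the −950 HU starting value comes from the study; the tolerance and the toy HU values are assumptions.

```python
from statistics import mean
from typing import Sequence


def iterative_threshold(hu_values: Sequence[float], t0: float = -950.0,
                        tol: float = 0.5, max_iter: int = 100) -> float:
    """Iterative intermeans threshold (Ridler-Calvard style) for one slice, in HU."""
    t = t0
    for _ in range(max_iter):
        low = [v for v in hu_values if v <= t]
        high = [v for v in hu_values if v > t]
        if not low or not high:
            break                               # t lies outside the intensity range
        new_t = (mean(low) + mean(high)) / 2.0  # midpoint of the two class means
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
    return t


# Hypothetical slice: outside air (-1000 HU), lung parenchyma (-850 HU), soft tissue (40 HU).
values = [-1000.0] * 30 + [-850.0] * 30 + [40.0] * 40
print(iterative_threshold(values))   # -442.5
```

Run separately on each slice, as in the text, the update converges here in three iterations to a threshold that separates air and lung from soft tissue.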
Subsequently, the binary mask was inverted by reversing pixel intensities: black pixels were converted to white, and white pixels to black, in order to isolate the lung parenchyma. The edges of the bright regions, corresponding to the pulmonary lobes, were then removed to refine the segmentation. Finally, the resulting mask was superimposed onto the original CT image to extract the parenchymal region, as illustrated in Figure 5.
This process effectively reduced the number of detected regions. For instance, in the example shown in Figure 6, the number of labeled regions decreased from twelve to five after applying the filtering and morphological refinement steps. The final set of nodule candidates obtained through the proposed method is presented in Figure 6.
To evaluate the proposed algorithm, the network was first trained on the balanced training database to learn its parameters and then tested on the held-out test database. The results are given as the confusion matrix in Table 2.
Table 2 Confusion matrix of the proposed model on the test database
| Ground truth (rows) / Prediction (columns) | Malignant | Benign |
|---|---|---|
| Malignant | 1,370 | 38 |
| Benign | 12 | 1,344 |
From this matrix, we calculated the evaluation metrics, which are summarized in Figure 7.
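The metrics can be recomputed directly from the confusion matrix. The sketch below applies the formulas from the statistical analysis section, treating malignant as the positive class and reading Table 2 with rows as ground truth and columns as predictions; this reading reproduces the values reported for the proposed model in Table 3. The function name is hypothetical.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Confusion-matrix metrics with malignant as the positive class."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
        "precision": precision,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn)
               / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Table 2 with rows = ground truth, columns = prediction:
# malignant row -> TP = 1370, FN = 38; benign row -> FP = 12, TN = 1344.
# Row totals under this reading: 1,408 malignant and 1,356 benign cases.
m = classification_metrics(tp=1370, fn=38, fp=12, tn=1344)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```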
Model interpretability and visualization
Representative visualization results are presented in Figure 8, illustrating the interpretability of the proposed CAD model for both malignant and benign nodules. For malignant nodules, Grad-CAM and occlusion maps consistently highlighted the core and irregular margins of the lesions, areas that radiologists typically associate with malignancy because of spiculated edges, heterogeneous texture, and high attenuation. The model’s attention coincided closely with these clinically meaningful regions, and these cases had high predicted malignancy probabilities (P(malignant) ≈ 0.85–0.98). For benign nodules, activation patterns were more diffuse, extending over smooth, well-defined borders and homogeneous interior regions, consistent with benign morphological characteristics. These cases showed substantially lower confidence scores (P(malignant) ≈ 0.05–0.30). Overall, combining Grad-CAM and occlusion sensitivity improved the model’s interpretability by revealing decision-relevant image regions. The XAI results suggest that the proposed CAD system bases its predictions on radiologically plausible features, which supports its transparency and its potential for clinical deployment.
Discussion
The proposed deep learning–based model achieved high performance for the automatic detection and classification of pulmonary nodules. The classifier reached an accuracy of 98.19%, a specificity of 99.12%, a sensitivity of 97.30%, a precision of 99.13%, an F1 score of 98.21%, and an MCC of 0.96. These results indicate that the model differentiates well between benign and malignant nodules with few false positives, which may improve diagnostic consistency and help radiologists in early lung cancer detection.
Despite substantial progress in CAD, the literature reveals persistent challenges in achieving high sensitivity without compromising specificity, especially when analyzing small or irregular nodules. Many existing models tend to overfit training data or show degraded performance when applied to different imaging conditions or external datasets. Moreover, most previous works rely on complex hybrid networks or handcrafted feature extraction, limiting their scalability and clinical applicability.
The present study was conducted using standard-dose CT scans. Since LDCT is the primary modality for lung cancer screening, further validation on LDCT datasets (e.g., NLST) will be pursued to confirm the model’s generalizability under screening conditions.
To situate the performance of the developed system, we compared its classification results with those of other studies conducted on the same database (Table 3).27–37
Table 3 Performance comparison of the proposed model with existing research works
| Research works | Recall (%) | Specificity (%) | Accuracy (%) | Precision (%) | F1-score (%) | MCC |
|---|---|---|---|---|---|---|
| Proposed model | 97.30 | 99.12 | 98.19 | 99.13 | 98.21 | 0.96 |
| Tsuchiya et al. (2025)27 | 93.97 | 89.83 | 88.79 | – | – | – |
| Shaini et al. (2025)28 | 97.8 | – | 98.2 | 97.3 | 98.0 | – |
| Luo et al. (2024)30 | 53.25 | – | 99.81 | 65.02 | 58.55 | 0.59 |
| Susan et al. (2024)29 | 81.60 | 98.5 | 95.56 | 92.0 | 86.5 | 0.840 |
| Nair et al. (2024)31 | 93.00 | 92.10 | 92.90 | – | – | – |
| VRN et al. (2023)32 | 98.33 | 91.18 | 99.09 | 98.33 | 98.33 | – |
| Lai et al. (2021)33 | 92.5 | 95.8 | 95.25 | 82.3 | 87.1 | 0.845 |
| Gogineni et al. (2020)34 | 85.1 | 97.4 | 95.25 | 87.3 | 86.2 | 0.833 |
| Ozdemir et al. (2019)35 | 96.00 | 97.30 | 97.20 | – | – | – |
| Song et al. (2017)36 | 75.2 | 96.2 | 92.47 | 80.3 | 77.66 | 0.732 |
| Li et al. (2016)37 | – | – | 86.40 | 89.0 | 87.7 | – |
As shown in Table 3, the proposed model performs competitively with existing approaches for pulmonary nodule detection and classification. It obtained a recall of 97.30%, a specificity of 99.12%, and an accuracy of 98.19%, and its precision (99.13%), F1-score (98.21%), and MCC (0.96) indicate balanced classification performance. It outperforms the methods of Tsuchiya et al.27 and Susan et al.29 on all reported metrics. Compared with Shaini et al.,28 it achieves similar accuracy (98.19% vs. 98.2%) with higher precision (99.13% vs. 97.3%) and F1-score (98.21% vs. 98.0%), but slightly lower recall (97.30% vs. 97.8%). VRN et al.32 reported higher recall (98.33%), accuracy (99.09%), and F1-score (98.33%), but markedly lower specificity (91.18%). Although Luo et al.30 achieved a higher accuracy (99.81%) than our model (98.19%), their recall (53.25%) and F1-score (58.55%) were much lower, indicating weaker sensitivity and overall balance; in contrast, the proposed model performs consistently well across all metrics, limiting false positives while maintaining a high true positive rate. Moreover, the proposed architecture is simpler than many hybrid or attention-based models, which may make it easier to deploy in clinical workflows. These comparative results support the practical value of the proposed approach for automated lung nodule analysis.
Limitations and future directions
This study is limited by the use of a single public dataset (LIDC-IDRI), which may affect generalizability across different scanners and populations. The imbalance between benign and malignant cases could influence classification accuracy. Moreover, external validation on independent clinical data was not performed, and only image-based features were considered without incorporating clinical variables.
Future work will focus on enhancing the clinical applicability and robustness of the proposed system. Specifically, we plan to:
Validate the model on external datasets such as ELCAP and NELSON to assess generalizability across different imaging protocols and populations.
Explore multi-class classification of pulmonary nodules (solid, part-solid, and pure ground-glass) to provide more detailed diagnostic information.
Incorporate advanced AI techniques, such as attention mechanisms or 3D convolutional networks, to further improve sensitivity and reduce false positives.
Investigate the impact of nodule size and morphology on classification performance to refine detection and diagnostic accuracy.
Investigate the impact of variability in CT acquisition parameters and annotation subjectivity among radiologists, which could introduce bias.
Investigate the impact of transfer learning.
Extend the XAI analysis beyond the Grad-CAM and occlusion sensitivity maps used here to further enhance interpretability and support clinical decision-making.
These directions aim to strengthen the reliability, reproducibility, and clinical relevance of the proposed approach in real-world applications.
Conclusions
This study developed and evaluated a CNN-based system for the automatic detection and classification of pulmonary nodules using the LIDC-IDRI dataset. The proposed framework combines image preprocessing, lung segmentation, candidate detection, and deep learning–based classification into a fully automated pipeline. The model achieved strong performance, with 97.30% sensitivity, 99.12% specificity, 99.13% precision, 98.19% accuracy, an F1-score of 98.21%, and an MCC of 0.96, indicating that it reliably distinguishes benign from malignant nodules.
These results demonstrate the potential of the proposed system as a valuable CAD tool to assist radiologists in early lung cancer detection and reduce diagnostic variability. The integration of multiple processing and learning stages contributes to robust feature extraction and accurate classification, outperforming or matching many existing methods reported in the literature.
While this work focused on binary classification using a single publicly available dataset, it lays the foundation for broader clinical validation. Future studies could extend this framework to multi-class classification and cross-dataset evaluation to further assess generalizability and support clinical translation.
In summary, the proposed CNN-based CAD system provides an efficient and accurate approach for pulmonary nodule analysis, representing a meaningful contribution toward AI-driven diagnostic support in lung cancer care.
Declarations
Acknowledgement
The authors would like to thank all who contributed to making this research successful. Special thanks are due to all the medical staff (radiologists and nuclear medicine doctors) of the Hospital of Abderrahman Mami, who contributed to this work with their comments and clinical advice.
Ethical statement
This study used the publicly available LIDC-IDRI dataset, which is de-identified and freely accessible for research purposes. No additional ethical approval was required.
Data sharing statement
The data used in this study are publicly available from the LIDC-IDRI database: (https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI). Researchers can access the dataset freely for research purposes. Further inquiries can be directed to the corresponding author.
Funding
We confirm that no external funding was received for this study. The research was conducted independently.
Conflict of interest
The authors declare no conflict of interest related to this publication.
Authors’ contributions
Study concept and design (SL, MK), acquisition of data (SL, MK), analysis and interpretation of data (SL, MK), drafting of the manuscript (SL, MK), critical revision of the manuscript for important intellectual content (SL, BSR), administrative, technical, or material support (SL, BSR), and study supervision (BSR). All authors have made a significant contribution to this study and have approved the final manuscript.