
Deep learning can be used to solve many issues in the medical field. It can make medical diagnoses faster and more accurate, and support better treatment decisions. With readily available X-ray data, you can train a model to predict specific health conditions from an image. In this case study, we examine a deep learning project that aims to predict chest conditions from a publicly available collection of chest X-rays.

Problem

Chest X-ray exams are among the most frequent and cost-effective medical imaging examinations available. However, clinical diagnosis from a chest X-ray can be challenging, and sometimes more difficult than diagnosis via chest CT imaging. The lack of large, annotated, publicly available datasets means it is still very difficult, if not impossible, to achieve clinically relevant computer-aided detection and diagnosis (CAD) with chest X-rays in real-world medical settings.

One major hurdle in creating large X-ray image datasets is the lack of resources for labeling so many images. Prior to the release of this dataset, Openi was the largest publicly available source of chest X-ray images, with 4,143 images. The NIH Chest X-ray Dataset comprises 112,120 X-ray images with disease labels from 30,805 unique patients. To create these labels, the authors used natural language processing to text-mine disease classifications from the associated radiology reports. The labels are expected to be more than 90% accurate and suitable for weakly supervised learning. The original radiology reports are not publicly available, but you can find more details on the labeling process in the open-access paper "ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases" (Wang et al.). To work with this data, we decided to tackle the ChestX-ray Kaggle challenge as a computer vision project: more than 100,000 images of size 2000×2000 pixels, roughly 50 GB of image data in total.
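Two practical details follow from the dataset description above: each image carries a pipe-separated list of text-mined findings (so this is a multi-label problem), and the 112,120 images come from only 30,805 patients (so a naive per-image train/validation split would leak near-duplicate images of the same patient across sets). The sketch below illustrates both points. It assumes a metadata CSV in the shape of the Kaggle release's `Data_Entry_2017.csv`, with `Finding Labels` and `Patient ID` columns; the exact column names and the choice of the eight ChestX-ray8 findings are assumptions for illustration, not a definitive preprocessing pipeline.

```python
import csv
import io
import random
from collections import defaultdict

# Eight findings benchmarked in the ChestX-ray8 paper; the full NIH
# release tags more. Adjust this list to your copy of the metadata.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration",
    "Mass", "Nodule", "Pneumonia", "Pneumothorax",
]

def encode_labels(finding_labels: str) -> list[int]:
    """Turn a pipe-separated string like 'Cardiomegaly|Effusion'
    into a multi-hot vector aligned with PATHOLOGIES.
    'No Finding' simply encodes to all zeros."""
    findings = set(finding_labels.split("|"))
    return [1 if p in findings else 0 for p in PATHOLOGIES]

def patient_level_split(rows, val_fraction=0.2, seed=0):
    """Split image rows so that no patient appears in both sets,
    avoiding leakage of near-duplicate images of one patient."""
    by_patient = defaultdict(list)
    for row in rows:
        by_patient[row["Patient ID"]].append(row)
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = int(len(patients) * val_fraction)
    val_patients = set(patients[:n_val])
    train = [r for p in patients if p not in val_patients
             for r in by_patient[p]]
    val = [r for p in patients if p in val_patients
           for r in by_patient[p]]
    return train, val

# Tiny inline sample standing in for Data_Entry_2017.csv
# (hypothetical rows, real column layout assumed from the Kaggle release):
sample = """Image Index,Finding Labels,Patient ID
00000001_000.png,Cardiomegaly|Effusion,1
00000001_001.png,Cardiomegaly,1
00000002_000.png,No Finding,2
00000003_000.png,Pneumonia,3
"""
rows = list(csv.DictReader(io.StringIO(sample)))
train, val = patient_level_split(rows, val_fraction=0.34)
print(encode_labels(rows[0]["Finding Labels"]))  # [0, 1, 1, 0, 0, 0, 0, 0]
```

Grouping the split by patient rather than by image mirrors what tools such as scikit-learn's `GroupShuffleSplit` do; with the real CSV you would read the file from disk instead of the inline sample.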
