
Can Artificial Intelligence Diagnose Skin Lesions? With Philipp Tschandl, MD, PhD

Melanoma with uneven border.

Philipp Tschandl, MD, PhD, and colleagues found that current artificial intelligence (AI) algorithms that use "deep learning"—a type of machine learning based on artificial neural networks—outperform humans, even experts, in the classification of pigmented skin lesions. In this interview with i3 Health, Dr. Tschandl, a member of the Vienna Dermatologic Imaging Research (ViDIR) Group at the Medical University of Vienna's Department of Dermatology, discusses the significance of the study's results and shares his thoughts on how machine-learning algorithms might shape the future of dermatology.

What led you to research the accuracy of machine-learning algorithms versus human readers in the classification of pigmented skin lesions?

Philipp Tschandl, MD, PhD: Automated image recognition has become significantly more accurate with the improvements in deep learning over the last few years. Skin cancer detection is, among other things, a process highly dependent on visual pattern analysis; it is therefore one of the showcase applications of artificial intelligence in medicine.

Previous studies on dermatoscopic images (a dermatoscope is a magnifying lens with cross-polarized light that dermatologists use to check your moles) were restricted in multiple ways, which we tried to overcome.

First, they only distinguished moles from melanomas. In real life, there are multiple different skin changes that can present as pigmented spots. In our study, we integrated the seven most common classes of pigmented skin lesions, which should cover over 90% of the changes seen in daily practice.

Second, they compared human readers with only one of their own algorithms. In this study, we invited machine-learning groups around the world to participate in the International Skin Imaging Collaboration (ISIC) 2018 challenge, through which we gathered 139 algorithms to test. As this was a public challenge, anyone could participate, not only "elite" research groups.

Finally, previous studies usually compared algorithms with only a rather limited number of dermatologists. Artificial intelligence models are most likely more helpful to non-specialists than to already specialized dermatologists. This is why we made a public comparison in this study, involving not only expert dermatologists but also general practitioners, dermatology residents, and others: in sum, more than 500 participants.

Did the results of this study surprise you? Why or why not?

Dr. Tschandl: That algorithms perform better than human participants in such an experiment was not fully surprising, as similar findings were seen in previous studies. We were, in fact, a little surprised by the magnitude of the difference in comparison with experts in skin cancer detection, whose performance in clinical practice is excellent. This should not worry patients, as absolute numbers of "accurate specific diagnoses" from such experiments cannot be transferred directly to correct management decisions and diagnoses in real life.

We did not expect that the algorithms would detect pigmented early stages of "white" skin cancer (actinic keratosis/Bowen's disease [intraepithelial carcinoma], the class "AKIEC" within the study) that well; in fact, we thought the opposite. These are truly very hard to diagnose, especially on digital images, as they can mimic melanomas and aging spots on sun-damaged skin. We are trying to find out why the algorithms were so good at this. Hopefully there is something for us to learn so that dermatologists can enhance their abilities as well!

While algorithms outperformed the average participant in detecting a malignant skin change, we also performed an analysis where we created a "collective" human vote for all images. In that case, humans were on par with the accuracy of the very best machine-learning groups.
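The interview does not spell out how the collective vote was computed; a plurality vote over each image's reader answers is one straightforward scheme. The sketch below is a minimal illustration with made-up labels, not the study's code.

```python
from collections import Counter

def collective_vote(reader_diagnoses):
    """Return the plurality ('collective') diagnosis for one image.

    reader_diagnoses: list of class labels, one per human reader.
    Ties are broken by whichever label was seen first.
    """
    counts = Counter(reader_diagnoses)
    label, _ = counts.most_common(1)[0]
    return label

# Illustrative example: three readers rate the same lesion image.
print(collective_vote(["melanoma", "nevus", "melanoma"]))  # -> "melanoma"
```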

What further investigations are needed on this topic?

Dr. Tschandl: Unfortunately, these models are not ready to function on their own, for multiple reasons. For example, the algorithms currently cannot tell you when they don't know the answer. If you show an algorithm an image of a skin lesion class that was not present in our dataset, it still tries to predict one of the seven classes that it "knows," when instead it should say, "This is something different; I don't know."
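To make that limitation concrete: one common mitigation, not part of this study, is to let a classifier abstain whenever its top softmax probability falls below a threshold. The sketch below uses placeholder class names and probabilities; thresholding like this is a crude heuristic, and reliably detecting out-of-distribution inputs remains an open research problem.

```python
import numpy as np

# Placeholder names for the seven lesion classes in the study.
CLASSES = [f"class_{i}" for i in range(1, 8)]

def predict_or_abstain(softmax_probs, threshold=0.5):
    """Return a class label, or 'unknown' when confidence is too low.

    softmax_probs: array of 7 probabilities summing to 1. A plain
    classifier always returns its top class; thresholding the top
    probability is one simple way to let it say "I don't know."
    """
    top = int(np.argmax(softmax_probs))
    if softmax_probs[top] < threshold:
        return "unknown"  # possibly out-of-distribution or ambiguous input
    return CLASSES[top]

# A confident prediction vs. a flat, uncertain one:
print(predict_or_abstain(np.array([0.90, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01])))  # class_1
print(predict_or_abstain(np.array([0.20, 0.15, 0.15, 0.15, 0.15, 0.10, 0.10])))  # unknown
```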

In addition, the current algorithms do not generalize to all areas of the world in the same way. We "implanted" images from skin cancer centers in countries that had not contributed to the training images and found that the machines outperformed the human participants by significantly less in those cases.

While the measured accuracy is better than that of humans, we currently don't know how to transfer this performance of machines to clinically meaningful decisions that are beneficial to patients. Knowledge regarding how to design the best interaction models between human and machine is still missing. This is something that we are currently researching through our free study and teaching platform, dermachallenge.com. On this site, anyone is invited to play and find skin cancer together with a machine and interact with the algorithm's output. Through this, we want to learn more about the interplay of human and machine.

In the future, what role do you think that machine-learning algorithms will end up playing in the diagnosis of pigmented skin lesions and other conditions?

Dr. Tschandl: As of today, I think that the two main applications will be 1) as decision support to non-specialized health care workers (eg, general practitioners or physician assistants) so that more decisions can be made without time-consuming consultation of a specialist, and 2) pre-screening of patients visiting a (tele)dermatologist in order to enhance and speed up doctors' workflow. The latter may be very welcome for physicians, as it would free up time for meaningful doctor-patient interactions; these are becoming increasingly rare due to the economic pressure within health care systems.

Having patients themselves use an artificial intelligence like the one in this study poses multiple problems. First, such implementations would need an even higher level of scrutiny and accuracy in prospective clinical trials, which are expensive. Second, patients are not used to interpreting diagnostic probabilities in daily life. For example, what would someone do if the AI model were to say that he or she had a 5% probability of melanoma? Would this be regarded as low or high? One needs to balance these outputs in a clinically meaningful way, which physicians inherently do every day.

Lastly, very small changes in the specificity of such models may have very large effects if implemented on a large scale, something that is equally true for any screening test in medicine.
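A back-of-the-envelope sketch illustrates that scale effect; the population size and prevalence below are assumed for illustration, not figures from the study.

```python
def false_positives(n_screened, prevalence, specificity):
    """Healthy people incorrectly flagged by a screening test."""
    healthy = n_screened * (1 - prevalence)  # people without melanoma
    return healthy * (1 - specificity)       # of those, the fraction misflagged

# Assumed numbers: 1,000,000 people screened, 0.5% melanoma prevalence.
for spec in (0.97, 0.95):
    fp = false_positives(1_000_000, 0.005, spec)
    print(f"specificity {spec:.0%}: {fp:,.0f} false positives")
# 97% -> 29,850 false positives; 95% -> 49,750. A two-point drop in
# specificity adds ~19,900 unnecessary referrals in one screening round.
```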

For More Information

Tschandl P, Codella N, Akay BN, et al (2019). Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol. [Epub ahead of print] DOI:10.1016/S1470-2045(19)30333-X

Image credit: Skin Cancer Foundation

Transcript edited for clarity. Any views expressed above are the speaker's own and do not necessarily represent those of i3 Health.

