[medRxiv]

Diagnosing and remediating harmful data shifts for the responsible deployment of clinical AI models

Harmful data shifts occur when the distribution of data used to train a clinical AI system differs significantly from the distribution of data encountered during deployment, leading to erroneous predictions and potential harm to patients. We evaluated the impact of data shifts on an early warning system for in-hospital mortality that uses electronic health record data from patients admitted to a general internal medicine service, across 7 large hospitals in Toronto, Canada. To explore the robustness of the model, we evaluated potentially harmful data shifts across demographics, hospital types, seasons, time of hospital admission, and whether the patient was admitted from an acute care institution or nursing home, without relying on model performance. Interestingly, many of these harmful data shifts were unidirectional. Overall, our study is a crucial step towards the deployment of clinical AI models, by providing strategies and workflows to ensure the safety and efficacy of these models in real-world settings.
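The study's drift diagnostics are performance-agnostic, i.e., they flag distribution change without waiting for model errors to accumulate. As an illustration of that idea only (not the paper's actual method), here is a sketch of one common label-free drift metric, the population stability index (PSI), computed over a hypothetical lab value:

```python
import numpy as np

def population_stability_index(expected, observed, bins=10):
    """PSI between a reference sample (e.g., training-era data) and a
    deployment-era sample. PSI near 0 means no shift; values above
    roughly 0.2 are often treated as a shift worth investigating."""
    # Bin edges taken from the reference distribution's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # cover the full support
    eps = 1e-6  # avoid log(0) in sparse bins
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    o_frac = np.histogram(observed, bins=edges)[0] / len(observed) + eps
    return float(np.sum((o_frac - e_frac) * np.log(o_frac / e_frac)))

rng = np.random.default_rng(0)
train_labs = rng.normal(loc=100, scale=15, size=5000)  # hypothetical lab value
same_dist = rng.normal(loc=100, scale=15, size=5000)   # no shift
shifted = rng.normal(loc=120, scale=15, size=5000)     # deployment-era shift

psi_same = population_stability_index(train_labs, same_dist)
psi_shifted = population_stability_index(train_labs, shifted)
```

Because PSI compares the two samples directly, it can be monitored continuously in deployment without ground-truth mortality labels.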


Collaboration work with Vector Institute and GEMINI


[IEEE Access]

Evaluating knowledge transfer in the neural network for medical images

In this study, we implement several standard transfer learning approaches as baselines and introduce a novel knowledge transfer approach, the teacher-student learning framework, to improve the performance of diagnostic predictive models in medical imaging. Specifically, we investigate various configurations of the teacher-student learning framework, inspired by activation attention transfer in computer vision models, to help address some of the challenges faced in medical imaging, such as the limited availability of annotated data and limited computing resources. We show that the teacher-student learning approach has the potential to significantly improve the performance of diagnostic predictive models. Our findings could have a positive impact on healthcare accessibility and affordability, as they may enable the development of more cost-effective and widely available medical imaging technologies in limited-data environments.
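As a rough sketch of the activation-based attention transfer idea underlying the teacher-student framework (illustrative only; the paper's exact formulation and training setup are not reproduced here), the student is trained to match the teacher's normalized spatial attention maps:

```python
import numpy as np

def attention_map(feats):
    """Spatial attention map from a conv feature block:
    sum of squared activations over channels, then L2-normalized.
    feats: array of shape (channels, height, width)."""
    amap = np.sum(feats ** 2, axis=0).ravel()
    return amap / (np.linalg.norm(amap) + 1e-8)

def attention_transfer_loss(student_feats, teacher_feats):
    """L2 distance between normalized attention maps; in training this
    term is minimized alongside the student's usual task loss."""
    return float(np.linalg.norm(
        attention_map(student_feats) - attention_map(teacher_feats)))

rng = np.random.default_rng(1)
teacher = rng.normal(size=(64, 7, 7))  # large teacher's conv features
student = rng.normal(size=(16, 7, 7))  # smaller student, same spatial size
loss = attention_transfer_loss(student, teacher)
```

Note that the channel counts may differ between teacher and student; only the spatial resolutions of the compared feature blocks must agree.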

[Nature]

Transitioning sleeping position detection in late pregnancy using computer vision from controlled to real-world settings: an observational study

Sleeping on the back after 28 weeks of pregnancy has recently been associated with giving birth to a small-for-gestational-age infant and late stillbirth, but whether a causal relationship exists is currently unknown and difficult to study prospectively. This study was conducted to build a computer vision model that can automatically detect sleeping position in pregnancy under real-world conditions. Real-world overnight video recordings were collected from an ongoing, Canada-wide, prospective, four-night, home sleep apnea study, and controlled-setting video recordings were used from a previous study. Images were extracted from the videos and body positions were annotated. Five-fold cross-validation was used to train, validate, and test a model using state-of-the-art deep convolutional neural networks. The dataset contained 39 pregnant participants, 13 bed partners, 12,930 images, and 47,001 annotations.
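One practical detail in this kind of cross-validation is keeping all images from a single participant in one fold, so the same person never appears in both train and test splits. The abstract does not spell out its grouping scheme, so the following is a hypothetical sketch of participant-grouped fold assignment:

```python
from collections import defaultdict

def grouped_kfold(sample_groups, k=5):
    """Split sample indices into k folds so that every sample from one
    participant (group) lands in the same fold, preventing leakage of a
    person's images between train and test."""
    by_group = defaultdict(list)
    for idx, group in enumerate(sample_groups):
        by_group[group].append(idx)
    # Greedy balancing: assign the largest groups first to the
    # currently smallest fold
    folds = [[] for _ in range(k)]
    for group, idxs in sorted(by_group.items(), key=lambda kv: -len(kv[1])):
        min(folds, key=len).extend(idxs)
    return folds

# Toy example: image index -> participant id for 15 images from 6 people
groups = ["p1"] * 4 + ["p2"] * 3 + ["p3"] * 3 + ["p4"] * 2 + ["p5"] * 2 + ["p6"]
folds = grouped_kfold(groups, k=5)
```

At each iteration, one fold serves as the test set and the remaining four are used for training and validation.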

[YouTube]

Collaboration work with UHN

[PLoS Digital Health]

Sleep in Late Pregnancy: Artificial Intelligence for the Detection and Diagnosis of Disturbances and Disorders (SLeeP AID4)

Recognizing that approximately one-third of pregnancy is spent asleep, there has been a surge of clinical and research interest over the last two decades in the potential roles that sleep (and poor sleep) during pregnancy might play in adverse pregnancy outcomes. In this project, we are conducting multi-night, multi-participant, in-home sleep apnea studies in late pregnancy and developing video-based sleep apnea diagnosis technology. We believe the old adage, "diagnosis is treatment." This technology will eventually be used to streamline the diagnosis of sleep apnea in pregnancy so that it can be urgently triaged for appropriate management. This technology will also be able to diagnose sleep apnea in the bed partner simultaneously.

[YouTube]

Collaboration work with UHN

[JMIR] 

Using Social Media to Help Understand Patient-Reported Health Outcomes of Post–COVID-19 Condition: Natural Language Processing Approach

In response to the emergence of long COVID, we developed an NLP pipeline to facilitate extracting information from user-reported experiences on social media platforms. In this study, we examined the validity and effectiveness of our NLP pipeline in providing insights into patient-reported long COVID-related health outcomes across two popular social media platforms, Twitter and Reddit. In doing so, we extracted symptoms and conditions (SyCo) and estimated their occurrence frequency. We compared the outputs with human annotations and with highly utilized clinical outcomes grounded in the medical literature. Lastly, we tracked occurrences of SyCo terms over time and across geographies to explore the pipeline's potential as a surveillance tool reflecting users' opinions and experiences. The outcome of our social media-derived pipeline is comparable with the results of peer-reviewed articles relevant to long COVID symptoms. Overall, this study provides unique insights into patient-reported health outcomes from long COVID and valuable information about the patient's journey that can help healthcare providers anticipate future needs.
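As a toy illustration of the SyCo frequency-estimation step (the real pipeline relies on trained NLP models to normalize free text to clinical concepts; this hypothetical keyword lexicon is illustrative only):

```python
from collections import Counter

# Illustrative lexicon mapping a symptom to surface phrases;
# a production pipeline would map text to clinical vocabularies instead
LEXICON = {
    "fatigue": ["fatigue", "exhausted", "tired all the time"],
    "brain fog": ["brain fog", "can't concentrate"],
    "shortness of breath": ["shortness of breath", "short of breath"],
}

def count_symptoms(posts):
    """Count how many posts mention each symptom at least once."""
    counts = Counter()
    for post in posts:
        text = post.lower()
        for symptom, phrases in LEXICON.items():
            if any(phrase in text for phrase in phrases):
                counts[symptom] += 1
    return counts

posts = [
    "Six months after covid and I'm exhausted every day.",
    "The brain fog is the worst part, can't concentrate at work.",
    "Still short of breath climbing stairs.",
]
freqs = count_symptoms(posts)
```

Aggregating such per-post counts by posting date and user location is what enables the time- and geography-based surveillance views described above.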


Collaboration work with Vector Institute, Roche, Deloitte, TELUS, and UHN

[JMIR] 

Natural Language Processing for Clinical Laboratory Data Repository Systems: Implementation and Evaluation for Respiratory Viruses

This study explores the feasibility of using a deep learning-based natural language processing (NLP) model for information extraction from unstructured laboratory reports. The NLP model, trained on a large corpus of annotated laboratory reports, demonstrated strong performance in extracting clinically meaningful medical concepts. The model's stability and generalizability were evaluated across different test sets, showing consistently high accuracy. The study highlights the potential of deep learning-based NLP models to automate the parsing of laboratory data, enabling scalable and efficient access to valuable information for decision support and analysis.


Collaboration work with Vector Institute and ICES

[EMNLP - ACL Anthology] 

Bringing the State-of-the-Art to Customers: A Neural Agent Assistant Framework for Customer Service Support

Building Agent Assistants that can help improve customer service support requires inputs from industry users and their customers, as well as knowledge about state-of-the-art Natural Language Processing (NLP) technology. We combine expertise from academia and industry to bridge the gap and build task/domain-specific Neural Agent Assistants (NAA) with three high-level components for: (1) Intent Identification, (2) Context Retrieval, and (3) Response Generation. In this paper, we outline the pipeline of the NAA's core system and also present three case studies in which three industry partners successfully adapt the framework to find solutions to their unique challenges. Our findings suggest that a collaborative process is instrumental in spurring the development of emerging NLP models for Conversational AI tasks in industry. 


Collaboration work with Vector Institute, KPMG, PwC, and CIBC

[IEEE Journal of Translational Engineering in Health and Medicine]

A Computer Vision Approach to Identifying Ticks Related to Lyme Disease

In this work, we build an automated detection tool that can differentiate blacklegged ticks from other tick species in real time using advanced computer vision approaches. Specifically, we use convolutional neural network models, trained end-to-end, to classify tick species, and we adopt advanced knowledge transfer techniques to improve their performance. Our best convolutional neural network model achieves 92% accuracy on unseen tick specimens. Our proposed vision-based approach simplifies tick identification and contributes to the emerging work on public health surveillance of ticks and tick-borne diseases. In addition, it can be integrated with the geography of exposure and potentially be leveraged to inform the risk of Lyme disease infection. This is the first report of using deep learning technologies to classify ticks, providing the basis for automating tick surveillance and advancing tick-borne disease ecology and risk management.


Collaboration work with Vector Institute and Public Health Ontario