2024 HPCC Systems Summit: Building an NLP Pipeline for Electronic Health Rec... / Learned Cache Size

Views: 68 | Uploaded: 9 months ago

HPCC Systems

This joint presentation includes two talks from our HPCC Systems academic community.

Building an NLP Pipeline for Electronic Health Records and Brain MRI Classification - Vishalakshi Prabhu, Eshaan Mathur, Nikhil Vasu & Prashant Ronad, RV College of Engineering

A medical record can include a variety of "notes" entered over time by healthcare professionals, such as observations, administration of drugs and therapies, test results, X-rays, and reports. As a result, one of the biggest challenges in healthcare is the lack of data standardization models. At the same time, most doctors rely on their own knowledge and limited patient data when making decisions. Access to the knowledge of many medical professionals could therefore benefit patient care.

This work aims to develop an effective disease identification and classification system using NLP for Electronic Health Records (EHR). It also builds a knowledge base for future reference, which allows querying of patient and disease details and supports pattern finding. BioBERT is used for text embedding in this project. The selected approach uses pre-processing so that the patient's symptom, duration, gender, and affected organ are labeled and displayed when the full text is given as input.
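To make the labeling step concrete, here is a minimal sketch of what "text in, labeled fields out" could look like. It is not the presenters' method: the talk uses BioBERT embeddings, whereas this sketch swaps in plain keyword and regex matching purely to show the shape of the output. The vocabularies, the function name `label_note`, and the sample note are all hypothetical.

```python
import re

# Hypothetical vocabularies, for illustration only; a real system would
# derive these labels from a trained model such as BioBERT.
SYMPTOMS = ("headache", "nausea", "fever", "blurred vision")
ORGANS = ("brain", "lung", "liver", "heart")

# A duration is a number, whitespace, and a time unit, e.g. "3 weeks".
DURATION = re.compile(r"\b(\d+)\s+(day|week|month|year)s?\b", re.IGNORECASE)
GENDER = re.compile(r"\b(male|female|man|woman)\b", re.IGNORECASE)
GENDER_MAP = {"male": "male", "man": "male",
              "female": "female", "woman": "female"}


def label_note(text):
    """Return the symptom, organ, duration, and gender labels found in a note."""
    lowered = text.lower()
    duration = DURATION.search(text)
    gender = GENDER.search(text)
    return {
        "symptoms": [s for s in SYMPTOMS if s in lowered],
        "organs": [o for o in ORGANS if o in lowered],
        "duration": duration.group(0).lower() if duration else None,
        "gender": GENDER_MAP[gender.group(1).lower()] if gender else None,
    }


if __name__ == "__main__":
    note = ("45-year-old female with headache and nausea for 3 weeks; "
            "MRI of the brain ordered.")
    print(label_note(note))
```

Keyword matching like this misses synonyms, negation ("no fever"), and misspellings, which is the kind of gap an embedding model is meant to close.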

The project also tries to identify the characteristics of brain tumors from textual data in Level-1, and provides a classification of brain MRIs in Level-2 by importing the VGG-16 model. The model should be trained to identify three tumor types (benign, pre-malignant, and malignant) and to recognize five to six important brain diseases. This is made possible by spraying the large brain MRI datasets onto an HPCC Systems cluster and training with the Generalized Neural Network (GNN) bundle, version 3. This is a work-in-progress study and a relatively unexplored approach that needs further discussion.

Target Audience:
• Students, academics, and industry professionals who wish to solve problems related to unstructured medical records.
• Researchers working on healthcare and medical data analytics.
• Technology enthusiasts who want to learn about the latest trends.

----------------------------------
Learned Cache Size Setting for Roxie Clusters - Yifan Wang, University of Hawaii

The internal node cache of the index has a non-trivial impact on the Roxie component of the system. A properly set node cache size yields a significant speedup in data access. However, the optimal internal cache size depends on several factors, including the access patterns within the index, the compression ratio of the data, the disk IO speed, and the time needed to decompress the data.
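The trade-off described above can be explored with a small simulation. The sketch below is not Roxie's implementation: it assumes an LRU eviction policy, a Zipf-like synthetic access pattern, and a simple cost model in which every miss pays disk IO plus decompression time. All function names and the timing values are hypothetical.

```python
import random
from collections import OrderedDict


def lru_hit_rate(accesses, cache_size):
    """Replay an access trace through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for node in accesses:
        if node in cache:
            hits += 1
            cache.move_to_end(node)        # mark as most recently used
        else:
            cache[node] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used node
    return hits / len(accesses) if accesses else 0.0


def mean_access_cost(hit_rate, io_ms, decompress_ms, hit_ms=0.001):
    """Expected cost per access: a miss pays disk IO plus decompression."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * (io_ms + decompress_ms)


def skewed_trace(n_nodes, n_accesses, skew, seed=0):
    """Synthetic trace where node rank r is accessed with weight 1 / r**skew."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, n_nodes + 1)]
    return rng.choices(range(n_nodes), weights=weights, k=n_accesses)


if __name__ == "__main__":
    trace = skewed_trace(n_nodes=1000, n_accesses=20000, skew=1.0)
    for size in (10, 50, 100, 500):
        rate = lru_hit_rate(trace, size)
        cost = mean_access_cost(rate, io_ms=0.2, decompress_ms=0.05)
        print(f"size={size:4d} hit_rate={rate:.3f} mean_cost_ms={cost:.4f}")
```

Under LRU the hit rate never decreases as the cache grows, so the interesting question is where the extra memory stops paying for itself, which depends on the access skew and on how expensive a miss is.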

In this talk, I present our research on setting the best cache size using machine learning methods. I first introduce the background and recent work on learning-based knob tuning, which aims to predict the best configurations for data systems. I then present our research in two parts: (1) simulation of the Roxie system and (2) a learning method applied to the simulation results.
I hope this talk provides a basic overview of knob tuning in the AI-for-databases area and helps inspire other researchers and developers to improve HPCC Systems using emerging AI techniques.
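The two-part structure (simulate, then learn from the simulation results) can be sketched as follows. The abstract does not name the learning method, so this example swaps in a k-nearest-neighbor predictor as a stand-in. The LRU simulator, the single "skew" feature, the 0.8 hit-rate target, and all function names are assumptions made for illustration.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, size):
    """Hit rate of an LRU cache of the given size on a non-empty trace."""
    cache, hits = OrderedDict(), 0
    for node in trace:
        if node in cache:
            hits += 1
            cache.move_to_end(node)
        else:
            cache[node] = True
            if len(cache) > size:
                cache.popitem(last=False)
    return hits / len(trace)


def make_trace(skew, n_nodes=500, n_accesses=5000, seed=0):
    """Synthetic workload: node rank r is accessed with weight 1 / r**skew."""
    rng = random.Random(seed)
    weights = [1.0 / (r ** skew) for r in range(1, n_nodes + 1)]
    return rng.choices(range(n_nodes), weights=weights, k=n_accesses)


def best_size(trace, candidates, target=0.8):
    """Part 1: smallest candidate size whose simulated hit rate meets the target."""
    for size in sorted(candidates):
        if lru_hit_rate(trace, size) >= target:
            return size
    return max(candidates)  # target unreachable: fall back to the largest size


def knn_predict(train, skew, k=1):
    """Part 2: average the labels of the k training workloads nearest in skew."""
    nearest = sorted(train, key=lambda row: abs(row[0] - skew))[:k]
    return sum(label for _, label in nearest) / len(nearest)


if __name__ == "__main__":
    candidates = [25, 50, 100, 200, 400]
    # Label each simulated workload with its best cache size.
    train = [(s, best_size(make_trace(s, seed=i), candidates))
             for i, s in enumerate([0.6, 0.9, 1.2, 1.5])]
    print("training pairs (skew, best size):", train)
    print("predicted size for skew 1.0:", knn_predict(train, 1.0, k=2))
```

A real tuner would use richer workload features and a cost-based objective, but the loop is the same: the simulator produces labeled examples cheaply, and the model predicts a setting for a workload it has not seen.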

This talk targets researchers and developers of HPCC Systems. It offers insight into improving the system with AI techniques and aims to inspire new directions for the platform.

© 2024 LexisNexis Risk Solutions
