The next-generation data-grounded medical research project

The Project

Our partner collaborated with a world leader in the medical diagnostics industry to develop a scalable, cost-effective, and flexible data system for storing, processing, and analyzing medical records from clinics across North America.

The project required unstructured medical data to be annotated and classified for use in retrospective studies, R&D activities, drug efficacy investigations, and evidence-based medicine. This system would enable healthcare providers to access best industry practices easily and enhance their capabilities in preventative medicine.

Initially, the client had amassed 20 terabytes of raw health data, which was unclassified and therefore unusable for data analysis, reporting, statistics, and data mining.

Our partner’s development team conducted an in-depth investigation and implemented the following process:

  • Data Storage: All data was stored within a multi-node HBase cluster on top of a Hadoop framework, powered by the MapR platform.
  • Data Classification & Annotation: Once stored, the data was classified and annotated within the project ecosystem by two modules:
    • The first module enabled manual classification and annotation of a subset of the data, which was then used to generate input for machine learning models. This was built using AngularJS, Spring, Hibernate, and MySQL.
    • The second module utilized this annotated data to train machine learning models, enhancing classification accuracy.
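
The hand-off between the two modules can be illustrated with a minimal Java sketch. It assumes the manually annotated subset arrives as plain text plus a label, and it substitutes a small Naive Bayes classifier for the project's actual models, which the write-up does not describe. The class, method, and label names below are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: all names and labels are hypothetical, not taken from the project. */
final class AnnotationPipelineSketch {

    /** One record labeled by a human annotator in the first module. */
    static final class AnnotatedRecord {
        final String text;
        final String label;

        AnnotatedRecord(String text, String label) {
            this.text = text;
            this.label = label;
        }
    }

    /** Multinomial Naive Bayes with add-one smoothing, used here as a stand-in model. */
    static final class Model {
        private final Map<String, Integer> docsPerLabel = new HashMap<>();
        private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
        private final Map<String, Integer> tokensPerLabel = new HashMap<>();
        private final Set<String> vocabulary = new HashSet<>();
        private int totalDocs = 0;

        /** Updates the per-label document and token counts with one annotated record. */
        void add(AnnotatedRecord record) {
            totalDocs++;
            docsPerLabel.merge(record.label, 1, Integer::sum);
            Map<String, Integer> counts =
                    tokenCounts.computeIfAbsent(record.label, k -> new HashMap<>());
            for (String token : tokenize(record.text)) {
                counts.merge(token, 1, Integer::sum);
                tokensPerLabel.merge(record.label, 1, Integer::sum);
                vocabulary.add(token);
            }
        }

        /** Returns the label with the highest log-probability, or null if nothing was trained. */
        String classify(String text) {
            List<String> tokens = tokenize(text);
            String best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (String label : docsPerLabel.keySet()) {
                double score = Math.log(docsPerLabel.get(label) / (double) totalDocs);
                Map<String, Integer> counts = tokenCounts.get(label);
                int labelTokens = tokensPerLabel.getOrDefault(label, 0);
                // The extra +1 in the denominator reserves a slot for unseen tokens.
                double denominator = labelTokens + vocabulary.size() + 1.0;
                for (String token : tokens) {
                    score += Math.log((counts.getOrDefault(token, 0) + 1.0) / denominator);
                }
                if (score > bestScore) {
                    bestScore = score;
                    best = label;
                }
            }
            return best;
        }
    }

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Second-module step: builds a model from the manually annotated subset. */
    static Model train(List<AnnotatedRecord> annotated) {
        Model model = new Model();
        for (AnnotatedRecord record : annotated) {
            model.add(record);
        }
        return model;
    }
}
```

In this sketch, `train` would receive the records exported from the annotation module, and `classify` would then be called on each remaining unlabeled record. A production setup would more likely delegate training to a dedicated machine learning library and keep the Java code for orchestration.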

As a result, the Proof of Concept (PoC) demonstrated high classification accuracy, paving the way for a more efficient and intelligent approach to medical data analysis.


Technologies

HBase, Hadoop, MapR, Apache Storm, RabbitMQ, UI (Angular) + backend REST (Java/Spring) app for human annotation, H2O + Java backend (process orchestration) for machine learning


Team

TM, 2 specialists