Transforming clinical measures computation with a cloud-native Big Data pipeline

A leading healthcare provider wanted to make its clinical measures computation process more efficient and cost-effective. The existing process was built on an MSSQL, stored procedures and C#-based toolchain.


The project identified a significant opportunity to revamp the clinical measure computation process:

  • Transition to Cost-Effective Solutions: The existing MSSQL, stored procedures, and C# based toolchain was costly due to .NET licensing fees. There was a clear need to reduce Total Cost of Ownership (TCO) while maintaining or enhancing the system’s efficiency.
  • Leveraging Open Source Technologies: By transitioning to open-source technologies, there was potential to significantly drive down costs.
  • Improving Scalability: The existing system’s scalability was limited, necessitating a solution that could handle increasing data loads and computational demands more effectively.
  • Modernizing the Infrastructure: Adopting a cloud-native Big Data pipeline was identified as a key strategy to modernize the infrastructure, aligning with current technological advancements.
  • Enhanced Data Processing: The project aimed to improve the way clinical data was processed, targeting faster, more accurate computations.
  • Compliance and Security: In the healthcare sector, data compliance and security are paramount. The new system needed to meet these standards while being more cost-effective.
  • Interoperability and Integration: The solution needed to ensure seamless integration with existing healthcare systems an


We analyzed the existing process and identified the technology, financial and business concerns that needed to be addressed. We then designed a cloud-native Big Data pipeline that processed over 100 million clinical records in a day. At its peak the pipeline processed terabytes of clinical data every day. Here is an outline of the modules that were built for this transformation project:

  • Source data storage and management: We used the Hadoop Distributed File System (HDFS) for handling and storing the large volumes of raw clinical data. Apache Hadoop was an excellent fit for big data storage and processing. The HDFS cluster was hosted on AWS EC2 nodes with SSDs and high network bandwidth. Eventually we added AWS S3 to this layer and controlled the cost of this module by storing data in the various S3 storage tiers offered by AWS.
  • Data Processing pipeline: We used Apache Spark, exclusively in batch mode, for running the measure element and measure score map-reduce pipelines. It was a great fit for our large-scale data processing use case. We augmented the computing pipeline with Apache Solr for fetching intermediate results and for master table lookups. These tools helped in building an efficient data ingestion and processing pipeline.
  • Database Management: This module was crucial for the success of the entire product. We used Apache Cassandra for storing the intermediate results and the final measure scores. Apache Cassandra was a great fit for this use case because of its high availability, scalability and fault tolerance. We used Apache Cassandra exclusively in batch mode. We used Apache Solr for indexing the data stored in Apache Cassandra. This helped in building a fast and efficient data retrieval pipeline. Daily, weekly, monthly, quarterly and yearly data pivots were persisted in Apache Cassandra.
  • Workflow Orchestration: We built a custom in-house admin UI that could automate, schedule and orchestrate the Spark jobs.
  • Containerization and Orchestration: We built a custom build process to submit lightweight Spark job jar files via the workflow orchestration toolchain. We eventually moved to Docker containers and Kubernetes for containerization and orchestration.
  • Monitoring and Logging: We used Prometheus, AWS CloudWatch and CloudWatch Logs for monitoring the infrastructure and visualizing logs. The job-level monitoring tools were built for support engineers to monitor job execution and troubleshoot issues. This helped in building a robust, trustworthy monitoring and logging infrastructure.
  • Security and Compliance: The pipeline was built to comply with HIPAA regulations, and the entire pipeline was hosted on AWS GovCloud. We used AWS IAM roles to secure the data, at rest and in transit, along with the infrastructure that accessed it. We used AWS KMS for encrypting the data at rest and in transit. The job-level logs were stored in AWS S3 and were secured using AWS IAM roles to comply with HIPAA regulations.
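
The measure element and measure score map-reduce steps named in the Data Processing pipeline module can be illustrated with a small, Spark-free sketch in plain Python. It is not the project's code. The record shape, the measure names and the numerator/denominator scoring rule are assumptions made for illustration only; the original pipeline ran as Apache Spark batch jobs.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical record shape (an assumption, not the project's schema):
# (patient_id, measure_id, in_denominator, in_numerator)
RECORDS = [
    ("p1", "bp_control", True, True),
    ("p2", "bp_control", True, False),
    ("p3", "bp_control", False, False),  # not eligible, so it counts nowhere
    ("p4", "a1c_test", True, True),
]


def map_element(record):
    """Map step: emit (measure_id, (numerator_count, denominator_count))."""
    _patient_id, measure_id, in_den, in_num = record
    return measure_id, (int(in_den and in_num), int(in_den))


def reduce_counts(a, b):
    """Reduce step: add the counts pairwise.

    Addition is associative and commutative, so partial results from
    different partitions can be merged in any order.
    """
    return a[0] + b[0], a[1] + b[1]


def measure_scores(records):
    """Group mapped elements by measure and reduce them to a score."""
    grouped = defaultdict(list)
    for measure_id, counts in map(map_element, records):
        grouped[measure_id].append(counts)

    scores = {}
    for measure_id, counts in grouped.items():
        numerator, denominator = reduce(reduce_counts, counts)
        # A measure with no eligible records has no defined score.
        scores[measure_id] = numerator / denominator if denominator else None
    return scores


if __name__ == "__main__":
    print(measure_scores(RECORDS))
```

In a Spark batch job the same two functions would correspond roughly to a map over the input records followed by a reduce by key, which is what lets the computation spread across the cluster.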

This setup was complex and powerful, suitable for large-scale, high-performance data processing and computation of clinical elements and measures.
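
The date, week, month, quarter and year pivots mentioned in the Database Management module can be sketched as follows. This is a minimal illustration under stated assumptions: the key formats, the use of ISO week numbering and the function names `pivot_keys` and `rollup` are hypothetical and do not come from the original system.

```python
from collections import Counter
from datetime import date


def pivot_keys(d: date) -> dict:
    """Derive one bucket key per time grain for a given date.

    The key formats here are illustrative assumptions.
    """
    iso_year, iso_week, _weekday = d.isocalendar()
    quarter = (d.month - 1) // 3 + 1
    return {
        "date": d.isoformat(),
        "week": f"{iso_year}-W{iso_week:02d}",  # ISO week, may belong to the adjacent year
        "month": f"{d.year}-{d.month:02d}",
        "quarter": f"{d.year}-Q{quarter}",
        "year": str(d.year),
    }


def rollup(events):
    """Sum per-day counts into every time grain.

    `events` is an iterable of (date, count) pairs. The result maps
    (grain, key) to the total count for that bucket.
    """
    totals = Counter()
    for d, count in events:
        for grain, key in pivot_keys(d).items():
            totals[(grain, key)] += count
    return totals


if __name__ == "__main__":
    sample = [(date(2024, 1, 1), 3), (date(2024, 1, 2), 2), (date(2024, 4, 1), 5)]
    for bucket, total in sorted(rollup(sample).items()):
        print(bucket, total)
```

Precomputing one row per (grain, key) bucket in a batch run keeps reads simple, since a dashboard query for a month or quarter becomes a single key lookup rather than an aggregation at read time.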


The shift to a cloud-native Big Data pipeline yielded substantial outcomes:

  • Significant Cost Reduction: Transitioning to open-source technologies and a cloud-based infrastructure led to a notable decrease in the system’s TCO.
  • Scalability Achieved: The new pipeline provided enhanced scalability, allowing the healthcare provider to manage larger data sets and complex computations without performance degradation.
  • Increased Efficiency in Data Processing: The Big Data approach streamlined the clinical measure computation process, offering faster and more accurate results.
  • Modernized Healthcare Analytics Infrastructure: The project successfully modernized the healthcare provider’s analytics infrastructure, leveraging the latest in cloud and Big Data technologies.
  • Improved Data Compliance and Security: The new system adhered to stringent data compliance and security standards, crucial in the healthcare industry.
  • Enhanced Interoperability: The cloud-native solution improved interoperability with other healthcare systems, facilitating smoother data exchange and integration.
  • Foundation for Future Innovation: By adopting a more flexible and advanced technological framework, the healthcare provider laid the groundwork for future innovations and improvements in clinical data analysis.


