We modernized Lemnisk’s technical data architecture using Amazon EMR, leading to a 75% reduction in Pig script execution time and substantially higher speed and agility in handling data

Challenge

INDUSTRY OVERVIEW

Lemnisk’s Customer Data Platform-led personalization and real-time marketing automation solution delivers superior customer experiences that result in increased conversions, retention, and growth for enterprises. With a global customer base and growing data needs, Lemnisk sought to optimize its data processing and analysis capabilities by leveraging cloud infrastructure and modern data engineering practices.

CHALLENGE

Lemnisk needed to optimize its existing MapReduce jobs. This meant analyzing the current system and identifying areas for improvement, then creating a plan to migrate the jobs to Amazon EMR (Elastic MapReduce) clusters scheduled via Apache Airflow. The work covered configuring the EMR clusters and Apache Airflow, implementing the migration plan, and testing thoroughly to ensure the required performance and functionality standards were met. Monitoring had to be set up with tools like Prometheus and Grafana to track performance and identify potential issues. A successful migration therefore depended on careful planning, testing, and monitoring.

Gaps in the current system:

  • The migration must not disrupt existing business processes or data flow. The new platform should integrate seamlessly with the existing infrastructure, and data should be transferred without loss or corruption.
  • The migration should be completed within the agreed-upon timeline and budget.

Why were we brought in?

Ganit has made a significant impact across industries using data science and analysis. Ganit partners with clients to translate their data into a tangible, insightful plan of action that delivers a measurable impact on clients’ top-line and bottom-line growth.

Our approach

Solution

For this project, Ganit implemented an AWS architecture that leveraged several AWS services to meet the business requirements of Lemnisk. The architecture included:

  • Elastic Compute Cloud (EC2) instances for hosting the Apache Airflow server and the Prometheus and Grafana servers.
  • EC2 Auto Scaling to automatically adjust the number of EC2 instances based on demand.
  • Amazon S3 for storing the raw and processed data.
  • Amazon EMR for running the Apache Spark cluster and processing the data.
  • Amazon Managed Workflows for Apache Airflow (MWAA) to manage and schedule the data pipeline.
  • Amazon Simple Notification Service (SNS) to send alerts when there were issues with the pipeline.
  • CloudFormation to manage the infrastructure as code and easily deploy the architecture.
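
As an illustration of how Pig scripts stored in Amazon S3 might be handed to an EMR cluster, the sketch below builds step definitions as plain Python dictionaries. It is not Ganit's actual implementation: the bucket name, prefix, script names, and the helper functions `build_pig_step` and `build_steps` are all hypothetical, and the `command-runner.jar` arguments follow the commonly documented pattern for running Pig on EMR.

```python
# Sketch only: bucket, prefix, and script names below are hypothetical.
def build_pig_step(script_name, bucket="example-bucket", prefix="pig-scripts"):
    """Return one EMR step definition that runs a Pig script stored in S3."""
    script_uri = f"s3://{bucket}/{prefix}/{script_name}"
    return {
        "Name": f"pig-{script_name}",
        # CONTINUE leaves the cluster running if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["pig-script", "--run-pig-script", "--args", "-f", script_uri],
        },
    }


def build_steps(script_names):
    """Build step definitions for a list of Pig script file names."""
    return [build_pig_step(name) for name in script_names]


steps = build_steps(["sessions.pig", "conversions.pig"])
print(steps[0]["HadoopJarStep"]["Args"][-1])
```

A list like `steps` has the shape that the boto3 EMR client's `add_job_flow_steps(JobFlowId=..., Steps=steps)` call accepts, so an Airflow task could submit it to a running cluster.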

Features of the tool

  • Implemented comprehensive monitoring and alerting using Prometheus and Grafana to detect and respond to cluster performance issues.
  • Optimized costs using rightsizing and right-pricing techniques and by leveraging automation for resource management.
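
To make the monitoring feature more concrete, here is a minimal sketch of the Prometheus text exposition format, which is what Prometheus scrapes and Grafana then charts. The metric name `pig_script_duration_seconds`, the `script` label, and the `render_gauge` helper are assumptions for illustration and do not come from the project. Label values are not escaped here, which a production exporter would need to handle.

```python
def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to a numeric reading.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples.items():
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical reading: one script took 42.5 seconds on its last run.
samples = {(("script", "sessions.pig"),): 42.5}
text = render_gauge(
    "pig_script_duration_seconds",
    "Duration of the last Pig script run.",
    samples,
)
print(text)
```

In practice a client library or an exporter would usually generate this output; the hand-rolled version only shows what a scraped metric looks like.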

A valuable difference

Impact

  • The Ganit team successfully migrated Lemnisk's data to Amazon EMR clusters, implemented custom scheduling with Apache Airflow, and set up monitoring with Prometheus and Grafana.
  • This migration resulted in a 75% reduction in Pig script execution duration compared to their previous on-premises servers.
  • The team was able to run all 60 scripts in parallel on Amazon EMR, leading to greater speed and agility in handling data. Overall, the migration had a highly beneficial impact on Lemnisk's data processing capabilities.
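
The two headline numbers can be illustrated with a short sketch. The thread pool below is only a local stand-in for submitting scripts concurrently (in the project, parallelism came from EMR and Airflow), `run_script` is a placeholder, the script names are hypothetical, and the 120-minute and 30-minute durations are made-up values chosen to show how a 75% reduction is calculated.

```python
from concurrent.futures import ThreadPoolExecutor


def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


def run_script(script_name):
    """Placeholder for submitting one Pig script and waiting for it."""
    return f"{script_name}: done"


# 60 hypothetical script names, matching the count in the case study.
scripts = [f"script_{i:02d}.pig" for i in range(1, 61)]

# Submit every script at once rather than one after another.
with ThreadPoolExecutor(max_workers=len(scripts)) as pool:
    results = list(pool.map(run_script, scripts))

print(len(results))
# Illustrative durations only: 120 minutes down to 30 minutes.
print(percent_reduction(120, 30))
```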