Data Engineer (Real-Time)

Location:
Fort Wayne, IN
Salary:
$100,000.00
Posted:
January 06, 2025


Summary

Manish Reddy Mucharla
DATA ENGINEER

Indiana, USA ***************@*****.*** 260-***-****

• Data Engineer with 5 years of experience designing and optimizing scalable data pipelines using AWS (Glue, EMR, Lambda), Apache Spark (PySpark), and Apache Kafka.

• Experienced in building and deploying ETL, data integration, and transformation pipelines across cloud platforms, delivering high performance, reliability, and scalability in both real-time and batch processing (a minimal illustrative sketch follows this summary).

• Skilled in data warehousing with Snowflake and AWS RDS, optimizing storage, querying, and analysis of large datasets for efficient analytics and reporting.

• Proficient in managing cloud infrastructure with AWS tools like EC2, S3, and Terraform for automated, cost-efficient workflows.

• Experienced in containerization and orchestration using Docker and Kubernetes to enhance scalability and portability for data engineering workloads.

• Capable of creating and maintaining real-time data processing systems with Apache Kafka and AWS Glue, ensuring seamless data ingestion and transformation for downstream systems.

• Skilled in building interactive dashboards using Tableau and Power BI to provide actionable insights for data-driven decisions.

• Experienced with MySQL, PostgreSQL, MongoDB, and SQL Server, optimizing queries and performing data migrations and integrations.
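
The following is a minimal, hypothetical sketch of the kind of batch ETL pipeline described above: it reads raw JSON from S3 with PySpark, deduplicates, derives a partition column, and writes partitioned Parquet. The bucket paths and column names (event_id, event_ts) are illustrative assumptions, not taken from any project listed below.

# Minimal PySpark batch ETL sketch (all paths and columns hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Read raw JSON events from an assumed S3 landing bucket.
raw = spark.read.json("s3://example-raw-bucket/events/")

# Deduplicate on an assumed key and derive a date column for partitioning.
cleaned = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("event_id").isNotNull())
)

# Write partitioned Parquet to an assumed curated bucket.
(cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/events/"))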

Skills

Methodologies: SDLC, Agile, Waterfall

Programming Languages: Python, SQL, R

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn

Visualization Tools: Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP), Amazon QuickSight

IDEs: Visual Studio Code, PyCharm, Jupyter Notebook, IntelliJ

Databases: MySQL, PostgreSQL, MongoDB, SQL Server

Data Engineering Concepts: Apache Spark, Apache Hadoop, Apache Kafka, Apache Beam, ETL/ELT, PySQL, PySpark

Cloud Platforms: AWS (EC2, S3, Lambda, Glue, Athena, SNS, RDS, EMR), Microsoft Azure

Other Technical Skills: Data Lake, SSIS, SSRS, SSAS, Docker, Kubernetes, Jenkins, Terraform, Informatica, Talend, Snowflake, Google BigQuery, Data Quality and Governance, Machine Learning Algorithms, Natural Language Processing, Big Data, Advanced Analytics, Statistical Methods, Data Mining, Data Visualization, Data Warehousing, Data Transformation, Critical Thinking, Communication Skills, Presentation Skills, Problem-Solving

Version Control Tools: Git, GitHub

Operating Systems: Windows, Linux, macOS

Experience

Data Engineer | Cigna, Carmel, Indiana | Jan 2024 – Present

• Designed and implemented ETL pipelines using AWS Glue and Apache Spark (PySpark) to process and transform over 10 million healthcare claims records daily, reducing processing time by 40% and enabling near-real-time analytics.

• Developed a centralized data lake on Amazon S3 to store and manage structured data sourced from RDS and EMR alongside semi-structured formats (JSON, Parquet, XML), ensuring 99.9% availability for analytics and reporting.

• Optimized query performance by 30% in Snowflake using advanced partitioning, clustering, and query pruning techniques, improving ad-hoc analytics speed.

• Automated data ingestion and transformation with AWS Lambda and Terraform, reducing manual intervention by 60% and ensuring consistent infrastructure-as-code (IaC) across environments.

• Consolidated healthcare data from 5+ sources (EHR, EMR systems, claims databases) into a unified platform, facilitating advanced analytics while ensuring HIPAA compliance.

• Orchestrated real-time data processing workflows with Apache Kafka and AWS Glue for timely ingestion, transformation, and delivery of claims data to fraud detection models (see the streaming sketch after this section).

• Containerized and deployed ETL workloads with Docker, enabling scalability across AWS EC2 clusters and reducing deployment time by 35%.

• Developed Python validation scripts to ensure the accuracy, consistency, and completeness of healthcare claims data, achieving 100% reliability for analytics.

• Managed metadata and data cataloging with AWS Glue Data Catalog, improving data discoverability and reducing query preparation time by 50%.

• Optimized PySpark jobs for parallel processing of large-scale claims data, halving execution time through fine-tuned cluster configurations on Amazon EMR.

• Created Tableau and Power BI dashboards for real-time claims KPI visualization, improving decision-making by 30%.

• Documented ETL workflows, data dictionaries, and SOPs, streamlining onboarding and reducing training time for new team members by 40%.
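
As a hedged illustration of the streaming ingest step referenced above, the sketch below consumes claim events from a Kafka topic with Spark Structured Streaming and lands them on S3 as Parquet. The broker address, topic name, payload schema, and paths are assumptions for illustration, not the actual Cigna configuration.

# Streaming ingest sketch: Kafka -> Spark Structured Streaming -> S3 Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("claims-stream-sketch").getOrCreate()

# Assumed shape of a claim event payload.
claim_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("member_id", StringType()),
    StructField("amount", DoubleType()),
])

# Subscribe to an assumed topic on an assumed broker.
stream = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "claims")
    .load())

# Kafka values arrive as bytes; decode and parse the JSON payload.
claims = (stream
    .select(F.from_json(F.col("value").cast("string"), claim_schema).alias("c"))
    .select("c.*"))

# Land parsed events as Parquet, with a checkpoint for fault tolerance.
query = (claims.writeStream
    .format("parquet")
    .option("path", "s3://example-lake/claims/")
    .option("checkpointLocation", "s3://example-lake/_chk/claims/")
    .start())
query.awaitTermination()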

Data Engineer | Capgemini, India | Jan 2020 – Dec 2022

• Architected and implemented a real-time and batch data processing pipeline for a FinTech payment system using Apache Beam and PySpark on AWS EMR, reducing payment processing latency by 30% while handling 1.2 million transactions daily (see the Beam sketch after this section).

• Integrated AWS Lambda to automate event-driven workflows, reducing manual intervention and cutting operational costs by 25%.

• Developed and optimized ETL pipelines with Apache Beam and AWS Glue to ingest, transform, and process transactional data into a centralized AWS RDS PostgreSQL database, improving query execution speed by 40% through indexing and optimization.

• Configured AWS SNS for real-time alerts on pipeline failures and data anomalies, reducing incident resolution time.

• Optimized data processing workflows using parallel execution and partitioning in PySpark, reducing execution time by 30% for large-scale data streams.

• Built Tableau dashboards to visualize transaction metrics such as latency, throughput, and error rates, reducing manual reporting time by 35%.

• Implemented Apache Kafka for streaming data ingestion from multiple payment sources, ensuring real-time data availability and seamless downstream processing.

• Automated CI/CD pipelines with Jenkins and containerized the application using Docker, reducing deployment time.

• Performed data validation, reconciliation, and anomaly detection using Python (Pandas, NumPy) and SQL, ensuring 100% data accuracy in payment records.

• Designed AWS Glue jobs for batch processing historical payment data, improving data readiness for downstream analytics by 50%.

• Developed data governance and monitoring frameworks using AWS CloudWatch and Python scripts to ensure pipeline health, SLA adherence, and regulatory compliance.

• Led sprint planning, backlog grooming, and issue tracking in JIRA, ensuring on-time delivery and addressing project blockers.
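
The sketch below is an illustrative Apache Beam pipeline in the spirit of the batch path described above: it parses transaction records, drops invalid rows, and aggregates amounts per merchant. Field names and file paths are hypothetical, and reading from S3 assumes the apache-beam[aws] extras are installed.

# Beam batch sketch: parse, filter, and aggregate transactions per merchant.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse(line):
    # Assumed JSON-lines input with merchant_id and amount fields.
    rec = json.loads(line)
    return rec.get("merchant_id"), float(rec.get("amount", 0.0))

with beam.Pipeline(options=PipelineOptions()) as p:
    (p
     | "Read" >> beam.io.ReadFromText("s3://example/transactions/*.json")
     | "Parse" >> beam.Map(parse)
     | "DropInvalid" >> beam.Filter(lambda kv: kv[0] is not None)
     | "SumPerMerchant" >> beam.CombinePerKey(sum)
     | "Format" >> beam.MapTuple(lambda merchant, total: f"{merchant},{total}")
     | "Write" >> beam.io.WriteToText("s3://example/daily_totals"))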

Data Engineering Intern | TCS (Tata Consultancy Services), India | Jan 2019 – Dec 2019

• ETL Pipeline Development: Assisted in the design and implementation of ETL pipelines to process and integrate large-scale transactional and customer data from multiple systems using Apache Spark (PySpark) and AWS Glue, reducing data processing times by 15%.

• Cloud Infrastructure & Data Storage: Worked on AWS Cloud services, including EC2, S3, and RDS, to manage and store structured and unstructured data, ensuring efficient data retrieval and high availability.

• Data Transformation & Integration: Applied Python and SQL to clean and transform raw data from various data sources, ensuring it was structured for analysis and visualization in AWS RDS and SQL Server.

• Data Warehouse Management: Contributed to optimizing data warehousing processes using AWS RDS and Snowflake, improving query performance by 25% and enabling faster reporting capabilities for business users.

• Real-time Data Processing: Assisted with setting up real-time data processing workflows using Apache Kafka and AWS Lambda to enable continuous data ingestion from external systems for immediate analysis and reporting.

• Dashboarding & Reporting: Supported the development of interactive Tableau and Power BI dashboards, visualizing key business metrics (e.g., customer churn, sales, operational performance) and reducing manual reporting efforts by 30%.

• Automation of Data Workflows: Automated various manual processes using AWS Lambda, helping reduce errors and operational costs by 20% and improving data pipeline efficiency.

• Data Quality & Validation: Developed data validation scripts in Python and SQL, ensuring data accuracy and consistency before integration into the data lake (see the validation sketch after this section).

• Agile Development & Collaboration: Collaborated with cross-functional teams in an Agile environment, participating in sprint planning, stand-up meetings, and assisting with the development of user stories and backlog grooming.

• Documentation & Knowledge Sharing: Documented ETL processes, data dictionaries, and created SOPs for data ingestion and processing workflows, contributing to a 30% reduction in onboarding time for new team members.

• Client Interaction & Reporting: Assisted in the preparation of project reports and presentations for client meetings, providing updates on data pipeline performance and data quality initiatives.
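
A minimal sketch of the kind of Pandas-based validation pass referenced above: it checks required fields, value ranges, and duplicates before data moves downstream. The column names (customer_id, txn_id, amount) and rules are illustrative assumptions.

# Data validation sketch with assumed columns and rules.
import pandas as pd

def validate(df: pd.DataFrame) -> dict:
    # Return counts of basic data-quality violations.
    return {
        "missing_customer_id": int(df["customer_id"].isna().sum()),
        "negative_amounts": int((df["amount"] < 0).sum()),
        "duplicate_txn_ids": int(df.duplicated(subset=["txn_id"]).sum()),
    }

df = pd.read_csv("transactions.csv")  # hypothetical extract
issues = validate(df)
if any(issues.values()):
    raise ValueError(f"validation failed: {issues}")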

Education

Master of Science in Information Technology - Indiana Institute of Technology, Fort Wayne, Indiana, USA

Bachelor of Technology in Information Technology - Keshav Memorial Institute of Technology, Hyderabad, India

Certification

AWS Certified Data Engineer – Associate (DEA-C01) – Amazon Web Services (AWS), (Validates expertise in designing and implementing data processing solutions, managing data storage, and ensuring data integrity on AWS)


