
Data Engineer

Location:
Providence, RI
Posted:
April 30, 2025

Contact this candidate

Resume:

Data Engineer

Name: Rambabu Pathakamuri

Email: **********************@*****.*** Phone: 413-***-****

Professional Summary

4+ Years of Data Engineering Expertise: Proven experience in building, maintaining, and optimizing data systems, including designing data pipelines and processing large-scale datasets across diverse industries such as finance, e-commerce, and healthcare.

Data Pipeline Development & Optimization: Specialized in designing and implementing robust ETL (Extract, Transform, Load) processes to efficiently handle and process data. Expert in automating data pipelines using tools like Apache Airflow, Talend, and Python scripting.
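As an illustration of the ETL automation described above, the sketch below shows the extract-transform-load stages in plain Python rather than a full Airflow DAG; the field names, sample data, and in-memory "warehouse" target are hypothetical stand-ins.

```python
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """Extract: parse raw CSV text into row dicts (stand-in for an S3/API read)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: normalize names, cast amounts, and drop malformed rows."""
    out = []
    for row in rows:
        try:
            out.append({
                "customer": row["customer"].strip().title(),
                "amount": round(float(row["amount"]), 2),
            })
        except (KeyError, ValueError):
            continue  # cleansing step: skip records that fail type checks
    return out

def load(rows: list[dict], target: list) -> int:
    """Load: append to an in-memory target standing in for a warehouse table."""
    target.extend(rows)
    return len(rows)

# Hypothetical input; in production, extract() would read from S3, an API, etc.
raw = "customer,amount\n alice ,10.5\nbob,not-a-number\ncarol,3.14159\n"
warehouse: list = []
loaded = load(transform(extract(raw)), warehouse)
```

In an Airflow deployment, each of the three functions would typically become its own task so failures can be retried per stage.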

Big Data Technologies & Cloud Platforms: Skilled in utilizing big data frameworks such as Hadoop, Spark, and Kafka for distributed data processing. Extensive hands-on experience with cloud platforms including AWS (S3, Redshift, Lambda), Azure, and GCP, ensuring highly scalable, cost-effective, and secure data solutions.

Data Warehousing & Architecture: Strong background in building and maintaining modern data warehouses with solutions like Amazon Redshift, Google BigQuery, and Snowflake. Proficient in data modeling, schema design, and performance optimization for data retrieval and analytics.

Real-Time Data Processing: Expertise in implementing real-time data streaming and processing solutions using technologies like Kafka and Apache Flink, enabling businesses to make timely, data-driven decisions.
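The core aggregation that a Kafka or Flink streaming job performs can be sketched in plain Python as a tumbling-window count; the event stream and window size below are illustrative, not tied to any real deployment.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group a stream of (timestamp, key) events into fixed (tumbling)
    windows and count occurrences per key -- the basic aggregation a
    Kafka/Flink job would run continuously over an unbounded stream."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

# Hypothetical click events: (unix_timestamp, page)
events = [(0, "home"), (10, "cart"), (59, "home"), (61, "home"), (125, "cart")]
result = tumbling_window_counts(events, window_seconds=60)
```

A real streaming engine adds what this sketch omits: out-of-order event handling via watermarks, checkpointed state, and exactly-once delivery.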

Database Management: Advanced knowledge of both relational (PostgreSQL, MySQL) and NoSQL (MongoDB, Cassandra) databases, ensuring efficient data storage, retrieval, and management across different types of applications.

Data Quality & Performance Optimization: Adept at implementing data validation and cleansing processes to ensure the accuracy and integrity of datasets. Strong problem-solving skills for tuning system performance, improving query efficiency, and resolving bottlenecks in data flow.
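A minimal sketch of the validation-and-cleansing pattern described above: records that fail any rule are quarantined with their error reasons instead of being loaded. The rules and sample batch are hypothetical; a production pipeline would typically express them in a data-quality framework.

```python
import re

# Hypothetical validation rules keyed by field name.
RULES = {
    "id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str)
        and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "age": lambda v: v is None or (isinstance(v, int) and 0 <= v <= 130),
}

def validate(records):
    """Split records into clean rows and rejects with failure reasons,
    so bad data is quarantined for review rather than silently loaded."""
    clean, rejects = [], []
    for rec in records:
        failed = [field for field, rule in RULES.items() if not rule(rec.get(field))]
        if failed:
            rejects.append({"record": rec, "errors": failed})
        else:
            clean.append(rec)
    return clean, rejects

batch = [
    {"id": 1, "email": "a@example.com", "age": 30},
    {"id": -5, "email": "broken", "age": 30},
    {"id": 2, "email": "b@example.com", "age": None},
]
clean, rejects = validate(batch)
```

Routing rejects to a separate table (rather than dropping them) is what makes the quarantine auditable.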

Collaboration & Cross-Functional Leadership: Collaborative team player with a proven track record of working closely with data scientists, analysts, and business stakeholders to deliver actionable insights and meet business objectives.

Continuous Learning & Innovation: Passionate about staying up to date with the latest trends and technologies in the field of data engineering, including advancements in AI/ML, data governance, and automation.

Technical Skills

Data Engineering & Big Data Technologies: ETL Processes, Data Pipelines, Data Transformation, Data Cleansing, Automation (Airflow, Apache NiFi), Hadoop, Apache Spark, Apache Flink, Apache Kafka, HDFS, MapReduce, YARN

Machine Learning & Data Integration Tools: ML Pipelines, Data Preprocessing, Model Training & Evaluation (TensorFlow, Scikit-learn, PyTorch), Apache NiFi, Talend, Informatica, Microsoft SSIS, Fivetran

Cloud Platforms & Data Visualization: AWS (S3, Redshift, Lambda, Glue), Google Cloud Platform (BigQuery, Dataflow, Pub/Sub), Azure (Azure SQL, Azure Data Lake), Tableau, Power BI, Looker, Google Data Studio

Databases & Data Warehousing: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, Redis, DynamoDB, Redshift, Snowflake, Google BigQuery, Amazon Aurora, Teradata, Vertica

Programming Languages: Python, SQL, Java, Scala, Bash, Shell Scripting

Data Security & Version Control: Data Encryption, IAM, Access Control, GDPR Compliance, Git, GitHub, GitLab

Monitoring & Logging: Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana), CloudWatch

CERTIFICATIONS:

AWS Certified Data Engineer - Associate

Professional Experience

Solenis, DE Sep 2024 – Mar 2025

Data Engineer (Internship)

Responsibilities

Design, develop, and maintain scalable and efficient data pipelines to support healthcare data processing, including clinical, claims, and operational data.

Lead the integration of disparate healthcare data sources, including EHR/EMR systems, insurance claims, patient records, and external healthcare databases, into centralized data lakes or data warehouses.

Architect and optimize data storage solutions tailored for healthcare data, ensuring compliance with HIPAA and other regulatory standards, while enabling efficient querying and retrieval for analytics and reporting.

Implement and manage ETL processes to ensure high-quality, timely, and accurate data transformation, loading, and processing of healthcare datasets.

Develop strategies for real-time data processing and streaming of healthcare data using tools such as Apache Kafka, AWS Kinesis, or Apache Flink, enabling timely insights for healthcare decision-making.

Collaborate with healthcare data scientists, analysts, and clinicians to understand specific data requirements, and translate them into robust technical solutions for improved patient outcomes.

Ensure data quality and integrity across all healthcare data, implementing data validation, cleansing, and monitoring processes to meet strict regulatory and clinical standards.

Mentor junior data engineers, guiding them on best practices for managing sensitive healthcare data, optimizing data pipelines, and resolving data-related issues.

Manage healthcare data security and privacy, ensuring that all data engineering processes adhere to HIPAA guidelines and other privacy regulations, maintaining secure access and data storage.

Troubleshoot and resolve complex data pipeline issues, ensuring the high availability of healthcare data systems and minimizing downtime, especially for critical patient-related data.

Monitor, optimize, and improve the performance of healthcare data systems, focusing on reducing latency and ensuring reliable, fast access to critical healthcare data.

Stay updated on new technologies and industry trends related to healthcare data engineering, including innovations in telemedicine, wearables, and electronic health records (EHR).

Work with cloud platforms (AWS, GCP, Azure) to deploy scalable and cost-effective healthcare data solutions, ensuring compliance with healthcare industry standards.

Create and maintain detailed documentation for healthcare data systems, pipelines, and processes to ensure transparency, auditability, and ease of knowledge transfer.

Collaborate with healthcare stakeholders, including IT teams, clinical leaders, and business analysts, to ensure that the data infrastructure supports patient care goals, operational efficiency, and regulatory compliance.

Drive continuous improvement in data workflows, introducing automation, optimization, and enhancements to better meet the growing data needs of the healthcare organization.

Environment: compliance (HIPAA, GDPR), cloud platforms (AWS, GCP, Azure), data integration (EHR, claims, wearables), ETL processes (Apache Airflow, AWS Glue), data pipelines, real-time streaming (Kafka, Kinesis), data storage (data lakes, data warehouses), databases (SQL, NoSQL), data security (encryption, RBAC), collaboration (data scientists, clinicians, analysts), data quality (validation, cleansing), machine learning (predictive analytics), data governance (metadata, lineage), performance optimization, data visualization (Tableau, Power BI), and scalability. Key focus on data privacy, audit trails, cloud solutions, and healthcare interoperability.

Wipro, India

Data Engineer Jul 2021 – Jul 2023

Responsibilities

Design, development, and management of complex data architectures on AWS to support data ingestion, processing, and storage needs across the organization.

Architect and implement scalable data lakes and data warehouses using AWS services such as S3, Redshift, and AWS Glue to centralize and manage large datasets.

Design, develop, and optimize advanced ETL pipelines using AWS Glue, Lambda, and Step Functions to automate the extraction, transformation, and loading of data.
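A Lambda transform step of the kind described can be sketched as a plain handler function. The Kinesis-style base64 event shape and the `amount` filter below are illustrative assumptions, chosen so the handler can be exercised locally without AWS.

```python
import base64
import json

def handler(event, context=None):
    """Hypothetical Lambda transform: decode Kinesis-style records, drop
    malformed or non-positive payloads, and return a summary a downstream
    Glue or Step Functions stage could consume."""
    total = 0.0
    kept = []
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # cleansing: skip events that are not valid JSON
        if payload.get("amount", 0) > 0:
            kept.append(payload)
            total += payload["amount"]
    return {"kept": len(kept), "total": round(total, 2)}

def encode(obj):
    """Helper to build a Kinesis-style record for local testing."""
    data = base64.b64encode(json.dumps(obj).encode()).decode()
    return {"kinesis": {"data": data}}

event = {"Records": [encode({"amount": 9.99}), encode({"amount": -1}),
                     encode({"amount": 0.01})]}
result = handler(event)
```

Keeping the handler a pure function of its event makes it unit-testable outside the Lambda runtime, which is the usual practice for serverless ETL code.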

Integrate diverse data sources, including on-premises systems, third-party APIs, and cloud-based platforms, into a unified data environment on AWS.

Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and design solutions that enable effective analytics and insights.

Ensure the integrity, accuracy, and security of data throughout the data lifecycle, implementing data validation, encryption, and compliance practices (e.g., GDPR, HIPAA).

Implement real-time data processing and streaming solutions using AWS Kinesis, Lambda, or Apache Kafka to deliver up-to-date insights for operational decision-making.

Optimize data storage and retrieval performance across AWS platforms, including Amazon S3, Redshift, and RDS, ensuring low-latency access to critical business data.

Design and implement disaster recovery strategies, backups, and data redundancy mechanisms to maintain high availability and data durability.

Lead efforts to monitor and troubleshoot complex data pipelines and systems, ensuring data flows smoothly, performance is optimal, and issues are swiftly resolved.

Mentor junior data engineers and provide guidance on best practices for AWS data infrastructure, code development, and cloud-native solutions.

Drive automation and continuous improvement of data workflows, utilizing AWS CloudFormation, Terraform, and other tools to automate infrastructure provisioning and configuration management.

Ensure cost optimization across all AWS services, implementing strategies for efficient resource management and budget control without sacrificing performance or scalability.

Stay current on the latest AWS technologies and trends, evaluating new tools and services that can enhance the overall data engineering process and infrastructure.

Create and maintain comprehensive documentation for data pipelines, architecture, and workflows to ensure transparency, compliance, and knowledge sharing within the team.

Environment: AWS services (S3, Redshift, Glue, Lambda, Kinesis, RDS, DynamoDB, Athena), data lakes, data warehouses, ETL pipelines, real-time data processing, serverless architecture, data security (IAM, KMS, encryption), cloud infrastructure (CloudFormation, Terraform), data integration (cloud/on-premises), data governance, compliance (HIPAA, GDPR), cost optimization, automation (Step Functions), monitoring (CloudWatch), performance optimization, data transformation, data quality, high availability, disaster recovery, big data solutions, cross-functional collaboration, data pipeline orchestration (Airflow, Glue), batch & stream processing, and machine learning integration (SageMaker).

Tata Consultancy Services, India

Azure Data Engineer Feb 2020 – Jun 2021

Responsibilities

Design, implement, and maintain scalable data pipelines using Azure tools to support data ingestion, transformation, and storage.

Develop and manage data lakes and data warehouses using Azure Data Lake Storage, Azure Synapse Analytics, and Azure SQL Database to centralize and manage data.

Design, build, and optimize ETL processes using Azure Data Factory to automate the extraction, transformation, and loading of data from multiple sources.

Ensure data integrity, quality, and accuracy through validation, cleansing, and transformation processes, enabling high-quality data for analytics and reporting.

Integrate data from a variety of sources, including on-premises, cloud-based, and third-party systems, ensuring seamless data flow into Azure platforms.

Optimize data storage solutions in Azure Blob Storage and Azure Data Lake for efficient querying, retrieval, and processing.

Implement real-time data streaming and processing using Azure Stream Analytics, Event Hubs, and Azure Databricks for immediate insights.

Manage database performance and optimize queries in Azure SQL Database and Azure Cosmos DB to meet business needs.

Develop and maintain automated solutions for data governance, security, and compliance, ensuring adherence to relevant regulations (e.g., GDPR).

Collaborate with data scientists, analysts, and business stakeholders to understand their data requirements and deliver technical solutions to meet them.

Monitor, troubleshoot, and resolve issues with data pipelines and systems to ensure high availability and minimize downtime.

Implement cost-efficient solutions and optimize resource usage to ensure the effective use of Azure resources while maintaining high performance.

Set up and configure monitoring solutions using Azure Monitor and Log Analytics to ensure data pipeline health and performance.

Ensure proper data security practices by using Azure Security Center, Azure Key Vault, and Azure Active Directory to safeguard sensitive data.

Mentor junior engineers, providing guidance on best practices for developing and maintaining Azure data systems and pipelines.

Stay up to date with the latest developments in Azure technologies and incorporate new tools and techniques to enhance the data engineering environment.

Environment: Azure services (Data Lake Storage, Synapse Analytics, SQL Database, Cosmos DB, Blob Storage, Event Hubs, Stream Analytics, Databricks, Data Factory), ETL pipelines, real-time data processing, data lakes, data warehouses, data integration (cloud/on-premises), data governance, data security (Key Vault, Active Directory, Security Center), compliance (GDPR), performance optimization, monitoring (Azure Monitor, Log Analytics), cost optimization, automation, data quality, data transformation, data validation, collaboration (data scientists, analysts), cloud infrastructure, high availability, disaster recovery, cost management, scalable solutions, batch & stream processing, and machine learning integration (Azure ML).

Education Details

Master's – Rivier University – Computer Science

Bachelor’s – BVSR Engineering College – Computer Science
