RAMYA DHAROOR
856-***-**** ***********@*****.*** www.linkedin.com/in/dharur-ramya
PROFESSIONAL SUMMARY
● Skilled Data Engineer with 5 years of experience designing, developing, and maintaining data pipelines and architectures across multi-cloud environments.
● Proficient in SQL, Python, and various data warehousing solutions, ensuring data integrity, security, and compliance with standards such as GDPR and HIPAA across complex systems.
● Extensive experience with Azure services, including Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake Storage (ADLS), and Azure Cosmos DB for advanced data processing, storage, and analytics solutions.
● Successfully integrated Azure Stack with Azure services such as Azure Active Directory, Azure Backup, Azure Site Recovery, and Azure Monitor, enabling seamless data management, backup, disaster recovery, and monitoring across hybrid environments.
● Hands-on experience with AWS services, including S3, RDS, DynamoDB, Redshift, Glue, and Lambda, providing scalable data storage and processing solutions.
● Strong expertise in implementing ETL/ELT processes with tools like Azure Data Factory and AWS Glue to handle large volumes of data, ensuring accuracy and performance at scale.
● Skilled in data modeling, data lakes, data warehousing, and data integration, with experience designing scalable architectures to meet business requirements.
● Proficient in working with different file formats, such as Parquet, TextFile, Avro, and ORC, and compression codecs like GZIP and Snappy, optimizing storage and processing efficiency.
● Demonstrated success in managing end-to-end data migration and integration projects, leveraging both Azure and AWS services to streamline operations and reduce costs.
● Experienced in using Terraform and AWS CloudFormation templates for Infrastructure as Code (IaC) to automate and manage cloud infrastructure deployments.
● Extensive experience with version control systems like Git for managing code repositories, ensuring smooth collaboration and code integrity across teams.
● Skilled in data visualization and reporting using Power BI and Tableau, enabling stakeholders to gain actionable insights from complex data sets.
● Committed to implementing data governance frameworks, ensuring data quality, security, and compliance with industry regulations.
● Collaborative team player with excellent communication skills, experienced in working with cross-functional teams to deliver high-quality data solutions and drive business growth.
TECHNICAL SKILLS
● Programming/Scripting Languages: SQL, Python, R, Java, Scala, PySpark, PowerShell.
● Databases: PostgreSQL, MongoDB, MySQL, Cassandra, Redis.
● Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful APIs, SOAP.
● Data Technologies and Frameworks: Azure (ADF, Databricks, Synapse, ADLS, HDInsight, Cosmos DB, Blob Storage, AKS), AWS (Glue, S3, EC2, EMR, IAM, Redshift, Athena, Lambda, RDS, DynamoDB, CloudWatch).
● Data Modeling and Processing: ETL/ELT Processes, Data Warehousing, Data Integration, Data Lakes, Data Migration, Data Governance, Data Quality.
● Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Kafka, Spark Streaming, Oozie, Sqoop, ZooKeeper, Spark MLlib.
● Tools and IDEs: Azure Portal, Visual Studio Code, SQL Server Management Studio (SSMS), Azure Data Studio, Jupyter Notebook, AWS Management Console, AWS CLI, AWS SDKs.
● Orchestration Tools: Apache Airflow, AWS Step Functions.
● Operating Systems: Windows, UNIX, Linux.
● Version Control Tools: Git, GitHub, Azure Repos.
● CI/CD Tools: Azure Pipelines, Jenkins, AWS CodePipeline, AWS CodeBuild.
● Containerization Tools: Docker, Kubernetes.
● Visualization Tools: Tableau, Power BI.
EDUCATION:
Rowan University, Glassboro, New Jersey Sept 2022 - Dec 2023
● Degree: Master’s in Computer Science and Engineering, 3.7/4.0 GPA
Keshav Memorial Institute of Technology June 2015 - May 2019
● Degree: Bachelor’s in Computer Science and Engineering
CERTIFICATIONS:
● Databricks Certified Data Engineer Associate.
● Microsoft Certified: Azure Data Engineer Associate.
● AWS Certified Data Engineer - Associate.
WORK EXPERIENCE
Amazon Robotics, MA
Data Engineer Jan 2025 – Present
IT Skills: AWS Glue, MWAA, Athena, SQL, PySpark, EMR, S3, AWS Batch, Python, Data Catalog, Redshift, SageMaker Studio, CloudFormation, CloudWatch, VPC Networking
Responsibilities:
● Engineered scalable ETL pipelines, refactoring a monolithic design into a parallelized PySpark-based architecture using AWS Glue, achieving a 70% reduction in processing time.
● Orchestrated end-to-end data workflows using Apache Airflow, integrating AWS Glue, the AWS Glue Data Catalog, and AWS Batch to automate and streamline data transformation and load processes.
● Spearheaded a proof of concept (POC) for Amazon MWAA (Managed Workflows for Apache Airflow), successfully deploying both private and public environments compliant with AWS networking standards.
● Implemented CI/CD pipelines for MWAA deployments across Alpha, Beta, and Production environments using AWS CodePipeline, CodeBuild, and CloudFormation, enabling automated and reliable DAG releases.
● Designed a robust ETL solution for the TTE pipeline, automating the processing and storage of data critical to the development of Amazon’s foundational AI/ML models.
● Partnered directly with product owners and data scientists to gather and translate business and analytical requirements into production-grade data pipeline solutions.
● Collaborated with cross-functional teams including ML engineers, DevOps, and analytics teams to ensure seamless integration and deployment of data products.
● Operated in a fast-paced Agile environment, participating in sprint planning, daily stand-ups, and retrospectives to align with iterative development goals and continuous delivery.
● Enhanced data discoverability and governance by maintaining schema registries and metadata management through AWS Glue Data Catalog.
Citizens Bank, NY
Data Engineer Feb 2024 – Jan 2025
IT Skills: ADF, Databricks, ADLS, Spark, Hive, HBase, Sqoop, Flume, Blob Storage, Data Warehouse, Cosmos DB, MapReduce, HDFS, SQL, Azure, Python, Tableau, SQL Server.
Responsibilities:
● Constructed and optimized end-to-end ETL pipelines using Azure Data Factory and Databricks, improving financial data integration efficiency by 40%.
● Led the migration of on-premises data systems to Azure, utilizing Informatica PowerCenter and Azure Synapse Analytics to reduce latency by 30% and streamline workflows.
● Administered Azure Data Lake environments, implementing static and dynamic partitioning to enhance query performance and ensure scalable data retrieval.
● Integrated real-time market feeds with internal financial systems via Azure Event Grid and Kafka, enabling accurate and timely updates for risk management.
● Developed predictive models for Value at Risk (VaR) and stress testing using Azure Machine Learning, improving the accuracy of risk forecasting.
● Streamlined data processing using PySpark on Azure Databricks, reducing pipeline execution times by 50% for large-scale financial datasets.
● Configured and managed Azure HDInsight clusters, applying performance tuning and monitoring techniques to boost operational efficiency by 25%.
● Established data governance protocols in Snowflake, incorporating role-based access control (RBAC) and encryption to support regulatory compliance.
● Built interactive dashboards in Power BI, enabling financial analysts to derive insights and make informed decisions through enhanced data visualization.
● Implemented Slowly Changing Dimension Type 2 (SCD2) logic in Databricks to maintain historical accuracy and manage complex data versioning requirements.
Mayo Clinic, India
Data Engineer May 2020 – Aug 2022
IT Skills: Azure HDInsight, Azure Data Lake, Azure Databricks, Azure Data Factory, Azure SQL Database, Azure Blob Storage, Azure Synapse Analytics, Python, Spark, SQL, JSON, XML, NoSQL, HDFS.
Responsibilities:
● Architected scalable data warehousing solutions with Azure Synapse Analytics and SQL Data Warehouse, supporting high-volume healthcare data storage and accelerating analytical query performance.
● Structured clinical data models using star and snowflake schemas to facilitate intuitive data navigation and enhance performance for analytical workloads.
● Leveraged Apache Spark on Azure Databricks to accelerate the transformation of extensive healthcare datasets, cutting job runtimes by 50% and supporting near real-time data pipelines.
● Linked EHR systems with data platforms via HL7 and FHIR standards, ensuring seamless interoperability across healthcare applications and third-party systems.
● Provisioned cloud infrastructure through ARM templates and VM Scale Sets, enabling consistent and automated environment setup for development and production workloads.
● Formulated advanced transformation logic in Azure Data Factory and Databricks to enforce data validation, standardization, and enrichment for clinical reporting.
● Programmed automation scripts in Python to configure and maintain Azure Databricks clusters, minimizing manual overhead and enhancing cluster lifecycle management.
● Rolled out predictive analytics workflows in Azure Machine Learning for patient risk stratification and readmission prediction, contributing to improved care outcomes.
● Synchronized Informatica PowerCenter with Azure services, allowing streamlined cloud-based data integration across disparate healthcare sources.
● Applied HIPAA-compliant data governance strategies, embedding role-based security, auditing, and encryption protocols to protect sensitive patient data.
● Documented deployment strategies for HDInsight, Spark, and Hive jobs, outlining Azure-specific optimizations to aid reproducibility and team knowledge sharing.
● Configured data pipelines to handle Avro, Parquet, and semi-structured JSON formats, facilitating efficient schema evolution and analytics readiness.
● Enabled metadata lineage tracking and audit logging, supporting data traceability and compliance across analytics workflows.
Target, India
Junior Data Engineer Feb 2019 - Apr 2020
IT Skills: Databricks, Sqoop, Apache NiFi, Spark, SQL, Power BI, ETL, Airflow, Glue, S3, SQL Server 2008 R2, SSIS, Windows Server, SQL Query Analyzer, Oracle 8.0.
Responsibilities:
● Facilitated the ingestion and normalization of retail sales data from heterogeneous sources into AWS Redshift, maintaining data consistency across product categories and store locations.
● Collaborated in Agile sprints, breaking down tasks into iterative deliverables, participating in stand-ups, and aligning with sprint objectives to maintain delivery velocity.
● Refined complex SQL queries to improve retrieval performance and accuracy of reporting metrics related to inventory turnover, sales performance, and category trends.
● Supported the modernization of legacy systems by transitioning retail datasets from on-premises databases to AWS RDS, ensuring minimal operational impact.
● Enabled automation of repetitive workflows using Python scripts and AWS Lambda functions, significantly reducing manual intervention in daily retail data processing.
● Recalibrated SQL execution plans to minimize query costs and execution times, aiding faster insights for pricing, discounting, and replenishment decisions.
● Created and deployed Excel-based automation tools, such as VBA macros, to streamline report generation for sales and inventory stakeholders.
● Supervised the health and performance of AWS-native pipelines, diagnosing bottlenecks and improving pipeline uptime by 25% through proactive monitoring.
● Documented end-to-end data flow architecture, including lineage, transformations, and access controls, to improve team onboarding and operational clarity.