
Data Engineer Azure

Location:
Fremont, CA
Posted:
March 10, 2025


DINESH AMBAVARAM

+1-669-***-**** ******.******@*****.*** linkedin.com/in/dinesh-ambavaram

Summary

Experienced Data Engineer with expertise in designing, building, and optimizing scalable data pipelines and robust data platforms. Proficient in cloud technologies including AWS and Azure, and adept at leveraging tools like Apache Spark, Databricks, and Airflow to drive efficient data engineering workflows. Highly skilled in Python and SQL, specializing in developing and automating ETL processes, ensuring high data quality, and enabling seamless integration for reliable analytics. Proven ability to enhance data platform scalability, improve system performance, and support strategic business objectives. Strong collaborator with a track record of delivering innovative, cost-effective solutions in dynamic, fast-paced environments.

Skills

• Database Tools: Azure Data Factory, Azure SQL Database, Microsoft SQL Server, Teradata, HBase, Hive, AWS RDS, AWS Redshift, Athena, Snowflake, Oracle, Azure Fabric, BigQuery, Cloud Spanner

• Data Warehousing: Azure Synapse Analytics, SSIS, Data Warehousing, OLAP, OLTP, Kimball Methodology, AWS Glue.

• Programming: Python, PySpark, Spark SQL, R, T-SQL, Data Structures, SAS Viya, Korn shell, Java, Go.

• Reporting Tools: Power BI, Tableau, DAX, SSAS, Looker.

• Big Data Tools: Apache Spark, Databricks, HDFS, Zookeeper, Kafka, Flink, AWS Kinesis, Google Dataflow, Google Pub/Sub

• ETL Tools: Informatica, Airflow, SSIS, Azure Data Factory, AWS Glue, AWS DMS, Google Cloud Data Fusion, Google Cloud Composer.

• Cloud Tools: AWS S3, AWS Lambda, AWS EC2, AWS Step Functions, AWS CloudWatch, Azure Blob, Azure Data Lake, Google Cloud Storage (GCS), Google Kubernetes Engine (GKE), Cloud Functions, Cloud Run

• Other: Linux, Git, Automic, SDLC, RESTful APIs, Agile, Unit Testing, Vertex AI, Google Cloud IAM, Terraform for GCP

Work Experience

Digitals AI San Jose, USA

Data Engineer June 2023 - Present

• Successfully migrated Appworx datasets to a new foundational source-of-truth (SOT) dataset using MSBI, achieving 99.9% data accuracy, improved accessibility, and enhanced database performance while minimizing downtime during migration.

• Implemented robust data quality checks (Sentinel DQ) on fact and dimension tables using SQL Database, ensuring high data reliability and compliance with business standards.
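Checks of this kind can be sketched in plain Python; the table and column names below (`customer_key`, `amount`) are hypothetical stand-ins, since the actual rules ran as Sentinel DQ checks in SQL Database:

```python
def run_dq_checks(fact_rows, dim_keys):
    """Minimal data-quality checks on a fact table: non-null keys,
    no negative amounts, and referential integrity against a
    dimension table's key set."""
    failures = []
    for i, row in enumerate(fact_rows):
        if row.get("customer_key") is None:
            failures.append((i, "null customer_key"))
        elif row["customer_key"] not in dim_keys:
            failures.append((i, "orphan customer_key"))
        if row.get("amount", 0) < 0:
            failures.append((i, "negative amount"))
    return failures

facts = [
    {"customer_key": 1, "amount": 10.0},
    {"customer_key": 99, "amount": 5.0},    # orphan key
    {"customer_key": None, "amount": -2.0}, # null key, negative amount
]
print(run_dq_checks(facts, dim_keys={1, 2, 3}))
```

In production these rules would run as SQL against the warehouse, but the shape is the same: each rule emits the offending row and a reason code for triage.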

• Redesigned legacy Hive/Pig scripts into optimized Spark processes using Azure Databricks, significantly reducing execution time and enhancing processing efficiency in large-scale data environments.

• Streamlined data integration workflows by loading fact and dimension tables with PolyBase in Azure Synapse, delivering faster and more efficient data retrieval and integration.

• Architected and optimized scalable data pipelines to streamline real-time and batch processing, improving data flow efficiency and reducing processing time.

• Performed advanced data analysis and predictive modeling, uncovering actionable insights that enhanced business strategies and operational efficiency.

• Developed and automated ETL workflows using Python and SQL, ensuring seamless data integration, high availability, and optimized query performance.
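As an illustration of the pattern (not the production code, which ran against cloud databases), a minimal Python-plus-SQL ETL can be sketched with the standard library's sqlite3; the `orders` schema and sample rows are hypothetical:

```python
import sqlite3

def etl(raw_rows, conn):
    """Extract raw records, transform (normalize names, cast amounts,
    drop invalid rows), and load into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    clean = [
        (r["id"], r["customer"].strip().title(), float(r["amount"]))
        for r in raw_rows
        if r.get("amount") is not None  # drop records missing an amount
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean)
    conn.commit()
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()

raw = [
    {"id": 1, "customer": "  alice ", "amount": "19.5"},
    {"id": 2, "customer": "BOB", "amount": "5.5"},
    {"id": 3, "customer": "carol", "amount": None},  # filtered out
]
print(etl(raw, sqlite3.connect(":memory:")))  # → (2, 25.0)
```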

• Designed and orchestrated complex data acquisition and integration pipelines using Azure Data Factory (ADF), automating and streamlining data flow to meet business demands.

• Optimized MapReduce and Spark jobs by fine-tuning resource allocation and leveraging YARN for dynamic resource management, enhancing overall cluster performance.

• Developed an interactive SLA dashboard in Power BI, reducing manual reporting time by 70% and improving visibility into pipeline performance and compliance metrics.
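The core metric behind such a dashboard can be sketched in Python; the two-hour SLA window and the sample run times are hypothetical, and the actual dashboard computed this in Power BI/DAX:

```python
from datetime import datetime, timedelta

def sla_compliance(runs, sla=timedelta(hours=2)):
    """Percentage of pipeline runs that finished within the SLA window."""
    met = sum(1 for start, end in runs if end - start <= sla)
    return round(100.0 * met / len(runs), 1)

runs = [
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 1, 30)),  # within SLA
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 2, 0)),   # exactly on SLA
    (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 3, 15)),  # breached
]
print(sla_compliance(runs))  # → 66.7
```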

Cognizant Technology Solutions Chennai, IND

Programmer Analyst Jan 2020 - Jul 2022

• Facilitated seamless I/O data operations using Azure Data Factory (v2), migrating data from Teradata to Azure Data Lake Storage Gen1 and Gen2, significantly improving data accessibility and scalability for downstream analytics.

• Engineered robust MapReduce programs in Java to perform critical business validation, ensuring high accuracy and reliability in large-scale data processing workflows.

• Streamlined cross-platform data manipulation across diverse sources, including HDFS and Teradata, enabling efficient integration and enhancing data management capabilities.

• Optimized in-memory data computation by leveraging Spark RDD, achieving 3x improvement in processing speed and reducing latency for real-time analytics.
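The speedup principle here is that persisting an RDD avoids recomputing its lineage on every action. A pure-Python analogue of that behavior (illustrative only; the actual work used PySpark's `rdd.persist()`/`cache()`):

```python
calls = {"n": 0}

def expensive_transform(records):
    """Stand-in for a costly RDD transformation."""
    calls["n"] += 1  # count how many times the transform actually runs
    return [r * 2 for r in records]

records = [1, 2, 3, 4]

# Without caching, each action re-runs the whole transformation:
total = sum(expensive_transform(records))
count = len(expensive_transform(records))
assert calls["n"] == 2

# With caching (analogous to rdd.persist()), materialize once and reuse:
calls["n"] = 0
cached = expensive_transform(records)
total, count = sum(cached), len(cached)
print(calls["n"], total, count)  # → 1 20 4
```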

• Designed and optimized high-performance I/O data pipelines, ensuring seamless transfer of 10TB+ datasets between Teradata and HDFS, minimizing data loss and maximizing flow efficiency.

• Executed comprehensive data transformation, cleansing, and filtering using Hive and MapReduce, delivering high-quality datasets loaded into HDFS and ready for advanced data mining and analytics.

• Collaborated with cross-functional teams to identify bottlenecks in data pipelines and provided scalable solutions aligned with business goals.

• Utilized Azure Monitor and Log Analytics to proactively identify and resolve issues in data processing workflows, ensuring high availability.

• Incorporated CI/CD pipelines for data pipeline deployments, reducing deployment time by 50% and maintaining consistency across environments.

• Developed comprehensive backup and recovery strategies for critical workflows, safeguarding data integrity and ensuring high availability during unexpected failures.

Projects

WIRELESS WOMEN'S SAFETY SYSTEM

• Developed an innovative safety solution to enhance women’s security, leveraging real-time GPS location tracking and captured images to send instant emergency alerts to designated guardians during critical situations.

• Integrated advanced hardware components, including Raspberry Pi 3 and a vibration sensor, to detect distress signals and trigger immediate wireless communication.

• Designed a robust alert mechanism that initiates calls and messages via LAN and GSM networks, ensuring reliable, real-time updates regardless of connectivity constraints.
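The trigger logic can be sketched as follows; the threshold value and payload fields are hypothetical, and the real system dispatched alerts over GSM/LAN from the Raspberry Pi:

```python
def build_alert(vibration, lat, lon, threshold=0.8):
    """If the vibration-sensor reading crosses the threshold, build an
    SOS payload with GPS coordinates for dispatch to guardians."""
    if vibration < threshold:
        return None
    return {
        "type": "SOS",
        "location": (lat, lon),
        "channels": ["gsm_sms", "lan_call"],
    }

print(build_alert(0.95, 12.9716, 77.5946))
```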

ENTERPRISE DATA MODERNIZATION AND MIGRATION

• Orchestrated a large-scale migration of legacy databases to a modern cloud infrastructure, leveraging AWS DMS and AWS RDS to ensure seamless data transfer with zero data loss and minimal downtime.

• Engineered high-performance data pipelines using AWS Glue and AWS Kinesis for real-time and batch processing, enhancing data accessibility and operational efficiency across business-critical systems.

• Designed an optimized data warehouse architecture on AWS Redshift, reducing query execution times by 40% and supporting advanced analytics capabilities.

• Implemented robust security measures, including AWS Key Management Service (KMS) and IAM policies, to ensure compliance with regulatory standards and safeguard sensitive information.

• Created comprehensive technical documentation for database workflows and integration processes, enabling effective collaboration across cross-functional teams.

Education

University of California

Master's, Computer Engineering Sep 2022 - Dec 2023

Anna University

Bachelor's, Electronics and Communication Engineering Sep 2017 - Apr 2021
