
Data Engineer Processing

Location:
Somerset, NJ
Posted:
September 10, 2025


Resume:

Yash Chowdary

Data Engineer

Email: *.*****@*******.***

Ph: +1-201-***-****

LinkedIn: https://www.linkedin.com/in/yashchowdary/

Professional Summary

Data Engineer with 5+ years of experience designing scalable data architectures, optimizing cloud-based ETL workflows, and implementing high-performance data solutions. Specialized in the Azure, AWS, and GCP ecosystems, leveraging Databricks, ADF, Redshift, and Snowflake to drive real-time analytics, big data processing, and cloud migrations. Adept at building robust data pipelines, reducing query times, and enhancing system efficiency to support business intelligence and decision-making.

Summary:

• Extensive experience with Azure, AWS, and GCP, including services such as Azure Data Factory, AWS Glue, EMR, Lambda, and GCP BigQuery.

• Developed complex ETL pipelines using Apache Spark (PySpark/Scala), Snowflake, and Python on terabyte-scale datasets across multi-cloud platforms (an illustrative sketch follows this summary).

• Strong background in data warehousing and dimensional modeling (SCD2) with hands-on experience using Snowflake, Redshift, and Azure Synapse Analytics.

• Automated ML pipelines and CI/CD workflows using Terraform, Jenkins, Docker, Kubernetes, and Apache Airflow.

• Deployed real-time monitoring and logging with AWS CloudWatch, configuring alarms and logs on EC2, RDS, Lambda, and Glue to reduce downtime.

• Integrated diverse data sources such as Oracle databases and APIs, using Pandas, SQL, and Python for data extraction, transformation, loading, and analysis.

• Created and managed Power BI dashboards and semantic models; skilled in Tableau and Amazon QuickSight for presenting insights through interactive visualizations.

• Hands-on experience with SnowSQL, Snowpipe, streams, tasks, Time Travel, and performance tuning; managed advanced Snowflake features such as metadata management and data sharing.

• Working knowledge of REST API integrations and real-time data processing with Kafka, Spark Streaming, Cloud Pub/Sub, and Cloud Dataflow.

• Contributed to Infrastructure-as-Code using Terraform and CloudFormation, and managed deployments and version control with Git in Agile team environments.
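
Illustrative sketch of the PySpark-based ETL work summarized above: a minimal batch job that extracts raw JSON from cloud storage, applies basic cleansing, and writes curated, partitioned Parquet. The storage paths, container names, and column names are hypothetical placeholders, not details from any actual project.

from pyspark.sql import SparkSession, functions as F

# Minimal PySpark batch ETL sketch: extract raw JSON, cleanse, load partitioned Parquet.
spark = SparkSession.builder.appName("orders_daily_etl").getOrCreate()

# Extract: read raw semi-structured data from cloud storage (path is a placeholder).
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/orders/2025-09-01/")

# Transform: deduplicate, fix types, derive a partition column, and drop invalid rows.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Load: write to the curated zone, partitioned by date for downstream analytics.
(clean.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("abfss://curated@examplelake.dfs.core.windows.net/orders/"))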

Technical Skills:

• Operating Systems: Sun Solaris 11, 10, 9, 8, 7, Red Hat Linux 7.x

• Monitoring tools: AppDynamics and Kibana.

• ETL Tools: ADF, ADB, dbt, Airflow, Dataproc, Dataflow, AWS Glue, Redshift, Ab Initio GDE, Informatica, Pentaho, Denodo

• Provisioning Tools: Terraform, CloudFormation, ARM Template

• Database: DB2 (z/OS), Oracle (11g, 10g), SQL Server 7.0, MySQL, and MS Access (using VBA).

• Configuration Management: Ant, Ansible, Chef

• Public Cloud: AWS, Azure, and GCP; on-premises; HPE

• Test and Build Systems: Jenkins, Maven.

• Methodologies: Agile, Waterfall, DevOps

• Design tools: UML (Use cases, Class Diagrams, Sequence Diagrams)

• Big-Data Technologies: Hadoop, Apache Spark, Hive, Scala, HDFS, PySpark.

• Database Tools: SQL Server Management Studio, SQL Assistant

• Versioning tools: SVN, Git, GitHub

• Monitoring Tool: Apache Airflow, Redwood

• Visualization/Reporting: Tableau and Power BI

• SDLC Process: Agile Software Development with Scrum, Waterfall methodology

Certification: Microsoft Certified: Azure Data Fundamentals

Certification: Microsoft Certified: Fabric Data Engineer Associate (DP-700)

Education details:

Master’s Degree: Computer Science, University of Central Missouri.
Bachelor’s Degree: Electronics and Communication Engineering, GITAM.

Projects Summary

Client: Intrust Bank, Wichita, KS, Aug 2024 to Present

Role: Senior Data Engineer

Roles and Responsibilities

• Designed and optimized data models, ensuring efficient storage, retrieval, and processing for high-performance analytics.

• Designed and implemented massive-scale data conversion pipelines using PySpark on datasets larger than 100M records.

• Developed scalable ETL pipelines with Azure Data Factory (ADF) to load and transform data from various sources.

• Used Azure Synapse Analytics for creating and optimizing enterprise-level data warehousing solutions.

• Processed and stored big data using Azure Data Lake Storage (ADLS) with security access controls.

• Set up Azure Blob Storage for storing semi-structured and unstructured datasets for analysis and retrieval.

• Used AWS S3 buckets to store raw and transformed datasets with lifecycle policies.

• Implemented serverless data pipelines using AWS Lambda and AWS Glue for cost-efficient automation.

• Actively participated in daily stand-ups, sprint planning, and retrospectives as part of a cross-functional Agile team.

• Queried large datasets using Amazon Redshift for business intelligence and analytics.

• Conducted ad-hoc analysis with Google BigQuery for fast exploration of large-scale datasets.

• Enforced IAM roles and policies across cloud platforms for safe access to data and services.

• Enhanced Spark jobs using partitioning, caching, and broadcast variables to cut down execution time (see the sketch after this project's Environment line).

• Utilized Azure Monitor and CloudWatch for performance monitoring, alerting, and debugging of pipelines.

• Created linked services in Azure Data Factory to land data from different sources.

• Created advanced SQL scripts for data transformation, validation, and loading in data warehouse environments.

• Created and operated ETL/ELT pipelines to automate data ingestion and enrichment workflows.

• Employed SSIS for legacy data movement and migration operations involving SQL Server and flat files.

• Designed relational and dimensional data structures (Star/Snowflake schema) for BI reporting.

• Developed Python-based scripts for data cleansing, transformation, and automation.

• Incorporated third-party APIs and flat files into ETL processes for regular ingestion.

• Improved pipeline performance using partitioning, parallelism, and indexing techniques.

• Integrated Spark with Azure Data Lake and Blob Storage to handle batch and stream data in a distributed setting.

• Implemented data quality checks and validation rules for clean and accurate datasets.

• Developed maintainable, reusable parameterized pipelines in ADF and modular Python code.

• Managed high-volume data ingestion with parallel pipelines, implementing logging and retry handling throughout.

• Created interactive dashboards and KPIs with Power BI and Tableau from cloud warehouse data.

• Utilized Git for versioning and engaged in code review to ensure code quality in a collaborative environment.

• Deployed and tracked pipelines using CI/CD tools such as Azure DevOps and GitHub Actions.

• Ensured accurate documentation of all data flows, schemas, and transformations for audit purposes.

• Worked with cross-functional groups and stakeholders to collect requirements and provide data-driven solutions.

Environment: Python, PySpark, Azure SQL DB, Azure Data Factory, Azure Databricks, Azure Data Lake, Hadoop, Spark-SQL, Kafka, Cassandra, Apache Airflow, MongoDB, MapReduce, AWS
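
Illustrative sketch of the Spark tuning approach noted above (partitioning, caching, broadcast joins): a small dimension table is broadcast into a join against a large fact table, the joined result is cached because two aggregations reuse it, and output is written partitioned. Table names, paths, and columns are hypothetical placeholders.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("spark_tuning_sketch").getOrCreate()

# Large fact table and small dimension table (paths are placeholders).
transactions = spark.read.parquet("abfss://curated@examplelake.dfs.core.windows.net/transactions/")
branches = spark.read.parquet("abfss://curated@examplelake.dfs.core.windows.net/branches/")

# Broadcast the small dimension so the join avoids shuffling the large fact table.
enriched = transactions.join(broadcast(branches), "branch_id")

# Cache the joined frame because both aggregations below reuse it.
enriched.cache()

daily_totals = (enriched.groupBy("branch_id", "txn_date")
                        .agg(F.sum("amount").alias("total_amount")))
branch_counts = enriched.groupBy("branch_id").count()

# Align output partitioning with the write key to keep file counts manageable.
(daily_totals.repartition("txn_date")
             .write.mode("overwrite")
             .partitionBy("txn_date")
             .parquet("abfss://curated@examplelake.dfs.core.windows.net/daily_totals/"))

branch_counts.write.mode("overwrite").parquet("abfss://curated@examplelake.dfs.core.windows.net/branch_counts/")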

Client: Ajackus, India, Sep 2019 to Nov 2022

Role: Data Engineer

Roles and Responsibilities

• Developed Spark applications using Scala and Spark-SQL for data extraction, transformation, and aggregation from multiple data file formats.

• Developed and modified ETL processes and workflows that load data for the data warehouse using ETL tools (transitioning from SSIS to Azure integration technologies) and SQL code.

• Built and optimized data pipelines using Apache Spark in big data systems.

• Built and maintained event streaming pipelines using Kafka for real-time data ingestion and processing (a minimal streaming sketch follows this project's Environment line).

• Processed and analyzed big data using Hadoop and its ecosystem (HDFS, Hive, HBase).

• Wrote complex SQL queries, stored procedures, and views for data transformation.

• Built ETL jobs using Scala, Python, and Java for large-scale data processing.

• Designed batch and streaming ETL pipelines with Apache Airflow for scheduled tasks.

• Designed scalable data storage infrastructure on HDFS, Hive, HBase, and Amazon S3.

• Applied IAM, access controls, and data governance in AWS and Azure environments.

• Applied Docker and Kubernetes for container-based data processing workloads.

• Performed database testing on data warehouse by using complex SQL queries in SQL Server to validate data.

• Enabled serverless, cloud-native ETL on AWS Glue and Azure Data Factory.

• Implemented continuous integration and delivery (CI/CD) principles to automate deployment of data pipelines in Agile releases.

• Tuned Spark jobs and cluster performance for high throughput and low latency.

• Ensured data validation, reliability, and integrity throughout pipelines.

• Implemented monitoring and alerting using AWS CloudWatch and Azure Monitor.

• Collaborated with stakeholders to gather and map business requirements to data solutions.

• Developed dimensional data models and data marts (snowflake schema, star schema).

• Created and maintained optimal data pipeline architecture in Microsoft Azure using Data Factory and Azure Databricks.

• Reduced cloud costs and improved efficiency by optimizing storage and compute usage.

• Automated deployment of ETL through CI/CD pipelines (GitHub Actions, Azure DevOps).

• Established metadata catalogs and documented lineage for data governance.

• Worked with NoSQL databases (HBase, Cassandra) for high-concurrency workloads and flexible schemas.

• Developed reusable data transformation modules and libraries in Python/Scala.

• Migrated on-premises systems to cloud-based data architectures (AWS, Azure, GCP).

• Performed ad-hoc analytics using BigQuery, Redshift, or Synapse.

• Performed design/code reviews to validate best practices in data engineering.

• Mentored junior engineers on Spark, Kafka, and cloud data solutions.

• Stayed current with the latest data engineering technologies (e.g., Databricks certification).

Environment: SQL, Python, Hadoop, MapReduce, Hive, Sqoop, Kafka, Oozie, Scala, AWS Glue, Redshift, S3, EC2, IAM, EMR, Spark, Azure Data Factory, ADB, Logic Apps, Blob
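
Illustrative sketch of the Kafka-based real-time ingestion described above, using Spark Structured Streaming: it consumes JSON events from a hypothetical topic, parses them against an assumed schema, and writes checkpointed Parquet output. The broker address, topic name, schema, and storage paths are placeholders, and the spark-sql-kafka connector package is assumed to be available on the cluster.

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka_stream_sketch").getOrCreate()

# Expected shape of the incoming JSON events (fields are assumptions).
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Consume the raw event stream from a hypothetical Kafka topic and parse the JSON payload.
events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "payments")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Write the parsed stream to cloud storage; the checkpoint location enables fault-tolerant restarts.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/streams/payments/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/payments/")
         .outputMode("append")
         .start())

query.awaitTermination()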

Client: Monocept, India, May 2018 to July 2019

Role: Hadoop Developer

Roles and Responsibilities

• Integrated Snowflake into the data architecture, improving BI reporting and real-time analytics.

• Designed and implemented end-to-end ETL/ELT pipelines using Azure Data Factory, AWS Glue, and Apache Airflow for automating ingestion and transformation of unstructured and structured data.

• Designed and implemented a fully operational, production-grade, large-scale data solution on the Snowflake data warehouse (a minimal load sketch follows this project's Environment line).

• Designed reusable, maintainable Spark modules for ETL pipelines shared across projects.

• Built and queried large-scale data warehouses using Azure Synapse Analytics, Amazon Redshift, and Google BigQuery.

• Managed data storage in Azure Data Lake, Blob Storage, and AWS S3, along with corresponding backups and lifecycle policies.

• Developed data transformation logic in Python, Scala, and SQL with a focus on performance, modularity, and code reusability.

• Used star and snowflake schema modeling for reporting, and optimized pipeline performance through indexing and partitioning.

• Enforced data validation, quality checks, and error-handling through automated scripts and logging with retry mechanisms.

• Integrated third-party APIs, batch feeds, and real-time streams into cloud data ecosystems.

• Used Jira and Confluence for sprint progress tracking, technical solution documentation, and business objective alignment.

• Used Spark SQL for data discovery, ad-hoc queries, and data preparation for subsequent analytics.

• Used Cloud IAM, RBAC, and governance in AWS and Azure environments for secure access management.

• Used Azure Monitor and AWS CloudWatch for proactive monitoring of pipelines, logging, and alerting.

• Created and shared Power BI/Tableau dashboards for cloud data sources that delivered business insights.

• Deployed version control with Git, and published pipelines with CI/CD tools like Azure DevOps and GitHub Actions.

• Collaborated with cross-functional teams in transforming business requirements into cloud-native, scalable data solutions using Docker and Kubernetes as applicable.

Environment: GCP BigQuery, Dataflow, Cloud Storage, Cloud Run, Jenkins, Power BI, Snowflake, Hadoop, Apache Spark, EMR, SQL
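
Illustrative sketch of a Snowflake bulk load of the kind described above, using the snowflake-connector-python driver to run a COPY INTO from an external stage. The account, credentials, stage, file format, and table names are placeholders; in practice credentials would come from a secrets manager rather than being hard-coded.

import snowflake.connector

# Connection parameters are placeholders, not real credentials.
conn = snowflake.connector.connect(
    account="example_account",
    user="ETL_USER",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Bulk-load staged files into a target table; stage, file format, and table are assumptions.
    cur.execute("""
        COPY INTO STAGING.ORDERS_RAW
        FROM @EXT_STAGE/orders/
        FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'CONTINUE'
    """)
    print(cur.fetchall())  # COPY INTO returns per-file load results for verification.
finally:
    conn.close()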


