
Azure Data Engineer

Location:
New York City, NY
Salary:
$80,000/yr
Posted:
September 10, 2025


Resume:

Sai Swaroop Morampudi
Data Engineer
Email: ****************@*****.*** | LinkedIn: https://www.linkedin.com/in/saiswaroopmorampudi02 | Mobile: +1-551-***-**** | Location: New Jersey, USA

SUMMARY

Results-driven Data Engineer with *+ years of experience designing, developing, and optimizing enterprise-scale data pipelines, cloud migrations, and data warehousing solutions in Azure, AWS, and GCP environments. Expert in Databricks, PySpark, SQL, Python (pandas, NumPy), and ETL/ELT workflows using Azure Data Factory, Talend, and SnowSQL/SnowPipe. Proven track record in OLTP/OLAP modeling, big data processing, and financial/banking data integration, including securities, regulatory, and compliance reporting. Skilled in implementing data governance, security best practices, and performance tuning for large-scale analytics platforms. Holds multiple cloud certifications (Azure, AWS, Databricks) and excels at collaborating with business users and cross-functional teams to deliver actionable insights and scalable solutions.

TECHNICAL SKILLS

• Programming & Scripting: Python (pandas, NumPy), Scala, Java, SQL (T-SQL, PL/SQL), Shell Scripting

• Big Data & Analytics: Databricks, PySpark, Spark SQL, Spark Streaming, Hive, Impala, HBase, Kafka, Hadoop Ecosystem

• Cloud Platforms: Azure (ADF, Data Lake, Synapse, Databricks, Cosmos DB), AWS (S3, Redshift, DynamoDB, Lambda, Glue, EMR), GCP (BigQuery, DataProc, Composer)

• Data Warehousing: Snowflake (SnowSQL, SnowPipe), Redshift, Azure Synapse, Star & Snowflake Schema, OLTP & OLAP Modeling, Dimensions/Facts

• ETL/ELT Tools: Azure Data Factory, Talend, AWS Glue, DBT, Informatica, SSIS, Airflow

• Databases: Microsoft SQL Server, Oracle, MySQL, Teradata, MongoDB, Cassandra, DynamoDB

• Data Governance & Security: Access Control, Encryption, Audit Logging, Compliance Reporting

• DevOps & CI/CD: Jenkins, Git, Docker, Kubernetes, Terraform, Ansible, Maven, Control-M

• Monitoring & Logging: Kibana, Elasticsearch, CloudWatch, Logstash, Nagios

• Methodologies: Agile (Scrum), Waterfall, SDLC

PROFESSIONAL EXPERIENCE

Homebridge Financial Services

Azure Data Engineer | Sep 2023 – Present | Iselin, New Jersey

Technologies: Azure Data Factory, Azure Databricks, Azure Synapse, Azure SQL, Snowflake, Kafka, Spark (SQL & Streaming), PySpark, Scala, Hive, Sqoop, Jenkins, Docker, Elasticsearch, Kibana, Oracle, MySQL, SQL Server, ADO.NET, Java, UNIX Shell Scripting, C#.

• Built multi-threaded Java ingestion jobs and Sqoop scripts to migrate data from FTP servers and Oracle to big data platforms.

• Designed and maintained scalable ETL workflows using Azure Data Factory, PySpark, and DBT to automate data processing from raw sources to Snowflake.

• Created Databricks workflows to extract data from SQL Server and securely transfer it to SFTP, optimizing transformation performance.

• Optimized Databricks jobs using caching, partitioning, and broadcast joins, reducing execution time by over 40% (a PySpark sketch of this pattern follows this role's bullets).

• Developed Snowflake pipelines leveraging SnowSQL scripts and SnowPipe for automated ingestion and transformation of incremental datasets (see the connector sketch after this role's bullets).

• Implemented real-time data streaming pipelines using Kafka and Spark Streaming, enhancing data availability for downstream analytics.

• Implemented data governance policies including access control, encryption, and audit logging within Azure Databricks and Snowflake environments to ensure compliance with enterprise security standards.

• Established end-to-end monitoring with Azure Log Analytics and enabled alerts to support teams for better operational visibility.

• Built custom Kibana dashboards integrated with Elasticsearch and Logstash for real-time log analytics and troubleshooting.

• Managed CI/CD pipelines using Jenkins and Docker to automate deployments across multiple environments.

• Utilized C#, ADO.NET, and shell scripting for cross-platform data connectivity, job execution, and data validation tasks.

• Optimized Synapse loading with Azure Databricks integration and implemented dynamic partition switching to improve performance.
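
A minimal PySpark sketch of the join-optimization pattern referenced in the bullets above: cache a reused fact table, repartition it on the join key, and broadcast the small dimension side to avoid a shuffle. The table names, join key, and partition count are hypothetical, not the actual Homebridge datasets.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("join-optimization-sketch").getOrCreate()

    # Hypothetical tables: a large fact table and a small dimension table.
    transactions = spark.read.table("raw.transactions")
    branch_dim = spark.read.table("ref.branch_dim")

    # Repartition the large side on the join key to reduce shuffle skew,
    # and cache it since several downstream jobs reuse it.
    transactions = transactions.repartition(200, "branch_id").cache()

    # Broadcast the small dimension so the join avoids a full shuffle.
    enriched = transactions.join(broadcast(branch_dim), "branch_id", "left")

    enriched.write.mode("overwrite").saveAsTable("curated.transactions_enriched")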
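
And a sketch of SnowPipe-style incremental ingestion driven from Python via the snowflake-connector-python package. Every identifier and connection parameter below is a placeholder; the role's actual pipes, stages, and credentials are not disclosed in this resume.

    import snowflake.connector

    # Placeholder credentials; a real pipeline would pull these from a secret store.
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="***",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    cur = conn.cursor()
    try:
        # A pipe that auto-ingests staged files into an incremental landing table.
        cur.execute("""
            CREATE PIPE IF NOT EXISTS staging.orders_pipe AUTO_INGEST = TRUE AS
            COPY INTO staging.orders_incremental
            FROM @staging.orders_stage
            FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        """)
        # Re-scan the stage for any files the event notification missed.
        cur.execute("ALTER PIPE staging.orders_pipe REFRESH")
    finally:
        cur.close()
        conn.close()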


Catalent

AWS Data Engineer | Dec 2022 – Aug 2023 | Somerset, New Jersey

Technologies: AWS (DynamoDB, Lambda, S3, CloudWatch), Spark, Scala, Python, Talend, Kafka, HBase, Hive, Hadoop, MapReduce, Pig, Databricks, Docker, Kubernetes, Jenkins, Terraform, GitHub, Jira, Elasticsearch, Kibana, MongoDB, Mesos, SQL.

• Integrated AWS DynamoDB with AWS Lambda to store item values and manage real-time backups via DynamoDB Streams (a handler sketch follows this role's bullets).

• Designed and implemented ETL workflows using Talend, adhering to best practices for structured, semi-structured, and unstructured data pipelines.

• Developed and deployed Databricks ETL pipelines with Spark SQL, Python, and DataFrames to transform data for downstream consumption.

• Built Spark Streaming applications to process data in mini-batches, perform real-time transformations, and drive streaming analytics.

• Used Kafka for distributed messaging, managing partitioned feeds and real-time event data for streaming pipelines.

• Developed scalable analytics components using Scala, Apache Mesos, and Spark, and implemented MapReduce jobs using Pig and Hive for data preprocessing.

• Led an enterprise-wide migration from an on-premise data warehouse to a cloud-based AWS analytics platform, overseeing architecture design, ETL reengineering, and performance optimization.

• Automated infrastructure provisioning and monitoring with Terraform and AWS tools, improving system uptime and reducing management overhead by 90%.

• Managed containerized environments using Docker Swarm and Kubernetes for deployment consistency and scalability.

• Led MongoDB data migration projects, ensuring accurate import/export operations and maintaining data integrity (see the migration sketch after this role's bullets).

• Configured CI/CD pipelines using Jenkins, GitHub, Maven, and Chef, while monitoring infrastructure with CloudWatch, Nagios, and the ELK Stack (Elasticsearch, Logstash, Kibana).
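
A hedged sketch of the DynamoDB Streams pattern from the first bullet: a Lambda handler that copies each new item image to S3 as a backup. The bucket name and key layout are hypothetical, and the stream is assumed to be configured with a NEW_IMAGE view type.

    import json

    import boto3

    s3 = boto3.client("s3")
    BACKUP_BUCKET = "example-dynamodb-backups"  # hypothetical bucket

    def handler(event, context):
        """Triggered by a DynamoDB Stream; copies each new item image to S3."""
        for record in event["Records"]:
            # NewImage is present when the stream view type includes new images.
            if record["eventName"] in ("INSERT", "MODIFY"):
                image = record["dynamodb"]["NewImage"]
                key = f"backups/{record['dynamodb']['SequenceNumber']}.json"
                s3.put_object(Bucket=BACKUP_BUCKET, Key=key, Body=json.dumps(image))
        return {"processed": len(event["Records"])}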
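
And a minimal sketch of an RDBMS-to-MongoDB migration with a row-count check, assuming SQLAlchemy on the relational side and pymongo on the target. Connection strings, the table, and the collection names are placeholders.

    import pymongo
    import sqlalchemy  # assumed for the relational side of the migration

    # Placeholder connection strings; real ones would come from configuration.
    engine = sqlalchemy.create_engine("oracle+oracledb://etl_user:***@db-host/ORCL")
    mongo = pymongo.MongoClient("mongodb://mongo-host:27017")
    target = mongo["analytics"]["customers"]  # placeholder database/collection

    with engine.connect() as conn:
        rows = conn.execute(sqlalchemy.text("SELECT id, name, email FROM customers"))
        batch = [dict(row._mapping) for row in rows]

    # Bulk insert, then verify counts as a basic integrity check.
    result = target.insert_many(batch)
    assert len(result.inserted_ids) == len(batch)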

Deutsche Bank

Application Developer / Data Engineer | Nov 2020 – Jul 2022 | Mumbai, India

Technologies: Azure Data Factory, Azure HDInsight, Azure Kubernetes Service (AKS), Azure SQL, Cosmos DB (SQL & Mongo APIs), GCP (BigQuery, Cloud Storage, DataProc, Composer), Spark (SQL & Streaming), Databricks, Hive, HBase, Kafka, Python, Pandas, MySQL, Oracle, Teradata, MongoDB, Jenkins, Git, Terraform, Ansible, T-SQL, Bash, SQL.

• Designed and developed scalable data ingestion pipelines using Azure Data Factory and Spark SQL on Azure HDInsight and GCP DataProc, integrating structured and unstructured data from multiple sources including Cosmos DB and GCP Cloud Storage.

• Built a custom ELT logging framework in ADF using Append Variables to enhance monitoring and debugging of pipeline executions.

• Developed Spark Streaming applications to process real-time Kafka messages and write transformed streams into HBase for low-latency analytics (a streaming sketch follows this role's bullets).

• Leveraged Databricks and Spark SQL for data extraction, transformation, and aggregation across Azure and GCP environments, optimizing large-scale data workflows and analytics.

• Managed and scheduled workloads on Azure Kubernetes Service, and utilized GCP Composer for orchestrating ETL workflows and job dependencies.

• Automated CI/CD pipelines using Jenkins, Git, Terraform, and Ansible, and built custom deployment scripts in Python and Bash to support cross-platform data engineering tasks.

• Led data migrations from RDBMS systems (Oracle, MySQL, Teradata) into NoSQL stores like MongoDB, while optimizing Hive queries, writing UDFs, and improving query performance by over 60%.

• Worked on data pipelines supporting regulatory and compliance reporting for banking operations, ensuring accuracy, timeliness, and adherence to financial regulations.

• Engineered data workflows for securities and banking products, integrating transaction, position, and reference data to support trading and risk analytics.

• Involved in the full project lifecycle, from requirements gathering through design, development, testing, deployment, and support, while implementing stored procedures, T-SQL triggers, and exception handling.
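
A minimal sketch of the Kafka-to-HBase streaming pattern described above, using Spark Structured Streaming with foreachBatch and the happybase Thrift client. Broker, topic, table, and column-family names are placeholders, and the sketch assumes an HBase Thrift server is reachable.

    import happybase  # Python HBase client over Thrift
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hbase-sketch").getOrCreate()

    # Placeholder broker and topic names.
    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "trade-events")
              .load()
              .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
              .where("key IS NOT NULL"))

    def write_to_hbase(batch_df, batch_id):
        # Runs once per micro-batch; row-by-row puts keep the sketch simple.
        conn = happybase.Connection("hbase-thrift-host")  # placeholder host
        table = conn.table("trades")                      # placeholder table
        for row in batch_df.collect():
            table.put(row["key"].encode(), {b"cf:payload": row["value"].encode()})
        conn.close()

    query = stream.writeStream.foreachBatch(write_to_hbase).start()
    query.awaitTermination()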

Jio

Data Engineer | May 2020 – Oct 2020 | Mumbai, India

Technologies: Azure Data Factory, Azure Data Lake, Azure Logic Apps, Databricks, Spark (SQL & Streaming), PySpark, Hive, Kafka, Snowflake, Airflow, Docker, Terraform, Ansible, Jenkins, Pig, Oracle, SQL Server, Python, SQL.

• Designed and developed robust ETL pipelines using Azure Data Factory to ingest data from log files and business apps, process it via Databricks, and store it in Azure Data Lake.

• Designed OLTP-to-OLAP data integration workflows to support both operational reporting and advanced analytics.

• Built a reusable ETL framework to automate data migration from RDBMS systems to the Data Lake using Spark Data Sources and Hive objects.

• Developed and scheduled Airflow DAGs for ETL batch processing, enabling reliable data loading into Snowflake for enterprise analytics (a DAG sketch follows this role's bullets).

• Integrated Azure Logic Apps with ADF pipelines and HTTP triggers to automate batch workflows, enhancing system efficiency and reducing manual intervention.

• Engineered Spark Streaming jobs to consume and format real-time packet data from Kafka topics into JSON, then push the data back to Kafka for downstream use (see the round-trip sketch after this role's bullets).

• Created multiple Databricks Spark jobs with PySpark and Spark SQL to support complex table-to-table transformations, data profiling, and analysis.

• Implemented infrastructure monitoring and CI/CD automation using Jenkins, Docker, Ansible, and Terraform across development and production environments.

• Participated actively in all phases of the SDLC and authored technical documentation for Hadoop cluster setup, Hive queries, Pig scripts, and front-end Python-based GUIs.
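
A skeletal Airflow DAG for the Snowflake batch loads described above. The DAG id, schedule, and task bodies are placeholders; the pipeline's actual operators and connections are not specified in this resume.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_from_rdbms(**context):
        """Placeholder: pull the day's increment from the source RDBMS."""
        ...

    def load_to_snowflake(**context):
        """Placeholder: run COPY INTO via the Snowflake connector."""
        ...

    with DAG(
        dag_id="daily_snowflake_load",  # hypothetical DAG id
        start_date=datetime(2020, 5, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_from_rdbms", python_callable=extract_from_rdbms)
        load = PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)
        extract >> load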
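
And a PySpark Structured Streaming sketch of the Kafka round-trip from the packet-data bullet: parse JSON from one topic, flatten it, and publish it back to another. The schema, broker, and topic names are assumptions, and the job requires the spark-sql-kafka connector package on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json, struct, to_json
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka-json-roundtrip-sketch").getOrCreate()

    # Hypothetical packet schema; the real fields are not given in this resume.
    schema = StructType([
        StructField("device_id", StringType()),
        StructField("bytes", LongType()),
        StructField("ts", StringType()),
    ])

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", "raw-packets")                # placeholder topic
           .load())

    # Parse the Kafka value as JSON, flatten it, then re-serialize for publishing.
    parsed = raw.select(from_json(col("value").cast("string"), schema).alias("p")).select("p.*")
    out = parsed.select(to_json(struct(*parsed.columns)).alias("value"))

    query = (out.writeStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("topic", "formatted-packets")            # placeholder topic
             .option("checkpointLocation", "/tmp/ckpt/packets")
             .start())
    query.awaitTermination()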

EDUCATION

Master of Science in Data Science

Saint Peter's University, New Jersey, USA | September 2022 – March 2024

CERTIFICATIONS

• Microsoft Certified: Azure Data Engineer Associate (DP-203) – 2024

• AWS Certified Data Analytics – Specialty – 2024

• Databricks Certified Data Engineer Associate – 2023


