
Data Engineer with 6+ Years Cloud Data Pipelines Expertise

Location: Irving, TX
Posted: November 28, 2025



+1-202-***-****

********.********@*****.***

OBJECTIVE

Experienced Data Engineer with 6+ years of expertise in designing, building, and optimizing data pipelines and architectures across AWS, Azure, and GCP, seeking to apply cloud-based data engineering skills to drive data-driven decision-making.

SUMMARY

• 6+ years of professional experience in full life cycle system development, including analysis, design, development, testing, documentation, implementation, and maintenance of application software in web-based and client/server environments.

• Expertise in building batch and real-time data processing systems leveraging AWS services (S3, Redshift, EMR, Lambda, DynamoDB) and Azure services (ADLS, ADF, Databricks, Synapse Analytics).

• Experience using various Hadoop distributions (Cloudera, MapR, Hortonworks, Azure) to fully implement and leverage new Hadoop features. Hands-on experience with Spark, Databricks, and Delta Lake.

• Expert in creating various Kafka producers and consumers for seamless data streaming with AWS services.
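
For illustration, a minimal producer sketch assuming the kafka-python client; the broker address, topic, and payload are hypothetical placeholders, not a specific production setup:

```python
import json

from kafka import KafkaProducer  # kafka-python package

# Broker address and "orders" topic are illustrative placeholders.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish one JSON-encoded event and block until it is delivered.
producer.send("orders", {"order_id": 1, "amount": 99.50})
producer.flush()
```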

• Good experience working with PyTest, PyMock, and Selenium WebDriver frameworks for testing front-end and backend components.

• Experience with Windows Azure services (PaaS, IaaS) and Azure storage, including Blob storage (page and block blobs) and SQL.

• Well versed in deployment, configuration management, and virtualization. Experience across the Software Development Life Cycle, including requirement gathering, design, programming, and testing.

• Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Azure Data Factory, Data Lake Analytics, and Stream Analytics).

• Experience in data integration and building data pipelines in Airflow using Python scripting.

• Experience migrating on-premises systems to Windows Azure using Azure Site Recovery and Azure Backup.

• Deployed Docker Engine on virtualized platforms for containerization of multiple applications.

• Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.

• In-depth knowledge of data sharing in Snowflake and of Snowflake database, schema, and table structures.

• Proficient in building CI/CD pipelines in Jenkins using pipeline syntax and Groovy libraries.

• Designed and implemented a scalable data architecture on AWS using Kubernetes, Terraform, and Snowflake, enabling seamless data integration and processing across multiple data sources.

TECHNICAL SKILLS

Data Storage: Data Lakes, Data Warehousing, SQL & NoSQL Databases, Cloud Storage
Data Processing: ETL/ELT Development, Data Pipelines, Batch & Stream Processing, Data Transformation
Big Data Technologies: Hadoop, Spark, Distributed Computing, Large-scale Data Processing
Cloud Platforms: Azure, AWS, Google Cloud Platform (GCP)
Programming Languages: Python, SQL, Java, Scala
ETL/ELT Tools: Apache NiFi, Informatica, Talend, Apache Airflow, Azure Data Factory, AWS Glue
Analytics & Reporting: Data Visualization, BI Tools (Tableau, Power BI), Descriptive & Predictive Analytics
Orchestration: Workflow Automation, Job Scheduling, Data Pipeline Orchestration
Machine Learning: Model Deployment, Feature Engineering, Integration with Data Pipelines
DevOps & CI/CD: Version Control (Git), Continuous Integration/Continuous Deployment (CI/CD), Infrastructure as Code (IaC)
Networking: Data Integration, API Management, Cloud Networking, Data Migration
Data Quality: Data Profiling, Data Cleansing, Data Validation, Data Lineage
Web/Application Servers: Apache Tomcat, WebLogic, WebSphere
Version Control & Tools: Git, Maven, SBT, CBT
Data Warehousing: OLAP, Dimensional Modeling, Schema Design, Data Marts

WORK HISTORY

Kroger, USA July 2024 - Present

Azure Data Engineer

• Reviewed existing Java/Scala Spark processing jobs and maintained and enhanced them.

• Designed, developed, and implemented performant ETL pipelines using the Spark Python API (PySpark).
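
A minimal PySpark ETL sketch along these lines; the file paths and column names are assumptions for illustration, not the actual pipeline:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl_sketch").getOrCreate()

# Extract: read raw CSV (path and schema are illustrative).
raw = spark.read.option("header", True).csv("/mnt/raw/orders.csv")

# Transform: type the columns, drop incomplete rows, stamp the load date.
clean = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .dropna(subset=["order_id", "amount"])
       .withColumn("load_date", F.current_date())
)

# Load: write partitioned Parquet to the curated zone.
clean.write.mode("overwrite").partitionBy("load_date").parquet("/mnt/curated/orders")
```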

• Interacted with business partners, Business Analysts and product owner to understand requirements and build scalable distributed data solutions using Hadoop ecosystem.

• Monitored Spark cluster using Log Analytics and Ambari Web UI.

• Transitioned log storage from Cassandra to Azure SQL Data Warehouse, improving query performance.

• Configured Spark Streaming to consume ongoing data from Kafka and store the stream in DBFS.
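
A sketch of such a stream using Spark Structured Streaming; the broker, topic, and DBFS paths below are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka_to_dbfs").getOrCreate()

# Subscribe to a Kafka topic (broker and topic names are placeholders).
stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
)

# Persist the raw key/value stream to DBFS with checkpointing for recovery.
query = (
    stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
          .writeStream.format("parquet")
          .option("path", "dbfs:/mnt/streams/events")
          .option("checkpointLocation", "dbfs:/mnt/checkpoints/events")
          .start()
)
```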

• Created Databricks Job workflows that extract data from SQL Server and upload the files to SFTP using PySpark and Python.

• Created data tables using PyQt to display customer and policy information and to add, delete, and update customer records.

• Developed custom UDFs in Python for use in Spark SQL.
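
For illustration, a minimal Python UDF registered for Spark SQL; the masking logic, function name, and "customers" view are invented for the example:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def mask_email(email):
    """Hide the local part of an e-mail address (illustrative logic only)."""
    if email is None or "@" not in email:
        return email
    local, domain = email.split("@", 1)
    return local[0] + "***@" + domain

# Register the function so it can be called from Spark SQL.
spark.udf.register("mask_email", mask_email, StringType())

# Hypothetical usage against a registered "customers" view.
spark.sql("SELECT mask_email(email) AS email FROM customers").show()
```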

• Designed and implemented Infrastructure as code using Terraform, enabling automated provisioning and scaling of cloud resources on Azure.

• Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application.

• Implemented Synapse integration with Azure Databricks notebooks, reducing development work by roughly half, and improved Synapse loading performance by implementing a dynamic partition switch.

• Used Jira for ticketing and tracking issues and Jenkins for continuous integration and continuous deployment.

• Implemented Azure Data Lake, Azure Data Factory, and Azure Databricks to move and conform data from on-premises to the cloud to serve the company's analytical needs.

• Dockerized applications by creating Docker images from Dockerfiles and collaborated with the development support team to set up a continuous deployment environment using Docker.

• Developed Kibana dashboards based on Logstash data and integrated different source and target systems into Elasticsearch for near-real-time log analysis and end-to-end transaction monitoring.

• Used Azure Synapse Analytics, Azure Data Lake Storage, and Azure Data Lake Analytics to query large amounts of data stored in Azure Data Lake Storage, creating a virtual data lake without going through ETL processes.

• Built PySpark code to validate data from the raw source against Snowflake tables.

• Worked on CI/CD tools such as Jenkins and Docker on the DevOps team, setting up the end-to-end application process using Deployment for lower environments and Delivery for higher environments, with approvals in between.

• Analyzed data using the Hadoop components Hive and Pig. Utilized Azure Logic Apps to build workflows that schedule and automate batch jobs by integrating apps, ADF pipelines, and other services such as HTTP requests and email triggers.

Accenture Solutions Pvt. Ltd, India July 2021 – July 2023

GCP Data Engineer

• Created Data Studio reports to review billing and service usage, optimizing queries and contributing to cost-saving measures.

• Used Dataproc and BigQuery to develop and maintain GCP cloud-based solutions.

• Experienced with GCP features including Google Compute Engine, Google Storage, VPC, Cloud Load Balancing, and IAM.

• Processed data efficiently in Azure Databricks and visualized insights through Tableau dashboards. Good knowledge of using Cloud Shell for various tasks and deploying services.

• Worked on Spark to improve the performance and optimization of existing algorithms in Databricks using SparkContext, Spark SQL, DataFrames, and pair RDDs.

• Processed schema-oriented and non-schema-oriented data using Scala and Spark.

• Expertise in business intelligence and data visualization tools such as Tableau; used Tableau to connect to various sources and build graphs.

• Experienced with Google Cloud components, Google Container Builder, GCP client libraries, and Cloud SDKs.

• Worked on SQL and PL/SQL for backend data transactions and validations.

• Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts of enterprise data from BigQuery.

• Worked with GCP cloud services including Cloud Storage, Dataproc, Dataflow, and BigQuery.

• Built a program with Python and Apache Beam, executed in Cloud Dataflow, to run data validation between raw source files and BigQuery tables.
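
A sketch of that kind of row-count validation in Apache Beam; the bucket, table, and pipeline options are placeholders, not the actual job:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

RAW_FILE = "gs://example-bucket/raw/orders.csv"  # placeholder path
BQ_TABLE = "example-project:staging.orders"      # placeholder table

with beam.Pipeline(options=PipelineOptions()) as p:
    # Count records in the raw source file.
    raw_count = (
        p
        | "ReadRaw" >> beam.io.ReadFromText(RAW_FILE, skip_header_lines=1)
        | "CountRaw" >> beam.combiners.Count.Globally()
    )
    # Count rows landed in the BigQuery table.
    bq_count = (
        p
        | "ReadBQ" >> beam.io.ReadFromBigQuery(table=BQ_TABLE)
        | "CountBQ" >> beam.combiners.Count.Globally()
    )
    # Compare the two counts via a side input; a mismatch flags a load problem.
    (
        raw_count
        | "Validate" >> beam.Map(
            lambda raw, bq: {"raw": raw, "bq": bq, "match": raw == bq},
            bq=beam.pvalue.AsSingleton(bq_count),
        )
        | "Report" >> beam.Map(print)
    )
```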

• Built data pipelines in Airflow/Composer to orchestrate ETL jobs using different Airflow operators.

Accenture Solutions Pvt. Ltd, India Aug 2018 – June 2021

AWS Data Engineer

• Wrote AWS Lambda functions in Spark with functional dependencies, generating custom libraries for delivering the Lambda functions in the cloud.

• Created Hive tables with dynamic partitions and buckets, loaded various data formats such as Avro and Parquet into the tables, and analyzed the data using HiveQL.
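
A sketch of dynamic-partition loading through Spark's Hive support; the table and column names are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Allow fully dynamic partition values on insert.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

# Partitioned, bucketed target table (names are illustrative).
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_partitioned (
        order_id BIGINT,
        amount   DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    CLUSTERED BY (order_id) INTO 8 BUCKETS
    STORED AS PARQUET
""")

# The partition column must come last in the SELECT for dynamic partitioning.
spark.sql("""
    INSERT INTO TABLE sales_partitioned PARTITION (order_date)
    SELECT order_id, amount, order_date FROM raw_sales
""")
```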

• Staged API and Kafka data (in JSON format) into Snowflake by flattening it for different functional services.

• Developed Scala-based Spark applications for data cleansing, event enrichment, data aggregation, denormalization, and the data preparation needed by machine learning and reporting teams.

• Provisioned high availability of AWS EC2 instances, migrated legacy systems to AWS, and developed Terraform plugins, modules, and templates for automating AWS infrastructure.

• Developed scalable and reusable database processes and integrated them. Created, debugged, scheduled, and monitored Airflow jobs for ETL batch processing to load data into Snowflake for analytical processes.
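
A minimal Airflow DAG sketch for such a load, assuming the Apache Airflow Snowflake provider is installed; the connection id, stage, and table names are hypothetical placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# Connection id, stage, and table names below are placeholders.
with DAG(
    dag_id="daily_snowflake_load",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Copy staged Parquet files into the analytics table once per day.
    load_orders = SnowflakeOperator(
        task_id="load_orders",
        snowflake_conn_id="snowflake_default",
        sql="""
            COPY INTO analytics.orders
            FROM @raw_stage/orders/
            FILE_FORMAT = (TYPE = PARQUET)
        """,
    )
```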

• Involved in various phases of Software Development Lifecycle (SDLC) of the application, like gathering requirements, design, development, deployment, and analysis of the application.

• Developed a fully automated continuous integration system using Git, Jenkins, MySQL and custom tools developed in Python and Bash.

• Migrated the data from Amazon Redshift data warehouse to Snowflake for further financial reporting.

• Created and triggered automated builds and Continuous Deployments (CI/CD) using Jenkins/looper and OneOps cloud.

• Collected data using Spark Streaming from an AWS S3 bucket in near-real-time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.

EDUCATION

The George Washington University, Master’s in Computer Science, Washington, DC, August 2023 - May 2025
GITAM University, Bachelor’s in Computer Engineering, Visakhapatnam, India, July 2014 – May 2018


