
Data Engineer, Data Analyst

Location:
Cumming, GA
Salary:
65000
Posted:
May 29, 2025


Resume:

Puja Lahari Sajja

Data Engineer

404-***-**** ****************@*****.***

ABOUT ME

Experienced Data Engineer with 5+ years of expertise in designing, building, and optimizing data pipelines and data architectures across multiple platforms, including AWS and Azure. Adept at working with large datasets, using SQL, Python, Hadoop, and data warehousing solutions to deliver actionable insights.

PROFESSIONAL SUMMARY

5+ years of experience building and optimizing data pipelines and working with large datasets across diverse platforms, including AWS and Azure.

Expertise in Python, SQL, Scala, and PySpark for data processing, analysis, and automation tasks.

Proficient in working with databases such as MySQL, PostgreSQL, MongoDB, Cassandra, HBase, and Amazon Redshift.

Extensive experience in data warehousing and analytics platforms like Snowflake, Azure Synapse Analytics, and Azure Databricks.

Deep understanding of distributed computing frameworks like Hadoop, HDFS, MapReduce, Pig, and Hive.

Skilled in utilizing ETL processes and tools like Sqoop, Impala, Flume, and Kafka for data ingestion, transformation, and real-time processing.

Hands-on experience with Zookeeper for managing and coordinating distributed systems.

Expertise in working with Azure Data Lake, AWS EMR, and Amazon S3 for cloud storage and big data management.

Strong background in CI/CD practices, leveraging Git, Maven, and automation tools to streamline development and deployment processes. Familiar with orchestration tools such as Airflow for scheduling and managing data workflows.

Proficient in creating and optimizing data pipelines for large-scale data processing, ensuring high performance, reliability, and scalability.

Proficient in leveraging Azure Data Lake, Azure Synapse Analytics, Azure Databricks, and Azure SQL Database for building scalable data solutions, performing advanced analytics, and optimizing ETL processes in the cloud environment.

Extensive experience working with AWS services such as Amazon Redshift, EMR, S3, AWS Lambda, and AWS Glue to design and implement scalable data pipelines, manage big data workloads, and optimize cloud-based data processing.

Expertise in using AWS Lambda to automate data workflows, trigger serverless processing, and integrate real-time data streams, enhancing efficiency and reducing operational overhead in cloud environments.

Skilled in using Power BI and Tableau for data visualization, creating interactive dashboards and reports to transform complex datasets into actionable insights.

Experienced in using Jenkins, Git, and Terraform for continuous integration and deployment (CI/CD), automating infrastructure provisioning and version control, and streamlining deployment processes. Proficient in utilizing ETL technologies such as Apache NiFi, Talend, and AWS Glue to design, develop, and optimize data pipelines.

TECHNICAL SKILLS

Cloud Platforms: AWS (Redshift, S3, EMR, Lambda), Azure (Data Lake, Synapse, Databricks)

ETL Tools: AWS Glue, Apache NiFi, Talend, Sqoop, Flume

Data Warehousing: Snowflake, Amazon Redshift, Azure Synapse Analytics

Orchestration & Scheduling: Apache Airflow, Oozie

Programming Languages: Python, SQL, Scala, PySpark

Databases: MySQL, PostgreSQL, MongoDB, Cassandra, HBase, Amazon Redshift, Snowflake

Big Data Technologies: Hadoop, HDFS, Hive, Pig, MapReduce, Spark

DevOps & CI/CD: Jenkins, Git, Maven, Terraform

Data Visualization: Power BI, Tableau

Containerization: Docker, Kubernetes

Version Control: Git, GitHub, GitLab

Operating Systems: Linux, Windows

WORK EXPERIENCE

Data Engineer/Analyst US Bank Feb 2025 – Present

Collaborated with business analysts, SMEs, and developers to gather system requirements and process flows for ETL and control limit automation needs.

Designed scalable ETL solutions using Informatica PowerCenter and PowerExchange to support data extraction, transformation, and loading across enterprise systems.

Analyzed complex Informatica ETL mappings, shell scripts, and SQL stored procedures to create detailed technical documentation, including data lineage and transformation logic.

Developed and maintained Python scripts for control limit checks, automating validations and monitoring data quality across pipelines (illustrative sketch below).

Produced Product Requirement Documents (PRDs) and pseudo-code for rewriting legacy COBOL programs into Spark SQL/SQL.

Delivered high-quality user guides, system manuals, and online help documentation tailored for both technical and business audiences.

Participated in Agile ceremonies, collaborated using Jira and Confluence, and strictly adhered to documentation process guidelines.

Developed and enhanced Informatica mappings, sessions, workflows, and worklets to implement complex data transformations, aggregations, and CDC (Change Data Capture) logic.

Performed unit testing and system integration testing of ETL mappings and Python scripts; validated data lineage and transformation logic and ensured performance optimization.

Authored comprehensive technical documentation, including data flow diagrams, transformation rules, control validation logic, and process manuals for internal users.

Monitored ETL job runs, debugged production issues, fine-tuned performance bottlenecks, and provided post-deployment support to ensure smooth operations.

ENVIRONMENT: Informatica PowerCenter, Informatica PowerExchange, SQL Server, Unix Shell Scripting, Python, Jira, Confluence, Agile (Scrum/Kanban), MS Office (Word, Visio).
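A minimal, hypothetical Python sketch of the kind of control-limit check described in this role; the input file, column name, and three-sigma threshold are illustrative assumptions, not the actual production logic.

```python
# Hypothetical control-limit data quality check (illustrative only;
# file name, column name, and limits are assumptions).
import pandas as pd

def check_control_limits(df: pd.DataFrame, column: str, sigma: float = 3.0) -> pd.DataFrame:
    """Flag rows whose values fall outside mean +/- sigma * std for a column."""
    mean, std = df[column].mean(), df[column].std()
    lower, upper = mean - sigma * std, mean + sigma * std
    return df[(df[column] < lower) | (df[column] > upper)]

if __name__ == "__main__":
    # Example: validate a daily amount feed before it enters the pipeline.
    feed = pd.read_csv("daily_feed.csv")             # assumed input file
    breaches = check_control_limits(feed, "amount")  # assumed column name
    if not breaches.empty:
        print(f"{len(breaches)} rows breached control limits")
```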

Azure Data Engineer Tallahassee Memorial Healthcare, FL (Remote) May 2024 – Jan 2025

Designed and implemented data pipelines using Azure Data Factory to orchestrate ETL processes, ensuring seamless data flow across systems. Developed and maintained Azure SQL Databases to support operational reporting and business intelligence requirements, ensuring high availability and performance.

Optimized data storage and retrieval using Azure Blob Storage and Azure Data Lake, ensuring cost-effective, scalable solutions. Designed and managed data models in Azure Synapse Analytics, integrating data from multiple sources to support analytical workloads.

Integrated machine learning models and predictive analytics using Azure Machine Learning, collaborating with data scientists on advanced forecasting and insights. Designed data solutions that meet business needs, using Azure DevOps for version control and collaboration.

Automated data workflows and continuous integration/continuous deployment (CI/CD) pipelines using Azure DevOps and Git, ensuring efficient deployment processes.

Used Power BI for data visualization and reporting, delivering actionable insights to business leaders and decision-makers. Developed and maintained data processing scripts using Python and Azure Functions to automate ETL workflows and integrate data across various Azure services (illustrative sketch below).

Designed and optimized complex queries and data models using SQL, Scala, MySQL, PostgreSQL, MongoDB, and Cassandra to support data processing, reporting, and analytics across multiple platforms.

Implemented and managed large-scale data processing workflows using Hadoop technologies, including HDFS, MapReduce, and Hive, to efficiently store and analyze big data in a distributed environment.

Managed version control and automated deployment pipelines using Git, Maven, Jenkins, and DevOps practices to ensure continuous integration and continuous delivery (CI/CD).

Designed and implemented scalable ETL pipelines using Azure Data Factory, Apache NiFi, and Talend to extract, transform, and load data from various sources into cloud-based data warehouses for analytics.

Managed and optimized distributed data processing using Impala for high-performance SQL queries and Zookeeper for coordinating and managing the configuration of distributed systems in a big data environment.

Designed and implemented data warehousing solutions using Snowflake to store and analyze large volumes of data.

ENVIRONMENT: Azure Data Factory, ETL, Azure SQL Databases, Data Lake, Azure Synapse Analytics, Azure Machine Learning, Azure DevOps, CI/CD, Git, Power BI, SQL, Scala, PostgreSQL, MongoDB, Cassandra, MapReduce, Hive, Maven, Jenkins, Apache NiFi, Talend, Impala, Zookeeper, Snowflake.
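A minimal sketch of the kind of Python-based ETL automation described in this role, assuming Azure Blob Storage as the raw landing zone; the connection string, container name, folder prefix, and column names are illustrative placeholders.

```python
# Hypothetical ETL step: pull raw CSVs from Azure Blob Storage, apply a light
# transform, and stage the result for a downstream load. Names are assumptions.
import io
import pandas as pd
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

CONN_STR = "<storage-account-connection-string>"  # assumed; keep in Key Vault in practice
CONTAINER = "raw-claims"                          # assumed container name

def extract_and_transform() -> pd.DataFrame:
    service = BlobServiceClient.from_connection_string(CONN_STR)
    container = service.get_container_client(CONTAINER)
    frames = []
    for blob in container.list_blobs(name_starts_with="2024/"):  # assumed folder prefix
        data = container.download_blob(blob.name).readall()
        frames.append(pd.read_csv(io.BytesIO(data)))
    df = pd.concat(frames, ignore_index=True)
    # Example transform: standardize dates and drop duplicate records.
    df["admit_date"] = pd.to_datetime(df["admit_date"], errors="coerce")  # assumed column
    return df.drop_duplicates()

if __name__ == "__main__":
    staged = extract_and_transform()
    staged.to_parquet("staged_claims.parquet")  # staged file for the warehouse load
```

In practice the same logic could run inside an Azure Function or a Databricks notebook triggered by Azure Data Factory.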

Application Data Engineer Kennesaw State University, GA (USA) February 2023 – May 2024

Designed and developed ETL pipelines for seamless data integration and processing using AWS Glue and AWS Lambda. Optimized large-scale data pipelines for high availability and performance in Amazon Redshift and Amazon S3.

Implemented data ingestion frameworks leveraging Apache Spark and Amazon Kinesis for real-time and batch processing. Automated infrastructure deployment using AWS CloudFormation and Terraform, enabling scalable environments.

Created and maintained RESTful APIs for data access and transformation through AWS API Gateway and AWS Lambda.

Monitored and fine-tuned data workflows using AWS CloudWatch, AWS Step Functions, and AWS Data Pipeline.

Deployed serverless architectures for data processing and analytics workflows using AWS Lambda and Amazon DynamoDB.

Collaborated with cross-functional teams to integrate Tableau and Power BI for business intelligence reporting. Leveraged Python and SQL for scripting, data transformation, and advanced analytics tasks.

Implemented CI/CD pipelines for data workflows using Jenkins, GitHub Actions, and AWS CodePipeline. Ensured disaster recovery and backup strategies with AWS Backup and Amazon Glacier.

Designed and implemented robust ETL workflows using AWS Glue, Informatica, and Talend to automate data extraction, transformation, and loading across diverse sources.

Managed distributed systems coordination with Apache Zookeeper and integrated data visualization tools like Looker to deliver actionable insights.

Automated and scheduled workflows with Apache Airflow, ensuring seamless orchestration of ETL and data integration processes (illustrative sketch below). Developed and optimized data processing pipelines using PySpark on AWS EMR to handle large-scale datasets efficiently.

Designed and implemented scalable data storage solutions using Amazon S3 for both raw and processed data. Leveraged Snowflake for cloud-based data warehousing, ensuring high performance and scalability for analytical workloads.

ENVIRONMENT: ETL pipelines, AWS Glue, AWS Lambda, Amazon Redshift, DynamoDB, Tableau, Power BI, Python, SQL, CI/CD, Jenkins, GitHub, Amazon Glacier, Hadoop, HDFS, Hive, AWS EMR, Informatica, Talend, Apache Zookeeper, PySpark, Snowflake.
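A hypothetical Apache Airflow sketch of the orchestration described above: a nightly DAG that starts an AWS Glue job through boto3. The DAG id, schedule, region, and Glue job name are assumptions.

```python
# Hypothetical Airflow DAG: nightly trigger of an AWS Glue ETL job (names are illustrative).
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

def start_glue_job():
    glue = boto3.client("glue", region_name="us-east-1")           # assumed region
    run = glue.start_job_run(JobName="nightly_student_feed_etl")   # assumed Glue job name
    print(f"Started Glue run {run['JobRunId']}")

with DAG(
    dag_id="nightly_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",  # 2 AM daily; use `schedule=` on newer Airflow versions
    catchup=False,
) as dag:
    trigger_glue = PythonOperator(
        task_id="start_glue_job",
        python_callable=start_glue_job,
    )
```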

Data Engineer / AWS Data Engineer Virtusa, India November 2020 – December 2022

Designed, developed, and maintained scalable ETL pipelines to process large datasets efficiently from multiple sources in the logistics domain. Utilized AWS services such as S3, Redshift, Lambda, Glue, and Kinesis for data storage, transformation, and streaming.

Implemented data models for inventory tracking, shipment monitoring, and supply chain analytics using SQL and NoSQL databases such as PostgreSQL and DynamoDB. Developed automation scripts using Python, Bash, and SQL for efficient data extraction, transformation, and loading (ETL).

Designed and implemented real-time data processing solutions using Apache Kafka and Apache Spark for logistics data streaming and analytics (illustrative sketch below). Ensured data integrity and consistency by applying data validation techniques and implementing data pipelines with Airflow.

Built and maintained reporting solutions using Tableau and Power BI to visualize key logistics KPIs such as on-time delivery and transportation costs. Worked with CI/CD pipelines for continuous integration and delivery of data engineering solutions using Jenkins and GitLab.

Implemented and managed Hadoop ecosystems for distributed storage and processing of large-scale logistics data, utilizing HDFS, MapReduce, and YARN for efficient data handling and analysis.

Designed and implemented ETL processes using tools like Apache NiFi, Talend, and AWS Glue to extract, transform, and load large volumes of logistics data into data warehouses for downstream analytics.

Utilized Apache Kafka, Impala, Zookeeper, and Flume to enable real-time data streaming, manage distributed systems, and efficiently ingest and process large volumes of logistics data across the Hadoop ecosystem.

ENVIRONMENT: S3, Redshift, Lambda, Glue, Kinesis, PostgreSQL, DynamoDB, Python, Bash, SQL, Apache Kafka, Apache Spark, Airflow, Tableau, Power BI, CI/CD, Jenkins, GitLab, Hadoop, Impala, Zookeeper, Flume, Snowflake, TensorFlow, scikit-learn, AWS SageMaker
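A hypothetical PySpark Structured Streaming sketch in the spirit of the Kafka-based logistics streaming above; the broker address, topic name, and event schema are illustrative assumptions, and the job assumes the spark-sql-kafka connector package is available.

```python
# Hypothetical streaming job: parse shipment-status events from Kafka with PySpark.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("shipment-stream").getOrCreate()

# Assumed schema for shipment-status events.
schema = StructType([
    StructField("shipment_id", StringType()),
    StructField("status", StringType()),
    StructField("event_time", TimestampType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "shipment-events")            # assumed topic
    .load()
)

# Decode the Kafka value and flatten the parsed JSON columns.
events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

# Console sink for illustration; a real job would write to S3, Redshift, or another durable sink.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```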

Data Engineering Intern Virtusa, India November 2019 – October 2020

Assisted in designing and developing scalable ETL pipelines to process logistics datasets efficiently.

Supported the use of AWS services such as S3 and Redshift for data storage and transformation tasks.

Contributed to building and maintaining reporting solutions using Tableau for visualizing logistics KPIs.

Developed automation scripts in Python and SQL for small-scale data extraction, transformation, and loading (ETL); an illustrative sketch follows below.

Gained hands-on experience with Apache Kafka and Spark for real-time data processing under senior engineers’ guidance.

Documented and tested data validation processes to ensure consistency and accuracy.
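A minimal, hypothetical Python + SQL sketch of the kind of small-scale ETL script described above, using SQLite so it is self-contained; the file, table, and column names are illustrative.

```python
# Hypothetical small ETL script: read a CSV, apply a light transform, load into SQL.
import csv
import sqlite3

def load_shipments(csv_path: str, db_path: str = "logistics.db") -> int:
    """Extract rows from a CSV, normalize one field, and load them into a SQL table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shipments (shipment_id TEXT, origin TEXT, status TEXT)"
    )
    with open(csv_path, newline="") as f:
        rows = [
            (r["shipment_id"], r["origin"].strip().upper(), r["status"])  # simple transform
            for r in csv.DictReader(f)
        ]
    conn.executemany("INSERT INTO shipments VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()
    return len(rows)

if __name__ == "__main__":
    print(f"Loaded {load_shipments('shipments.csv')} rows")  # assumed input file
```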

EDUCATION & CERTIFICATIONS

Master of Science in Computer Science – Kennesaw State University, Marietta, GA, USA

Bachelor of Technology in Computer Science and Engineering – JNTUK, India

AWS Certified Cloud Practitioner


