Data Engineer Engineering

Location:

Orlando, FL

Posted:

September 14, 2024

Resume:

SAI NALINI

DATA ENGINEER

Email: ***********@*****.*** 316-***-**** KS

SUMMARY

4+ years of experience in data engineering, focusing on designing and managing scalable data pipelines, optimizing ETL processes, and working with large datasets in healthcare and finance sectors.

Proficient in using Python, SQL, PySpark, Spark SQL, Azure Databricks, Apache Airflow, Azure Data Factory, Snowflake, and Kafka for data engineering tasks.

Experienced in cloud platforms like Microsoft Azure and AWS, with practical skills in real-time data streaming and processing, as well as managing data lake architectures.

Skilled in developing ETL pipelines, implementing data lake solutions, and utilizing data visualization tools such as Power BI and Tableau to generate insights and support decision-making.

Competent in working with databases like SQL Server and Snowflake for data management, including data cleaning, validation, and optimization.

EDUCATION

Master of Science in Computer Science

Wichita State University, Wichita, KS

Bachelor in Electronics and Communication Engineering

JNTU College, Andhra Pradesh, India

TECHNICAL SKILLS

Big Data Technologies: Hadoop, Spark, Hive, Kafka, Snowflake, Airflow

Cloud Platforms: Azure, AWS

Languages: Python, SQL, Java 8, Shell Scripting

Data Warehousing: Snowflake, Azure Data Warehouse

Frameworks: React JS, Spring Boot, Microservices API

Databases: MySQL, CosmosDB, MongoDB, PostgreSQL

Visualization Tools: Tableau, Power BI

Methodologies: SDLC, Agile, Waterfall

Tools and others: JIRA, Git, Docker, Kubernetes, AWS Glue, AWS S3

WORK EXPERIENCE

CVS Health, USA Data Engineer Jan 2024 – Current

Worked extensively with Agile development methodologies, managing application iterations for efficient project delivery.

Developed and optimized PySpark DataFrames in Azure Databricks to process and transform data from Data Lake or Blob storage.

Created and deployed high-performance ETL pipelines using PySpark and Azure Data Factory, enhancing data processing efficiency.

Utilized Azure Event Hubs and Apache Kafka for real-time data streaming, increasing data freshness by 30%.

Optimized ETL pipelines and data workflows, achieving over a 25% reduction in processing time.

Instituted a data lake architecture with CosmosDB, improving data storage and query efficiency, reducing management time by 35%.

Managed data drift and schema evolution challenges with StreamSets, ensuring seamless data integration.

Scheduled and orchestrated ETL processes using Apache Airflow, creating and managing Directed Acyclic Graphs (DAGs).

Collaborated with Quality Engineering teams to design and execute comprehensive testing strategies, ensuring data accuracy and reliability.

Ensured data privacy and compliance with HIPAA guidelines, implementing best practices for data anonymization and access control.

Led the implementation and optimization of data workflows in Palantir Foundry, improving data processing efficiency and accuracy.

Developed interactive dashboards and visualizations with Tableau, providing actionable insights and enhancing decision-making processes.

Vaayuja info Solution, India Data Engineer May 2019 – Aug 2022

Engaged in the analysis, design, and development phases of the Software Development Lifecycle (SDLC) within an agile environment, utilizing JIRA and GitHub for project management and version control.

Developed cloud-based data pipelines and Spark applications on AWS, using AWS S3 for data staging and Redshift for data migration.

Designed and implemented end-to-end data pipelines with StreamSets for efficient data ingestion and transformation.

Employed Spark Streaming for preprocessing streaming data and developed Spark applications for data validation, cleansing, transformation, and custom aggregation. Utilized Spark SQL for in-depth data analysis.

Developed REST APIs using Python with Flask and Django frameworks to integrate data from various sources.

Implemented and maintained Apache Airflow DAGs for orchestrating ETL processes, leading to streamlined and automated data pipelines.

Optimized big data workflows using Hadoop, including MapReduce and HDFS, for efficient data processing and storage.
