Data Engineer Big

Location:
Atlanta, GA
Posted:
January 26, 2024

Divya Nimmala

989-***-**** ad24o0@r.postjobfree.com

Professional Summary

Around 8 years of experience in Analysis, Design, Development, and Implementation as a Data Engineer

Hands-on experience with Databricks and PySpark, designing and optimizing data pipelines for efficient ETL processes

Adept at designing and managing complex data systems in Python

Experience in various Azure cloud services such as Data Factory, ADLS, Synapse, Cosmos DB, Key Vault, Event Hubs, Logic Apps

Expert at building efficient data pipelines, ensuring data accuracy, and enabling real-time analytics. Dedicated to staying up to date on the latest Snowflake features for improved insights and decision-making

Skilled in utilizing Snowflake's distinct capabilities for handling diverse data types, real-time analytics, and efficient query processing

Experience in AWS cloud services such as EC2, S3, RDS, IAM, Redshift, Glue, Lambda, Kinesis, CloudWatch, CDK, Athena, and EMR

Knowledge of various GCP services, including BigQuery and Pub/Sub

Strong data engineering experience, focusing on the development of ETL/ELT pipelines for both batch and streaming data utilizing PySpark and SparkSQL

Hands-on experience in Big Data project implementations with Hadoop, Spark, PySpark, and Hive

Experience in creating real-time, scalable data pipelines using Kafka

Experience in technical consulting and end-to-end delivery with architecture, data modeling, design, development, data governance and implementation of solutions

Experience and knowledge of NoSQL databases such as MongoDB and Cosmos DB

Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning, and System Testing

Experience across the Big Data ecosystem: ingestion, storage, querying, processing, and analysis of Big Data

Experience writing SQL queries and creating database objects such as stored procedures, packages, and functions to implement business logic

Facilitated collaborative decision-making processes, ensuring input from various team members and stakeholders

Experienced with Git and demonstrated ability to resolve merge conflicts and maintain code integrity during the development process

Hands-on experience monitoring and troubleshooting jobs in the production environment

Created documentation for data pipelines and collaboration processes with data scientists, analysts, and other stakeholders to ensure clear communication and data requirement alignment

Proficient in creating ETL/ELT pipelines, data modeling, and ensuring data quality within data warehousing environments

Developed error-handling mechanisms and monitoring and alerting systems for proactive issue resolution, applying strong problem-solving and analytical skills

Played a critical part in the design, development, and maintenance of ETL procedures using AWS Glue and Apache Spark, enabling clients to process large datasets

Capable of managing sprints, tracking project progress, and managing backlogs using Agile project management tools like Jira

Skills

Python

Snowflake

PySpark

Kafka

AWS

Azure

Databricks

Git

Aurora

PostgreSQL

Data Warehouse

GCP

Redshift

EMR

SQL

MongoDB

Oracle

Glue

Work Experience

Senior Data Engineer - Atlanta, GA: Shift Digital Dec 2021 – Current

Shift Digital helps car dealerships succeed online through targeted digital advertising, website optimization, and management of potential car buyers. By analyzing performance data, we determine what works best, making the digital landscape easier for dealers to navigate

Designed and created real-time scalable data pipelines to process structured data by integrating millions of raw records from 5 data sources including Kafka using Spark and Databricks
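
An illustrative sketch of this kind of pipeline in PySpark Structured Streaming; the broker address, topic, schema fields, and paths below are hypothetical stand-ins, not details from the project:

    # Minimal PySpark Structured Streaming job: read JSON events from Kafka,
    # parse them against a schema, and write them to a Delta table.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    schema = StructType([
        StructField("record_id", StringType()),   # hypothetical fields
        StructField("source", StringType()),
        StructField("event_time", StringType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
           .option("subscribe", "raw-events")                 # hypothetical topic
           .load())

    parsed = (raw.select(from_json(col("value").cast("string"), schema).alias("v"))
              .select("v.*"))

    (parsed.writeStream
     .format("delta")
     .option("checkpointLocation", "/mnt/checkpoints/raw-events")  # hypothetical path
     .start("/mnt/delta/raw_events"))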

Utilized Azure Data Lake Storage Gen2 as the primary data repository, enabling efficient data storage, organization, and analysis for huge datasets

Utilized Databricks clusters to efficiently process and transform large-scale datasets, enhancing data accuracy and accessibility

Orchestrated ETL workflows using Airflow DAGs and Azure Data Factory, configuring pipelines to extract, transform, and load data efficiently from various sources to their destinations

Coordinated with upstream data providers and downstream consumers to ensure data accuracy and consistency with the project goals

Built and optimized OLAP cubes for analytical queries

Able to integrate SAS solutions into the data engineering workflow

Collaborated with Data Science team to incorporate ML models into the pipeline, enabling advanced insights

Worked closely with Data Analysts and Business Intelligence teams, providing specialized SQL expertise for data analysis and reporting efforts

Regularly reviewed error patterns and trends, leveraging insights to improve error prevention and mitigation strategies and optimize data pipeline performance

Implemented encryption mechanisms, fine-grained IAM policies, and compliance with industry standards for data security and privacy

Integrated JSON processing and REST API interactions into SSIS projects, handling diverse JSON structures and API response formats

Set up real-time monitoring, implemented cost optimization strategies, and optimized data partitioning for enhanced performance

Implemented and automated CI/CD pipelines, applying DevOps practices and tools such as Jenkins, Bitbucket, GitLab, and Terraform

Implemented Azure DevOps pipelines for continuous deployment. Organized training sessions for effective cross-team collaboration and efficiency

Worked to standardize documentation formats and practices across the team, ensuring consistency and ease of use

Data Engineer - Chennai, India: Ford Motor Company May 2019 - Nov 2021

Conducted comprehensive analysis of source databases and tables to gain a deep understanding of source data

Developed SQL scripts for creating tables, views, functions, and stored procedures in Snowflake

Created mapping documents to establish clear connections between the existing data warehouse (Hadoop - HIVE) and Snowflake tables

Executed table creation scripts in Snowflake's development environment for both Raw and Consumption Layers

Designed and implemented integration objects to facilitate data transfer from HIVE to AWS S3 using the AWS Glue ETL tool

Implemented batch jobs using Snowflake Task objects to load data from S3 into Snowflake's Raw Layer using the COPY command
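
A minimal sketch of this load pattern using the Snowflake Python connector; the table, stage, warehouse, and CRON schedule below are illustrative, not the project's actual objects:

    # Run a COPY load from an S3-backed stage, then wrap the same load in a
    # Snowflake Task so it repeats on a CRON schedule.
    import snowflake.connector

    conn = snowflake.connector.connect(
        user="...", password="...", account="...",   # credentials elided
        warehouse="LOAD_WH", database="EDW", schema="RAW",
    )
    cur = conn.cursor()

    # One-off batch load from the external stage into a Raw Layer table.
    cur.execute("""
        COPY INTO raw_orders
        FROM @s3_raw_stage/orders/
        FILE_FORMAT = (TYPE = PARQUET)
    """)

    # Recurring version of the same load as a scheduled Task.
    cur.execute("""
        CREATE OR REPLACE TASK load_raw_orders
          WAREHOUSE = LOAD_WH
          SCHEDULE = 'USING CRON 0 2 * * * UTC'
        AS
          COPY INTO raw_orders FROM @s3_raw_stage/orders/ FILE_FORMAT = (TYPE = PARQUET)
    """)
    cur.execute("ALTER TASK load_raw_orders RESUME")  # tasks are created suspended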

Orchestrated data movement from the Raw Layer to the Consumption Layer within Snowflake

Performed rigorous unit testing in the Raw and Consumption Layers, ensuring accurate record counts and data quality. Documented test results in Jira for corresponding tasks within user stories

Developed and executed Views, Materialized Views, User-Defined Table Functions (UDTFs), and Stored Procedures to prepare data for BI reports

Created UDTFs in Snowflake for Hive tables containing static data

Designed ingestion patterns, encompassing both batch and near-real-time (NRT) data processing

Scheduled jobs using Snowflake Task objects with CRON job patterns

Developed schedules in Airflow that monitor the complete ETL process, using Airflow DAGs to define jobs, dependencies, and execution schedules
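
A minimal sketch of such a DAG, assuming Airflow 2.x; the DAG id, task callables, and schedule are illustrative:

    # Two-step DAG: extract then load, run daily; `load` only starts after
    # `extract` succeeds, so the dependency chain mirrors the ETL order.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        ...  # placeholder for the real extraction step

    def load():
        ...  # placeholder for the real load step

    with DAG(
        dag_id="snowflake_etl",            # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="0 2 * * *",     # daily at 02:00
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_load = PythonOperator(task_id="load", python_callable=load)
        t_extract >> t_load                # load depends on extract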

Wrote Python scripts to extract data from Snowflake tables and generate Excel reports. Automated email distribution of reports to business users
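
A condensed sketch of that kind of reporting script; the query, file name, SMTP host, and addresses are placeholders:

    # Pull a Snowflake table into pandas, write it to Excel, and email the file.
    import smtplib
    from email.message import EmailMessage

    import snowflake.connector

    conn = snowflake.connector.connect(user="...", password="...", account="...")
    df = (conn.cursor()
          .execute("SELECT * FROM reports.daily_summary")   # hypothetical query
          .fetch_pandas_all())              # requires the connector's pandas extras
    df.to_excel("daily_summary.xlsx", index=False)           # needs openpyxl

    msg = EmailMessage()
    msg["Subject"] = "Daily summary report"
    msg["From"] = "etl@example.com"                          # placeholder addresses
    msg["To"] = "business-users@example.com"
    msg.set_content("Please find today's report attached.")
    with open("daily_summary.xlsx", "rb") as f:
        msg.add_attachment(
            f.read(), maintype="application", subtype="octet-stream",
            filename="daily_summary.xlsx")

    with smtplib.SMTP("smtp.example.com") as smtp:           # placeholder SMTP host
        smtp.send_message(msg)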

Ensured data quality by using DBT tests (schema tests, referential integrity tests, custom tests)

Used DBT to debug complex chains of queries by splitting them into multiple models and macros that can be tested separately

Transformed and cleansed data and mapped Workday data fields to Snowflake data warehouse tables

Loaded and monitored Snowflake data and optimized data integration methods for performance

Integrated automated tests into the CI/CD pipeline to run tests automatically whenever code changes are pushed. Detected issues early in the development process, allowing for rapid feedback and quick bug resolution

Maintained open communication and coordination with business users throughout project lifecycles

Established best practices and published them on a Confluence page to ensure continuous team commitment

Python Data Engineer - Hyderabad, India: Swank Innovations May 2017 - April 2019

Designed and developed web applications using Django framework, implementing robust back-end logic and user-friendly front-end interfaces

Leveraged AWS services such as EC2, S3, Lambda, and RDS to deploy, scale, and manage web applications and services

Designed and executed data ETL pipelines using Python and AWS Glue, facilitating seamless data integration and transformation

Implemented data processing pipelines using AWS Step Functions and AWS Glue, orchestrating the flow of data between various AWS services and Python scripts

Created serverless APIs using AWS API Gateway and AWS Lambda, integrating with Python back-end logic for scalable and cost-effective API solutions
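
A minimal sketch of such a handler, assuming API Gateway's Lambda proxy integration; the query parameter and response payload are illustrative:

    # Lambda handler for an API Gateway proxy integration: read a query-string
    # parameter from the event and return a JSON response.
    import json

    def handler(event, context):
        params = event.get("queryStringParameters") or {}
        name = params.get("name", "world")            # hypothetical parameter
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"message": f"Hello, {name}"}),
        }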

Implemented AWS security measures to ensure data protection and compliance

Designed and maintained relational databases using SQL, ensuring efficient data storage, retrieval, and integrity

Developed complex SQL queries for data manipulation and reporting, optimizing database performance and enabling efficient extraction of meaningful insights from large datasets

Orchestrated the flow of data between AWS services, Python scripts, and Postgres databases using AWS Glue and Step Functions, ensuring a well-coordinated data processing workflow

Implemented infrastructure as code using CDK to define AWS resources and infrastructure components
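
A minimal CDK sketch in Python (assuming CDK v2); the stack and bucket are illustrative resources, not the project's actual infrastructure:

    # Define a stack with one encrypted, versioned S3 bucket and synthesize it.
    import aws_cdk as cdk
    from aws_cdk import aws_s3 as s3
    from constructs import Construct

    class DataPipelineStack(cdk.Stack):       # hypothetical stack name
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)
            s3.Bucket(self, "RawDataBucket",
                      versioned=True,
                      encryption=s3.BucketEncryption.S3_MANAGED)

    app = cdk.App()
    DataPipelineStack(app, "DataPipelineStack")
    app.synth()   # `cdk deploy` turns this into a CloudFormation deployment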

Implemented object-oriented programming principles in Python to design modular and reusable code structures

Developed custom Python libraries to enhance code maintainability and scalability

Implemented version control using Git to track code changes and facilitate collaborative development across the team

Conducted client presentations and demonstrations to showcase project progress, gather feedback, and ensure alignment with client expectations

SQL Developer - Hyderabad, India: Techwave Inc May 2016 - April 2017

Responsible for the remote disaster recovery system and data backup and restoration, including data fixes in the database, functionality fixes, and other data issue analysis

Created stored procedures, functions, and views in MySQL to support SSIS packages, console applications, SSRS reporting, and data warehouse purposes

Tuned SQL queries for better performance using SQL Profiler

Involved in enhancements with the application development team, including data flows for B2C and B2B applications; also collaborated with other teams to solve technical issues across product suites

Wrote and coded logical and physical database descriptions, specifying database identifiers for the management systems; also created and implemented complex business intelligence solutions

Scheduled daily, weekly, and monthly reports for executives, business analysts, and customer representatives across various categories and regions, based on business needs, using SQL Server Reporting Services

Performed database migrations, which included creating new SSIS packages and stored procedures, updating the old packages and procedures, migrating the respective DB jobs from server to server, and creating new jobs based on requirements

Worked with Quality Analysts and Business Analysts to make sure all business requirements were covered for the product

Identified key use cases and associated reference architectures for market segments and industry verticals

Assisted clients in understanding and manipulating data to gain value through SQL and ETL technical processes and visual analytics tools

Education

Master’s in Information Systems, Central Michigan University, MI, USA

Bachelor’s in Computer Science and Engineering, Vignan's Nirula Institute of Technology and Science for Women, Guntur, India


