Divya Nimmala
989-***-**** ad24o0@r.postjobfree.com
Professional Summary
Around 8 years of experience in Analysis, Design, Development, and Implementation as a Data Engineer
Hands-on experience in Databricks and PySpark to design and optimize data pipelines for efficient ETL processes
Adept at designing and managing complex data systems in Python
Experience in various Azure cloud services such as Data Factory, ADLS, Synapse, Cosmos DB, Key Vault, Event Hubs, Logic Apps
Expert at building efficient data pipelines, ensuring data accuracy, and enabling real-time analytics. Dedicated to staying up to date on the latest Snowflake features for improved insights and decision-making
Skilled in utilizing Snowflake's distinct capabilities for handling diverse data types, real-time analytics, and efficient query processing
Experience in AWS cloud services such as EC2, S3, RDS, IAM, Redshift, Glue, Lambda, Kinesis, CloudWatch, CDK, Athena and EMR
Knowledge of various GCP services, including BigQuery and Pub/Sub
Strong data engineering experience, focusing on the development of ETL/ELT pipelines for both batch and streaming data utilizing PySpark and SparkSQL
Hands-on experience in Big Data project implementations with Hadoop, Spark, PySpark, and Hive
Experience in creating Real-time scalable data pipelines using Kafka
Experience in technical consulting and end-to-end delivery with architecture, data modeling, design, development, data governance and implementation of solutions
Experience with NoSQL databases such as MongoDB and Cosmos DB
Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning, and System Testing
Experience in Big Data Ecosystem in ingestion, storage, querying, processing, and analysis of Big Data
Experience writing SQL queries and creating database objects such as stored procedures, packages, and functions to implement business logic
Facilitated collaborative decision-making processes, ensuring input from various team members and stakeholders
Experienced with Git and demonstrated ability to resolve merge conflicts and maintain code integrity during the development process
Hands-on experience monitoring and troubleshooting jobs in the production environment
Created documentation for data pipelines and collaboration processes with data scientists, analysts, and other stakeholders to ensure clear communication and data requirement alignment
Proficient in creating ETL/ELT pipelines, data modeling, and ensuring data quality within data warehousing environments
Developed error-handling mechanisms and monitoring and alerting systems for proactive issue resolution, demonstrating strong problem-solving and analytical thinking
Played a critical part in the design, development, and maintenance of ETL procedures using AWS Glue and Apache Spark, enabling clients to process large datasets
Capable of managing sprints, tracking project progress, and managing backlogs using Agile project management tools like Jira
Skills
Python
Snowflake
PySpark
Kafka
AWS
Azure
Databricks
Git
Aurora
PostgreSQL
Data Warehouse
GCP
Redshift
EMR
SQL
MongoDB
Oracle
Glue
Work Experience
Senior Data Engineer - Atlanta, GA: Shift Digital Dec 2021 – Current
Shift Digital supports car dealerships in the digital world, providing online advertising, website enhancement, and lead management for potential car buyers. By analyzing data, the platform identifies what works best, helping dealerships thrive online
Designed and created real-time scalable data pipelines to process structured data by integrating millions of raw records from 5 data sources including Kafka using Spark and Databricks
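A representative sketch of this ingestion pattern, assuming a hypothetical topic, event schema, and storage paths; PySpark Structured Streaming reads from Kafka and lands the parsed events in the lake:

    # Sketch only: brokers, topic, schema, and paths are hypothetical placeholders
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

    # Hypothetical schema for incoming JSON lead events
    schema = StructType([
        StructField("lead_id", StringType()),
        StructField("source", StringType()),
        StructField("event_time", TimestampType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder brokers
           .option("subscribe", "leads")                      # placeholder topic
           .load())

    # Parse the Kafka value payload into typed columns
    events = raw.select(from_json(col("value").cast("string"), schema).alias("e")).select("e.*")

    # Stream into a Delta table on the lake; checkpoint path is a placeholder
    (events.writeStream
           .format("delta")
           .option("checkpointLocation", "/mnt/checkpoints/leads")
           .start("/mnt/datalake/bronze/leads"))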
Utilized Azure Data Lake Storage Gen2 as the primary data repository, enabling efficient data storage, organization, and analysis for huge datasets
Utilized Databricks clusters to efficiently process and transform large-scale datasets, enhancing data accuracy and accessibility
Orchestrated ETL workflows using Airflow DAGs and Azure Data Factory, ensuring efficient extraction, transformation, and loading of data from various sources to destinations
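A minimal Airflow DAG sketch of this orchestration pattern; the DAG id, task callables, and schedule are placeholders rather than the production pipeline:

    # Sketch: wires extract -> transform -> load with explicit dependencies
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from source systems")   # placeholder body

    def transform():
        print("apply business transformations")  # placeholder body

    def load():
        print("load into the warehouse")         # placeholder body

    with DAG(
        dag_id="example_etl",              # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)
        t_extract >> t_transform >> t_load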
Coordinated with upstream data providers and downstream consumers to ensure data accuracy and consistency with the project goals
Built and optimized OLAP cubes for analytical queries
Integrated SAS solutions into the data engineering workflow
Collaborated with Data Science team to incorporate ML models into the pipeline, enabling advanced insights
Worked closely with Data Analysts and Business Intelligence teams, providing specialized SQL expertise for data analysis and reporting efforts
Regularly reviewed error patterns and trends, leveraging insights to improve error prevention and mitigation strategies and optimize data pipeline performance
Implemented encryption mechanisms, fine-grained IAM policies, and compliance with industry standards for data security and privacy
Integrated JSON processing and REST API interactions into SSIS projects, demonstrating expertise in handling diverse JSON structures and API response formats
Set up real-time monitoring, implemented cost optimization strategies, and optimized data partitioning for enhanced performance
Implemented and automated CI/CD pipelines, applying DevOps practices and tools such as Jenkins, Bitbucket, GitLab, and Terraform
Implemented Azure DevOps pipelines for continuous deployment. Organized training sessions for effective cross-team collaboration and efficiency
Worked to standardize documentation formats and practices across the team, ensuring consistency and ease of use
Data Engineer - Chennai, India: Ford Motor Company May 2019 - Nov 2021
Conducted comprehensive analysis of source databases and tables to gain a deep understanding of source data
Developed SQL scripts for creating tables, views, functions, and stored procedures in Snowflake
Created mapping documents to establish clear connections between the existing data warehouse (Hadoop - HIVE) and Snowflake tables
Executed table creation scripts in Snowflake's development environment for both Raw and Consumption Layers
Designed and implemented integration objects to facilitate data transfer from HIVE to AWS S3 using ETL tool AWS Glue
Implemented batch jobs using Snowflake Task objects to load data from S3 into Snowflake's Raw Layer using the COPY command
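A hedged sketch of this load pattern via the Python connector, also showing the CRON task scheduling noted below; the account, credentials, stage, table, and schedule are placeholders:

    # Sketch: create and resume a scheduled Snowflake task that runs COPY INTO
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",  # placeholders
        warehouse="LOAD_WH", database="RAW", schema="SALES",
    )

    conn.cursor().execute("""
        CREATE OR REPLACE TASK load_orders_raw
          WAREHOUSE = LOAD_WH
          SCHEDULE = 'USING CRON 0 2 * * * UTC'  -- nightly at 02:00 UTC
        AS
          COPY INTO RAW.SALES.ORDERS
          FROM @s3_orders_stage                  -- hypothetical external stage
          FILE_FORMAT = (TYPE = PARQUET)
          MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
    """)
    conn.cursor().execute("ALTER TASK load_orders_raw RESUME")  # tasks start suspended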
Orchestrated data movement from the Raw Layer to the Consumption Layer within Snowflake
Performed rigorous unit testing in the Raw and Consumption Layers, ensuring accurate record counts and data quality. Documented test results in Jira for corresponding tasks within user stories
Developed and executed Views, Materialized Views, User-Defined Table Functions (UDTFs), and Stored Procedures to prepare data for BI reports
Created UDTFs in Snowflake for Hive tables containing static data
Designed ingestion patterns, encompassing both batch and near-real-time (NRT) data processing
Scheduled jobs using Snowflake Task objects with CRON job patterns
Developed Airflow schedules to monitor the complete ETL process, using DAGs to define jobs, dependencies, and execution schedules
Wrote Python scripts to extract data from Snowflake tables and generate Excel reports. Automated email distribution of reports to business users
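A minimal sketch of such a report script, with hypothetical connection details, query, and recipients (Excel output requires openpyxl):

    import smtplib
    from email.message import EmailMessage
    import snowflake.connector

    conn = snowflake.connector.connect(account="my_account", user="report_user",
                                       password="***")  # placeholders

    # Pull the query result straight into a pandas DataFrame, then write Excel
    df = conn.cursor().execute(
        "SELECT * FROM ANALYTICS.SALES.DAILY_SUMMARY"    # hypothetical table
    ).fetch_pandas_all()
    df.to_excel("daily_summary.xlsx", index=False)

    msg = EmailMessage()
    msg["Subject"] = "Daily Summary"
    msg["From"] = "etl@example.com"                      # placeholder sender
    msg["To"] = "business-users@example.com"             # placeholder recipients
    msg.set_content("Attached: today's summary report.")
    with open("daily_summary.xlsx", "rb") as f:
        msg.add_attachment(
            f.read(), maintype="application",
            subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            filename="daily_summary.xlsx")
    with smtplib.SMTP("smtp.example.com") as smtp:       # placeholder SMTP host
        smtp.send_message(msg)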
Ensured data quality using dbt tests (schema tests, referential integrity tests, custom tests)
Used dbt to debug complex chains of queries by splitting them into multiple models and macros that can be tested separately
Transformed and cleansed data, mapping Workday data fields to Snowflake data warehouse tables
Loaded and monitored data in Snowflake, optimizing data integration methods for performance
Integrated automated tests into the CI/CD pipeline to run tests automatically whenever code changes are pushed. Detected issues early in the development process, allowing for rapid feedback and quick bug resolution
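An illustrative pytest-style unit test of the kind run automatically on every push; normalize_region is a hypothetical transformation, not actual project code:

    def normalize_region(value: str) -> str:
        """Trim whitespace and upper-case a region code."""
        return value.strip().upper()

    def test_normalize_region():
        assert normalize_region("  emea ") == "EMEA"

    def test_normalize_region_is_idempotent():
        assert normalize_region(normalize_region("  emea ")) == "EMEA"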
Maintained open communication and coordination with business users throughout project lifecycles
Established best practices and published them on a Confluence page to ensure continuous team commitment
Python Data Engineer - Hyderabad, India: Swank Innovations May 2017 - April 2019
Designed and developed web applications using Django framework, implementing robust back-end logic and user-friendly front-end interfaces
Leveraged AWS services such as EC2, S3, Lambda, and RDS to deploy, scale, and manage web applications and services
Designed and executed data ETL pipelines using Python and AWS Glue, facilitating seamless data integration and transformation
Implemented data processing pipelines using AWS Step Functions and AWS Glue, orchestrating the flow of data between various AWS services and Python scripts
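A skeleton of the general Glue PySpark job pattern; the catalog database, table, filter key, and S3 path are placeholders:

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_ctx = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_ctx)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog, drop records missing the key, write Parquet to S3
    dyf = glue_ctx.create_dynamic_frame.from_catalog(database="sales", table_name="orders")
    cleaned = dyf.filter(lambda rec: rec["order_id"] is not None)
    glue_ctx.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/orders/"},
        format="parquet",
    )
    job.commit()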
Created serverless APIs using AWS API Gateway and AWS Lambda, integrating with Python back-end logic for scalable and cost-effective API solutions
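A minimal Lambda handler sketch for an API Gateway proxy integration; the route logic and payload are illustrative:

    import json

    def lambda_handler(event, context):
        # Proxy integration passes query parameters under this key (may be None)
        name = (event.get("queryStringParameters") or {}).get("name", "world")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"message": f"hello, {name}"}),
        }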
Implemented AWS security measures to ensure data protection and compliance
Designed and maintained relational databases using SQL, ensuring efficient data storage, retrieval, and integrity
Developed complex SQL queries for data manipulation and reporting, optimizing database performance and enabling efficient extraction of meaningful insights from large datasets
Orchestrated the flow of data between AWS services, Python scripts, and Postgres databases using AWS Glue and Step Functions, ensuring a well-coordinated data processing workflow
Implemented infrastructure as code using CDK to define AWS resources and infrastructure components
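A short CDK v2 sketch of defining resources as code; the stack and bucket are hypothetical examples, not the actual infrastructure:

    from aws_cdk import App, Stack, RemovalPolicy
    from aws_cdk import aws_s3 as s3
    from constructs import Construct

    class DataLakeStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)
            # Versioned raw-data bucket, retained on stack deletion
            s3.Bucket(self, "RawBucket",
                      versioned=True,
                      removal_policy=RemovalPolicy.RETAIN)

    app = App()
    DataLakeStack(app, "DataLakeStack")
    app.synth()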
Implemented object-oriented programming principles in Python to design modular and reusable code structures
Developed custom Python libraries to enhance code maintainability and scalability
Implemented version control using Git to track code changes and facilitate collaborative development across the team
Conducted client presentations and demonstrations to showcase project progress, gather feedback, and ensure alignment with client expectations
SQL Developer - Hyderabad, India: Techwave Inc May 2016 - April 2017
Responsible for the remote disaster recovery system and data backup and restoration, including data fixes in the database, functionality fixes, and other data issue analysis
Created stored procedures, functions, and views in MySQL to support SSIS packages, console applications, SSRS reporting, and data warehouse purposes
Tuned SQL queries for better performance using SQL Profiler
Worked with the application development team on enhancements, including data flows for B2C and B2B applications, and collaborated with other teams to solve technical issues across product suites
Wrote and coded logical and physical database descriptions, specifying database identifiers to the management systems, and created and implemented complex business intelligence solutions
Scheduled daily, weekly, and monthly reports for executives, business analysts, and customer representatives for various categories and regions based on business needs, using SQL Server Reporting Services
Performed database migrations, including creating new SSIS packages and stored procedures, updating old packages and procedures, migrating the corresponding DB jobs from server to server, and creating new jobs based on requirements
Worked with Quality Analysts and Business Analysts to ensure all business requirements were covered for the product
Identified key use cases and associated reference architectures for market segments and industry verticals
Assisted clients in understanding and manipulating data to gain value through SQL and ETL technical processes and visual analytics tools
Education
Master’s in Information Systems, Central Michigan University, MI, USA
Bachelor’s in Computer Science and Engineering, Vignan's Nirula Institute of Technology and Science for Women, Guntur, India