
Data Engineer Analyst

Location:
Raleigh, NC
Posted:
May 15, 2023

Resume:

DIVYA GANNAMANI

Sr. DATA ENGINEER

Ph: +1-919-***-****

E-mail: adw41h@r.postjobfree.com LinkedIn: www.linkedin.com/in/divigannamani

Professional Summary:

6+ years of programming experience as a Sr. Data Engineer, involved in all phases of the Software Development Life Cycle (SDLC).

Big Data experience across the Hadoop ecosystem, including HDFS, HBase, MapReduce, Hive, Flume, Oozie, Sqoop, and ZooKeeper, and in building highly scalable data analytics applications.

Expertise in installing, configuring, maintaining, and administering Hadoop clusters using the Apache, Cloudera, and Amazon Web Services (AWS) distributions.

Working knowledge of the MapReduce/HDFS framework as well as Hive, HBase, HBase-Hive integration, Sqoop, and other key Hadoop ecosystem components.

Hands-on experience with AWS services such as EC2, EMR, S3, SageMaker, Athena, Glue Data Catalog, ELB, RDS, SQS, EBS, VPC, AMI, SNS, CloudWatch, CloudTrail, CloudFormation, Auto Scaling, CloudFront, IAM, and Route 53.

Experience in using Airflow, Snowflake, and Databricks to design, build and maintain scalable and efficient data pipelines.

Proficient in configuring AWS CloudWatch to monitor AWS services, custom applications, and infrastructure, with a focus on optimizing system performance and ensuring high availability.

Practical experience with Amazon Redshift for loading large data sets.

Worked on Azure SQL Database, Azure Data Lake Storage (ADLS), Azure Data Factory (ADF) V2, Azure SQL Data Warehouse, Azure Key Vault, Azure Data Flows, Azure Databricks, Azure Synapse, Azure Blob Storage, and other Azure services to design and implement data migration and transformation strategies.

Strong foundational knowledge of Azure data platforms such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Azure Data Services.

Extensive hands-on experience migrating on-premises ETL to GCP using cloud-native tools such as Cloud Dataproc, BigQuery, Google Cloud Storage, and Cloud Composer.

Results-oriented data engineer with extensive experience designing and implementing data pipelines to extract, transform, and load data into ThoughtSpot data warehouse.

Proficient in designing and implementing ELT pipelines using industry-standard tools such as Apache Airflow, Talend, and Stitch.

Implemented and optimized Dremio data virtualization to provide business users with a unified view of data from multiple sources, resulting in improved data accessibility and faster decision-making for the organization.

Experience with NoSQL databases such as MongoDB, Cassandra, and HBase.

Created numerous MapReduce applications using Apache Hadoop for handling Big Data.

Solid understanding of Sqoop, YARN, and Puppet, as well as analysis skills using Hive.

Created automated scripts for database-related tasks including RUNSTATS, REORG, REBIND, COPY, LOAD, BACKUP, IMPORT, and EXPORT using Unix Shell.

Strong knowledge of developing web UI applications using HTML, CSS, XML, Bootstrap, JavaScript, Ajax, Spring MVC, Spring Boot, jQuery, AngularJS, and React.js.

Experienced in reading and writing data from various file formats such as CSV, Excel, SQL databases, and more using Pandas, making it easy to work with different data sources.

Worked on building real time data workflows using Kafka, Spark streaming and HBase.

Proficient with container systems such as Docker and container orchestration platforms such as Amazon ECS and Kubernetes; also worked with Terraform.

Built end-to-end CI/CD pipelines in Jenkins to retrieve code, compile applications, run tests, and push build artifacts to Nexus.

Solid experience working with CSV, text, sequence, Avro, Parquet, ORC, and JSON datasets.

Experience using the Hadoop ecosystem on Hortonworks and Cloudera distributions, and processing and visualizing data with Tableau, Amazon QuickSight, and Power BI.

As a Python developer, delivered numerous projects using the Flask framework and, more recently, FastAPI.

Knowledge of Spring Boot, Spring DAO, Spring Web Flow, and Spring AOP.

Strong knowledge of COBOL and core Java concepts such as OOP, data structures and algorithms, generics, collections, multithreading, lambdas, exception handling, I/O, and the java.lang package.

Technical Skills:

PROGRAMMING LANGUAGES: Java, Scala, Python, and Shell Scripting

BIG DATA ECOSYSTEM: Spark, Hive, HBase, Sqoop, Oozie, Storm, Flume, Pig, Kafka, NiFi, ZooKeeper, MapReduce

CLOUD: AWS (EMR, EC2, S3, RDS), Azure (Databricks, Data Factory, Data Lake, Synapse, Blob Storage), Snowflake, GCP (Dataproc, Dataflow, Cloud Functions, BigQuery)

DBMS: SQL Server, MySQL, PL/SQL, Oracle, Cassandra, Vertica, Versant

WEB TECHNOLOGIES: HTML, JavaScript, XML, jQuery, Ajax, CSS

IDEs: Eclipse, IntelliJ, Visual Studio, WinSCP

DEVOPS: GitHub, Jenkins, Ansible, Chef, Docker, Kubernetes, Nagios, Puppet

OPERATING SYSTEMS: Windows, Unix, Linux, Solaris, CentOS

FRAMEWORKS: MVC, Struts, Maven, JUnit, Log4j, ANT, Tableau, Splunk, Aqua Data Studio

J2EE TECHNOLOGIES & WEB SERVICES: Spring, Servlets, J2SE, JSP, JDBC, WebLogic, WebSphere

Professional Experience:

Humana, KY, USA. Jan 2022 – Present

Role: Sr. Data Engineer

Responsibilities:

Building data pipelines that perform data ingestion, enrichment, and loading/distribution of business-ready data sets used as inputs to statistical models such as forecasting and pricing models.

Experience migrating and implementing multiple applications from on-premises to the cloud using AWS services such as SMS, DMS, CloudFormation, S3, Route 53, Glacier, SNS, AWS Glue, Lambda, and VPC.

Implemented AWS EC2, key pairs, security groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services.

Pushed final data to AWS Snowflake Data Warehouse, providing enriched data for consumption by the data science teams.

Stored data in AWS S3 (used like HDFS) and ran EMR jobs on the data stored in S3.

Built streaming data pipelines using Kafka, with the data source being AWS S3. Configured AWS Lambda to parse events, retrieve object and bucket metadata, and ingest into a Kafka topic.
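
A minimal sketch of the kind of S3-to-Kafka ingestion described above, assuming a Lambda triggered by S3 event notifications and the kafka-python client; the broker address and topic name are placeholders.

import json
import boto3
from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["broker-1:9092"],  # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
s3 = boto3.client("s3")

def handler(event, context):
    # Each record describes an object created in the source bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        head = s3.head_object(Bucket=bucket, Key=key)  # fetch object metadata
        producer.send("s3-ingest-events", {            # placeholder topic name
            "bucket": bucket,
            "key": key,
            "size": head["ContentLength"],
        })
    producer.flush()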

Used the AWS Glue ETL service to consume raw data from an S3 bucket, transform it per requirements, and write the output back to S3 in Parquet format for analytics.
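
For illustration, a minimal Glue-style PySpark job sketch of the S3-to-Parquet flow above; the bucket paths are placeholders and the transformation is only a representative example.

import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw CSV from S3, apply a simple cleanup, and write Parquet back to S3.
raw = spark.read.option("header", "true").csv("s3://raw-bucket/input/")    # placeholder path
cleaned = raw.dropDuplicates().withColumn("load_date", F.current_date())
cleaned.write.mode("overwrite").parquet("s3://curated-bucket/output/")     # placeholder path

job.commit()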

Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.

Performed metadata (schema) conversion from Oracle to Redshift using the AWS Schema Conversion Tool (SCT).

Implemented batch and streaming data pipelines using Apache Spark (Spark SQL, Spark Streaming, MLlib, and GraphX): Spark Streaming for stream processing, MLlib for model training, Spark DataFrames for transformations, and Spark SQL for querying data.

Processed data using Spark RDDs, applying transformations such as map, filter, and flatMap, and performing actions to materialize the results on the cluster.
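
A small PySpark sketch of the RDD pattern described above (flatMap, filter, map, then an action); the input path is a placeholder.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-example").getOrCreate()
sc = spark.sparkContext

lines = sc.textFile("s3://raw-bucket/events.txt")         # placeholder input
words = lines.flatMap(lambda line: line.split(","))       # flatMap: one line -> many fields
non_empty = words.filter(lambda w: w.strip() != "")       # filter: drop blanks
pairs = non_empty.map(lambda w: (w.lower(), 1))           # map: build (key, 1) pairs
counts = pairs.reduceByKey(lambda a, b: a + b).collect()  # action: materialize the results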

Implemented multiple Spark Structured Streaming jobs in AWS Databricks using Python and PySpark, including a pre-ETL job and a data cleansing, transformation, deduplication, and standardization job.
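
A hedged sketch of a Structured Streaming cleansing and deduplication job of the kind described; the Auto Loader source, paths, column names, and target table are assumptions, not the actual job.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read a JSON stream with Databricks Auto Loader (assumed source).
events = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "json")
          .load("s3://landing-bucket/events/"))

# Standardize column formats.
standardized = (events
                .withColumn("event_ts", F.to_timestamp("event_ts"))
                .withColumn("customer_id", F.upper(F.trim("customer_id"))))

# Deduplicate on a key within a watermark window and append to a Delta table.
deduped = (standardized
           .withWatermark("event_ts", "1 hour")
           .dropDuplicates(["event_id", "event_ts"]))

(deduped.writeStream
 .format("delta")
 .option("checkpointLocation", "s3://curated-bucket/_checkpoints/events/")
 .outputMode("append")
 .toTable("curated.events"))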

Worked closely with the development team to integrate CloudWatch with application logs and other AWS services such as Lambda and EC2 Auto Scaling, enabling automated remediation of issues and reducing the need for manual intervention.

Strong Experience in implementing Data warehouse solutions in Confidential Redshift.

Worked on various projects to migrate data from on-premises databases to Confidential Redshift and RDS.

Strong experience in setting up CloudWatch alarms and notifications to alert teams of critical events and incidents, reducing downtime and improving response times.

Designed and developed ELT pipelines to extract data from various sources such as policy administration systems, claims management systems, and billing systems, and loaded it into a centralized data warehouse.

Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
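
A minimal Boto3 sketch of the CSV-to-S3 loading script described above; the bucket name and local path are placeholders.

import glob
import os
from datetime import date

import boto3

s3 = boto3.client("s3")
bucket = "analytics-landing"                 # placeholder bucket name
prefix = f"csv/{date.today():%Y/%m/%d}"      # dated folder per load

for path in glob.glob("exports/*.csv"):      # placeholder local folder
    key = f"{prefix}/{os.path.basename(path)}"
    s3.upload_file(path, bucket, key)        # boto3 handles multipart upload as needed
    print(f"uploaded {path} -> s3://{bucket}/{key}")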

Worked with CloudFormation to automate AWS environment creation, deployed AWS resources using build scripts (Boto3 and AWS CLI), and automated solutions using Python and shell scripting.

Designed and implemented complex database solutions using PostgreSQL, leveraging its advanced data types, security features, and scalability to meet project needs.

Optimized performance of large-scale PostgreSQL databases through effective indexing, partitioning, and query optimization.

Implemented data ingestion and transformation processes using AWS Glue to enable efficient extraction, transformation, and loading (ETL) of data from diverse sources into a centralized data lake, ensuring data quality and availability for downstream analytics and reporting.

Utilized PostgreSQL's procedural language (PL/pgSQL) to write custom functions and triggers, extending the functionality of the database to meet specific business requirements.

Utilized Kinesis to build real-time data pipelines for streaming analytics, enabling the organization to make data-driven decisions faster and more efficiently.

Strong understanding of DAX formulas and M language for data transformation.

Developed and maintained YAML configurations for Docker Compose, Kubernetes, and Ansible, streamlining deployment and management of complex microservices architecture.

Experience in creating interactive dashboards and reports using Tableau.

Proficient in using Tableau Desktop, Tableau Server, and the Tableau Mobile app for data analysis and visualization.

Architected, designed, developed, and maintained complex custom ETL pipelines for EMR using Python, Scala, Spark, and Hive.

Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.

Wrote Python scripts to update database content and manipulate files; involved in building database models, APIs, and views using Python to build applications.

Translated customer business requirements into technical design documents, established specific solutions, and led the efforts, including programming in Spark/Scala and testing, that culminated in client acceptance of the results.

Responsible for troubleshooting issues within Oracle HCM and assigned process flows, and worked with end users to resolve issues.

Created and maintained ETL pipelines using COBOL to extract data from legacy systems and load it into modern data warehouses and data lakes.

Optimized COBOL programs for performance and scalability, using techniques such as parallel processing, indexing, and data partitioning.

Used Maven to build RPMs from Scala source code checked out from a Git repository, with Jenkins as the continuous integration server and Artifactory as the repository manager.

Responsible for Setting up UNIX/Linux environments for various applications using shell scripting.

Managed ZooKeeper for cluster coordination and Kafka offset monitoring.

Optimized legacy queries to extract the customer information from Oracle.

Myntra, Chennai, India. July 2020- Dec 2021

Role: Big Data Developer.

Responsibilities:

Worked on migrating event processor and BigQuery executor jobs to dbt (data build tool) models.

Developed SQL queries from legacy Scala application to build DBT models.

Developed pre- and post-hooks in the dbt configuration, with various parameters for incremental data loads.

Triggered and scheduled data pipelines using Airflow by constructing DAGs with Python operators.
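
An illustrative Airflow DAG sketch using PythonOperator; the DAG id, schedule, and extract/load callables are placeholders rather than the actual pipeline.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    print("pull data from the source system")   # placeholder task logic

def load(**context):
    print("load data into the warehouse")       # placeholder task logic

with DAG(
    dag_id="example_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task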

Built data pipelines to ingest data from Hive into BigQuery using Dataproc.
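
A sketch of a Dataproc PySpark job for this kind of Hive-to-BigQuery load, assuming the spark-bigquery connector is available on the cluster; the Hive table, BigQuery table, and staging bucket names are placeholders.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-bigquery")
         .enableHiveSupport()
         .getOrCreate())

df = spark.sql("SELECT * FROM sales_db.orders")           # placeholder Hive table

(df.write.format("bigquery")
 .option("table", "analytics.orders")                     # placeholder BigQuery table
 .option("temporaryGcsBucket", "dataproc-staging-bkt")    # placeholder staging bucket
 .mode("overwrite")
 .save())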

Built ETL pipelines to ingest data from third-party cloud platforms into the in-house GCP platform using Dataproc.

Worked with the SharePoint engineering team to troubleshoot production issues regarding firewall, port exhaustion, and SQL connectivity.

Worked on Talend ETL to load data from various databases and further process them.

Created and modified several database objects such as Tables, views, indexes, constraints, packages, and triggers using MySQL, Microsoft SQL, and Oracle SQL.

Worked on a migration project to migrate data from on premises to GCP.

Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.

Developed scripts to extract and process data from an SFTP server into the Hive data warehouse using Linux shell scripting.

Developed and maintained ETL pipelines using Dremio to process and transform large volumes of data from disparate sources, resulting in improved data quality and reliability for downstream analytics and reporting.

Expertise in Snowflake Cloud Data Platform, including administration, configuration, and development.

Developed a fully automated continuous integration system using Git, Jenkins, Splunk, Hunk, Oracle and custom tools developed in Python and Bash.

Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.

Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Cloud Storage, and BigQuery.

Developed Oozie coordinators to schedule Hive scripts to create Data pipelines.

Created BigQuery authorized views for row-level/column-level security and for exposing data to other teams, and queried the BigQuery data from Tableau Server using the appropriate access levels.

Deployed a Cloud IAM solution for auditing purposes, such as determining each GCP user's level of access across all projects in the organization.

Successfully planned and executed the migration of a multi-tiered application from Google Cloud Platform (GCP) to Microsoft Azure.

Implemented strategies for data migration and synchronization, ensuring minimal disruption to business operations during the transition.

Configured and deployed Azure virtual machines, storage accounts, and networking components to replicate the existing GCP infrastructure.

Successfully managed and supported the Azure Machine Learning (AML) environment, including developing and deploying AML endpoints for multiple projects and teams.

Expertise in developing ETL processes using tools such as Azure Data Factory and SSIS to integrate large volumes of data from diverse sources, ensuring data quality and accuracy.

Worked on developing Azure Data Factory pipelines with various integration runtimes and linked services, and with multiple activities such as Copy, Data Flow, Spark, Lookup, Stored Procedure, ForEach, and While loops.

Created and scheduled workflows, mappings, sessions, and workflows in Informatica Power Center or SSIS to automate data integration process.

Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, and Azure Data Lake Analytics; ingested data into Azure services such as Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, and processed the data in Azure Databricks.

Experience in creating interactive dashboards and reports using Power BI.

Familiarity with Power Query Editor for data cleansing and transformation.

Proficient in using Power BI desktop, Power BI service, and Power BI mobile app for data analysis and visualization.

Proficient in deploying and supporting Azure Synapse (formerly Azure SQL Data Warehouse), including designing, implementing, and managing data warehouses to meet diverse business needs.

Designed and implemented SSIS packages to extract, transform, and load (ETL) data from various sources into Azure SQL Database, enabling efficient and reliable data integration for the Ecommerce platform.

Created and maintained SSRS reports to provide business insights and analytics to stakeholders, leveraging Azure Analysis Services and Power BI to visualize data and improve decision-making.

Utilized Azure Site Recovery for disaster recovery and business continuity planning, providing a high availability solution for the migrated application.

Improved application performance and scalability using Azure Kubernetes Service and Azure Container Instances.

Demonstrated ability to collaborate effectively with cross-functional teams and stakeholders, including business analysts, data scientists, developers, and project managers, to deliver high-quality solutions that meet project goals and deadlines.

Familiarity with data governance, security, and compliance best practices, including GDPR, HIPAA, and CCPA, ensuring that data is protected and managed in accordance with relevant regulations and standards.

Managed and monitored the migration project, proactively addressing any challenges and risks to ensure successful delivery.

Vodafone, Hyderabad, India. Aug 2019- June 2020

Role: Data Engineer.

Responsibilities:

Actively Participated in all phases of the Software Development Life Cycle (SDLC) from implementation to deployment.

Worked on setting up Azure Data Lake environment for unstructured data and log files from multiple applications.

Created Azure SQL database, performed monitoring and restoring of Azure SQL database and performed migration of Microsoft SQL server to Azure SQL database.

Implemented various Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, data Lake and Data Factory.

Experience working with Azure cloud services including Linux virtual machines and managed disks, along with services such as Blob Storage, File Storage, Azure DevOps, Load Balancer, Automation, Batch, and Virtual Networks.

Worked on migration of data from On-prem SQL server to Cloud databases like Azure synapse Analytics and Azure SQL DB.

Good experience working with Azure Blob and Data Lake Storage and loading data into Azure Synapse Analytics (SQL DW).

Extensive experience creating pipeline jobs, scheduling triggers, and mapping data flows using Azure Data Factory (V2), and using Key Vault to store credentials.

Created Azure Rest API endpoints to deliver performance and client reporting data feeds.

Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, and in the reverse direction.

Worked on Azure API Management, Security, Cloud to Cloud Integration.

Worked in Azure environment for development and deployment of Custom Hadoop Applications.

Converted existing MapReduce programs to Spark to cleanse the data in HDFS obtained from multiple data sources to make it suitable for ingestion into Hive for analysis.

Designed and developed Spark jobs to process data coming in different file formats like XML, CSV, and JSON.

Designed SSIS packages to transfer data from flat files and Excel to SQL Server using Business Intelligence Development Studio. Migrated on-premises databases to Snowflake databases via a shift-and-load method in ADF.

Worked as a Data Engineer with Hadoop Ecosystems components like HBase, Sqoop, Zookeeper, Oozie and Hive with Cloudera Hadoop distribution.

Involved in the Agile development methodology as an active member in scrum meetings.

Involved in the end-to-end process of Hadoop jobs that used technologies such as Sqoop, Hive, MapReduce, Spark, and shell scripts.

Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Canara Bank, Bengaluru, India. June 2018- July 2019

Role: Data Analyst.

Responsibilities:

Involved in primary on-site ETL Development during the analysis, planning, design, development, and implementation stages of projects using IBM Web Sphere software (IBM Infosphere DataStage v 8.5 and 11.5).

Understanding the data sources and working with data analysts to understand the structure and format of the data to be extracted.

Worked on Automation of ETL Processes using DataStage Job sequencer, Shell Scripting and PL/SQL Programming.

Extensive experience in using Azure Data Factory for creating, scheduling and orchestrating data pipelines.

Utilized Azure Databricks for performing data transformation and cleansing tasks.

Implemented real-time data processing using Azure Stream Analytics to process large amounts of streaming data.

Stored transformed data using Azure SQL Database, a scalable and reliable relational database service.

Worked with Azure Blob Storage, a cost-effective object storage service, to store large amounts of unstructured and semi-structured data.

Proven track record of delivering successful ETL projects on Azure, ensuring data quality, accuracy, and consistency.

Demonstrated ability to utilize the comprehensive set of services offered by Azure for ETL operations, making it easier to manage large amounts of data and automate data workflows.

Worked on mainframe-related jobs and migrated mainframe datasets to the Unix platform per business requirements.

Designed and developed various jobs using DataStage Parallel Extender stages such as OCI, Hashed File, Sequential File, Aggregator, Pivot, and Sort.

Developed and wrote the ETL code that implements the ETL process, using tools such as SQL, Python, or Informatica.

Developed PL/SQL procedures & functions to support the reports by retrieving the data from the data warehousing application.

Developed complex jobs using various stages like Lookup, Join, Transformer, Dataset, Row Generator, Column Generator, Datasets, Sequential File, Aggregator and Modify Stages.

Testing the ETL process and ensuring that the code works as intended and that the data is accurately extracted, transformed, and loaded into the target system.

Used DataStage stages namely Hash file, Sequential file, Transformer, Aggregate, Sort, Datasets, Join, Lookup, Change Capture, Funnel, FTP, Peek, Row Generator stages in accomplishing the ETL Coding.

Involved in performance tuning and optimization of DataStage mappings using features like Pipeline and Partition Parallelism to manage very large volume of data.

Deployed different partitioning methods like Hash by column, Round Robin, Entire, Modulus, and Range for bulk data loading and for performance boost.

Repartitioned job flows based on the best available resource utilization in DataStage PX.

Created universes and reports in BusinessObjects Designer.

Created, implemented, modified, and maintained simple to complex business reports using the BusinessObjects reporting module.

Worked with Power BI to create reports and dashboards, built custom visualizations, and published them to the Power BI service.

Extracted data from various databases like DB2, SQL Server, Oracle and Netezza.

Designed Parallel jobs involving complex business logic, update strategies, transformations, filters, lookups, and necessary source-to-target data mappings to load the target using DataStage designer.

Designed and developed highly complex SQL queries to extract data from RDBMS, such as Oracle, SQL Server, MySQL, or PostgreSQL, based on business requirements.

Implemented data quality controls and data validation routines in SQL queries to ensure the accuracy and completeness of the data.

Designed DataStage jobs using Sequential File stage, Complex Flat File Stage, Modify, Surrogate Key Generator, Pivot, Filter, Funnel, Join, Lookup, Transformer, Copy, Aggregator, and Change Capture

DMart, Hyderabad, India. Aug 2016- May 2018

Role: Software Engineer.

Responsibilities:

Designed the front end and back end of the application using Python on the Django web framework.

Developed consumer-facing features and applications using Python and Django with test-driven development.

Worked on front-end frameworks such as Bootstrap (CSS) for development of web applications.

Developed consumer-facing features and applications using Python, Django, Pyramid, Flask, Web2py, HTML, and other web technologies.

Proficient in using Pandas library for data manipulation and analysis, including data cleaning and preprocessing, filtering, grouping, sorting, and visualization.

Wrote Python modules to extract/load asset data from a MySQL source database.

Extensively worked on Jenkins by installing, configuring, and maintaining for continuous integration (CI) and for End-to-End automation for all build and deployments.

Launched Kubernetes to provide a platform for automating deployment, scaling, and operations of application containers across cluster of hosts.

Implemented Bash, Perl, Python scripting to automate many day-to-day tasks.

Worked with object-oriented Python, Flask, SQL, Beautiful Soup, Jinja2, HTML/CSS, Bootstrap, jQuery, Linux, Sublime Text, and Git.

Experience designing and implementing systems using a Service-Oriented Architecture.

Knowledge of web service protocols such as SOAP, REST, and XML-RPC.

Experience building and consuming web services using SOAP.

Created a database using MySQL and wrote several queries to extract data from it.

Worked in a NoSQL database on simple queries and wrote stored procedures for normalization and denormalization.

Experienced in using the AWS SDK for Python (Boto3) to interact with various AWS services such as Amazon S3, Amazon EC2, and Amazon DynamoDB.

Utilized AWS services such as AWS Lambda and AWS Elastic Beanstalk to run and manage Python applications on the cloud.

Developed scripts and applications using Python and Boto3 to automate processes and interact with AWS services programmatically.

Implemented solutions for storing and retrieving data using Amazon S3 and Amazon DynamoDB.
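
A small Boto3 sketch of the DynamoDB access pattern mentioned above; the table name and key schema are assumptions.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")              # placeholder table with partition key "order_id"

# Write an item, then read it back by key.
table.put_item(Item={"order_id": "1001", "status": "NEW", "total": 250})
resp = table.get_item(Key={"order_id": "1001"})
print(resp.get("Item"))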

Designed and deployed scalable and highly available applications on AWS using Amazon EC2 instances and Auto Scaling groups.

Utilized AWS services such as Amazon CloudWatch and Amazon SNS to monitor and manage applications on the cloud.

Worked on migrations of on-premises applications to the cloud using AWS services.

Collaborated with DevOps teams to automate deployment processes and manage infrastructure on AWS.

Developed Merge jobs in Python to extract and load data into MySQL database.
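
A minimal sketch of such a merge (upsert) job using mysql-connector-python; the connection details, table, and sample rows are placeholders.

import mysql.connector

conn = mysql.connector.connect(
    host="db-host", user="etl_user", password="***", database="assets"  # placeholders
)
cur = conn.cursor()

rows = [("A100", "router", 4), ("A200", "switch", 7)]  # sample extracted rows
cur.executemany(
    "INSERT INTO asset_counts (asset_id, asset_type, qty) "
    "VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE qty = VALUES(qty)",
    rows,
)
conn.commit()
cur.close()
conn.close()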

Successfully migrated the Django database from SQLite to MySQL to PostgreSQL with complete data integrity.

Designed the front-end UI using HTML, Bootstrap, Node.js, Underscore.js, AngularJS, CSS, and JavaScript.


