
Senior Data Engineer

Location:
Cleveland, OH
Posted:
May 27, 2023

Resume:

AJAY REDDY REGATTE

Senior Data Engineer

Mail: adxdbt@r.postjobfree.com

Phone: 440-***-****

SUMMARY

7+ years of overall IT experience across a variety of industries, including hands-on experience in Big Data analytics and development.

Experience in collecting, processing, and aggregating large amounts of streaming data using Kafka, Spark Streaming.

Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.

Experience in designing Data Marts by following Star Schema and Snowflake Schema Methodology.

Recreating existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data warehouse environment.

Propose architectures considering cost/spend in Azure and develop recommendations to right-size data infrastructure.

Strong virtualization experience, including datacenter migration and Azure Data Services.

Experience managing Big Data platform deployed in Azure Cloud.

Experience in analyzing data using Spark SQL, HiveQL, and Pig Latin.

Familiarity with Amazon Web Services along with provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS and others.

Hands on experience building streaming applications using Spark Streaming and Kafka with minimal/no data loss and duplicates.

Excellent technical and analytical skills with a clear understanding of design goals and development for OLTP and dimensional modeling for OLAP.

Strong experience and knowledge of HDFS, MapReduce, and Hadoop ecosystem components like Hive, Pig, and Sqoop, as well as NoSQL databases such as MongoDB and Cassandra.

Extensive work in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, Python and Tableau.

Hands on experience in implementing LDA, Naïve Bayes and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, neural networks, Principal Component Analysis and good knowledge on Recommender Systems.

Performed statistical and graphical analytics using NumPy, Pandas, Matplotlib, and BI tools such as Tableau.

Experience in using visualization tools like Tableau, ggplot2 and d3.js for creating dashboards.

Performed statistical modeling with machine learning to derive insights from data under the guidance of a Principal Data Scientist.

Involved in all phases of software development life cycle in Agile, Scrum and Waterfall management process.

TECHNICAL SKILLS

Big Data/Hadoop Technologies

MapReduce, Spark, SparkSQL, Azure, Spark Streaming, Kafka, PySpark, Pig, Hive,

HBase, Flume, Flink, Yarn, Oozie, Zookeeper, Hue, Ambari Server

Languages

HTML5, DHTML, WSDL, CSS3, C, C++, XML, R/R Studio, SAS Enterprise Guide,

SAS, R (Caret, Weka, ggplot), Perl, MATLAB, Mathematica, FORTRAN, DTD,

Schemas, JSON, AJAX, Java, Scala, Python (NumPy, SciPy, Pandas, Gensim, Keras),

JavaScript, Shell Scripting

NoSQL Databases

Cassandra, HBase, MongoDB, MariaDB

Web Design Tools

HTML, CSS, JavaScript, JSP, jQuery, XML

Development Tools

Microsoft SQL Studio, IntelliJ, Azure Databricks, Eclipse, NetBeans.

Public Cloud

EC2, IAM, S3, Autoscaling, CloudWatch, Route53, EMR, RedShift, Glue,

Athena, SageMaker.

Development Methodologies

Agile/Scrum, UML, Design Patterns, Waterfall

Databases

Snowflake, AWS RDS, Teradata, Oracle 9i/10g, MySQL 5.5/5.6, Microsoft SQL,

PostgreSQL, Epic Clarity, FHIR, MDM (Golden Architecture), Data Governance

ETL Tools

Azure Data Factory (ADF), Azure Database Migration Service (DMS), SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Extract, Transform, Load (ETL), Business Intelligence (BI)

PROFESSIONAL EXPERIENCE

Client: Paycor, Cincinnati, OH

Sep 2020 – Present

Sr. Data Engineer

Responsibilities:

Implemented Restful web service to interact with Redis Cache framework.

Data intake happens through Sqoop, and ingestion happens through MapReduce and HBase.

Migrated data from traditional database systems to Azure databases.

Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.

Interacted with other data scientists and architected custom solutions for data visualization using tools like Tableau and packages in R.

Developed predictive models using Python and R to predict customer churn and classify customers.
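
A minimal, hedged scikit-learn sketch of this kind of churn classifier; the feature names and data below are synthetic placeholders, and the production models also used R.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # Synthetic, illustrative features; real inputs came from customer usage data.
    df = pd.DataFrame({
        "tenure_months":   [3, 24, 1, 36, 12, 6],
        "monthly_spend":   [70.0, 35.5, 99.0, 20.0, 55.0, 80.0],
        "support_tickets": [4, 0, 6, 1, 2, 5],
        "churned":         [1, 0, 1, 0, 0, 1],
    })

    X = df.drop(columns="churned")
    y = df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

    # Random forest classifier for churn vs. no-churn.
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    print(classification_report(y_test, model.predict(X_test)))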

Documented best practices and the target approach for the CI/CD pipeline.

Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.

Design and implement migration strategies for traditional systems on Azure (lift and shift, Azure Migrate, and other third-party tools).

Experience in DWH/BI project implementation using Azure Data Factory.

Installed Kafka Manager to track consumer lag and monitor Kafka metrics; also used it for adding topics, partitions, etc.

Created Kafka streaming data pipelines to consume data from multiple sources and perform transformations using Scala.
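
A minimal sketch of this kind of Kafka streaming pipeline, written here with PySpark Structured Streaming for consistency with the other examples; the broker address, topic, and output paths are hypothetical placeholders, and the production pipelines were written in Scala.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import DoubleType, StringType, StructField, StructType

    # Requires the spark-sql-kafka connector package on the classpath.
    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    # Read the raw Kafka stream and parse the JSON payload (placeholder broker/topic).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "events")
           .load())

    events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("e"))
              .select("e.*"))

    # Simple transformation before writing the stream out (placeholder paths).
    query = (events.filter(col("amount") > 0)
             .writeStream
             .format("parquet")
             .option("path", "/data/events")
             .option("checkpointLocation", "/checkpoints/events")
             .start())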

Handled the import of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.

Responsible for performing various transformations such as sort, join, aggregation, and filter in order to retrieve the required datasets using Apache Spark.

Developed various data loading strategies and performed various transformations for analyzing the datasets using the Hortonworks Distribution of the Hadoop ecosystem.

Wrote Spark RDD transformations, actions, DataFrames, and case classes for the required input data, and performed data transformations using the Spark context to convert RDDs to DataFrames.

Worked on storing DataFrames into Hive as tables using Python (PySpark).
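
A minimal PySpark sketch of the RDD-to-DataFrame conversion and Hive table write described in the two items above; the database, table, and column names are illustrative assumptions.

    from pyspark.sql import Row, SparkSession

    # Hive support lets saveAsTable register the table in the Hive metastore.
    spark = (SparkSession.builder
             .appName("rdd-to-hive-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Example RDD of raw records (illustrative data).
    rdd = spark.sparkContext.parallelize([
        ("101", "2023-01-05", 250.0),
        ("102", "2023-01-06", 120.5),
    ])

    # Convert the RDD to a DataFrame via Row records.
    df = rdd.map(lambda r: Row(customer_id=r[0], txn_date=r[1], amount=r[2])).toDF()

    # Basic transformation, then persist the result as a Hive table.
    (df.filter(df.amount > 100)
       .write.mode("overwrite")
       .saveAsTable("analytics.transactions"))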

Involved in creating Hive tables, loading them with data, and writing Hive queries that invoke and run MapReduce jobs in the backend.

Wrote Hive jobs to parse the logs and structure them in a tabular format to facilitate effective querying on the log data.

Extracted data from multiple sources, applied transformations, loaded data into HDFS.

Migrated ETL jobs to Pig scripts to perform transformations, joins, and pre-aggregations before storing the data in HDFS.

Involved in writing optimized Pig scripts along with developing and testing Pig Latin scripts.

Implemented workflows using the Apache Oozie framework to automate tasks.

Worked on different file formats like Sequence files, XML files and Map files using MapReduce Programs.

Exported data from HDFS to a Cassandra (NoSQL) database using Sqoop and ran various CQL commands on Cassandra to obtain the required datasets.

After performing all the transformations, the data was stored in MongoDB (NoSQL) using Sqoop.

Created and imported various collections, documents into MongoDB and performed various actions like query, project, aggregation, sort, limit.
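
A small PyMongo sketch of the collection operations mentioned above (query, projection, aggregation, sort, limit); the connection string, database, and field names are placeholders.

    from pymongo import DESCENDING, MongoClient

    # Placeholder connection string and database/collection names.
    client = MongoClient("mongodb://localhost:27017")
    orders = client["analytics"]["orders"]

    # Import documents into the collection.
    orders.insert_many([
        {"customer": "a1", "amount": 250.0, "status": "paid"},
        {"customer": "b2", "amount": 90.0, "status": "open"},
    ])

    # Query with projection, sort, and limit.
    top_paid = (orders.find({"status": "paid"}, {"customer": 1, "amount": 1, "_id": 0})
                .sort("amount", DESCENDING)
                .limit(10))

    # Aggregation pipeline: total amount per customer.
    totals = orders.aggregate([
        {"$match": {"status": "paid"}},
        {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
        {"$sort": {"total": -1}},
    ])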

Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.

Tools & Environment: Hadoop, HDFS, MapReduce, Spark, Sqoop, Oozie, Pig, Kerberos, Hive, Flume, Linux, Java, Eclipse, Cassandra, Python, MongoDB, MS Azure

Client: Fiserv, Inc., Brookfield, WI

Apr 2019 – Sep 2020

Data Engineer

Responsibilities:

Performed ETL on data from different formats like JSON, Parquet.

Analyzed the data by performing Hive queries (HiveQL) and running Pig Scripts (Pig Latin).

Used Oozie Scheduler systems to automate the pipeline workflow and orchestrate the Spark jobs.

Worked on loading and transforming large sets of structured, semi-structured, and unstructured data.

Involved in collecting, aggregating, and moving data from servers to HDFS using Flume.

Wrote multiple Hive UDFs using core Java and OOP concepts, and Spark functions within Python programs.

Wrote Spark applications for Data validation, cleansing, transformations, and custom aggregations.

Imported data from various sources into Spark RDD for processing.

Developed custom aggregate functions using Spark SQL and performed interactive querying.
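
A hedged PySpark sketch of a custom aggregate alongside interactive querying, using a pandas grouped-aggregate UDF (assumes Spark 3.x with pyarrow installed); the column names and trimmed-mean logic are illustrative assumptions, not the original functions.

    import pandas as pd
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.appName("custom-agg-sketch").getOrCreate()

    df = spark.createDataFrame(
        [("electronics", 200.0), ("electronics", 350.0), ("grocery", 40.0)],
        ["category", "amount"])

    # Custom aggregate: a trimmed mean of the amounts within each group.
    @pandas_udf("double")
    def trimmed_mean(amount: pd.Series) -> float:
        trimmed = amount.clip(amount.quantile(0.05), amount.quantile(0.95))
        return float(trimmed.mean())

    df.groupBy("category").agg(trimmed_mean("amount").alias("avg_amount")).show()

    # Interactive querying over the same data through Spark SQL.
    df.createOrReplaceTempView("sales")
    spark.sql("SELECT category, COUNT(*) AS n FROM sales GROUP BY category").show()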

Collected data from various Flume agents deployed on different servers using multi-hop flows.

Created functions and assigned roles in AWS Lambda to run Python scripts, and AWS Lambda functions in Java to perform event-driven processing.
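
A minimal sketch of an event-driven Python Lambda handler of the kind described above, triggered by S3 object-created events; the processing (logging an object summary) is an illustrative assumption, and the Java variant followed the same pattern.

    import json
    import urllib.parse

    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        """Triggered by S3 object-created events; reads each new object and logs a summary."""
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

            obj = s3.get_object(Bucket=bucket, Key=key)
            body = obj["Body"].read()

            print(json.dumps({"bucket": bucket, "key": key, "size": len(body)}))

        return {"statusCode": 200}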

Created Hive UDFs and UDAFs using Python scripts and Java code based on the given requirements.

Automated all the jobs to pull data and load it into Hive tables using Oozie workflows.

Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.

Knowledge of microservices architecture in Spring Boot, integrating with various RESTful web services.

Created and maintained technical documentation for launching Hadoop Clusters and for executing Pig Scripts.

Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
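
A hedged boto3 sketch of the CSV-to-S3 loading and bucket housekeeping described above; the bucket name, prefixes, and local paths are placeholders.

    import glob
    import os

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-data-bucket"  # placeholder bucket name

    # Create the bucket if it does not already exist (us-east-1 needs no LocationConstraint).
    existing = [b["Name"] for b in s3.list_buckets()["Buckets"]]
    if BUCKET not in existing:
        s3.create_bucket(Bucket=BUCKET)

    # Upload every local CSV under a dated "folder" (prefix) in the bucket.
    for path in glob.glob("exports/*.csv"):
        key = f"incoming/2023-05-27/{os.path.basename(path)}"
        s3.upload_file(path, BUCKET, key)

    # Simple object management: list objects under a log prefix and drop empty ones.
    resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="logs/")
    for obj in resp.get("Contents", []):
        if obj["Size"] == 0:
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])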

Developed SQOOP scripts to migrate data from Oracle to Big data Environment.

Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.

Collated real-time streaming data from credit agencies such as TransUnion and Experian, performed data cleaning, and fed the data into Kafka.

Deployed models using RESTful APIs and used Docker to facilitate multi-environment transitions.

Streaming data was stored in Amazon S3, deployed over EC2 and an EMR cluster framework, apart from in-house tools.

Tools & Environment: Spark, AWS, EC2, EMR, Hive, MS SQL Server, Genie Logs, Kafka, Sqoop, Spark SQL, Spark Streaming, Scala, Python, Tableau

Client: KeyBank, Cleveland, OH

Aug 2017 – Apr 2019

Data Analyst/Data Engineer

Responsibilities:

Led data analysis and data profiling using complex SQL on various source systems.

Led data analytics projects from the requirements stage through scope analysis, model development, deployment, and support.

Built an Azure WebJob for Product Management teams to connect to different APIs and sources, extract the data, and load it into Azure Data Warehouse using Azure WebJobs and Functions.

Implemented large-scale data and analytics solutions using advanced statistical and machine learning models.

Led thorough data analysis and documented them; recommended courses of action to determine the best outcomes.

Responsible for data extraction and data analysis, and for understanding the 'why' behind the data.

Migrated data from traditional database systems to Azure databases.

Deployed Azure Resource Manager JSON templates from PowerShell and worked on the Azure suite, including Azure SQL; analyzed failed SSIS packages in different environments by checking the log files.

Supervised entire gamut of data analysis including data collection, data transformation, and data loading.

Implemented data analysis by utilizing simple machine learning algorithms.

Elicited requirements using document analysis, surveys, business process descriptions, use cases, business analysis, and task and workflow analysis.

Supported risk analysis by domain experts by ensuring the maintenance and analysis of large datasets.

Experience in custom process design for transformations via Azure Data Factory and automation pipelines.

Used Kafka capabilities such as distribution, partitioning, and the replicated commit log service for messaging systems by maintaining feeds.

Involved in Sqoop implementation, which helps load data from various RDBMS sources into Hadoop systems and vice versa.

Involved in analyzing system failures, identifying root causes, and recommending courses of action; documented system processes and procedures for future reference.

Involved in Configuring Hadoop cluster and load balancing across the nodes.

Extensively used Azure services like Azure Data Factory and Logic Apps for ETL, to push data in and out between databases and Blob storage, HDInsight (HDFS), and Hive tables.

Involved in Hadoop installation, commissioning, decommissioning, balancing, troubleshooting, monitoring, and debugging, as well as configuration of multiple nodes using the Hortonworks platform.

Configured Spark Streaming to consume ongoing information from Kafka and store the streamed data in HDFS.

Tools & Environment: Pig, Hive, HBase, Oozie, Zookeeper, Sqoop, Flume, Spark, Impala, Cassandra, HDFS, Scala, Spark RDD, Spark SQL, Kafka.

Client: Aflac, Columbus, GA

May 2015 – Feb 2017

BI / ETL Developer

Responsibilities:

Gathered business requirements from the business users, prioritized them, and translated them into system/design requirements.

Created project artifacts such as business requirements documents, business rules spreadsheet, use case document, functional requirements specification, data mapping document, technical specification, and detailed design document.

Primarily involved in Data Migration using SQL, SQL Azure, Azure storage, and Azure Data Factory, SSIS, PowerShell.

Participated in all phases of SDLC methodologies throughout the project life cycle.

Wrote complex stored procedures, functions, and SQL constructs to implement business logic in the OLTP database.

Developed database objects such as tables, views, indexes, stored procedures, and constraints with T-SQL.

Implement Copy activity, Custom Azure Data Factory Pipeline Activities.

Evaluated execution plans for T-SQL queries and stored procedures to improve performance, based on which created indexing/partitioning strategies and restructured SQL logic.

Wrote dynamic SQL scripts with table variables and temporary tables to improve code reuse as part of the application.

Designed complex ETL packages utilizing SSIS to extract data from on-premises sources to an Azure pre-staging database.

Created C# applications to load data from Azure Blob Storage into Azure SQL and from web APIs into Azure SQL, and scheduled WebJobs for daily loads.

Parsed high-level design specification to simple ETL coding and mapping standards.

Enforced logging in SSIS packages to support troubleshooting and identify performance bottlenecks.

Utilized script task, FTP task and Execute package tasks in SSIS for different functionalities in ETL process.

Implemented techniques like grouping and interactive sorting to design user-friendly reports in SSRS.

Worked with the team and maintained project related document versions on TFS.

Tools & Environment: SQL Server, ETL, SSIS, SSRS, SSMS, SSDT, T-SQL, Erwin, DAX, TFS, DTA, MS Azure.


