Data Engineer Analyst

Location:

Coppell, TX

Salary:

75000

Posted:

April 25, 2024

Resume:

SAI PATHIPATI

Data Analyst / Data Engineer

SaiPathipati ************.****@*****.*** +1-937-***-**** SaiKumarPathipati.github

Profile summary:

Microsoft Certified Cloud Data Engineer with 2+ years of IT experience building data pipelines across all phases (design, development, and testing).

Expertise in dimensional modelling, data analysis, data integration (ETL/ELT using AWS and Azure data services), and data modelling.

Built ETL data pipelines to cleanse, standardize, ingest, and transform data and to enrich the database using PySpark, Databricks, and Azure cloud IaaS and PaaS services.

Developed and migrated analytics, data warehouses, and data stores along with third-party systems.

Experience in Python, REST APIs, Azure Data Factory, Databricks, Azure DevOps, Azure Synapse Analytics (SQL Data Warehouse), Hive, and Azure Functions.

Involved in setting up AD, ACLs, and service principals for security.

Used ETL/ELT to develop data pipelines for extracting, cleaning, transforming, and loading data into a data warehouse using PySpark.

Working experience in Agile Development methodology.

Develop business documentation, including procedures, work instructions and process flow diagrams.

Participate in requirement gathering sessions to determine business goals and areas for improvement.

Experience in using CTEs (including recursive CTEs), temp tables, and DDL/DML triggers to support efficient data manipulation, data consistency, and existing applications.

Expertise in writing T-SQL queries, dynamic queries, subqueries, and complex joins for building complex stored procedures, triggers, user-defined functions, views, and cursors.

Defined data archiving and data retention strategies for mission-critical databases.

Created database objects: schemas, tables, indexes, views, user-defined functions, cursors, triggers, stored procedures, constraints, and roles.

Created and managed clustered, non-clustered, and columnstore indexes and optimized queries using execution plans.

Developed Spark code using Python and Scala to process and create dimension and fact tables.

Experience working with Databricks (Delta Lake, lakehouse, configuring clusters, creating user tokens, mounting data lakes, and maintaining the Databricks user admin console).

Experience in optimizing Spark jobs and complex SQL queries.

Technical Skills:

Big Data Technologies: Hadoop, MapReduce, HDFS

Programming Languages: Python (Pandas, NumPy, Matplotlib, Seaborn, scikit-learn), R (tidyr, caret, rpart, glmnet), SQL.

Cloud Services: ADLS, Data Factory, Azure DevOps, Databricks, Azure SQL DB, Managed Instance, Azure Synapse Analytics, and SQL Server.

Databases: Oracle 11g, SQL Server, Cosmos DB and Redis.

Visualization Tools: Tableau, MS Excel.

Version Control: Git, GitHub.

Certifications:

Microsoft Azure Data Engineer (DP-203)

Education:

University of North Texas, Denton, TX Aug 2022 - Dec 2023

Master's in Advanced Data Analytics

Concentration: Data Visualization, Exploratory Data Analysis (EDA), Machine Learning

GPA: 3.5/4

Jawaharlal Nehru Institute of Technology, Andhra Pradesh, India June 2017 - July 2021

Bachelor of Technology in Mechanical Engineering

GPA: 7.6/10

Professional Experience:

Company: NextZen Technologies, Bengaluru, Karnataka, India Aug 2020 - July 2022, Junior Data Analyst

Roles and Responsibilities:

Executed complex SQL queries for data retrieval and management, ensuring data accuracy and accessibility. Supported in-depth analysis through database management and optimization.

Created dynamic dashboards and reports in Tableau and communicated findings to stakeholders, improving decision-making processes.

Used Azure for cloud-based data processing, leveraging cloud infrastructure for scalable and efficient data analysis and advanced analytics.

Used Azure Data Factory to manage ETL processes and orchestrate workflows in the Azure cloud environment, ensuring seamless data integration and transformation and contributing to efficient data processing.

Performed exploratory data analysis, data segmentation, cleaning, and model selection, and evaluated model accuracy to support robust data-driven insights.

Manipulated and analyzed data in Excel using advanced functions, and generated detailed reports that provided valuable insights.

Projects:

Crime Analysis – Capstone Project Nov 2023

An examination of underlying causes from 2017 to 2023 found that inflation, mental illness, narcotics usage, animosity, and anger account for 65% of criminal activity. Python scripting, which made up 20% of the analytical process, was used for data extraction and processing.

Covering five and a half years, 80% of the dataset forms a solid foundation for crime prevention tactics, resource allocation, and community safety enhancement. Python scripting (15%) enabled effective data manipulation, while Tableau (15%) provided dynamic visualizations and significant insights.

Goal-oriented tactics with a 75% influence on total crime rates were developed using Python programming (30%), enabling data-driven decision-making. The analysis, built with Python and Tableau, contributes to informed interventions and effective resource allocation.

Linear Regression Analysis of Car Prices Using R Nov 2023

Managed data preparation for a dataset of 16,898 entries and 19 columns as part of a full car price analysis, ensuring the data was ready for subsequent modelling.

Applied data cleaning procedures with dplyr and visualized the data with ggplot (histograms, box plots, and scatter plots) to better understand the dataset and uncover significant insights.

Implemented additional regression models, such as Support Vector Machines (SVM) and decision trees, with results including a prediction RMSE of 11,413 and an R² of 0.0778 for SVM, and an R² of 0.0778 for decision trees. The analysis supports data-driven decision-making in the automobile sector.
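For context on the two metrics quoted above, here is a minimal sketch of how RMSE and R² are conventionally computed. It is written in Python rather than the project's R, and the function names and the three toy price values are hypothetical, chosen only for illustration; they are not the project's code or data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the average squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not taken from the car price dataset.
actual = [10000.0, 20000.0, 30000.0]
predicted = [12000.0, 18000.0, 33000.0]

print("RMSE:", rmse(actual, predicted))
print("R^2:", r_squared(actual, predicted))
```

RMSE is expressed in the units of the target (here, price), while R² is unitless and compares the model against always predicting the mean, so a value near zero means little improvement over that baseline.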

Optimization of Convolutional Neural Networks (CNNs) for the CIFAR-10 Dataset Using TensorFlow May 2023

On the CIFAR-10 dataset, we used a Convolutional Neural Network (CNN) with ReLU activation layers and a softmax output for multi-class classification.

Hyperparameters were tuned by trial and error, which included adding three convolution layers and adjusting parameters. Results stayed consistent and nearly identical across model changes, with a final accuracy of close to 67%.

Investigated the impact of input image resolution on testing accuracy and noted the need for approaches such as L2 regularization. Identified areas of the code to change to improve accuracy, and noted the model's potential for further refinement.
