Santhosh Kumar Koyalkar Mob: +1-470-***-****
adwpzj@r.postjobfree.com
Career Summary
12+ years of overall IT experience, working extensively in Data Analyst/Data Engineering roles.
Worked on various prediction analysis, forecasting, statistical analysis, machine learning, EDA, data mining, NLP, deep learning, and data warehouse & reporting projects.
Worked as a Data Analyst/Data Engineer with hands-on experience in technologies such as Azure Data Factory, Azure Synapse, Azure ML, CI/CD, MLOps,
Databricks, Control-M, Spark SQL, PySpark, Big Data, Hive, Python, R, OBIEE, Tableau, and Scala.
Azure Cloud, Big Data Hadoop, Hive, database management, Azure Data Platform services (Azure Data Lake (ADLS), Azure Databricks), Oracle, Greenplum, Informix databases, data preparation, exploratory data analysis, Business Intelligence (BI), predictive modeling, statistics, machine learning, Oracle PL/SQL, and ETL & ELT tools.
pandas, NumPy, scikit-learn, matplotlib, TensorFlow, Keras, SciPy, seaborn, EvalML, AutoMLSearch, pytesseract.
Built and designed AI/ML algorithms using programming languages such as Python and R.
Good knowledge of and experience in decision trees with boosting techniques, regression, logistic regression, Ridge, and Lasso.
Processed large volumes of structured/semi-structured/unstructured data using HDFS, Hive, Spark, SQL, and DataFrames.
Experience in extracting Parquet, JSON, and CSV files from AWS S3 using Snowflake.
E-commerce application development and commerce data analysis with ATG E-Commerce.
Experience in developing Spark applications using Spark, Python, and SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Good technical experience in business intelligence (OBIEE, Tableau reports and dashboards) and data warehousing (Informatica, ODI, and the scheduling tools Autosys and Control-M).
Good experience designing dimensional, snowflake, and other data models according to reporting requirements.
Implemented best practices to ingest data from HDFS to RDBMS using Spark SQL.
Familiar with DBMS & RDBMS concepts, including normalization and de-normalization.
Performed data processing through Spark SQL/Python on DataFrames.
Good analytical and programming capabilities.
Willingness and ability to learn new environments and new technologies.
Work Experience:
Working as a Software Architect for Smartsoft International Inc., from Feb 2022 to date.
Worked as an Associate Technical Architect for TechMahindra, from June 2018 to Feb 2022, Hyderabad.
Worked as a Sr. developer for Sky Bridge Solutions, Hyderabad, from May 2016 to May 2017
Worked as a Sr. Consultant for Cap Gemini India Pvt. Ltd, Mumbai, from Nov 2009 to May 2016
Education:
Master of Computer Applications (M.C.A) from Osmania University in 2005
Certifications:
Certified Azure Data Engineer
Certified OBIEE from Oracle
Certified in ITIL
Professional Scrum Master
Certified in Data mining from IIT Roorkee.
Certified Scrum Master
TECHNICAL SKILLS
Big Data : Hadoop, Hive, PySpark
IDE : IntelliJ, Spyder, Visual Studio, Databricks
Cloud Technology : Azure Data Factory, Azure Data Lake, Databricks, Azure Synapse, AWS
Scripting Languages : PySpark, Python, Scala
Modeling Languages & Tools : R/R Studio, MINITAB, Excel Analytics, Python
Data Visualization Tools : Tableau 7.0, 8.2
ETL : Informatica 9.1
E-LT : ODI 11.1.5
RDBMS : Oracle 11.2.0.3, 10g, Greenplum, SQL Server
Reporting : OBIEE 11g
Job Scheduling Tools : DAC, Autosys, Control-M
Script : Bash script
Database Tools : TOAD, SQL Developer, Teradata SQL Assistant, SQL
Data Analytics : Alteryx
SMARTSOFT INTERNATIONAL INC FEB 2022-TILL DATE
Company : Smartsoft International Inc.
Role : Data Analyst & Data Engineer (Lead)
Duration : Feb 2022 - till date
Client : T-Mobile (Collections domain)
Technologies : Databricks, Scala, PySpark, Python, Azure ADF, GIT
Roles & Responsibilities:
Processed vendor delimited and fixed-length files using PySpark.
Developed logic for derived columns based on business requirements.
Developed UDFs according to business logic and applied them to DataFrame columns.
Pre-processed raw data for data analytics.
Identified corrupt and junk data in raw files and loaded it into error tables.
Implemented exception handling in the data processing.
Implemented PII protection and file encryption/decryption (PGP, GPG) throughout the process.
Created ADF pipelines for loading and file processing.
Developed derived-column logic using SQL within the scope of file processing.
Created event-based triggers in ADF for file processing.
Modified and configured the GIT .yml file for CI/CD to add different environments.
Extracted Parquet, JSON, and CSV files from AWS S3 using Snowflake.
Bulk-loaded data from the external stage (AWS S3).
Loaded data from the internal stage into Snowflake using the Snowflake COPY command.
Performed data validations through the Snowflake INFORMATION_SCHEMA.
Loaded data into Snowflake tables from the internal stage using SnowSQL.
Used Snowpipe for continuous data ingestion from the S3 bucket.
Created clone objects to maintain zero-copy cloning.
Created STREAMS to track data changes in tables for data validations.
Extracted data in various file formats (Parquet, JSON, CSV) using Snowflake.
Created Snowpipes, Tasks, and Streams in Snowflake.
Created aggregation materialized views in Snowflake.
Created data loading processes using Snowflake Tasks (sequential and parallel).
Performed data cleaning, cleansing, joining, trimming, concatenation, querying, error handling, formatting, and conditional operations.
Applied statistical analysis on the data: central tendency (mean, median, mode), ANOVA, distributions, and dispersion.
Performance tuning.
Converted JSON data to DataFrames for analysis.
Incorporated Rules Maintenance Application (RMA) output with processed data for analysis.
Built reports on collection trends, delinquency, time series, and treatments.
Experience migrating SQL Database, SAP HANA, SQL Server, and Teradata to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Involved in the design of a data warehouse using Star Schema methodology and converted data from various sources to SQL tables.
Experience developing Spark applications using Spark, Python, and SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Good understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
Experience implementing hybrid connectivity between Azure and on-premises systems via Integration Runtime, and working in Azure SQL Database and Azure SQL Data Warehouse environments.
Developed dashboards and visualizations to help business users analyze data and to provide data insight to upper management.
Engaged with business users to gather requirements, design visualizations, and provide training on self-service BI tools.
Understood the business process and customer requirements.
Processed large volumes of structured/semi-structured data.
Performed data processing through Spark/Scala/Python on DataFrames.
Analyzed data and stored final output DataFrames in DBFS, databases, and Delta Lake tables.
Processed complex/nested JSON and converted it to structured DataFrames.
Applied transformation rules on top of DataFrames.
Validated and debugged scripts between source and destination.
Validated the source and final output data.
Created workflows to fetch data from different sources using Alteryx and scheduled jobs.
Performed data transformation using Alteryx.
Data preparation, data blending, and creation of data models/sets using Alteryx.
Responsible for designing, building, and testing several complex workflows in Alteryx.
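The Snowflake steps above (external stage, COPY, Snowpipe, streams, and tasks) fit a standard ingestion pattern. The sketch below illustrates that pattern only; the stage, table, and warehouse names are hypothetical, not the actual project DDL.

```sql
-- Hypothetical external stage over the S3 bucket
CREATE OR REPLACE STAGE vendor_stage
  URL = 's3://example-bucket/vendor/'
  FILE_FORMAT = (TYPE = PARQUET);

-- Bulk load from the external stage
COPY INTO raw.vendor_data
  FROM @vendor_stage
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Continuous ingestion via Snowpipe
CREATE OR REPLACE PIPE vendor_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw.vendor_data
  FROM @vendor_stage
  FILE_FORMAT = (TYPE = PARQUET);

-- Stream to track changes for downstream validation
CREATE OR REPLACE STREAM vendor_stream ON TABLE raw.vendor_data;

-- Scheduled task consuming the stream
CREATE OR REPLACE TASK vendor_task
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO curated.vendor_data
  SELECT * FROM vendor_stream;
```

Tasks can also be chained with `AFTER` for the sequential loads mentioned above, while independent tasks run in parallel.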
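The vendor-file steps above (fixed-length parsing, derived columns, routing junk records to error tables) follow a common pattern. Below is a minimal sketch of that pattern using pandas for a self-contained example; the production pipeline used PySpark UDFs, and the file layout and column names here are hypothetical.

```python
import pandas as pd
from io import StringIO

# Hypothetical fixed-length vendor layout: account_id (6), amount (8), status (1)
RAW = StringIO(
    "000123 1200.50A\n"
    "000456  850.00B\n"
    "BADROWxxxxxxxxZ\n"   # junk record, should be routed to the error table
)

colspecs = [(0, 6), (6, 14), (14, 15)]
df = pd.read_fwf(RAW, colspecs=colspecs,
                 names=["account_id", "amount", "status"], dtype=str)

# Validation: amount must parse as a number, status must be a known code
amount_num = pd.to_numeric(df["amount"], errors="coerce")
valid = amount_num.notna() & df["status"].isin(["A", "B"])

errors = df[~valid]           # junk/corrupt rows -> error table
clean = df[valid].copy()
clean["amount"] = amount_num[valid]

# Derived column (the business rule a PySpark UDF would implement)
clean["is_high_balance"] = clean["amount"] > 1000
```

In the PySpark version, the same derived-column rule would be registered as a UDF and applied to the DataFrame column, with the invalid rows written to an error table.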
TECHMAHINDRA JUNE 2018-FEB 2022
Company : TechMahindra
Role : Data Scientist & Data Engineer
Clients : GE (Aviation, Renewables, Power), BNSF
Duration : June 2018-Feb 2022
Business Solutions : Classification, text mining, forecasting, EDA, statistical tests.
M.L. Algorithms : Decision trees, random forest, logistic regression, time series.
Technologies : Big Data Hadoop, Hive, Python, R, Azure, AWS
Roles & Responsibilities:
Performed clinical data analysis and clinical data extraction.
Analyzed clinical trial data treatment effects (ANOVA) for health care.
Created S3 buckets in AWS.
Developed Python functions/UDFs deployed in AWS Lambda.
Data definition, collection of data per the data definition, and data cleansing.
Analyzed the data using the required software (Excel/R/Minitab/Python), applying the required mathematics and statistics.
Built models, piloted them, and presented pilot results to stakeholders.
Deployed and maintained models.
Performed exploratory and targeted data analyses using descriptive statistics.
Identified trends, patterns, and outliers in data.
Given the large number of variables, started with feature selection, then evaluated the model against selected accuracy measures.
Built models with the bias-variance trade-off managed, resulting in fewer errors and higher accuracy.
Developed a one-step transition matrix and probability distributions for corresponding performance ratings.
Made business recommendations (e.g., cost-benefit, forecasting, and experiment analysis), presenting findings effectively to multiple levels of stakeholders through visual displays of quantitative information.
Involved in the design of a data warehouse using Star Schema methodology and converted data from various sources to SQL tables.
Experience developing Spark applications using Spark, Python, and SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
Good understanding of Big Data Hadoop and YARN architecture, along with the various Hadoop daemons. Experience implementing hybrid connectivity between Azure and on-premises systems via Integration Runtime, and working in Azure SQL Database and Azure SQL Data Warehouse environments.
Developed dashboards and visualizations to help business users analyze data and to provide data insight to upper management.
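The one-step transition matrix mentioned above is built by counting period-to-period rating transitions and row-normalising the counts, so each row becomes a probability distribution over next-period ratings. A minimal pure-Python sketch, with hypothetical rating histories:

```python
from collections import Counter

# Hypothetical yearly performance-rating histories, one sequence per entity
histories = [
    ["A", "A", "B", "A"],
    ["B", "B", "A"],
    ["A", "B", "B", "B"],
]

states = ["A", "B"]

# Count one-step transitions (prev rating -> current rating)
counts = Counter()
for seq in histories:
    for prev, curr in zip(seq, seq[1:]):
        counts[(prev, curr)] += 1

# Row-normalise the counts into a one-step transition probability matrix
matrix = {}
for s in states:
    row_total = sum(counts[(s, t)] for t in states)
    matrix[s] = {t: counts[(s, t)] / row_total for t in states}
```

Each row of `matrix` sums to 1, giving the probability distribution of the next rating conditioned on the current one.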
SKYBRIDGE SOLUTIONS (DATAVAIL INFOTECH PVT. LTD) MAY 2016 - MAY 2017
Role : Sr. Software Developer
Client : Boarshead
Company : Skybridge Solutions Pvt Ltd
Duration : May 2016-May 2017
Roles & Responsibilities:
Production support and job monitoring.
Generated query-based OBIEE reports.
Created new business models in OBIEE, with Informatica as the ETL tool.
Built new subject areas, hierarchies, dimensions, and fact tables.
Created OBIEE dashboards and dashboard pages.
Troubleshot issues.
Scheduled iBots for report delivery.
Validated Oracle EBS workflows.
Customized reports by adding columns, filters, and prompts.
Delivered reports in various phases.
Prepared technical and configuration documents during the BI Apps implementation.
Kept the project tracker up to date with report phase delivery dates and plan changes.
Scheduling daily standup calls.
Carrying out functional testing of the product.
Planned and prepared a GAP analysis document between requirements and out-of-the-box analytics functionality.
Worked closely with stakeholders on testing of the reports.
CAPGEMINI INDIA PVT LTD NOV 2009-MAY 2016
Role : Sr. Consultant (Lead Consultant)
Company : Capgemini India Pvt Ltd
Duration : Nov 2009-May 2016
Responsibilities:
Involved in Informatica mapping development.
Developed Autosys execution plans, including the creation of Autosys boxes and associated dependencies.
Created bash scripts to automate processes.
Created/modified the metadata repository (.rpd) using the Administration Tool: Physical, Business Model and Mapping, and Presentation layers.
Created connection pools, imported physical tables, created alias tables, defined joins, and implemented authorizations in the physical layer of the repository.
Created dimensional hierarchies for drill-down functionality and implemented business logic on metrics.
Developed calculated measures, dimension hierarchies, and time-series functions in the BMM layer.
Involved in Unit Testing and created reports for testing.
Created drill-down and sub-reports. Monitored the performance of the reports in Tableau/OBIEE.
Generated dashboards with quick filters, parameters, and sets to handle views more efficiently in Tableau.
Generated context filters and data source filters while handling huge volumes of data for Tableau reports.
Built dashboards for measures with forecast, trend lines, and reference lines in Tableau.
Published workbooks with user filters so that only the appropriate teams can view them.
Support Responsibilities
Enhanced/customized/supported the Oracle BI RPD.
Supported 60 existing subject areas.
Production follow-up: monitored the production system's background jobs, troubleshot related issues, and fixed them ad hoc.
Data quality follow-up: analyzed error data in reports and troubleshot process-related issues.
Performance follow-up: analyzed system logs, database logs, table locks, and system performance.
Enhanced/customized/supported existing Informatica mapping designs.
Enhanced/customized/supported existing bash scripts.
Enhanced/customized/supported existing Autosys designs.
Handled tickets according to priority (low, high, showstopper).
Release management activities.
Followed the ITIL methodology and AL.
DAC Configuration and DAC loading
Reporting Customization
Involved in Fixing the Complex and Priority Issues.
Prepared technical documents.