Santhosh Kumar Koyalkar Mob: +1-470-***-****
adwpzj@r.postjobfree.com
Career Summary
12+ years of overall IT experience, working extensively in Data Analyst/Data Engineering roles.
Worked on various prediction analysis, forecasting, statistical analysis, machine learning, EDA, data mining, NLP, deep learning, and data warehouse & reporting projects.
Worked as a Data Analyst/Data Engineer with hands-on experience in technologies such as Azure Data Factory, Azure Synapse, Azure ML, CI/CD, MLOps,
Databricks, Control-M, Spark SQL, PySpark, Big Data, Hive, Python, R, OBIEE, Tableau, and Scala.
Azure Cloud, Big Data Hadoop, Hive, database management, Azure Data Platform services (Azure Data Lake (ADLS), Azure Databricks), Oracle, Greenplum, Informix databases, data preparation, exploratory data analysis, Business Intelligence (BI), predictive modeling, statistics, machine learning, Oracle PL/SQL, and ETL & ELT tools.
pandas, NumPy, scikit-learn, matplotlib, TensorFlow, Keras, SciPy, seaborn, EvalML, AutoMLSearch, pytesseract.
Built and designed AI/ML algorithms using programming languages such as Python and R.
Good knowledge of and experience in decision trees with boosting techniques, regression, logistic regression, Ridge, and Lasso.
Processed large volumes of structured/semi-structured/unstructured data using HDFS, Hive, Spark, SQL, and DataFrames.
Experience in extracting Parquet, JSON, and CSV files from AWS S3 using Snowflake.
E-commerce application development and commerce data analysis with ATG E-Commerce.
Experience in developing Spark applications using Spark, Python, and SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Good technical experience in business intelligence (OBIEE, Tableau reports and dashboards) and data warehousing (Informatica, ODI, and the scheduling tools Autosys and Control-M).
Good experience designing dimensional, snowflake, and other data models according to reporting requirements.
Implemented best practices to ingest data from HDFS to RDBMS using Spark SQL.
Familiar with DBMS & RDBMS concepts, including normalization and de-normalization.
Performed data processing through Spark SQL/Python on DataFrames.
Good analytical and programming capabilities.
Willingness and ability to learn new environments and new technologies.
Work Experience:
Working as a Software Architect for Smartsoft International Inc., from Feb 2022 to date.
Worked as an Associate Technical Architect for TechMahindra, from June 2018 to Feb 2022, Hyderabad.
Worked as a Sr. developer for Sky Bridge Solutions, Hyderabad, from May 2016 to May 2017
Worked as a Sr. Consultant for Cap Gemini India Pvt. Ltd, Mumbai, from Nov 2009 to May 2016
Education:
Master of Computer Applications (M.C.A) from Osmania University in 2005
Certifications:
Certified Azure Data Engineer
Certified OBIEE from Oracle
Certified in ITIL
Professional Scrum Master
Certified in Data mining from IIT Roorkee.
Certified Scrum Master
TECHNICAL SKILLS
Big Data : Hadoop, Hive, PySpark
IDE : IntelliJ, Spyder, Visual Studio, Databricks
Cloud Technology : Azure Data Factory, Azure Data Lake, Databricks, Azure Synapse, AWS
Scripting Languages : PySpark, Python, Scala
Modeling Languages & Tools : R/R Studio, MINITAB, Excel Analytics, Python
Data Visualization Tools : Tableau 7.0, 8.2
ETL : Informatica 9.1
E-LT : ODI 11.1.5
RDBMS : Oracle 11.2.0.3, 10g, Greenplum, SQL Server
Reporting : OBIEE 11g
Job Scheduling Tools : DAC, Autosys, Control-M
Script : Bash script
Database Tools : TOAD, SQL Developer, Teradata SQL Assistant, SQL
Data Analytics : Alteryx
SMARTSOFT INTERNATIONAL INC FEB 2022-TILL DATE
Company : Smartsoft International Inc.
Role : Data Analyst & Data Engineer (Lead)
Duration : Feb 2022 - till date
Client : T-Mobile (Collections domain)
Technologies : Databricks, Scala, PySpark, Python, Azure ADF, GIT
Roles & Responsibilities:
Processed vendor delimited and fixed-length files using PySpark.
Developed logic for derived columns based on business requirements.
Developed UDFs according to business logic and applied them to DataFrame columns.
Pre-processed raw data for data analytics.
Identified corrupt and junk data in raw files and loaded it into error tables.
Implemented exception handling in the data processing.
Implemented PII protection and file encryption/decryption (PGP, GPG) throughout the process.
Created ADF pipelines for loading and file processing.
Developed derived-column logic using SQL within the scope of file processing.
Created event-based triggers in ADF for file processing.
Modified and configured the GIT .yml file for CI/CD to add different environments.
Extracted Parquet, JSON, and CSV files from AWS S3 using Snowflake.
Bulk-loaded data from the external stage (AWS S3).
Loaded data from the internal stage into Snowflake using the Snowflake COPY command.
Performed data validations through the Snowflake INFORMATION_SCHEMA.
Loaded data into Snowflake tables from the internal stage using SnowSQL.
Used Snowpipe for continuous data ingestion from the S3 bucket.
Created clone objects to maintain zero-copy cloning.
Created STREAMS to track data changes in tables for data validations.
Extracted data in various file formats (Parquet, JSON, CSV) using Snowflake.
Created Snowpipes, Tasks, and Streams in Snowflake.
Created aggregation materialized views in Snowflake.
Created data loading processes using Snowflake Tasks (sequential and parallel).
Performed data cleaning, cleansing, joining, trimming, concatenation, querying, error handling, formatting, and conditional operations.
Applied statistical analysis on the data: central tendency (mean, median, mode), ANOVA, distributions, and dispersion.
Performance tuning.
Converted JSON data to DataFrames for analysis.
Incorporated Rules Maintenance Application (RMA) output with processed data for analysis.
Built reports on collection trends, delinquency, time series, and treatments.
Experience migrating SQL Database, SAP HANA, SQL Server, and Teradata to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
Involved in the design of a data warehouse using Star Schema methodology and converted data from various sources to SQL tables.
Experience developing Spark applications using Spark, Python, and SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Good understanding of Spark architecture, including Spark Core, Spark SQL, and DataFrames.
Experience implementing hybrid connectivity between Azure and on-premises systems via Integration Runtime, and working in Azure SQL Database and Azure SQL Data Warehouse environments.
Developed dashboards and visualizations to help business users analyze data and to provide data insight to upper management.
Engaged with business users to gather requirements, design visualizations, and provide training on self-service BI tools.
Understood the business process and customer requirements.
Processed large volumes of structured/semi-structured data.
Performed data processing through Spark/Scala/Python on DataFrames.
Analyzed data and stored final output DataFrames in DBFS, databases, and Delta Lake tables.
Processed complex/nested JSON and converted it to structured DataFrames.
Applied transformation rules on top of DataFrames.
Validated and debugged scripts between source and destination.
Validated the source and final output data.
Created workflows to fetch data from different sources using Alteryx and scheduled jobs.
Performed data transformation using Alteryx.
Data preparation, data blending, and creation of data models/sets using Alteryx.
Responsible for designing, building, and testing several complex workflows in Alteryx.
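The Snowflake steps above (external stage, COPY, Snowpipe, streams, and tasks) fit a standard ingestion pattern. The sketch below illustrates that pattern only; the stage, table, and warehouse names are hypothetical, not the actual project DDL.

```sql
-- Hypothetical external stage over the S3 bucket
CREATE OR REPLACE STAGE vendor_stage
  URL = 's3://example-bucket/vendor/'
  FILE_FORMAT = (TYPE = PARQUET);

-- Bulk load from the external stage
COPY INTO raw.vendor_data
  FROM @vendor_stage
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

-- Continuous ingestion via Snowpipe
CREATE OR REPLACE PIPE vendor_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw.vendor_data
  FROM @vendor_stage
  FILE_FORMAT = (TYPE = PARQUET);

-- Stream to track changes for downstream validation
CREATE OR REPLACE STREAM vendor_stream ON TABLE raw.vendor_data;

-- Scheduled task consuming the stream
CREATE OR REPLACE TASK vendor_task
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
AS
  INSERT INTO curated.vendor_data
  SELECT * FROM vendor_stream;
```

Tasks can also be chained with `AFTER` for the sequential loads mentioned above, while independent tasks run in parallel.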
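The vendor-file steps above (fixed-length parsing, derived columns, routing junk records to error tables) follow a common pattern. Below is a minimal sketch of that pattern using pandas for a self-contained example; the production pipeline used PySpark UDFs, and the file layout and column names here are hypothetical.

```python
import pandas as pd
from io import StringIO

# Hypothetical fixed-length vendor layout: account_id (6), amount (8), status (1)
RAW = StringIO(
    "000123 1200.50A\n"
    "000456  850.00B\n"
    "BADROWxxxxxxxxZ\n"   # junk record, should be routed to the error table
)

colspecs = [(0, 6), (6, 14), (14, 15)]
df = pd.read_fwf(RAW, colspecs=colspecs,
                 names=["account_id", "amount", "status"], dtype=str)

# Validation: amount must parse as a number, status must be a known code
amount_num = pd.to_numeric(df["amount"], errors="coerce")
valid = amount_num.notna() & df["status"].isin(["A", "B"])

errors = df[~valid]           # junk/corrupt rows -> error table
clean = df[valid].copy()
clean["amount"] = amount_num[valid]

# Derived column (the business rule a PySpark UDF would implement)
clean["is_high_balance"] = clean["amount"] > 1000
```

In the PySpark version, the same derived-column rule would be registered as a UDF and applied to the DataFrame column, with the invalid rows written to an error table.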
TECHMAHINDRA JUNE 2018-FEB 2022
Company : TechMahindra
Role : Data Scientist & Data Engineer
Clients : GE (Aviation, Renewables, Power), BNSF
Duration : June 2018-Feb 2022
Business Solutions : Classification, text mining, forecasting, EDA, statistical tests.
M.L. Algorithms : Decision trees, random forest, logistic regression, time series.
Technologies : Big Data Hadoop, Hive, Python, R, Azure, AWS
Roles & Responsibilities:
Performed clinical data analysis and clinical data extraction.
Analyzed clinical trial data treatment effects (ANOVA) for health care.
Created S3 buckets in AWS.
Developed Python functions/UDFs deployed in AWS Lambda.
Data definition, collection of data per the data definition, and data cleansing.
Analyzed the data using the required software (Excel/R/Minitab/Python), applying the required mathematics and statistics.
Built models, piloted them, and presented pilot results to stakeholders.
Deployed and maintained models.
Performed exploratory and targeted data analyses using descriptive statistics.
Identified trends, patterns, and outliers in data.
Given the large number of variables, started with feature selection, then evaluated the model against selected accuracy measures.
Built models with the bias-variance trade-off managed, resulting in fewer errors and higher accuracy.
Developed a one-step transition matrix and probability distributions for corresponding performance ratings.
Made business recommendations (e.g., cost-benefit, forecasting, and experiment analysis), presenting findings effectively to multiple levels of stakeholders through visual displays of quantitative information.
Involved in the design of a data warehouse using Star Schema methodology and converted data from various sources to SQL tables.
Experience developing Spark applications using Spark, Python, and SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Good understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, Spark Streaming, driver and worker nodes, stages, executors, and tasks.
Good understanding of Big Data Hadoop and YARN architecture, along with the various Hadoop daemons. Experience implementing hybrid connectivity between Azure and on-premises systems via Integration Runtime, and working in Azure SQL Database and Azure SQL Data Warehouse environments.
Developed dashboards and visualizations to help business users analyze data and to provide data insight to upper management.
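The one-step transition matrix mentioned above is built by counting period-to-period rating transitions and row-normalising the counts, so each row becomes a probability distribution over next-period ratings. A minimal pure-Python sketch, with hypothetical rating histories:

```python
from collections import Counter

# Hypothetical yearly performance-rating histories, one sequence per entity
histories = [
    ["A", "A", "B", "A"],
    ["B", "B", "A"],
    ["A", "B", "B", "B"],
]

states = ["A", "B"]

# Count one-step transitions (prev rating -> current rating)
counts = Counter()
for seq in histories:
    for prev, curr in zip(seq, seq[1:]):
        counts[(prev, curr)] += 1

# Row-normalise the counts into a one-step transition probability matrix
matrix = {}
for s in states:
    row_total = sum(counts[(s, t)] for t in states)
    matrix[s] = {t: counts[(s, t)] / row_total for t in states}
```

Each row of `matrix` sums to 1, giving the probability distribution of the next rating conditioned on the current one.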
SKYBRIDGE SOLUTIONS (DATAVAIL INFOTECH PVT. LTD) MAY 2016 - MAY 2017
Role : Sr. Software Developer
Client : Boarshead
Company : Skybridge Solutions Pvt Ltd
Duration : May 2016-May 2017
Roles & Responsibilities:
Production support and job monitoring.
Generated query-based OBIEE reports.
Created new business models in OBIEE, with Informatica as the ETL tool.
Built new subject areas, hierarchies, dimensions, and fact tables.
Created OBIEE dashboards and dashboard pages.
Troubleshot issues.
Scheduled iBots for report delivery.
Validated Oracle EBS workflows.
Customized reports by adding columns, filters, and prompts.
Delivered reports in various phases.
Prepared technical and configuration documents during the BI Apps implementation.
Kept the project tracker up to date with report phase delivery dates and plan changes.
Scheduling daily standup calls.
Carrying out functional testing of the product.
Planned and prepared a GAP analysis document between requirements and out-of-the-box analytics functionality.
Worked closely with stakeholders on testing of the reports.
CAPGEMINI INDIA PVT LTD NOV 2009-MAY 2016
Role : Sr. Consultant (Lead Consultant)
Company : Capgemini India Pvt Ltd
Duration : Nov 2009-May 2016
Responsibilities:
Involved in Informatica mapping development.
Developed Autosys execution plans, including the creation of Autosys boxes and associated dependencies.
Created bash scripts to automate processes.
Created/modified the metadata repository (.rpd) using the Administration Tool: Physical, Business Model and Mapping, and Presentation layers.
Created connection pools, imported physical tables, created alias tables, defined joins, and implemented authorizations in the physical layer of the repository.
Created dimensional hierarchies for drill-down functionality and implemented business logic on metrics.
Developed calculated measures, dimension hierarchies, and time-series functions in the BMM layer.
Involved in Unit Testing and created reports for testing.
Created drill-down and sub-reports. Monitored the performance of the reports in Tableau/OBIEE.
Generated dashboards with quick filters, parameters, and sets to handle views more efficiently in Tableau.
Generated context filters and data source filters while handling huge volumes of data for Tableau reports.
Built dashboards for measures with forecast, trend lines, and reference lines in Tableau.
Published workbooks with user filters so that only the appropriate teams can view them.
Support Responsibilities
Enhanced/customized/supported the Oracle BI RPD.
Supported 60 existing subject areas.
Production follow-up: monitored the production system's background jobs, troubleshot related issues, and fixed them ad hoc.
Data quality follow-up: analyzed error data in reports and troubleshot process-related issues.
Performance follow-up: analyzed system logs, database logs, table locks, and system performance.
Enhanced/customized/supported existing Informatica mapping designs.
Enhanced/customized/supported existing bash scripts.
Enhanced/customized/supported existing Autosys designs.
Handled tickets according to priority (low, high, showstopper).
Release management activities.
Followed the ITIL methodology and AL.
DAC Configuration and DAC loading
Reporting Customization
Involved in Fixing the Complex and Priority Issues.
Prepared technical documents.