Manpreet Kaur
*************@*****.***
PROFESSIONAL SUMMARY:
Overall 10+ years of technical IT experience in all phases of the Software Development Life Cycle (SDLC), with skills in data analysis, design, development, testing, and deployment of software systems.
5+ years of industrial experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spring Boot, Spark integration with Cassandra, Solr, and Zookeeper.
Demonstrated proficiency in Camunda BPM, with over 3 years of experience in the BPM space.
Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda functions, Step functions, CloudWatch, SNS, DynamoDB, and SQS.
Specialized in leveraging Camunda BPM to drive process automation and efficiency improvements.
Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from heterogeneous source systems.
Proficiency in multiple databases including MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server. Worked on different file formats such as delimited files, Avro, JSON, and Parquet. Docker container orchestration using ECS, ALB, and Lambda.
Extensive knowledge of QlikView Enterprise Management Console (QEMC), QlikView Publisher, and QlikView Web Server.
Implemented a batch process for heavy-volume data loading using the Apache NiFi dataflow framework within an Agile development methodology.
Successfully designed, developed, and maintained Camunda 8 applications to meet diverse business needs.
Worked as team JIRA administrator providing access, working assigned tickets, and teaming with project developers to test product requirements/bugs/new improvements.
Experienced in Pivotal Cloud Foundry (PCF) on Azure VMs to manage the containers created by PCF.
Hands-on experience in test-driven development (TDD), behavior-driven development (BDD), and acceptance test-driven development (ATDD) approaches.
Managed databases and Azure Data Platform services (Azure Data Lake Storage (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB), SQL Server, Oracle, data warehouses, etc. Built multiple data lakes.
Worked on technical issues during installation of PRISM features, including Active Directory Single Sign-on, load-balanced application server sites, application pools, Oracle Real Application Clusters (RAC), Oracle Data Guard and DBVisit high-availability (HA) database products, COM+, and third-party interfaces to the Windows IIS 7 server.
Able to work in parallel across both GCP and Azure clouds.
Strong technical knowledge with hands-on experience in CI/CD using different DevOps toolsets.
Extensive experience in text analytics, generating data visualizations using R and Python, and creating dashboards using tools such as Tableau and Power BI.
Worked extensively with ETL testing, including data completeness, data correctness, data transformation, and data quality, analyzing feed files in downstream scope (Balance, Transactions, and Master Data feed types).
Worked with Google Cloud Dataflow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance. Also designed a star schema in BigQuery.
Extensive programming expertise in designing and developing web-based applications using Spring Boot, Spring MVC, Java servlets, JSP, JTS, JTA, JDBC and JNDI.
Experience in MVC and microservices architecture with Spring Boot and Docker Swarm.
Expertise in Java programming with a good understanding of OOP, I/O, Collections, Exception Handling, Lambda Expressions, and Annotations.
Experience in Spring Frameworks like Spring Boot, Spring LDAP, Spring JDBC, Spring Data JPA, Spring Data REST
Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
Leveraged Spark as an ETL tool for building data pipelines on cloud platforms such as AWS EMR, Azure HDInsight, and MapR CLDB architectures (see the illustrative sketch below).
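To illustrate the Spark-based ETL work summarized above, a minimal PySpark sketch follows; the S3 bucket names, file layout, and column names are hypothetical placeholders, not details from any specific engagement.

# Minimal PySpark ETL sketch (illustrative only; paths and column names are hypothetical)
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw delimited files from a hypothetical S3 landing zone
raw = spark.read.option("header", True).csv("s3://example-landing/orders/")

# Transform: basic cleansing and daily aggregation
cleaned = (raw
           .dropDuplicates(["order_id"])
           .withColumn("order_ts", F.to_timestamp("order_ts"))
           .filter(F.col("amount").cast("double") > 0))
daily = (cleaned
         .groupBy(F.to_date("order_ts").alias("order_date"))
         .agg(F.sum(F.col("amount").cast("double")).alias("daily_amount")))

# Load: write Parquet to a curated zone, partitioned by date
daily.write.mode("overwrite").partitionBy("order_date").parquet("s3://example-curated/orders_daily/")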
Technical Skills:
Big Data Ecosystem
HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka,
Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR, IBM BPM v8.5.7
Machine Learning Classification Algorithms
Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor (KNN), Gradient Boosting Classifier, Extreme Gradient Boosting Classifier, Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayes Classifier, Extra Trees Classifier, Stochastic Gradient Descent, etc.
Cloud Technologies
AWS, Azure, Google Cloud Platform (GCP)
IDE’s
IntelliJ, Eclipse, Spyder, Jupyter
Ensemble and Stacking
Averaged Ensembles, Weighted Averaging, Base Learning, Meta Learning, Majority Voting, Stacked Ensemble, AutoML – Scikit-Learn, MLjar, etc.
Databases
Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, HBASE
Programming / Query Languages
Java, SQL, Python Programming (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), NoSQL, PySpark, PySpark SQL, SAS, R Programming (Caret, Glmnet, XGBoost, rpart, ggplot2, sqldf), RStudio, PL/SQL, Linux shell scripts, Scala.
Data Engineer/Big Data Tools / Cloud / Visualization / Other Tools
Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Oozie, Zookeeper, etc. AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, NiFi, GCP, Google Cloud Shell, Linux, BigQuery, Bash Shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design, Hadoop
WORK EXPERIENCE:
Commonwealth of Kentucky, KY August 2023 - Present
Data Engineer
Responsibilities:
Performed data analysis and developed analytic solutions; investigated data to discover correlations and trends and explain them.
Used Chef for configuration management of hosted instances within GCP; configured and networked the Virtual Private Cloud (VPC).
Developed frameworks and processes to analyze unstructured information. Assisted in Azure Power BI architecture design.
Developed and deployed outcomes using Spark and Scala code on a Hadoop cluster running on GCP.
Designed and developed end-to-end Camunda applications, demonstrating a strong proficiency in workflow management and automation.
Data stack typically included AWS, Snowflake, DynamoDB, S3, RDS, AI/ML data exploration, RPA, correlation and causation analysis, Spark SQL, SQL, data modeling, Tableau, and Excel.
Designed and developed Oracle PL/SQL and shell scripts, data import/export, data conversions, and data cleansing.
Successfully incorporated business rules and decision tables into Camunda workflows, ensuring efficient and logic-driven processes.
Performed data analysis and statistical analysis; generated reports, listings, and graphs using SAS tools: SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
Created and deployed Alteryx Server on an MS Azure VM and Hadoop.
Kept up to date with the latest versions and features of Camunda, particularly version 8, and effectively maintained existing applications while ensuring compatibility with future releases.
Tested the Ab Initio ETL mappings and other ETL processes (DW testing).
Analyzed and understood complex business problems, providing BPM-centric solutions through process digitization, automation, user interventions, and the creation of streamlined user experiences.
Involved in testing QlikView 11.2 dashboards for migration to QlikView 12.1. Extensive experience with the extraction, transformation, and loading (ETL) process using Ascential DataStage EE/8.0.
Developed a Python script to transfer data and extract data via REST APIs from on-premises systems to AWS S3. Implemented a microservices-based cloud architecture using Spring Boot.
Created YAML files for each data source, including Glue table stack creation. Worked on a Python script to extract data from Netezza databases and transfer it to AWS S3.
Developed Lambda functions and assigned IAM roles to run Python scripts with various triggers (SQS, EventBridge, SNS); see the handler sketch at the end of this section.
Hands-on experience in AWS services such as EC2, S3, ELB, RDS, SQS, EBS, VPC, AMI, SNS, CloudWatch, CloudTrail, CloudFormation, AWS Config, Auto Scaling, CloudFront, IAM, and Route 53.
Experienced in onboarding a wide variety of technologies (Java, .NET, Node.js, etc.) onto the CI/CD pipeline.
Wrote UNIX shell scripts to automate jobs and scheduled cron jobs via crontab. Created a Lambda deployment function and configured it to receive events from S3 buckets.
Built machine learning models including SVM, random forest, and XGBoost to score and identify potential new business cases with Python scikit-learn.
Worked on a migration project from TeamWorks 6.2 to IBM BPM 8.5.7.
Oversaw the migration of the database from the staging area to the data warehouse using an ETL tool (Informatica).
Experience in converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis), deploying via Terraform and AWS CloudFormation templates.
Worked on Docker container snapshots, attaching to a running container, removing images, and managing directory structures and containers.
Involved in optimizing existing QlikView dashboards with a focus on usability, performance, long-term flexibility, and standardization.
Developed complex Talend ETL jobs to migrate the data from flat files to database. Developed Talend ESB services and deployed them on ESB servers on different instances.
Developed merge scripts to UPSERT data into Snowflake from an ETL source (see the illustrative MERGE sketch below).
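A minimal sketch of an event-driven Lambda handler of the kind described above, assuming hypothetical bucket and payload names; SQS and SNS deliveries arrive under "Records", while EventBridge events carry a "detail" field.

# Illustrative AWS Lambda handler (Python); resource names are hypothetical placeholders
import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # SQS and SNS triggers deliver batches under "Records"
    for record in event.get("Records", []):
        if "Sns" in record:
            payload = json.loads(record["Sns"]["Message"])
        else:  # SQS message body
            payload = json.loads(record["body"])
        # Hypothetical: land the payload in S3 for downstream processing
        s3.put_object(Bucket="example-data-lake",
                      Key=f"incoming/{payload.get('id', 'unknown')}.json",
                      Body=json.dumps(payload))
    # EventBridge scheduled/custom events carry a "detail" field
    if "detail" in event:
        print("EventBridge event:", json.dumps(event["detail"]))
    return {"statusCode": 200}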
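A minimal sketch of the Snowflake UPSERT pattern referenced above, issuing a MERGE statement through the Snowflake Python connector; the connection parameters, table, and column names are hypothetical placeholders.

# Illustrative Snowflake UPSERT via MERGE (connection details and names are hypothetical)
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account", user="example_user", password="...",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC")

merge_sql = """
MERGE INTO customers AS tgt
USING staging_customers AS src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN UPDATE SET
  tgt.name = src.name,
  tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (customer_id, name, updated_at)
  VALUES (src.customer_id, src.name, src.updated_at)
"""

# Rows present in staging are updated in the target; new rows are inserted
with conn.cursor() as cur:
    cur.execute(merge_sql)
conn.close()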
Wolters Kluwer, Chicago, Illinois December 2020 – July 2023
Data Engineer
Responsibilities:
Transformed business problems into big data solutions and defined big data strategy and roadmap. Installed, configured, and maintained data pipelines.
Responsible for creating CI/CD patterns for deploying various MQ objects as code.
Responsible for creating a CI/CD pattern for DB object migration via CI/CD.
Developed the features, scenarios, and step definitions for BDD (behavior-driven development) and TDD (test-driven development) using Cucumber, Gherkin, and Ruby.
Integrated and maintained Camunda BPM solutions throughout the systems development lifecycle.
Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and files extracted from Hadoop, and to write data back to those sources.
Created ETL between different data warehouses such as Snowflake and Redshift via Alteryx workflows.
Successfully addressed complex business problems by providing BPM-centric solutions.
Developed interactive dashboards for Camunda Operate, enhancing monitoring and management capabilities.
Maintained JIRA team and program management review dashboards, the COP account, JIRA team sprint metrics reportable to the customer and SAIC division management, and COP user metrics.
Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure services. Knowledge of U-SQL.
Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing. Created Lambda jobs and configured roles using the AWS CLI.
Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and MLlib (see the sketch at the end of this section).
Experience in moving data between GCP and Azure using Azure Data Factory.
Set up Alteryx Server and managed server configurations.
Applied various machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering, to identify volume using the scikit-learn package in Python, R, and MATLAB. Collaborated with data engineers and software developers to develop experiments and deploy solutions to production.
Actively participated in the design, architecture and development of user interface objects in QlikView applications.
Created a user manual for Atlassian products (Jira/Confluence) and trained end users project-wise.
Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
Involved in Unit Testing the code and provided the feedback to the developers. Performed Unit Testing of the application by using NUnit.
Developed ELT processes from Ab Initio files and Google Sheets in GCP, with compute on Dataprep, Dataproc (PySpark), and BigQuery.
Migrated databases from SQL databases (Oracle and SQL Server) to NoSQL databases (Cassandra/MongoDB).
Studied the existing OLTP systems (3NF models) and created facts and dimensions in the data mart. Worked with different cloud-based data warehouses such as SQL and Redshift.
Wrote research reports describing the experiments conducted, results, and findings, and made strategic recommendations to technology, product, and senior management. Worked closely with regulatory delivery leads to ensure robustness in prop trading control frameworks using Hadoop, Python Jupyter Notebook, Hive, and NoSQL.
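A minimal PySpark sketch of the DataFrame and Spark SQL usage described above; the ADLS paths, view name, and columns are hypothetical placeholders.

# Illustrative Spark DataFrame / Spark SQL usage (paths and column names are hypothetical)
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims-poc").getOrCreate()

# Read curated Parquet data from a hypothetical ADLS container and register it as a view
claims = spark.read.parquet("abfss://raw@exampleaccount.dfs.core.windows.net/claims/")
claims.createOrReplaceTempView("claims")

# Spark SQL over the registered view: monthly claim totals per provider
monthly = spark.sql("""
    SELECT provider_id,
           date_trunc('month', claim_date) AS claim_month,
           SUM(claim_amount) AS total_amount
    FROM claims
    GROUP BY provider_id, date_trunc('month', claim_date)
""")

monthly.write.mode("overwrite").parquet("abfss://curated@exampleaccount.dfs.core.windows.net/claims_monthly/")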
WellCare, Tampa, FL June 2017 – November 2020
Data Engineer
Responsibilities:
Gathered business requirements, definition and design of data sourcing, worked with the data warehouse architect on the development of logical data models.
Automated diagnosis of blood loss during emergencies by developing a machine learning algorithm to diagnose blood loss.
Extensively used the Agile methodology as the organization standard to implement data models. Used a microservices architecture with Spring Boot-based services interacting through a combination of REST and Apache Kafka message brokers.
Created several types of data visualizations using Python and Tableau. Extracted large volumes of data from AWS using SQL queries to create reports.
Converted SAS code to Python/Spark-based jobs in Cloud Dataproc/BigQuery in GCP.
Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage, and BigQuery services.
Involved in creating the front end for an ETL validation tool using the Django framework.
Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in the existing database.
Administered Alteryx Gallery platforms and data connections.
Configured Alteryx system settings and ODBC/DSN connections to different data sources.
Developed predictive models using Decision Tree, Random Forest, and Naïve Bayes (see the illustrative sketch at the end of this section).
Involved in creating ETL validation script in Python.
Vast experience in identifying production bugs in the data using Stackdriver logs in GCP.
Experience in GCP Dataproc, GCS, Cloud Functions, Cloud SQL, and BigQuery.
Researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (scikit-learn).
Performed K-means clustering, regression, and decision trees in R. Worked on data cleaning and reshaping and generated segmented subsets using NumPy and pandas in Python.
Implemented various statistical techniques to manipulate the data like missing data imputation, principal component analysis and sampling.
Responsible for the design and development of Python programs/scripts to prepare, transform, and harmonize data sets in preparation for modeling.
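A minimal pandas/scikit-learn sketch of the predictive-modeling work described above; the input file, feature set, and target column are hypothetical placeholders.

# Illustrative predictive-modeling sketch with pandas and scikit-learn (file and columns are hypothetical)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Prepare and harmonize the data set: load, drop incomplete rows, split features/target
df = pd.read_csv("claims_features.csv").dropna()
X = df.drop(columns=["target"])
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit a Random Forest classifier and report holdout performance
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))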
Mayo Clinic, Rochester, NY April 2015 - May 2017
Java Developer
Responsibilities:
Developed front-end screens using JSP, HTML, CSS, JavaScript, JSON.
Developed SCM using JSP/HTML (one form per functional user interface), standard validations using JavaScript, servlets as controllers for the business logic, and business logic implemented with JDBC and XML parsing techniques, following the MVC pattern.
Developed Single Sign On (SSO) functionality, through which we can run SCM from Oracle Applications.
Developed server-side components for the business services for creating Items, BOMs, Sourcing Rules, and substitutes.
Involved in developing the routings and configured the routing program as a scheduled concurrent request.
Involved in raising notifications to Oracle users via email to prompt the start of the next process using workflow.
Used the Cloud Shell SDK in GCP to configure the Dataproc, Storage, and BigQuery services.
Oversaw the migration of the database from the staging area to the data warehouse using an ETL tool (Informatica).
Extensively worked on creating the setups for Organizations, Templates, Concurrent Requests, Cross Reference Types, User Creations, assigning responsibilities, creating value sets, Descriptive Flex Fields etc., in Oracle Applications.
Used CVS as version control system.
Moved data between BigQuery and Azure Data Warehouse using ADF and created cubes on AAS with complex DAX for memory optimization for reporting.