
Data Analyst Engineer

Location:
North Brunswick, NJ
Posted:
August 26, 2023



Deepthi N

ady8qv@r.postjobfree.com | 716-***-**** | linkedin.com/in/deepthi-yadav-85672a283

PROFESSIONAL SUMMARY:

• 10+ years of experience as a data engineer and data analyst, spanning financial predictive model building, data visualization, and statistical analysis

• Experience in building intuitive products and experiences while working alongside an excellent, cross-functional team across Engineering, Product, and Design

• Hands-on experience writing queries in SQL and Python to extract, transform, and load (ETL) data from large datasets using data staging

• Strong experience with Python and its libraries (Pandas, NumPy, scikit-learn, Seaborn, Matplotlib), as well as R, for algorithm development, data manipulation, analysis, and visualization

• Worked on data extraction, transformation, and loading (ETL) using tools such as SQL Server Integration Services (SSIS), Reporting Services (SSRS), Analysis Services (SSAS), and Pipeline Pilot

• Expert in transforming business requirements into analytical models and designing algorithms.

• Proficient in developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data to improve business performance

• Experience in designing and developing SSRS reports using data from ETL loads and SSAS

• Expert in Data Visualization, Data Analysis, Business Intelligence, IT Analysis/Design, and Client/Server System Architecture

• Knowledge in developing and designing analytical reports and dashboards using guided analysis, interactive dashboard design and visual best practices.

• Worked on Long Short-Term Memory (LSTM) networks using Keras for automatic speech recognition and anomaly detection (see the sketch at the end of this summary)

• Used sentiment analysis to determine the emotional tone behind a series of words and surface the attitudes expressed, in order to analyze a product's market, customer service, and fraudulent activity

• Proficient mathematical knowledge of linear algebra, probability, statistics, stochastic theory, information theory, and algorithms

• Knowledgeable in data analysis skills including data mapping, data cleansing, logical data modelling, writing data extraction scripts, and querying to report the progress of job migrations

• Proficient in writing complex SQL, including stored procedures, triggers, joins, and subqueries, to access and manipulate database systems such as MySQL, PostgreSQL, and NoSQL stores

• Experience in performing analytics on structured data in Hive using HiveQL queries, views, partitioning, bucketing, and UDFs

• Knowledge in extracting data from multiple sources such as Excel, Access, flat files, and SQL Server to design and develop dashboards in TIBCO

• Experience in using Alteryx for data cleaning and data blending, and for creating more efficient data sets for reporting purposes

• Experience in using Tableau for data visualization and designing dashboards for publishing and presenting storyline on web and desktop platforms.

• Proficient in the entire project life cycle and actively involved in all the phases including data acquisition, cleaning, engineering, feature scaling, feature engineering, statistical modelling, and visualization.

• Experienced working with AWS and GCP, using tools such as EMR, S3, EC2, and SageMaker
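The following is a minimal sketch of the kind of LSTM-based anomaly detection mentioned above, using Keras to predict the next value in a sequence and flagging windows with unusually large prediction error; the window length, layer sizes, synthetic data, and threshold are illustrative assumptions, not the original project's configuration:

```python
# Minimal sketch: LSTM-based anomaly detection on a univariate series.
# Window length, layer sizes, data, and threshold are assumptions.
import numpy as np
from tensorflow import keras

TIMESTEPS = 30  # assumed sliding-window length

model = keras.Sequential([
    keras.Input(shape=(TIMESTEPS, 1)),
    keras.layers.LSTM(64),
    keras.layers.Dense(1),  # predict the next value in the series
])
model.compile(optimizer="adam", loss="mse")

# X: (n_windows, TIMESTEPS, 1) past values; y: (n_windows,) next value.
# Synthetic stand-in data; a real pipeline would window an actual series.
X = np.random.rand(1000, TIMESTEPS, 1)
y = np.random.rand(1000)
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Flag windows whose prediction error is far above the typical error.
errors = np.abs(model.predict(X, verbose=0).ravel() - y)
anomalies = errors > errors.mean() + 3 * errors.std()
```

TECHNICAL SKILLS: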

Analysis Tools: Tableau, Power BI, MS SSRS, MS SSAS, MS Excel (VBA, VLOOKUP, pivot tables)

Languages: Python, R, SQL, Java, C, C++, Scala

Scripting: Unix shell scripting, PL/SQL, Perl, T-SQL

Database/Analytics: MySQL, PostgreSQL, NoSQL, DynamoDB, Aurora, MongoDB, Cassandra, Hive, Teradata, Vertica, Hadoop Streaming, MapReduce, SPSS, SAS, Weka, SSIS

RDBMS: Oracle, MySQL, MS SQL Server

AWS Cloud: EC2, S3, Redshift, Athena, Glue, CloudWatch, Amazon RDS, AWS Transfer Family, EMR, Lambda, AWS Data Pipeline

Technical: Amazon Web Services (AWS), Spark, Git, Jenkins, Agile

Tools: MS Office, Eclipse, MATLAB, Visual Studio, MS Excel (advanced), MS Access, MS Visio, Toad for Oracle

Machine Learning Algorithms: Linear Regression, Logistic Regression, Linear Discriminant Analysis (LDA), Decision Trees, Random Forests with AdaBoost and Gradient Boosting, Naïve Bayes, K-Nearest Neighbors, Hierarchical Clustering, K-Means Clustering, Density-Based Clustering (DBSCAN)

PROFESSIONAL EXPERIENCE:

ICBCFS, Princeton, NJ Mar 2021 to Present

Sr. Data Engineer/Data Analyst

Responsibilities:

• Responsible for understanding and analyzing business requirements to develop and debug applications using BI tools

• Interacted with users and business analysts to assess requirements and perform impact analysis

• Responsible for managing data structures and pipelines using SQL/Snowflake

• Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection testing, permission checks, and performance analysis

• Worked on SQL queries and PL/SQL procedures and converted them to ETL tasks

• Retrieved, manipulated, analyzed, and aggregated billions of records of claim data from databases and a Hadoop cluster using PL/SQL as part of ETL

• Created dashboards in Tableau to present, for every quarter, profit generated and customer-sentiment measures such as call volumes and case volumes

• Responsible for designing, developing, and testing Hadoop ETL using Hive on data at different stages of the pipeline

• Used analytical features in Tableau such as statistical functions and calculations, trend lines, and forecasting in Tableau reports

• Used Matplotlib and Seaborn in Python to visualize data, and performed feature engineering such as detecting outliers, handling missing values, and interpreting variables (see the sketch at the end of this section)

• Experience with the Python libraries Pandas, NumPy, Seaborn, Matplotlib, NLTK, spaCy, scikit-learn, Keras, and TensorFlow in developing end-to-end analytical models

• Created SSIS packages to load data from Oracle to SQL Server using various transformations in SSIS

• Responsible for manipulating data sets and building models for document classification and OCR using Python and SQL

• Wrote SQL queries to identify and validate data inconsistencies in the data warehouse against the source system

• Created reports from OLAP cubes, sub-reports, bar charts, and matrix reports using SSRS

• Worked on different data formats such as JSON, XML, CSV

• Used the AWS environment to extract, transform, and load data from various data sources into Tableau for reporting

• Worked with datasets of varying size and complexity, including both structured and unstructured data

• Worked on data processing for very large datasets, handling missing values, creating dummy variables, and dealing with various kinds of noise in the data

• Collaborated with different functional teams to implement models and monitor outcomes

• Developed processes and tools to monitor and analyze model performance and data accuracy

• Performed data wrangling to clean, transform, and reshape data using the Pandas library. Analyzed data using SQL and Python and presented analytical reports to management and technical teams

• Used Git for version control: tracked changes in files and coordinated work on them among multiple team members

• Participated in regular grooming sessions to discuss stories and keep report development tasks up to date using agile tools like Rally and Jira

• Worked with the Data Management team to analyze reinsurance data using tools like SQL, Access, and Excel (lookups, pivot tables), and to predict future impacts on the business

• Prepared financial analysis reports with the Data Management team to assist senior management

• Worked in collaboration with the actuarial team to improve predictive models

• Worked on internal projects to improve the standards of routine data processing

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Snowflake, AWS, Amazon SageMaker, Amazon Textract, Amazon Athena, Amazon Comprehend, Microsoft Excel, SSIS, SSRS, SSAS, SAS Analytics, Python, Hive, Spark, Rally.
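As a hedged illustration of the outlier and missing-value checks described in this section, here is a minimal sketch with Pandas, Seaborn, and Matplotlib; the file name and column name are hypothetical placeholders:

```python
# Minimal sketch: missing-value summary and IQR-based outlier flagging.
# "claims.csv" and "claim_amount" are hypothetical placeholders.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("claims.csv")

# Share of missing values per column, worst first
print(df.isna().mean().sort_values(ascending=False))

# Flag outliers with the 1.5 * IQR rule on a numeric column
q1, q3 = df["claim_amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["claim_amount"] < q1 - 1.5 * iqr) |
              (df["claim_amount"] > q3 + 1.5 * iqr)]
print(f"{len(outliers)} potential outliers")

# Visual check: a box plot makes the flagged points easy to see
sns.boxplot(x=df["claim_amount"])
plt.show()
```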

KPMG, New Jersey, United States Aug 2018 to Feb 2021

Sr. Data Analyst/Data Engineer

Responsibilities:

• Performed end-to-end analytics from data extraction to insights development for quarterly data using Tableau

• Responsible for developing the data warehouse and ETL system using relational and non-relational technologies such as SQL and NoSQL

• Worked on a combination of structured and unstructured data from multiple sources and automated the cleaning process using Python scripts

• Responsible for working on analytical problems such as data gathering, exploratory analysis, data cleaning, and feature engineering on large-scale, complex data sets using Python

• Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python at various stages while developing Python scripts

• Performed in-depth analysis on data and prepared ad-hoc reports in MS Excel and SQL scripts.

• Involved in creating SSIS packages with various transformations like Slowly Changing Dimensions, Lookup, Aggregate, Derived Column, Conditional Split, Fuzzy Lookup, Multicast, and Data Conversion

• Used SSIS to create ETL packages (.dtsx files) to validate, extract, transform and load data to data warehouse databases

• Optimized Tableau dashboards for better performance, and monitored and handled very large data sets in Tableau

• Created Drill down Reports, Parameterized Reports, Linked Reports and Sub Reports apart from generating Ad-hoc Reports using SSRS

• Scripted HiveQL queries to analyze millions of records in different file formats such as RCFile, SequenceFile, and text files, using partitioning and bucketing on customer data

• Collected historical and third-party data from different data sources; also worked on outlier identification with box plots using Pandas and NumPy

• Implemented a recommendation model to optimize sales and marketing efforts that increased revenue by ~3%

• Worked on reading queues in Amazon SQS whose messages carry paths to files in an Amazon S3 bucket (see the sketch at the end of this section); also used the AWS CLI to aggregate cleaned files in Amazon S3

• Performed database audits and created reports based on audit logs in Tableau

• Provided clear direction and motivation to project team members, championed communication across all levels of the organization, and provided daily status reports to management

• Developed Git hooks for local-repository commit and remote-repository push functionality, and worked with GitHub

Environment: NumPy, Pandas, Matplotlib, Seaborn, Scikit-Learn, Tableau, SQL, Snowflake, AWS, Microsoft Excel, SSIS, SSRS, SSAS, SAS Analytics, Python, Hive, Spark.
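Below is a minimal sketch of the SQS-to-S3 pattern referenced above, polling a queue whose messages carry S3 paths and fetching each file with boto3; the queue URL and message schema are assumptions, not the actual setup:

```python
# Minimal sketch: read S3 object paths from an SQS queue, download files.
# QUEUE_URL and the message schema ("bucket"/"key") are assumptions.
import json
import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/file-events"  # hypothetical

resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                           WaitTimeSeconds=10)
for msg in resp.get("Messages", []):
    body = json.loads(msg["Body"])
    bucket, key = body["bucket"], body["key"]  # assumed message layout
    s3.download_file(bucket, key, f"/tmp/{key.rsplit('/', 1)[-1]}")
    # Remove the message only after the file is safely fetched
    sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```

IntouchCX, India June 2015 to May 2017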

Sr. Data Analyst/Data Engineer

Responsibilities:

• Anti-Money Laundering (AML): validated that the EBA process runs as originally designed and documented

• Utilized AWS EMR with Spark to perform batch processing operations on various big data sources

• Installed applications on AWS EC2 instances and configured storage in S3 buckets

• Created a data pipeline to extract data from various APIs and ingest it into S3 buckets using Apache Airflow

• Deployed Lambda functions triggered by certain events in an S3 bucket to transform data and load it into AWS Redshift (see the sketch at the end of this section)

• Worked with Linux EC2 instances and RDBMS databases.

• Developed Python and shell scripts to run Spark jobs for batch processing data from various sources

• Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) for object storage

• Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing

• Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them

• Extracted data from AWS Redshift using Spark and performed data analysis

• Developed a POC for ETL operations using AWS Glue to load Kinesis stream data into S3 buckets

• Performed data cleaning, data quality checks, and data governance for incremental loads

• Normalized data and created correlation plots, scatter plots to find underlying patterns.

• Filtered and cleaned data by reviewing computer reports, printouts, and performance indicators to locate and correct code problems

• Tested dashboards to ensure data matches business requirements and to surface any discrepancies in the underlying data

• Created data pipelines using Step Functions, implemented state machines, and ran pipelines at different times to process data

• Developed and implemented databases, data collection systems, data analytics, and other strategies that optimize statistical efficiency and quality

• Worked with management to prioritize business and information needs

Environment: Python, AWS, Jira, GIT, CI/CD, Docker, Kubernetes, Web Services, Spark Streaming API, Kafka, Cassandra, MongoDB, JSON, Bash scripting, Linux, SQL, Apache Airflow.
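As a hedged sketch of the S3-triggered Lambda load into Redshift described above: the handler reads the bucket and key from the S3 event and issues a Redshift COPY. The cluster endpoint, table, IAM role, and the availability of psycopg2 in the deployment package are all assumptions:

```python
# Minimal sketch: Lambda handler that COPYs a new S3 object into Redshift.
# Endpoint, table, and IAM role are hypothetical placeholders.
import os
import psycopg2  # assumed to be bundled with the deployment package

def handler(event, context):
    record = event["Records"][0]              # standard S3 event shape
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    conn = psycopg2.connect(
        host="example-cluster.redshift.amazonaws.com",  # hypothetical
        dbname="analytics", user="loader",
        password=os.environ["DB_PASSWORD"], port=5439)
    with conn, conn.cursor() as cur:
        cur.execute(f"""
            COPY staging.events
            FROM 's3://{bucket}/{key}'
            IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
            FORMAT AS JSON 'auto';
        """)
    conn.close()
```

IBing Software Solutions Private Limited, India Oct 2013 to May 2015

Data Analyst/Data Engineer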

Responsibilities:

• Created database objects like tables, views, procedures, and functions using SQL to provide definition, structure and to maintain data efficiently

• Created dashboards and interactive charts using Tableau to provide insights for managers and stakeholders and enable decision-making for market development

• Worked on designing ETL pipelines to retrieve datasets from MySQL and MongoDB into an AWS S3 bucket, and managed bucket and object access permissions

• Performed data cleaning and wrangling using Python with a cluster computing framework like Spark

• Managed multiple project tasks with changing priorities and tight deadlines in an Agile environment

• Employed statistical analysis in R to test hypothesis assumptions and choose features for machine learning

• Worked with a cross-functional team to design, develop, and implement a BI solution for marketing strategies

• Implemented feature engineering in Spark to tokenize text data and transform features with scaling, normalization, and imputation

• Involved in building machine learning pipelines for customer segmentation with Spark, clustering with PCA and K-means (see the sketch at the end of this section), and assisted the Data Science team in implementing association rule mining

• Developed presentations using MS PowerPoint for internal and external audiences

Environment: Python, Azure, PySpark, Hive, Mspark, SQL, ETL, EC2, Jenkins, Jira, JavaScript, AWS.
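The segmentation pipeline mentioned above might look like the following minimal Spark ML sketch: assemble features, scale, reduce with PCA, and cluster with K-means. The input path, feature columns, and k are illustrative assumptions:

```python
# Minimal sketch: customer segmentation with Spark ML (scale -> PCA -> K-means).
# Input path, feature names, and k are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler, PCA
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("segmentation").getOrCreate()
df = spark.read.parquet("s3://example-bucket/customers/")  # hypothetical

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["recency", "frequency", "monetary"],  # assumed
                    outputCol="raw"),
    StandardScaler(inputCol="raw", outputCol="scaled"),
    PCA(k=2, inputCol="scaled", outputCol="pca"),
    KMeans(k=5, featuresCol="pca", predictionCol="segment"),
])
segments = pipeline.fit(df).transform(df)
segments.groupBy("segment").count().show()
```

Hudda Infotech Private Limited, India June 2012 to Sept 2013

Data/Financial Analyst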

Responsibilities:

• Collaborated with data managers to define and implement data standards and common data elements for data collection

• Built an ETL pipeline using SQL to query telecom data from a MySQL database by filtering, joining, and aggregating various tables

• Used Tableau to design and maintain reports and dashboards to track and communicate customer churn prediction performance

• Manipulated raw data with the NumPy and Pandas libraries in Python for data cleaning, exploratory analysis, and feature engineering

• Generated interactive charts with the Matplotlib and Seaborn libraries in Python for exploring and explaining data

• Collaborated with marketing managers to identify root causes of customer discontent and constructed dashboards that reflect these predictions

• Designed A/B tests to identify variables that contributed to customer churn and used the Shiny library in R to turn analyses into dashboards

• Applied data mining in Spark to extract diverse features that provide additional information to enhance the churn prediction

• Supported constructing machine learning models using the scikit-learn library in Python to predict customer churn, including decision tree, random forest, and gradient boosting models (see the sketch at the end of this section)

• Served as an outlet business consultant for regional planning and the consolidation of actuals, forecasts, and budgets

• Performed account analysis, variance analysis, and GL account reconciliation of the P&L and balance sheet

• Distributed monthly reports of operating expenses, gross margin, and operating margin to individual business units

• Provided weekly sales volume and trade spending report to determine actual vs. goal, including promotional details

• Benchmarked SG&A against budget and prepared reports for DSMs highlighting areas to watch for austerity

• Performed in-depth trend analysis of sales, cost of goods sold, inventories, purchases, and inter-unit transfers

• Acted as a liaison between information services and the business units to address efficiency and functional issues

• Extensive experience with Oracle Financials, including GL, AP, AR, purchasing, customer orders, and invoicing

Environment: Python, Azure, PySpark, Hive, Mspark, SQL, ETL, EC2, Jenkins, Jira, JavaScript, AWS.
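To make the churn-modelling bullet concrete, here is a minimal scikit-learn sketch comparing the three model families named above; the dataset, target column, and hyperparameters are illustrative assumptions (numeric, non-missing features are also assumed):

```python
# Minimal sketch: compare churn classifiers by test-set AUC.
# "telecom_churn.csv" and "churned" are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("telecom_churn.csv")
X = df.drop(columns=["churned"])   # assumes numeric, non-missing features
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

for model in (DecisionTreeClassifier(max_depth=5),
              RandomForestClassifier(n_estimators=200),
              GradientBoostingClassifier()):
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(type(model).__name__, round(auc, 3))
```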


