Data Engineer

Location:
Sunnyvale, CA
Salary:
$75
Posted:
June 02, 2023


Pooja Arvind Dharmik

+1-732-***-**** adxg55@r.postjobfree.com https://linkedin.com/in/poojads Sunnyvale, CA

Experienced IT professional (8 years) with solid work experience in Data Analysis, Data Engineering, and Data Science, with a broad analytics skillset, stakeholder management expertise, and the ability to execute complex projects end to end.

Passionate about partnering with business leaders to develop creative solutions and leveraging data-driven methodologies to build the foundation for strategic decisions.

Proficient in creating Tableau visualizations and dashboards using Tableau Desktop.

Strong understanding of the principles of Data Warehousing, Fact Tables, Dimension Tables, and Star and Snowflake schema modelling.

Experience in designing Data Mart and Data Warehouse using Star and Snowflake Schemas.

Working experience in data mart design, creation of cubes, identifying facts and dimensions, Star and Snowflake schemas, and canonical models.

Knowledge of ER modelling tools such as Erwin, ER/Studio, and MDM.

Expertise in using classical statistical methods (such as logistic regression and decision trees) in a commercial environment.

Background in writing Business Requirements and Functional/Technical Specification Documents including reviewing and analysing end-users’ requirements.

Experienced in Data Analysis, Predictive Analytics, and Data Science to improve strategy, process, and profitability as a Data Analyst.

Experience in developing SSIS Packages to Extract, Transform and Load (ETL)/DTS data into the Data warehouse from Heterogeneous databases.

Involved in using Jenkins, Docker, and Kubernetes clusters and PowerShell scripting for efficient data management and decentralized access.

Knowledge of and hands-on ability to install, configure, and use Hadoop Ecosystem components such as MapReduce, Spark, Hive, and HDFS to implement data pipelines.

Familiarity with Docker through creating simple Docker files and publishing them on Docker Hub.

Worked with AWS EMR, EC2, ECS, Redshift, GLUE, CloudWatch, DynamoDB, and Lambda functions.

Worked on Normalization and De-Normalization techniques for both OLTP and OLAP systems.

Solid hands-on experience writing SQL queries and creating database objects such as stored procedures, triggers, packages, and functions using SQL and PL/SQL to implement business logic.

Worked with different Python IDEs such as Visual Studio Code and Jupyter Notebook.

Capable in all phases of data mining: data collection, data cleaning, model development, validation, and visualization.

Capture, validate and publish metadata in accordance with enterprise data governance policies and MDM taxonomies.

Proficient in using Microsoft applications: Word, PowerPoint, Advanced Excel (Pivot Tables, VLOOKUP, Macros, VBA, and functions), and Advanced SQL (joins, views, triggers, transactions, query optimization).

EDUCATION

Drexel University

MS in Information Systems (Data Analytics), Philadelphia, PA (2021)

Welingkar Institute of Management and Research

MBA in Marketing (E-Business) Bangalore, India (2016)

Nagpur University

Bachelor of Engineering in Computer Science (2013)

SKILLS

Programming Skills

C, Python (NumPy, Pandas, scikit-learn, SciPy), R, Scala, JavaScript, XML, SQL, Machine Learning, PySpark

Tools

Tableau, Adobe Analytics Omniture, QlikView, Optimizely (A/B Testing), Microsoft Excel, PostgreSQL, Oracle SQL, MySQL, Power BI, SAP Hana, Kafka, Content Square, Jenkins, Kubernetes, MAKO-One Ribbon

Cloud Architecture

Amazon AWS (EC2, S3, Lambda, Elasticsearch, Elastic Load Balancing), Azure Data Lake, Azure Data Factory, Azure Databricks, Azure SQL Database, Azure SQL Data Warehouse, Big Data, Hadoop, Hive, Oozie, Sqoop

Frameworks

Django and Flask

IDE(s)

PyCharm, Eclipse, Android Studio, Spyder

Database

SQL Server, MySQL, Sybase, Oracle, MongoDB

Operating Systems

LINUX, UNIX, VMware and WINDOWS

Servers

Apache Tomcat, HP Server and Web Logic

Version Control

GitHub and SVN

Development Process

Agile and Scrum

Bug Tracking Tool

JIRA


PROFESSIONAL EXPERIENCE:

Microsoft - Redmond, WA

Sr. Data Analyst Aug 2021 - Present

Microsoft Corporation is an American multinational technology corporation headquartered in Redmond, Washington. Microsoft's best-known software products are the Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers. Microsoft ranked No. 14 in the 2022 Fortune 500 rankings of the largest United States corporations by total revenue.

Responsibilities

Followed Agile testing methodology, participated in daily SCRUM meetings, and tested each Sprint deliverable.

Analyse traffic and sales data for commercial hardware products, investigate the key parameters that drive changes in KPIs, and identify trends and growth opportunities across product marketing initiatives.

Prepare launch plans for new products, using performance from previous launches to help target the right markets and segments of the customer base.

Use different Python packages (Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and others) to do preliminary data analysis.
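
As an illustration only (the dataset and column names below are invented, not drawn from the actual work), preliminary analysis of this kind with Pandas might look like:

```python
import pandas as pd

# Hypothetical sales rows; products echo the ones named in this section,
# but the numbers are made up for demonstration.
df = pd.DataFrame({
    "product": ["Surface", "Surface", "HoloLens", None],
    "units": [120, 95, 30, 10],
})

summary = df["units"].describe()                    # descriptive statistics
missing = df.isna().sum()                           # null counts per column
by_product = df.groupby("product")["units"].sum()   # aggregate by product
```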

Present weekly and monthly business reviews to stakeholders to help them make the business decisions required to increase traffic and sales through customer targeting, enabling the E-commerce team to understand and scale a $400M+ business (as of 2022) focused on Surface, HoloLens, Azure Kinect, and accessories.

Perform comprehensive segmentation analysis of user demographics using SQL to generate monthly reports for stakeholders, resulting in actionable insights.

Analyse and understand each element on the source system to match it to the existing data dictionary of the Common Data Warehouse.

Facilitate implementation of a data governance framework by establishing data policies and best practices.

Create complex stored procedures to perform various tasks including, but not limited to, data profiling, metadata searches, and loading of the data mart.

Build and automate dashboards for stakeholders to track product performance in different markets, and analyse promos and marketing campaigns to accelerate traffic and revenue.

Develop SQL queries/scripts to validate the data, such as checking for duplicates, null values, and truncated values, and ensuring correct data aggregations.
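
A minimal sketch of such duplicate/null validation checks, here run against an in-memory SQLite table with an invented schema (the actual work ran against the warehouse):

```python
import sqlite3

# Invented table with one duplicated row and one NULL customer.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "a"), (1, "a"), (2, None)])

# Duplicate rows: group by all columns, keep groups seen more than once.
dupes = cur.execute("""
    SELECT order_id, customer, COUNT(*) AS n
    FROM orders
    GROUP BY order_id, customer
    HAVING COUNT(*) > 1
""").fetchall()

# Null check on a required column.
nulls = cur.execute(
    "SELECT COUNT(*) FROM orders WHERE customer IS NULL").fetchone()[0]
```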

Owned and managed core data pipelines, fixed data leaks due to legacy rules, and designed standard analysis methodologies for internal analysis projects supporting Mission teams.

Design incremental loads to populate the data mart with checksum and timestamp techniques.
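
One way to sketch the checksum side of such an incremental load (the row shapes and helper names here are hypothetical, not the production design):

```python
import hashlib

def row_checksum(row: dict) -> str:
    """MD5 over the row's sorted key=value pairs."""
    payload = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.md5(payload.encode()).hexdigest()

def incremental_rows(source_rows, known_checksums):
    """Yield only rows whose checksum is not already in the data mart."""
    for row in source_rows:
        if row_checksum(row) not in known_checksums:
            yield row

source = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
known = {row_checksum({"id": 1, "qty": 5})}      # already loaded
new_rows = list(incremental_rows(source, known)) # only the id=2 row survives
```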

Migrated on-premises data on SQL Server to Azure SQL databases and Azure dedicated pools.

Perform Customer Journey Analysis to understand how customers can efficiently use the website to reach the desired product.

Environment: Agile (SCRUM), Microsoft Azure, Python, Oracle

State of Indiana (Department of Transportation, INDOT) - Indianapolis, IN

Data Analyst/Data Engineer Sep 2020 - Jul 2021

The Indiana Department of Transportation (INDOT) is a governmental agency of the U.S. state of Indiana charged with maintaining and regulating transportation and transportation related infrastructure such as state-owned airports, state highways and state-owned canals or railroads.

Responsibilities

Used ggplot2 package in R Studio for data visualization and generated scatter plots and high-low graphs to identify the relationship between different variables.

Worked on data analysis, data profiling, source-to-target mapping, and Data specification document for the conversion process.

Developed Hive scripts using Spark SQL to build different tables for analysts.

Fine-tuned Hive queries for better performance outcomes.

Utilized the AWS CLI to aggregate clean files in Amazon S3 and used Amazon EC2 clusters to deploy files into S3 buckets.

Extracted and transformed the log data files from S3 by Scheduling AWS Glue jobs and loaded the transformed data into Amazon Elasticsearch.

Implemented data exploration to analyse patterns and select features using Spark SQL and other PySpark libraries.

Conducted data analysis using Python (Sklearn) by linear regression and joint analysis method, analysed the factors with the highest percentage of influence and proposed the optimal design solutions.
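
To illustrate the factor-influence idea, a plain least-squares fit with NumPy (standing in for the scikit-learn linear regression used on the real data; all numbers below are toy values):

```python
import numpy as np

# Two hypothetical design factors and a response built from them.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([5.5, 7.5, 13.5, 15.5])   # exactly 3*x1 + 1*x2 + 0.5

A = np.column_stack([X, np.ones(len(X))])   # append an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
# coef[:-1] are the factor weights, coef[-1] the intercept; the largest
# absolute weight flags the most influential factor.
```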

Utilized Excel Power Pivot to query test data and customize end-user requests.

Monitored data quality and maintained data integrity to ensure the effective functioning of the department.

Automated the manual Excel process using MS Access queries and VBA coding.

Used Pandas DataFrames for working with and manipulating data.

Created fully automated DevOps CI/CD pipelines using Jenkins & GitHub Actions.

Generated graphs using MS Excel Pivot tables and created presentations using PowerPoint.

Developed a manager dashboard of call centre metrics and KPIs to analyse the team performance using Tableau.

Developed visuals, KPI scorecards, and dashboards using Power BI, with advanced-level calculations on the data set using DAX and Power Query.

Developed purging scripts and routines to purge data on Snowflake DB and Azure Blob storage.

Used GIT for version controlling and JIRA to track and update Epic tasks in each sprint and to keep track of project milestones.

Wrote calculated columns and measure queries using DAX in Power BI to support sound data analysis.

Environment: Python, AWS, PySpark, Hive, MS Excel, ggplot2, Amazon Elasticsearch, Git, MySQL, SciPy, Pandas, Scikit-learn, Power BI, Matplotlib, Shell Script, Jenkins, Jira, Agile

Parexel International – Durham, NC/Nagpur, India (Offshore)

Data Analyst/Data Engineer Oct 2018 – Jul 2019

Parexel is a global clinical research organization that was founded in 1982 and specializes in conducting clinical studies on behalf of its pharmaceutical partners in order to accelerate and ensure the drug approval process of up-and-coming potential treatments. It currently operates in more than 50 countries and is run by more than 18,000 employees around the world.

Responsibilities

Analysed large scale, high dimensional past clinical dataset and built appropriate statistical models to gather useful insights about clinical trial data and present findings to stakeholders by building interactive dashboards.

Executed extensive market research on the radiology segment of the Indian medical field and its AI applications, while building an extensive client base to source quality data for a successful AI product.

Worked on predicting the incidence of pulmonary tuberculosis by establishing the autoregressive integrated moving average (ARIMA) model and providing support for pulmonary tuberculosis prevention and control.
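
In practice a full ARIMA fit would use a library such as statsmodels; as a rough sketch, the autoregressive core of the model (an AR(1) fit by least squares) can be shown with invented monthly counts:

```python
import numpy as np

# Synthetic monthly case counts (hypothetical numbers, not clinical data).
series = np.array([12.0, 14.0, 13.0, 15.0, 16.0, 15.0, 17.0, 18.0])

# Fit x[t] = a * x[t-1] + b by least squares: the AR(1) piece of ARIMA.
lagged = series[:-1]
target = series[1:]
A = np.column_stack([lagged, np.ones(len(lagged))])
(a, b), *_ = np.linalg.lstsq(A, target, rcond=None)

next_value = a * series[-1] + b   # one-step-ahead forecast
```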

Designed, developed, and modified data pipeline infrastructure to expedite data analysis and reporting.

Implemented end-to-end delivery of AWS and Snowflake data solutions that includes building, managing, and optimizing data pipelines across environments.

Integrated data from Stripe, Shopify, messaging and marketing platforms, and electronic health record system to the data warehouse.

Utilized Tableau, SAS, and other software to provide ad hoc reporting, tables, listings, and graphs for clinical trial data, regulatory submissions, and publications.

Deployed dashboards in Google’s Looker visualization tool to track company-wide objectives and key metrics for finance, membership, marketing, growth, and clinical operations.

Formulated a strategy and created a proof of concept to migrate from a legacy data streaming system to a distributed open-source Apache Kafka and Spark Streaming system.

Developed and implemented data-driven strategies that improved patient outcomes and drove business growth.

Identified, analysed, and interpreted trends or patterns in complex data sets.

Designed and customized dashboards with the use of Tableau and SSRS.

Conducted exploratory data analysis using tools such as Pyspark to identify patterns and trends in patient data and collaborated with business stakeholders to define data requirements and ensure that data was aligned with business objectives.

Developed reports using calculated sheets, parameters, groups, sets, filters etc.

Created complex views and queries in databases and used them as sources in Tableau to develop high-end reports.

Built machine learning algorithms that detected anomalies and redundancies in the large dataset, which helped make the patient enrolment platform more user friendly.

Worked with various clients in different domains, collaborating closely with their product, engineering, marketing, and operations teams to help clients grow their user base and revenue.

Environment: Python, AWS, PySpark, JIRA, SSRS, Tableau, MySQL, Kafka, ARIMA, Docker, Jenkins, Excel

HP Inc (Manpower Group) - Bengaluru, India (Offshore)

Web Data Analyst/Data Scientist May 2014 - May 2018

HP Inc. is an American multinational information technology company headquartered in Palo Alto, California, that develops personal computers (PCs), printers and related supplies, as well as 3D printing solutions. It is the world's second-largest personal computer vendor by unit sales and the 58th largest United States corporation by total revenue.

Responsibilities

Performed extensive analysis of major deviations in KPIs for marketing and branding of products, and implemented a framework of metrics, reporting, predictive modelling, and online marketing activity before communicating summaries.

Prepared web analytics reports by building dashboards and ad hoc analyses using Adobe Analytics to identify and tap into new channels, optimizing ROI and fuelling revenue growth through forecasting.

Utilized Python-based web crawlers to extract a large dataset of over 6 million records of KOLs' (Key Opinion Leaders) operational data.

Forecasted the sales trends with 93% accuracy using Time series analysis in Python, providing insights into the factors that influenced market share and quantifying their impact.

Automated data processing with Python-based analysis scripts, improving processing time from 20 days to 24 hours and boosting efficiency.

Extracted data from sources such as PostgreSQL, applied transformations to manipulate the data, and loaded the transformed data into SQL Server destinations using SQL Server Integration Services (SSIS).

Performed data mining on large datasets (mostly structured) and raw text data using different Data Exploration techniques.

Developed descriptive analysis in Python (Pandas/Matplotlib) and Power BI to guide training events, including statistical measures and data visualization (bar chart, scatter plot) to provide insights into KOL performance.

Developed SSIS packages to Extract, Transform, and Load (ETL)/DTS data into the data warehouse from heterogeneous databases.

Generated heat maps for better understanding and visualization of site performance and A/B test results, and reduced web analytics reporting hours by creating report templates and automation with Tableau.

Environment: MySQL, XML, Python, Adobe Analytics, Tableau, PostgreSQL, SSIS, Autosys
