Data Analyst Machine Learning

Location: Newark, NJ
Posted: July 03, 2025


PROFILE SUMMARY

Experienced as a Data Analyst in the Banking and Healthcare domains, with good exposure to various phases of the Software Development Life Cycle (SDLC), including Analysis & Design, Development, Testing, and User Training.

Extensive experience in building and deploying machine learning models, including end-to-end lifecycle management and MLOps practices.

Proficient in designing NLP pipelines for data extraction, sentiment analysis, and text classification, leveraging advanced Python libraries like TensorFlow and Keras.
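
A minimal sketch of the kind of text-classification pipeline described above, using TensorFlow/Keras. The toy corpus, labels, and layer sizes are illustrative placeholders, not a production configuration.

```python
# Hedged sketch: a tiny Keras sentiment/text-classification pipeline.
# The corpus, labels, and hyperparameters below are illustrative placeholders.
import tensorflow as tf
from tensorflow.keras import layers

texts = ["great product, works well", "terrible support, very slow",
         "excellent value", "poor quality, broke quickly"]
labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

# Turn raw strings into padded integer sequences inside the model itself.
vectorizer = layers.TextVectorization(max_tokens=10_000, output_sequence_length=32)
vectorizer.adapt(texts)

model = tf.keras.Sequential([
    vectorizer,
    layers.Embedding(input_dim=10_000, output_dim=16),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary sentiment score
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(tf.constant(texts), tf.constant(labels), epochs=5, verbose=0)
```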

Skilled in implementing and optimizing large-scale A/B testing frameworks to evaluate model performance and experiment outcomes (see the sketch below).

Developed and managed logical data models to translate complex business requirements into technical specifications, ensuring alignment with organizational data governance standards.
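
Referring to the A/B testing bullet above: a minimal evaluation sketch using a two-proportion z-test from statsmodels. The conversion counts and sample sizes are fabricated for illustration.

```python
# Hedged sketch: evaluating an A/B experiment with a two-proportion z-test.
# Conversion counts and sample sizes are made-up placeholders.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 480]         # successes in control (A) vs. treatment (B)
observations = [10_000, 10_000]  # users exposed to each variant

stat, p_value = proportions_ztest(count=conversions, nobs=observations)
print(f"z = {stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Variant difference is statistically significant at the 5% level.")
```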

Led initiatives in complex metadata management and data curation, enhancing data quality and accessibility for cross-functional teams.

Collaborated closely with ingestion and curation teams to define and align technical data requirements, resulting in improved data integration and usability across platforms.

Worked closely with Data Engineers and other Data Analysts to build ETL data pipelines using Alteryx to load data into the IDM (Integrated Data Management) data warehouse.

Proficient in Requirements Gathering, Surveys/Questionnaire Development, Business Requirements Documentation (BRD), Functional Requirement Specifications/Documentations (FRS/FRD).

Demonstrated expertise in working with Apache Spark for large-scale data processing and analytics.

Proficient in TIBCO Spotfire's predictive analytics; adept at data wrangling techniques for dataset preparation and at implementing IronPython scripts for tailored automation and enhanced interactivity in visualizations.

Performed automated data extraction and migration into targets, such as Data Warehouses, Data Lakes using ETL tools and technologies like Snowflake, Azure Data Factory, and SQL Server Integration Services.

Experienced in SQL: constructing tables, joins, indexed views, indexes, complex stored procedures, triggers, and utility functions to facilitate efficient data manipulation and consistent data storage in accordance with business standards.

Proficiency in Microsoft Excel data cleansing, transformation, analysis, and visualization using math, statistical, logical, and other functions, VLOOKUPs, Macros, Pivot Tables, Pivot Charts, and Graphs.

Worked extensively with Data Architects and Data Modelers to create systematic processes for data extraction, transformation, and analysis.

Experience in monitoring data flows and analyzing data lineages.

Worked on several Python packages like NumPy, SciPy, Pandas, TensorFlow, Keras, Matplotlib, Seaborn, etc. to conduct EDA (Exploratory Data Analysis) for identifying patterns, trends, and relationships among various features within the data.
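
A compact EDA sketch in the spirit of the bullet above, assuming a synthetic DataFrame; the column names and distributions are placeholders.

```python
# Hedged sketch: exploratory data analysis on a synthetic banking-style frame.
# Column names and distributions are illustrative only.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=500),
    "balance": rng.normal(5_000, 1_500, size=500),
    "churned": rng.integers(0, 2, size=500),
})

print(df.describe())                # per-feature summary statistics
print(df.isna().sum())              # missing-value counts
sns.heatmap(df.corr(), annot=True)  # pairwise correlations among features
plt.title("Feature correlation matrix")
plt.show()
```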

Highly skilled, results-driven Data Analyst with a specialization in big data analytics.

Profound understanding of intricate datasets, with a strength in precisely extracting valuable insights and hidden patterns from raw and big data.

Extensive experience in exploring and analyzing large datasets using AWS QuickSight to derive actionable insights.

Proficient in advanced techniques for data extraction, transformation, and loading (ETL) from diverse sources.

Construct resilient data processing pipelines and seamlessly integrate data using Azure Cloud Services such as ADLS, Data Factory, Data Flows, Databricks, and Power BI.

Proficient in the technical aspects of data extraction, transformation, and loading; use data visualization techniques to communicate intricate findings to stakeholders, empowering data-driven decision-making.

Meticulous, detail-oriented approach that upholds data quality and integrity throughout the analytical process.

Continuously dedicated to learning and staying abreast of emerging trends in big data analytics, enabling innovation and optimized data management practices.

Familiar with Processing Real-Time Streaming data using Azure Event Hubs and Azure Stream Analytics and visualizing the results using Power BI and Tableau.

Worked closely with business leadership, key stakeholders, developers, and clients to identify, implement, and distribute Salesforce and other CRM processes & system solutions.

Managed and successfully delivered key project outcomes in stringent timelines.

Partner with client teams to increase efficiency and reduce operational costs in a global Salesforce environment.

Self-motivated in gathering, documenting, and presenting operational data and requirements from key stakeholders, including the sales team, client pool, and upper and mid-level management.

Conduct gap analysis prior to integrating platform solutions within a defined project scope.

Technical Skills:

Cloud & Big Data Engineering

Cloud Platforms: GCP, AWS, Azure, Snowflake, Databricks, Dataproc

ETL & Streaming: Kafka, Spark (incl. PySpark, Spark SQL, Streaming), Flink, Airflow, Beam, Dataflow, Kinesis, Pub/Sub, Python, Scala, DBT, LookerML, Matillion, ETL, ELT

Architecture & Design: Solution Architecture, Scalable Distributed Systems Design, Data Modeling (Relational, NoSQL, Warehouse, Lakehouse/Delta Lake), Real-time/Event-Driven Architectures, High Availability/DR Patterns, Data Product Design Principles

DataOps & Automation: CI/CD Pipeline Development (GitHub Actions, Jenkins), DevSecOps Implementation, Infrastructure as Code (Terraform, CloudFormation), Kubernetes, Docker, GitOps

AI & Machine Learning:

Techniques: NLP Implementation, AI-Driven Analytics Development, Predictive Modeling, Gen AI/LLM Implementation & Automation (incl. RAG patterns, Prompt Engineering)

Models & Frameworks: PyTorch, TensorFlow, Scikit-Learn, GPT-4

Cloud AI Platforms: SageMaker, Bedrock, Azure AI, OpenAI

Leadership, Strategy & Governance:

Strategy & FinOps: Enterprise Data Strategy Definition & Roadmap Development, Data as Product Strategy, Cloud Cost Optimization Strategy & FinOps Governance (Reduced Costs 30%+)

Governance & Compliance: Data Governance Framework Implementation (Collibra, Informatica), Regulatory Compliance Automation (HIPAA, GDPR), Cloud Security Strategy & Implementation (IAM, RBAC, KMS), Secure Data Lineage Engineering, Data Product Quality & Usability Standards

Program Management: Agile/SAFe Leadership, PI Planning, Scrum, Kanban, SDLC Governance (Jira), GitOps

Data Lakes & Warehousing:

Platforms: Snowflake, BigQuery, Redshift, Azure Synapse, Cloud SQL, Teradata, Oracle, SQL Server, IBM DB2

NoSQL: PostgreSQL, DynamoDB, MongoDB, Elasticsearch, Redis

Legacy: HDFS, Hive, Pig, Sqoop, Oozie, Zookeeper, Impala, Presto, MapReduce, SSIS, SSRS

Business Intelligence & Reporting:

BI Tools: AWS QuickSight, Tableau, Power BI, Looker, QlikView

Capabilities: Data Visualization Implementation, Dashboard Optimization, KPI Data Pipeline, Real-time Analytics Engineering

Data Modeling & Architecture:

Data Modeling Techniques: Relational, NoSQL, Data Warehousing (Star, Snowflake, Data Vault), Dimensional Modeling.

Architecture Patterns: Data Lakehouse, Event-Driven Architectures, Real-time Data Processing.

Tools & Technologies: DBT, Matillion, Python, SQL, Azure Synapse, Snowflake, Redshift, BigQuery

Performance Optimization: Query tuning, indexing strategies, data partitioning.

Projects Profile:

Client: Salesforce Mar 2021 – Present

Role: Senior Data Engineer/Data Analyst

Designed and developed ETL pipelines and orchestration workflows using Azure Data Factory (ADF), managing data ingestion from APIs and transactional stores into Azure Data Lake Storage Gen2 and Azure Synapse Analytics.

Built production-grade APIs using FastAPI, enabling seamless integration with real-time data ingestion microservices hosted on Azure Kubernetes Service (AKS).
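
A hedged sketch of a small FastAPI ingestion endpoint of the sort described above; the /events route, Event model, and field names are hypothetical.

```python
# Hedged sketch: minimal FastAPI ingestion endpoint.
# The /events route and Event fields are hypothetical placeholders.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="ingestion-service")

class Event(BaseModel):
    source: str
    payload: dict

@app.post("/events")
async def ingest(event: Event) -> dict:
    # A real service would publish to a queue or write to the lake;
    # here we simply acknowledge receipt.
    return {"status": "accepted", "source": event.source}

# Run locally with: uvicorn main:app --reload
```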

Followed CI/CD best practices using Azure DevOps Pipelines for rapid deployment of Python services across Azure environments.

Implemented event-driven architectures with Azure Event Grid, Service Bus, and Durable Functions for scalable, reactive systems.

Designed cost-efficient data warehouse models and schemas in Azure Synapse, leveraging PolyBase for fast ingestion from external sources.

Built GraphQL APIs for microservices accessing Azure-hosted data; enabled real-time analytics via Power BI Embedded connected to Synapse.

Monitored performance and anomalies with Azure Monitor, Log Analytics, and integrated Slack alerts via webhooks.

Developed revenue-critical DAGs using Apache Airflow deployed on Azure Container Instances for ORC/Parquet data ingestion workflows.
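
An illustrative Airflow DAG skeleton for a Parquet ingestion workflow like the one above, assuming Airflow 2.4+; the DAG id, schedule, and task body are placeholders.

```python
# Hedged sketch: Airflow DAG skeleton for a Parquet ingestion workflow.
# DAG id, schedule, and task logic are placeholders (assumes Airflow 2.4+).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_parquet():
    # Placeholder: read ORC/Parquet batches and land them in the warehouse.
    print("ingesting parquet batch...")

with DAG(
    dag_id="parquet_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_parquet", python_callable=ingest_parquet)
```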

Expertise in designing and developing interactive dashboards and reports for key supply chain metrics such as inventory levels, order fulfillment, lead times, and supplier performance.

Mentored junior engineers on Azure tooling, Synapse development, and best practices for secure, scalable data modeling.

Architected and implemented highly available (99.99% uptime SLA) and resilient cloud-native data solutions on Azure, incorporating advanced networking, load balancing, and disaster recovery patterns.

Pioneered the technical integration and architecture of Generative AI and ML models (LLMs like GPT-4/Llama, NLP, RAG patterns) within the enterprise AI strategy, designing key components for automating workflows and enhancing knowledge discovery tools.

Architected and managed end-to-end ML pipelines using Vertex AI, automating model training, validation, deployment, and monitoring, reducing model deployment time by 50%.

Hands-on experience with ERP and supply chain management systems like SAP, Oracle SCM, and Manhattan Associates, with deep understanding of retail supply chain workflows.

Automated data validation and transformation tasks using Python and SQL, improving the accuracy and consistency of healthcare data ingested from multiple sources.

Led the technical optimization of high-performance Spark pipelines, personally diagnosing complex bottlenecks and implementing architectural solutions that increased processing speeds by 30–40%.

Established and governed enterprise data security and compliance frameworks (IAM, RBAC, encryption, lineage), ensuring adherence to HIPAA, GDPR standards and maintaining quality/usability standards for data products.

Strong understanding of supply chain workflows, including demand planning, procurement, inventory management, and logistics, to build relevant BI solutions.

Skilled in working with supply chain-specific datasets such as Bill of Materials (BOM), order history, shipment tracking, and inventory aging to drive operational improvements.

Familiarity with integrating Power BI with other analytics and visualization tools like Tableau and Excel, ensuring comprehensive supply chain reporting.

Developed automated reporting pipelines and ETL processes to streamline data collection from POS, WMS (Warehouse Management Systems), and ERP platforms like SAP, Oracle Retail.

Designed and built comprehensive data observability solutions (using Prometheus, Grafana, custom tooling), reducing critical issue detection and resolution time by 60% for key data pipelines and products.

Experience in applying machine learning techniques for demand forecasting, seasonality analysis, and retail trend prediction models.
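
A small seasonality-analysis sketch with statsmodels, in the spirit of the bullet above; the monthly sales series is synthetic.

```python
# Hedged sketch: seasonality analysis via classical decomposition.
# The monthly sales series is fabricated purely for illustration.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

idx = pd.date_range("2019-01-01", periods=48, freq="MS")
sales = pd.Series(
    100 + 2 * np.arange(48) + 10 * np.sin(np.arange(48) * 2 * np.pi / 12),
    index=idx,
)

result = seasonal_decompose(sales, model="additive", period=12)
print(result.seasonal.head(12))  # the recurring monthly pattern
```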

Collaborated with cross-functional teams, including supply chain managers, procurement teams, and IT stakeholders, to understand business needs and deliver tailored BI solutions.

Collaborated with data scientists, analysts, and healthcare professionals to deliver data-driven insights that improved patient care, reduced operational inefficiencies, and optimized healthcare delivery.

Championed self-service analytics by engineering governed data models and delivering curated datasets as discoverable, reliable data products via BI tools (QuickSight, Tableau, Looker) for business users.

Collaborated with clinical teams to ensure that data models and pipelines met the specific needs of healthcare operations, including patient data management, medical billing, and resource allocation.

Designed, implemented, and governed real-time data ingestion architectures using Kafka Connect and Confluent Schema Registry, ensuring stability, compatibility, and clear data contracts for consumers.

Client: TikTok, IL Aug 2019 – Feb 2021

Role: Data Engineer/Data Analyst

Participated in code reviews, enhancement discussions, maintenance of existing pipelines and systems, and testing and bug-fix activities on an ongoing basis.

Developed web applications using Python Flask, ensuring efficient API development and seamless backend integrations.

Implemented automated testing frameworks with Python Pytest, reducing manual testing efforts by 40% and improving application reliability.
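
A minimal pytest sketch of the style of automated test described above; clean_amount is a hypothetical helper standing in for the application logic under test.

```python
# Hedged sketch: pytest-based automated tests.
# clean_amount is a hypothetical utility, not the actual application code.
import pytest

def clean_amount(raw: str) -> float:
    """Strip currency symbols and thousands separators, then convert to float."""
    return float(raw.replace("$", "").replace(",", ""))

@pytest.mark.parametrize("raw,expected", [
    ("$1,200.50", 1200.50),
    ("99", 99.0),
])
def test_clean_amount(raw, expected):
    assert clean_amount(raw) == expected

def test_clean_amount_rejects_garbage():
    with pytest.raises(ValueError):
        clean_amount("not-a-number")
```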

Designed and optimized Flask-based RESTful APIs, improving data retrieval efficiency in cloud-based architectures.

Ingested data through cleansing and transformation steps, leveraging AWS Lambda, AWS Glue, and Step Functions.

Implemented real-time streaming pipelines using AWS Kinesis and DynamoDB Streams, integrated with EventBridge for async event handling.

Developed reusable AWS infrastructure using AWS CDK (Python) for automated deployment of Lambda functions, DynamoDB tables, and CloudWatch alarms.
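
A hedged AWS CDK (v2, Python) sketch of the Lambda-plus-DynamoDB pattern above; the stack name, construct ids, and the lambda/ asset path are placeholders.

```python
# Hedged sketch: AWS CDK v2 stack wiring a Lambda function to a DynamoDB table.
# Stack/construct names and the lambda/ asset directory are placeholders.
from aws_cdk import Stack, aws_dynamodb as dynamodb, aws_lambda as _lambda
from constructs import Construct

class IngestionStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        table = dynamodb.Table(
            self, "EventsTable",
            partition_key=dynamodb.Attribute(
                name="event_id", type=dynamodb.AttributeType.STRING
            ),
        )

        handler = _lambda.Function(
            self, "EventHandler",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda"),  # expects lambda/index.py
        )
        table.grant_read_write_data(handler)  # scoped table permissions
```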

Proficient in data analytics and reporting using Power BI, Tableau, Spotfire, and SQL, delivering insights on reservoir performance, production forecasting, asset utilization, and supply chain optimization.

Successfully migrated data from one AWS partition to another and flowed the data into AWS Redshift clusters using ETL pipelines built with AWS Glue and Lambda.

Developed serverless applications using AWS Lambda, including creating, deploying, and managing Lambda functions to execute code in response to events from various AWS services.

Conducted root cause analysis on supply chain disruptions (e.g., delivery delays, inventory shortages) and recommended data-driven corrective actions to maintain retail store fulfillment targets.

Skilled in implementing event-driven programming models with AWS Lambda, including processing events from sources such as S3, SQS, SNS, DynamoDB Streams, Kinesis, and AWS IoT.
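
An illustrative Lambda handler for S3 object-created events, matching the event-driven pattern above; it only logs object metadata rather than doing real processing.

```python
# Hedged sketch: Lambda handler consuming S3 event notifications.
# Processing is limited to logging object metadata for illustration.
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        print(f"processing s3://{bucket}/{key}, {obj['ContentLength']} bytes")
    return {"statusCode": 200, "body": json.dumps("ok")}
```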

Scheduled Spark processes/applications on the AWS EMR cluster and performed data ingestion from multiple internal clients using Apache Kafka.
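
A minimal Spark Structured Streaming sketch for the Kafka ingestion described above; broker addresses and the topic name are placeholders.

```python
# Hedged sketch: Spark Structured Streaming read from Kafka (as on EMR).
# Broker list and topic name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "client-events")               # placeholder topic
    .load()
    .select(col("key").cast("string"), col("value").cast("string"))
)

# Echo the decoded stream to the console; a real job would write to S3/Redshift.
query = stream.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```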

Knowledge of supply chain forecasting and optimization models, leveraging Power BI's advanced analytics capabilities to support strategic planning.

Integrated alerting systems to detect revenue anomalies using automated rule-based monitoring and Slack-based notifications.

Leveraged Ralph Kimball’s dimensional modeling techniques including star and snowflake schemas for analytical reporting.

Employed MySQL to craft optimized views, stored procedures, scripts, and summary records, enhancing data manipulation efficiency and optimizing query performance.

Collaborated with supply chain managers, merchandisers, and logistics teams to align data reporting with business goals, supporting omnichannel retail strategies.

Managed and fine-tuned data pipelines to facilitate data extraction, transformation, and loading (ETL) operations, with a focus on upholding data accuracy and maintaining consistency.

Client: 3M, NJ Mar 2017-Jul 2019

Role: Data Analyst/Data Modeler

Consulted with application development business analysts to translate business requirements into data design requirements used for driving innovative data designs that meet business objectives.

Involved in information-gathering meetings and JAD sessions to gather business requirements, deliver business requirements document and preliminary logical data model.

Performed requirements gathering and design analysis for customer contact history for customer relationship management.

Worked as a business requirements analyst with subject matter experts to identify and understand requirements.

Supported data warehouse projects with logical and physical data modeling in Oracle environment and assured accurate delivery of DDL to the development teams.

Documented the data requirements and system changes into detailed functional specifications.

Took ownership of Source to Target Mapping and tracked and maintained changes.

Performed Extraction, Transformation and Loading (ETL) using Informatica PowerCenter.

Expertise in authentication, authorization, and IAM concepts, including user identity management and secure access control.

Proficient in documenting Functional and Non-Functional Requirements to ensure clear alignment with business objectives.

Expertise in designing solutions and developing process flow diagrams using tools like Jira, Confluence, and PowerPoint.

Skilled Agile Business Analyst with a strong understanding of Scrum concepts and methodologies.

Experience working with Databricks and MS SQL databases; involved in data migration projects.

Experience in Agile environments; leading requirements gathering sessions and stakeholder meetings. Partnered with business teams, IT, and leadership to drive data-driven solutions.

Proficient in defining acceptance criteria and managing the end-to-end acceptance process to ensure quality deliverables.

Strong Data Management experience in managing, organizing, and optimizing enterprise-level data for accuracy, consistency, and accessibility.

Identified duplicates in large datasets using matching algorithms in master data management tools such as IBM Initiate.

Designed and implemented data profiling and data quality improvement solution to analyze, match, cleanse, and consolidate data before loading it into data warehouse.

Extensively used ERWin to design Logical, Physical Data Models, and Relational database and to perform forward/reverse engineering.

Involved in user training sessions and assisting in UAT (User Acceptance Testing).

Responsible for designing new Fact and Dimension Tables for existing Models.

Skilled in conducting A/B testing to evaluate and enhance business decision-making.

Performed exploratory data analysis (EDA) to uncover patterns, trends, and insights.

Expertise in evaluating and enhancing ambulatory patient access processes; delivered healthcare analytics insights and reports on scheduling, patient access, and operational effectiveness.

Developed interactive dashboards using Tableau and Power BI for effective data visualization and reporting.

Implemented stored procedures, functions, views, triggers, and packages in PL/SQL.

Involved in Data Migration, Developing Automated Test Scripts in Python and SQL to validate the data migrated from the Source to Target Systems involving NPI data.

Generated Data reports / Dashboards in Tableau with interactive views, trends and drill downs with quick/context/global filters, parameters and calculated fields.

Developed risk prediction models during the Variable Rate Conversion in a Finance Transformation Project using Python to assess downstream system impact. Created visualizations with PowerPoint, Matplotlib, Seaborn, and Bokeh for stakeholder presentations.

Built anomaly detection and risk models for Identity Management Control Access using Python clustering and regression with one-hot encoding via scikit-learn. Automated data comparison between Teradata and Google Cloud to ensure consistency and detect irregularities.
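
A hedged sketch of the clustering-based anomaly flagging above: one-hot encode categorical access attributes, cluster with k-means, and flag points far from their cluster center. Feature values and the distance threshold are illustrative.

```python
# Hedged sketch: anomaly flagging via one-hot encoding + k-means distance.
# Access-log rows and the cutoff rule are illustrative placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder

access_logs = [["admin", "vpn"], ["user", "office"], ["user", "office"],
               ["admin", "office"], ["service", "vpn"]]

encoder = OneHotEncoder(sparse_output=False)
X = encoder.fit_transform(access_logs)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
distances = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)

threshold = distances.mean() + 2 * distances.std()  # simple illustrative cutoff
print("anomalous rows:", np.where(distances > threshold)[0])
```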

Led stakeholder and vendor meetings (Black Knight, Oracle, Mohela) to gather requirements and address roadblocks for a data migration project. Executed strategic plans for Credit Risk, Financial, and Regulatory Reporting while decommissioning the ALS Mainframe system.

Client: PayPal, CA Sep 2015 – Apr 2017

Role: Data Analyst

Developed a predictive analytics model for financial forecasting and risk assessment, enabling informed investment decisions and enhancing risk management strategies.

Utilized APIs to collect real-time market data and historical financial data, enhancing data accuracy and comprehensiveness for analysis.

Employed data wrangling techniques using Python and libraries like Pandas to clean and preprocess data, ensuring accuracy and consistency.

Engineered financial indicators and relevant features from raw data, resulting in improved model performance and predictive accuracy.

Created visualizations using Matplotlib and Seaborn to identify patterns and trends in financial data, providing valuable insights into market behavior and risk factors.

Conducted statistical analysis to validate assumptions and assess relationships between financial variables.

Developed predictive models using machine learning algorithms (e.g., XGBoost, ARIMA) for market forecasting and risk assessment, achieving an average prediction accuracy of 95%.
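
A minimal ARIMA forecasting sketch with statsmodels, in the spirit of the bullet above; the price series is synthetic and the (p, d, q) order is illustrative, not tuned.

```python
# Hedged sketch: ARIMA forecast on a synthetic price series.
# The random-walk data and (1, 1, 1) order are illustrative only.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
prices = pd.Series(100 + np.cumsum(rng.normal(0, 1, size=250)))

model = ARIMA(prices, order=(1, 1, 1)).fit()
print(model.forecast(steps=5))  # forecast the next five periods
```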

Applied hyperparameter tuning techniques to optimize model parameters, enhancing robustness and reliability of forecasts.
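
A grid-search tuning sketch pairing scikit-learn's GridSearchCV with the XGBoost regressor mentioned above; the parameter grid and synthetic data are placeholders.

```python
# Hedged sketch: hyperparameter tuning with GridSearchCV over XGBRegressor.
# The synthetic data and small parameter grid are placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=300)

grid = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid={"max_depth": [2, 3, 4], "n_estimators": [50, 100]},
    cv=3,
    scoring="neg_mean_squared_error",
)
grid.fit(X, y)
print(grid.best_params_)
```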

Deployed predictive models as RESTful APIs using Flask or FastAPI, enabling real-time access to forecasts for stakeholders and enhancing decision-making speed.

Established a CI/CD pipeline using Jenkins or GitHub Actions to automate model deployment, improving integration efficiency and version control.

Utilized monitoring tools to track model performance and accuracy over time, enabling proactive adjustments based on market changes.

Implemented a retraining strategy based on new market data, maintaining model relevance and effectiveness in predicting trends and risks.

Created interactive dashboards using Tableau or Power BI to visualize market trends and risk assessments, facilitating data-driven decision-making for stakeholders.

Prepared and presented actionable insights and recommendations to senior management, driving informed investment strategies based on model outputs.

Maintained comprehensive documentation of project methodologies and model specifications to ensure transparency and regulatory compliance.

Adhered to data privacy and security protocols, safeguarding sensitive financial information and ensuring compliance with industry regulations.

Client: Gspan, India Nov 2011 – Aug 2014

Role: Data Analyst

Performed data analysis and data profiling using complex SQL on various source systems. Developed SAS macros for data cleaning, reporting, and routine processing support. Used MS Visio for business flow diagrams and defined the workflow.

Created SQL scripts to find Data quality issues and to identify keys, Data anomalies, and Data validation issues.

Actively involved in T-SQL programming, implementing stored procedures, functions, cursors, and views for different tasks.

Performed Gap Analysis to check the compatibility of the existing system infrastructure with the new business requirement.

Wrote SQL stored procedures and views, and coordinated and performed in-depth testing of new and existing systems.

Experienced in developing business reports by writing complex SQL queries using views and volatile tables.

Extensively used SAS procedures such as MEANS and FREQ, along with other statistical calculations, for data validation.

Performed data analysis and extensive data validation by writing several complex SQL queries. Involved in design and development of standard and ad-hoc reporting using SQL/SSRS. Conducted data mining and data modeling in coordination with the finance manager.

Identified source databases and created the dimensional tables and checked for data quality using complex SQL queries.

Responsible for data lineage, maintaining data dictionary, naming standards and data quality.

Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.

Used SQL Server and MS Excel on a daily basis to manipulate data for business intelligence reporting needs.

Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.

Extracted data from different sources like Oracle and text files using SAS/Access, SAS SQL procedures and created SAS datasets.

Developed SQL Joins, SQL queries, tuned SQL, views, test tables, scripts in development environment.

Worked on requirement analysis, data analysis, and gap analysis of various source systems spanning multiple platforms. Responsible for BI data quality. Conducted JAD sessions with stakeholders such as editorial and design teams.

Performed data validation with data profiling; involved in data extraction from Teradata and flat files using SQL Assistant.

Used SQL and PL/SQL to validate data going into the data warehouse, creating complex data analysis queries to troubleshoot issues reported by users. Evaluated data mining request requirements and helped develop queries for those requests.

Conducted UAT (User Acceptance Testing) for multiple iterations by writing test cases, and signed off on them after approval.

Involved in Designing Star Schema, Creating Fact tables, Dimension tables and defining the relationship between them.


