
Data Analyst Machine Learning

Location:
Bear, DE
Posted:
March 10, 2025


K Snigdha

Role: Sr Data Analyst

Phone No: 281-***-****

Email ID: *********@*****.***

PROFESSIONAL SUMMARY

• Highly efficient Data Analyst with 7+ years of experience in Data Analysis, Machine Learning, Data Mining with large sets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.

• Worked with various clients in the Banking, Healthcare, Financial Services, and Insurance domains.

• Deep understanding of Statistical Modeling, Multivariate Analysis, model testing, problem analysis, model comparison, and validation.

• Well-versed in the Agile/Scrum model; participated in daily Scrum meetings, coordinated team activities, and maintained excellent client communication.

• Extensive experience in database programming using Oracle SQL/PL-SQL, Informatica, and Unix shell scripting.

• Responsible for all backup, recovery, and upgrading of all the PostgreSQL databases.

• Responsible for configuring, integrating, and maintaining all Development, QA, Staging and Production PostgreSQL databases within the organization.

• Expertise in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.

• Utilized Kanban boards to manage ongoing data analysis tasks, track progress, and ensure the timely delivery of reports and dashboards.

• Strong understanding of the principles of Data warehousing, Fact Tables, Dimension Tables, Star and Snowflake schema modeling.

• Skilled in performing data parsing, data manipulation, and data preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, re-index, melt, and reshape (see the short pandas sketch at the end of this summary).

• Extensive experience in Text Analytics, generating data visualizations using R and Python, and creating dashboards using tools like Tableau.

• Proficient in analyzing the end user requirements and converting them into efficient Technical Solutions.

• Excellent interpersonal and communication skills, with experience working with managers, business users, and developers across multiple disciplines.

• Experience in writing PL/SQL stored procedures, triggers, and functions using Toad.

• Strong experience with databases such as Microsoft Access, Oracle 11g/10g, SQL Server, DB2, and Teradata on Windows NT/2000 and later platforms.

• Extensive experience in Data Visualization, including producing tables, graphs, and listings using various procedures and tools such as Tableau.

• Extracted data from various database sources such as Oracle, SQL Server, and DB2, and regularly used JIRA and other internal issue trackers during project development.

• Performed statistical analysis and predictive modeling using R to identify trends and patterns in large datasets.

• Good industry knowledge, analytical and problem-solving skills, and the ability to work well within a team as well as individually.

• Worked in an Agile Scrum environment, collaborating with cross-functional teams during daily stand-ups, sprint planning, and retrospectives to deliver data-driven insights.

• Involved in reviewing business requirements and analyzing data sources.

• Coordinated with the stakeholders and project key personnel to gather functional and non-functional requirements.

• Provided in-depth knowledge, support, and training on the complete trade life cycle to the team and business users.
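
The summary above mentions pandas-style data preparation (merge, melt, regex parsing, descriptive statistics). The following is a minimal, illustrative sketch of that kind of workflow; the column names and sample data are hypothetical and not drawn from any client engagement.

```python
# Illustrative pandas data-preparation sketch (hypothetical column names and data).
import pandas as pd

# Small hypothetical dataset: quarterly sales in wide format.
wide = pd.DataFrame({
    "region": ["East", "West"],
    "q1_sales": [100, 150],
    "q2_sales": [120, 140],
})

# Reshape (melt) wide -> long for easier aggregation and plotting.
long = wide.melt(id_vars="region", var_name="quarter", value_name="sales")

# Regex-based parsing: pull the quarter number out of the column label.
long["quarter"] = long["quarter"].str.extract(r"q(\d)_sales", expand=False).astype(int)

# Merge with a lookup table, then compute descriptive statistics per region.
targets = pd.DataFrame({"region": ["East", "West"], "target": [110, 145]})
merged = long.merge(targets, on="region", how="left")

print(merged.groupby("region")["sales"].describe())
```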

TECHNICAL SUMMARY

Tools

Oracle Warehouse Builder, SQL Developer, PL/SQL Developer, SQL Navigator, SQL*Plus, SQL*Loader, Dreamweaver, AQT, D3.js, Oracle Forms and Reports, Qlik Sense, Export & Import (DTS), SQL, PL/SQL, Google Data Studio, JDeveloper, Oracle Identity Manager Application

Programming Languages

PL/SQL, PL/pgSQL, SQL, C, C++, .NET, Core Java

Web Technologies

HTML, HTML5, PHP

Database

Oracle (9i/10g/11g), MongoDB, SQL Server 2008, MySQL, Teradata, Snowflake

IDE

CRXDE, Eclipse, NetBeans, IntelliJ IDEA

CI Tools

Maven, Jenkins, Git, SVN, Stash, Bitbucket

Operating Systems

MS Windows (10, 8, XP, 2000, 98), Linux CentOS, UNIX

EDUCATION DETAILS

• Bachelor’s in Electronics and Communication Engineering from JNTUH in 2017

PROFESSIONAL EXPERIENCE

Role : Sr DATA ANALYST Aug 2023 – TILL DATE

Client : BNY Mellon, Dallas, TX

Responsibilities:

• Involved in the full development cycle of Planning, Analysis, Design, Development, Testing, and Implementation.

• Created an aggregated report daily for the client to make investment decisions and help analyze market trends.

• Built an internal visualization platform for clients to view historical data, compare various issues, and review analytics for different bonds and markets.

• The model collects and merges daily data from market providers and applies different cleaning techniques to eliminate bad data points.

• Developed and debugged PL/SQL packages, procedures, and functions to ensure the integrity of loaded data based on pre-defined, table-driven validations.

• Designed and implemented ETL workflows to extract, transform, and load data from diverse sources into data warehouses for analytics.

• Optimized data warehouse queries and indexing strategies, reducing execution time for large-scale analytics.

• Developed custom reporting solutions using Google BigQuery, Google Sheets, and Looker Studio to analyze social and paid media performance.

• Automated ETL processes using tools like Informatica, Talend, or SSIS, reducing data pipeline execution times.

• Worked with business teams to define ETL requirements and implemented efficient data pipelines to support analytics.

• Designed and implemented star and snowflake schemas to optimize data warehouses for analytics and reporting.

• Worked closely with geologists, geophysicists, and reservoir engineers to integrate multi-disciplinary datasets for enhanced decision-making in field development and asset management.

• Created project roadmaps and milestones, tracking progress and mitigating risks for successful project delivery.

• Developed complex SQL queries in Teradata for data extraction, transformation, and reporting, supporting data-driven decision-making.

• Gathered and analyzed diverse Tech Investment and Platform Health datasets to identify patterns, trends, and outliers for strategic decision-making.

• Scheduled, monitored, and troubleshot Informatica workflows to ensure seamless data pipeline execution.

• Developed SQL queries and scripts to validate ledger entries, detect anomalies, and ensure consistency between Sub-ledgers and the General Ledger.

• Worked closely with data engineers, business analysts, and product managers to define data requirements, design schemas, and implement solutions.

• Managed and analyzed subsurface well data, including well logs, drilling, completion, and production data to support reservoir characterization and field development.

• Assisted in ingesting structured and unstructured data into enterprise Data Lakes using AWS S3, Azure Data Lake, or Google Cloud Storage for scalable storage solutions.

• Designed interactive dashboards and custom visuals in Power BI, Tableau, or Google Data Studio to effectively communicate findings to leadership.

• Extracted, transformed, and loaded (ETL) data from multiple sources into data warehouses and reporting platforms.

• Established data stewardship practices to enforce data quality, lineage, and cataloging across business units.

• Worked with Azure Synapse Analytics, Azure Data Lake, and Azure SQL Database to store, process, and analyze large datasets efficiently.

• Ensured role-based access control (RBAC), encryption, and masking to comply with GDPR, HIPAA, and SOC2 standards for secure data handling.

• Debugged and optimized long-running queries, resolving data anomalies and inconsistencies to ensure reliable database operations.

• Created on-demand PowerPoint presentations, including tables, charts, and visuals, to translate complex data into easy-to-understand insights for leadership meetings.

• Proficient in managing relational (SQL Server, PostgreSQL) and NoSQL (MongoDB, Cassandra) databases for structured and unstructured E&P data storage.

• Developed and optimized complex SQL queries using joins, subqueries, CTEs, and window functions to efficiently analyze large datasets (an illustrative query sketch appears at the end of this section).

• Analyzed Meta and Instagram ad campaigns to measure audience engagement and optimize ad spend efficiency.

• Developed and optimized ETL workflows and mappings using Informatica Power Center for data integration.

• Integrated SQL databases with third-party APIs, cloud storage (AWS S3, Azure Blob), and BI tools (Power BI, Tableau, Looker) for real-time analytics.

• Designed and implemented star and snowflake schema models, ensuring efficient data storage and retrieval for analytics.

• Supported data governance initiatives by providing detailed lineage information for regulatory compliance.
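
The responsibilities above reference complex SQL built from CTEs and window functions. The snippet below is a minimal sketch of that query style, run against an in-memory SQLite database for portability; the table and column names are invented for illustration and do not reflect the actual client environment.

```python
# Sketch of a CTE + window-function query (hypothetical table and columns).
# Assumes SQLite 3.25+ for window-function support, as bundled with recent Python builds.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades (trade_id INTEGER, bond_id TEXT, trade_date TEXT, price REAL);
    INSERT INTO trades VALUES
        (1, 'B1', '2024-01-02', 101.5),
        (2, 'B1', '2024-01-03', 102.0),
        (3, 'B2', '2024-01-02',  99.7);
""")

# The CTE ranks trades per bond by date; the outer query keeps the latest trade.
query = """
WITH ranked AS (
    SELECT bond_id,
           trade_date,
           price,
           ROW_NUMBER() OVER (PARTITION BY bond_id ORDER BY trade_date DESC) AS rn
    FROM trades
)
SELECT bond_id, trade_date, price
FROM ranked
WHERE rn = 1;
"""

for row in conn.execute(query):
    print(row)
```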

Role : DATA ANALYST Feb 2021 – July 2023

Client : Canon Medical Systems, Tustin, CA

Responsibilities:

• Implemented Data Exploration to analyze patterns and to select features using Python SciPy.

• Built Factor Analysis and Cluster Analysis models using Python SciPy to classify customers into different target groups.

• Built predictive models using SAS (e.g., logistic regression and decision trees) to forecast trends and improve decision-making accuracy.

• Developed complex PL/SQL queries, stored procedures, and functions, optimizing data retrieval for large datasets.

• Developed and maintained data pipelines on Google Cloud (BigQuery, Dataflow) and AWS (S3, Redshift, Glue), enabling seamless data integration and transformation.

• Developed and automated marketing dashboards using Google Data Studio, Power BI, and Tableau to track real-time performance.

• Created data marts and aggregation tables to improve query performance for business intelligence reporting.

• Analyzed and processed seismic reflection, velocity models, and well log data for subsurface mapping and reservoir characterization.

• Designed and modified SQL-driven reports to meet changing business needs, incorporating new KPIs, calculated fields, and dynamic filters.

• Created interactive dashboards and reports using Databricks notebooks, enhancing real-time data visualization for stakeholders.

• Built automated dashboards in Power BI, Tableau, and Qlik Sense that refresh dynamically from cloud data sources.

• Optimized cloud-based ETL processes using AWS Lambda, Glue, and Google Cloud Functions, improving data ingestion efficiency.

• Applied data mining techniques to extract actionable insights, identify trends, anomalies, and correlations within massive datasets.

• Partnered with Platform Health program teams and leadership to understand data needs, define key metrics, and ensure alignment with business goals.

• Developed SQL queries and Python scripts to extract, transform, and analyze Medicaid eligibility and claims data.

• Developed ETL pipelines to extract, clean, and transform geological and geophysical data for downstream analysis.

• Optimized SAS programs for efficiency, reducing processing time through indexing, array processing, and macro variable optimization.

• Designed and deployed batch processing jobs using AWS Lambda, Glue, and S3 to automate data aggregation and reporting.

• Worked closely with the Technology Portfolio Management team to validate data findings, ensuring accuracy and consistency in reporting.

• Performed ad-hoc analyses to support business decisions, leveraging SQL, Python, and Tableau for data exploration.

• Ensured data integrity and quality by implementing validation processes for geological, geophysical, and reservoir engineering datasets used in decision-making.

• Built dashboards and reports in Tableau/Power BI to visualize Medicaid enrollment trends and healthcare utilization.

• Applied AI/ML techniques using Python (Pandas, NumPy, SciPy, TensorFlow, and Scikit-Learn) to predict reservoir properties, optimize drilling efficiency, and forecast production trends (see the modeling sketch at the end of this section).

• Optimized data lake performance with Delta Lake in Databricks, ensuring faster queries and reliable data versioning.

• Took initiative to discover new data-driven insights and implemented solutions to improve Platform Health transparency and reporting.

• Utilized SAS SQL (PROC SQL), DATA step, and functions to clean, transform, and preprocess large datasets for analysis.

• Designed and implemented ETL workflows using Azure Data Factory, automating data ingestion and transformation from multiple sources.

• Streamlined project documentation and reporting by developing templates and standards in Word and PowerPoint.

• Implemented best practices for data modeling to ensure scalability, data consistency, and performance.

• Conducted quality audits on healthcare claims data, identifying coding errors, incorrect billing, and fraudulent claims.

• Developed stored procedures, functions, and triggers to automate data processing, reducing manual intervention and increasing efficiency.

• Utilized advanced analytics techniques (SQL, Python, or R) to identify anomalies, detect trends, and predict risks within Tech Investment datasets.

• Integrated Python with SQL databases (PostgreSQL, MySQL, Oracle) to perform complex queries and retrieve large datasets.

• Extracted, merged, and integrated data from multiple sources (SQL databases, Excel, CSV, and cloud systems) using PROC IMPORT, PROC EXPORT, and LIBNAME statements.

• Used SQL functions (CASE, COALESCE, CAST, STRING functions) to cleanse, normalize, and standardize raw data before integrating into reports.

• Applied PROC MEANS, PROC FREQ, PROC REG, PROC LOGISTIC, and PROC UNIVARIATE for descriptive statistics, regression analysis, and predictive modeling.

• Applied data validation techniques using SQL to identify duplicates, inconsistencies, and missing values, improving overall data accuracy.

• Conducted reservoir data analysis, integrating well logs, core samples, and seismic interpretations to support reservoir characterization and production optimization.

• Wrote technical documentation and business reports, outlining key trends, anomalies, and performance metrics.

• Improved query execution times using indexes, partitioning, and query tuning techniques, ensuring seamless processing of large datasets.

• Developed automated SAS reports using PROC REPORT, PROC TABULATE, and scheduled jobs with SAS Macro and SAS Enterprise Guide.

• Designed and implemented ETL processes to extract, transform, and load millions of records from diverse sources into data warehouses.

• Standardized data models and schemas for enterprise-wide data governance initiatives, improving data integrity and compliance.

• Managed and analyzed petrophysical data to evaluate rock properties, fluid saturations, and permeability, aiding in well performance assessments.

• Recommended new methodologies and frameworks to enhance Platform Health monitoring, ensuring proactive issue resolution and better decision-making.

• Deployed predictive analytics models and machine learning workflows using Azure Machine Learning Studio.

• Documented ETL processes and conducted regular audits to ensure compliance with business and regulatory standards.

• Delivered ad-hoc analyses with SAS, responding to urgent business queries with timely and accurate insights.
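
Several items above describe predictive modeling in Python with Scikit-Learn. The sketch below shows the general pattern (train/test split, fit, evaluate) on synthetic data; the feature names and target are hypothetical placeholders, not real reservoir or claims data.

```python
# Minimal Scikit-Learn regression sketch on synthetic data (hypothetical features).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
n = 200
porosity = rng.uniform(0.05, 0.35, n)        # hypothetical feature 1
depth = rng.uniform(1000, 3000, n)           # hypothetical feature 2
# Synthetic target: a noisy linear combination of the two features.
production = 50 * porosity - 0.01 * depth + rng.normal(0, 1, n)

X = np.column_stack([porosity, depth])
X_train, X_test, y_train, y_test = train_test_split(X, production, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
```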

Role : DATA ANALYST Dec 2019 – Jan 2021

Client : CYIENT, Hyderabad, India

Responsibilities:

• Built scalable and deployable machine learning models.

• Performed Exploratory Data Analysis to identify trends and clusters.

• Built models using techniques such as regression, tree-based ensemble methods, time series forecasting, KNN, clustering, and Isolation Forest (see the anomaly-detection sketch at the end of this section).

• Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.

• Optimized Qlik Sense applications for performance and scalability.

• Trained teams on best practices for data privacy, security, and ethical AI usage.

• Created data-driven visual presentations using PowerPoint and Tableau, facilitating decision-making processes.

• Worked with business teams to map data fields across systems, ensuring smooth data migrations and reporting accuracy.

• Developed and maintained data processing applications using .NET frameworks (C#/VB.NET) to automate ETL workflows and reporting processes.

• Ensured adherence to industry standards (PPDM, WITSML, SEG-Y, LAS, RESQML) for E&P data governance and reporting.

• Ensured data integrity and consistency by implementing indexing, constraints, and referential integrity in RDBMS.

• Parsed and integrated XML data from multiple sources into analytical workflows using Python, SQL, or ETL tools.

• Linked data from MS Office tools (e.g., Excel to Word or PowerPoint) to ensure dynamic updates and consistency.

• Developed and maintained normalized and denormalized database schemas for efficient data storage and retrieval.

• Integrated E&P data with cloud platforms (AWS, Azure, GCP) to enhance scalability and real-time decision-making.

• Utilized Databricks, Snowflake, and Apache Spark for high-volume seismic data processing.

• Optimized Snowflake queries and database performance using features like clustering, caching, and partitioning.

• Integrated .NET-based APIs with SQL Server and other relational databases to extract, transform, and analyze business data.

• Designed and optimized SQL queries for data extraction, transformation, and reporting from relational databases like Oracle, SQL Server, and PostgreSQL.

• Automated data extraction and reconciliation processes using ETL tools like Informatica, Alteryx, or Python to improve efficiency.

• Optimized Glue jobs by tuning job parameters, partitioning data, and reducing processing times for large datasets.

• Created advanced charts (histograms, waterfall charts, box plots, and dynamic trend analysis) to present data-driven insights clearly and effectively.

• Applied ML models for reservoir characterization, production forecasting, and anomaly detection in well operations.

• Leveraged Python (Pandas, NumPy, SciPy), TensorFlow, and Scikit-Learn for predictive insights.

• Trained team members on Jira best practices, ensuring effective adoption and consistent usage across departments.

• Automated repetitive reporting tasks using Excel Macros and VBA, reducing manual workload and improving efficiency.

• Created data-driven visualizations and reports in PowerPoint using embedded charts, graphs, and Power BI/Excel integrations.

• Utilized ASP.NET and Web APIs to retrieve and manipulate structured and unstructured data for analytical reports.

• Worked with legal and compliance teams to align data policies with global data protection laws.

• Worked with Big Data platforms (Hadoop, Spark) for large-scale data mining, enabling scalable analytics solutions.

• Partnered with developers throughout the SDLC to optimize data integration, ensuring alignment with project goals.

• Responsible for building data analysis infrastructure to collect, analyze, and visualize data.
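
One of the items above lists Isolation Forest among the modeling techniques used. The following is a small, self-contained sketch of Isolation Forest anomaly detection on synthetic data; the contamination rate and data are assumptions made purely for illustration.

```python
# Isolation Forest anomaly-detection sketch on synthetic 2-D data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))   # inlier cluster
outliers = rng.uniform(low=-6, high=6, size=(10, 2))      # scattered anomalies
X = np.vstack([normal, outliers])

# contamination is the assumed fraction of anomalies in the data.
clf = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = clf.predict(X)          # +1 = inlier, -1 = anomaly
print("flagged anomalies:", int((labels == -1).sum()))
```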

Role : BUSINESS ANALYST/DATA ANALYST Dec 2017 – Nov 2019

Client : ANZ Bank, Bangalore India

Responsibilities:

• Developed and implemented predictive models using Natural Language Processing techniques and machine learning algorithms such as linear regression, classification, multivariate regression, K-means clustering, KNN, PCA and regularization for data analysis.

• Designed and developed Natural Language Processing models for sentiment analysis (see the classification sketch at the end of this section).

• Worked on the development of data warehouse, Data Lake, and ETL systems using relational and non-relational tools (SQL and NoSQL).

• Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.

• Translated complex analytical results into clear, concise, and visually appealing slides to support strategic decision-making.

• Integrated D3.js visualizations into web applications, ensuring seamless interaction and real-time data updates.

• Designed dynamic dashboards and automated data models in Excel to simplify financial and operational analysis for stakeholders.

• Enhanced data visualization and reporting by integrating .NET solutions with Power BI, Tableau, or Excel-based dashboards.

• Collaborated with DBAs and developers to manage database structures, backups, and security.

• Designed financial data models to optimize ledger and sub-ledger reporting structures for enterprise-wide financial analytics.

• Automated ETL workflows using .NET frameworks, optimizing data ingestion and transformation for business intelligence.

• Ensured compliance with financial regulations (GAAP, IFRS) by performing audits and validating ledger data integrity.

• Leveraged Snowflake's scalable architecture for real-time data processing and analytics, enhancing decision-making processes.

• Developed and maintained .NET-based data processing applications, improving data extraction, transformation, and reporting efficiency.

• Worked extensively with RDBMS platforms such as Oracle, PostgreSQL, MySQL, SQL Server, and DB2, designing and managing complex databases.

• Standardized and templatized reporting decks for seamless data updates, improving consistency in organizational reporting.

• Enabled seamless cross-departmental collaboration by setting up secure data sharing capabilities in Snowflake.

• Developed complex database objects like Stored Procedures, Functions, Packages and Triggers using Oracle Database, SQL and PL/SQL.

• Supported month-end and year-end financial closing activities by providing data analysis and ensuring ledger accuracy.

• Conducted data profiling and validation in data warehouses to ensure accuracy, consistency, and completeness of datasets.

• Integrated .NET applications with SQL Server and other databases, ensuring seamless data retrieval and analytics.

• Designed and implemented complex ETL workflows using Informatica PowerCenter, ensuring accurate and efficient data integration.

• Integrated ETL workflows with data visualization tools like Tableau, Power BI, and QlikView for enhanced analytics.

• Collaborated with stakeholders to design shareable reports in Google Data Studio, facilitating data transparency and cross-team alignment.

• Worked with cross-functional teams to improve data governance practices related to financial data and reporting.

• Extracted data from multiple sources, including Oracle, SQL Server, and flat files, and loaded it into target systems using Informatica.
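
The NLP item above mentions sentiment-analysis models. Below is a minimal sentiment-classification sketch using a TF-IDF and logistic regression pipeline; the tiny labeled corpus is invented for illustration and is not the model or data used at the client.

```python
# Minimal sentiment-classification sketch: TF-IDF features + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy corpus; 1 = positive, 0 = negative.
texts = [
    "great service and fast response",
    "very helpful and friendly staff",
    "terrible experience, will not return",
    "slow, rude, and unhelpful support",
]
labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["the support team was friendly and fast"]))
```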

Role : BUSINESS ANALYST May 2016 – Nov 2017

Client : INTEGRA TECHNOLOGIES, HYDERABAD, INDIA

Responsibilities:

• Gathered business requirements and converted them into technical solutions.

• Involved in preparing the Business and Functional documents.

• Provided regular status reports to the management.

• Tested the applications along with the Quality Assurance testing teams.

• Designed Excel pivot tables and VLOOKUP formulas to analyze large datasets.

• Worked on error logs to identify the errors in daily processed feeds.

• Developed error detection and resolution mechanisms using SAS Data Flux, minimizing inconsistencies in data pipelines.

• Customized charts, tables, and visual elements in Google Data Studio to provide tailored reporting solutions that meet specific business needs.

• Worked with data stewards and owners to create metadata, lineage, and data quality rules and guidelines.

• Analyzed data lineage process to identify vulnerable data points, control gaps, data quality issues, and overall lack of data governance.

• Optimized the performance of ETL pipelines by integrating Data Flux with SAS, achieving faster data processing and analysis.

• Documented control points, data enhancement processes, and transformation logic identified during stakeholder interviews.

• Reorganized the existing automated reporting scripting system to increase performance and reliability.

• Implemented robust error handling and debugging mechanisms in Informatica workflows, ensuring seamless execution.

• Troubleshot issues, including creating and following up on Oracle service requests.

• Wrote test scripts for testers to test the system before it goes to production.

• Performed missing value treatment, outlier capping, and anomaly treatment using statistical methods (see the sketch below).
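
The last item above covers missing value treatment and outlier capping. The following pandas sketch illustrates one common approach (median imputation and percentile clipping); the series values and the 5th/95th percentile bounds are assumptions for illustration.

```python
# Missing-value imputation and outlier capping on a hypothetical numeric series.
import numpy as np
import pandas as pd

s = pd.Series([12.0, 15.0, np.nan, 14.0, 300.0, 13.0, np.nan, 11.0], name="amount")

# Missing-value treatment: fill gaps with the median of the observed values.
s_filled = s.fillna(s.median())

# Outlier capping: clip values to the 5th-95th percentile range (winsorizing).
lower, upper = s_filled.quantile([0.05, 0.95])
s_capped = s_filled.clip(lower=lower, upper=upper)

print(s_capped.describe())
```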


