Hepsyba
************@*****.*** Cell: 512-***-****
linkedin.com/in/hepsyba-d-5958342a8
PROFESSIONAL SUMMARY
•Data Engineer/Data Scientist with 10+ years of experience in data visualization, business intelligence, and data analytics.
•Expertise in designing, developing, and optimizing interactive dashboards and reports using Power BI.
•Strong knowledge of DAX, Power Query, SQL, and data modeling to transform raw data into meaningful insights.
•Experience in integrating Power BI with Azure SQL, Azure Data Lake, and on-premises databases.
•Proven ability to collaborate with stakeholders, gather requirements, and deliver customized data solutions.
•Ensured data quality and integrity throughout the ETL process by implementing validation and cleansing mechanisms.
•Developed and maintained Azure SQL Databases, optimizing performance, security, and scalability.
•Proficient in implementing data governance, security policies, and row-level security (RLS).
•Adept at automating data workflows, scheduling refreshes, and optimizing report performance.
•Hands-on experience with Agile and Scrum methodologies for efficient project delivery.
•Designed, built, and maintained ETL pipelines using Azure Data Factory, ensuring efficient data integration and transformation processes.
•Designed, developed, and maintained scalable data pipelines using Azure Databricks and Azure Data Lake, enabling efficient processing of high-volume structured and unstructured financial data.
•Integrated diverse data sources, including on-premises databases and REST APIs, into the Databricks platform, streamlining data accessibility and reducing manual data collection efforts by 40%.
•Engineered high-throughput data pipelines using Python, Apache Airflow, and cloud-native services to enable continuous data exchange between AI agents and backend data repositories.
•Implemented modular data ingestion and transformation pipelines to support autonomous agent behaviors, including context retrieval, decision-making, and real-time feedback processing.
•Validated system performance and usability through rigorous functional testing, ensuring a high-quality user experience.
•Managed schema evolution and data versioning in Delta Lake tables on Azure Databricks, ensuring consistency and backward compatibility for downstream consumers.
•Built dynamic PySpark pipelines that integrated with Azure Key Vault for secure handling of secrets and connection strings while accessing protected data sources (a brief sketch follows this summary).
•Designed and developed robust, scalable data pipelines to support agentic systems, enabling complex interactions between autonomous AI agents and diverse data sources.
•Built and optimized ELT workflows to efficiently move and transform large volumes of structured and unstructured data from source systems to analytical environments for machine learning applications.
•Architected and maintained cloud-based data infrastructure, including databases, data lakes, and distributed storage systems to support AI/ML workflows and model training pipelines.
•Trained and fine-tuned large language models (LLMs) using domain-specific datasets, applying transfer learning and hyperparameter optimization techniques for performance gains.
•Integrated CDP data with downstream marketing automation platforms to enable dynamic audience segmentation and real-time campaign triggers.
•Collaborated with marketing and data science teams to operationalize audience insights derived from CDP analytics into omnichannel campaign delivery.
•Managed ingestion and transformation of behavioral, transactional, and demographic data from CRM, web analytics, and email systems into the CDP using ETL frameworks.
•Built custom connectors and REST API integrations to stream data from MarTech tools into centralized data lakes and CDPs for analytics and activation.
•Applied advanced statistical analysis and pattern recognition to unify and format data from multiple heterogeneous sources for analytics and decision-making.
•Foundational knowledge of R and Scala, with experience using R for basic statistical analysis and data visualization and Scala for functional programming and data processing with Apache Spark.
•Engineered data pipelines that supported real-time and batch processing for AI-driven applications using tools like Apache Spark, Airflow, and Kubernetes.
•Enhanced data retrieval and storage efficiency by applying partitioning, indexing, and caching strategies across distributed systems and cloud platforms.
•Skilled in data analysis, data modeling, and data visualization techniques, utilizing tools such as SQL, Python, and Tableau to extract insights and drive data-driven decision-making in healthcare settings.
•Developed and maintained Power BI dashboards and reports to visualize data, key metrics, and KPIs.
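Illustrative of the Delta Lake and Key Vault bullets above, a minimal PySpark sketch, not production code: it assumes a Databricks cluster (where dbutils is a builtin), a hypothetical Key Vault-backed secret scope named kv-scope, and placeholder table and storage names.

# Minimal Databricks/PySpark sketch: a secret from a Key Vault-backed scope,
# JDBC ingestion, and an append into Delta with additive schema evolution.
# The scope, table, and abfss path below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# dbutils is a Databricks builtin; the secret scope is backed by Azure Key Vault.
jdbc_url = dbutils.secrets.get(scope="kv-scope", key="sql-jdbc-url")

source_df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.transactions")  # hypothetical source table
    .load()
)

# mergeSchema limits evolution to additive columns, which keeps downstream
# Delta readers backward compatible.
(
    source_df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("abfss://lake@account.dfs.core.windows.net/curated/transactions")
)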
TECHNICAL SKILLS
Big Data/Hadoop Technologies
Hadoop, MapReduce, Pig, Hive, Apache Spark, Spark SQL, Informatica PowerCenter 9.6.1/8.x, Kafka, NoSQL, Elastic MapReduce (EMR), Hue, YARN, Apache NiFi, Impala, Sqoop, Solr, Oozie.
Languages
Java, Scala, SQL, UNIX shell script, JDBC, Python, Perl.
Cloud Environment
AWS, S3, CloudWatch, SageMaker, AWS Glue, Azure Cloud, Azure Blob, Azure Data Lake, Azure Data Factory, Azure Storage, Azure SQL, Azure Databricks.
Operating Systems
Windows (all versions), UNIX, Linux, macOS, Sun Solaris
Web Design Tools
HTML, CSS, JavaScript, JSP, jQuery, XML
Development Tools
Microsoft SQL Server Management Studio (SSMS), IntelliJ, Eclipse, NetBeans.
Databases
Oracle 10g, 11g, 12c, Microsoft SQL Server 2008/2010/2012, MySQL 4.x/5.x, DB2, Teradata, Netezza
NoSQL Databases
Cassandra, HBase, MongoDB
Development Methodologies
Agile/Scrum, UML, Design Patterns, Waterfall
Build Tools
Jenkins, Toad, SQL Loader, PostgreSQL, Talend, Maven, ANT, RTC, RSA, Control-M, Oozie, Hue, SOAP UI
Reporting Tools
MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS, Cognos, Power BI, Tableau
PROFESSIONAL EXPERIENCE
Client: Physician’s Mutual Insurance, Omaha, Nebraska Sep 2023 – Till Date
Role: Data Scientist - I
Responsibilities:
•Performed functional data analysis by collaborating with business partners and technical teams to gather and translate requirements into actionable solutions.
•Developed a scalable and secure Azure Data Lake to store and process large volumes of patient insurance data.
•Utilized Azure Data Lake Storage (ADLS) for hierarchical storage and optimized access patterns for analytics workloads.
•Implemented robust ETL workflows using Azure Data Factory, orchestrating data ingestion, transformation, and loading across multiple environments while ensuring data integrity and lineage.
•Engineered data solutions to support Bloomberg data acquisition, cleansing, and transformation, enabling real-time insights for trading analytics and portfolio risk assessment.
•Designed and implemented scalable ETL/ELT pipelines using Azure Data Factory to ingest and transform structured and semi-structured data from on-prem and cloud sources into BigQuery.
•Developed modular SQL-based data models using Dataform for building transformation layers in BigQuery with built-in data quality checks and dependency management.
•Built and optimized data pipelines in Azure Databricks using PySpark to process high-volume transactional data and load analytics-ready datasets into BigQuery.
•Designed and implemented HL7 FHIR-based APIs to enable secure, real-time exchange of patient health records across multiple healthcare systems.
•Integrated FHIR resources such as Patient, Encounter, Observation, and MedicationRequest into enterprise data warehouses for analytics and reporting.
•Developed FHIR-compliant data ingestion pipelines using Python, Azure Data Factory, and Apache NiFi to standardize and transform clinical data from EHR systems (a brief sketch follows this list).
•Mapped and transformed HL7 v2/v3 messages to FHIR resources to support interoperability between legacy hospital systems and modern cloud-based platforms.
•Worked with healthcare providers and payers to design FHIR-based solutions that meet ONC and CMS interoperability and patient access mandates.
•Monitored and optimized BigQuery performance through clustering, partitioning, and query tuning, resulting in reduced compute cost and latency.
•Collaborated with marketing and data science teams to operationalize audience insights derived from CDP analytics into omnichannel campaign delivery.
•Managed ingestion and transformation of behavioral, transactional, and demographic data from CRM, web analytics, and email systems into the CDP using ETL frameworks.
•Built custom connectors and REST API integrations to stream data from MarTech tools into centralized data lakes and CDPs for analytics and activation.
•Applied advanced statistical methods and pattern recognition techniques to normalize and integrate data from disparate sources, enhancing data consistency and usability across analytics platforms.
•Unified structured and unstructured datasets from various systems using statistical matching, clustering, and anomaly detection to enable accurate downstream reporting and machine learning.
•Developed custom data transformation pipelines to format and enrich heterogeneous data, enabling seamless ingestion into data lakes and analytical environments.
•Utilized Python libraries such as Pandas, NumPy, and SciPy for data wrangling, statistical analysis, and cleaning of multi-source data for business intelligence dashboards.
•Identified trends, correlations, and outliers within large datasets using regression analysis, time-series forecasting, and hypothesis testing to support strategic decision-making.
•Collaborated with data architects and analysts to build a standardized data model from diverse formats (CSV, XML, JSON, relational DBs) for unified business reporting.
•Validated system performance and usability through rigorous functional testing, ensuring a high-quality user experience.
•Translated analytical findings into clear and detailed functional requirements for system development and process enhancements.
•Wrote and optimized complex SQL queries, joins, and stored procedures to support data manipulation and reporting requirements.
•Designed and implemented data models and optimized queries for efficient data retrieval and analysis.
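A minimal sketch of the FHIR ingestion pattern described above: flattening a FHIR R4 Patient resource into an analytics-ready record. The resource field paths (id, name, birthDate, gender) follow the FHIR specification; the sample payload and output column names are hypothetical.

# Flatten a FHIR R4 Patient resource for warehouse loading (illustrative).
import json

def flatten_patient(resource: dict) -> dict:
    names = resource.get("name") or [{}]
    # Prefer the official name when present, else take the first entry.
    official = next((n for n in names if n.get("use") == "official"), names[0])
    return {
        "patient_id": resource.get("id"),
        "family_name": official.get("family"),
        "given_name": " ".join(official.get("given", [])),
        "birth_date": resource.get("birthDate"),
        "gender": resource.get("gender"),
    }

raw = json.dumps({
    "resourceType": "Patient", "id": "example-123", "gender": "female",
    "birthDate": "1980-04-01",
    "name": [{"use": "official", "family": "Doe", "given": ["Jane"]}],
})
print(flatten_patient(json.loads(raw)))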
Client: JPMorgan Chase, New York, NY Jun 2022 – Jul 2023
Role: Data Engineer - III
Responsibilities:
•Completed upgrades, data migrations, backups, and recovery tasks for databases.
•Worked with financial services data, including banking data, payments, trades, securities, ACH payments, credit cards, and customer information systems.
•Analyzed complex financial transactions and payment workflows, ensuring data accuracy and regulatory compliance across various banking operations.
•Designed and deployed a secure and scalable data lake architecture on AWS S3 to store transactional data, ensuring compliance with FINRA, GDPR, and PCI-DSS regulations.
•Developed and optimized ETL pipelines using AWS Glue to process high-volume financial data from diverse sources, improving data ingestion efficiency by 35%.
•Implemented real-time fraud detection using AWS Kinesis Streams and Lambda functions, reducing fraudulent transactions by 20% (a brief sketch follows this list).
•Developed and maintained scalable back-end services using Node.js and TypeScript, ensuring strong typing and maintainable code.
•Improved application performance by optimizing asynchronous operations and using Node.js event-driven architecture.
•Built end-to-end data pipelines using AWS Data Pipeline, Lambda, and Step Functions to automate data workflows for credit risk analysis and portfolio reporting.
•Integrated AWS Redshift with core banking systems to centralize financial data for regulatory reporting and advanced analytics.
•Leveraged AWS Athena to enable ad-hoc querying of financial data stored in S3, improving the speed of internal audits and financial reconciliations.
•Collaborated with cross-functional teams, including business analysts and database administrators, to optimize data structures and improve query performance.
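A minimal sketch of the Kinesis-to-Lambda fraud-check pattern noted above. Kinesis delivers record payloads base64-encoded; the threshold rule and field names here are hypothetical stand-ins for the production scoring logic.

# AWS Lambda handler consuming a Kinesis event (illustrative rule only).
import base64
import json

AMOUNT_THRESHOLD = 10_000  # hypothetical flag threshold

def handler(event, context):
    flagged = []
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Placeholder rule; production would call a trained scoring model.
        if payload.get("amount", 0) > AMOUNT_THRESHOLD:
            flagged.append(payload.get("transaction_id"))
    # Downstream alerting (SNS, DynamoDB) omitted from this sketch.
    return {"flagged_count": len(flagged), "flagged_ids": flagged}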
Client: Vasanth Software Solutions Oct 2018 – Feb 2022
Role: Data Modeler - I
Responsibilities:
•Maintained sizable collections of patient data in well-organized Excel spreadsheets.
•Created fact and dimension tables in the SQL database to store the data (see the star-schema sketch after this list).
•Examined the effects of the remote patient monitoring (RPM) system by connecting Power BI to the SQL database.
•Designed star and snowflake schemas and ETL data migration procedures.
•Analyzed data from RPM devices for patients with specific conditions such as diabetes and hypertension.
•Performed reporting and performance analysis on complex patient data.
•Performed data mapping, created comprehensive data dictionaries, and defined data structures to support data analysis and enhance data quality across diverse business systems and analytical initiatives.
•Examined SQL queries used to build tables, pivot tables, and list reports.
•Used SQL to perform source-to-target data mapping.
•Recognized patterns, trends, and anomalies in the patient data to spot potential medical problems.
•Examined patients' vital signs and medical histories to provide any necessary health warnings.
•Reviewed basic SQL queries and edited inner, left, and right joins in Power BI Desktop.
•Created dashboards and visualizations in Power BI to track the usage of the system by patients and evaluate the results.
•Built data encryption and de-identification mechanisms into each data processing workflow to maintain patient confidentiality as personal data moved between internal systems.
•Designed Power BI dashboards to examine system performance under various conditions, in addition to several other business intelligence dashboards and reports.
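A self-contained illustration of the fact/dimension modeling above, using Python's sqlite3 module so it runs anywhere; the table and column names are hypothetical examples of an RPM star schema.

# Star schema sketch: two dimensions and one fact table (hypothetical names).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_patient (
        patient_key INTEGER PRIMARY KEY,
        patient_id  TEXT,
        condition   TEXT           -- e.g. diabetes, hypertension
    );
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,
        full_date TEXT
    );
    CREATE TABLE fact_vitals (
        patient_key  INTEGER REFERENCES dim_patient(patient_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        systolic_bp  INTEGER,
        glucose_mgdl REAL
    );
    """
)
print("star schema created")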
Client: Enliven Infotech, Hyderabad Apr 2015 – Jul 2018
Role: Data Analyst
Responsibilities:
•Maintained sizable collections of patient data in well-organized Excel spreadsheets.
•Completed upgrades, data migrations, backups, and recovery tasks for databases.
•Created fact and dimension tables in the SQL database to store the data.
•Analyzed data from RPM devices for patients with specific conditions such as diabetes and hypertension.
•Performed reporting and performance analysis on complex patient data.
•Performed data mapping, created comprehensive data dictionaries, and defined data structures to support data analysis and enhance data quality across diverse business systems and analytical initiatives.
•Conducted ad-hoc analysis to support strategic decision-making processes.
•Identified opportunities for process improvement and optimization through data analysis.
•Ensured data accuracy and reliability by validating and cleansing datasets (see the sketch after this list).
•Examined SQL queries used to build tables, pivot tables, and list reports.
•Used SQL to perform source-to-target data mapping.
•Recognized patterns, trends, and anomalies in the data to spot potential problems.
•Examined patients' vital signs and medical histories to provide any necessary health warnings.
•Reviewed basic SQL queries and edited inner, left, and right joins in SSMS.
•Created dashboards and visualizations in Excel to track the usage of the system by patients and evaluate the results.
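A brief sketch of the validate-and-cleanse step mentioned above; the columns, sample rows, and plausibility range are hypothetical.

# Validate and cleanse a small patient dataset with pandas (illustrative).
import pandas as pd

df = pd.DataFrame({
    "patient_id": ["P1", "P2", "P2", None],
    "systolic_bp": [120, 310, 310, 118],  # 310 is out of range
})

cleaned = (
    df.dropna(subset=["patient_id"])      # reject rows missing the key
      .drop_duplicates()                  # remove exact duplicates
      .query("40 <= systolic_bp <= 250")  # plausibility range check
)
print(cleaned)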
EDUCATION
Master's in CSIT, Sacred Heart University, Fairfield, CT – US Mar 2022 — Aug 2023
Master's in Computer Science, JNTU-H, Hyderabad, India Dec 2011 — Dec 2013
Bachelor's in Computer Science Engineering, JNTU-H, Hyderabad, India Aug 2002 — May 2006
CERTIFICATIONS
•Oracle Cloud Infrastructure 2023 Certified AI Foundations Associate
•Build and Deploy APIs with a Serverless CI/CD – AWS
•DP-900 Azure Data Fundamentals – Microsoft