Navya Sri Dameruppula
Data Engineer
Email: ***********@*****.*** | 667-***-**** | github.com/dspnavyasri

Professional Summary:
● Highly skilled Data Engineer with 4+ years of experience in data mining, data cleansing, and data transformation on multi-terabyte distributed platforms.
● Experienced in data integration and in optimizing cloud computing performance with various tools and technologies, including Databricks, Azure, and data pipelines.
● Hands-on experience with CI/CD and orchestration using automation tools like Jenkins.
● Expert in data governance, data model design, predictive analytics, Extract-Transform-Load (ETL) jobs, serverless architecture, and data warehouse development.
● Extensive experience with programming tools such as Python, R, Spark, and SQL, and with full Software Development Life Cycle (SDLC) maintenance in Agile/Scrum/Waterfall environments.
● Strong hands-on experience handling large volumes of structured and unstructured data using cloud data warehouse and data lakehouse techniques.
● Led projects involving installing and configuring VDI (building CI/CD data pipelines on Databricks) on Azure Cloud.
● Worked in an OLTP/OLAP environment that includes production and development databases in SQL Server; extensive experience with big data platforms such as Apache Hadoop, Spark, Solr, Hive, Kafka, and Airflow.
● Experienced in writing Python Jupyter notebooks for data understanding, data cleaning, data visualization, and ML training using libraries such as Pandas, NumPy, Matplotlib, Seaborn, Plotly, Keras, scikit-learn, and NLTK.
● Deep understanding of machine learning algorithms such as linear and logistic regression, SVM, decision trees, random forest classifiers, and ensemble methods.
● Strong background in the healthcare domain.
● Performed predictive risk analysis on clinical data using Databricks notebooks and MLflow on distributed platforms, applying standard scaling, PCA, and agglomerative/K-means clustering to find patterns in patient data based on diagnoses, labs, immunizations, and screening tests.
● Automated a Python script to build a dashboard and surface trends in vitals and lab tests by clustering per ICD code.
● Built a state crime prediction solution using predictive modeling with various ML algorithms (e.g., Random Forest Regressor) and, based on performance and accuracy, built an ensemble model.
● Adept at web/API scraping and parsing XML and JSON documents with Python scripts, creating tables in the database from the results (a minimal parsing sketch follows this summary); also familiar with star/snowflake schemas for unstructured data.
● Hands-on experience with advanced Excel features such as pivot tables, LOOKUPs, and macros; built dashboards and insights in Tableau/Power BI for forecasting and clinical data analysis.
● Solid experience with and understanding of the Synapse/Azure big data platform, data ingestion, strong SQL, Azure Cosmos DB, and Databricks for development and services.
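A minimal sketch of the API-to-table workflow described above, in Python; the endpoint URL, field names, and table schema are hypothetical placeholders for illustration, not taken from an actual project.

    import json
    import sqlite3
    import urllib.request

    # Hypothetical API endpoint returning a JSON array of objects.
    URL = "https://example.com/api/records"

    with urllib.request.urlopen(URL) as resp:
        records = json.load(resp)

    # Land the parsed records in a simple staging table (SQLite here;
    # the same pattern applies to any SQL database).
    conn = sqlite3.connect("staging.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, name TEXT, value REAL)"
    )
    # Parameterized inserts handle quoting and avoid SQL injection.
    conn.executemany(
        "INSERT OR REPLACE INTO records (id, name, value) VALUES (?, ?, ?)",
        [(r.get("id"), r.get("name"), r.get("value")) for r in records],
    )
    conn.commit()
    conn.close()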
Technical Skills:
Languages: Python 3.x, R, SQL, T-SQL, Spark SQL, PySpark, Java
Databases: PostgreSQL, SQL Server, MSSQL, MySQL, SQLite, Snowflake, MongoDB
Methodologies: Agile, Scrum, and Waterfall
Operating Systems: Linux (any distro), Unix, Windows
Cloud Services: Azure, GCP
Big Data: Apache Hadoop, Spark, Kafka, Solr, Pig, Hive
Data Integration: ETL/ELT methods, Azure Data Factory, Erwin modeling, Airflow, Databricks
Statistical Methods: hypothesis testing, ANOVA, time series, confidence intervals, Bayes' law, Principal Component Analysis (PCA), dimensionality reduction, cross-validation
Business Intelligence and Predictive Models: regression analysis, Bayesian methods, decision trees, random forests, support vector machines, neural networks, K-means clustering, KNN, ensemble methods, natural language processing
Reporting Tools: Tableau 10.x/9.x/8.x (Desktop, Server, and Online), Microsoft Power BI, Smartsheet, Google Sheets, Google Data Studio, Microsoft Excel
Data Visualization: Tableau, Microsoft Power BI, Matplotlib, Seaborn, Plotly, Microsoft Excel
Machine Learning: regression, clustering, SVM, decision trees, classification, recommendation systems
Big Data Framework: Amazon EC2, S3, and EMR
ETL/Data Warehouse Tools: Web Intelligence, Talend, Informatica, Tableau
Data Modeling: star-schema modeling, snowflake-schema modeling, FACT and dimension tables, pivot tables
Web Frameworks: Flask, Django, Streamlit (no-code solution)

Professional Experience:
Regions Bank January 2023 – Present
Role: Data Science Engineer
Anomaly Detection
Description - Regions Bank offers secure transactions, and anomaly detection projects play a key role in identifying unusual patterns in banking transactions, improving fraud detection, and enhancing security measures. My role involves developing machine learning models to analyze transaction data, detect anomalies, and provide actionable insights to mitigate risks and optimize banking operations. [ELT MODEL]
Responsibilities:
● Built data pipelines using Azure Data Factory (ADF) to automate data flow from multiple sources, including TSYS, securely retrieving sensitive transaction/customer data in JSON format via APIs and storing it in Azure Data Lake Storage.
● Integrated Azure Data Lake Gen2 Parquet data, stored under timestamped paths, with Databricks, using Azure Key Vault for app-key management.
● Managed a data warehouse, implementing Change Data Capture (CDC) logic in Spark SQL and Slowly Changing Dimensions (SCD) to track and manage historical data changes, ensuring accurate and up-to-date information for reporting and analytics (see the SCD sketch after this list).
● Used Databricks Spark SQL to identify and resolve 19k duplicate records, tracing the root cause to batch-load data ingestion errors and ensuring accurate, non-redundant data loads for the business.
● Developed automated systems using Python to generate lists of non-compliant customers based on transaction history and other behavioral patterns, contributing to compliance and risk management.
● Conducted performance tuning on long-running Spark jobs using techniques such as memory and executor tuning, optimizing job execution for better resource utilization.
● Worked with SQL and Databricks to build and optimize reporting systems and dashboards in Power BI, providing actionable insights for clients.
● Developed machine learning models for anomaly detection, identifying unusual patterns in financial transactions, and providing insights to improve fraud detection and risk management.
● Proficient in querying data warehouses and data lakes using advanced SQL constructs such as window functions, procedures, and views to extract valuable insights for business optimization.
Environment: Azure, Databricks, Spark, SQL Server, Power BI, Python, Visual Studio Code, OLTP, APIs, JSON, Git
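A minimal sketch of the SCD Type 2 pattern referenced above, written as PySpark driving Spark SQL MERGE statements on Delta tables; the table and column names (dim_customer, stg_customer, attrs_hash, and so on) are hypothetical placeholders, not the bank's actual schema.

    # Assumes a Databricks notebook, where `spark` is a ready SparkSession
    # and both tables are Delta tables. All names are hypothetical.

    # Step 1: expire the current version of any row whose attributes changed.
    spark.sql("""
        MERGE INTO dim_customer AS tgt
        USING stg_customer AS src
          ON tgt.customer_id = src.customer_id AND tgt.is_current = true
        WHEN MATCHED AND tgt.attrs_hash <> src.attrs_hash THEN
          UPDATE SET is_current = false, end_date = src.load_date
    """)

    # Step 2: insert a fresh current version for changed and brand-new keys.
    # After step 1, changed keys no longer have a current row, so the anti
    # join picks them up along with keys never seen before.
    spark.sql("""
        INSERT INTO dim_customer
        SELECT src.customer_id, src.name, src.segment, src.attrs_hash,
               src.load_date AS start_date,
               CAST(NULL AS DATE) AS end_date,
               true AS is_current
        FROM stg_customer AS src
        LEFT ANTI JOIN dim_customer AS tgt
          ON tgt.customer_id = src.customer_id AND tgt.is_current = true
    """)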
Focus1 Insurance September 2022 – October 2023
Role: Data Scientist
Fraud Detection in Claims Processing
Description - Focus1 Insurance validates claims using machine learning models that analyze historical claims data, detect anomalies, and flag suspicious patterns such as duplicate claims, inflated amounts, and unusual claim frequency. The goal was to proactively detect fraud, reduce financial losses, and improve the efficiency of claims processing.
Responsibilities:
● Handled the creation of data model designs using a snowflake schema.
● Performed the ELT process using AWS Glue with underlying PySpark scripts to extract data from multiple sources (SFTP/API) and loaded it into S3 buckets, securing the data at rest.
● Integrated AWS S3 bucket with Databricks notebooks for an interactive and collaborative environment to develop and execute a comprehensive predictive claim fraud analysis framework.
● Applied advanced data preprocessing techniques, including standard scaling and PCA, to prepare the dataset for clustering analysis (a pipeline sketch follows this list).
● Conducted Agglomerative and K-means clustering to identify patterns and trends in patient data based on Diagnosis, Lab results, Immunizations, and Screening tests, enhancing the understanding of patient risk profiles.
● Automated the creation of a dashboard using Python scripts to visualize trends in vitals and lab tests by clustering for every ICD, facilitating data-driven decision-making for healthcare providers.
● Achieved significant improvements in the accuracy and efficiency of risk prediction models through iterative experimentation and model evaluation, contributing to more informed patient care.
● Built a Tableau dashboard based on the reports from the analytics framework.
Environment: AWS Glue, AWS S3, Databricks Notebooks, Spark SQL, Python, OLAP, Machine Learning (PCA, agglomerative and K-means clustering), Streamlit, Plotly, Power BI, Visual Studio Code, MS Outlook, MS Excel, VDI on Azure Cloud.
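A minimal sketch of the scaling-plus-PCA-plus-clustering pipeline described above, using scikit-learn; the input file, feature columns, and cluster count are illustrative assumptions, not the project's actual configuration.

    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    # Hypothetical patient-level feature table; the real features came from
    # diagnoses, lab results, immunizations, and screening tests.
    df = pd.read_parquet("patient_features.parquet")

    pipeline = Pipeline([
        ("scale", StandardScaler()),          # zero-mean, unit-variance features
        ("pca", PCA(n_components=0.95)),      # keep 95% of the variance
        ("kmeans", KMeans(n_clusters=5, n_init=10, random_state=42)),
    ])

    # fit_predict assigns each patient a cluster label; swapping the final
    # step for AgglomerativeClustering gives the hierarchical variant.
    df["cluster"] = pipeline.fit_predict(df)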
Customer Risk Profiling
Description - Focus1 Insurance personalizes insurance plans and pricing, integrating risk scores into the underwriting process to help underwriters make informed decisions and reduce potential claim losses.
Responsibilities:
● Built and managed data pipelines using AWS Glue to automate the flow of customer behavior data, including claim frequency, payment history, and policy changes, and stored the data in S3 for seamless access and analysis.
● Utilized Databricks as a data warehousing tool to perform data transformations, including data cleaning, aggregation, and feature engineering, ensuring high-quality data for predictive modeling.
● Developed and deployed machine learning models using algorithms such as Random Forest, Logistic Regression, and Gradient Boosting to predict customer churn and retention risk, accurately identifying at-risk customers.
● Integrated model outputs into actionable insights displayed on interactive dashboards in Tableau, enabling stakeholders to visualize churn risk and take proactive retention measures like targeted offers and policy adjustments.
● Implemented automated batch processing for continuous model retraining and live data updates, ensuring up-to-date predictions and optimal decision-making for customer retention strategies.
● Leveraged hyperparameter tuning and cross-validation techniques to fine-tune model performance, achieving high accuracy and reducing false positives in churn predictions (see the tuning sketch after this list).
Environment: AWS Glue, Amazon S3, Databricks, Power BI, Python, Scikit-learn, Random Forest, Logistic Regression, Gradient Boosting, Apache Spark, SQL, Databricks Notebooks.
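A minimal sketch of cross-validated hyperparameter tuning as described above, using scikit-learn's GridSearchCV over a random forest; the synthetic dataset and parameter grid are illustrative assumptions standing in for the real customer-behavior features and search space.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Synthetic stand-in for customer behavior features (claim frequency,
    # payment history, policy changes) with imbalanced churn labels.
    X, y = make_classification(n_samples=2000, n_features=12,
                               weights=[0.9], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )

    param_grid = {
        "n_estimators": [200, 500],
        "max_depth": [None, 10, 20],
        "min_samples_leaf": [1, 5],
    }

    # 5-fold cross-validation over the grid; scoring on precision
    # directly targets reducing false positives in churn predictions.
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid, cv=5, scoring="precision", n_jobs=-1,
    )
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)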
Acmatix Solutions Private Limited July 2020 – August 2022
Role: Data Engineer
Description - Acmatix Solutions Private Limited is a company dedicated to providing innovative data-driven solutions for the education sector. The company specializes in utilizing advanced technologies, including machine learning, data analysis, and business intelligence, to help educational institutions enhance student performance, optimize resources, and improve decision-making. By developing predictive frameworks, visualizing key metrics, and offering data-driven insights, Acmatix Solutions empowers educators and administrators to make informed decisions and support student success.
Responsibilities:
● Designed and developed various machine learning frameworks in Python and R to forecast student performance, achieving an accuracy rate of 84% and contributing to a robust predictive framework.
● Conducted data cleaning, preprocessing, and feature engineering to ensure data integrity, facilitating accurate analysis and insights generation.
● Used Agile methodology to develop projects when working in a team.
● Developed and maintained SQL scripts and stored procedures to automate repetitive data analysis tasks.
● Cleaned and transformed data using pandas and automated data manipulation routines.
● Built dashboards in Power BI and Tableau to visualize core student KPIs and help students identify their potential.
● Created strong storytelling dashboards for use cases such as student performance and subject analysis.
● Developed Tableau workbooks from multiple data sources using data blending and interactive views, trends, and drilldowns.
● Implemented techniques such as forward selection, backward elimination, and the stepwise approach to select the most significant independent variables (a selection sketch follows this list).
● Performed feature selection and dimensionality reduction to identify significant variables.
● Participated in requirements gathering and source data analysis, and identified student data for migration and for developing the data warehouse.
● Performed data analysis and maintenance on information stored in MySQL.
● Conducted data analysis including acquisition, cleansing, transformation, modeling, visualization, documentation, and presentation of results.
● Created complex Excel formulas and functions to perform data transformations, calculations, and analysis.
Environment: Python, R, SQL, Tableau, Power BI, Data Lake, MS Outlook, Power Automate-MS Flow, MS SharePoint, MS PowerPoint, MS Word, MS Excel, OLAP
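A minimal sketch of the forward/backward selection described above, using scikit-learn's SequentialFeatureSelector on synthetic data; the dataset and feature counts are illustrative assumptions, as the real project drew on institutional student data.

    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    # Synthetic stand-in for student performance data (attendance,
    # grades, and similar features in the real project).
    X, y = make_regression(n_samples=500, n_features=15, n_informative=5,
                           noise=10.0, random_state=42)

    # Forward selection greedily adds the feature that most improves the
    # cross-validated score; direction="backward" gives backward elimination.
    selector = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=5,
        direction="forward", cv=5,
    )
    selector.fit(X, y)
    print("Selected feature indices:", selector.get_support(indices=True))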
Certifications:
● Academy Accreditation - Generative AI Fundamentals, Databricks
● Academy Accreditation - Databricks Lakehouse Fundamentals
● Introduction to Generative AI, Google
● Data Analysis with Python: Zero to Pandas, Jovian
● Python & SQL Certificate, Hackerrank