
Data Engineer Power BI

Location:
Cumming, GA
Salary:
60000
Posted:
October 15, 2025


Resume:

Snehith Reddy Banala

Data Engineer

Cumming, GA, USA 551-***-**** ***************@*****.*** www.linkedin.com/in/snehithreddybanala/

SUMMARY

Data Engineer with 3+ years of experience building scalable data pipelines, cloud data warehouses, and real-time analytics across AWS, Azure, Snowflake, and Databricks. Proficient in Python, SQL, and R, with expertise in PySpark, Scikit-learn, TensorFlow, Pandas, and NumPy for data modeling and transformation. Skilled in managing batch and streaming workflows using Apache Kafka, Airflow, NiFi, Docker, and Terraform in Agile environments. Experienced in ETL/ELT development using SSIS, Informatica, and Dataiku, ensuring high data quality, governance, and compliance. Adept at delivering insights through Tableau, Power BI, and Excel, translating complex data into actionable business intelligence.

EDUCATION

Master of Science in Computer Science

- Auburn University at Montgomery (AUM), Montgomery, Alabama, USA – May 2025

Bachelor of Technology in Computer Science and Engineering

- Institute of Aeronautical Engineering, Hyderabad, India – July 2022

SKILLS

Methodologies: SDLC, Agile, Waterfall

Programming Languages: Python, SQL, R

Packages: NumPy, Pandas, SciPy, Scikit-learn, TensorFlow, PySpark
Visualization Tools: Tableau, Power BI, Advanced Excel (Pivot Tables, VLOOKUP), Matplotlib, Plotly, Seaborn
IDEs: Visual Studio Code, PyCharm, Jupyter Notebook
Cloud Platforms: Amazon Web Services (AWS), Azure Databricks, EC2, S3, Glue, Redshift, Athena, AWS Lambda, Snowflake, DynamoDB, Kinesis, SQS, SNS
Databases: MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, DynamoDB
Big Data Technologies: Apache Airflow, Apache Spark, Apache Hadoop, Apache Kafka, ETL/ELT, HDFS, Hive, NiFi
Other Technical Skills: SSIS, SSRS, SSAS, Docker, Kubernetes, Jenkins, Terraform, Informatica, Talend, Dataiku, Google BigQuery, Data Quality and Governance, Machine Learning Algorithms, Big Data, Advanced Analytics, Statistical Methods, Data Mining, Data Warehousing, Git, GitHub
Operating Systems: Windows, Linux, Mac

WORK EXPERIENCE

Data Engineer McKesson Corporation, GA, USA August 2024 – Present

Built real-time data pipelines using Azure Data Factory, Blob Storage, and Cosmos DB Change Feed to integrate EHR, billing, and IoT feeds—reducing ingestion delays by 30% and enabling faster reporting for hospital operations and clinical decision-making.

Developed machine learning workflows in Python, applying Pandas, NumPy, and Scikit-learn for feature engineering, model training, and evaluation to forecast patient readmission risk; contributed 30% of the solution, improving early intervention across chronic care programs. (A minimal sketch of this kind of workflow appears after this role's bullets.)

Designed interactive dashboards using Tableau and Plotly, integrating real-time data via Apache Kafka, and owned 35% of development while collaborating with clinicians and analysts to tailor views for actionable patient insights.

Created secure ETL pipelines using SSIS, Informatica, and SQL Server, migrating legacy healthcare data into Azure Synapse Analytics while ensuring HIPAA compliance and contributed to 40% of the architecture in collaboration with compliance and engineering teams.

Automated batch processing workflows using Azure Durable Functions, Logic Apps, and Service Bus, orchestrating data transfer from clinics and vendors, and led 25% of pipeline automation efforts while aligning with infrastructure standards and alerting protocols.
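
A minimal sketch of the readmission-risk workflow referenced above, assuming a hypothetical encounters.csv extract and illustrative column names rather than the actual EHR schema:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical extract; real features would come from the EHR and billing feeds.
df = pd.read_csv("encounters.csv")
features = ["age", "prior_admissions", "length_of_stay", "chronic_condition_count"]
X, y = df[features], df["readmitted_within_30d"]

# Stratified split so the rarer readmission class is represented in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))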

Data Engineer Zensar Technologies, India June 2022 – July 2023

Designed and automated end-to-end data workflows using Python (Pandas, NumPy) and SQL, enabling faster extraction, transformation, and loading (ETL) of large datasets across business units. Reduced manual effort by 40% and improved overall data processing efficiency.

Developed cloud-based data storage and integration solutions using AWS S3, AWS Glue, and Lambda, ensuring secure, scalable, and cost-effective data pipeline deployments. Streamlined data movement from on-prem to cloud, improving availability for analytics teams. (A minimal sketch of this pattern appears after this role's bullets.)

Implemented CI/CD pipelines using Jenkins, Git, and Docker to automate the deployment of data pipelines and analytics services. Enabled smoother version control, reduced deployment errors by 60%, and improved team productivity in Agile sprints.

Engineered business dashboards in Power BI and integrated backend datasets using SQL Server and REST APIs, delivering actionable KPIs for marketing and operations teams. Helped leadership track performance trends, customer behavior, and conversion metrics.

Collaborated with Data Scientists to implement ML models using Scikit-learn and XGBoost, validating data integrity and improving model prediction accuracy. Conducted EDA and feature engineering, enabling targeted strategy recommendations for client growth.
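
A minimal sketch of the S3-to-Glue pattern from the cloud integration bullet above; the boto3 calls (upload_file, start_job_run) are standard, while the bucket, key, and job names are hypothetical:

import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Hypothetical bucket, key, and Glue job name for illustration.
s3.upload_file("daily_extract.csv", "company-raw-zone", "sales/daily_extract.csv")
run = glue.start_job_run(
    JobName="transform-daily-extract",
    Arguments={"--input_key": "sales/daily_extract.csv"},
)
print("Started Glue run:", run["JobRunId"])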

Data Engineer Cognizant, India March 2021 – May 2022

Developed robust data pipelines using Apache NiFi and PostgreSQL to ingest and transform large volumes of transaction and loan data from core banking systems, reducing data processing latency by 45% across reporting layers.

Built batch-processing ETL jobs using Apache Spark and Hadoop (HDFS) for high-volume credit scoring and account risk modeling workflows, enabling scalable compute across distributed clusters. (A minimal sketch of this pattern appears after this role's bullets.)

Designed and implemented data warehouse schemas using SQL Server and SSIS, streamlining reporting for regulatory audits (SOX, IFRS9) and improving query performance for finance BI users.

Automated deployment and containerization of data ingestion services using Docker and integrated CI/CD pipelines via Jenkins, reducing release cycle times by 50% in Agile sprint environments.

Integrated semi-structured KYC, CRM, and account feed data from MongoDB into structured relational models using Python (Pandas) and data wrangling workflows, improving availability for AML and compliance analytics.

Led reconciliation and data governance routines for financial master data, using Talend and SQL validations to ensure completeness and accuracy across P&L, balance sheet, and GL datasets.

Collaborated with risk analysts to implement data mining and statistical profiling methods in R, identifying suspicious account behaviors and contributing to early fraud detection across customer portfolios.
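
A minimal sketch of the Spark/HDFS batch pattern from the credit scoring bullet above, assuming hypothetical HDFS paths and column names:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("credit-scoring-batch").getOrCreate()

# Hypothetical input path and schema; the real job read core banking extracts.
txns = spark.read.parquet("hdfs:///data/core_banking/transactions/")

# Aggregate per-account features for downstream risk models.
features = (txns.groupBy("account_id")
                .agg(F.count("*").alias("txn_count"),
                     F.sum("amount").alias("total_amount"),
                     F.max("days_past_due").alias("max_dpd")))

features.write.mode("overwrite").parquet("hdfs:///data/risk/account_features/")

PROJECTS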

Real-Time Fraud Analytics Platform – PySpark, Apache Kafka, Apache NiFi, Apache Airflow, PostgreSQL, Tableau, Informatica, Git

Built ingestion and ETL pipelines using Apache Kafka, NiFi, and Airflow to process 60M+ daily transactions from mobile and card systems. Transformed data using PySpark and stored it in PostgreSQL with CDC-based tracking. Created fraud alerting dashboards in Tableau and implemented data quality rules in Informatica. Enabled rule-based flagging for high-risk behavior with dynamic thresholds.
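
A minimal sketch of the Kafka-to-PySpark streaming leg of this project, assuming a hypothetical topic name and message schema and that the spark-sql-kafka connector is on the classpath; it prints to the console, whereas a real pipeline would typically write to PostgreSQL via foreachBatch:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

# Hypothetical message schema for card transactions.
schema = (StructType()
          .add("txn_id", StringType())
          .add("account_id", StringType())
          .add("amount", DoubleType()))

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
       .option("subscribe", "card-transactions")          # hypothetical topic
       .load())

txns = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("t")).select("t.*")
flagged = txns.withColumn("high_risk", F.col("amount") > 10000)  # placeholder threshold

flagged.writeStream.format("console").outputMode("append").start().awaitTermination()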

Customer Churn & Usage Pattern Pipeline – Talend, SSIS, Apache Spark, MongoDB, Oracle, Snowflake, Power BI, Python, Pandas, Seaborn, Jenkins, GitHub

Designed robust ETL workflows using Talend and SSIS to process customer activity and call records from MongoDB and Oracle. Leveraged Apache Spark for distributed processing and stored cleaned datasets in Snowflake. Built predictive churn models using Python (Pandas, Seaborn) and automated visual reporting via Power BI. Ensured code versioning and pipeline integration using Jenkins and GitHub.
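
A minimal sketch of the Snowflake loading step, using the write_pandas helper from the Snowflake Python connector; the connection parameters, file, and table names are hypothetical:

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Hypothetical credentials and objects for illustration.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="...",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC")

df = pd.read_parquet("cleaned_churn_features.parquet")
success, nchunks, nrows, _ = write_pandas(conn, df, "CHURN_FEATURES")
print(f"Loaded {nrows} rows in {nchunks} chunks")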

Student Behavior Analytics & Learning Prediction – Python, SQL Server, MySQL, Scikit-learn, SSRS, Plotly, Matplotlib, Git, Dataiku, Data Mining

Built batch and streaming pipelines using SQL Server, MySQL, and Python to unify student activity, course progress, and assessments. Modeled learning outcomes using Scikit-learn and visualized trends using Plotly, Matplotlib, and SSRS. Applied data mining techniques to identify dropout patterns and built automated reporting pipelines. Managed data validation and schema control through Git-based CI/CD workflows.
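
A minimal sketch of unifying SQL Server and MySQL sources with pandas and SQLAlchemy, as in the pipeline above; connection strings, tables, and columns are hypothetical:

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection strings (pyodbc and PyMySQL drivers assumed installed).
mssql = create_engine("mssql+pyodbc://user:pw@lms-server/lms?driver=ODBC+Driver+17+for+SQL+Server")
mysql = create_engine("mysql+pymysql://user:pw@assess-host/assessments")

activity = pd.read_sql("SELECT student_id, course_id, minutes_active FROM activity", mssql)
scores = pd.read_sql("SELECT student_id, course_id, score FROM quiz_scores", mysql)

# Join the two feeds into one modeling table.
unified = activity.merge(scores, on=["student_id", "course_id"], how="left")
unified.to_csv("student_features.csv", index=False)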


