SAI VARUN BUCHUPALLI
Texas, USA 469-***-**** *****************@*****.***
SUMMARY
Data Engineer with over 3 years of experience building scalable data pipelines and cloud-native ETL solutions in the banking and AML domains. Skilled in PySpark, Informatica, Teradata, and cloud platforms including GCP, AWS, and Azure, with hands-on expertise processing 50M+ transactions weekly for real-time compliance monitoring. Proven ability to optimize SQL and PL/SQL workflows, integrate diverse data sources, and develop containerized solutions using Docker and Kubernetes. Adept at collaborating with cross-functional teams to deliver secure, high-performance, audit-compliant data infrastructure in Agile environments.
SKILLS
Programming & Scripting: Python, SQL, R, PySpark
Business Intelligence & Visualization: Power BI, Tableau, Looker, MS Excel, DAX
Cloud & Data Platforms: AWS (S3, EC2, Redshift, Lambda, IAM, Glue), GCP (BigQuery, Dataflow, Dataplex), Azure Data Lake, Snowflake, Teradata, HL7
Data Engineering & Automation: ETL, Apache Airflow, FastAPI, Streamlit, API Integration, Data Warehousing, XLR, Jenkins, Autosys
Statistical & Analytical Tools: NumPy, Pandas, Scikit-learn, SciPy, Matplotlib, Seaborn, Plotly, NLTK, A/B Testing, Regression, Forecasting
Project & Collaboration Tools: Jira, Confluence, Git, GitHub, Agile, Scrum, Waterfall, Lean, Six Sigma
WORK EXPERIENCE
Data Engineer & Analyst Mar 2023 - Present
Bank of America
• Architected and implemented enterprise-grade data pipelines using PySpark, Informatica, Teradata, and GCP/AWS for structured and unstructured banking datasets, ensuring high data integrity, integration, and distribution across business units.
• Led modernization initiatives to migrate legacy AML workflows into scalable big data frameworks, reducing processing latency by 30% and enabling real-time compliance monitoring.
• Developed sophisticated ETL workflows integrating diverse data formats (JSON, Parquet, relational DBs, cloud sources) into centralized repositories, improving downstream analytics readiness.
• Collaborated with stakeholders to translate AML and compliance requirements into optimized data models and SQL-based rules, boosting detection accuracy by 25% while lowering false positives.
• Designed modular orchestration frameworks leveraging layered schema logic, enabling process automation and reusability across multiple AML use cases.
• Built and maintained enterprise data management tools to curate critical customer and transaction datasets, enforcing governance, auditability, and compliance controls.
• Applied object-oriented programming (Python/Java) to develop transformation scripts, reusable components, and REST API integrations for automated data delivery pipelines.
• Utilized containerization (Docker, Kubernetes) for portable, cloud-ready ETL deployments, ensuring efficient scaling and resilience.
• Partnered with risk analytics and data science teams to integrate statistical models and scoring algorithms into production data pipelines.
• Worked in Agile/Scrum environments, contributing to sprint planning, backlog grooming, and iterative enhancements for data infrastructure.
• Ensured adherence to enterprise data management principles, security best practices, and regulatory requirements (AML, KYC, GDPR).
Data Engineer & Analyst July 2019 - June 2021
KPMG
• Developed SQL reports to analyze claims and financial transactions, improving fraud detection accuracy by 20% through anomaly identification.
• Built Python automation scripts for repetitive SQL tasks, significantly reducing manual workload and improving reporting efficiency.
• Conducted enterprise-wide analysis of operational and claims datasets to identify performance trends and root-cause issues.
• Partnered with data engineering teams to integrate structured and unstructured data (Google Analytics, HL7, API logs) into a centralized data warehouse, enhancing data availability and reportability.
• Utilized Python (Pandas, NumPy) for preprocessing and cleaning unstructured data to ensure seamless integration into ETL pipelines.
• Optimized complex SQL queries, reducing execution time by 40%, which improved fraud detection systems and real-time transaction tracking.
• Created interactive dashboards using Tableau, Power BI, and Looker to visualize cost optimization trends, patient outcomes, and KPIs.
• Applied statistical techniques (A/B Testing, Regression, Forecasting) to develop predictive models for patient readmissions and cost patterns.
• Enhanced data quality by 15% using GCP Dataplex for profiling and cleansing, improving overall reliability of analytics.
• Maintained detailed documentation of data sources, ETL processes, and dashboard specifications in Confluence to support data governance.
• Collaborated cross-functionally with clinicians, business analysts, and IT teams to define KPIs and reporting requirements.
• Supported Agile methodologies to ensure rapid iteration and delivery of business insights and reports.
• Leveraged tools like Git and GitHub for version control and documentation management in analytics projects.
EDUCATION
The University of Texas at Dallas - Dallas, Texas, USA
Master of Science Aug 2021 - May 2023
• Relevant Coursework: Business Analytics, Data Analytics
Amrita Vishwa Vidyapeetham - India
Bachelor of Technology Aug 2015 - May 2019