MIDHUN PUTCHAKAYALA
******.************@*****.***
https://www.linkedin.com/in/midhun-putchakayala/
Data Engineer with 17+ years of experience building and optimizing scalable data pipelines, ETL workflows, and cloud-based data lakehouse solutions.
Proficient in Dimensional Modeling, including Star and Snowflake schemas, and adept with ETL/ELT tools such as Databricks, Azure Data Factory, Informatica, and Snowflake for complex data integration.
Demonstrated capability in designing and managing Azure and Databricks Delta Lake platforms using medallion architecture and Delta Live Tables for optimized data workflows.
Extensive experience in Apache Spark and Python for processing large datasets and building scalable, high-performance data pipelines.
Collaborated with cross-functional teams to gather requirements, design solutions, and provide technical expertise on AWS and Azure cloud architecture and services.
Experienced in data quality assurance, data profiling, and mapping, and in implementing robust validation frameworks for seamless integration and high data integrity.
Practiced servant leadership to mentor and empower engineering teams, fostering professional growth and innovation.
Good understanding of Microsoft Fabric, data mesh, Delta Lake platforms, and medallion architecture.
Proficient in managing NoSQL databases such as MongoDB and Azure Cosmos DB.
Strong communicator with the ability to translate complex business requirements into actionable technical solutions and a well-defined sprint backlog.
Collaborative team player, experienced in engaging with cross-functional teams and stakeholders to align technical initiatives with organizational goals.
Skills:
Cloud Platforms: Microsoft Azure (Databricks, Data Factory, Synapse Analytics), Amazon Web Services (Glue, EMR, S3, etc.), Snowflake.
Data Modeling: Dimensional Modeling, Data Vault, Star and Snowflake Schemas
ETL/ELT Tools: dbt, Databricks, Azure Data Factory, Microsoft Fabric, Informatica PowerCenter/Cloud, SSIS, Delta Live Tables.
Programming Languages: Python, SQL, T-SQL, PL/SQL, SOQL.
Big Data Tools: Apache Spark, Hadoop, Delta Lake.
Visualization Tools: IBM Cognos Analytics, Tableau, Power BI.
Database Systems: SQL Server, DB2, Oracle, Netezza, Cosmos DB, MongoDB.
Other Tools: Kafka/Event Hub, Erwin Data Modeler, CI/CD, Terraform, YAML, Azure DevOps.
Architecture Expertise: Lakehouse Architecture, Medallion Architecture.
Academics & Training:
Bachelor of Technology (Electrical and Electronics Engineering), graduated with distinction in May 2007.
Data Visualization and Storytelling from Northeastern University - 2021.
Presented at an IBM Cognos Analytics conference - 2021.
Data + AI Summit three-day workshop - 2024.
Accomplishments:
Received Impact Award at Bangor Savings Bank for a project that helped the bank with a merger/acquisition.
Earned Star Award at Capgemini for delivering complex projects within tight timelines.
Sustained Edge Award for a major data warehouse integration at UnitedHealth Group (Optum).
Work History
Medical Solutions, Raleigh, NC, July 2022 to Jan 2025
Lead Data Engineer
Defined lakehouse architecture to support raw (unstructured), bronze (raw as SCD2), silver (light transformations), and gold (analytical) use cases (a Delta Live Tables sketch of this layering follows this list).
Designed and implemented scalable data pipelines using Databricks and dbt, reducing ETL runtime by 40%.
Developed and optimized bronze/silver/gold lakehouse layers, improving data quality and accessibility for analytics teams.
Led and mentored a team of 8 data engineers, providing data models and transformation definitions.
Created data models for cross-brand reporting on semi-structured data sourced from the data hub.
Optimized Azure Data Factory and Databricks pipelines, reducing cloud infrastructure costs by $2,000/month.
Improved query performance by 60% through indexing and partitioning strategies.
Developed and implemented in-house validation frameworks and environment cloning tools, saving the team over 20 hours per month.
Optimized Azure costs by improving Data Factory pipelines and ADLS storage through automation and process enhancements.
Designed scalable data pipelines using Azure Synapse, Azure Databricks, and Power BI for seamless data transformation and integration across diverse systems.
Established validation frameworks to address late-arriving dimensions, orphan surrogate keys, and other data quality issues (an orphan-key check is sketched after this list).
Designed and implemented a unified data fabric architecture, integrating on-premises and cloud platforms (Azure, AWS) to enable seamless data access, governance, and real-time analytics.
Championed automation and optimization strategies, streamlining data operations and reducing manual intervention using Azure Functions, PowerShell scripting, and CI/CD practices.
Provided technical guidance to stakeholders, aligning data solutions with strategic business objectives.
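A minimal sketch of the bronze/silver/gold layering described above, using the Databricks Delta Live Tables Python API; the source path, table names, and columns are illustrative rather than taken from the actual project:

```python
import dlt
from pyspark.sql import functions as F

# Bronze: incremental raw ingest with Auto Loader (path is illustrative).
@dlt.table(comment="Raw ingest, one row per source record")
def bronze_orders():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/raw/orders/"))

# Silver: light transformations plus a basic quality expectation.
@dlt.table(comment="Typed, deduplicated orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def silver_orders():
    return (dlt.read_stream("bronze_orders")
            .withColumn("order_ts", F.to_timestamp("order_ts"))
            .dropDuplicates(["order_id"]))

# Gold: analytical aggregate consumed by reporting.
@dlt.table(comment="Daily order counts for analytics")
def gold_daily_orders():
    return (dlt.read("silver_orders")
            .groupBy(F.to_date("order_ts").alias("order_date"))
            .count())
```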
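One illustration of the kind of check such a validation framework runs, here for orphan surrogate keys, written in PySpark; the fact/dimension table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical fact and dimension tables.
fact = spark.table("gold.fact_sales")
dim = spark.table("gold.dim_customer")

# Orphan surrogate keys: fact rows that reference no dimension row.
orphans = fact.join(dim, fact.customer_sk == dim.customer_sk, "left_anti")

# Late-arriving dimensions are commonly handled by mapping orphans to a
# placeholder "unknown" member (e.g., -1) until the real row lands.
if orphans.count() > 0:
    orphans.select("customer_sk").distinct().show()
    raise ValueError("Validation failed: orphan surrogate keys in fact_sales")
```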
Bangor Savings Bank, Bangor, ME, Sep 2017 to July 2022
Senior Data Modeler / Data Engineer
Built comprehensive data models, ETL packages, and Cognos Framework Packages to develop Scorecards for various leadership committees, encompassing KPIs for Loans, Deposits, Accounting, and IT departments.
Designed and implemented Dimensional Models, ETL processes, and BI packages for critical dashboards such as Workday DM, Wealth Management Group Performance, Business Loans Underwriting, Payroll Protection Program, and Customer Demographics.
Worked on a confidential project reporting directly to the CEO’s office, analyzing internal and public domain data to summarize market situations before a merger and evaluate market share.
Created a robust Party Model for internal CRM, identifying and resolving data quality issues during initial and ongoing data loads, establishing party merging processes, and integrating customer relationships and interactions into model tables.
Developed data pipelines to flatten unstructured JSON data delivered via APIs from external vendors such as Paywith, Marqeta, and Moody's (a flattening sketch follows this list).
Conducted data analysis to define parameters for reassigning 400,000+ customers to 50+ branches, collaborating with various business departments to estimate true branch values.
Leveraged Python and the Google Distance Matrix API to calculate customer proximity to branch locations, identifying the nearest branch for optimal service (a proximity sketch follows this list).
Built a data lake for lending cloud initiatives, creating a business conformance layer for reports like underwriter productivity and average processing times.
Created BUOY Local/Prepaid Cards reports and dashboards in Cognos Analytics, encompassing data modeling, ETL, and dataset creation.
Conducted data profiling and analysis to identify and resolve data quality issues using advanced SQL techniques.
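A minimal sketch of the JSON-flattening pattern in PySpark; the payload shape and field names are invented for illustration, not the vendors' actual schemas:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Nested vendor payload landed as JSON (path and shape are invented).
raw = spark.read.json("/landing/vendor/transactions.json")

# explode() emits one row per element of a nested array, and dotted
# paths pull scalar fields out of nested structs.
flat = (raw
        .select(F.col("account.id").alias("account_id"),
                F.explode("transactions").alias("txn"))
        .select("account_id",
                F.col("txn.amount").alias("amount"),
                F.col("txn.posted_date").alias("posted_date")))
flat.show()
```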
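A minimal sketch of the proximity lookup against the public Google Distance Matrix endpoint; the function name, addresses, and key handling are placeholders:

```python
import requests

# Public Distance Matrix endpoint; the API key is a placeholder.
URL = "https://maps.googleapis.com/maps/api/distancematrix/json"

def nearest_branch(customer_addr, branch_addrs, api_key):
    """Return the (branch, meters) pair with the shortest driving distance."""
    resp = requests.get(URL, params={
        "origins": customer_addr,
        "destinations": "|".join(branch_addrs),  # pipe-separated per the API
        "units": "imperial",
        "key": api_key,
    })
    resp.raise_for_status()
    elements = resp.json()["rows"][0]["elements"]
    distances = [(b, e["distance"]["value"])  # value is in meters
                 for b, e in zip(branch_addrs, elements)
                 if e["status"] == "OK"]
    return min(distances, key=lambda t: t[1])

# Example with placeholder addresses:
# nearest_branch("1 Main St, Bangor, ME", ["2 Oak St, Brewer, ME"], API_KEY)
```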
IDEXX Laboratories (Employer: Tabner Inc.), Westbrook, ME, Mar 2016 to Sep 2017
Data Architect / ETL Lead
Informatica Intelligent Cloud Services (IICS), Salesforce, SQL, Relational Junction, Informatica PowerCenter, Data Architecture, REST APIs, Master Data Management, Data Modeling, Git, CI/CD, Metadata & Data Lineage, Data Quality, SOQL, Operational Data Store (ODS), JSON, Streaming Data, Data Profiling.
Zurich Insurance, UK (Employer: Capgemini), Hyderabad, India, Nov 2014 to Jan 2016
ETL Lead / Architect
Guidewire, SQL, PL/SQL, IBM IIW (Insurance Information Warehouse), Data Vault modeling, Informatica PowerCenter, SSIS, Test Data Management, Team Leadership, Offshore delivery management, XML.
Optum (formerly United Health Care Information Services), Mar 2010 to Nov 2014
ETL Lead
Team Leadership, Informatica PowerCenter, SQL & T-SQL, DB2, Netezza, MicroStrategy, SSIS, Dimensional Modeling, Erwin.
Mahindra Satyam (Client: GE Energy), June 2007 to Mar 2010
ETL Developer
Informatica PowerCenter, Oracle, Cognos, Business Objects, SQL, PL/SQL.