Data Engineer - ETL, Cloud Data Warehousing, Dashboards

Location:

Missouri City, MO

Salary:

65000

Posted:

December 08, 2025


Resume:

MANIKANTA PUDOKA

Location: MO Email: ***************@*****.*** Phone: +1-314-***-**** GitHub LinkedIn

PROFESSIONAL SUMMARY

Results-driven Data Engineer with 3+ years of experience designing, building, and optimizing data pipelines, ETL workflows, and analytics ecosystems across cloud platforms. Skilled in Python, SQL, and Apache Airflow for data ingestion, transformation, and automation. Strong background in data warehousing (Snowflake, Redshift, BigQuery) and data modeling (Star and Snowflake Schema) to power BI, reporting, and machine learning pipelines. Proficient in developing interactive dashboards using Power BI, Tableau, and Looker to translate complex data into actionable insights. Hands-on experience with AWS, Azure, and GCP, ensuring scalable, reliable, and secure cloud data infrastructure. Adept in data validation, pipeline optimization, and cross-functional collaboration, driving data-driven decision-making across engineering, research, and business teams.

SKILLS

● Data Engineering: SQL (PostgreSQL, Redshift, BigQuery), Python (pandas, pyarrow), ETL & orchestration (Apache Airflow, Azure Data Factory, dbt), Data Warehousing & Modeling (Snowflake, Star/Snowflake Schema), PySpark, Object Storage (AWS S3, GCS), API Integration, Automation Scripts

● Data Analysis: Exploratory Data Analysis (EDA), Advanced Excel (Pivot Tables, Formulas), SQL Analytics & Window Functions, Dashboarding & Reporting (Power BI, Tableau, Looker), KPI Design, Trend & Root-Cause Analysis, A/B Testing, Hypothesis Testing, Stakeholder Storytelling

● Data Science: pandas, NumPy, scikit-learn, Jupyter, Feature Engineering, Cross-Validation, Regression & Classification Models, Model Evaluation (Precision, Recall, ROC, RMSE), Basic ML Pipelines

● Cloud & DevOps: AWS (S3, Lambda), GCP, Azure (Fundamentals), Git, Docker (Basic), Bash, VS Code

● Databases & Storage: PostgreSQL, MySQL, Snowflake, Redshift, BigQuery, AWS S3, GCS

● Visualization & ML Libraries: Power BI, Tableau, Matplotlib, Seaborn, scikit-learn, MLflow (Basic)

PROFESSIONAL EXPERIENCE

McKinsey & Company, USA Data Engineer USA Sep 2025 – Present

● Orchestrated multi-source ETL pipelines using Apache Airflow, enhancing data reliability and cutting daily job failures from 12 to under 3, which improved end-to-end data delivery SLAs for client analytics projects.

● Refactored analytical data models in Snowflake, reducing query latency from 90s to 28s and boosting performance for ad-hoc SQL analytics and executive dashboards across cross-functional consulting teams.

● Engineered reusable transformation scripts in Python (pandas, pyarrow) to process 8M+ transactional records monthly, ensuring data quality, lineage, and validation for client engagement databases.

● Deployed automated ingestion pipelines into AWS S3 via API connectors, scaling centralized data lake storage by 1.5 TB and improving data accessibility for downstream analytics and machine learning models.

● Designed interactive KPI dashboards in Tableau, integrating SQL window functions to automate performance tracking for 20+ senior stakeholders, strengthening data-driven decision-making in strategy reviews.

Saint Louis University Graduate Research Assistant - Health Data Science USA Jan 2025 – May 2025

● Engineered automated ETL pipelines using Apache Airflow to clean, preprocess, and integrate ~1.2M electronic health records (EHRs), shortening monthly refresh cycles by 10 hours and improving data reproducibility across multi-source clinical datasets.

● Queried structured hospital data with SQL (Postgres) to perform statistical and trend analysis on 12,000+ patient encounters, identifying factors linked to extended recovery periods and supporting ongoing outcome-based healthcare studies.

● Developed predictive classification models using scikit-learn with feature engineering and model evaluation (ROC/AUC, precision, recall), raising readmission prediction accuracy from 0.74 to 0.81, directly informing patient risk stratification research.

● Designed interactive Power BI dashboards visualizing population health trends, treatment outcomes, and KPIs, providing 4 faculty research teams with real-time access to insights for evidence-based discussions and publication data preparation.

● Structured and standardized clinical datasets in Snowflake using Star schema data modeling, reducing average query time from 18s to 11s and enhancing data accessibility for cross-department collaborations in health data science projects.

Tata Consultancy Services (TCS) Data Engineer INDIA Jul 2021 – Dec 2022

● Architected production-grade ETL pipelines in Apache Airflow, orchestrating ingestion of ~8M daily records and improving data delivery timelines by 15% for national digital service platforms.

● Revamped complex PostgreSQL queries and indexing strategies, reducing query execution time by 35% while supporting real-time analytics for financial inclusion dashboards.

● Orchestrated distributed data processing with PySpark, automating transformation and quality validation that eliminated 18+ manual hours per week and improved data integrity for operations reporting.

● Designed modular data marts in dbt, enforcing schema consistency, lineage documentation, and test automation across five concurrent transformation projects, enhancing governance standards.

● Deployed dynamic Power BI dashboards, translating high-volume transactional data into actionable KPIs that enabled regional administrators to cut service turnaround time by 10%.

Infosys Data Science, Intern INDIA May 2021 – Jul 2021

● Developed and fine-tuned regression and classification models using Python (scikit-learn), enhancing data-driven forecasting accuracy by 8% and enabling better revenue trend predictions for retail analytics teams.

● Orchestrated automated ETL pipelines through Apache Airflow, streamlining data ingestion from Snowflake and APIs, saving 12+ hours weekly in manual maintenance, and improving overall data reliability.

● Executed end-to-end Exploratory Data Analysis (EDA) with pandas and NumPy, performing data cleaning, preprocessing, and outlier detection that uncovered actionable customer behavior insights.

● Visualized operational and financial KPIs using Power BI, improving dashboard refresh cycles by 65% and helping stakeholders monitor churn, profitability, and performance trends in real time.

● Assessed ML model performance in Jupyter Notebook using metrics like ROC-AUC, Precision, and RMSE, ensuring statistical robustness and model readiness for deployment across multiple datasets.

Adani Data Analyst INDIA May 2020 – May 2021

● Optimized large-scale logistics data pipelines using SQL (PostgreSQL), reducing vessel turnaround time by 6 hours and improving operational efficiency across 4 major ports.

● Visualized renewable energy performance trends in Power BI, enhancing KPI transparency and cutting executive reporting cycles by 20%.

● Streamlined cross-system data ingestion with Apache Airflow, consolidating SCADA and ERP feeds and reducing manual workload by 10–12 hours weekly.

● Interpreted solar plant performance data using Python (pandas) and EDA techniques, detecting 3 recurring fault patterns that guided predictive maintenance planning.

● Engineered and maintained Snowflake data models (Star Schema) for financial analytics, boosting query speed by 35% and accelerating revenue forecasting for the energy division.

EDUCATION

Saint Louis University USA

Master of Science in Health Data Science Jan 2023 – May 2025
