
Data Engineer Azure

Location:
Jersey City, NJ
Posted:
July 03, 2025


Resume:

MANOJ PALADI (He/Him/His)

Jersey City, NJ | 716-***-**** | *************@*****.*** | linkedin.com/in/manojpaladi/ | github.com/Manoj5049/

Professional Summary

Data Science professional with 3.5+ years of experience in data warehousing, ETL automation, data management, AI/ML, and lakehouse architecture on cloud and big data platforms. Skilled in delivering scalable solutions aligned with functional and business requirements across diverse tech stacks.

Work Experience

HSBC, United States Data Engineer Sep 2024 – Present

• Developed scalable ETL pipelines using Databricks notebooks and Azure Data Factory to ingest, transform, and load terabytes of structured and unstructured data from multiple sources and external APIs into Snowflake, designing a layered architecture.

• Engineered parameterized Databricks notebooks to ingest and transform financial datasets into Delta Lake tables on Azure Data Lake, streamlining 15 recurring workflows and enabling downstream analytics for treasury and risk teams.
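
A minimal sketch of a parameterized ingestion notebook of this kind (widget names, paths, and the source format are illustrative assumptions; dbutils and spark are globals provided by the Databricks notebook runtime):

    # Databricks notebook: parameters arrive via widgets (dbutils is runtime-provided).
    dbutils.widgets.text("source_path", "")
    dbutils.widgets.text("target_table", "")
    source_path = dbutils.widgets.get("source_path")
    target_table = dbutils.widgets.get("target_table")

    # Read the raw extract (CSV here; real sources vary).
    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv(source_path))

    # Append into a Delta Lake table backed by Azure Data Lake storage.
    (raw.write
        .format("delta")
        .mode("append")
        .saveAsTable(target_table))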

• Tuned Spark job performance by optimizing joins, caching, and partitioning strategies, reducing runtime by 45% across high-volume pipelines.
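
A minimal PySpark sketch of the join, caching, and partitioning tactics this bullet refers to (paths, table names, and the partition count are hypothetical, and Delta support is assumed from the Databricks runtime):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("pipeline-tuning").getOrCreate()

    txns = spark.read.format("delta").load("/mnt/lake/curated/transactions")  # placeholder path
    accounts = spark.read.format("delta").load("/mnt/lake/curated/accounts")  # placeholder path

    # Broadcast the small dimension table to avoid a shuffle-heavy sort-merge join.
    joined = txns.join(F.broadcast(accounts), "account_id")

    # Repartition on the aggregation key so downstream stages stay balanced.
    joined = joined.repartition(200, "account_id")

    # Cache only because the DataFrame is reused by multiple actions below.
    joined.cache()

    daily = joined.groupBy("account_id", "txn_date").agg(F.sum("amount").alias("total"))
    daily.write.format("delta").mode("overwrite").save("/mnt/lake/gold/daily_totals")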

• Automated anomaly detection and data validation checks, decreasing bad-input rates to ML models by 70% and cutting manual QA effort in half.

• Developed custom lineage tracking and data auditing layers for curated datasets using Python and SQL in Azure Data Lake and Databricks, enhancing visibility and regulatory compliance for 100+ datasets across finance and operations teams.

• Resolved 8 critical production pipeline failures by debugging transformation logic and performing root-cause analysis using job logs and execution UI, restoring full pipeline functionality within 1 hour and ensuring 24/7 SLA compliance.

• Implemented row-level access, field encryption, and audit logging in Snowflake to meet internal security policies and compliance standards.

• Built CI/CD-enabled DevOps pipelines using Azure Functions, with integrated Pytest-based unit testing, Azure Monitor logging, exception monitoring, and technical documentation supporting over 10 workflows for traceability and reusability.

AXIS MY INDIA, INDIA Data Engineer Sep 2021 – Jan 2023

• Migrated over 20 TB of data from on-prem systems to Snowflake by designing robust ETL pipelines using PySpark in Databricks and Azure Data Factory, employing batch processing techniques and schema optimization, reducing retrieval time from 2 days to 15 minutes.

• Engineered automated ETL pipelines integrating AWS S3, Redshift, REST APIs, and flat files (JSON, CSV) to ingest campaign data into the data warehouse, achieving 99% pipeline uptime and reducing insight delivery time by 40%.

• Monitored PySpark-based pipelines via Azure Monitor and Databricks Jobs, implementing automated alerts, data quality checks, and error-handling scripts, reducing failure rates by 50% and ensuring real-time resolution.

• Developed Python-based data validation scripts for schema drift detection, null profiling, and constraint checks; embedded within workflows, reducing downstream reporting errors by 35%.
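
A minimal pandas sketch of schema-drift detection, null profiling, and constraint checks of the kind described (the expected schema and the 5% threshold are illustrative assumptions):

    import pandas as pd

    EXPECTED_SCHEMA = {"respondent_id": "int64", "region": "object", "score": "float64"}  # assumed contract
    NULL_THRESHOLD = 0.05  # flag columns with more than 5% nulls (illustrative)

    def validate(df: pd.DataFrame) -> list[str]:
        issues = []
        # Schema drift: missing, unexpected, or retyped columns.
        for col, dtype in EXPECTED_SCHEMA.items():
            if col not in df.columns:
                issues.append(f"missing column: {col}")
            elif str(df[col].dtype) != dtype:
                issues.append(f"type drift on {col}: {df[col].dtype} != {dtype}")
        for col in set(df.columns) - EXPECTED_SCHEMA.keys():
            issues.append(f"unexpected column: {col}")
        # Null profiling on every column.
        for col, rate in df.isna().mean().items():
            if rate > NULL_THRESHOLD:
                issues.append(f"null rate {rate:.1%} on {col}")
        return issues

A workflow would call validate() on each incoming batch and halt or quarantine the load when the returned list is non-empty.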

• Modeled survey data in Snowflake, designing fact and dimension tables that improved query performance by 40% and streamlined reporting.

• Developed automated web scraping pipelines using Python (BeautifulSoup, Requests) to extract real-time public data from news portals and state government websites for market sentiment analysis, enriching primary datasets for election studies and brand tracking.
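
A minimal sketch of such a scraping pipeline (the URL and CSS selector are placeholders; real targets need per-site selectors plus rate limiting and terms-of-use checks):

    import requests
    from bs4 import BeautifulSoup

    def scrape_headlines(url: str) -> list[dict]:
        resp = requests.get(url, headers={"User-Agent": "research-bot/0.1"}, timeout=30)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # "h2 a" is a placeholder selector; each site needs its own.
        return [{"title": a.get_text(strip=True), "link": a.get("href")}
                for a in soup.select("h2 a")]

    if __name__ == "__main__":
        for item in scrape_headlines("https://example.com/news"):  # placeholder URL
            print(item)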

• Implemented SQL procedures along with Snowflake tasks and streams to automate change data capture (CDC), data validation, deduplication, and transformation, enhancing data governance and quality by reducing inconsistencies in business reporting by 40%.
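
A minimal sketch of the stream-plus-task CDC pattern, issued here through the Snowflake Python connector (connection parameters, table and column names, and the schedule are assumptions; delete handling via METADATA$ACTION is omitted for brevity):

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="...",  # placeholder credentials
        warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
    )
    cur = conn.cursor()

    # Stream records inserts/updates/deletes on the raw table.
    cur.execute("CREATE OR REPLACE STREAM RAW_SURVEYS_STREAM ON TABLE RAW_SURVEYS")

    # Task merges captured changes into the curated table on a schedule.
    cur.execute("""
        CREATE OR REPLACE TASK MERGE_SURVEYS
          WAREHOUSE = ETL_WH
          SCHEDULE = '15 MINUTE'
          WHEN SYSTEM$STREAM_HAS_DATA('RAW_SURVEYS_STREAM')
        AS
        MERGE INTO CURATED_SURVEYS t
        USING RAW_SURVEYS_STREAM s ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET t.score = s.score
        WHEN NOT MATCHED THEN INSERT (id, score) VALUES (s.id, s.score)
    """)
    cur.execute("ALTER TASK MERGE_SURVEYS RESUME")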

• Collaborated with clients on 12+ market research projects to understand business objectives and reporting requirements, define metrics, and develop predictive models (regression, classification, hypothesis testing) using SQL, Python, and R to deliver actionable, data-driven solutions.

• Developed 15+ dashboards in Power BI using DAX and Power Query M, implementing UI/UX-aligned visual designs and embedded analytics capabilities such as drill-throughs and bookmarks, enabling business units to monitor campaign KPIs and reduce reporting turnaround time by 40%.

National Institute of Technology Puducherry, INDIA Data Analyst/Engineer Dec 2020 – Aug 2021

• Streamlined data migration to SQL Server by developing ETL pipelines using Python scripts, implementing schema validation, pre-deployment sandbox testing, and data auditing, reducing downtime to less than 1 hour and ensuring 99.5% accuracy.

• Automated reporting workflows with SSIS and Python, saving 400+ annual hours for 10+ departments.

• Applied advanced SQL techniques in SQL Server including stored procedures, triggers, filtered joins, and partitioned views to test and optimize data pipelines, improving query performance by 30% and reducing compute cost.

Projects

Vehicle Coupon Recommendation System: Pandas, Scikit-learn, Matplotlib, Streamlit, DagsHub Feb 2024 – Apr 2024

• Utilized an 8-step data science pipeline to predict coupon acceptance, improving selection strategies and boosting acceptance rates.

• Predicted coupon acceptance using a modular ML pipeline with classification models and hyperparameter tuning, improving accuracy by 20%.
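
A minimal sketch of a modular classification pipeline with hyperparameter tuning of this kind (synthetic data stands in for the coupon dataset; the model choice and parameter grid are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Stand-in for the real feature matrix and 0/1 acceptance labels.
    X, y = make_classification(n_samples=1000, n_features=12, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Modular pipeline: preprocessing and model tuned as one estimator.
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("clf", RandomForestClassifier(random_state=42)),
    ])
    grid = GridSearchCV(
        pipe,
        {"clf__n_estimators": [200, 400], "clf__max_depth": [None, 10, 20]},
        cv=5, scoring="accuracy",
    )
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.score(X_test, y_test))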

• Logged experiments with MLflow and deployed a containerized FastAPI backend integrated with a Streamlit front-end for real-time prediction.

Customer Churn Prediction & Revenue Impact Analysis: Python, SQL, Scikit-learn Oct 2023 – Dec 2023

• Achieved 85% churn prediction accuracy using decision trees and statistical tests (ANOVA and Chi-square); identified 3 key behavioral drivers.
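
A minimal sketch of the decision-tree model and a chi-square driver test of this kind (synthetic data stands in for the churn dataset; the binary driver flag is hypothetical):

    import pandas as pd
    from scipy.stats import chi2_contingency
    from sklearn.datasets import make_classification
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Stand-in for real churn features and 0/1 churn labels.
    X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, tree.predict(X_test)))

    # Chi-square test: does a categorical behavior flag associate with churn?
    flag = pd.Series(X_train[:, 0] > 0, name="flag")  # hypothetical binary driver
    table = pd.crosstab(flag, y_train)
    chi2, p, _, _ = chi2_contingency(table)
    print("chi2 p-value:", p)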

• Applied PCA for feature selection and dimensionality reduction, improving model stability and increasing retention strategy effectiveness by 15%.

Education

Master of Science in Data Science, University at Buffalo, NY; GPA: 4.0/4.0; Jan 2023 – May 2024
Courses: Data Intensive Computing, Deep Learning, Machine Learning, Analysis of Algorithms, Data Models and Query Language.

Technical Skills

Programming Languages: Python, C, C++, SQL, R, SAS, Java, HTML, MATLAB.
Libraries: Pandas, NumPy, Scikit-Learn, Matplotlib, Seaborn, PyTorch, TensorFlow, Requests, Flask.
ML/AI: Linear and logistic regression, clustering, SVM, Random Forest, KNN, K-means, ensemble learning, XGBoost, NLP, deep learning.
Data Engineering Tools: Databricks, Hadoop, Spark, PySpark, SSIS, DBT, Airflow, Docker, Kubernetes, Kafka, MLflow, MLOps.
Databases and Cloud Services: SQL Server, PostgreSQL, MySQL, Snowflake, DynamoDB, Azure (Fabric, OneLake), AWS (Glue, S3, Lambda).
IDE & Build Tools, Version Control: Jupyter Notebook, JupyterLab, Visual Studio Code, Google Colab, Git, GitHub, CI/CD, JIRA.
Data Visualization Tools: Power BI, Tableau.


