Sathish Nallamekala
Data Engineer
Alabama, USA *******.************@*****.*** +1-659-***-**** LinkedIn
PROFESSIONAL SUMMARY:
Data Engineer with 4+ years of experience specializing in the design, implementation and optimization of scalable data pipelines for high-stakes financial and e-commerce environments.
Expert in leveraging PySpark and Apache Spark for complex ETL/ELT workflows and proficient in real-time data streaming using Apache Kafka and Azure Event Hubs.
Proven ability to migrate legacy systems to modern cloud architectures, significantly improving processing efficiency and reducing infrastructure costs.
Demonstrated command of Azure and AWS ecosystems, including Azure Data Lake, AWS S3, AWS Glue and AWS Kinesis.
Adept at building robust data warehouses on Snowflake and Amazon Redshift with advanced modeling techniques like Star Schema and Data Vault 2.0.
Possesses a strong track record in automating and orchestrating complex workflows using Airflow DAGs and dbt to ensure regulatory compliance and high data accuracy.
Skilled in implementing comprehensive data governance and security controls, including PII/PCI masking and lineage tracking with tools like Collibra and Great Expectations.
Experienced in performance tuning Spark jobs, developing containerized applications with Docker and Kubernetes, and building monitoring and observability dashboards with Prometheus and Grafana.
A collaborative problem-solver with a passion for building reliable, scalable and secure data platforms that drive business insights.
TECHNICAL SKILLS:
Programming & Scripting: SQL (T-SQL, PL/SQL, Spark SQL, Snowflake SQL), Python (PySpark, pandas, NumPy, SciPy), Scala, R (dplyr, data.table), Java (Big Data Processing), Shell Scripting Big Data & Distributed Processing: Apache Spark, PySpark, Hadoop, Hive, MLlib, Databricks, Delta Lake Cloud Platforms: Microsoft Azure (Data Lake, Event Hubs), AWS (S3, EMR, Glue, Kinesis, Redshift), Snowflake
ETL & Data Integration: Apache Airflow, dbt, Informatica, Talend, AWS Glue Streaming & Real-Time Processing: Apache Kafka, Azure Event Hubs, AWS Kinesis Databases & Data Warehousing: Snowflake, Teradata, Oracle, Amazon Redshift, Delta Lake Data Modeling: Dimensional Modeling (Star Schema, Snowflake Schema), Data Vault 2.0, Normalization & Denormalization
Data Governance & Quality: Collibra, Great Expectations, dbt Tests, Metadata Management, Lineage Tracking Regulatory & Compliance: Basel III, CCAR, BCBS 239, SOX, PCI DSS, Data Encryption, Tokenization, Data Masking Containerization & DevOps: Docker, Kubernetes, Git, GitHub, CI/CD Pipelines Monitoring & Observability: Prometheus, Grafana, Slack Integrations BI & Visualization: Tableau, Power BI (for data validation, regulatory reporting dashboards) Tools & Collaboration: Jira, Confluence, MS Visio
PROFESSIONAL EXPERIENCE:
Morgan Stanley – GA, USA August 2024 – Present
Data Engineer
Designed and implemented PySpark-based ETL pipelines in Databricks to ingest and process structured and unstructured financial datasets including trading, credit risk and derivatives transactions.
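A minimal illustrative PySpark sketch of this kind of Databricks ETL step; the paths, column names (trade_id, trade_ts, notional) and Delta locations are hypothetical placeholders, not the production pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("trade-etl").getOrCreate()

# Read raw trade records landed as JSON (path is illustrative)
raw = spark.read.json("/mnt/raw/trades/")

# Basic cleansing: drop malformed rows, normalize types, derive a partition column
clean = (
    raw.dropna(subset=["trade_id", "trade_ts"])
       .withColumn("trade_ts", F.to_timestamp("trade_ts"))
       .withColumn("notional", F.col("notional").cast("double"))
       .withColumn("trade_dt", F.to_date("trade_ts"))
)

# Write curated data as Delta, partitioned by trade date for downstream risk jobs
clean.write.format("delta").mode("overwrite").partitionBy("trade_dt").save("/mnt/curated/trades/")
```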
Optimized credit risk data pipelines to handle daily trading and exposure feeds, reducing processing time from 9 hours to 4.5 hours and enabling faster downstream analytics.
Engineered real-time data ingestion frameworks using Kafka and Azure Event Hubs to capture Bloomberg and Reuters market feeds for trading desks and risk teams.
Built and optimized Snowflake data warehouse schemas (star schema, Data Vault 2.0) to support liquidity risk, stress testing and capital adequacy reporting dashboards.
Automated regulatory reporting pipelines for Basel III, CCAR and BCBS 239 compliance using Airflow DAG orchestration with dbt-based transformations.
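A simplified sketch of how such an Airflow DAG can orchestrate dbt-based transformations followed by dbt tests; the DAG id, schedule and dbt project path are illustrative assumptions.

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily DAG: run dbt transformations, then dbt tests, before reports are published
with DAG(
    dag_id="regulatory_reporting",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    dbt_run = BashOperator(task_id="dbt_run", bash_command="dbt run --project-dir /opt/dbt/reg")
    dbt_test = BashOperator(task_id="dbt_test", bash_command="dbt test --project-dir /opt/dbt/reg")
    dbt_run >> dbt_test
```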
Achieved 99.97% accuracy in regulatory and risk-adjusted capital reports through advanced data quality checks, lineage tracking and metadata governance in Collibra.
Conducted Spark job performance tuning (broadcast joins, partitioning, caching strategies), improving pipeline efficiency for large-scale derivatives and fixed-income datasets.
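An illustrative example of these tuning patterns in PySpark (broadcast join on a small reference table, repartitioning on the aggregation key, caching before reuse); the table names and partition count are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("risk-join-tuning").getOrCreate()

trades = spark.read.format("delta").load("/mnt/curated/trades/")       # large fact table
ref = spark.read.format("delta").load("/mnt/curated/instrument_ref/")  # small dimension

# Broadcast the small reference table to avoid a shuffle-heavy sort-merge join
enriched = trades.join(broadcast(ref), on="instrument_id", how="left")

# Repartition on the downstream aggregation key and cache before reuse across reports
enriched = enriched.repartition(200, "desk_id").cache()

exposure_by_desk = enriched.groupBy("desk_id").agg(F.sum("notional").alias("gross_notional"))
```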
Integrated Murex trade data with Moody’s Analytics feeds to enable credit exposure calculations, PD/LGD modeling and risk-adjusted capital allocation.
Implemented enterprise-grade data security controls, including encryption, tokenization and masking of sensitive PII/PCI data to ensure PCI DSS and SOX compliance.
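A sketch of the kind of column-level tokenization and masking applied before data reaches analytics zones; the customer table, its columns and the output path are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pii-masking").getOrCreate()

# Hypothetical customer table containing PII/PCI fields
customers = spark.read.format("delta").load("/mnt/curated/customers/")

masked = (
    customers
    # Tokenize the account number with a one-way hash so joins still work without exposing the raw value
    .withColumn("account_no", F.sha2(F.col("account_no").cast("string"), 256))
    # Mask the card number, keeping only the last four digits for reporting
    .withColumn("card_no", F.concat(F.lit("****-****-****-"), F.substring("card_no", -4, 4)))
    # Drop fields with no analytical use downstream
    .drop("ssn")
)

masked.write.format("delta").mode("overwrite").save("/mnt/analytics/customers_masked/")
```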
Migrated legacy Teradata and Oracle ETL workloads to a modern Azure Data Lake + Delta Lake and Snowflake architecture, reducing infrastructure costs by 35% while improving scalability.
Capgemini – India June 2020 – July 2023
Data Engineer
Collaborated with cross-functional teams using Jira and Confluence to define requirements for a unified data platform tailored to Indian retail giants like Reliance Retail and BigBasket, incorporating feedback from operations and marketing stakeholders on handling festive sales surges and regional customer preferences.
Migrated legacy ETL jobs from Informatica to Talend, enhancing data integration for multi-channel Indian retail sources, including offline stores and online marketplaces, and improving processing efficiency by 30% for high-volume transactions in INR.
Designed and implemented scalable ETL pipelines using Apache Spark and PySpark to process daily transaction data from Indian e-commerce platforms like Flipkart and Amazon.in, integrating data from multiple sources, including customer orders, inventory logs and supplier feeds during peak events like Republic Day sales.
Developed real-time data streaming solutions with Apache Kafka and AWS Kinesis to handle live user activity data from mobile apps and websites, enabling instant updates for personalized product recommendations and dynamic pricing adjustments amid India's competitive e-commerce landscape.
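A minimal PySpark Structured Streaming sketch of this Kafka ingestion pattern; the broker address, topic name, event schema and output paths are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

event_schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_type", StringType()),
    StructField("price", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Read live user-activity events from a Kafka topic (broker/topic names are illustrative)
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "user-activity")
         .load()
         .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
         .select("e.*")
)

# Persist the parsed stream to Delta for downstream recommendation and pricing jobs
query = (
    events.writeStream.format("delta")
          .option("checkpointLocation", "/mnt/checkpoints/user-activity/")
          .start("/mnt/lake/user_activity/")
)
```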
Built a data lake on AWS S3 using Delta Lake to store semi-structured data in Parquet and ORC formats from diverse retail sources, supporting advanced analytics for sales forecasting and customer segmentation across urban and rural markets.
Orchestrated complex data workflows with Apache Airflow and AWS Glue, automating the ingestion, transformation and loading of e-commerce data daily into Amazon Redshift for BI reporting on metrics like regional demand patterns and GST-compliant invoicing.
Optimized Hive queries on Hadoop clusters to analyze historical sales data from Indian retail chains, reducing query execution time by 40% through partitioning and bucketing techniques tailored to seasonal trends such as monsoon impacts on supply chains.
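An illustrative Hive-style DDL, issued here through Spark SQL on a Hive-enabled session, showing the partitioning and bucketing approach; the database, table, columns and bucket count are assumptions.

```python
from pyspark.sql import SparkSession

# Assumes a Spark session with Hive support on the Hadoop cluster
spark = SparkSession.builder.appName("sales-history-ddl").enableHiveSupport().getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS retail")

# Partition by sale date so seasonal scans prune whole directories,
# and bucket by store_id so joins and aggregations on store avoid full shuffles
spark.sql("""
    CREATE TABLE IF NOT EXISTS retail.sales_history (
        order_id STRING,
        store_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (sale_dt DATE)
    CLUSTERED BY (store_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# A query like this only reads the monsoon-season partitions it needs
monsoon_sales = spark.sql(
    "SELECT store_id, SUM(amount) AS total FROM retail.sales_history "
    "WHERE sale_dt BETWEEN '2022-06-01' AND '2022-09-30' GROUP BY store_id"
)
```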
Managed data modeling with Star and Snowflake schemas in Teradata, creating dimensional models for e-commerce metrics like customer lifetime value and cart abandonment rates specific to India's tier-2 and tier-3 city shoppers.
Integrated machine learning models using MLlib and Scikit-Learn for retail demand forecasting in the Indian market, processing features from transaction histories to predict stock levels with 85% accuracy for products like ethnic wear during cultural festivals.
Implemented data quality checks with Great Expectations and dbt tests, validating e-commerce datasets for completeness and accuracy in handling multilingual product descriptions and regional pricing, preventing downstream errors in recommendation engines.
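A small sketch of this style of validation using the classic (pre-1.0) Great Expectations dataset API; the file name, column names and accepted language codes are hypothetical.

```python
import great_expectations as ge
import pandas as pd

# Hypothetical batch of product records validated before it reaches the recommendation engine
batch = ge.from_pandas(pd.read_parquet("products.parquet"))  # classic GE dataset API

batch.expect_column_values_to_not_be_null("product_id")
batch.expect_column_values_to_be_between("price_inr", min_value=0)
batch.expect_column_values_to_be_in_set("language", ["en", "hi", "ta", "te", "bn"])

results = batch.validate()
if not results.success:
    raise ValueError("Data quality checks failed; blocking downstream load")
```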
Deployed containerized data processing applications using Docker & Kubernetes on AWS EMR, ensuring high availability for real-time inventory tracking during peak shopping seasons like Diwali and Eid in India's fast-paced retail sector.
Configured monitoring dashboards in Prometheus and Grafana to track ETL job performance, alerting teams via Slack for anomalies in data pipelines processing millions of daily transactions from pan-Indian e-commerce operations.
EDUCATION:
Master of Computer Science – Auburn University at Montgomery, Montgomery, Alabama, USA
Bachelor of Electronics and Electrical Engineering – Vasireddy Venkatadri Institute of Technology, Andhra Pradesh, India
CERTIFICATIONS:
Certified Python 101 for Data Science – IBM
Certified Software Engineering Virtual Experience – Walmart Forage
Certified Technical Consulting Virtual Program – SAP Forage
Certified Data Visualisation Program – Tata
Certified Agile Methodology – Cognizant