Mallikarjun Goud Kataram
Data Engineer
**********************@*****.*** | 972-***-**** | Sterling Heights, MI 48310 | LinkedIn

Professional Summary
Versatile, performance-focused Data Engineer with 4+ years of experience designing, building, and scaling real-time data pipelines and cloud-native data platforms in enterprise environments including Uber, KPMG, and Trigent. Proven expertise in Python, PySpark, SQL, Kafka, and Apache Airflow, with hands-on deployment experience across the AWS and Azure ecosystems. Adept at driving initiatives in data governance, CI/CD automation, and ETL optimization, delivering measurable gains in pipeline reliability, job efficiency, and compliance readiness (SOX, GDPR). Skilled at collaborating with Data Science, DevOps, and Analytics teams to deliver production-grade, low-latency data systems that power business intelligence, ML pipelines, and operational decision-making at scale.

Education
Master of Science in Information Systems
Central Michigan University
May 2024 Mount Pleasant, MI
Professional Experience
Data Engineer, Uber Jul 2024 – Present CA
•Architect and optimize scalable data pipelines using PySpark, Airflow, and Kafka, enabling real-time analytics and cutting batch processing time by 28%.
•Spearhead migration of critical ETL workloads to AWS Glue and Redshift, improving job reliability and scalability while reducing infrastructure costs by 18%.
•Engineer robust data models for consumption by Power BI and Tableau, empowering cross-functional teams with actionable insights and boosting report generation speed by 35%.
•Design and implement custom workflows with Apache Airflow and Prefect, automating complex dependencies across Teradata, BigQuery, and Azure Synapse.
•Leverage Python and SQL to orchestrate massive data transformation jobs across Hadoop, delivering enriched datasets to downstream ML pipelines.
•Collaborate with data science teams to surface key metrics by streamlining data ingestion from AWS S3, APIs, and internal sources into unified data lakes.
•Enhance system observability and alerting by integrating CloudWatch and DataDog, reducing issue detection time by 40%.
•Drive real-time event processing using Kafka Streams, optimizing log aggregation and data partitioning for improved latency.
•Lead initiatives on data governance and compliance, ensuring 100% audit-ready data pipelines for internal and external regulatory checks.
•Implemented comprehensive data governance frameworks ensuring SOX compliance and GDPR data privacy regulations, establishing data quality monitoring protocols that improved audit readiness by 95%.
•Increased pipeline reliability by 42% and accelerated release cycles by 30% by redesigning ETL architecture and incorporating CI/CD in AWS Cloud infrastructure.
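The workflow orchestration described above (Airflow/Prefect-style dependency automation) can be sketched in plain Python using the standard library's topological sorter; the task names and dependency graph below are hypothetical illustrations, not from any actual production pipeline.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical dependency graph: each task maps to the set of tasks it
# depends on, mirroring how an Airflow DAG wires upstream/downstream tasks.
deps = {
    "extract_teradata": set(),
    "extract_bigquery": set(),
    "transform_join": {"extract_teradata", "extract_bigquery"},
    "load_synapse": {"transform_join"},
    "publish_metrics": {"load_synapse"},
}

def run_order(graph):
    """Return an execution order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())

order = run_order(deps)
```

An orchestrator like Airflow resolves the same ordering at scheduling time (and runs independent tasks in parallel); `static_order()` simply surfaces one valid linear ordering directly.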
Data Engineer Intern, KPMG Jul 2023 – Jan 2024 NY
•Built automated data pipelines using SQL, Apache Airflow, and Python, decreasing manual intervention and increasing pipeline efficiency by 25%.
•Integrated data from multiple sources including Azure Data Factory, S3, and PostgreSQL to support executive dashboards in Power BI.
•Developed analytical models to forecast trends using Python and Spark, contributing to revenue optimization strategies across verticals.
•Designed and deployed ML model pipelines using MLflow and automated model monitoring workflows, reducing model deployment time by 50% and ensuring model performance tracking in production environments.
•Performed end-to-end data migration and validation from on-premises systems to Azure Synapse, ensuring 99.8% data accuracy.
•Implemented performance tuning on ETL jobs, reducing job run time by up to 40% across high-volume datasets.
•Created custom monitoring dashboards using Tableau, enabling leadership to track project KPIs in real time.
•Collaborated on cross-functional projects using Agile methodology and delivered bi-weekly updates to stakeholders, driving transparency and alignment.
•Reduced report delivery time by 35% and improved data refresh intervals by 50%, by optimizing data flow between Power BI and source systems.
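A migration-accuracy check of the kind mentioned above can be sketched as a row-level reconciliation between source and target; the schema and helper names here are hypothetical, shown only to illustrate the validation idea.

```python
import hashlib

def row_hash(row):
    """Stable fingerprint of one record, independent of dict key order."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode()).hexdigest()

def reconcile(source_rows, target_rows):
    """Return the fraction of source rows found intact in the target."""
    target_hashes = {row_hash(r) for r in target_rows}
    matched = sum(1 for r in source_rows if row_hash(r) in target_hashes)
    return matched / len(source_rows) if source_rows else 1.0

# Hypothetical sample data: one source row is missing from the target.
source = [{"id": 1, "amount": 10.5}, {"id": 2, "amount": 7.0}, {"id": 3, "amount": 3.25}]
target = [{"amount": 10.5, "id": 1}, {"id": 2, "amount": 7.0}]
accuracy = reconcile(source, target)
```

At scale the same comparison would typically be pushed into the warehouse (row counts plus per-partition checksums) rather than done row by row in Python.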
Data Engineer, Trigent Aug 2020 – Nov 2022 India
•Designed and developed scalable ETL pipelines using Python, Spark, and Kafka, handling large-scale daily data ingestion from multiple sources.
•Migrated legacy data processes to AWS Cloud and Redshift, slashing infrastructure costs by 12% and improving system uptime to 99.5%.
•Constructed star and snowflake schemas for the enterprise data warehouse, facilitating seamless analytics for over 100 stakeholders.
•Tuned complex T-SQL and PL/SQL queries in Teradata, enhancing query performance by 20% on production workloads.
•Orchestrated job scheduling via Apache Airflow, ensuring consistent execution of 200+ interdependent batch jobs daily.
•Visualized performance trends and data quality metrics using Power BI, supporting the leadership in strategic decision-making.
•Collaborated closely with QA and DevOps to implement CI/CD pipelines for data validation and deployment in AWS environments.
•Increased data availability by 40% and reduced latency in reports by 25%, through efficient pipeline redesign and cloud optimization.
•Implemented data governance frameworks and quality assurance protocols, establishing automated data profiling and validation checks to ensure 99.8% data accuracy across enterprise datasets.
•Built real-time streaming data processing infrastructure using Kafka Streams, implementing event processing capabilities with optimized partitioning strategies to handle peak loads efficiently.

Projects
VEHICLE MAINTENANCE TRACKER
•Developed a GUI-based application using Python (Tkinter) for managing and tracking vehicle service schedules.
•Implemented SQLite3 as the backend for efficient local database management and service history storage.
•Enabled users to schedule maintenance, receive automated service reminders, and track expenses proactively.
•Designed features for multi-vehicle management, allowing centralized monitoring for individuals and fleet administrators.
•Ensured data integrity and ease of access through structured input forms and search functions.
•Integrated expense logging and historical tracking to support budgeting and cost-effective maintenance decisions.
•Enhanced user convenience by allowing access to records from multiple locations.

Technical Skills
Cloud Platforms — AWS (S3, Redshift, Glue, EC2, CloudWatch), Azure (Synapse, Data Factory)
Programming — Python, SQL, T-SQL, PySpark, UNIX Shell Scripting
Machine Learning & AI Operations — MLOps, MLflow, Model Deployment, Feature Engineering, Model Monitoring, CI/CD for ML Pipelines
Data Engineering — Apache Airflow, Prefect, ETL Pipelines, Data Modeling, Data Warehousing, Data Migrations
Big Data — Apache Spark, Hadoop, Kafka, Kafka Streams
Data Governance & Quality — Data Governance, GDPR/CCPA Compliance, Data Quality Management, Data Lineage & Observability
Databases — Teradata, BigQuery, MySQL, PostgreSQL
Data Visualization — Power BI, Tableau
Version Control & CI/CD — Git, Jenkins
Methodology — Agile, Scrum
Certificates
Lean Six Sigma Green Belt Certification
Microsoft Certified: Fabric Data Engineer Associate