Arun Katta
Seattle WA 929-***-**** ***********@*****.*** LinkedIn
SKILLS
Programming Languages: Python, Scala, R, SQL, PL/SQL, SQL Plus, Transact SQL, ANSI SQL, Java, C, C++, Ruby, Unix Shell Scripting
Databases: Snowflake, PostgreSQL, AWS Redshift, Oracle, MS SQL Server, IBM DB2 UDB, Teradata, Sybase
ETL Tools: Business Objects Data Services, BO Data Integrator, SSIS, Informatica PowerCenter, Talend, Matillion
Business Intelligence Tools: Business Objects (CMC, CMS, CCM, Designer, DeskI, Web Intelligence, InfoView, Dashboard Manager, Performance Manager), Tableau, QlikView, Power BI, SAP Information Steward
Big Data & Cloud Platforms: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, Cassandra, Oozie, Flume, Zookeeper, Elasticsearch, AWS (S3, Glue, Redshift, DynamoDB, Personalize, SageMaker, Lambda, Glacier, ECS, EC2, RDS, Kinesis, Batch, CloudFormation, CloudWatch, EMR, Athena, QuickSight), Google Cloud, Azure Data Lake
EDUCATION
Western Illinois University, Master of Science in Computer Science, Graduated 2011
EXPERIENCE
T-Mobile Inc Bellevue, WA
Lead Data Engineer Nov 2019 – Present
●Led the design and development of a scalable, multi-terabyte data warehouse on Amazon Redshift, supporting analytics use cases across multiple business units with millions of records processed daily.
●Directed optimization efforts for complex SQL and Redshift queries, improving dashboard and reporting performance by up to 100x for analytics and BI teams using Tableau.
●Migrated on-premises databases to AWS Redshift, optimizing schema design and implementing columnar compression, resulting in faster queries and cost-efficient storage.
●Architected and implemented ETL workflows using AWS Glue, Python, and PySpark, enabling structured ingestion of semi-structured data (JSON, Parquet) into Redshift with robust error handling.
●Collaborated with ML engineers to productionize personalization models using AWS SageMaker and Amazon Personalize, improving product recommendations and user engagement metrics.
●Built CI/CD pipelines for data ingestion and transformation using AWS CodePipeline, Lambda, and S3, improving deployment speed and pipeline reliability.
●Led development of SQL and Python scripts to extract, transform, and operationalize large-scale datasets, accelerating downstream analytics and ensuring data availability across teams.
●Implemented data governance protocols using AWS IAM, KMS, and Lake Formation, standardizing access controls, encryption, and data security across enterprise datasets.
●Provided technical leadership to a team of 5+ data engineers, conducting code reviews, mentoring junior developers, and defining best practices for coding, testing, and documentation.
●Drove project planning and delivery using Agile methodologies, coordinating with PMs and business stakeholders to prioritize data initiatives and deliver solutions within agreed timelines.
●Maintained a strong track record of project delivery, with consistent stakeholder satisfaction across data platform modernization, data quality initiatives, and pipeline performance optimization.
Costco Corp Issaquah, WA
Senior Data Engineer Sep 2018 – Oct 2019
●Led the design and implementation of scalable ETL pipelines using SAP Data Services to ensure accurate data validation and transformation between source and legacy systems.
●Automated error handling and logging frameworks, enhancing operational efficiency and reducing manual remediation time by 40%.
●Built Python-based data quality checks and profiling tools, significantly improving anomaly detection and data reliability across ingestion layers.
●Collaborated with cross-functional teams, including analysts and QA, to deliver reconciled datasets for regulatory audits and business intelligence.
●Integrated SAP pipelines with AWS S3 and Snowflake, enabling centralized, cloud-native analytics and reducing data delivery time for downstream consumers.
Puget Sound Energy Bothell, WA
Data Engineer Mar 2018 – Sep 2018
●Designed and implemented ETL pipelines for energy consumption analytics, enhancing data accuracy and reliability for timely reporting.
●Integrated high-volume smart meter data into AWS Redshift, cutting reporting latency by 50% and enabling faster business insights.
●Collaborated closely with data scientists to support and optimize grid load forecasting models through reliable and timely data provisioning.
●Wrote complex SQL queries and Python scripts to process, transform, and analyze large-scale datasets, improving data pipeline performance and reducing resource consumption.
Zurich Insurance Chicago, IL
BI/Data Engineer Nov 2017 – Feb 2018
●Led end-to-end data migration from AWS S3 to Redshift, ensuring data integrity and minimal downtime throughout the process.
●Optimized ETL pipelines to enhance processing efficiency and reduce overall data latency in analytics workflows.
●Designed and developed insightful BI dashboards using SAP HANA and Business Objects, enabling data-driven decision-making across business units.
Unilever Englewood Cliffs, NJ
Data Engineer Sep 2015 – Aug 2017
●Engineered scalable big data pipelines using Apache Spark and Hive to support global supply chain analytics and improve data processing throughput.
●Led the migration of data warehouses to AWS Redshift, optimizing storage and compute resources to reduce costs by 30% without sacrificing performance.
●Implemented comprehensive data governance frameworks and automated quality checks to ensure data accuracy, consistency, and compliance across pipelines.
●Developed and deployed demand forecasting pipelines, leveraging historical and real-time data to improve forecast accuracy by 25%.
Commercial Metals Company Irving, TX
ETL Developer Jul 2014 – Aug 2015
●Designed and developed ETL solutions using Informatica to support accurate and timely financial and inventory reporting.
●Created SAP integration pipelines that automated data flows, reducing manual data entry efforts by 50% and minimizing errors.
●Optimized complex SQL queries and implemented effective indexing strategies, boosting overall query performance by 35%.
●Maintained data consistency and integrity across systems to support rigorous financial audit requirements and compliance standards.
Sysco Houston, TX
ETL Developer Jul 2013 – Jun 2014
●Designed and deployed interactive BI dashboards in Tableau and Power BI to track and visualize sales performance metrics, enabling faster insights and decision-making.
●Developed and optimized ETL workflows using SSIS, reducing data processing time by 60% and increasing pipeline reliability.
●Applied data modeling best practices to create efficient schemas that improved report accuracy and query performance.
●Automated pipeline monitoring and alerting systems, decreasing downtime by 40% and ensuring timely issue resolution.
Texas Medicaid and Health Corp Austin, TX
ETL Developer Aug 2011 – Mar 2013
●Developed robust ETL pipelines for Medicaid claims processing, reducing data errors by 30% and enhancing overall data quality.
●Designed and implemented data models tailored for HIPAA-compliant reporting, ensuring security and regulatory adherence.
●Integrated claims data from multiple sources to generate comprehensive state-level reports, improving data consolidation and accuracy.
●Automated data validation scripts, cutting manual audit efforts by 50% and accelerating compliance verification processes.