ARUN SUMANTH POLINENI
• Data Engineer • 913-***-**** • Plano, TX – 75024 • ***************@*****.***
PROFESSIONAL SUMMARY
Data Engineer with 4+ years of experience building scalable data platforms and high-volume ETL pipelines across Azure, AWS, and Snowflake. Proven track record of improving pipeline performance, reducing cloud costs, and enabling business teams with reliable, analytics-ready data. Strong expertise in ADF, Databricks, PySpark, SQL, and data modeling, with experience supporting financial analytics, regulatory reporting, and enterprise BI workloads.
EDUCATION
Master of Science, Computer Science - 3.63/4.0 GPA Jan 2023 – May 2024
University of Missouri - Kansas City, Kansas City, MO
TECHNICAL SKILLS
• Programming Languages: Python, SQL, PySpark, Spark SQL, DAX
• Data Modeling and ETL: ETL Processes, Data Warehousing, Data Modeling, Informatica PowerCenter, SSIS, Alteryx, Apache Airflow, Medallion Architecture
• Cloud Technologies: Microsoft Azure (Data Factory, Databricks, Synapse, Data Lake Storage, Logic Apps, Cosmos DB, Azure Key Vault), AWS (S3, EC2, Redshift, Glue, Lambda, RDS), Google BigQuery, Microsoft Fabric
• Databases & Warehouses: MySQL, SQL Server, Azure SQL Database, PostgreSQL, MongoDB, Snowflake, Amazon Redshift, Azure Synapse
• Big Data Technologies: Apache Spark, Apache Hadoop, Apache Kafka
• Python Libraries: NumPy, Pandas, Matplotlib
• DevOps & Infrastructure as Code (IaC): Azure DevOps, Jenkins, Kubernetes, Terraform
• Tools: SSMS, Power BI, Visual Studio, Jupyter, Microsoft Word, Excel
• Project Management Methodologies: Agile and Waterfall
• Other Technologies: Version control (Git & GitHub), Linux, UNIX, Data Governance (Collibra DGC), APIs and Web Services
WORK EXPERIENCE
Data Engineer Jul 2024 – Present
AppWorks, USA
●Designed and deployed 40+ scalable ETL pipelines using Python, PySpark, Azure Data Factory, SSIS, and Databricks, cutting data processing time by up to 30%.
●Implemented Medallion Architecture within Azure Databricks and Snowflake, ensuring high-quality, analytics-ready data layers.
●Orchestrated complex data workflows with Apache Airflow, improving pipeline reliability and automation while reducing manual intervention by 40%.
●Engineered secure, scalable data platforms on Azure Data Lake Storage, Synapse Analytics, and Snowflake, optimizing storage, query performance, and cost-efficiency by 35%.
●Processed large-scale datasets with PySpark, improving performance for high-volume data workloads.
●Conducted advanced data analysis using Python (Pandas, NumPy) and SQL, yielding actionable insights for strategic decisions.
●Developed dynamic Power BI dashboards integrated with Snowflake and Azure, boosting real-time business intelligence capabilities.
●Implemented robust data validation frameworks, increasing pipeline reliability and data accuracy by over 20%.
●Built reusable ETL components for ingesting data (APIs, files, databases) into Snowflake and Azure Data Lake, reducing new pipeline development time by 30%.
●Worked in the finance domain, supporting market intelligence and analytics platforms with secure, scalable data pipelines to power financial insights.
Data Engineer Mar 2021 – Dec 2022
Accenture Solutions, Hyderabad, India
●Designed and deployed 50+ parameterized Azure Data Factory (ADF) pipelines for automated ETL, improving efficiency by 30%.
●Implemented ETL in Azure Databricks using Spark and PySpark, boosting performance by 25%.
●Monitored pipeline executions, resolving discrepancies and ensuring robust data processing.
●Built automated workflows in ADF and Databricks with Azure Logic Apps and Azure Functions for alerting and logging, speeding up issue resolution by 50%.
●Established CI/CD pipelines in Azure DevOps, reducing ADF component deployment time across environments by 50%.
●Performed SQL analysis in Snowflake, uncovering insights to support data-driven decisions.
●Collaborated with stakeholders to align data solutions with business needs, generating ad-hoc reports using Snowflake and Power BI.
●Implemented data governance and compliance frameworks (GDPR, CCPA) using Collibra and data classification tools to enforce policies and maintain audit readiness.
Data Engineer Feb 2020 – Mar 2021
Capgemini, Pune, India
●Developed ETL pipelines using AWS Glue, Informatica PowerCenter, and PySpark, enhancing data integration and processing efficiency by 25%.
●Optimized real-time streaming pipelines with Kafka and Spark Streaming, reducing latency and accelerating high-frequency data processing by over 25%.
●Managed data storage solutions using MongoDB, Amazon Redshift, Google BigQuery, and Snowflake, reducing compute costs by 20% and improving analytics performance.
●Automated scalable data workflows via Apache Airflow and Spark, reducing AutoML model training time by 30%.
●Engineered cloud-native data platforms leveraging AWS services (EC2, S3, Lambda, Glue, Redshift) and Snowflake for secure, scalable analytics.
●Provisioned and managed AWS data environments using Terraform, enabling repeatable and secure infrastructure deployment.
●Containerized data processing applications using Docker and deployed them on Kubernetes clusters for scalable, portable execution environments.
CERTIFICATIONS
● Microsoft Certified: Azure Fundamentals (AZ-900)
● Microsoft Certified: Azure Data Fundamentals (DP-900)
● Microsoft Certified: Fabric Data Engineer (DP-700)
ACHIEVEMENTS
● Awarded “Rookie of the Month” for outstanding contributions to pipeline automation and performance optimization.
● Recognized with “Monthly Grammy Award” for exceptional teamwork and successful project deliveries.