SUDHEER GAJULAPALLI DATA ENGINEER
Overland Park, KS ************@*****.*** +1-430-***-**** LinkedIn
Data Engineer with about 5 years of experience designing and optimizing ETL processes, data models, and advanced analytics for the retail, telecommunications, and healthcare industries. Proficient in leveraging cloud platforms such as AWS and Azure to build scalable data pipelines and solutions. Adept at working with big data technologies like Databricks and Snowflake to process large datasets and support data-driven decision-making. Strong skills in SQL, PL/SQL, Python, and data warehousing solutions, with a proven ability to deliver high-impact insights in highly regulated industries.
Technical Skills:
Cloud Platforms: Azure (ADF, Synapse, Databricks, SQL DB, Blob Storage), AWS (Redshift, S3, Glue, Lambda, RDS, Athena, DynamoDB)
Data Engineering Tools: ADF, SSIS, Apache Spark, Databricks, Kafka, SSAS
ETL & Data Pipelines: Data Pipeline Design, Transformation, AWS Glue, Snowpipe, Data Warehousing
Big Data: Spark, Hadoop, Kafka, Delta Lake
Databases: SQL Server, PostgreSQL, MySQL, Redshift, Snowflake, DynamoDB
Programming Languages: Python (Pandas, PySpark), Scala, SQL, T-SQL, Shell Scripting
DevOps & CI/CD: Git, Jenkins, Azure DevOps
Data Visualization: Power BI, Tableau
Security & Governance: Azure RBAC, AWS IAM, Data Encryption, Data Masking
Job Scheduling: Apache Airflow, AWS Data Pipeline
PROFESSIONAL EXPERIENCE
TCS, Kroger – Data Engineer; United States, Remote (Aug 2023 – Present)
Designed and orchestrated scalable ETL workflows using Azure Data Factory and Databricks, optimizing data integration and processing for analytics.
Utilized Databricks Delta Lake for efficient data lake management, enabling real-time analytics and dynamic schema updates.
Architected cloud-based data warehouses with Azure Synapse and Snowflake, enhancing query performance and data scalability.
Implemented and optimized data pipelines with Apache Spark and PySpark, supporting real-time and batch processing in the Medallion framework.
Integrated Scala with Hadoop, Hive, and HDFS, enabling efficient big data storage and retrieval.
Automated Azure resource deployment and lifecycle management using ARM templates and Azure Blueprints.
Designed and implemented ETL processes using SSIS to extract, transform, and load data into SSAS cubes for improved decision-making.
Connected Tableau to multiple data sources (SQL Server, Excel, and web data connectors) to create unified and comprehensive dashboards.
Developed and maintained ETL pipelines integrating PostgreSQL with Azure Data Factory, Databricks, and Spark.
Designed and deployed end-to-end data pipelines using Alteryx to integrate disparate data sources, enabling faster insights for decision-makers.
Developed ETL/ELT pipelines to ingest, transform, and load structured and semi-structured data from various sources into cloud-based data warehouses.
Developed DataOps practices to streamline data pipelines, enhancing automation and reliability on Azure infrastructure.
Configured cost-effective and auto-scaling Databricks clusters, achieving performance optimization for dynamic workloads.
Integrated advanced analytics and machine learning solutions to enhance business intelligence and decision-making processes.
IBM, AT&T – Data Engineer; India, Bangalore (Feb 2021 – July 2022)
Built, maintained, scaled, and supported 10+ existing data pipelines.
Reduced ETL processing time by 50% by orchestrating pipelines with Azure Data Factory and performing data cleansing and manipulation in Python.
Increased campaign effectiveness by 30% through generating and validating loyalty rewards based on transaction evaluations using Hive and complex SQL queries.
Integrated data lakes (Azure Data Lake, Google Cloud Storage, S3) with data warehouses to enable seamless big data analytics.
Increased data processing efficiency by 40% through data transformations using Azure Databricks.
Improved real-time data processing by 18% by streamlining event capturing processes using Kafka with Spark Streaming.
Reduced fraudulent transactions by 25% by analyzing gigabytes of data with Hive SQL analytical functions to identify and prevent fraudulent activity.
Saved 15 hours of manual work weekly by translating client business needs into actionable Power BI reports.
Automated ETL workflows with Jenkins and GitHub, reducing manual intervention by 40%.
Promoted effective team communication through collaborative tools and techniques.
Developed and optimized Hadoop-based ETL pipelines using Hive, Spark, and Sqoop for large-scale data processing.
IBM, Cardinal Health – Jr. Data Engineer; India, Bangalore (June 2019 – Jan 2021)
Developed and optimized ETL workflows using AWS Glue and Databricks Delta Lake to ensure efficient, scalable data integration.
Built and managed data warehouses with AWS Redshift, enhancing data aggregation, query performance, and governance.
Designed serverless solutions using AWS Lambda and automated CI/CD pipelines for Fargate-based applications with AWS CodePipeline.
Leveraged PySpark and Spark SQL to process and analyze large datasets, optimizing performance for real-time and batch workflows.
Implemented and customized Oracle Cloud Fusion applications to streamline business processes and improve operational efficiency.
Integrated PostgreSQL with Azure cloud-based data solutions for scalable data warehousing.
Collaborated with cross-functional teams to define business requirements and deliver actionable data solutions via Alteryx.
Led end-to-end ERP system integration across departments, ensuring smooth data synchronization and process alignment.
Implemented data quality checks and resolved issues to ensure the accuracy and integrity of data in Epic Tapestry.
Integrated machine learning models into production using scikit-learn, TensorFlow, and PyTorch for predictive analytics and deep learning.
Automated SSAS cube processing using SQL Server Agent jobs and custom scheduling scripts.
Orchestrated containerized applications with ECS and managed secure, scalable EKS clusters using Terraform and AWS CloudFormation.
Managed data integration from Epic and other healthcare systems to provide actionable insights.
Provisioned and maintained big data processing pipelines with Apache Spark, Apache Kafka, and AWS Data Pipeline for real-time analytics.
Championed data quality initiatives by implementing robust audits, cleansing processes, and governance practices across AWS services.
CERTIFICATIONS
AWS Certified Solutions Architect
SQL (Advanced) Certificate
EDUCATION
MS in CS - University of Central Missouri (USA) Aug 2022 - Dec 2023
Bachelor’s - DBIT Aug 2015 - May 2019