Varshitha Pendyala
Houston, TX (Open to Relocation) +1-346-***-****
LinkedIn *************@*****.*** GitHub
Professional Summary
• Data Engineer with 5 years of experience in designing, developing, and maintaining scalable data pipelines and systems, with a strong focus on ETL processes, data modeling, and business intelligence reporting.
• Engineered end-to-end data solutions using Azure Data Factory, Synapse Analytics, and Azure Databricks to ingest, process, and analyze large datasets.
• Deployed and managed Azure Blob Storage and Data Lake, optimizing structured and unstructured data storage for analytics and reporting.
• Architected and implemented serverless and containerized applications using Azure Functions and Azure Kubernetes Service (AKS) for scalable and resilient solutions.
• Strengthened cloud security and identity management with Azure Active Directory (Azure AD), Key Vault, and RBAC policies for access control and compliance.
• Designed and implemented scalable and secure cloud infrastructure using AWS EC2, S3, and Lambda for high availability, cost efficiency, and serverless applications.
• Built and optimized big data pipelines leveraging AWS Glue, Redshift, and EMR to process large datasets and enable advanced analytics.
• Integrated event-driven architectures with AWS SQS, SNS, and Kinesis to enable real-time data streaming and messaging across distributed systems.
• Implemented network security best practices using AWS VPC, IAM policies, Security Groups, and KMS to enforce data protection and compliance.
• Automated infrastructure provisioning and deployment with Terraform and AWS CloudFormation, enabling seamless DevOps workflows and CI/CD pipelines.
• Designed and developed ETL workflows using AWS Glue, Lambda, and Step Functions, efficiently transforming and migrating large datasets.
Technical Skills
Programming Languages
• C, C++, Java, Python (Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, PySpark), Scala
Data Structures & Algorithms
• Arrays, Linked Lists, Trees, Stacks, Queues, Graphs, HashMaps, Heaps, Sorting & Searching, Recursion
Databases and Web Development
• MySQL, Oracle, MongoDB, DynamoDB, HTML, CSS, JavaScript, JSON
Data Engineering & Big Data Tools
• Apache Spark, Apache Kafka, Hadoop, Amazon Redshift, Snowflake, Azure Data Lake, Amazon S3
Cloud Platforms & Services
• AWS: EC2, S3, Lambda, RDS, Redshift, Glue, IAM, CloudFormation, CloudWatch, CloudTrail, SageMaker
• Azure: Azure Data Factory, Microsoft Fabric, Azure Synapse Analytics, Azure SQL Database, Azure Blob Storage, Azure Machine Learning, Azure Functions
Operating Systems & IDEs
• Linux, UNIX, Windows, PyCharm, IntelliJ IDEA, Jupyter Notebooks, Google Colab, Eclipse
DevOps Tools & Visualization
• Git, GitHub, GitLab, Bitbucket, Jenkins, Ansible, GitLab CI/CD, AWS CodePipeline, Docker, Kubernetes, Amazon ECS (Elastic Container Service), Terraform, CloudWatch, Tableau, Power BI, GraphPad Prism
Project Management & Collaboration
• SDLC, Agile, Scrum, Jira, ServiceNow
Work Experience
Data Engineer: Johnson & Johnson, Houston, TX February 2025 – Present
• Designed and developed scalable ETL pipelines using Microsoft Fabric and Azure Data Factory (ADF) to orchestrate data ingestion from SAP (source system) into Azure Databricks for transformation.
• Implemented data processing frameworks in Databricks (PySpark) to efficiently transform and aggregate structured and unstructured datasets.
• Leveraged Delta Lake for optimized data storage and ACID-compliant transactions, ensuring consistency and reliability in Azure Data Lake.
• Optimized Spark job execution by fine-tuning cluster configurations, implementing Adaptive Query Execution (AQE), and utilizing Databricks Job Clusters.
• Developed parameterized and metadata-driven ADF pipelines, enhancing reusability and reducing maintenance efforts.
• Automated monitoring, logging, and alerting mechanisms using Azure Monitor, Log Analytics, and Application Insights to ensure data pipeline reliability.
• Implemented role-based access control (RBAC) and data security measures using Azure Key Vault, Managed Identities, and Databricks ACLs.
• Scheduled and orchestrated workflows using Azure Logic Apps and Azure Functions, improving pipeline efficiency and automation.
• Developed secure REST APIs in Python for dataset access, integrating with Azure API Management to enable real-time analytics for cross-functional teams.
• Automated data workflows with Airflow and Azure Logic Apps, reducing pipeline maintenance by 30% and enabling real-time streaming with Kafka and Elasticsearch.
Data Engineer: University of Houston, Houston, TX September 2024 – January 2025
• Designed, managed, and optimized scalable data pipelines within Azure Data Factory (ADF) and Azure Databricks, ensuring seamless ETL/ELT processes, high performance, and minimal system downtime.
• Developed and automated data workflows leveraging ADF, Databricks, and Azure Data Lake, optimizing the ingestion, transformation, and storage of large-scale datasets for advanced analytics.
• Implemented performance tuning and troubleshooting techniques using SQL and PowerShell scripts, optimizing query performance and automating recurring tasks across Azure environments.
• Worked on implementing cloud-native solutions for data integration and analytics with Azure Synapse Analytics and Snowflake, streamlining data access and analysis across the organization.
• Automated data quality checks and error handling processes within ADF and Databricks, reducing pipeline failures and improving overall data integrity.
Data Engineer Intern: Pumpkin Tax Company, Houston, TX February 2024 – August 2024
• Designed and implemented ETL processes using Azure Data Factory, automating 85% of data pipelines to integrate tax data, resulting in a 30% improvement in reporting efficiency.
• Developed and optimized Azure SQL Data Warehouses, improving query performance by 40% and enabling faster tax data analysis through effective partitioning, indexing, and query optimization.
• Implemented security measures in Azure Data Lake and Azure Synapse Analytics by configuring RBAC, Azure Active Directory (AAD) authentication, and data encryption to ensure compliance with industry standards and safeguard sensitive tax information.
• Ensured 100% data accuracy through data quality checks and governance frameworks in Azure Data Lake and Azure Synapse Analytics, integrating Azure Purview for data cataloging and lineage tracking.
• Led the setup and management of critical business systems, including telecommunications (Vonage), secure remote access (Citrix VPN), HR processes, and financial operations (invoice generation). Automated workflows using Azure Logic Apps and integrated CRM and accounting systems to streamline operations.
• Managed website development and integrated it with tools such as CRM systems and accounting software (e.g., QuickBooks), utilizing Azure Functions for seamless system integration.
Data Analyst: University of Houston, TX January 2023 – 2024
• Contributed to advanced data analysis projects in healthcare by applying machine learning (ML) techniques, including supervised and unsupervised learning, to derive actionable insights from large-scale healthcare datasets, specifically focusing on autoimmune and cancer patient records.
• Processed and analyzed large-scale datasets ranging from 100,000 to 1 million records, utilizing ML algorithms such as linear regression, decision trees, and k-means clustering to uncover complex data relationships, detect anomalies, and provide valuable insights for research teams.
• Created and optimized data visualizations using Python libraries (Matplotlib, Seaborn) and Power BI, generating detailed scatter plots and heatmaps to enhance understanding of complex variable correlations, resulting in a 25% improvement in data comprehension among clinical research teams.
• Applied advanced statistical techniques, including hypothesis testing, ANOVA, and feature selection, to improve model accuracy and refine insights derived from clinical data.
• Evaluated and validated models using cross-validation, ROC curves, and precision-recall metrics to ensure the robustness and generalization of machine learning models in clinical research applications.
Cloud Engineer: Tata Consultancy Services, Hyderabad, India November 2020 – 2022
• Developed over 20 CloudFormation templates to automate the provisioning and management of complex AWS infrastructure, including EC2 instances, S3 buckets and VPC networks, reducing manual configuration errors and enabling faster deployments.
• Optimized over 10 Lambda functions for serverless computing, reducing cold start times by 30% and improving execution efficiency, while integrating these functions within a CI/CD pipeline to accelerate deployment cycles by 40%.
• Streamlined infrastructure provisioning by identifying inefficiencies in CloudFormation templates and parameterizing resources, achieving a 30% reduction in deployment time and a 25% reduction in provisioning errors, improving system reliability and uptime.
• Implemented robust IAM policies and KMS encryption strategies for projects, enhancing security by 25%, ensuring compliance with industry standards, and minimizing attack vectors in cloud-native applications.
• Monitored and analyzed AWS resources using CloudWatch for real-time metrics and AWS X-Ray for distributed tracing, improving resource utilization by 20% and proactively scaling resources to meet demand.
• Led the design and deployment of secure VPC architectures for 10+ high-availability applications, implementing subnetting, NAT gateways, VPN connections, and Transit Gateways to ensure secure and reliable connectivity, reducing network latency by 15%.
• Architected secure access to AWS services using VPC endpoints, improving security and performance by 25%, enabling private connections to services like S3 and DynamoDB, and reducing data transfer costs.
• Spearheaded the integration of AWS API Gateway and Lambda to facilitate serverless APIs, improving service response times by 35% and reducing backend infrastructure costs by 40%.
• Led projects implementing AWS Direct Connect and VPN solutions for secure, high-throughput connectivity for Lloyd Financial Services, improving data transfer speeds by 50% and reducing network downtime by 30%.
• Collaborated on projects to implement the AWS Well-Architected Framework, ensuring cost optimization and performance efficiency, leading to a 20% reduction in operational costs across client environments.
• Directed Agile teams in Scrum and Kanban processes, utilizing Jira and Confluence for project tracking, improving project delivery time by 25% and increasing client satisfaction by 30%.
• Enhanced project delivery by automating testing scripts with AWS CodePipeline, CodeBuild, and CodeDeploy, reducing manual testing efforts by 40% and enabling continuous deployment with zero downtime for 10+ applications.
Education
Master’s in Engineering Data Science, University of Houston, TX January 2023 – December 2024
Coursework: Database Management Tools, Machine Learning, Data Mining, Text Mining, Visualization, Natural Language Processing (NLP), AI & Large Language Models (LLMs)
Bachelor’s in Electronics and Communication Engineering, MREC, India August 2016 – September 2020
Coursework: Data Structures and Algorithms, OOP Concepts, Core Java, Computer Networks, Operating Systems, IoT
Certificates and Recognition
• Microsoft Azure Fundamentals (AZ-900)
• Microsoft Fabric Data Engineer Associate (DP-203)
• AWS Certified Cloud Practitioner (CLF-C02)
Professional Development
• Attended Data Science Salon 2024 - Austin
• Participated in the Eleventh Annual Bayou Startup Showcase 2024
• Volunteered at DevOpsDays Houston 2024