Reshma Rani Pamarthi
Data Engineer
**************@*****.***

OBJECTIVE
Highly skilled Data Engineer with 5+ years of experience designing, building, and optimizing scalable data pipelines and architectures. Proficient in leveraging cloud platforms such as AWS and Azure, and in working with SQL, NoSQL, ETL processes, and big data tools to drive actionable insights. Seeking a challenging role that applies my expertise in data engineering and analytics to enhance decision-making and support business growth. Committed to ensuring data quality, security, and performance in dynamic environments.
PROFILE OVERVIEW
Over 5 years of experience in Data Engineering, with a strong focus on data integration, ETL pipelines, and cloud-based solutions.
Skilled in leveraging AWS services such as S3, Redshift, Glue, EMR, and Lambda to design and optimize data engineering solutions and ETL workflows. Proficient in building scalable, efficient pipelines.
Skilled in cloud-based data solutions, including AWS Lambda, Azure Synapse Analytics, and Google BigQuery, with expertise in designing, developing, and optimizing scalable data pipelines on Azure, AWS, and GCP.
Proficient in SQL and NoSQL databases, including Azure SQL, Amazon Redshift, Cosmos DB, and MongoDB.
Experienced in data modeling, schema design, and data warehousing for efficient storage and retrieval.
Strong understanding of data lake architecture, particularly with Azure Data Lake and Amazon S3 for large-scale data storage.
Hands-on experience with distributed processing frameworks such as Apache Spark, Hadoop, and Databricks.
Proficient in automating ETL processes and integrating data from various sources into centralized repositories.
Expertise in data transformation, cleansing, and validation techniques to ensure high-quality data for analysis.
Experienced in designing and maintaining data pipelines to support real-time and batch data processing.
Adept at working with business intelligence tools like Power BI, Tableau, and AWS QuickSight for data visualization and reporting.
Proficient in leveraging Google Cloud Platform (GCP) services such as BigQuery, Cloud Storage, and Dataflow to build and manage scalable and efficient cloud-based data processing solutions.
Proficient in working with big data technologies such as Hadoop, Hive, HDFS, Pig, and MapReduce to process and analyze large-scale datasets.
Extensive experience with Hadoop Distributed File System (HDFS) for efficient storage and retrieval of massive data sets. Familiarity with HTML for data integration or visualization tasks.
Experienced in troubleshooting pipeline issues, optimizing performance, and ensuring data quality in dynamic environments.
Skilled in MapReduce programming to process large datasets in a distributed computing environment for batch processing.
Expertise in utilizing Pig scripts to simplify data transformations and loading tasks in the Hadoop ecosystem.
Solid understanding and experience with Scala to build scalable and efficient data processing frameworks.
Solid understanding of DevOps practices and version control systems, including Git and Jenkins, for continuous integration and deployment.
In-depth experience with containerization & orchestration tools like Docker and Kubernetes for scalable data processing.
Experienced in setting up and managing data pipelines using tools such as Apache Kafka, Apache Airflow, and Azure Data Factory, as illustrated in the sketch below.
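A minimal, illustrative Airflow sketch of the kind of pipeline orchestration listed above; the DAG name, task callables, and data are hypothetical placeholders rather than any specific production pipeline.

```python
# Illustrative Airflow DAG: daily extract -> transform -> load.
# Task names, callables, and the sample record are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull raw records from a source system (stubbed here).
    return [{"id": 1, "amount": "42.50"}]


def transform(ti, **context):
    # Cleanse and validate the records produced by the extract task.
    rows = ti.xcom_pull(task_ids="extract")
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]


def load(ti, **context):
    # Write validated rows to the target store (stubbed here).
    print(ti.xcom_pull(task_ids="transform"))


with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)

    t1 >> t2 >> t3
```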
TECHNICAL SKILLS
Azure
Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Azure Blob Storage, Cosmos DB.
AWS
S3, Redshift, Glue, EMR, Lambda, DynamoDB, Kinesis, Athena, QuickSight.
GCP
BigQuery, Cloud Storage, Dataflow, Dataproc, Pub/Sub, Looker, Vertex AI.
Relational Databases
MySQL, PostgreSQL, Microsoft SQL Server, Oracle Database.
NoSQL Databases
MongoDB, Cassandra, DynamoDB, Azure Cosmos DB, Google Firestore.
Data Lakes
Azure Data Lake, AWS Lake Formation, Google Cloud Storage.
Programming Languages
C, Python, Java, Scala, SQL, R, Shell Scripting, JavaScript
Big Data Technologies
Apache Hadoop, Apache Spark, Hive, Pig, HBase, Kafka, Flink.
ETL Tools
Informatica, Talend, Apache NiFi, Azure Data Factory, AWS Glue, Google Data Fusion.
Data Warehousing
Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse Analytics, Teradata.
DevOps & CI/CD
Jenkins, GitHub Actions, Azure DevOps, AWS CodePipeline, Docker, Kubernetes, Terraform, Ansible, Airflow, dbt.
Business Intelligence (BI) Tools
Power BI, Tableau, Looker, QlikView, Microsoft SSRS.
Monitoring & Logging
Prometheus, Grafana, ELK Stack, AWS CloudWatch, Azure Monitor, GCP Stackdriver.
WORK EXPERIENCE
Blue Cross and Blue Shield of Kansas
Topeka, Kansas, USA
Role: Azure Data Engineer Jan 2024 – Present
Description: Blue Cross and Blue Shield of Kansas is a leading health insurance provider dedicated to delivering comprehensive healthcare solutions. As an Azure Data Engineer, I leverage Azure SQL, Data Lake, Synapse Analytics, and Cosmos DB to design and optimize data pipelines, ensure seamless data integration, and enable advanced analytics for informed decision-making. The focus is on enhancing healthcare data management, scalability, and operational efficiency using Azure technologies.
Responsibilities:
Design and maintain scalable end-to-end ETL/ELT pipelines using Azure Data Factory and tools like Hive, Pig, and Scala for efficient data transformation.
Integrate data from sources like SQL Server, Cosmos DB, and HDFS into unified formats for analytics and reporting.
Manage and optimize Azure Data Lake, Blob Storage, and HDFS for secure and efficient data storage.
Leverage SSRS and SSIS to build automated reporting systems and data transformation workflows.
Build and optimize high-performance algorithms and prototypes for data modeling, data mining, and production systems.
Leverage Apache Hadoop ecosystem tools, including Hive, HDFS, and MapReduce, for distributed data processing.
Utilize Azure Databricks, Spark, and Scala for big data processing and advanced data analytics (see the illustrative sketch after this role).
Apply strong problem-solving and communication skills to address data infrastructure challenges and translate business needs into technical solutions.
Implement data warehousing solutions using Azure Synapse Analytics to enable business intelligence and reporting.
Ensure optimal performance of relational and NoSQL databases, such as SQL Server, Cosmos DB, and HDFS.
Optimize data pipelines, queries, and distributed storage mechanisms to ensure low latency and high throughput.
Implement logging and monitoring solutions using Azure Monitor, Log Analytics, and Hadoop monitoring tools to troubleshoot pipeline issues and address anomalies in real time.
Perform integration testing for Azure Data Factory pipelines to ensure end-to-end data flow reliability.
Set up and manage CI/CD pipelines with Azure DevOps for seamless deployment of data solutions and Hadoop jobs.
Collaborate with BI teams to create dashboards and reports using Power BI for effective decision-making.
Design and implement efficient, scalable data architecture for end-to-end data pipelines, utilizing Azure Data Factory, Airbyte, and dbt to ensure seamless integration.
Apply Agile methodologies to optimize project delivery and enhance team collaboration.
Implement MapReduce jobs and optimize performance for distributed data processing tasks in the Hadoop ecosystem.
Design and manage data pipelines for both batch and real-time processing, utilizing tools like Azure Stream Analytics and Hadoop for seamless integration of large datasets.
Environment: Azure Data Factory, Hive, Pig, Scala, SQL Server, Cosmos DB, HDFS, Azure Data Lake, Blob Storage, Apache Hadoop, MapReduce, Azure Databricks, Spark, Power BI, Azure Monitor, Log Analytics, Azure DevOps.
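A minimal PySpark sketch of the kind of Databricks cleansing and transformation step described in this role; the storage paths, container names, and claim columns are hypothetical assumptions, not actual Blue Cross and Blue Shield of Kansas data structures.

```python
# Illustrative PySpark job: read raw claims data from Azure Data Lake,
# cleanse and validate it, and write a curated, partitioned copy.
# Paths, column names, and the "claims" layout are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_cleansing").getOrCreate()

raw_path = "abfss://raw@examplelake.dfs.core.windows.net/claims/"          # assumed
curated_path = "abfss://curated@examplelake.dfs.core.windows.net/claims/"  # assumed

raw = spark.read.parquet(raw_path)

curated = (
    raw.dropDuplicates(["claim_id"])                       # de-duplicate on key
       .filter(F.col("claim_amount").isNotNull())          # basic validation
       .withColumn("claim_date", F.to_date("claim_date"))  # normalize types
)

# Partition by date for efficient downstream queries (e.g. from Synapse).
curated.write.mode("overwrite").partitionBy("claim_date").parquet(curated_path)
```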
Garmin
Olathe, Kansas, USA
Role: AWS Data Engineer May 2023 – Dec 2023
Description: Garmin is a global leader in GPS technology and wearable devices, providing cutting-edge solutions in navigation, fitness, and health. As an AWS Data Engineer, I managed data workflows using AWS services such as S3, Redshift, and Lambda, ensuring scalable, secure, and efficient data pipelines. Expertise in ETL processes, NoSQL databases such as DynamoDB, and data lake architecture was key to enabling advanced analytics and driving business insights.
Responsibilities:
Designed and optimized data pipelines using AWS Glue, Lambda, and S3 for efficient data ingestion, transformation, and loading into Amazon Redshift and DynamoDB.
Managed large-scale data storage solutions on AWS S3 and enhanced Redshift performance by fine-tuning cluster configurations for faster query execution.
Developed and optimized data structures and high-performance algorithms to support efficient data processing, modeling, and production pipelines.
Utilized Amazon EMR and Apache Spark to process and analyze massive datasets, supporting both batch and real-time data processing workflows.
Optimized and maintained the performance, availability, and fault tolerance of DynamoDB, RDS, Aurora, and Redshift databases.
Developed efficient data models in Redshift and DynamoDB, implementing optimized schema designs to support scalable analytics and reporting.
Automated complex data workflows with AWS Lambda and improved query performance and cost efficiency by optimizing SQL and Redshift configurations (see the illustrative sketch after this role).
Implemented Amazon Kinesis and AWS Data Pipeline for real-time data streaming and processing, enabling quick insights for actionable decision-making.
Built and maintained high-performance ETL pipelines, ensuring adherence to change management processes and minimizing business disruption during deployments.
Set up and managed CI/CD pipelines with AWS CodePipeline and AWS CodeBuild to automate the deployment and updating of data solutions.
Monitored pipeline performance and resolved issues using Amazon CloudWatch and AWS X-Ray, ensuring uninterrupted data flows and system optimization.
Conducted rigorous quality assurance on ETL processes, data pipelines, and business-critical workflows to ensure data integrity, accuracy, and compliance.
Enforced stringent data security protocols by configuring IAM roles, KMS encryption, and VPC to maintain compliance and safeguard sensitive data.
Utilized data science principles to develop predictive models, enabling actionable insights from large datasets.
Environment: AWS Glue, Lambda, S3, Amazon Redshift, DynamoDB, Amazon EMR, Apache Spark, RDS, Aurora, IAM, KMS encryption, VPC, Amazon Kinesis, AWS Data Pipeline, AWS CodePipeline, AWS CodeBuild, Amazon CloudWatch, AWS X-Ray.
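A minimal sketch of the kind of Lambda-driven load into Redshift described in this role, using the Redshift Data API; the cluster, database, table, and IAM role names are hypothetical placeholders.

```python
# Illustrative AWS Lambda handler: when a new file lands in S3, issue a
# Redshift COPY so the data is loaded into a staging table.
# Cluster, database, table, and IAM role names are hypothetical placeholders.
import boto3

redshift_data = boto3.client("redshift-data")


def handler(event, context):
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    copy_sql = (
        f"COPY staging.device_metrics "
        f"FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy' "
        f"FORMAT AS PARQUET;"
    )

    # Asynchronous submit; the returned statement id can be polled later
    # with describe_statement if confirmation of completion is needed.
    resp = redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="analytics",
        DbUser="etl_user",
        Sql=copy_sql,
    )
    return {"statement_id": resp["Id"]}
```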
PepsiCo (TCS)
Hyderabad, India
Role: GCP Data Engineer July 2021 – Oct 2022
Description: PepsiCo is a global leader in the food and beverage industry, known for its portfolio of iconic brands. As a GCP Data Engineer, I built and optimized data pipelines using Google Cloud Platform services like BigQuery, Cloud Storage, and Dataflow to support large-scale data processing and analytics. Expertise in ETL workflows, NoSQL databases, and real-time data processing was essential to driving business intelligence and data-driven decision-making.
Responsibilities:
Developed and optimized ETL pipelines using Google Cloud Dataflow, automating data processing workflows for large-scale energy consumption data.
Designed and managed data lakes on Google Cloud Storage (GCS), enabling scalable and cost-efficient storage for structured and unstructured data.
Built and maintained advanced data warehouses using BigQuery, providing insights for performance monitoring, grid optimization, and customer behaviour analysis.
Engineered both batch and real-time data processing pipelines with Apache Beam on Google Cloud Dataflow, supporting data-driven decision-making in energy services (see the illustrative sketch after this role).
Implemented high-performance big data solutions utilizing Apache Spark, Hadoop, and Presto, enhancing analytics for fault detection and energy distribution.
Worked closely with data scientists and analysts to optimize SQL queries and analytical models in BigQuery, utilizing Python and Java to extract actionable insights.
Managed and optimized relational and NoSQL databases such as Cloud SQL, Firestore, and Bigtable, ensuring high availability and scalability of distributed data systems.
Automated infrastructure provisioning using Terraform, ensuring consistent and efficient cloud deployments for data solutions.
Streamlined CI/CD pipelines with Jenkins and DevOps practices to accelerate the deployment of data systems and analytical applications.
Implemented robust security measures, including IAM roles, encryption, and audit logging, to comply with regulatory standards and ensure data privacy.
Integrated real-time data ingestion solutions with Google Cloud Pub/Sub and Apache Kafka, providing up-to-the-minute tracking of energy consumption and grid performance.
Environment: Google Cloud Dataflow, ETL, Google Cloud Storage (GCS), BigQuery, Apache Beam, Apache Spark, Hadoop, Presto, Google Cloud Pub/Sub, Apache Kafka, Cloud SQL, Firestore, Bigtable, Terraform, Jenkins, IAM roles, encryption, audit logs.
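A minimal Apache Beam sketch of the kind of batch pipeline described in this role; the bucket, BigQuery table, and field names are hypothetical, and runner/project options would be supplied at launch time (e.g. --runner=DataflowRunner).

```python
# Illustrative Apache Beam pipeline: parse newline-delimited JSON events from
# Cloud Storage and load them into a BigQuery table.
# Bucket, table, and field names are hypothetical placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(line):
    rec = json.loads(line)
    return {
        "device_id": rec.get("device_id"),
        "reading": float(rec.get("reading", 0)),
        "event_ts": rec.get("event_ts"),
    }


def run():
    options = PipelineOptions()  # picks up --runner, --project, etc.
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://example-bucket/events/*.json")
            | "Parse" >> beam.Map(parse_event)
            | "Write" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                schema="device_id:STRING,reading:FLOAT,event_ts:TIMESTAMP",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )


if __name__ == "__main__":
    run()
```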
Citibank
Mumbai, India
Role: Data Engineer Sep 2019 – June 2021
Description: Citibank is a leading financial institution offering a wide range of banking services, including personal, business, and wealth management solutions. As a Data Engineer, I designed and optimized data pipelines using Apache Kafka, ETL processes, and AWS services like Redshift and S3 to manage large volumes of financial data. Expertise in SQL, NoSQL databases, and data warehousing was key to supporting analytics and business intelligence initiatives.
Responsibilities:
Designed and implemented data pipelines to collect, process, and store financial data from diverse sources, including transaction systems and banking applications, using Apache Kafka, Apache Spark, and other ETL tools.
Ensured secure data storage and compliance with regulatory requirements by incorporating encryption and access control measures, leveraging AWS S3.
Integrated and managed financial data sources, such as transaction records and customer profiles, leveraging SQL and NoSQL databases for handling structured and unstructured data.
Developed and optimized data warehouses for analytics, using Amazon Redshift to enable efficient querying and reporting on large-scale banking datasets.
Designed and implemented real-time data streaming solutions, leveraging Apache Kafka and AWS Kinesis to process transaction data and account activity with minimal latency (see the illustrative sketch after this role).
Automated ETL workflows for large-scale financial data processing, using tools like Apache Airflow and AWS Glue to streamline and manage complex data pipelines.
Implemented and maintained cloud-based infrastructure, employing Terraform and AWS CloudFormation to automate infrastructure provisioning and ensure consistent deployments across cloud environments.
Collaborated with business and IT teams to derive data-driven insights for improving banking services and customer satisfaction, using business intelligence (BI) tools such as Tableau, Power BI, and Looker.
Collaborated on the implementation of machine learning models, ensuring clean and reliable datasets using tools like Apache Spark and TensorFlow to support advanced analytics and decision-making in banking operations.
Environment: Apache Kafka, Apache Spark, ETL, AWS S3, SQL, NoSQL databases, Amazon Redshift, AWS Kinesis, Apache Airflow, AWS Glue, Terraform, AWS CloudFormation, Tableau, Power BI, Looker, TensorFlow.
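A minimal sketch of the kind of Kafka-to-S3 landing flow described in this role, using the kafka-python client and boto3; the topic, broker addresses, and bucket are hypothetical placeholders.

```python
# Illustrative streaming consumer: read transaction events from a Kafka topic
# and land micro-batches in S3 for downstream processing.
# Topic, broker addresses, and bucket names are hypothetical placeholders.
import json

import boto3
from kafka import KafkaConsumer  # kafka-python client

BATCH_SIZE = 500

consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers=["broker1:9092"],
    group_id="txn-landing",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
s3 = boto3.client("s3")

batch, batch_num = [], 0
for message in consumer:
    batch.append(message.value)
    if len(batch) >= BATCH_SIZE:
        # Write one JSON object per micro-batch, then start a new batch.
        s3.put_object(
            Bucket="example-landing-bucket",
            Key=f"transactions/batch-{batch_num:08d}.json",
            Body=json.dumps(batch).encode("utf-8"),
        )
        batch, batch_num = [], batch_num + 1
```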
EDUCATION: Master’s in Computer Science, University of Central Missouri, USA.