Sr. Data Engineer
Name: Hruthik Raj A
Email: **************@*****.***
Phone: 210-***-****
LinkedIn: https://www.linkedin.com/in/hruthik-raj/
Professional Summary:
Over 10 years of experience as a Data Engineer, specializing in the design and deployment of scalable data infrastructure on platforms such as AWS, Azure, and GCP, enhancing data-driven decision-making processes.
Proficient in leveraging AWS services including EC2, S3, Redshift, DynamoDB, Kinesis, and AWS Glue to build robust data storage solutions and streamline data workflows.
Expert in utilizing Azure technologies like Azure Databricks, Azure Synapse, Azure Event Hubs, and Azure Data Factory to develop and optimize data pipelines and big data solutions.
Advanced skills in Google Cloud Platform, including BigQuery, Cloud Pub/Sub, and Dataflow, to implement effective data processing and analytical frameworks.
Demonstrated ability to design and manage highly efficient ETL processes using tools such as SSIS, Apache Kafka, and Snowflake, coupled with extensive knowledge of SQL and T-SQL for complex data querying.
Strong background in setting up and maintaining cloud-based environments using infrastructure as code tools like Terraform and CloudFormation for reproducible and scalable cloud configurations.
Proficient in automating and orchestrating workflows using Cloud Composer and Ansible, ensuring seamless data operations across multiple environments.
Skilled in programming and scripting with Python, Shell Scripting, and PowerShell to automate tasks, manipulate data, and enhance data pipeline performance.
Extensive experience with real-time data processing and analytics using Spark, Spark SQL, Spark Streaming, and Kafka for timely insights and decision support.
Advanced user of monitoring and logging tools such as CloudWatch, ELK Stack, and Splunk to ensure high availability and reliability of data services.
Expertise in deploying and managing data-serving platforms on JBoss, WebSphere, and Unix/Linux systems, ensuring robust data governance and security.
Experienced in utilizing project management and version control tools such as JIRA, Git, and Bitbucket to maintain high standards of code quality and collaboration within development teams.
Technical Skills:
Cloud Platforms: AWS (EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, CloudFormation, CloudWatch, ELK Stack, DynamoDB, Kinesis, Redshift, AWS Data Pipeline, AWS Glue), GCP (Dataflow, GCS, BigQuery, Cloud Dataprep, Dataproc, Cloud Composer, Cloud Pub/Sub, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Data Catalog, Databricks on GCP), Azure (Azure Event Hubs, Azure Synapse, Azure Data Factory, Azure Databricks, Azure Service Bus, Azure SQL)
Programming/Scripting: Python, Java, Shell Scripting, PowerShell, SQL, PL/SQL
Data Management: Snowflake, Hadoop, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, Cloudera, Teradata, Federated Queries, SSIS, SSAS, SSRS, MapReduce, Erwin, OLTP, OLAP, ETL, SAS
DevOps and Automation: Terraform, Ansible, Bitbucket, Git, JIRA, Maven, WebSphere, Unix/Linux, CodeDeploy, CodePipeline, CodeBuild, CodeCommit, SonarQube
Additional Tools: Splunk, JBoss, SFDC, SQL Server (2017, 2022), Power BI, Tableau, Google Cloud VPN client
Professional Experience:
Merck Pharma, Branchburg, NJ
Senior Data Engineer | November 2021 to Present
Responsibilities:
Designed and implemented data workflows using AWS Data Pipeline and AWS Glue to ensure seamless data integration and processing across multiple platforms.
Developed and optimized large-scale Spark-based ETL pipelines for processing pharmaceutical data efficiently.
Utilized Spark SQL and Spark Streaming to support real-time analytics and batch processing.
Integrated Spark with AWS Glue and Redshift to enhance performance in large-scale data transformations.
Built real-time data pipelines using Apache Kafka and Spark Streaming, ensuring timely and reliable data ingestion (an illustrative sketch of this pattern follows the environment list below).
Configured and maintained AWS EC2 instances, S3 buckets, and EBS volumes to support scalable data storage and computation needs in a cloud environment.
Utilized Talend to develop ETL workflows, ensuring seamless data extraction, transformation, and loading for business intelligence and analytics purposes.
Utilized AWS CloudFormation and Terraform for infrastructure as code, automating the provisioning of AWS resources and ensuring consistency across development, testing, and production environments.
Developed and optimized Redshift data warehouses and DynamoDB tables to support high-performance querying and data analysis needs specific to pharmaceutical research.
Scripted automated processes using Python and Java, enhancing data transformation and integration tasks to support analytics.
Managed ELB configurations to distribute data processing loads evenly across EC2 instances, improving system responsiveness and reliability.
Implemented network isolation using VPC and leveraged AWS SNS and SQS for secure, reliable data messaging and notifications within cloud environments.
Monitored system performance using AWS CloudWatch and ELK Stack, identifying and resolving issues to maintain high availability and performance.
Collaborated with R&D teams using Jira and Bitbucket to integrate software development and data engineering practices seamlessly.
Deployed and managed JBoss and WebSphere servers in AWS environments, ensuring robust application hosting for data-intensive applications.
Utilized Kafka, Sqoop, and Kinesis for real-time data streaming and ingestion, supporting timely analytics on streaming pharmaceutical data.
Engineered and executed CodeBuild, CodeDeploy, and CodePipeline for continuous integration and delivery pipelines, reducing downtime and speeding up feature development.
Developed Spark SQL scripts and used Spark Streaming for processing large datasets, enabling real-time data analytics to support business decisions.
Configured and administered AWS RDS and Snowflake instances for structured data storage and complex query execution, enhancing data-driven strategies.
Utilized Git and Ansible for version control and configuration management, streamlining collaboration across data teams and maintaining code integrity.
Employed AWS S3 and AWS Redshift for efficient data storage and warehousing solutions, optimizing data retrieval and analysis processes.
Conducted log analysis using Splunk, facilitating effective debugging and system performance monitoring.
Implemented HBase and Hive for managing large datasets, ensuring scalability and performance in data processing and analysis.
Applied SonarQube for continuous inspection of code quality, integrating security and compliance checks into the development lifecycle.
Operated within Unix/Linux environments to manage system operations and ensure compatibility across different technology stacks.
Environment: AWS (EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, CloudFormation, CloudWatch, ELK Stack), Bitbucket, Ansible, Python, Shell Scripting, PowerShell, Git, Jira, Snowflake, JBoss, Terraform, Redshift, Maven, WebSphere, Unix/Linux, DynamoDB, Kinesis, AWS Data Pipeline, AWS Glue, CodeDeploy, CodePipeline, CodeBuild, CodeCommit, Splunk, SonarQube, Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, Cloudera, Power BI.
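Illustrative example (a minimal sketch only: the broker address, topic, bucket paths, and event schema are hypothetical, not taken from any actual Merck pipeline, and it assumes the Spark Kafka connector package is available): a PySpark Structured Streaming job that reads JSON events from Kafka and lands them as Parquet in S3 with checkpointing, the general Kafka/Spark Streaming ingestion pattern referenced above.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

# Hypothetical payload schema for an incoming JSON event.
event_schema = StructType([
    StructField("batch_id", StringType()),
    StructField("measurement", DoubleType()),
    StructField("recorded_at", TimestampType()),
])

# Read the raw Kafka stream (broker and topic names are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "lab-events")
       .load())

# Parse the JSON value column into typed fields.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", event_schema).alias("e"))
          .select("e.*"))

# Land micro-batches as Parquet in S3, with checkpointing for reliable restarts.
query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/lab-events/")
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/lab-events/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()

In practice the sink could equally be a Glue-catalogued table or a Redshift load stage; Parquet on S3 is shown only to keep the sketch self-contained.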
Chevron Corporation, Santa Rosa, NM
Senior Data Engineer | July 2019 to October 2021
Responsibilities:
Configured and managed data processing workflows using Google Dataflow to ensure efficient data transformation and pipeline execution.
Designed and implemented Spark-based data workflows within GCP Dataproc for large-scale data transformations.
Leveraged Spark SQL and Cloud Pub/Sub for real-time data processing and analytics.
Optimized federated queries in BigQuery using Spark for improved performance across distributed data sources.
Utilized BigQuery for executing complex SQL queries and analyses, enabling fast access to insights from large datasets across the energy sector.
Operated Cloud Dataprep and Dataflow to cleanse and prepare datasets, ensuring data quality and readiness for analysis.
Implemented Dataproc workloads to manage and process big data jobs efficiently, leveraging managed cluster provisioning for optimized resource utilization.
Managed Cloud Composer to orchestrate workflow automation, ensuring seamless integration of various data sources and applications (a representative DAG sketch follows the environment list below).
Developed and maintained pipelines using Cloud Pub/Sub, allowing for scalable and real-time data messaging services.
Scripted automation tasks and data manipulation processes using Java and shell scripts, enhancing operational efficiencies and data handling capabilities.
Utilized federated queries in BigQuery to query data across different databases, optimizing data analysis and integration processes.
Implemented Snowflake solutions on GCP to manage and analyze multi-cloud data securely and efficiently.
Configured VPCs and VPNs to secure network architectures, ensuring safe data transmission and access controls within the cloud environment.
Maintained data governance and cataloging with GCP Data Catalog, providing a robust framework for data asset management.
Optimized data transfers with Cloud Storage Transfer Service, streamlining data movement and integration across cloud and on-premises systems.
Administered GCP Cloud Spanner databases for critical transaction management, supporting high availability and global consistency.
Managed GCP Cloud SQL instances, ensuring robust, scalable, and manageable relational database services.
Developed and deployed SSIS packages, enhancing ETL processes and data integration strategies.
Designed and implemented SSAS models to support advanced analytics and data mining operations.
Created reports using SSRS, providing actionable insights and detailed data visualizations to stakeholders.
Engaged in data architecture planning and development using Databricks on GCP, facilitating collaborative data science and engineering projects.
Environment: Google Cloud Dataflow, GCS, BigQuery, Cloud Dataprep, Dataproc, Cloud Composer, Cloud Pub/Sub, Python, Shell Scripting, Federated Queries, Snowflake, VPC configuration, Data Catalog, Google Cloud VPN client, SSIS, SSAS, SSRS, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, Databricks on GCP, Power BI.
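Illustrative example (a minimal sketch under assumptions: the project, dataset, table, and bucket names are hypothetical, and it assumes the Airflow Google provider package is installed in the Composer environment): a Cloud Composer (Airflow) DAG that loads raw GCS extracts into a BigQuery staging table and then builds a daily aggregate, the orchestration pattern referenced above.

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

with DAG(
    dag_id="daily_sensor_load_sketch",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    # Load the day's raw CSV extracts from GCS into a staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_to_staging",
        bucket="example-raw-bucket",
        source_objects=["sensors/{{ ds }}/*.csv"],
        destination_project_dataset_table="example_project.staging.sensor_readings",
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",
    )

    # Rebuild the reporting aggregate from staging with a BigQuery SQL job.
    transform = BigQueryInsertJobOperator(
        task_id="build_daily_aggregate",
        configuration={
            "query": {
                "query": """
                    SELECT site_id, DATE(reading_ts) AS reading_date, AVG(value) AS avg_value
                    FROM `example_project.staging.sensor_readings`
                    GROUP BY site_id, reading_date
                """,
                "destinationTable": {
                    "projectId": "example_project",
                    "datasetId": "reporting",
                    "tableId": "daily_sensor_avg",
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )

    load_raw >> transform

Because both steps use WRITE_TRUNCATE, rerunning a day simply rebuilds that day's staging and aggregate data, which keeps the DAG idempotent.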
Experian, Costa Mesa, CA
Data Engineer | September 2017 to June 2019
Responsibilities:
Managed large-scale data processing using Azure Data Factory, orchestrating data movement and transformation across various data stores.
Developed real-time data processing solutions using Azure Stream Analytics and Azure Event Hubs, enabling immediate data ingestion and analysis (an illustrative Event Hubs sketch follows the environment list below).
Utilized Apache Kafka for building scalable high-throughput data pipelines, ensuring efficient data transfer between systems.
Configured and maintained Azure SQL databases, optimizing performance and scalability to support high-volume data applications.
Implemented data warehousing solutions using Azure Synapse Analytics, providing powerful query capabilities and integrative data management.
Developed and maintained ETL processes using T-SQL and SQL Server Integration Services (SSIS), ensuring data accuracy and availability.
Designed and executed batch processing jobs using Hadoop MapReduce, processing large datasets efficiently across distributed systems.
Created interactive dashboards and reports using Power BI and Tableau, delivering actionable insights to business users and stakeholders.
Administered Azure Databricks environments, facilitating collaborative data exploration, experimentation, and analytics.
Managed code version control and deployment using Azure Databricks and GitHub, ensuring consistency and traceability across development stages.
Integrated multiple data sources into a cohesive data platform using SQL and Hive, enhancing data accessibility and analytics.
Orchestrated data workflows and pipelines using Azure Service Bus, ensuring reliable messaging and data integration services.
Performed data extraction and transformation from various sources including Teradata, SQL Server, and SFDC, enriching the data ecosystem.
Utilized Unix shell scripting to automate data processing tasks, improving efficiency and reliability of data operations.
Implemented security measures and compliance protocols in Azure environments, safeguarding sensitive data and adhering to regulatory standards.
Developed predictive models and analytics solutions using Python, leveraging machine learning to enhance data-driven decision-making.
Conducted performance tuning and optimization on SQL Server 2017 instances, ensuring high performance and availability.
Environment: Apache Kafka, Azure, Python, Power BI, Unix, SQL Server 2017, Hadoop, Hive, MapReduce, Teradata, SQL, T-SQL, Azure Event Hubs, Azure Synapse, Azure Data Factory, Azure Databricks, GitHub, Azure Service Bus, Azure SQL, SFDC.
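Illustrative example (a minimal sketch only: the namespace, hub name, key, and payload fields are placeholders, not Experian systems, and it assumes the azure-eventhub v5 Python SDK): publishing a small batch of JSON events to Azure Event Hubs for downstream Stream Analytics processing, the ingestion pattern referenced above.

import json
from azure.eventhub import EventHubProducerClient, EventData

# Connection string and hub name are placeholders.
CONNECTION_STR = "Endpoint=sb://example-namespace.servicebus.windows.net/;SharedAccessKeyName=send;SharedAccessKey=<key>"
EVENTHUB_NAME = "credit-events"

producer = EventHubProducerClient.from_connection_string(
    conn_str=CONNECTION_STR,
    eventhub_name=EVENTHUB_NAME,
)

# Send a small batch of JSON events to the hub for downstream consumers.
with producer:
    batch = producer.create_batch()
    for record in [{"account_id": "A123", "score_delta": 4}, {"account_id": "B456", "score_delta": -2}]:
        batch.add(EventData(json.dumps(record)))
    producer.send_batch(batch)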
PalTech, Hyderabad, India
Data Analyst | July 2014 to June 2017
Responsibilities:
Utilized Erwin data modeling tools to design and enhance both OLTP and OLAP databases, ensuring optimal performance and scalability for a software development client's data management system.
Developed and executed complex T-SQL and PL/SQL scripts for data manipulation and querying, which supported critical analysis and decision-making processes within the company (an illustrative query sketch follows the environment list below).
Leveraged AWS cloud services to set up, manage, and maintain scalable and secure data storage solutions, enhancing data retrieval and storage efficiency.
Designed and implemented ETL processes using SAS software, facilitating the effective integration, cleansing, and consolidation of large datasets from diverse sources.
Managed Teradata databases to handle large-scale data warehousing applications, optimizing data storage and retrieval operations to support extensive analytical functions.
Created comprehensive reports and dashboards using SSRS, providing actionable insights and data-driven recommendations to improve business operations and strategies.
Conducted rigorous data analysis and visualization tasks, employing advanced SQL techniques to extract insights from large datasets, which directly influenced strategic planning and operational improvements.
Continuously monitored and optimized data systems and processes to ensure security and compliance with industry standards, using best practices in AWS and data governance methodologies.
Environment: Erwin, T-SQL, OLTP, AWS, PL/SQL, OLAP, Teradata, SQL, ETL, SAS, SSRS, Java, Power BI.
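Illustrative example (the role used T-SQL and SSRS directly; the Python/pyodbc harness and the driver, server, database, and table names here are hypothetical and shown only to frame the kind of aggregation query described above):

import pyodbc

# Connection details are placeholders for a SQL Server data warehouse.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-sql-host;DATABASE=example_dw;Trusted_Connection=yes;"
)

# A typical T-SQL aggregation over hypothetical orders and customers tables.
query = """
    SELECT c.region, YEAR(o.order_date) AS order_year, SUM(o.amount) AS total_amount
    FROM dbo.orders AS o
    JOIN dbo.customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region, YEAR(o.order_date)
    ORDER BY c.region, order_year;
"""

cursor = conn.cursor()
for region, order_year, total_amount in cursor.execute(query):
    print(region, order_year, total_amount)

conn.close()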