RAGHAVENDAR
Mobile: +1-313-***-**** E-Mail: ***********.********@*****.***
Seasoned Data Engineering Professional | AWS and Big Data Specialist
Data Engineering Specialist with over 6 years of experience, adept at orchestrating end-to-end data engineering solutions on AWS, GCP, and Snowflake platforms. Proven track record of driving innovation, optimizing data pipelines, and delivering scalable solutions that empower organizations with actionable business insights.
EDUCATION
Master of Science (Computer Science) from Central Michigan University, Mount Pleasant
TECHNICAL SKILLS
Programming Languages: Python, Java, C
Databases: MySQL, SQL Server, PostgreSQL, DynamoDB, MongoDB
Data Warehouses: Snowflake
Platforms: Linux, Windows
Cloud Technologies: AWS, GCP, Azure
Big Data Ecosystem: Spark, Airflow, Informatica, Kafka, Kinesis, Hadoop, Hive
Scripting Languages: JavaScript
Web Technologies: React.js, HTML5, CSS
Query Languages: SQL, Spark SQL, NoSQL
Data Visualization: Power BI, Tableau, AWS QuickSight
CI/CD & Containerization: Docker, Kubernetes, GitHub Actions
Vector Databases: TileDB, Pinecone
PROFESSIONAL EXPERIENCE
Python Developer – Cellarity, Inc. Jan ’24 – Present
Boston MA
Partnered with architects to enhance data literacy by profiling, analyzing, and resolving knowledge gaps within the organization’s data landscape.
Designed and implemented highly scalable and fault-tolerant ETL/ELT pipelines using AWS Glue, Python, and PySpark, ensuring seamless data processing across large datasets.
Optimized Snowflake and AWS Redshift queries, reducing execution times by 40% and improving data warehouse performance for large-scale financial datasets.
Conducted in-depth data profiling and source system analysis, identifying Critical Data Elements (CDEs), business keys, and other key attributes.
Reduced AWS Redshift query execution time by 40% through advanced optimization techniques, including materialized views, partitioning, and indexing strategies.
Profiled data lakes and identified ingestion gaps to ensure comprehensive coverage of source system data.
Orchestrated serverless ETL workflows using AWS Step Functions and Lambda, achieving a 50% reduction in operational overhead and improving pipeline reliability.
Developed and documented data flows and business process mappings to enhance transparency and governance.
Led master data management (MDM) initiatives, collaborating with cross-functional teams to provide actionable insights and drive data-driven decision-making.
Implemented real-time vector search with Pinecone, enabling high-performance similarity search across genomic datasets, improving research efficiency in personalized medicine and clinical analytics.
Built real-time dashboards in Power BI, delivering actionable insights and streamlining decision-making for key stakeholders.
Developed real-time data streaming pipelines using Apache Kafka and Spark Streaming, enabling low-latency processing and supporting event-driven architectures.
Integrated AWS Bedrock (GenAI) into data pipelines, enhancing data quality checks and enabling automated insights generation.
Streamlined infrastructure deployment using Terraform, Ansible, and AWS CloudFormation, ensuring scalability, consistency, and reduced provisioning time for cloud resources.
Enforced data governance policies with AWS Lake Formation, implementing column-level security and data lineage tracking.
Optimized data warehouse performance and ETL processes using Informatica, improving data integration efficiency and reducing processing times across enterprise systems.
Tools and Technologies: Python (pandas, PySpark, NumPy, Pydantic), Snowflake, TileDB, Quilt, Docker, Hadoop, Kubernetes, AWS (Glue, Redshift, Lambda, Step Functions, QuickSight, Omics), Jenkins, GitHub Actions, Terraform, and SQL databases (PostgreSQL, MySQL).
Graduate Research Assistant, Big Data – Central Michigan University, USA Aug ’23 – Dec ’23
Assisted students in deploying, customizing, and managing Hadoop clusters using the Cloudera CDH distribution. Served as a reliable resource for resolving Hadoop configuration issues, ensuring smooth operations for their data processing needs.
Demonstrated strong debugging proficiency while acquiring and configuring Kafka and ZooKeeper components. Introduced a single-node, multi-broker Kafka cluster, enhancing the efficiency of record stream transfer to HDFS.
Assisted students by curating informative presentations on essential Big Data tools such as Spark, ensuring they were well equipped to explore and utilize these tools for their data processing and analytics tasks.
Developed and optimized Hive queries for processing large datasets stored in HDFS, improving data retrieval efficiency and enabling faster analytics.
Built and deployed Apache Spark applications using Scala, leveraging distributed computing for large-scale data transformations and machine learning workloads.
Python Developer – Quest IT Solutions Jun ’19 – Dec ’22
Hyderabad, India
Built and optimized a data warehouse on BigQuery, implementing partitioning, clustering, and materialized views to enhance query performance and support large-scale analytics on patient data.
Designed and implemented ETL pipelines using Google Cloud Dataflow, PySpark, and Python, integrating data from EHR/EMR systems, billing platforms, and lab reports, ensuring accurate and consistent patient records.
Developed a real-time patient monitoring system using Google Cloud Pub/Sub and Cloud Functions, enabling early detection of critical conditions by streaming data from IoT medical devices.
Automated ETL workflows using Google Cloud Composer (Apache Airflow), reducing manual intervention and improving data pipeline reliability by 30%.
Optimized data warehouse performance with BigQuery and Informatica, ensuring efficient data integration, transformation, and storage for large healthcare datasets.
Utilized Google Cloud Data Fusion for data integration and transformation, ensuring seamless interoperability between on-premise healthcare systems and cloud-based analytics platforms.
Created interactive dashboards in Google Data Studio, providing healthcare professionals with actionable insights into patient admission rates, treatment outcomes, and resource utilization.
Ensured compliance with HIPAA and data governance standards by implementing Google Cloud Dataproc and Cloud Identity, enforcing fine-grained access control and data lineage tracking.
Validated and transformed raw patient data using pandas, NumPy, and Pydantic, identifying and rectifying data inconsistencies to enhance decision-making.
Enhanced data warehouse capabilities by implementing Informatica for ETL automation, improving data consistency, quality, and processing efficiency for healthcare analytics.
Tools and Technologies: Python (pandas, PySpark, NumPy, Pydantic), GCP (Dataflow, BigQuery, Pub/Sub, Cloud Functions, Cloud Composer, Data Fusion, Dataproc, Cloud Run), Azure, Docker, Kubernetes, Tableau, Power BI, Informatica, SQL databases (PostgreSQL, MySQL).
CERTIFICATIONS
1. AWS Solutions Architect Associate
2. AWS Data Engineer Associate