PRANAY RAHUL GANJI
+1-508-***-**** | ************@*****.*** | Open to Relocation | linkedin.com/in/pranayrahulg
PROFESSIONAL EXPERIENCE
Bose Corporation Framingham, MA, USA
Senior Data Engineer February 2025 - Present
• Designed ETL pipelines using SQL Server and SSIS to centralize transactional data from legacy systems. Streamlined daily data ingestion, improving accuracy for month-end financial reporting cycles.
• Collaborated with compliance teams to map regulatory data requirements into structured SQL workflows. Delivered audit-ready datasets, reducing discrepancies in SEC and SOX reporting.
• Automated recurring financial reports using Python scripts to extract and transform NAV data. Cut manual processing time by enabling straight-through processing for fund accounting teams.
• Optimized slow-running SQL queries by indexing high-volume trading tables. Enhanced query performance for client portfolio dashboards during peak market hours.
• Built data validation checks using T-SQL to identify mismatches in custodian bank feeds. Minimized reconciliation delays by flagging outliers for upstream correction.
• Standardized deployments and improved version control by migrating SSIS packages to a shared Azure DevOps repository and managing infrastructure as code with Terraform across global development teams.
• Resolved data latency issues in trade settlement pipelines by tuning SSIS buffer configurations. Achieved SLA compliance for intraday trade matching workflows.
• Partnered with analysts to document data lineage for critical asset management tables. Improved traceability for cross-functional stakeholders during audits.
• Supported database upgrades by testing SQL Server 2016 compatibility for legacy stored procedures. Ensured zero downtime during migration to updated infrastructure.
Fresenius Medical Care Waltham, MA, USA
Senior Data Engineer January 2024 - July 2024
• Developed HL7-compliant data pipelines to integrate EHR systems like Epic and Cerner, ensuring seamless patient data interoperability. Migrated raw clinical records to Amazon Redshift using AWS Glue for centralized analytics.
• Engineered Python scripts to automate cleansing of unstructured medical data, resolving inconsistencies in diagnostic codes and lab reports. Enabled real-time access to standardized datasets for care teams.
• Designed FHIR-based APIs to securely exchange patient data with external clinics, adhering to HIPAA compliance guidelines. Reduced manual data-sharing workflows by 80% for cross-institutional referrals.
• Deployed PySpark jobs on AWS EMR to process high-volume imaging metadata from radiology and pathology departments. Accelerated report generation for critical patient cases.
• Built a data lakehouse architecture with Delta Lake to unify structured EHR data and semi-structured IoT streams from medical devices. Improved predictive analytics for ICU patient monitoring.
• Collaborated with clinicians to create Tableau dashboards tracking hospital-acquired infection rates and bed occupancy metrics. Supported data-driven decisions during COVID-19 surges.
• Reduced data retrieval time by 50% by partitioning Amazon Redshift tables on diagnosis timelines and patient demographics, using Terraform to manage the supporting infrastructure.
• Implemented data masking rules for PHI fields in Redshift using dynamic SQL policies, ensuring compliance with GDPR and India’s Digital Personal Data Protection Act.
• Automated nightly backups of critical oncology datasets to AWS S3 Glacier via Lambda, minimizing risk of data loss during system upgrades.
Amazon Hyderabad, TG, India
AWS Data Engineer October 2021 - August 2022
• Designed scalable data pipelines using AWS Glue and PySpark to process daily e-commerce transactions across global marketplaces. Implemented dynamic partitioning in S3 to optimize storage costs and query performance for analytics teams.
• Integrated Kafka streams and Kinesis with AWS Lambda and DynamoDB to enable near-real-time inventory tracking, reducing fulfillment delays by 20% and giving supply chain teams proactive stock monitoring during peak sales events.
• Migrated legacy on-premises sales datasets to AWS Redshift using AWS DMS and Step Functions. Improved accessibility of historical order data for cross-regional business intelligence reporting.
• Automated data quality checks for product catalog feeds using AWS CloudWatch and custom Python scripts. Reduced manual validation efforts and ensured consistency in product attribute mappings.
• Optimized Athena query performance by converting raw JSON logs into partitioned Parquet format with AWS Glue DataBrew. Accelerated ad-hoc analysis for logistics and delivery time optimization.
• Built serverless data lakes using S3 lifecycle policies and AWS Lake Formation to centralize clickstream data from Amazon retail platforms. Enhanced behavioral analytics for customer journey mapping.
• Collaborated with DevOps to containerize Spark jobs using AWS ECR and EKS for resource-efficient batch processing of terabyte-scale pricing datasets. Streamlined deployment workflows for ML model retraining.
• Secured sensitive customer data in transit and at rest using AWS KMS and IAM role-based access controls. Maintained compliance with GDPR and internal data governance policies.
• Developed disaster recovery strategies for critical financial datasets by replicating Redshift clusters across AWS regions. Ensured business continuity during regional service disruptions.
KIMS Hospitals Hyderabad, TG, India
Data Engineer April 2020 - September 2021
• Designed Azure Data Factory pipelines to ingest and process environmental compliance data from industrial sensors and regulatory databases. Transformed raw datasets into structured formats using Databricks for EPA audit reporting.
• Integrated IoT sensor streams from waste management systems using Spark Streaming and Delta Lake to monitor hazardous material levels in real time. Enabled predictive alerts for site safety teams to mitigate compliance risks.
• Developed a data validation framework with Great Expectations and Python to ensure accuracy in waste disposal manifests and shipment records. Reduced manual corrections by automating outlier detection in transactional datasets.
• Migrated legacy on-premises SQL Server databases to Azure Synapse Analytics for scalable storage of historical environmental remediation data. Achieved cost-efficient querying for long-term compliance trend analysis.
• Built Power BI dashboards to visualize real-time metrics for landfill capacity and chemical treatment efficiency. Empowered operations teams to optimize resource allocation across North American facilities.
• Automated data quality checks for EPA-regulated datasets using Azure Functions and Logic Apps. Ensured adherence to RCRA and CERCLA reporting standards during quarterly submissions.
• Engineered disaster recovery workflows for critical compliance data using Azure Backup and Geo-Redundant Storage. Maintained uninterrupted access during system upgrades and regional outages.
• Collaborated with environmental scientists to map regulatory requirements into SQL-based data models for toxicology reports. Aligned schema designs with EPA’s ECHO database specifications.
• Optimized ETL pipelines handling terabyte-scale geospatial data from groundwater monitoring systems. Reduced latency by implementing partitioning strategies in Azure Data Lake Storage.
State Street Corporation Hyderabad, TG, India
Junior Data Engineer June 2018 - March 2020
• Designed a Snowflake-based data platform integrating dbt Core for manufacturing process analytics. Standardized transformation workflows to accelerate reporting for global production teams.
• Engineered real-time pipelines using Kafka and Spark Structured Streaming to ingest IoT sensor data from audio products. Supported predictive maintenance models by delivering cleansed telemetry data within 5-minute SLAs.
• Architected a Databricks-powered feature store for machine learning teams analyzing customer usage patterns. Reduced feature engineering time by reusing validated datasets across recommendation and quality assurance models.
• Automated cross-region data synchronization between AWS and on-premises systems using Airflow and Delta Sharing. Maintained consistency in supply chain datasets during cloud migration initiatives.
• Implemented column-level security policies in Snowflake for customer entitlement data across direct-to-consumer platforms. Aligned access controls with CPRA and GDPR requirements for global user bases.
• Optimized manufacturing equipment JSON payload processing by developing PySpark UDFs in Databricks. Reduced ETL latency for factory floor operational dashboards.
• Built CI/CD pipelines for dbt models using GitHub Actions and Snowflake change tracking. Enabled seamless deployment of data marts supporting North American logistics planning.
• Streamlined legacy SAP ECC data extraction using AWS DMS and incremental loading strategies. Enhanced freshness of procurement analytics for vendor performance scorecards.
• Collaborated with DevOps to containerize Python-based data quality monitors using Kubernetes and Docker. Scaled validation checks across enterprise datasets without impacting pipeline throughput.
EDUCATION
Northeastern University Boston, MA, USA
Master's, Data Analytics September 2022 - December 2024
SKILLS
Languages: Python, SQL, PySpark, Java
Cloud Platforms: AWS (Glue, Lambda, S3, Redshift, Athena, EMR, KMS), Microsoft Azure (Data Factory, Synapse, Functions), Snowflake
Data Tools: Spark, Kafka, Airflow, dbt, Databricks, Delta Lake, AWS DMS, SAP ECC
ETL/Data Pipelines: SSIS, AWS Glue, Azure Data Factory, Spark Streaming, Great Expectations
Databases: SQL Server, DynamoDB, Redshift, Azure Synapse, Snowflake, AWS Athena
BI/Visualization: Power BI, Tableau, AWS QuickSight
DevOps and CI/CD: Azure DevOps, GitHub Actions, Docker, Kubernetes, AWS ECR/EKS
Data Governance: AWS Lake Formation, IAM, AWS KMS, GDPR/CPRA Compliance
Industry Standards: HL7, FHIR (Healthcare), EPA/RCRA (Environmental), IoT Telemetry