K Harish
Email: ***********@*****.***
Mobile: 937-***-****
LinkedIn: https://www.linkedin.com/in/harishkari/
Senior Data Engineer
PROFESSIONAL SUMMARY
Designed and implemented large-scale data architectures across AWS, Azure, and GCP over 5 years, enabling businesses to leverage data for strategic decision-making.
Optimized complex SQL queries and built high-performance data models in Redshift, Snowflake, and Azure Synapse, ensuring seamless data integration and processing.
Built end-to-end big data solutions using Hadoop, Spark, and Kinesis, creating real-time data pipelines for efficient processing and analysis of massive datasets.
Developed automated workflows using AWS Lambda, Azure Functions, and GCP Cloud Functions, reducing deployment times and improving system reliability.
Integrated machine learning models in AWS SageMaker and Azure ML, driving actionable insights and predictive analytics to enhance decision-making and operational efficiency.
Engineered tailored data solutions for e-commerce, healthcare, and finance industries, optimizing operations, improving customer experience, and reducing costs.
Applied excellent written and oral communication skills to enhance team collaboration and streamline project updates.
Leveraged a passion for automation and continual process improvement to boost efficiency and reduce operational costs.
TECHNICAL SKILLS
Cloud Platforms - AWS (EC2, Lambda, Glue, S3, Kinesis, IAM, EKS, Redshift), Azure (ADF, Synapse, Azure SQL, Entra ID, Key Vault), GCP (BigQuery, GKE, Cloud Storage)
Infrastructure as Code (IaC) - Terraform, Ansible, ARM Templates, Bicep, CloudFormation, Jenkins, Azure DevOps
Monitoring and Incident Response - New Relic, AWS CloudWatch, Azure Monitor, ServiceNow, RCA, SLA Management
Security and Compliance - IAM, Encryption, NIST 800-53, CIS Benchmarks, PCI-DSS, RBAC, Key Vault, Audit Logging
CI/CD and DevOps - Jenkins, GitHub Actions, Git, GitLab, CodePipeline, CI/CD Pipelines, Shell Scripting
Programming & Scripting - Python, SQL, Bash, PowerShell
Data Engineering - AWS Glue, Azure Data Factory, DBT, Apache Kafka, Spark, Hive, GCP Dataflow
Databases - Redshift, Snowflake, Azure SQL, PostgreSQL, MongoDB, MySQL, Oracle
Dashboards and Visualization - Power BI, Tableau, Looker, AWS QuickSight
Programming Languages - Perl
Tools and Platforms - Linux, Unix, Oracle Exadata, Informatica
System Administration and Infrastructure - Linux-based processes, Unix file systems, mount types, permissions, standard tools, pipes
PROFESSIONAL EXPERIENCE
BigCommerce August 2023 – Present
Senior Data Engineer
Designed and developed end-to-end data pipelines using AWS Glue, Lambda, and Step Functions, integrating sales, customer behavior, and inventory data from multiple sources into Amazon Redshift for business analytics.
Built a real-time recommendation data framework with Kinesis Data Streams and Firehose, enabling personalized product recommendations and improving conversion rates by 25%.
Developed ETL workflows and data lake architecture on Amazon S3 and EMR (Spark) to process terabytes of clickstream and order data, supporting scalable machine learning and reporting pipelines.
Automated data quality checks, validation rules, and schema drift handling through custom Python scripts and Glue jobs, ensuring consistent and trustworthy analytics outputs.
Integrated QuickSight dashboards and Redshift Spectrum queries to deliver interactive insights on pricing optimization, cart abandonment, and customer retention metrics.
Implemented data governance and security protocols using IAM, CloudWatch, CloudTrail, and KMS, ensuring compliance with PCI DSS and safeguarding sensitive transactional data.
Implemented Oracle and Oracle Exadata solutions to enhance data retrieval speeds by 40%, significantly improving database performance and user satisfaction.
Designed robust Data Warehousing and ETL/database load/extract processes, boosting data processing efficiency by 25% and supporting informed business decision-making.
Developed Perl scripts for automation, reducing manual intervention by 60% and streamlining routine tasks for greater operational efficiency.
Demonstrated excellent written and oral communication skills to facilitate seamless collaboration across cross-functional teams, resulting in a 15% improvement in project delivery timelines.
Fiserv March 2021 – January 2023
Azure Data Engineer
Designed and implemented metadata-driven ETL pipelines using Azure Data Factory (ADF) and Microsoft Fabric, integrating core banking, transaction, and portfolio datasets into a centralized lakehouse for enterprise analytics.
Built and optimized Delta Lake architectures on ADLS Gen2, enabling historical tracking and regulatory data retention for auditing and financial reconciliation.
Developed data transformation frameworks in Azure Databricks (PySpark, SQL) to cleanse, standardize, and aggregate large-scale transaction data for compliance and performance analytics.
Integrated Microsoft Fabric Dataflows and Synapse Data Warehouses to deliver near real-time insights for credit risk assessment, investment analysis, and liquidity reporting.
Implemented data governance, lineage, and access control frameworks using Azure Purview, Key Vault, and RBAC, ensuring adherence to SOX, GDPR, and financial audit regulations.
Automated CI/CD data deployment pipelines using Azure DevOps and GitHub Actions, enhancing delivery speed, reproducibility, and consistency across multiple financial environments.
Cultivated a passion for automation and continual process improvement, leading initiatives that reduced operational costs by 20% through innovative process reengineering.
Administered Linux and Unix systems, optimizing file systems, mount configurations, and permissions to enhance system security and reliability.
Leveraged Informatica and relational databases to orchestrate complex data flows, ensuring data integrity and accuracy across multiple platforms.
Streamlined Linux-based processes using standard tools and pipes, achieving a 30% reduction in processing time for critical operations.
HealthPlix January 2020 – March 2021
Associate Data Engineer
Designed and implemented end-to-end ETL pipelines using Google Cloud Dataflow and Dataproc (PySpark) to integrate EHR, claims, and provider datasets for population health analytics.
Built and optimized data warehouses in BigQuery, applying partitioning, clustering, and materialized views to accelerate reporting on patient outcomes, care efficiency, and hospital performance metrics.
Developed real-time data ingestion frameworks using Pub/Sub and Cloud Functions, enabling instant synchronization of clinical and lab data across healthcare systems.
Collaborated with analysts and medical data teams to create predictive models and visual dashboards in Looker Studio, supporting early diagnosis, risk scoring, and readmission prevention initiatives.
Applied data quality, validation, and auditing frameworks using Cloud Composer (Airflow) and Data Catalog, improving data reliability and regulatory compliance by 40%.
Ensured strict HIPAA and HL7/FHIR compliance through IAM-based access control, encryption (KMS), and data masking, maintaining secure handling of patient and clinical information.
Applied Agile methodology to manage backend-focused projects, improving team productivity by 35% and accelerating time-to-market.
Employed orchestration tools to drive system/architecture improvements, resulting in a 25% increase in system uptime and performance.
Enhanced toolsets and processes to support agile development, leading to a 20% reduction in deployment times and improved software quality.
EDUCATION
Master's in Computer Engineering - Wright State University
Bachelor's in Electrical and Communications Engineering - Dr. MGR University