GHANA SHYAM KUTALA
Union City, CA +1-984-***-**** **************@*****.*** LinkedIn
Professional Summary
Data Engineer with 4+ years of experience designing and optimizing large-scale data platforms. Expertise in developing Python-based ETL workflows and implementing Azure solutions, including Data Factory and Databricks, to enhance data quality and governance. Proven ability to translate complex business needs into cloud-native architectures with measurable impact.
Experience
Technology Crest Corporation Oct 2023 - Present
Data Engineer Fair Oaks
• Delivered enterprise-scale data platform handling 240K+ daily cybersecurity events with 99.9% uptime by engineering Apache Spark and Kafka streaming pipelines, achieving sub-second ingestion latency and enhancing security operations.
• Reduced data discovery time by 73% through designing AWS-based cloud data lake with S3, Glue, and EMR, integrating automated data cataloging, governance, and metadata frameworks to streamline compliance and analytics readiness.
• Built efficient Python ETL workflows using Apache Airflow and PySpark to enable real-time threat intelligence processing and automated incident response, ensuring continuous monitoring across large-scale security data streams.
• Increased scalability and analytics accessibility by implementing a data mesh architecture with microservices and API gateways, supporting 50+ concurrent business users through elastic, self-service data services.
• Boosted query performance by 85% through optimizing AWS Redshift and ClickHouse warehouses, applying advanced partitioning, columnar storage, and indexing strategies to accelerate reporting and analytical workloads.
AI-Variant Jan 2022 - Jul 2022
Data Infrastructure Engineer Hyderabad, India
• Built scalable hotel analytics infrastructure by integrating 15+ reservation systems using Azure Data Factory and Databricks, enabling real-time insights and supporting business-critical decision-making.
• Ensured accurate, real-time synchronization between operational databases and analytics warehouse by implementing CDC pipelines with Debezium and Kafka Connect, strengthening reporting reliability.
• Improved analytical query performance by designing dimensional modeling and star schema architecture, supporting BI dashboards and advanced workloads for revenue forecasting and guest behavior insights.
• Cut pipeline development time by 60% through automating orchestration with Apache Airflow and dynamic DAG generation, reducing operational overhead and simplifying ongoing pipeline maintenance.
• Deployed containerized data processing environments with Docker and Kubernetes, ensuring seamless migration across development, staging, and production while maintaining consistency and reliability.
ITM Mar 2020 - Sep 2021
Data Engineer Warangal, India
• Built and maintained HIPAA-compliant data platform supporting COVID-19 analytics across 45+ hospitals by engineering secure pipelines and encrypted storage with audit trails, ensuring compliance and strengthening trust in healthcare data.
• Improved interoperability of clinical systems by developing HL7 FHIR integrations with Apache NiFi and custom Java connectors, enabling seamless and standardized patient data exchange across hospital networks.
• Delivered zero data loss and uninterrupted operations during critical healthcare activities by designing PostgreSQL streaming replication with automated failover, enhancing availability and disaster recovery resilience.
• Enabled proactive healthcare monitoring through real-time architecture using InfluxDB and Grafana, driving automated alerts for critical patient conditions and improving emergency response outcomes.
• Enhanced system scalability and operational reliability by designing resilient healthcare data pipelines and visualization dashboards, ensuring hospitals could handle pandemic-driven surges in patient records and analytics workloads.
Technical Skills
• Big Data & Processing: Apache Spark, Hadoop, Kafka, Airflow, Hive, HBase, Cassandra, Elasticsearch, Apache Beam, Flink, Storm
• Cloud Platforms: AWS (Redshift, S3, EMR, Glue, Kinesis, Lambda), Azure (Data Factory, Synapse, Databricks), Google Cloud (BigQuery, Dataflow, Pub/Sub), Azure VM
• Programming Languages: Python, Scala, Java, SQL, Shell scripting, Go, R, JavaScript, PySpark, Spark SQL, Python ETL
• Databases & Warehousing: PostgreSQL, MySQL, MongoDB, Redis, DynamoDB, Snowflake, Oracle, SQL Server, Neo4j, InfluxDB
• Data Integration & ETL Tools: Apache Airflow, Luigi, Prefect, dbt, Talend, Informatica, SSIS, Jenkins, GitLab CI/CD
• Infrastructure & DevOps: Docker, Kubernetes, Terraform, Ansible, Prometheus, Grafana, ELK Stack, DataDog, Linux/Unix
• Other Expertise: Data Modeling, Data Mesh, Streaming Architecture, Data Governance, Metadata Management, Performance Optimization, Data Quality
• Industry Knowledge: Financial Services, Banking
Certifications
• Azure Fundamentals (AZ-900)
• Machine Learning with Python - IBM Certified
• MongoDB - Data Processing
Education
Wichita State University Aug 2022 - May 2024
Master of Science, Data Science
• Achievements: IEEE Publication: "Accuracy Analysis of Hotel Review Information using Machine Learning"
Chaitanya Degree College May 2018 - Sep 2021
Bachelor of Science, Mathematics, Statistics and Computer Science