GHANA SHYAM KUTALA
Union City, CA +1-984-***-**** **************@*****.*** LinkedIn
Professional Summary
Data Engineer with 4+ years of experience designing and optimizing large-scale data platforms. Expertise in developing Python-based ETL workflows and implementing Azure solutions, including Data Factory and Databricks, to enhance data quality and governance. Proven ability to translate complex business needs into cloud-native architectures with measurable impact.
Experience
Technology Crest Corporation Oct 2023 - Present
Data Engineer Fair Oaks
• Delivered enterprise-scale data platform handling 240K+ daily cybersecurity events with 99.9% uptime by engineering Apache Spark and Kafka streaming pipelines, achieving sub-second ingestion latency and enhancing security operations.
• Reduced data discovery time by 73% through designing AWS-based cloud data lake with S3, Glue, and EMR, integrating automated data cataloging, governance, and metadata frameworks to streamline compliance and analytics readiness.
• Built efficient Python ETL workflows using Apache Airflow and PySpark to enable real-time threat intelligence processing and automated incident response, ensuring continuous monitoring across large-scale security data streams.
• Increased scalability and analytics accessibility by implementing a data mesh architecture with microservices and API gateways, supporting 50+ concurrent business users through elastic, self-service data services.
• Boosted query performance by 85% through optimizing AWS Redshift and ClickHouse warehouses, applying advanced partitioning, columnar storage, and indexing strategies to accelerate reporting and analytical workloads.
AI-Variant Jan 2022 - Jul 2022
Data Infrastructure Engineer Hyderabad, India
• Built scalable hotel analytics infrastructure by integrating 15+ reservation systems using Azure Data Factory and Databricks, enabling real-time insights and supporting business-critical decision-making.
• Ensured accurate, real-time synchronization between operational databases and analytics warehouse by implementing CDC pipelines with Debezium and Kafka Connect, strengthening reporting reliability.
• Improved analytical query performance by designing dimensional modeling and star schema architecture, supporting BI dashboards and advanced workloads for revenue forecasting and guest behavior insights.
• Cut pipeline development time by 60% through automating orchestration with Apache Airflow and dynamic DAG generation, reducing operational overhead and simplifying ongoing pipeline maintenance.
• Deployed containerized data processing environments with Docker and Kubernetes, ensuring seamless migration across development, staging, and production while maintaining consistency and reliability.
ITM Mar 2020 - Sep 2021
Data Engineer Warangal, India
• Built and maintained HIPAA-compliant data platform supporting COVID-19 analytics across 45+ hospitals by engineering secure pipelines and encrypted storage with audit trails, ensuring compliance and strengthening trust in healthcare data.
• Improved interoperability of clinical systems by developing HL7 FHIR integrations with Apache NiFi and custom Java connectors, enabling seamless and standardized patient data exchange across hospital networks.
• Delivered zero data loss and uninterrupted operations during critical healthcare activities by designing PostgreSQL streaming replication with automated failover, enhancing availability and disaster recovery resilience.
• Enabled proactive healthcare monitoring through real-time architecture using InfluxDB and Grafana, driving automated alerts for critical patient conditions and improving emergency response outcomes.
• Enhanced system scalability and operational reliability by designing resilient healthcare data pipelines and visualization dashboards, ensuring hospitals could handle pandemic-driven surges in patient records and analytics workloads.
Technical Skills
• Big Data & Processing: Apache Spark, Hadoop, Kafka, Airflow, Hive, HBase, Cassandra, Elasticsearch, Apache Beam, Flink, Storm
• Cloud Platforms: AWS (Redshift, S3, EMR, Glue, Kinesis, Lambda), Azure (Data Factory, Synapse, Databricks), Google Cloud (BigQuery, Dataflow, Pub/Sub), Azure VM
• Programming Languages: Python, Scala, Java, SQL, Shell scripting, Go, R, JavaScript, PySpark, Spark SQL, Python ETL
• Databases & Warehousing: PostgreSQL, MySQL, MongoDB, Redis, DynamoDB, Snowflake, Oracle, SQL Server, Neo4j, InfluxDB
• Data Integration & ETL Tools: Apache Airflow, Luigi, Prefect, dbt, Talend, Informatica, SSIS, Jenkins, GitLab CI/CD
• Infrastructure & DevOps: Docker, Kubernetes, Terraform, Ansible, Prometheus, Grafana, ELK Stack, DataDog, Linux/Unix
• Other Expertise: Data Modeling, Data Mesh, Streaming Architecture, Data Governance, Metadata Management, Performance Optimization, Data Quality
• Industry Knowledge: Financial Services, Banking
Certifications
• Azure Fundamentals (AZ-900)
• Machine Learning with Python - IBM Certified
• MongoDB - Data Processing
Education
Wichita State University Aug 2022 - May 2024
Master of Science, Data Science
• Achievements: IEEE Publication: "Accuracy Analysis of Hotel Review Information using Machine Learning"
Chaitanya Degree College May 2018 - Sep 2021
Bachelor of Science, Mathematics, Statistics and Computer Science