
Data Engineer Science

Location:
Posted: September 10, 2025


Resume:

Akhil Bochu Data Engineer

New Hampshire | 603-***-**** | **************@*****.*** | LinkedIn

PROFESSIONAL SUMMARY

Experienced Data Engineer with 5+ years of expertise building scalable data pipelines using Python, SQL (T-SQL, PL/SQL), and Shell scripting, and optimizing ETL workflows to reduce data processing time by 35%.

Expert in architecting cloud-native data solutions across AWS, Snowflake, and GCP, handling multi-terabyte workloads with zero data loss and high availability.

Hands-on expertise in building real-time streaming pipelines using Kafka, Flink, and Spark, enabling sub-second decision-making for business-critical applications.

Proven track record of optimizing data warehouses and pipelines to cut cloud spend by 30%+ while improving query performance by up to 4x.

Collaborated with Finance, Marketing, and Data Science teams to translate business KPIs into scalable BI dashboards and predictive analytics pipelines, empowering 500+ users with self-service insights.

Designed and deployed robust data platforms using AWS services such as Redshift, S3, Lambda, and EMR, handling over 3 TB of daily data ingestion and supporting high-availability analytics for business-critical apps.

Developed modular ETL pipelines using Apache Airflow and dbt, integrated with Fivetran and Talend, achieving a 99.2% success rate in daily data sync across production and staging environments.

Implemented data warehousing solutions using Snowflake and Google BigQuery, improving query performance by 42% through clustering, partitioning, and result caching strategies.

Deployed Dockerized microservices integrated with Jenkins and GitLab CI/CD for seamless code promotion, and collaborated with DevOps teams to define Terraform IaC modules for infrastructure provisioning.

Built dynamic BI dashboards using Power BI, Tableau, and Looker to deliver executive-level reports; reduced dashboard load time by 40% using optimized DAX queries and incremental refresh.

Strong background in Agile development with proven ability to work cross-functionally with Data Science, Product, and DevOps teams to deliver production-ready data solutions with measurable impact.

SKILLS

Languages & Programming: Python, Java, JavaScript, SQL (T-SQL, PL/SQL), Shell Scripting (Bash)

Cloud Platforms & Data Warehousing: AWS (Redshift, S3, Lambda, EMR), Snowflake, Google BigQuery, Azure Synapse Analytics

ETL / ELT & Data Orchestration: Apache Airflow, dbt, Talend, Matillion, Fivetran, Informatica, AWS Glue, ETLC

Big Data & Frameworks: Apache Kafka, Hadoop, Flink, EMR, Dataflow, Django

Databases & Storage: PostgreSQL, MySQL, Oracle, MongoDB, NoSQL, Amazon S3, Azure Blob Storage

AI / ML & Advanced Analytics: Machine Learning model development, Data preprocessing, Feature engineering, Predictive analytics

CI/CD & DevOps: Docker, Kubernetes, Jenkins, Git, GitLab CI, Terraform, Infrastructure as Code (IaC)

BI & Reporting Tools: Power BI, Tableau, Looker, Google Data Studio

PROFESSIONAL EXPERIENCE

Data Engineer

Glorium Technologies New Jersey Jul 2025 – Present

Designed high-performance Redshift and Snowflake queries supporting BI consumption of over 1.2 billion records weekly, enabling financial analysts to reduce report generation time from hours to under 10 minutes.

Engineered streaming data ingestion from Kafka into Amazon S3 using optimized Avro serialization, implementing partition strategies that improved retrieval speeds by nearly 40% for downstream analytics.

Automated metadata-driven ETL orchestration in AWS Glue and Step Functions, reducing pipeline failures by 25% and cutting manual monitoring by 15 hours/week.

Optimized Redshift queries and automated table clustering, reducing compute costs by $120K annually while cutting query execution time by 50%.

Designed and maintained data lakehouse architectures (Databricks, Snowflake, AWS/GCP/Azure) to centralize large-scale training datasets for AI/ML workloads.

Integrated Django applications with cloud services (AWS S3, Azure Blob, GCP BigQuery) to support distributed data storage and processing.

Designed Java-based microservices and APIs for secure, efficient data access layers, supporting consumption by analytics and BI applications.

Automated data quality checks and validation scripts in JavaScript/Node.js, ensuring accuracy and consistency across multi-terabyte datasets.

Tuned SQL queries, stored procedures, and Java persistence layers (Hibernate/JPA) for faster data retrieval and reporting.

Implemented JavaScript-based dashboards and data visualization tools (such as D3.js and React) to present data insights and monitoring metrics to stakeholders.

Designed Kafka-based streaming analytics for personalization workflows, improving customer engagement by 18% and reducing churn.

Implemented data governance using AWS Lake Formation and IAM-based policies, ensuring secure and compliant access for 200+ users.

Mentored two junior interns on dbt modeling and CI/CD deployment, improving onboarding efficiency and reducing ramp-up time from 4 weeks to 1.5 weeks.

Built an event-driven architecture leveraging AWS Lambda triggers on S3 object creation to launch lightweight transformation jobs, reducing the delay between ingestion and availability from 45 minutes to under 5 minutes.

Led CI/CD efforts by containerizing dbt transformations using Docker and deploying through Jenkins pipelines, achieving consistent and tested model deployments across dev and prod environments.

Collaborated with business stakeholders to create Power BI dashboards fed by real-time Redshift datasets, enabling KPI visualization and filtering for over 120 operational users.

Instituted GitLab version control standards for data workflows, improving change traceability and rollback capabilities across 3 environments supporting finance and compliance teams.

Implemented Snowflake Streams and Tasks to enable incremental load design with historical tracking, ensuring zero data loss across 150+ pipeline executions.

Worked closely with the IoT platform team to design a NoSQL-based storage architecture in MongoDB and S3 for time-series data, reducing storage overhead by 30% using tiered retention and compression strategies.

Environment: Redshift, Snowflake, BI Reporting, Kafka, Amazon S3, Avro, AWS Lambda, Docker, Jenkins, dbt, Power BI, GitLab, Streams & Tasks (Snowflake), MongoDB, NoSQL, Time-Series Data Architecture.

Data Engineer (Intern)

Glorium Technologies New Jersey Jan 2025 – Jun 2025

Transformed a fragile legacy ETL workflow by refactoring it into scalable Python modules with parameterized SQL logic, reducing run failures by 60% and cutting execution time from 6 hours to under 50 minutes.

Delivered Google Data Studio reports by enabling real-time ingestion from BigQuery, aggregating 25+ marketing KPIs across multiple regions using scheduled queries and federated joins.

Built a Kafka-Flink pipeline to support real-time event stream processing from web applications, applying transformation rules before writing enriched data to BigQuery with schema versioning support.

Modularized dbt transformations into reusable models with dynamic macros, reducing code duplication and supporting CI-enabled testing with automatic dataset freshness checks in Snowflake.

Integrated data orchestration workflows with Java and JavaScript (Node.js), scheduling jobs, handling dependencies, and enabling CI/CD for pipeline deployments.

Developed real-time data streaming pipelines with Kafka, Spark, and Flink to feed ML models with live data for fraud detection, recommendation engines, and risk analytics.

Tuned SQL queries, stored procedures, and Java persistence layers (Hibernate/JPA) for faster data retrieval and reporting.

Designed schemas and indexing strategies in MongoDB to optimize query performance, reduce latency, and improve analytical workloads.

Implemented Django ORM models to standardize data structures, enforce schema consistency, and improve query performance across large-scale datasets.

Re-engineered legacy Informatica workflows into AWS Glue and Airflow-managed pipelines, reducing license costs and improving observability across batch and real-time data paths.

Containerized ETL workers using Docker and deployed on Kubernetes with health checks and retries, reducing production failures and enabling horizontal scaling based on queue size.

Defined Glue Catalog and S3-based data lake layers with automated crawlers and schema tagging, enabling self-service access for analytics teams and data stewards.

Enhanced Tableau dashboards to support drill-through analytics for sales and HR, with monthly refresh automation reducing manual reporting cycles by 90%.

Built reusable Terraform modules for EMR and Airflow stack provisioning, embedding environment-specific configs and maintaining separation of dev/staging/prod deployments.

Environment: Python, SQL, Google Data Studio, BigQuery, Kafka, Apache Flink, dbt, Snowflake, Informatica, AWS Glue, Apache Airflow, Docker, Kubernetes, Glue Data Catalog, Amazon S3, Tableau, Terraform, Amazon EMR.

Data Engineer

Sage Softtech India Jan 2019 – Jul 2023

Delivered an automated data validation suite for internal pipelines using SQL assertions and data profiling logic, surfacing schema drift and null-value hotspots across 30+ critical tables.

Engineered end-to-end data pipelines to ingest, clean, and transform multi-source datasets, providing high-quality training and inference data for ML models.

Created scalable data modeling solutions in PostgreSQL using dbt, enabling modular analytics with date-partitioned incremental loads and source freshness checks.

Implemented Google Cloud Dataflow pipelines for campaign clickstream transformation, enriching event data using lookup joins and persisting to BigQuery with windowed aggregations.

Developed and maintained data integration frameworks using Java, JDBC, and REST APIs to connect disparate sources (RDBMS, NoSQL, cloud storage) into centralized data platforms.

Built real-time data streaming solutions leveraging Java with Apache Kafka, Spark Streaming, and Flink for low-latency analytics and event-driven processing.

Implemented real-time data streaming solutions with MongoDB, Kafka, and Spark, enabling low-latency processing and event-driven architectures.

Designed and developed Django-based REST APIs to expose curated datasets, enabling seamless data access for analytics, machine learning, and reporting applications.

Collaborated with finance teams to deliver KPI-rich dashboards in Looker and Power BI, slashing manual report generation time by more than 85% across monthly business reviews.

Built real-time pricing data sync from Kafka into Snowflake using micro-batch ingestion jobs with retry and checkpoint logic to ensure high data integrity and traceability.

Integrated Great Expectations data quality framework with Airflow pipelines, automatically validating 50+ KPIs daily and reducing data-related incidents by 40%.

Partnered with marketing and finance teams to define business KPIs, translating them into scalable Looker dashboards used by 300+ stakeholders.

Enhanced data lineage visibility by embedding dbt docs and automated cataloging in Glue Data Catalog, improving traceability for auditors and compliance teams.

Streamlined Jenkins pipelines to support Docker-based CI/CD deployments of ETL jobs, embedding metadata and version tagging for enhanced traceability and rollback.

Configured Fivetran connectors for Salesforce and NetSuite integration into Snowflake, with sync schedules, SLA monitoring, and connector failure alerting via Slack.

Tuned Oracle and MySQL stored procedures and SQL logic to optimize report delivery pipelines, enabling up to 4x faster query performance for high-priority customer segments.

Ingested campaign data from external vendors into Azure Blob Storage using Data Factory, integrating retry logic and event-based triggers to support daily campaign reporting.

Environment: SQL, Data Profiling, PostgreSQL, dbt, Google Cloud Dataflow, BigQuery, Looker, Power BI, Kafka, Snowflake, Jenkins, Docker, CI/CD, Fivetran, Salesforce, NetSuite, Oracle, MySQL, Azure Blob Storage.

EDUCATION

Master's in Computer Science Aug 2023 – May 2025 Rivier University Nashua, NH

Bachelor’s in Mechanical Engineering Jul 2017 – Jun 2021 Osmania University India


