Meghana Botu
503-***-**** *******.**********@*****.*** LinkedIn-Meghana Botu GitHub-MegahanaBotu
Summary
Results-oriented Data Engineer with 4 years of experience designing and maintaining data pipelines on cloud platforms. Skilled in Dataproc, Scala, Spark, Kafka, GCS, BigQuery, and SQL for efficient, scalable data processing. Proficient with Automic and Airflow for job scheduling, and GitHub and Jenkins for CI/CD. Experienced in resolving data anomalies and conducting root cause analysis (RCA). Expertise in data modeling and managing cloud-based data infrastructure. Adept at automating workflows and developing data models and warehouses. Committed to leveraging technical skills to drive business success.
Experience
Barclays February 2024 – Present
Cloud Data Engineer
• Architected an intelligent conversational assistant using AWS Bedrock with Claude Sonnet LLM, implementing Retrieval Augmented Generation (RAG) on HR and company policy datasets, achieving 95% query coverage with response times under 3 seconds during the Barclays GenAI Hackathon.
• Built an ETL pipeline to migrate historical trade data from Apache Hive to SQL using Elasticsearch (ELK stack) and Kafka, automating workflows and saving 20 hours per week.
• Automated cloud infrastructure scaling with AWS CloudFormation (YAML), deploying EC2, VPC, IAM, S3, and Lambda for efficient and cost-optimized resource management.
• Developed business intelligence dashboards consolidating 150+ weekly reports into interactive views with server usage metrics and error insights, improving visibility for stakeholders.
• Implemented fraud detection and anomaly analysis models to flag server issues such as max resource utilization, enabling early identification of anomalous nodes for investigation.
• Led cross-functional collaboration with product managers, business analysts, and testing teams to design ELK dashboards with SQL and triggers, reducing production incidents by 30% and improving release quality by 25%.
• Established CI/CD pipelines with Jenkins across environments, reducing deployment cycle times by 30% and ensuring smooth production transitions.
• Integrated cloud data lakes with downstream tools like Power BI for seamless reporting, enhancing real-time data accessibility for business intelligence teams.
• Implemented distributed real-time data processing using Hadoop frameworks like Hive and Pig, significantly improving data processing speeds and accuracy.
• Developed Spark pipelines in Databricks for daily sync jobs and monitored feature engineering pipelines, ensuring timely delivery of machine learning models for production use.
Informatica January 2020 – July 2022
Data Engineer
• Built a real-time analytics solution using AWS Glue by migrating AWS Storage Services (EBS, S3, EFS) data to Redshift and AWS QuickSight, enabling on-demand insights for product teams through Elasticsearch and reducing processing time by 20%.
• Led a database migration project for internal data, applying data quality transformations and executing historical batch data transfers using JDBC and SQL.
• Improved user experience by enhancing UI layouts and navigation using Java, HTML, and JavaScript, resulting in more intuitive and efficient interfaces.
• Designed and implemented a CI/CD pipeline with SVN, Jenkins, and AWS to automate testing and deployments across environments, accelerating release cycles and improving reliability.
• Developed and deployed real-time streaming pipelines using PySpark, processing data from Kafka topics and Event Hubs to support machine learning models.
• Improved data processing efficiency by 25% through the implementation of Spark RDDs and optimizations in PySpark for structured and semi-structured data.
• Optimized PySpark scripts for handling various file formats (CSV, JSON, Parquet) with snappy and gzip compression, reducing file size and enhancing data ingestion performance.
• Built dashboards to enable ongoing delivery of information and insights to stakeholders, improving data-driven decision-making processes across the organization.
• Prepared and presented insightful financial reports and analyses to senior management, highlighting key trends and financial performance metrics using Power BI dashboards.
• Analyzed large-scale datasets from Snowflake to uncover trends and insights, aiding in strategic planning and decision-making processes.
• Provided training and support to finance and risk management teams on new BI tools and reporting systems, enhancing their ability to leverage data for decision-making.
• Managed Tableau authentication and authorization, ensuring secure and appropriate access to data across the enterprise.
• Developed data governance policies and procedures, leading to improved data quality and compliance with regulatory requirements.
Verans Technology May 2019 – December 2019
Data Engineer
• Built and scheduled Spark jobs in Databricks to perform daily syncs, improving the accuracy of feature engineering pipelines by 15%.
• Optimized PySpark code to handle CSV, JSON, and Parquet file formats, reducing data processing times by 30% and supporting real-time analytics.
• Modified complex SQL queries for Spark-based pipelines, improving the performance of ETL workflows and reducing execution times by 25%.
• Utilized big data programming languages like Scala and Python to modify SQL queries into Spark RDDs, significantly improving the performance of data processing jobs.
• Integrated multiple cloud-based SaaS technologies with data processing frameworks like Spark and Storm, providing high-performance distributed computing environments.
• Managed Spark and Kubernetes microservices for high-performance distributed big data processing, improving job orchestration and resource management.
• Architected end-to-end business intelligence solutions to support sales and marketing teams, integrating data from various sources into PostgreSQL and ensuring high data quality and accessibility.
• Enhanced documentation for data models and analytical methodologies, ensuring clarity and ease of use for future data analysts and business users.
• Integrated data from various sources, including transactional databases, customer management systems, and web analytics tools, into PostgreSQL, ensuring a comprehensive and unified view of the business.
• Performed deep dive analyses on sales trends to generate insights that informed strategic marketing decisions and product development.
• Collaborated with marketing and product teams to support new product launches, ensuring comprehensive performance tracking and analysis.
• Worked with Analytics Engineers and DataOps to develop data pipelines, resulting in a 30% reduction in data processing times and improved data accessibility.
• Managed and optimized the PostgreSQL database environment, enhancing performance and reliability for business-critical applications.
Education
Arizona State University July 2022 – July 2023
Master of Science in Information Systems Management – Data Analytics (GPA: 4.00 / 4.00) Tempe, Arizona
Gandhi Institute of Technology and Management (GITAM) June 2016 – July 2020
Bachelor of Technology in Computer Science (GPA: 3.68 / 4.00) Visakhapatnam, India
Technical Skills
Languages: Python, SQL, Java, Scala
Technologies: Tableau, Power BI, Hadoop, Kafka, Snowflake, Apache Airflow, Amazon Redshift, Git, Parquet, Bitbucket, JIRA, Docker, Microsoft Azure, AWS, TensorFlow, Azure Data Factory, HDInsight, CI/CD, ETL, Azure Databricks, EC2, Azure Synapse Analytics, MySQL, PySpark, Apache, Azure Event Hub, Agile, Excel, Matplotlib, Statistical Analysis
Certification: Microsoft Certified DP-203 – Data Engineering on Microsoft Azure