Meghana Botu
503-***-**** *******.**********@*****.*** LinkedIn-Meghana Botu GitHub-MegahanaBotu
Summary
Results-oriented Data Engineer with 4 years of experience designing and maintaining data pipelines on cloud platforms. Skilled in Dataproc, Scala, Spark, Kafka, GCS, BigQuery, and SQL for efficient, scalable data processing. Proficient with Automic and Airflow for job scheduling, and GitHub and Jenkins for CI/CD. Experienced in resolving data anomalies and conducting root cause analysis (RCA). Expertise in data modeling and managing cloud-based data infrastructure. Adept at automating workflows and developing data models and warehouses. Committed to leveraging technical skills to drive business success.
Experience
Barclays February 2024 – Present
Cloud Data Engineer
• Architected an intelligent conversational assistant using AWS Bedrock with Claude Sonnet LLM, implementing Retrieval Augmented Generation (RAG) on HR and company policy datasets, achieving 95% query coverage with response times under 3 seconds during the Barclays GenAI Hackathon.
• Built an ETL pipeline to migrate historical trade data from Apache Hive to SQL using Elasticsearch (ELK stack) and Kafka, automating workflows and saving 20 hours per week.
• Automated cloud infrastructure scaling with AWS CloudFormation (YAML), deploying EC2, VPC, IAM, S3, and Lambda for efficient and cost-optimized resource management.
• Developed business intelligence dashboards consolidating 150+ weekly reports into interactive views with server usage metrics and error insights, improving visibility for stakeholders.
• Implemented fraud detection and anomaly analysis models to flag server issues such as max resource utilization, enabling early identification of anomalous nodes for investigation.
• Led cross-functional collaboration with product managers, business analysts, and testing teams to design ELK dashboards with SQL and triggers, reducing production incidents by 30% and improving release quality by 25%.
• Established CI/CD pipelines with Jenkins across environments, reducing deployment cycle times by 30% and ensuring smooth production transitions.
• Integrated cloud data lakes with downstream tools like Power BI for seamless reporting, enhancing real-time data accessibility for business intelligence teams.
• Implemented distributed real-time data processing using Hadoop frameworks like Hive and Pig, significantly improving data processing speeds and accuracy.
• Developed Spark pipelines in Databricks for daily sync jobs and monitored feature engineering pipelines, ensuring timely delivery of machine learning models for production use.
Informatica January 2020 – July 2022
Data Engineer
• Built a real-time analytics solution using AWS Glue by migrating AWS Storage Services (EBS, S3, EFS) data to Redshift and AWS QuickSight, enabling on-demand insights for product teams through Elasticsearch and reducing processing time by 20%.
• Led a database migration project for internal data, applying data quality transformations and executing historical batch data transfers using JDBC and SQL.
• Improved user experience by enhancing UI layouts and navigation using Java, HTML, and JavaScript, resulting in more intuitive and efficient interfaces.
• Designed and implemented a CI/CD pipeline with SVN, Jenkins, and AWS to automate testing and deployments across environments, accelerating release cycles and improving reliability.
• Developed and deployed real-time streaming pipelines using PySpark, processing data from Kafka topics and Event Hubs to support machine learning models.
• Improved data processing efficiency by 25% through the implementation of Spark RDDs and optimizations in PySpark for structured and semi-structured data.
• Optimized PySpark scripts for handling various file formats (CSV, JSON, Parquet) with snappy and gzip compression, reducing file size and enhancing data ingestion performance.
• Built dashboards to enable ongoing delivery of information and insights to stakeholders, improving data-driven decision-making processes across the organization.
• Prepared and presented insightful financial reports and analyses to senior management, highlighting key trends and financial performance metrics using Power BI dashboards.
• Analyzed large-scale datasets from Snowflake to uncover trends and insights, aiding in strategic planning and decision-making processes.
• Provided training and support to finance and risk management teams on new BI tools and reporting systems, enhancing their ability to leverage data for decision-making.
• Managed Tableau authentication and authorization, ensuring secure and appropriate access to data across the enterprise.
• Developed data governance policies and procedures, leading to improved data quality and compliance with regulatory requirements.
Verans Technology May 2019 – December 2019
Data Engineer
• Built and scheduled Spark jobs in Databricks to perform daily syncs, improving the accuracy of feature engineering pipelines by 15%.
• Optimized PySpark code to handle CSV, JSON, and Parquet file formats, reducing data processing times by 30% and supporting real-time analytics.
• Modified complex SQL queries for Spark-based pipelines, improving the performance of ETL workflows and reducing execution times by 25%.
• Utilized big data programming languages like Scala and Python to modify SQL queries into Spark RDDs, significantly improving the performance of data processing jobs.
• Integrated multiple cloud-based SaaS technologies with data processing frameworks like Spark and Storm, providing high-performance distributed computing environments.
• Managed Spark and Kubernetes microservices for high-performance distributed big data processing, improving job orchestration and resource management.
• Architected end-to-end business intelligence solutions to support sales and marketing teams, integrating data from various sources into PostgreSQL and ensuring high data quality and accessibility.
• Enhanced documentation for data models and analytical methodologies, ensuring clarity and ease of use for future data analysts and business users.
• Integrated data from various sources, including transactional databases, customer management systems, and web analytics tools, into PostgreSQL, ensuring a comprehensive and unified view of the business.
• Performed deep dive analyses on sales trends to generate insights that informed strategic marketing decisions and product development.
• Collaborated with marketing and product teams to support new product launches, ensuring comprehensive performance tracking and analysis.
• Worked with Analytics Engineers and DataOps to develop data pipelines, resulting in a 30% reduction in data processing times and improved data accessibility.
• Managed and optimized the PostgreSQL database environment, enhancing performance and reliability for business-critical applications.
Education
Arizona State University July 2022 – July 2023
Master of Science in Information Systems Management – Data Analytics (GPA: 4.00 / 4.00) Tempe, Arizona
Gandhi Institute of Technology and Management (GITAM) June 2016 – July 2020
Bachelor of Technology in Computer Science (GPA: 3.68 / 4.00) Visakhapatnam, India
Technical Skills
Languages: Python, SQL, Java, Scala
Technologies: Tableau, Power BI, Hadoop, Kafka, Snowflake, Apache Airflow, Amazon Redshift, Git, Parquet, Bitbucket, JIRA, Docker, Microsoft Azure, AWS, TensorFlow, Azure Data Factory, HDInsight, CI/CD, ETL, Azure Databricks, EC2, Azure Synapse Analytics, MySQL, PySpark, Apache, Azure Event Hub, Agile, Excel, Matplotlib, Statistical Analysis
Certification: Microsoft Certified DP-203 – Data Engineering on Microsoft Azure