RAVALIKA MARAPALLY
Data Engineer
*************@*****.*** | 346-***-**** | Hayward, CA 94544, USA | LinkedIn
SUMMARY
●Highly skilled Data Engineer with 5+ years of professional experience designing and developing scalable data infrastructure for storage, transformation, and analytics, including datasets exceeding 5 TB.
●Knowledgeable in relational and NoSQL database systems, including PostgreSQL, MySQL, MongoDB, Oracle, and Teradata, with a focus on performance tuning and governance.
●Proficient in architecting cloud-native solutions on Azure and AWS, leveraging services such as Databricks, Data Factory, EC2, S3, Lambda, Glue, Redshift, and BigQuery to drive cost-efficient, enterprise-grade data ecosystems.
●Streamlined deployment workflows by containerizing applications with Docker and Kubernetes and automating CI/CD pipelines in GitLab, cutting deployment cycles by 30 minutes per release.
TECHNICAL SKILLS
Programming Languages: Python, SQL, R
DevOps Tools: Docker, Kubernetes, Jenkins
ETL Tools: Informatica, Talend, Apache Airflow
Data Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plotly
Cloud Technologies: Azure (Blob Storage), AWS (EC2, S3, Lambda, Glue, Redshift)
Databases: Snowflake, Oracle, MS Access, MySQL, Teradata, PostgreSQL, MongoDB, NoSQL
Big Data Technologies: Hadoop, Spark, HDFS, MapReduce, Cassandra, Impala, Kafka
Data Engineering Tools: Azure Databricks, PySpark, Data Lake Storage, Azure Data Factory, Synapse Analytics
Methodologies: Agile, Scrum, Waterfall
IDEs: PyCharm, Jupyter Notebook, Visual Studio
Version Control: Git, GitHub, GitLab, CI/CD workflows
EDUCATION
University of Central Missouri Jan 2023 - May 2024
Master of Science in Computer Science MO, USA
SR University Jun 2014 - May 2018
Bachelor of Technology in Computer Science and Engineering Warangal, TS, India
PROFESSIONAL EXPERIENCE
McKinsey & Company Oct 2024 - Current
Data Engineer MI, USA
●Designed batch processing applications leveraging PostgreSQL to automate workflows and manage 3 TB of data within an S3 data lake, cutting processing time by 15 hours and improving scalability of data operations.
●Revamped data pipelines by integrating Tableau, enabling daily processing of 100 GB of data, cutting query execution time by 10 minutes, and delivering near-real-time analytics for customer-facing dashboards.
●Automated 4 mission-critical data pipelines with Hadoop, processing large-scale datasets while maintaining uptime and reliability and eliminating resource-allocation bottlenecks.
●Led end-to-end development of 3 cloud-native data pipelines using Visual Studio, EMR, and AWS Lambda, shortening ETL cycle times by 6 hours and streamlining integration of analytics output for stakeholders.
●Implemented Snowflake-based data warehousing solutions, optimizing query performance and enabling seamless data integration across cloud platforms, improving analytics efficiency and reducing storage costs.
Accenture Sep 2021 - Dec 2022
Data Engineer/Cloud Analyst India
●Architected a data pipeline to ingest and process semi-structured data using Python, crafting custom scripts to transform and cleanse records from 5 heterogeneous data sources and completing end-to-end processing in 5 minutes.
●Engineered real-time data streaming with Apache Kafka, configuring topics and partitions to sustain 50 messages per second and delivering ingestion-to-delivery latency under 800 milliseconds for seamless downstream consumption (a representative producer sketch follows this section).
●Designed big data workflows in Databricks, leveraging Spark executors to process 1 terabyte of data daily, completing data wrangling and preparation in 4 hours and yielding 20,000 rows of high-quality, analysis-ready data.
●Built data solutions in PyCharm, developing and debugging 10,000 lines of Python code, integrating Git for version control to accelerate development cycles, shaving 20 hours off each sprint’s timeline.
●Deployed and managed Kubernetes clusters, orchestrating 4 applications across 4 nodes with 50 pods, ensuring uptime and auto-scaling to support 500 requests per minute during peak loads, with zero-downtime rolling updates.
●Spearheaded Agile methodology implementation, guiding a team of 6 through two-week sprints, resolving 100 tickets, and delivering 4 high-impact features on schedule, aligning with the needs of 3 key stakeholders.
●Implemented ETL workflows using Informatica, processing 1 gigabyte of data daily across 5 mappings, optimizing extract-transform-load cycles to complete in 3 hours and enabling reliable data access for 150 downstream users.
●Leveraged Databricks on Azure to optimize cloud data operations, processing 100 gigabytes of data across pipelines, achieving data transfers in 2 hours while supporting 200 concurrent users with reliable performance.
●Pioneered Power BI dashboards to track financial KPIs, designing 7 interactive visualizations with 5 drill-through layers, slashing financial review time from 2 hours to 60 minutes for fund managers, enhancing decision-making efficiency.
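A minimal sketch of the kind of Kafka producer configuration referenced in the streaming bullet above, assuming the kafka-python client; the broker address, topic name, and message shape are illustrative placeholders, not the actual production setup.

```python
# Minimal sketch of a Kafka producer for low-latency JSON event streaming.
# Broker address, topic name, and message fields are illustrative placeholders.
import json
import time

from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],  # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks=1,        # balance durability against publish latency
    linger_ms=5,   # small batching window to keep end-to-end latency low
)

def publish_event(event: dict) -> None:
    """Send one event to the ingestion topic; partitioning is key-based."""
    key = str(event.get("customer_id", "")).encode("utf-8")
    producer.send("customer-events", key=key, value=event)

if __name__ == "__main__":
    publish_event({"customer_id": 42, "action": "page_view", "ts": time.time()})
    producer.flush()  # block until buffered messages are delivered
```

Keying messages by customer ID keeps a given customer's events on one partition (preserving order), while a small linger_ms is consistent with the sub-second latency target noted above.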
Capgemini Feb 2019 - Sep 2021
Data Engineer/Senior Analyst India
●Implemented ETL workflows using Apache Airflow, streamlining data ingestion from three application APIs and processing 1 terabyte of user-generated data to enhance pipeline reliability for critical software functionalities.
●Developed analytics dashboards using Seaborn for a software analytics module tailored to consulting needs, visualizing 200 usage records and empowering product managers to monitor and optimize feature development strategies.
●Provisioned Azure cloud infrastructure to support the software ecosystem, managing 2 terabytes of application logs and enabling three real-time monitoring tools, reducing manual configuration effort by 100 hours annually.
●Deployed Cassandra as the backend database for a customer-facing application, storing 6 terabytes of user profile data across 5 nodes and supporting 10,000 queries per minute, ensuring seamless performance for a growing user base.
●Leveraged Teradata for business intelligence, optimizing queries across 8 terabytes of historical data and cutting report generation time by 15 seconds for 50 daily customer insights.
●Employed Azure Data Factory to orchestrate 7 pipelines for a cloud-based software solution, integrating 3 terabytes of telemetry data from 4 sources daily, enabling 6 analytics modules to deliver real-time performance metrics.
●Developed and managed ETL/data integration projects using Talend, transforming structured and unstructured data from multiple sources, automating cleansing workflows, and improving data quality by 20%, enhancing decision-making processes.
●Used Jupyter Notebooks to prototype and document data processing scripts for a software development team, analyzing datasets of 5,000 rows each and collaborating with 4 engineers to refine algorithms and improve functionality.
●Utilized Spark/PySpark for big data processing, implementing distributed data transformation pipelines that reduced batch processing time by 40% for 10 terabytes of raw event data, supporting real-time analytics for operational intelligence.
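A minimal sketch of the kind of PySpark batch transformation referenced above; the S3 paths, column names, and aggregation logic are illustrative assumptions rather than the actual pipeline.

```python
# Minimal sketch of a PySpark batch transformation over raw event data.
# Paths, column names, and the aggregation are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("event-batch-transform").getOrCreate()

# Read raw events (e.g. Parquet landed by an upstream ingestion job).
events = spark.read.parquet("s3://example-bucket/raw/events/")  # placeholder path

# Cleanse and aggregate: drop malformed rows, then compute daily counts per event type.
daily_counts = (
    events
    .filter(F.col("event_type").isNotNull())
    .withColumn("event_date", F.to_date("event_timestamp"))
    .groupBy("event_date", "event_type")
    .count()
)

# Write partitioned output for downstream analytics consumers.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_event_counts/"
)
```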
Adani Mar 2018 - Jan 2019
Data Analyst India
●Streamlined SQL-driven data workflows to strengthen internal reporting systems, cutting down manual efforts by 20 hours monthly while enhancing the efficiency of business intelligence operations across business units.
●Implemented automated CI/CD pipelines with AWS CodePipeline, Jenkins, and AWS CodeDeploy, enhancing deployment speed and operational agility.
●Managed Oracle database operations to streamline retrieval processes, overseeing 50 tables with 1,000 rows, and improving query time from 20 seconds to 5 seconds for 1,000 daily queries.
●Customized Excel-based tools for data transformation, automating repetitive tasks to save 6 hours of manual work per month, while ensuring consistent, accurate, and reliable data outputs.
●Conceptualized the design of advanced data storage and retrieval frameworks, cutting query times by 10 seconds and improving accessibility of enterprise client data.
●Collaborated with cross-functional teams spanning varied sectors to revamp and modernize data pipelines and workflows, shortening query processing times for large-scale datasets by 25 hours, bolstering strategic planning.
●Implemented CI/CD pipeline automation using Jenkins, optimizing DevOps workflows for continuous integration and seamless deployment of data processes.
●Integrated Jenkins with infrastructure management tools, streamlining cloud-based deployments and automating job executions for ETL pipelines.
PROJECTS
End-to-End Cloud Data Pipeline for Real-Time Analytics
●Project Description: Designed and implemented a cloud-based end-to-end data pipeline using AWS Glue, S3, Lambda, and Redshift to process streaming data for real-time analytics. The pipeline ingested data from various sources, transformed it, and loaded it into Redshift for near-real-time reporting (a minimal sketch of the Lambda-to-Redshift load step appears below).
●Impact: Reduced data latency by 50%, enabling stakeholders to access up-to-date insights and make data-driven decisions faster. The system improved operational efficiency, cutting report generation time from 2 hours to 15 minutes.
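A minimal sketch of the Lambda-to-Redshift load step described above, using the boto3 Redshift Data API; the cluster, database, table, and IAM role names are illustrative placeholders.

```python
# Minimal sketch of an AWS Lambda handler that loads newly landed S3 objects
# into Redshift via a COPY statement issued through the Redshift Data API.
# Cluster, database, table, and IAM role names are illustrative placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

def lambda_handler(event, context):
    # Triggered by an S3 put event; load each new object into a staging table.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        copy_sql = (
            f"COPY analytics.staging_events "
            f"FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "  # placeholder role
            f"FORMAT AS JSON 'auto';"
        )
        redshift_data.execute_statement(
            ClusterIdentifier="analytics-cluster",  # placeholder cluster
            Database="analytics",
            DbUser="etl_user",
            Sql=copy_sql,
        )
    return {"status": "ok"}
```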
Automated ETL Pipeline with Apache Airflow and Snowflake
●Project Description: Automated the ETL pipeline for a large e-commerce company using Apache Airflow and Snowflake. The pipeline ingested data from product sales, customer interactions, and inventory databases, transforming it into structured datasets for analysis (a minimal DAG sketch appears below).
●Impact: Improved data processing efficiency by 70%, reducing manual ETL effort from 15 hours to 3 hours per week, and enabled the company to produce daily business intelligence reports with real-time insight into sales trends and inventory forecasts.
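A minimal sketch of the kind of Airflow DAG described above, assuming the Snowflake provider package; the connection ID, stage, table names, and SQL are illustrative placeholders.

```python
# Minimal sketch of an Airflow DAG that loads staged e-commerce data into Snowflake.
# Connection ID, stage/table names, and schedule are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="ecommerce_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # one run per day to feed daily BI reports
    catchup=False,
) as dag:

    load_sales = SnowflakeOperator(
        task_id="load_sales",
        snowflake_conn_id="snowflake_default",  # placeholder connection
        sql="""
            COPY INTO analytics.sales
            FROM @analytics.sales_stage
            FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
        """,
    )

    build_daily_summary = SnowflakeOperator(
        task_id="build_daily_summary",
        snowflake_conn_id="snowflake_default",
        sql="""
            CREATE OR REPLACE TABLE analytics.daily_sales_summary AS
            SELECT order_date, SUM(amount) AS total_sales
            FROM analytics.sales
            GROUP BY order_date;
        """,
    )

    # Load raw sales first, then rebuild the summary table consumed by reports.
    load_sales >> build_daily_summary
```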