RAVALIKA MARAPALLY
Data Engineer
*************@*****.*** | 346-***-**** | Hayward, CA 94544, USA | LinkedIn
SUMMARY
●Highly skilled Data Engineer with 5+ years of professional experience designing and developing scalable data infrastructure for storage, transformation, and analytics, including datasets exceeding 5 TB.
●Knowledgeable in relational and NoSQL database systems, including PostgreSQL, MySQL, MongoDB, Oracle, and Teradata, with a focus on performance tuning and governance.
●Proficient in architecting cloud-native solutions on Azure and AWS, leveraging services such as Databricks, Data Factory, EC2, S3, Lambda, Glue, Redshift, and BigQuery to drive cost-efficient, enterprise-grade data ecosystems.
●Streamlined deployment workflows by containerizing applications with Docker and Kubernetes and automating CI/CD pipelines in GitLab, cutting deployment cycles by 30 minutes per release.
TECHNICAL SKILLS
Programming Languages: Python, SQL, R
DevOps Tools: Docker, Kubernetes, Jenkins
ETL Tools: Informatica, Talend, Apache Airflow
Data Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plotly
Cloud Technologies: Azure (Blob Storage), AWS (EC2, S3, Lambda, Glue, Redshift)
Databases: Snowflake, Oracle, MS Access, MySQL, Teradata, PostgreSQL, MongoDB, NoSQL
Big Data Technologies: Hadoop, Spark, HDFS, MapReduce, Cassandra, Impala, Kafka
Data Engineering Tools: Azure Databricks, PySpark, Data Lake Storage, Azure Data Factory, Synapse Analytics
Methodologies: Agile, Scrum, Waterfall
IDEs: PyCharm, Jupyter Notebook, Visual Studio
Version Control: Git, GitHub, GitLab, CI/CD workflows
EDUCATION
University of Central Missouri Jan 2023 - May 2024
Master of Science in Computer Science MO, USA
SR University Jun 2014 - May 2018
Bachelor of Technology in Computer Science and Engineering Warangal, TS, India
PROFESSIONAL EXPERIENCE
McKinsey & Company Oct 2024 - Current
Data Engineer MI, USA
●Designed batch processing applications leveraging PostgreSQL to automate workflows and manage 3 TB of data within an S3 data lake, cutting processing time by 15 hours and improving scalability of data operations.
●Revamped data pipelines by integrating Tableau, enabling daily processing of 100 GB of data, cutting query execution time by 10 minutes, and delivering near-real-time analytics for customer-facing dashboards.
●Automated 4 mission-critical data pipelines with Hadoop, processing large-scale datasets while maintaining uptime and reliability and eliminating resource-allocation bottlenecks.
●Led end-to-end development of 3 cloud-native data pipelines using Visual Studio, EMR, and AWS Lambda, shortening ETL cycle times by 6 hours and streamlining integration of analytics output for stakeholders.
●Implemented Snowflake-based data warehousing solutions, optimizing query performance and enabling seamless data integration across cloud platforms, improving analytics efficiency and reducing storage costs.
Accenture Sep 2021 - Dec 2022
Data Engineer/Cloud Analyst India
●Architected a data pipeline to ingest and process semi-structured data using Python, crafting custom scripts to transform and cleanse records from 5 heterogeneous data sources and completing end-to-end processing in 5 minutes.
●Engineered real-time data streaming with Apache Kafka, configuring topics and partitions to sustain 50 messages per second and delivering ingestion-to-delivery latency under 800 milliseconds for seamless downstream consumption (a representative producer sketch follows this section).
●Designed big data workflows in Databricks, leveraging Spark executors to process 1 terabyte of data daily, completing data wrangling and preparation in 4 hours and yielding 20,000 rows of high-quality, analysis-ready data.
●Built data solutions in PyCharm, developing and debugging 10,000 lines of Python code, integrating Git for version control to accelerate development cycles, shaving 20 hours off each sprint’s timeline.
●Deployed and managed Kubernetes clusters, orchestrating 4 applications across 4 nodes with 50 pods, ensuring uptime and auto-scaling to support 500 requests per minute during peak loads, with zero-downtime rolling updates.
●Spearheaded Agile methodology implementation, guiding a team of 6 through two-week sprints, resolving 100 tickets, and delivering 4 high-impact features on schedule, aligning with the needs of 3 key stakeholders.
●Implemented ETL workflows using Informatica, processing 1 gigabyte of data daily across 5 mappings, optimizing extract-transform-load cycles to complete in 3 hours and enabling reliable data access for 150 downstream users.
●Leveraged Databricks on Azure to optimize cloud data operations, processing 100 gigabytes of data across pipelines, achieving data transfers in 2 hours while supporting 200 concurrent users with reliable performance.
●Pioneered Power BI dashboards to track financial KPIs, designing 7 interactive visualizations with 5 drill-through layers, slashing financial review time from 2 hours to 60 minutes for fund managers, enhancing decision-making efficiency.
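A minimal sketch of the kind of Kafka producer configuration referenced in the streaming bullet above, assuming the kafka-python client; the broker address, topic name, and message shape are illustrative placeholders, not the actual production setup.

```python
# Minimal sketch of a Kafka producer for low-latency JSON event streaming.
# Broker address, topic name, and message fields are illustrative placeholders.
import json
import time

from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],  # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks=1,        # balance durability against publish latency
    linger_ms=5,   # small batching window to keep end-to-end latency low
)

def publish_event(event: dict) -> None:
    """Send one event to the ingestion topic; partitioning is key-based."""
    key = str(event.get("customer_id", "")).encode("utf-8")
    producer.send("customer-events", key=key, value=event)

if __name__ == "__main__":
    publish_event({"customer_id": 42, "action": "page_view", "ts": time.time()})
    producer.flush()  # block until buffered messages are delivered
```

Keying messages by customer ID keeps a given customer's events on one partition (preserving order), while a small linger_ms is consistent with the sub-second latency target noted above.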
Capgemini Feb 2019 - Sep 2021
Data Engineer/Senior Analyst India
●Implemented ETL workflows using Apache Airflow, streamlining data ingestion from three application APIs and processing 1 terabyte of user-generated data to enhance pipeline reliability for critical software functionalities.
●Developed analytics dashboards using Seaborn for a software analytics module tailored to consulting needs, visualizing 200 usage records and empowering product managers to monitor and optimize feature development strategies.
●Provisioned Azure cloud infrastructure to support the software ecosystem, managing 2 terabytes of application logs and enabling three real-time monitoring tools, reducing manual configuration effort by 100 hours annually.
●Deployed Cassandra as the backend database for a customer-facing application, storing 6 terabytes of user profile data across 5 nodes and supporting 10,000 queries per minute, ensuring seamless performance for a growing user base.
●Leveraged Teradata for business intelligence, optimizing queries across 8 terabytes of historical data and cutting report generation time by 15 seconds for 50 daily customer insights.
●Employed Azure Data Factory to orchestrate 7 pipelines for a cloud-based software solution, integrating 3 terabytes of telemetry data from 4 sources daily, enabling 6 analytics modules to deliver real-time performance metrics.
●Developed and managed ETL/data integration projects using Talend, transforming structured and unstructured data from multiple sources, automating cleansing workflows, and improving data quality by 20%, enhancing decision-making processes.
●Used Jupyter Notebooks to prototype and document data processing scripts for a software development team, analyzing datasets of 5,000 rows each and collaborating with 4 engineers to refine algorithms and improve functionality.
●Utilized Spark/PySpark for big data processing, implementing distributed data transformation pipelines that reduced batch processing time by 40% for 10 terabytes of raw event data, supporting real-time analytics for operational intelligence.
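A minimal sketch of the kind of PySpark batch transformation referenced above; the S3 paths, column names, and aggregation logic are illustrative assumptions rather than the actual pipeline.

```python
# Minimal sketch of a PySpark batch transformation over raw event data.
# Paths, column names, and the aggregation are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("event-batch-transform").getOrCreate()

# Read raw events (e.g. Parquet landed by an upstream ingestion job).
events = spark.read.parquet("s3://example-bucket/raw/events/")  # placeholder path

# Cleanse and aggregate: drop malformed rows, then compute daily counts per event type.
daily_counts = (
    events
    .filter(F.col("event_type").isNotNull())
    .withColumn("event_date", F.to_date("event_timestamp"))
    .groupBy("event_date", "event_type")
    .count()
)

# Write partitioned output for downstream analytics consumers.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_event_counts/"
)
```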
Adani Mar 2018 - Jan 2019
Data Analyst India
●Streamlined SQL-driven data workflows to strengthen internal reporting systems, cutting down manual efforts by 20 hours monthly while enhancing the efficiency of business intelligence operations across business units.
●Implemented automated CI/CD pipelines with AWS CodePipeline, Jenkins, and AWS CodeDeploy, enhancing deployment speed and operational agility.
●Managed Oracle database operations to streamline retrieval processes, overseeing 50 tables with 1,000 rows, and improving query time from 20 seconds to 5 seconds for 1,000 daily queries.
●Customized Excel-based tools for data transformation, automating repetitive tasks to save 6 hours of manual work per month, while ensuring consistent, accurate, and reliable data outputs.
●Conceptualized the design of advanced data storage and retrieval frameworks, cutting query times by 10 seconds and improving accessibility of enterprise client data.
●Collaborated with cross-functional teams spanning varied sectors to revamp and modernize data pipelines and workflows, shortening query processing times for large-scale datasets by 25 hours, bolstering strategic planning.
●Implemented CI/CD pipeline automation using Jenkins, optimizing DevOps workflows for continuous integration and seamless deployment of data processes.
●Integrated Jenkins with infrastructure management tools, streamlining cloud-based deployments and automating job executions for ETL pipelines.
PROJECTS
End-to-End Cloud Data Pipeline for Real-Time Analytics
●Project Description: Designed and implemented a cloud-based end-to-end data pipeline using AWS Glue, S3, Lambda, and Redshift to process streaming data for real-time analytics. The pipeline ingested data from various sources, transformed it, and loaded it into Redshift for near-real-time reporting (a minimal sketch of the Lambda-to-Redshift load step appears below).
●Impact: Reduced data latency by 50%, enabling stakeholders to access up-to-date insights and make data-driven decisions faster. The system improved operational efficiency, cutting report generation time from 2 hours to 15 minutes.
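A minimal sketch of the Lambda-to-Redshift load step described above, using the boto3 Redshift Data API; the cluster, database, table, and IAM role names are illustrative placeholders.

```python
# Minimal sketch of an AWS Lambda handler that loads newly landed S3 objects
# into Redshift via a COPY statement issued through the Redshift Data API.
# Cluster, database, table, and IAM role names are illustrative placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

def lambda_handler(event, context):
    # Triggered by an S3 put event; load each new object into a staging table.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        copy_sql = (
            f"COPY analytics.staging_events "
            f"FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role' "  # placeholder role
            f"FORMAT AS JSON 'auto';"
        )
        redshift_data.execute_statement(
            ClusterIdentifier="analytics-cluster",  # placeholder cluster
            Database="analytics",
            DbUser="etl_user",
            Sql=copy_sql,
        )
    return {"status": "ok"}
```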
Automated ETL Pipeline with Apache Airflow and Snowflake
●Project Description: Automated the ETL pipeline for a large e-commerce company using Apache Airflow and Snowflake. The pipeline ingested data from product sales, customer interactions, and inventory databases, transforming it into structured datasets for analysis (a minimal DAG sketch appears below).
●Impact: Improved data processing efficiency by 70%, reducing manual ETL effort from 15 hours to 3 hours per week, and enabled the company to produce daily business intelligence reports with real-time insight into sales trends and inventory forecasts.
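A minimal sketch of the kind of Airflow DAG described above, assuming the Snowflake provider package; the connection ID, stage, table names, and SQL are illustrative placeholders.

```python
# Minimal sketch of an Airflow DAG that loads staged e-commerce data into Snowflake.
# Connection ID, stage/table names, and schedule are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="ecommerce_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # one run per day to feed daily BI reports
    catchup=False,
) as dag:

    load_sales = SnowflakeOperator(
        task_id="load_sales",
        snowflake_conn_id="snowflake_default",  # placeholder connection
        sql="""
            COPY INTO analytics.sales
            FROM @analytics.sales_stage
            FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
        """,
    )

    build_daily_summary = SnowflakeOperator(
        task_id="build_daily_summary",
        snowflake_conn_id="snowflake_default",
        sql="""
            CREATE OR REPLACE TABLE analytics.daily_sales_summary AS
            SELECT order_date, SUM(amount) AS total_sales
            FROM analytics.sales
            GROUP BY order_date;
        """,
    )

    # Load raw sales first, then rebuild the summary table consumed by reports.
    load_sales >> build_daily_summary
```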