Revanth Pavan Nimmala
703-***-**** ***************@*****.*** LinkedIn
Professional Summary
●6+ years designing, optimizing, and deploying large-scale data pipelines and analytics solutions across AWS and GCP.
●Deep expertise in scalable ETL workflows, real-time and batch processing, and distributed data architectures.
●Proven record of optimizing pipelines and queries for performance and cost, leveraging advanced partitioning, caching, and resource management.
●Proficient in orchestrating and building scalable data pipelines using Apache Spark, PySpark, Airflow, and Hadoop.
●Led productionization of batch and streaming data pipelines, implementing failover strategies and performance tuning to maintain 99.99% uptime.
●Strong background in data analysis and reporting with SQL, Tableau, Power BI, and QuickSight; adept at translating business requirements into actionable data solutions and insights for strategic decisions.
●Strong expertise in PL/SQL development for writing stored procedures, triggers, and complex queries in Oracle databases.
●Collaborative leader with experience in Agile teams, stakeholder engagement, and cross-functional project delivery.
Education
George Mason University Fairfax, VA
Master of Science, Data Analytics Engineering; 3.93/4.0 Jan 2019 – Dec 2020
Coursework: Predictive Analytics, Natural Language Processing, Geo-Social Data Analytics, Big Data Essentials: Hadoop and Spark Framework, Advanced Health Data Mining, Applied Statistics and Visualizations.
Gitam University Vizag, India
Bachelor of Technology, Computer Science; 3.7/4.0 Aug 2014 – May 2018
Experience
BP August 2021 – Present
Data Engineer/Analyst VA
●Architected data lake solutions on AWS S3 and Redshift Spectrum to unify structured, semi-structured, and unstructured sensor data across business units.
●Migrated data assets and pipelines from Palantir Foundry into AWS Redshift, optimizing query performance with distribution keys, sort keys, and columnar storage.
●Optimized Spark and Redshift jobs using partitioning, bucketing, and caching to cut batch runtime by 45% and enable near real-time field analytics.
●Engineered reusable PySpark AWS Glue ETL pipelines using Glue Data Catalog for centralized metadata, processing 30 TB/day, delivering 500 GB–1 TB curated datasets, and cutting onboarding time by 40%.
●Designed serverless real-time ingestion with Kinesis, AWS Lambda triggers, and Glue crawlers—slashing field-monitoring latency from 30 minutes to under 2 minutes.
●Engineered modular SQL models for both ad-hoc and scheduled analytics, improving maintainability and accelerating insights.
●Developed data quality scorecards and visualizations to highlight pipeline accuracy and business metric alignment, increasing stakeholder confidence in analytics outputs.
●Built end-to-end monitoring frameworks, including QuickSight dashboards, CloudWatch alerts, and Great Expectations/Glue DataBrew checks in CI/CD, improving data trust and cutting incident response times by 60%.
●Performed in-depth SQL analysis on Redshift and Snowflake datasets to identify trends in asset utilization and operational efficiency, influencing leadership decisions.
●Partnered with finance, operations, and other business teams to define KPIs and translate requirements into BI dashboards (QuickSight, Tableau), enabling leadership to track asset utilization and field performance in near real time and reducing manual reporting by 50%.
●Presented analytical findings to leadership teams, helping drive data-driven decisions on field operations, cost management, and system performance.
●Streamlined ETL for high-volume financial data using Glue, PySpark, and Redshift Spectrum, improving efficiency by 40%.
●Led the implementation of Master Data Management (MDM) solutions using AWS Glue and DynamoDB, ensuring data standardization and deduplication across multiple business units.
●Implemented adaptive job scheduling in Airflow, dynamically scaling resources based on workload and business priorities.
●Collaborated with product managers and analysts to design SQL models supporting new reporting requirements, reducing turnaround time for insights from days to hours.
●Worked within an Agile methodology, refining user story requirements and delivering solutions on time.
George Mason University Sept 2019 – July 2021
Data Analyst Fairfax, VA
●Developed reusable SQL models for ad-hoc and scheduled analytics, improving query maintainability and reducing time-to-insight for research teams.
●Tuned data storage formats (Parquet, ORC) and compression strategies, reducing storage costs by 40% and improving downstream query performance.
●Implemented automated data archival and retention policies, ensuring compliance and optimizing storage usage.
●Used CloudFormation to automate infrastructure setup for analytics environments, enabling rapid provisioning and consistent environments.
●Conducted statistical analysis and predictive modeling on research datasets, uncovering key insights that shaped academic grant proposals and publications.
●Led requirements gathering with academic leads to define research KPIs and built interactive Tableau dashboards visualizing student and faculty research metrics, driving data-informed decisions on grant proposals, project funding, and resource allocation.
Allywn Corporation August 2020 – December 2020
Data Engineer Apprentice Fairfax, VA
●Built data ingestion pipelines integrating GCP BigQuery, AWS S3, and on-prem sources, supporting 10TB+ daily data transfer for analytics and ML workloads.
●Designed and implemented event-driven ETL pipelines using Apache Beam and Dataflow, enabling scalable, fault-tolerant processing of streaming and batch data.
●Developed a data lakehouse architecture using Delta Lake, supporting ACID transactions and time travel for critical datasets.
●Partnered with product managers to design data models supporting new business features, and built Power BI dashboards for executive reporting.
L&T Infotech April 2017 – December 2018
Data Analyst India
●Led migration of 500GB+ legacy datasets to cloud storage, designing robust data validation and reconciliation checks to ensure zero data loss.
●Implemented materialized views and index tuning in Oracle and PostgreSQL, reducing report generation time by 60%.
●Established automated job monitoring and alerting for ETL failures, improving SLA adherence and reducing downtime.
●Built a custom data lineage tracker in Python to trace data flow across multiple systems, improving auditability and troubleshooting.
●Gathered requirements from business users to design interactive Tableau dashboards, enabling real-time insights for 200+ stakeholders.
●Conducted data mapping and gap analysis for new business processes, ensuring accurate integration into the enterprise data warehouse.
●Coordinated release management for ETL code, using Git and Jenkins to support multi-environment deployments and rollback.
Technical Skills
Programming: Python (Pandas, NumPy), SQL, PL/SQL, Shell Scripting, Java, Excel (Pivot Tables, VLOOKUPs)
Big Data: Apache Spark, PySpark, Hadoop, Hive, HBase, MapReduce, Delta Lake
Cloud: AWS (Glue, Redshift, S3, Lambda, EMR, DynamoDB, Athena), Azure (Data Lake, Synapse, Data Factory, Databricks), GCP (BigQuery, Dataproc, GCS, Airflow, Dataflow)
Databases: PostgreSQL, MySQL, MongoDB, Snowflake, Redshift, Oracle
Orchestration & CI/CD: Airflow, Jenkins, Git, Docker, Kubernetes, Terraform, CloudFormation
Reporting/BI: Tableau, Power BI, QuickSight
Other: Data Lake Architecture, Data Modeling, Data Governance, Data Quality, Statistical Analysis, A/B Testing, Production Monitoring, Agile, TDD
Operating Systems: UNIX, Linux, Windows