VYSHNAVI GANNAMANENI
Harrison, NJ 913-***-**** ************@*****.*** LinkedIn
EXPERIENCE SUMMARY
• Around 5 years of experience in data engineering and analysis, specializing in building scalable ETL pipelines, data integration, and transformation using tools like Spark, AWS Glue, and Python.
• Engineered end-to-end data solutions on Google Cloud Platform (GCP) and AWS, leveraging tools like BigQuery, DataProc, GCS, S3, Glue, EMR, Athena, Redshift for optimized storage, processing, and analytics.
• Automated workflows and processes using Apache Airflow, Jenkins, and scripting languages such as Python and Shell, ensuring efficiency and reducing manual intervention.
• Streamlined real-time data ingestion and processing with Kafka, enhancing system reliability and scalability for high-velocity data streams.
• Designed and implemented scalable ETL pipelines using Spark, AWS Glue, and Python, driving efficient data integration and transformation processes.
• Conducted advanced data analysis and visualization using NumPy, Pandas and Power BI, driving actionable insights and improving decision-making processes.
• Designed and maintained databases like MySQL and SQL Server, optimizing queries to ensure high performance and reliability.
TECHNICAL STACK
Programming Languages: Python, SQL, PySpark, R, C++
Frameworks & Libraries: NumPy, Pandas, Matplotlib, SciPy
Cloud Platforms: AWS (S3, Glue, EMR, Lambda), GCP
Data Visualization Tools: Tableau, Power BI, Microsoft Excel
Databases: MySQL, SQL Server
Development Tools & IDEs: Visual Studio Code, PyCharm, IntelliJ IDEA
Methodologies: SDLC, Agile, Waterfall
Operating Systems: Windows, Linux
EDUCATION
Master of Science in Computer Science
University of Central Missouri, Missouri May 2023
CERTIFICATION
Amazon Web Services Solutions Architect – Associate Issued: July 2021
PROFESSIONAL EXPERIENCE
Data Engineer Walmart Hoboken NJ August 2024 – Present
• Designed and deployed daily triggered pipelines to process and transform data for machine learning models, ensuring timely and reliable data availability.
• Developed and orchestrated DataProc jobs for large-scale data processing and transformation tasks in GCP.
• Leveraged Google Cloud Storage (GCS) for efficient storage and management of input and output datasets used in the pipeline.
• Utilized BigQuery for data warehousing and analytics, optimizing queries to enhance performance and minimize costs.
• Automated workflow scheduling and monitoring using Airflow, ensuring the smooth execution of complex multi-step pipelines.
• Designed, developed, and optimized scalable data pipelines using Python and Spark/PySpark for large datasets.
• Implemented and maintained data solutions on Google Cloud Platform (GCP) for storage, processing, and analytics.
• Built and managed real-time data processing systems with messaging technologies such as Kafka.
• Optimized SQL queries for data analysis and database management.
• Managed code versioning with Git and implemented CI/CD pipelines.
• Conducted integration testing to ensure the reliability and accuracy of data pipelines.
• Collaborated in Agile ceremonies including sprint planning, backlog grooming, and daily stand-ups.
• Collaborated with data science teams to support machine learning model pipelines and deployment.
Data Engineer HealthFirst New York Jan 2023 – July 2024
• Crafted and implemented ETL pipelines for seamless data migration across a variety of data warehouses, focusing on efficiency and reliability.
• Employed Spark and AWS Glue to develop robust ETL processes, ensuring the smooth transfer of data. Executed data transformations using Postgres to manage different data loads, including incremental, historical, and change data capture (CDC). Proficiently handled large datasets, encompassing diverse data types.
• Key Accomplishments:
• Successfully orchestrated the data flow for Eyemed claims, utilizing PySpark and PostgreSQL for intricate transformations and diverse data loads. The resulting system is currently operational and actively serving numerous stakeholders.
• Spearheaded the implementation of vital components, collaborating with diverse teams and testing personnel. This initiative led to the creation of a streamlined system, significantly simplifying processes for a broad user base. Conducted multiple demonstrations and provided comprehensive documentation, contributing to enhanced operational efficiency.
Data Analyst Deloitte India Nov 2020 – Dec 2021
• Led the collection and aggregation of data from various sources, ensuring a comprehensive and diverse dataset.
• Played a key role in refining and processing data, contributing to the overall quality and accuracy of analytical outputs.
• Developed and implemented Python scripts to automate repetitive data cleaning tasks, reducing manual effort and potential errors.
• Collaborated with cross-functional teams to understand data requirements and optimize the data transformation process.
• Actively contributed to the creation of efficient workflows, resulting in streamlined data processing and analysis procedures.
• Maintained a keen focus on data quality assurance, implementing strategies to identify and address anomalies in the datasets.
• Regularly assessed and improved data processing pipelines to adapt to evolving business needs and technological advancements.
• Provided documentation and training on the developed scripts, empowering team members to utilize automated data processes effectively.
Data Analyst Sukshi Technologies India May 2019 – Oct 2020
• Developed business process models in Waterfall to document existing and future business processes.
• Built Excel pivot tables to generate reports from SQL query results.
• Employed advanced packages like NumPy, Pandas, and Matplotlib to drive data analysis, leading to a 15% increase in actionable insights.
• Leveraged SciPy to perform advanced statistical analysis on a large dataset, identifying anomalies and outliers, leading to a 10% improvement in data accuracy and quality.
• Utilized Power BI to create dynamic reports and visualizations, leading to a 15% increase in data-driven insights shared with the executive team.
• Performed data analysis and data profiling using complex SQL queries on various source systems including SQL Server.
• Applied data cleaning, wrangling, analytics, and integration skills, supported by critical thinking, problem-solving, communication, and presentation skills.