Geetha Katragadda
Data Engineer
GA +1-816-***-**** *******************@*****.***
SUMMARY
Data Engineer with 5+ years of experience building and optimizing data pipelines that keep data flowing reliably across enterprise systems. Expertise in large-scale data processing with Big Data technologies such as Apache Spark, PySpark, Hadoop, Kafka, and Hive. Skilled in designing robust ETL workflows using SSIS, Apache NiFi, Talend, and Informatica to streamline data extraction, transformation, and integration. Well-versed in cloud-based data solutions, leveraging AWS, Azure, Databricks, and Snowflake for scalable, high-performance data architectures. Adept at database management with MySQL, PostgreSQL, SQL Server, Oracle, and MongoDB, ensuring data integrity and optimized query performance. Experienced in developing visualizations and reports using Tableau, Power BI, and SSRS to support business intelligence initiatives. Strong background in deploying containerized applications using Docker and Kubernetes, along with CI/CD pipelines that automate data engineering processes.
SKILLS
Methodologies: SDLC, Agile, Waterfall
Languages: Python, R, SQL, SAS, T-SQL
Big Data Ecosystem: Apache Spark, PySpark, Hadoop (HDFS, MapReduce), Kafka, Hive, Sqoop, Airflow, Apache Flink, Pig
ETL Tools: SSIS, Apache NiFi, Apache Kafka, Talend, Apache Airflow, Informatica
IDEs: Visual Studio Code, PyCharm, Jupyter Notebook
Packages: NumPy, Pandas, Matplotlib, Seaborn, ggplot2, SciPy, scikit-learn
Reporting/Visualization Tools: Tableau, Power BI, SSRS, Excel
Cloud Technologies: AWS (S3, EC2, RDS), Azure (Data Lake, Synapse Analytics, Data Factory, VMs), Databricks, Snowflake
DevOps Tools: Docker, Kubernetes, Jenkins, CI/CD Pipelines
Databases: MySQL, PostgreSQL, SQL Server, Oracle, MongoDB, Azure SQL Database
Version Control Tools: Git, GitHub, GitLab
Operating Systems: Windows, Linux, Mac
EXPERIENCE
Pfizer, USA Jan 2024 - Present Data Engineer
• Established and fine-tuned ETL processes utilizing Apache Kafka, Azure Data Lake, and Synapse Analytics, enabling seamless real-time data streaming and integration for petabyte-scale datasets.
• Leveraged Apache Spark and Python to perform high-speed data processing on structured and unstructured data, reducing computation time by 40% and optimizing analytics workflows.
• Designed Power BI dashboards, translating complex datasets into visually intuitive insights that empowered stakeholders with data-driven decision-making.
• Administered Azure SQL Database and MongoDB environments, ensuring high availability, secure data management, and superior query performance.
• Collaborated with DevOps teams to implement CI/CD automation with Azure DevOps, streamlining deployment processes and enhancing system stability.
Metasystems, India Jan 2020 - Jul 2022 Data Engineer
• Developed and optimized high-performance data pipelines leveraging Apache Spark and Hadoop, enabling efficient processing of massive datasets and reducing latency by 40%.
• Designed and implemented cloud-based data storage solutions using AWS S3 and RDS, enhancing data accessibility and retrieval speed for analytics teams.
• Migrated legacy on-premises data warehouses to Snowflake, optimizing query execution speeds by 50% and ensuring seamless scalability for growing data demands.
• Created dynamic and interactive visual reports in Amazon QuickSight, translating raw data into actionable insights that guided strategic decision-making.
• Engineered NoSQL database structures in MongoDB to support real-time applications, improving system response times and data consistency.
• Collaborated with data scientists, analysts, and business teams to develop scalable, analytics-driven solutions that aligned with organizational goals.
Mindtree, India Jun 2018 - Dec 2019 Data Engineer
• Deployed scalable data pipelines using Apache NiFi and Azure Data Factory, enhancing data ingestion and transformation efficiency across diverse enterprise systems.
• Orchestrated ETL workflows with Apache Airflow and Talend, streamlining data integration processes and improving pipeline reliability across multiple business units.
• Automated deployment and monitoring using Jenkins and Terraform, enhancing CI/CD workflows for continuous data pipeline updates with minimal disruptions.
• Engineered high-performance database solutions in MySQL, optimizing data storage and retrieval for enterprise applications.
EDUCATION
Master of Science in Computer Science May 2024
Wichita State University, Wichita, KS
Bachelor of Science in Computer Science May 2017
R.V.R. & J.C. College of Engineering, Chowdavaram, India