Data Engineer Analytics

Location:

United States

Salary:

70000

Posted:

February 20, 2025


Resume:

Geethika Penukula

TX +1-562-***-****

********.*@*********.***

PROFESSIONAL SUMMARY

Experienced Data Engineer specializing in cloud data engineering and big data analytics. Proven track record in designing robust data pipelines using Spark SQL and PySpark across AWS and Azure platforms. Successfully optimized DBT projects in Snowflake, enhancing performance and reducing costs. Eager to leverage expertise to drive data-driven solutions and insights in a dynamic environment.

WORK EXPERIENCE

Goldman Sachs Group Data Engineer Jan 2024 - Present

• Integrated Apache Airflow with AWS to orchestrate multi-stage ML workflows, leveraging Amazon SageMaker resource allocation to reduce overall execution time by 20%.

• Architected and deployed a scalable DynamoDB data store optimized for high traffic loads, resulting in improved user responsiveness and system reliability.

• Developed interactive Tableau and Amazon QuickSight dashboards, alongside SSRS reports, for trend analysis and anomaly detection, leading to a 30% increase in operational efficiency.

• Developed dbt models and macros with a focus on incremental model design, modularization, and performance optimization, improving data pipeline performance by 20%.

• Leveraged Snowflake's cloud-based architecture to facilitate seamless collaboration between data analysts and data scientists, resulting in a 30% increase in the completion rate of data-driven projects.

• Developed Infrastructure as Code (IaC) using Terraform and GitHub Actions, automating infrastructure provisioning and management, resulting in a 40% improvement in deployment efficiency.

• Optimized SQL queries in Impala by implementing indexing, partitioning, and query tuning techniques, leading to 30% faster data retrieval for large-scale datasets.

Sigma Tech Solution Data Engineer Jul 2020 - Jul 2022

• Implemented real-time data streaming solutions using Apache Kafka, optimizing data ingestion and processing for low-latency performance and achieving a 40% reduction in data processing time.

• Optimized Hive queries using partitioning and bucketing strategies, improving query performance by 25% for efficient, scalable processing of large datasets.

• Engineered ETL pipelines using Azure Data Factory, optimizing data ingestion, transformation, and loading processes, resulting in a 30% reduction in data processing time.

• Built interactive dashboards and reports using Power BI, enabling real-time data visualization and insights for stakeholders and improving reporting workflow efficiency by 35%.

• Optimized scalable data pipelines on Databricks using PySpark, efficiently ingesting terabytes of data daily from diverse sources, including relational databases and APIs, resulting in a 40% reduction in execution time.

• Designed and automated ETL workflows using Python, streamlining data integration from multiple sources and reducing manual effort by 40% while ensuring data quality and consistency.

• Used Postman for comprehensive API testing, validating functionality, performance, and security during development and integration, leading to a 30% improvement in API reliability and efficiency.

• Leveraged Snowflake's User-Defined Functions (UDFs) and stored procedures to perform complex data transformations, resulting in a 40% improvement in processing efficiency.

• Configured and managed SSIS packages for seamless data migration, transformation, and loading between data sources and SQL Server, resulting in a 25% improvement in data processing efficiency.

SKILLS

• Languages: Scala, Python, SQL

• Big Data Ecosystem: Hadoop, Hive, Impala, HDFS, HBase, Apache Airflow, Apache Kafka, Apache Spark, Flink, Databricks

• Cloud: AWS (EMR, EC2, S3, Athena, Glue, Elasticsearch, Lambda, DynamoDB, Redshift, QuickSight), Azure (Data Lake, Databricks, Data Storage, Data Factory, Azure App Service, Azure SQL Database, Azure Blob Storage)

• Packages & Tools: NumPy, Pandas, Matplotlib, Scikit-learn, Seaborn, TensorFlow, PySpark, dbt (data build tool)

• Data Warehousing, ETL & Visualization Tools: Snowflake, Tableau, Power BI, SSIS, SSRS

• Database & Other Skills: Git, MS SQL Server, PostgreSQL, MongoDB, MySQL

EDUCATION

California State University, Long Beach, CA, USA Master of Science, Computer Science 2022 - 2024

CMR Institute of Technology, Hyderabad, India Bachelor, Computer Science 2018 - 2022
