Data Engineer Web Services

Location: San Jose, CA

Posted: June 07, 2024


Resume:

Ramya Sri Sonar

San Jose, CA 703-***-**** ***********@*****.*** www.linkedin.com/in/rssonar

Technical Skills

Big Data: Apache Spark, Apache Kafka, PySpark, Databricks, Hive, Hadoop, Snowflake

Languages: Python, SQL

Cloud Services: Amazon Web Services (AWS) - S3, EC2, EMR, IAM roles and permissions, Redshift

Miscellaneous: Dimensional Modeling, Data Warehousing, Data Governance, Apache Airflow, ETL, MongoDB, Linux, Jira, Git, Tableau, Visual Studio

Work Experience

Data Engineer Nike June 2021 – Present

• Operationalize data for GSM (Global Sourcing and Manufacturing), the team that manages the quality, sustainability, cost, and profitability of Nike’s factory sourcing activities.

• Implemented end-to-end data pipelines with PySpark, HQL, Amazon Web Services (S3, EMR), Apache Airflow and Snowflake to source data for Tableau dashboards.

• Designed and optimized Spark applications, improving EMR cluster compute resource efficiency by 30%.

• Automated ETL workflows with Airflow to load scheduled batch and historical data into the Hive and Snowflake data warehouses, resulting in a 70% reduction in manual interventions.

• Reduced the retrieval time of Snowflake views in the product quality pipeline from 4 hours to 2 minutes for downstream users by using Spark.

• Used Jira, an agile project management tool, to streamline development processes and track project progress.

• Collaborated with cross-functional teams to understand data needs and deliver timely solutions that support product sustainability, quality, and cost goals.

• Investigated and resolved data issues encountered in the ETL pipelines.

• Read and write multiple data file formats, including JSON, CSV, and Parquet, using PySpark.

Environment: Python, AWS S3, EMR, Spark, PySpark, Apache Airflow, CI/CD, Azure Databricks, Snowflake, Teradata, Git, Hive, SQL, Tableau

Data Engineer Apprenticeship Accenture Federal Services January 2021 – May 2021

• Improved the efficiency of the planning and procurement process by leveraging machine learning, deep learning, big data, and Amazon Web Services (AWS).

• Gathered and processed diverse data sets in formats such as CSV, XLSX, and TXT, totaling approximately 2.5 million records, collaborating closely with stakeholders to ensure data quality and accuracy.

• Preprocessed the dataset: removed special characters and digits, tokenized the text, corrected inconsistent spellings, and applied lemmatization, stop word removal, and stemming.

• Achieved 80% accuracy with an LSTM deep learning model to automate asset classification in supply chain procurement.

• Developed a novel NLP algorithm for asset grouping, achieving 92.7% optimization and a 32% boost in classification performance.

• Created a scalable, portable REST API and GUI in Python with the Flask framework, deployed on AWS.

Environment: Python, AWS EMR, S3, Redshift, machine learning models (K-Nearest Neighbors, Random Forest), deep learning models (Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM))

Graduate Teaching Assistant George Mason University August 2020 – May 2021

• Tutored and taught a graduate-level Big Data course covering Hadoop, Spark, and data visualization tools.

• Graded and supervised homework and assignments and mentored 60+ students.

Environment: Hadoop, Spark, Python, R, Tableau, AWS, SQL, NoSQL

Education

George Mason University May 2021

Master of Science, Data Analytics Engineering, GPA 3.7

Jawaharlal Nehru Institute of Technology May 2019

Bachelor of Technology, Computer Science Engineering, GPA 3.5
