Ramya Sri Sonar
San Jose, CA 703-***-**** ***********@*****.*** www.linkedin.com/in/rssonar
Technical Skills
Big Data: Apache Spark, Apache Kafka, PySpark, Databricks, Hive, Hadoop, Snowflake
Languages: Python, SQL
Cloud Services: Amazon Web Services (AWS) - S3, EC2, EMR, IAM roles and permissions, Redshift
Miscellaneous: Dimensional Modeling, Data Warehousing, Data Governance, Apache Airflow, ETL, MongoDB, Linux, Jira, Git, Tableau, Visual Studio
Work Experience
Data Engineer Nike June 2021 – Present
• Operationalize data for Global Sourcing and Manufacturing (GSM), a team that manages the quality, sustainability, cost, and profitability of Nike’s factory sourcing activities.
• Implemented end-to-end data pipelines with PySpark, HQL, Amazon Web Services (S3, EMR), Apache Airflow and Snowflake to source data for Tableau dashboards.
• Engineered Spark applications with optimal design principles, achieving a 30% increase in EMR cluster computing resource efficiency.
• Automated ETL workflows with Airflow to load scheduled batch and history data to Hive and Snowflake data warehouses, resulting in a 70% reduction in manual interventions.
• Used Spark to reduce Snowflake view retrieval time for downstream users of the product quality pipeline from 4 hours to 2 minutes.
• Used Jira, an agile project management tool, to streamline development processes and track project progress.
• Collaborated with cross-functional teams to understand data needs and deliver timely solutions to achieve product sustainability, quality, and cost.
• Performed data inquiries to resolve issues encountered within the ETL data pipelines.
• Read and write multiple data file formats, such as JSON, CSV, and Parquet, using PySpark.
Environment: Python, AWS S3, EMR, Spark, PySpark, Apache Airflow, CI/CD, Azure Databricks, Snowflake, Teradata, Git, Hive, SQL, Tableau
Data Engineer Apprenticeship Accenture Federal Services January 2021 – May 2021
• Improved the efficiency of the planning and procurement process by leveraging machine learning, deep learning, big data, and Amazon Web Services (AWS).
• Gathered and processed diverse data sets in formats such as CSV, XLSX, and TXT, totaling approximately 2.5 million records, collaborating closely with stakeholders to ensure data quality and accuracy.
• Conducted preprocessing of the dataset, involving the removal of special characters and digits, tokenization, correction of inconsistent spellings, lemmatization, stop word removal, and stemming.
• Attained 80% accuracy using LSTM Deep Learning to automate asset classification in Supply Chain procurement.
• Achieved 92.7% optimization and a 32% classification performance boost by developing a novel NLP algorithm for asset grouping.
• Created a scalable and portable REST API and GUI using Python and the Flask framework on AWS.
Environment: Python, AWS EMR, S3, Redshift, Machine learning models (K Nearest Neighbor, Random Forest), Deep Learning models (Recurrent neural network (RNN), Long short-term memory (LSTM))
Graduate Teaching Assistant George Mason University August 2020 – May 2021
• Tutored and taught in a graduate-level Big Data course covering Hadoop, Spark, and data visualization tools.
• Graded, mentored, and supervised homework and assignments for 60+ students.
Environment: Hadoop, Spark, Python, R, Tableau, AWS, SQL, NoSQL
Education
George Mason University May 2021
Master of Science, Data Analytics Engineering 3.7
Jawaharlal Nehru Institute of Technology May 2019
Bachelor of Technology, Computer Science Engineering 3.5