
SPOORTHI VAKKAVANTHULA

Austin, TX ***** | linkedin | ad43xe@r.postjobfree.com | 409-***-****

Professional Summary

• 5+ years of expertise as a Big Data Engineer, specializing in AWS and Azure cloud technologies for enterprise-wide ETL pipelines using Python, PySpark, Scala, and SQL.

• Proficient in the Hadoop Ecosystem (HDFS, MapReduce, YARN, Hive, HBase, Spark), with experience in cluster management, configuration, and optimization.

• Demonstrated a service-oriented approach by promptly responding to internal and external ad hoc requests for data, ensuring timely delivery of accurate and relevant information.

• Proficient in Agile/Scrum methodologies, fostering positive relationships and facilitating productive collaboration.

• Skilled in SQL, Python, graph-based solutions, Linux command line tools, and UNIX commands.

• Implemented business process improvement strategies, identifying bottlenecks, and introducing streamlined solutions.

• Familiarity with data integration tools, including Apache Kafka and Apache NiFi.

• Advanced proficiency in R for data analysis, modeling, and visualization, delivering actionable insights.

• Engineered scalable ETL and ELT workflows using Spark, Databricks, Hadoop, Hive, AWS EMR, and Kafka.

• Established and maintained interfaces between data resources and analytics software applications, facilitating seamless data integration and analysis.

• Collaborated closely with Product and Engineering teams to integrate AI technologies into product development processes.

• Designed, developed, and maintained complex ETL processes using SQL Server Integration Services (SSIS).

• Experienced with Hadoop's multiple data processing engines operating on data stored in a single platform, coordinated by YARN.

• Proficient in identifying and addressing inefficiencies in data processing workflows.

• Cloud platform experience in AWS and Azure, with a good understanding of NoSQL databases such as HBase and MongoDB.

• Experience in data analysis, cleansing, validation, conversion, migration, and mining.

Skills:

Programming: C, C++, Python, Java, basics of Kotlin, data structures, Unix, OOP, Spring Boot, PySpark, Hadoop
Databases: MySQL, NoSQL, relational databases, Snowflake
Web Technologies: HTML, CSS, JavaScript, jQuery, AngularJS, REST APIs
Tools: GitHub, JIRA, Bitbucket, PuTTY, WinSCP, Postman, Insomnia
Data Modeling: dbt (data build tool), SQL
Cloud Technologies: AWS
ETL Tools: Talend, Informatica
IDEs: Visual Studio, IntelliJ

EDUCATION

Lamar University, Beaumont, TX
MS in Computer Science, Graduation Date: August 2022
Relevant Coursework: Data Visualization, Data Mining in Engineering, Data Management, Database Design, Cloud Computing, Cyber Security, Analysis of Algorithms, Advanced Operating Systems

Jawaharlal Nehru Technological University, Hyderabad, India
B.Tech in Computer Science, Graduation Date: May 2019
Relevant Coursework: Data Structures, Software Engineering, Operating Systems, Database Management Systems, Java Programming, Compiler Design, Computer Organization, Data Warehousing & Data Mining

EXPERIENCE

Data Engineer

APPLE (Austin, TX) July 2023 – December 2023

• Developed Python scripts for secure SFTP connections and implemented modular loaders for diverse data integration needs (see the illustrative sketch at the end of this section).

• Enhanced security measures with encryption and decryption techniques, ensuring compatibility with PMWeb's security protocols.

• Optimized query performance and reduced storage costs in Amazon Redshift using partitioning and compression techniques.

• Orchestrated deployment and maintenance of Hadoop clusters, integrating seamlessly with Snowflake for scalable data storage and analytics.

• Automated data loading processes and improved performance by 25%, ensuring adherence to Snowflake best practices.

• Executed complex SQL queries for ETL processes between Hadoop and Snowflake, maintaining data quality and consistency.

• Established comprehensive documentation and utilized dbt models for consistency and transparency.
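
A minimal sketch of the kind of modular SFTP loader described in the first bullet of this section, using the paramiko library; the host, credentials, and file paths are hypothetical placeholders, not details from the engagement:

    import paramiko

    def fetch_file(host, username, key_path, remote_path, local_path):
        """Download one file over SFTP using key-based authentication."""
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=username, key_filename=key_path)
        try:
            sftp = client.open_sftp()
            sftp.get(remote_path, local_path)
            sftp.close()
        finally:
            client.close()

    # Hypothetical usage:
    # fetch_file("sftp.example.com", "loader", "/home/loader/.ssh/id_rsa",
    #            "/outbound/orders.csv", "/tmp/orders.csv")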

Software Dev Engineer
AMAZON (Austin, TX) October 2022 – April 2023

Worked as an SDE building Distribution as a Service and created a new workflow in the existing system to support external orders. I was on the Multi-Channel Replenishment team (Amazon Warehouse Distribution), which is responsible for building a new service that allows customers to use the Amazon fulfillment network to store and distribute their own products, whether they choose to sell on Amazon or externally.

• Worked on debugging and refactoring the existing system to improve the success rate.

• Improved matcher implementation in the Integration test suite.

• Integrated Big Data technologies, including Hadoop and Apache Spark, into software applications using Java and Python for efficient data processing.

• Planned and coordinated data migrations between systems.

• Wrote REST APIs in Java 8 and worked on JUnit test cases for the APIs and business logic.

• Improved code coverage by 15% by adding unit tests and writing integration tests.

• Implemented shell scripting (Bash) for automation, enhancing operational efficiency and minimizing manual intervention.

• Utilized HDFS (Hadoop Distributed File System) for efficient storage and retrieval of large-scale datasets, improving data availability for software applications.

• Worked on large sets of structured, semi-structured and unstructured data.

• Developed data-driven applications using Python, leveraging libraries and frameworks for Big Data analytics.

• Utilized HBase as a real-time data storage solution, enhancing data retrieval capabilities for software applications (see the sketch at the end of this section).

• Developed Java objects using Object Oriented and Multithreading concepts.

• Implemented version control using Git in a Unix environment, ensuring collaborative software development and efficient code management.

• Wrote use-case JUnit tests and used the Mockito framework to verify expected results.

• Used POJO classes and Spring configuration files for dependency injection.
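
A minimal sketch of the HBase read/write pattern referenced above, using the happybase Python client; the Thrift host, table name, and column family are hypothetical:

    import happybase

    # Connect to a (hypothetical) HBase Thrift server.
    connection = happybase.Connection("hbase-thrift.example.com")
    table = connection.table("order_events")

    # Write a row keyed by order ID; the column family holds event fields.
    table.put(b"order:12345", {b"evt:status": b"SHIPPED",
                               b"evt:ts": b"2023-01-15T10:00:00Z"})

    # Low-latency point read of the same row.
    row = table.row(b"order:12345")
    print(row[b"evt:status"])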

Data Engineer
Molina Healthcare (Bellevue, WA) August 2021 – August 2022

Responsibilities:

Developed and designed data models and ETL jobs for data acquisition and manipulation, ensuring data quality and master data management.

Expertise in JSON script development for deploying pipelines in Azure Data Factory (ADF) and using Databricks with ADF for processing large volumes of data.

Performed ETL operations in Azure Databricks, connecting to relational database source systems via JDBC connectors and automating file validations with Python scripts.

Implemented automated processes in Azure cloud for daily data ingestion from web services to Azure SQL DB.

Collaborated with report analysts and Epic build teams to resolve discrepancies between Cache and Clarity.

Proficient in Snowflake for creating and maintaining tables and views, as well as conducting a one-time data migration from SQL Server to Snowflake using Python and SnowSQL.

Developed streaming pipelines using Azure Event Hubs and Stream Analytics, and analyzed data in Azure Data Lake and Blob with Databricks.

Utilized Tableau capabilities for data visualization and developed custom alerts using Azure Data Factory, SQLDB, and Logic App.

Implemented Databricks ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting (see the sketch following this list).

Automated Teradata ELT and admin activities with Python and Shell scripts, and performed application-level DBA activities for Teradata.

Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL, collaborating with application architects on infrastructure and Platform as a Service (PaaS) applications.
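
A minimal sketch of the notebook-style Databricks ETL pattern referenced above, combining DataFrame transforms with Spark SQL; the lake paths and column names are hypothetical placeholders:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("claims_etl").getOrCreate()

    # Read raw JSON landed in the data lake (hypothetical path).
    raw = spark.read.json("/mnt/raw/claims/")

    # DataFrame transforms: drop bad rows, stamp the ingest date.
    cleaned = (raw
               .filter(F.col("claim_id").isNotNull())
               .withColumn("ingest_date", F.current_date()))

    # Spark SQL over a temp view for the aggregate step.
    cleaned.createOrReplaceTempView("claims_clean")
    daily = spark.sql(
        "SELECT ingest_date, COUNT(*) AS claim_count "
        "FROM claims_clean GROUP BY ingest_date")

    # Write the curated output as Parquet (hypothetical path).
    daily.write.mode("overwrite").parquet("/mnt/curated/claims_daily/")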

Big Data Developer
LETHYA GROUP (HYD, INDIA) March 2018 – Dec 2020

• Participated in the full SDLC, including analysis, design, coding, testing, and documentation.

• Conducted business analysis, gathered requirements, and utilized SQL for data analysis.

• Developed data governance strategies for Hadoop ecosystems, ensuring data quality and compliance.

• Integrated and optimized Big Data technologies like Apache Spark for real-time processing.

• Designed ER diagrams, logical and physical databases, and proposed ETL strategies.

• Implemented version control with Git and developed ETL scripts on AWS, increasing data fetching speed by 40%.

• Designed star schema data warehouse models and implemented data models for distributed databases like Apache HBase and Apache Cassandra.

• Developed custom scripts, procedures, queries, and user interfaces for SQL databases.

• Analyzed source data from Oracle, SQL Server, and flat files and participated in system and disaster recovery processes.

• Extracted data into Hadoop HDFS and utilized Informatica PowerCenter for ETL processes.

Software Intern
S&K TECHNOLOGIES (HYD, INDIA) August 2017 – Feb 2018

Developed a website using a file-hierarchy attribute-based encryption scheme in cloud computing to manage the details and records of a hospital management system, built with object-oriented languages such as C# and Java along with relational databases and HTML.

PROJECTS

An Efficient File-Hierarchy Attribute-Based Encryption Scheme in Cloud Computing
Layered access structures are integrated into a single access structure, and hierarchical files are encrypted with the integrated structure. The system lets a user access the details of their own report after receiving an approval key from the administrator.

An Automatic Toll-Tax Collection System Using RFID
Vehicles are equipped with an RFID tag containing the vehicle's registration number, which is read by an RFID reader at the tollhouse. A mobile app was developed for customer engagement.

Client-Server Model for Finding the Shortest Distance and Path Between Two Cities
Built a client-server architecture in which the client sends the names of two cities to the server; the server runs shortest-path algorithms such as Dijkstra, Bellman-Ford, and A* search on different threads and returns the shortest path and distance to the client (a sketch of the Dijkstra step follows).
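
A minimal, self-contained sketch of the Dijkstra step such a server could run per request; the adjacency-list graph encoding is an assumption:

    import heapq

    def dijkstra(graph, source):
        """graph: {city: [(neighbor_city, distance), ...]} -- assumed encoding."""
        dist = {source: 0}
        heap = [(0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue  # stale heap entry, already relaxed via a shorter path
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist  # shortest distance from source to every reachable city

    # Hypothetical usage:
    # roads = {"Austin": [("Houston", 165)], "Houston": [("Dallas", 239)]}
    # print(dijkstra(roads, "Austin").get("Dallas"))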

Database Application in an Android App
Developed an Android application that connects to a MySQL database and displays table data in an Android TextView, with two buttons: one to connect to the database and one to clear the TextView content.

Certificates & Co-Curricular Activities:

• Recognized as best ethical hacker at the SUNTEK Hackathon.

• Certified in HTML5 Application Development Fundamentals by MTA (Microsoft Technology Associate)

• Certified in Database Administration fundamentals by MTA (Microsoft Technology Associate)

• Published a research paper, “Customized Toll Assessment Framework Utilizing RFID and Application”.


