Post Job Free

Data Engineer

Location:
Austin, TX
Posted:
May 14, 2024

Contact this candidate

Resume:

RAJA RAJESWARI UPPALAPATI

Austin, TX | ad5pbi@r.postjobfree.com | +1-848-***-**** | LinkedIn | Certified GCP Data Engineer by Google

PROFESSIONAL EXPERIENCE

Highly skilled Data Engineer with 7 years of experience specializing in cloud platforms such as GCP and Azure in the retail domain. Proficient in developing and optimizing data pipelines, managing databases, and implementing advanced analytics solutions, with a strong focus on collaboration, continuous improvement, and project management.

Senior Cloud Data Engineer 01/2022 – 01/2023 GSPANN India

●Developed REST APIs in Python and deployed them to Google Cloud Platform (GCP) using Cloud Functions and Cloud Run, providing a scalable and efficient solution for web service deployment.
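A minimal sketch of the Cloud Functions-style handler pattern described above. The deployed services would receive a `flask.Request`; here a plain dict stands in for the query parameters so the sketch runs standalone, and the catalog data and names are illustrative, not from the actual project.

```python
import json

# Illustrative in-memory catalog; in a deployed service this would be a
# database or BigQuery lookup.
CATALOG = {"sku-1001": {"name": "denim jacket", "price": 59.99}}

def get_product(args):
    """Body of a Cloud Functions-style HTTP handler.

    `args` stands in for flask.Request.args so the sketch runs without
    the Functions Framework. Returns (json_body, status_code), the shape
    Cloud Functions accepts from a Python HTTP function.
    """
    sku = args.get("sku")
    if sku in CATALOG:
        return json.dumps(CATALOG[sku]), 200
    return json.dumps({"error": "unknown sku"}), 404
```

Wrapping the same function with the Functions Framework (or a small Flask app for Cloud Run) is what turns this handler into a deployable endpoint.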

●Designed and deployed ETL and ELT pipelines in Python on GCP using Cloud Composer and Dataflow, achieving a 50% boost in data processing efficiency. This led to optimized data flows and a 30% reduction in pipeline latency.

●Implemented distributed tracing and logging solutions for microservices on Google Cloud Platform (GCP), leading to a 40% reduction in debugging time and a 30% increase in troubleshooting efficiency.

●Implemented scalable and resilient microservices architectures on GCP, leveraging Cloud Pub/Sub, Cloud Run, and Cloud Functions to enable event-driven communication and serverless computing.

●Performed data analysis and collaborated with cross-functional teams to design and implement data solutions that meet business requirements and adhere to best practices in data engineering on GCP.

●Implemented data visualization solutions using Tableau to provide insights into Google Cloud Platform (GCP) data infrastructure, enabling stakeholders to make informed decisions based on real-time analytics and trends.

Team Lead / Software Cloud Data Engineer 07/2016 – 01/2022

Tata Consultancy Services India

●Successfully migrated the entire Oracle database to BigQuery, enabling Power BI integration for robust reporting, which led to a 60% reduction in data retrieval time and enhanced analytics capabilities.

●Collaborated with the Product Owner, Quality Engineers, Software Engineers, and the Development DBA in Agile ceremonies to understand and develop against structured requirements.

●Optimized SQL scripts through detailed analysis and design using PySpark SQL, leading to a 40% reduction in query execution time and a 50% increase in data processing efficiency.

●Moved data between GCP and Azure with Azure Data Factory, achieving a 70% reduction in migration time. Analyzed and debugged issues during software build and integration, leading to a 50% drop in integration errors.

●Utilized GCP storage and data processing services like Google Cloud Storage, BigQuery, and Pub/Sub for smooth integration with Spark applications, achieving a 60% improvement in data transfer efficiency.

●Built end-to-end data pipelines on GCP using Spark to ingest, process, transform, and analyze large volumes of structured and unstructured data, enabling efficient handling of complex data sets.

●Enhanced Kafka and microservices performance on GCP by optimizing resource utilization, implementing load balancing, and configuring auto-scaling, leading to high throughput and reduced latency.

TECHNICAL SKILLS

Languages/Technologies: Python, .NET, SQL, HTML5, CSS3, Oracle, Linux

Libraries/Frameworks: MVC, NumPy, Pandas, Flask, DataFrames

Database/Cloud: MySQL, SQL Server, GCP (Cloud Composer, Dataflow, Cloud Functions, Cloud Run, BigQuery, GCS, Pub/Sub), Azure Databricks, AWS S3, Kafka, Airflow, ADF

Software/Tools: Visual Studio Code, Git, MySQL Workbench, Jupyter Notebook, Docker, Jenkins, Tableau

PROJECTS

Macy’s 09/2022 – 01/2023

●Developed SQL stored procedures in BigQuery to load data from upstream systems such as SAP, creating Functional Specification Documents (FSD) and Technical Specification Documents (TSD), reducing data integration errors by 40%.

●Created DAGs in Cloud Composer to export data from GCS to BigQuery, supporting both delta and full push data, which improved data synchronization by 50% and reduced data transfer errors by 35%.
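The actual DAGs would use Cloud Composer operators (e.g. a GCS-to-BigQuery transfer), but the delta-versus-full distinction the bullet mentions can be sketched with plain Python. Field names and dates here are illustrative assumptions, not from the project.

```python
from datetime import date

def rows_to_load(rows, last_load_date, mode="delta"):
    """Pick which extracted rows go into the BigQuery load.

    mode="full" reloads everything (a full push); mode="delta" keeps only
    rows whose `updated` date is after the previous successful load.
    """
    if mode == "full":
        return list(rows)
    return [r for r in rows if r["updated"] > last_load_date]

rows = [
    {"id": 1, "updated": date(2022, 10, 1)},
    {"id": 2, "updated": date(2022, 10, 9)},
]
delta = rows_to_load(rows, last_load_date=date(2022, 10, 5))
```

In a Composer DAG, the `last_load_date` would typically come from the previous run's watermark (e.g. a metadata table or the DAG's logical date), so each run picks up exactly the rows added since the last one.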

●Resolved JIRA stories within deadlines, maintained the code repository in GitHub, and demonstrated completed code to clients, achieving a 95% approval rate. Enhanced code based on client feedback, reducing change requests by 40%.

●Created Azure SQL databases, monitored and restored them, and migrated Microsoft SQL Server data to Azure SQL. This reduced database downtime by 50% and improved data recovery times by 40%.

●Extensive experience with NoSQL databases like Cosmos DB and relational databases such as MySQL and Postgres, showcasing versatility in handling various data management solutions and addressing a wide range of business needs.

World Market 05/2022 – 08/2022

●Developed APIs in Azure App Services using Docker containers, ensuring scalability and reliability. Tested the APIs extensively in Postman to validate functionality and performance before deployment.

●Established Azure DevOps pipelines to automate Docker image deployments across different environments, achieving a smoother, more consistent process and reducing deployment errors through automated CI/CD practices.

●Proficient in developing ETL processes with Azure Data Factory and SSIS, integrating large data volumes from various sources while maintaining high data quality and accuracy. Achieved a 40% improvement in data integration efficiency.

Tailored Brands 03/2022 – 05/2022

●Developed batch jobs in Dataflow and Airflow, and created streaming and batch jobs by building new templates in Cloud Dataflow. Participated in requirement-gathering calls and provided inputs as well.

●Skilled in advanced SQL techniques, implementing Slowly Changing Dimensions (SCD) for efficient data management and historical tracking. This led to a 30% increase in data accuracy and improved historical trend analysis.
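The SCD work above would normally live in SQL (e.g. a `MERGE` statement); as a self-contained illustration, the same Type 2 logic, closing out changed rows and appending new current versions, can be sketched on a list-of-dicts dimension table. Keys, attributes, and dates are hypothetical.

```python
from datetime import date

def scd2_merge(dim, incoming, load_date):
    """Minimal SCD Type 2 merge on a list-of-dicts dimension table.

    A current row whose tracked attribute changed is closed out
    (is_current cleared, end_date set) and a new current version is
    appended; unchanged rows pass through untouched.
    """
    latest = {r["key"]: r["value"] for r in incoming}
    out = []
    for row in dim:
        changed = (row["is_current"] and row["key"] in latest
                   and latest[row["key"]] != row["value"])
        if changed:
            out.append({**row, "is_current": False, "end_date": load_date})
        else:
            out.append(row)
    current = {r["key"]: r["value"] for r in dim if r["is_current"]}
    for key, value in latest.items():
        if current.get(key) != value:  # new key or changed value
            out.append({"key": key, "value": value,
                        "start_date": load_date, "end_date": None,
                        "is_current": True})
    return out
```

The historical-tracking benefit mentioned above comes from never overwriting: every prior value survives as a closed row with its validity window intact.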

BlueStarFamilies 01/2022 - 03/2022

●Deployed APIs by setting up Pub/Sub and Kafka topics, facilitating seamless communication and event-driven workflows in the GCP ecosystem. This resulted in a 35% reduction in message delivery latency and improved system scalability.

●Developed RESTful APIs in Cloud Run on Google Cloud Platform (GCP) using Docker containers, enabling scalable and efficient deployments. This led to a 40% improvement in deployment speed and increased system elasticity.

●Developed APIs with Salesforce Marketing Cloud and deployed them to Cloud Functions on Google Cloud Platform, harnessing serverless architecture for streamlined execution and improved scalability.

Woolworths 07/2016 – 01/2022

●Expert in Google Cloud Platform services, including Dataproc, Cloud Storage (GCS), Cloud Functions, and BigQuery, with a focus on data analytics and processing. This expertise contributed to a 50% boost in data processing speed.

●Skilled in optimizing SQL scripts with PySpark SQL to improve performance and efficiency, especially when analyzing large datasets, resulting in faster query execution and reduced processing times.

●Proficient in data integration between Google Cloud Platform (GCP) and Azure using Azure Data Factory, enabling smooth data movement across cloud platforms. This ensured data consistency and improved cross-platform compatibility.

●Designed and implemented data pipelines using Google Cloud Pub/Sub and Apache Kafka for real-time data streaming and processing. This setup improved data throughput by 40% and reduced data processing latency by 30%.
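The decoupling that Pub/Sub and Kafka provide in the pipeline above can be illustrated with a stdlib queue standing in for a topic: producers publish without knowing who consumes, and subscribers pull at their own pace. This is only an in-process analogy, not the actual Pub/Sub or Kafka client API.

```python
import queue

def publish(topic, message):
    """Stand-in for a Pub/Sub/Kafka publish: enqueue onto an in-process topic."""
    topic.put(message)

def drain(topic):
    """Stand-in for a subscriber pulling everything currently on the topic."""
    out = []
    while not topic.empty():
        out.append(topic.get())
    return out

clicks = queue.Queue()  # plays the role of a topic
publish(clicks, {"user": "u1", "page": "/home"})
publish(clicks, {"user": "u2", "page": "/cart"})
events = drain(clicks)
```

With a real broker, the queue is durable and networked, which is what lets producers and consumers scale and fail independently, the source of the throughput and latency gains cited above.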

●Skilled in configuring Google Cloud Platform (GCP) services with Cloud Shell SDK, enabling efficient setup and management of Dataproc, Cloud Storage, and BigQuery. This streamlined resource allocation and improved project management.

●Designed and led the implementation of advanced analytical models in a Hadoop Cluster for large datasets, collaborating with the Data Science team. This improved data processing efficiency by 50% and accelerated model development by 30%.

●Extensive experience building and monitoring ELT pipelines with Data Fusion, establishing CI/CD pipelines using GitLab for continuous integration, and implementing microservices and Kafka stack for high-priority projects, reducing deployment time by 40%.

●Downloaded BigQuery data into pandas DataFrames for advanced ETL operations, enabling more complex data transformations and analyses. This approach streamlined data manipulation and improved processing capabilities.
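A sketch of the BigQuery-to-pandas pattern: in the pipeline the frame would come from `bigquery.Client().query(sql).to_dataframe()`; here a small literal frame stands in so the transform runs without credentials. Column names and figures are illustrative.

```python
import pandas as pd

# In the pipeline the frame came from BigQuery, roughly:
#   df = bigquery.Client().query(sql).to_dataframe()
# A small literal frame stands in here so the transform runs standalone.
df = pd.DataFrame({
    "store": ["Austin", "Austin", "Plano"],
    "sales": [120.0, 80.0, 45.5],
})

# Example post-download transform: per-store totals and each store's
# share of overall sales, computed in pandas after the BigQuery pull.
totals = df.groupby("store", as_index=False)["sales"].sum()
totals["share"] = totals["sales"] / totals["sales"].sum()
```

Pulling the data down like this trades BigQuery's scale for pandas' flexibility, so it suits result sets small enough to fit in memory after SQL has done the heavy filtering and joining.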

●Optimized Hive queries by applying best practices and selecting optimal parameters, utilizing technologies like Hadoop, Python, and PySpark. This improved query execution speed and overall data processing efficiency.

●Proficient in leveraging cloud shells for diverse tasks and deploying services. Developed BigQuery authorized views for robust row-level security and seamless data sharing, enhancing team collaboration.

CERTIFICATIONS

Google Cloud Certified Professional Data Engineer

EDUCATION

Master of Science in Computer Science, University of Missouri-Kansas City, Missouri, USA (GPA: 3.7/4.0) May 2024


