
Data Engineer Azure

Location: San Diego, CA
Posted: May 30, 2024


Resume:

Profile

Innovative and results-driven Data Engineer with 3+ years of comprehensive experience in designing, developing, and maintaining large-scale data processing systems. Expertise in leveraging a wide array of data technologies and programming languages to build reliable and scalable data solutions. Proficient in cloud-based technologies, with a strong focus on optimizing data warehousing, ETL processes, and real-time data streaming services.

Skills

Languages & Frameworks: Python, R, Bash, Scala, SQL, NoSQL, Hadoop, Apache Spark, PySpark, Apache Kafka, NumPy, Pandas, Matplotlib, SciPy, Scrapy, Seaborn

Databases & Cloud: MySQL, MS SQL, PostgreSQL, MongoDB, AWS (EC2, S3, RDS, VPC, Glue, NiFi, DynamoDB, Redshift, Athena), GCP (BigQuery), Azure (Blob, VM, Azure Active Directory, Data Factory, Data Lake Gen 2), Databricks

Tools & Technologies: Power BI, Tableau, Jupyter Notebook, IBM SPSS, Apache NiFi, Data Mining, Data Warehousing, Snowflake, Star Schema, ETL, GitHub, Scrum, Waterfall Model, Terraform

Professional Experience

Data Engineer, Wayfair 08/2023 – present Remote, USA

•Implemented Apache Kafka to handle real-time data streams, enhancing data ingestion efficiency and scalability. Reduced data latency by 70% through effective message queuing and processing.

•Engineered ETL pipelines using Python and Apache Airflow, ensuring seamless data extraction, transformation, and loading processes. Enhanced data quality and consistency while optimizing workflow automation.

•Managed SQL (MySQL) and NoSQL (Cassandra) databases, optimizing them for performance and scalability. Implemented indexing strategies and schema optimizations for efficient data storage and retrieval.

•Implemented machine learning models, such as regression and classification algorithms, integrated with Apache Flink for real-time data analysis, enhancing predictive capabilities and adaptability over time.

•Utilized Apache Flink for real-time data streaming, processing approximately 500,000 records per hour. Significantly reduced data latency, enabling timely insights and decision-making for stakeholders.

•Developed interactive dashboards and reports using Power BI and MS Excel, providing stakeholders with intuitive insights into key metrics and trends. Enabled data-driven decision-making and cross-functional collaboration.

Data Engineer, TATA Consultancy Services Pvt. Ltd. 10/2018 – 12/2021 Hyderabad, India

•Orchestrated ETL operations using Azure Data Factory and Azure Databricks, processing over 10TB of data monthly from various sources to Azure storage services including Azure Data Lake, Azure SQL, and Azure Data Warehouse.

•Successfully implemented and managed Azure Data Lake platforms, handling the extraction and integration of data from 15+ upstream systems, resulting in a 20% increase in data retrieval efficiency.

•Engineered and maintained 30+ pipelines in Azure Data Factory, enhancing the ETL process for diverse data sources like Azure SQL and Blob Storage, leading to a 25% reduction in processing time.

•Developed and optimized 20+ Spark applications using Scala and Spark-SQL, facilitating a 30% improvement in data processing speed and efficiency for complex data sets.

•Integrated data from NoSQL (MongoDB) and SQL (MySQL) databases, using Debezium for real-time data streaming into Kafka and handling over 5TB of streaming data per month.

•Employed Airflow for batch processing, managing data transfers of approximately 8TB per month into Azure Data Lake Storage, streamlining data management and analytical processes.

•Implemented advanced partitioning and bucketing techniques in Hive, leading to a 10% improvement in data query performance and organization.

•Enhanced data availability and usability by 30% through Hive query refinement, alongside implementing diverse HDFS file formats and compression techniques for a 15% increase in storage efficiency.

Education

Master of Science, Oklahoma State University 01/2022 – 12/2023 Stillwater, USA Computer Science

Bachelor of Engineering, Anna University 08/2014 – 05/2018 Chennai, India Civil Engineering

Venkata Ragavendra Vavilthota Data Engineer

*******@**********.*** 945-216-2477 San Diego, California LinkedIn


