Data Engineer

Location:
Fords, NJ
Posted:
May 03, 2024

Resume:

VENUGOPAL LAKAVATH

Data Engineer

Kansas City, Missouri 64111 | 816-***-****

*******************@*****.*** https://linkedin.com/in/venugopal-lakavath-02b1a8191

Dynamic and experienced SEO Data Engineer with more than three years of experience in cloud technologies, ETL pipelines, and data analysis. Proficient in Snowflake, Azure, AWS, Python, Java, and SQL. Seeking to apply advanced data engineering skills at DEPT, a forward-thinking organization that values innovation and a unique blend of tech and marketing.

Work Experience

Snowflake Developer/Data Engineer Oct 2020 - Jul 2022

Westfield Specialty Hyderabad

● Orchestrated an Insurance domain project, constructing pipelines to ingest data from an AWS S3 bucket into Snowflake and aligning the data with business requirements (a minimal ingestion sketch follows this list)

● Implemented advanced data validation checks, leading to a 15% improvement in data accuracy

● Collaborated cross-functionally to optimize ETL processes, resulting in a 20% reduction in data processing time

● Designed and managed Azure Data Factory pipelines, achieving a 40% increase in data processing efficiency

● Conducted in-depth data analysis and resolved data validation issues, reducing discrepancies by 20%

● Leveraged Azure Databricks notebooks for agile data transformation, improving transformation speed by 50%

● Collaborated with the data governance team to ensure compliance with industry standards
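
The first bullet above describes pipelines that land S3 data in Snowflake and validate it against business requirements. Below is a minimal sketch of that pattern using the Snowflake Python connector; the stage (S3_CLAIMS_STAGE), table (RAW.CLAIMS), warehouse and database names, and environment-variable credentials are hypothetical placeholders, not details taken from this resume.

```python
# Minimal sketch: copy claims files from an S3 external stage into Snowflake
# and run a basic row-count validation. All object names are placeholders.
import os
import snowflake.connector


def load_claims_from_s3() -> None:
    conn = snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        warehouse="ETL_WH",
        database="INSURANCE_DB",
        schema="RAW",
    )
    try:
        cur = conn.cursor()
        # COPY INTO pulls files from an external stage that points at the S3 bucket.
        cur.execute("""
            COPY INTO RAW.CLAIMS
            FROM @S3_CLAIMS_STAGE
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
            ON_ERROR = 'ABORT_STATEMENT'
        """)
        # Simple validation check: fail loudly if the load produced no rows.
        cur.execute("SELECT COUNT(*) FROM RAW.CLAIMS")
        row_count = cur.fetchone()[0]
        if row_count == 0:
            raise ValueError("Validation failed: RAW.CLAIMS is empty after load")
        print(f"Loaded {row_count} rows into RAW.CLAIMS")
    finally:
        conn.close()


if __name__ == "__main__":
    load_claims_from_s3()
```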

Data Engineer May 2019 - Sep 2020

Wells Fargo Hyderabad

● Orchestrated the complete SDLC of various projects, encompassing requirement analysis, design, development, testing, deployment, and production support, ensuring alignment with organizational goals and best practices

● Leveraged Big Data technologies such as Hadoop, MapReduce, HBase, Hive, and Sqoop, utilizing Python alongside Spark for processing and analyzing multi-source data

● Imported and transformed structured data using Sqoop and Spark, with Python scripts for automated handling and storage into HDFS in CSV format

● Implemented advanced data management techniques such as partitioning and bucketing, and developed Hive queries for data processing and cube generation for visualization

● Implemented data ingestion processes from various sources into Redshift, including Amazon S3 and RDS, ensuring consistent and reliable data integration

● Designed and developed a data lake in the AWS environment, synchronizing data across multiple platforms and ensuring data integrity

● Utilized Spark Streaming APIs and Kafka for real-time data analysis and persistence to AWS S3 or Cassandra, demonstrating real-time analytics capabilities (a minimal streaming sketch follows the environment line below)

● Connected Snowflake and Cassandra databases to Amazon EMR, analyzed data using CQL, and loaded data using the DataStax Spark-Cassandra connector, demonstrating expertise in complex data storage solutions

● Configured the Elastic Stack (ELK) for log analytics and full-text search, integrating it with AWS Lambda and CloudWatch to enhance application monitoring

● Maintained a data warehouse ecosystem on AWS, utilizing Redshift for storing and processing large-scale datasets for analytics and reporting

● Developed and deployed Kafka producer and consumer applications on a Kafka cluster, managing configuration through ZooKeeper

● Utilized Star and Snowflake schemas for data modeling, with hands-on experience in OLTP/OLAP systems and data modeling using Erwin

● Developed Spark, Spark SQL/Streaming, and Scala scripts for efficient data processing, utilizing features such as SparkContext, DataFrames, pair RDDs, and Spark on YARN

● Automated and scheduled Informatica jobs using UNIX (Korn) shell scripting and configured Informatica sessions

● Used Python to develop pre-processing jobs and flatten JSON documents to flat files, and converted distributed data into organized formats with Scala's DataFrame API

● Created comprehensive reports using Tableau, collaborating with the BI team and utilizing Python for data extraction and transformation to meet specific business requirements

Environment: Python, Hadoop YARN, Spark Core, Unix, Spark Streaming, Spark SQL, HBase, Scala, Kafka, Hive, Sqoop, AWS, S3, AWS Lambda, Glue, CloudWatch, IAM, OLTP/OLAP, Cassandra, Tableau, MySQL, Linux, Shell scripting, Agile Methodologies.
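
The Wells Fargo entry above mentions Spark Streaming with Kafka persisting events to AWS S3. The sketch below illustrates that pattern with PySpark Structured Streaming, assuming a hypothetical broker (broker1:9092), topic (transactions), and bucket (example-bucket), and assuming the Spark Kafka connector package is on the classpath; none of these names are taken from the resume.

```python
# Minimal sketch: consume events from Kafka with Spark Structured Streaming and
# persist them to S3 as Parquet. Broker, topic, and bucket names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-to-s3-stream")
    .getOrCreate()
)

# Read a stream of raw events from a Kafka topic.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "transactions")
    .option("startingOffsets", "latest")
    .load()
    # Kafka delivers key/value as binary; cast the payload to a string column.
    .select(col("value").cast("string").alias("payload"), col("timestamp"))
)

# Persist micro-batches to S3 as Parquet; the checkpoint keeps the stream restartable.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/streams/transactions/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/transactions/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```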

Core Skills

Languages: Python, SQL, Java
Cloud Platforms: Google Cloud Platform (GCP), Azure, AWS
Data Warehousing: Snowflake, BigQuery
ETL Tools: Azure Data Factory, Azure Databricks
Database Management: MySQL, PostgreSQL
NoSQL Databases: MongoDB
Scripting: Bash, Shell Scripting
CI/CD Tools: Jenkins
Big Data Technologies: Hadoop, Spark
DevOps and Infrastructure as Code (IaC): Docker, Kubernetes, Terraform, CloudFormation
Data Pipeline Orchestration: Apache NiFi, Apache Airflow
Data Governance and Quality: Data Governance and Compliance, Data Security and Encryption, Data Quality Frameworks, Quality Control
Other: Machine Learning and Data Science Concepts, Agile and Scrum Methodologies, Looker Studio (hands-on experience), JavaScript (basic understanding for Dataform management), Understanding of Data
Soft Skills: Communication, Curiosity, Humility, Love of Learning

Education

University of Missouri-Kansas City Aug 2022 - Dec 2023

Master of Computer Science Kansas City, MO

Keshav Memorial Institute of Technology Jul 2016 - Sep 2020

Bachelor of Computer Science Hyderabad, Telangana

Languages

English, Hindi, Telugu


