
Data Engineer

Location:
Cedar Park, TX
Posted:
January 08, 2025


Resume:

VANI VEERANNA

*******@*****.*** 512-***-****

SUMMARY

Skilled Data Engineer with 7+ years of experience in data modeling, data analysis, and building data pipelines to process large-scale data with Big Data technologies, enabling organizations to extract value from their data.

Effective team player who collaborates with data scientists, analysts, and other stakeholders to understand their data needs and translate them into scalable solutions.

Extensive experience in Hadoop-based development using components such as Apache Spark, PySpark, MapReduce, HDFS, Sqoop, Hive, Impala, and Oozie.

Profound experience in data ingestion and ETL (Extract, Transform, Load) processing, loading data into the Hive data warehouse.

Strong understanding of Hadoop and Spark performance and optimization techniques; worked extensively with PySpark.

Proficient in ingesting data from various sources into HDFS using Sqoop, performing transformations with MapReduce and Spark, and loading the transformed data back into HDFS.

Extensive Hadoop experience in data storage, query writing, and data processing and analysis.

Experience in creating and loading data into Hive tables with appropriate static and dynamic partitions to improve query efficiency.

Managed Sqoop jobs with incremental load to populate Hive external tables.
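
For illustration, a minimal sketch of this incremental-load pattern, launched here from Python; the connection string, table, check column, and paths are hypothetical, not the production job:

    import subprocess

    # Sqoop incremental append: import only rows with ORDER_ID greater than
    # the last recorded value into the HDFS directory backing a Hive
    # external table.
    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",
        "--username", "etl_user",
        "--password-file", "/user/etl/.pw",
        "--table", "ORDERS",
        "--incremental", "append",
        "--check-column", "ORDER_ID",
        "--last-value", "1000000",
        "--target-dir", "/data/raw/orders",
    ], check=True)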

Experience in partitioning and bucketing in Hive, and in designing both managed and external tables to optimize performance.
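
For illustration, a minimal sketch of a partitioned external Hive table created and loaded through PySpark; database, table, and column names are hypothetical, and bucketing would add a CLUSTERED BY ... INTO n BUCKETS clause to the DDL:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # External table partitioned by date; data lives outside the warehouse dir
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.orders_ext (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DECIMAL(12,2)
        )
        PARTITIONED BY (order_dt STRING)
        STORED AS ORC
        LOCATION '/data/sales/orders_ext'
    """)

    # Dynamic-partition insert: Hive derives order_dt from the data itself
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE sales_db.orders_ext PARTITION (order_dt)
        SELECT order_id, customer_id, amount, order_dt
        FROM staging.orders_raw
    """)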

Solid understanding of SQL and experience developing complex queries for relational databases.

Worked with the Hive data warehouse infrastructure: creating schemas/tables, distributing data through partitioning and bucketing, and writing and optimizing HQL queries.

Experience with different file formats such as Avro, Parquet, ORC, CSV and XML.

Expertise in creating, debugging, scheduling, and monitoring jobs for batch cycles using the Autosys and D-series tools.

Expert in designing ETL data flows that extract data from RDBMS sources for data migrations.

Created and maintained CI/CD pipelines and applied automation to environments and applications using tools such as Bitbucket, Git, Jenkins, and Ansible.

Created Oozie scripts and set up workflows using the Apache Oozie workflow engine to manage and schedule Hadoop jobs.

Produced compelling data visualization reports in Tableau to convey the story within the data.

Hands-on experience with Amazon Web Services (AWS) S3, Glue, EMR, Lambda, and Microsoft Azure Data Factory.

Proficient with the Databricks platform.

Expert in performing data analysis using Hive and providing Ad-hoc reports to Lines of Business (LOBs).

Adept with Python, R, and Unix Shell scripting.

Hands-on experience with NoSQL (Document DB) and Google BigQuery databases.

Created user stories/tasks in Jira and collaborated with other members of the Data Engineering team on the design and implementation of optimal data pipelines.

Proficient in creating and maintaining technical specifications, diagrams and knowledge documentation as needed to support the efforts.

TECHNICAL SKILLS

Big Data tools

Hadoop, Sqoop, Oozie, Spark/PySpark, Hive, SQL, Impala, NoSQL

Cloud platform

Amazon Web Services (AWS), Microsoft Azure Data Factory (ADF)

Databases

Oracle, Teradata, SQL Server, MongoDB

Programming/Scripting Languages

Shell, Python, R

Business Intelligence tool

Tableau

Web Analytics Tool

Google Analytics

Tools & Utilities

Autosys, Ansible, D-series, Airflow, Bitbucket, Jenkins, Rally, Jira, Jupyter Notebook, Databricks

EDUCATION & PROFESSIONAL CERTIFICATION(S)

• Master of Computer Applications (MCA), Visvesvaraya Technological University, Belgaum, Karnataka, India.

• Data Analytics Certificate Program, University of Texas at Austin, TX.

EXPERIENCE

Technology Lead Data Engineer, Infosys (client: Bank of America), Dallas, TX, Aug 2021 - May 2024

Designed and developed data pipelines using MapReduce/PySpark to ingest various file types, such as CSV and structured and semi-structured XML, into Hadoop.
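
For illustration, a minimal sketch of this ingestion pattern; the paths, schemas, and use of the spark-xml package are assumptions, not the production pipeline:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Structured CSV: read with a header row, land as Parquet on HDFS
    csv_df = spark.read.option("header", "true").csv("hdfs:///landing/trades/*.csv")
    csv_df.write.mode("append").parquet("hdfs:///raw/trades")

    # Semi-structured XML via the spark-xml package, if installed on the cluster
    xml_df = (spark.read.format("xml")
              .option("rowTag", "record")
              .load("hdfs:///landing/positions/*.xml"))
    xml_df.write.mode("append").saveAsTable("raw_db.positions")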

Developed robust SQL scripts to run complex queries based on business requirements, and loaded data into Hive tables.

Extracted data from Hive tables into flat files and exported the files to external clients via SFTP.

Mentored eight new members on the team to develop & deploy data pipeline applications.

Converted non-partitioned tables to partitioned tables in Hive and deployed the application to the production environment.
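
For illustration, a minimal sketch of such a conversion run through Spark SQL; table and column names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # New table with the same columns, now partitioned by event date
    spark.sql("""
        CREATE TABLE IF NOT EXISTS app_db.events_part (
            event_id BIGINT,
            payload  STRING
        )
        PARTITIONED BY (event_dt STRING)
        STORED AS ORC
    """)

    # Backfill from the flat table; the partition column goes last in the SELECT
    spark.sql("""
        INSERT OVERWRITE TABLE app_db.events_part PARTITION (event_dt)
        SELECT event_id, payload, event_dt
        FROM app_db.events_flat
    """)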

Performed data and application migration validations from RHEL6 to RHEL7.

Verified successful completion of data processing and transfers within the required time constraints.

Validated, monitored, and resolved issues in non-production and production environments.

Designed and developed ten CI/CD deployment pipelines using Jenkins to deploy the packaged data pipelines.

Developed Autosys JIL tasks to schedule ingestion jobs.

Used Ansible to deploy code in lower environments.

Provided operations support and enhanced performance of the data pipeline applications.

Performed data analysis using Hive and provided Ad hoc reports to LOBs.

Created temp tables, control flow statements (IF/ELSE) and nested CASE statements in SQL transformations.

Worked with team members to address technical problems.

Created and maintained technical specifications, diagrams and knowledge documentation as needed to support the efforts.

Created and designed interactive dashboards and visualizations using Tableau.

Analyzed and interpreted complex data sets to identify trends, patterns, and insights.

Created Tableau data sources and data extracts, improving data accuracy by 15% and reducing data processing time by 20%.

Trained end-users on how to use Tableau dashboards and visualizations effectively.

Stayed up to date with Tableau software updates and new features to continuously improve data visualization capabilities.

Developed and maintained Tableau documentation, including user guides and technical specifications.

Worked with IT teams to ensure the Tableau server and data sources were properly configured and maintained.

Provided Tableau expertise and support to other teams and departments as needed.

Senior Data Engineer, Visa Inc., Austin, TX, Aug 2016 - Jul 2021

As a member of the Data Platform team, implemented data pipelines using Sqoop for products such as Visa Checkout and Visa Direct.

Extracted data from various sources, such as Oracle and CSV files, using Sqoop import to ingest into Hive tables.

Designed and implemented ETL pipelines using Databricks, created workflows, and scheduled data pipeline jobs for CSV file ingestion.
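
For illustration, a minimal Databricks-style ingestion sketch; the mount path and table name are hypothetical, and Delta is assumed as the table format since it is the Databricks default (on Databricks, spark is provided by the runtime):

    # Read the landed CSV files, then append them to a managed Delta table
    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("/mnt/landing/payments/*.csv"))

    (raw.write
        .format("delta")
        .mode("append")
        .saveAsTable("bronze.payments"))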

Created and maintained optimal data pipeline designs for applications.

Collaborated with DevOps team to perform production deployments and provided production support to fix the defects.

Used Hive extensively to perform transformations and pre-aggregations for developing baseline data for reports.

Implemented Oozie workflow engine to run Hive jobs.

Migrated Hive jobs to PySpark jobs using the D-series automation tool.
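
For illustration, the kind of translation such a migration involves: a Hive aggregation re-expressed with the DataFrame API. Table and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Equivalent of: SELECT order_dt, SUM(amount), COUNT(DISTINCT customer_id)
    #                FROM sales_db.orders GROUP BY order_dt
    daily = (spark.table("sales_db.orders")
             .groupBy("order_dt")
             .agg(F.sum("amount").alias("total_amount"),
                  F.countDistinct("customer_id").alias("distinct_buyers")))

    daily.write.mode("overwrite").saveAsTable("mart_db.daily_sales")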

Loaded data into Hive tables with static and dynamic partitions to improve efficiency.

Worked with various Hadoop file formats such as Parquet, Avro, and CSV.

Used various joins, subqueries, nested queries, and window functions in SQL.
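
For illustration, a hypothetical window-function query of this kind, run through Spark SQL; the schema is invented for the example:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Latest transaction per account via ROW_NUMBER() over a window
    spark.sql("""
        SELECT account_id, txn_ts, amount
        FROM (
            SELECT account_id, txn_ts, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY account_id ORDER BY txn_ts DESC) AS rn
            FROM txn_db.transactions
        ) ranked
        WHERE rn = 1
    """).show()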

Created views to reduce execution time for complex queries.

Configured Encryption Framework in Production & non-production environments, and tested encryption and decryption capabilities for Visa Checkout application.

Designed, developed, tested, and maintained Tableau functional reports based on business requirements.

Designed and developed Tableau workbooks to support business objectives.

Developed Tableau reports and dashboards to meet customer requirements.

Developed Tableau dashboards and reports to support business decisions, resulting in a 15% increase in revenue.

Developed Tableau dashboards to monitor KPIs, resulting in a 20% increase in team efficiency.

Provided production support to Tableau users and wrote custom SQL to support business requirements.

Gathered and analyzed business requirements, groomed stories and features at an early stage, and built solutions using various Big Data technologies to address the business capabilities.

Served as Program Manager for the project; responsible for creating and estimating user stories, assigning stories to a 10-member team, and following up on daily status.

Performed functionality, integration, system, regression, ETL, and report testing.

Test Analyst/Test Lead, Perot Systems, Capgemini, Oracle, and Wipro, Bangalore, India, Jul 2004 - Jun 2014

Planned, deployed, and managed the testing effort for each engagement.

Defined the scope of testing within the context of each release/delivery.

Managed resources for testing.

Applied appropriate test measurements and metrics to the product and the testing team.

Performed defect detection using modern structured quality control techniques and tools.

Performed data warehouse and report testing.

Worked on various domains such as Investment Banking, Retail, Telecom, Trade Compliance, and Healthcare.

Utilized Software Test Development Life Cycle and Test Methodologies.

Performed functionality, integration, system, and regression testing.

Prepared Test Case Specifications and Test Reports.

Actively participated in reviews and meetings.

Identified and tracked bug reports using bug-tracking tools.

Applied solid project and people management skills.

Participated in and used Agile software development methodologies and practices such as Scrum and Kanban.

Performed Risk Analysis, Proofs of Concepts (PoCs) and Software Test Estimations.

Worked in London, UK, for the client UBS for six months, from March 2006 through September 2006.


