Data Engineer / Data Analyst

Location:
Houston, TX
Posted:
March 15, 2024

Kavya A

Data Engineer

Email ID: ad4cxe@r.postjobfree.com

Mobile #: 774-***-****

LinkedIn ID: https://www.linkedin.com/in/kavyaa9/

PROFESSIONAL SUMMARY

Highly dedicated and accomplished Data Engineer with around 6 years of IT industry experience across a broad range of technologies, tools, and databases.

Have good experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, and Spark).

Experience managing databases and Azure Data Platform services (Azure Data Lake Storage (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL Data Warehouse, HDInsight/Databricks, NoSQL databases) as well as SQL Server, Oracle, and enterprise data warehouses; built multiple data lakes.

Good working knowledge on Snowflake and Teradata databases.

Experience in Database Design and development with Business Intelligence using SQL Server, Integration Services (SSIS), DTS Packages, SQL Server Analysis Services (SSAS), DAX, OLAP Cubes, Star Schema and Snowflake Schema.

Excellent Programming skills at a higher level of abstraction using Scala and Python.

Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark MLlib, Spark Streaming, and Spark SQL.
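
For illustration, a minimal PySpark sketch of the kind of RDD transformation work referenced above (hypothetical HDFS paths and record layout, not taken from any specific project):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-transformations").getOrCreate()
    sc = spark.sparkContext

    # Read delimited text from HDFS, parse it, and aggregate with pair-RDD operations.
    lines = sc.textFile("hdfs:///data/raw/events.txt")        # hypothetical input path
    pairs = (lines
             .map(lambda line: line.split(","))               # split CSV-style rows
             .filter(lambda fields: len(fields) >= 2)         # drop malformed rows
             .map(lambda fields: (fields[0], 1)))             # key by the first column
    counts = pairs.reduceByKey(lambda a, b: a + b)            # count events per key
    counts.saveAsTextFile("hdfs:///data/out/event_counts")    # hypothetical output path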

Strong experience and knowledge of real time data analytics using Spark Streaming, Kafka and Flume.

Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.

Worked on reading multiple data formats on HDFS using Scala.

Experienced in using Spark to improve the performance and optimization of existing algorithms in Hadoop, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

Knowledge of unifying data platforms using Kafka producers/consumers and implementing pre-processing with Storm topologies.

Experience in data processing tasks such as collecting, aggregating, and moving data from various sources using Apache Kafka.

Developed ETL pipelines into and out of the data warehouse using a combination of Python and SnowSQL.
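
A minimal sketch of such a load step, assuming the snowflake-connector-python package and a hypothetical ORDERS_RAW table and connection settings; a SnowSQL CLI variant would issue the same PUT/COPY statements:

    import snowflake.connector

    # Hypothetical connection settings; in practice these come from a config or secret store.
    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="****",
        warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
    )
    cur = conn.cursor()
    try:
        # Stage a local CSV to the table stage, then load it into the target table.
        cur.execute("PUT file:///tmp/orders.csv @%ORDERS_RAW")
        cur.execute("COPY INTO ORDERS_RAW FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
    finally:
        cur.close()
        conn.close()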

Substantial experience integrating Spark 3.0 with Kafka 2.4.

Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.

Experienced in building automated regression scripts for validation of ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB using Python.

Hands-on use of Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.

Expertise in Python and Scala, including user-defined functions (UDFs) for Hive and Pig written in Python.

Good experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.

Experience in Implementing Continuous Delivery pipeline with Maven, Ant, Jenkins, and AWS.

Configured, supported and maintained all network, firewall, storage, load balancers, operating systems, and software in AWS EC2.

Experience with all stages of the SDLC and Agile Development model right from the requirement gathering to Deployment and production support.

Excellent understanding of best practices of Enterprise Data Warehouse and involved in Full life cycle development of Data Warehousing.

Hands on experience working on NoSQL databases including HBase and its integration with Hadoop cluster.

Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.

Good knowledge of stored procedures, functions, etc. using SQL and PL/SQL.

Expertise in using Version Control systems like GIT.

Optimized and tuned ETL processes & SQL Queries for better performance.

Performed complex data analysis and provided critical reports to support various departments.

Extensive Python scripting experience for Scheduling and Process Automation.

Experience with Unix/Linux systems with scripting experience and building data pipelines.

Responsible for migrating applications running on-premises onto the Azure cloud.

Experience with cloud databases and data warehouses (Azure SQL and Redshift/RDS).

TOOLS AND TECHNOLOGIES:

Hadoop Ecosystem

MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Scala, StreamSets, Neo4j, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11

BI Tools

Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports

RDBMS

Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, MS Access, MySQL, DB2, Hive, Microsoft Azure SQL Database

Operating Systems

Microsoft Windows Vista/7/8/10, UNIX, and Linux

Methodologies

Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model

Programming Languages

SQL, PL/SQL, UNIX shell scripting, Perl, AWK, SED, Python

Other Tools

TOAD, BTEQ, MS Office suite (Word, Excel, PowerPoint, and Outlook).

Cloud Management

AWS (Amazon Web Services), MS Azure

PROFESSIONAL EXPERIENCE:

Client: LexisNexis, TN Mar 2023 – Present

Role: Teradata Engineer

Key Responsibilities:

As a Data Engineer, maintained data pipelines, data integration, ETL processes, and data management delivery within a global data lake and data warehouse environment.

Installed Kafka on the Hadoop cluster and configured producers and consumers to stream source data (e.g., popular hashtags) into HDFS.

Used Agile (SCRUM) methodologies for Software Development.

Worked in Azure environment for development and deployment of Custom Hadoop Applications.

Provided 24x7 support on IBM DataStage, IBM Cognos, Longview loading processes and Jobs.

Designed and built a Data Discovery Platform for a large system integrator using Azure HDInsight components.

Created a dashboard to monitor and manage data flow using NiFi.

Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing. Created Lambda jobs and configured roles using the AWS CLI.
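
As a sketch, an event-driven Lambda handler of the kind described, assuming an S3 put trigger and using boto3 (bucket and key names come from the event; nothing here is specific to the actual project):

    import json
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Invoked by an S3 put event: read each new object and log its size.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            print(json.dumps({"bucket": bucket, "key": key, "bytes": len(body)}))
        return {"statusCode": 200}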

Used Azure Data Factory and Data Catalog to ingest and maintain data sources.

Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
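
A compact sketch of such a PySpark extraction/aggregation job, with hypothetical paths, columns, and formats:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-aggregation").getOrCreate()

    # Extract from two different file formats (hypothetical data lake paths).
    json_df = spark.read.json("adl://datalake/raw/usage_json/")
    csv_df = (spark.read.option("header", True)
              .csv("adl://datalake/raw/usage_csv/"))

    # Transform: align schemas and types, then aggregate usage per customer per day.
    cols = ["customer_id", "event_ts", "bytes_used"]
    usage = (json_df.select(*cols)
             .unionByName(csv_df.select(*cols))
             .withColumn("bytes_used", F.col("bytes_used").cast("long")))
    daily = (usage.withColumn("day", F.to_date("event_ts"))
             .groupBy("customer_id", "day")
             .agg(F.sum("bytes_used").alias("total_bytes")))

    daily.write.mode("overwrite").parquet("adl://datalake/curated/daily_usage/")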

Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.

Hands-on experience fetching live stream data from DB2 into HBase tables using Spark Streaming and Apache Kafka.

Good knowledge of creating data pipelines in Spark using Scala.

Experience in developing Spark Programs for Batch and Real-Time Processing. Developed Spark Streaming applications for Real Time Processing.

Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.

Performed Data Ingestion from multiple internal clients using Apache Kafka.

Worked on integrating Apache Kafka with Spark Streaming process to consume data from external REST APIs and run custom functions.

Developed Spark scripts by using Scala Shell commands as per the requirement.

Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.

Implemented real time system with Kafka and Zookeeper.

Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS.
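
One way to sketch this Kafka-to-HDFS flow is with Spark Structured Streaming (hypothetical broker, topic, and paths; requires the spark-sql-kafka connector package, and the DStream-based variant follows the same shape):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    # Subscribe to a Kafka topic and keep the message payload as a string column.
    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical broker
              .option("subscribe", "events")                       # hypothetical topic
              .load()
              .selectExpr("CAST(value AS STRING) AS payload"))

    # Persist micro-batches to HDFS with checkpointing for fault tolerance.
    query = (stream.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streaming/events/")
             .option("checkpointLocation", "hdfs:///checkpoints/events/")
             .start())
    query.awaitTermination()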

Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.

Selected and generated data into CSV files, stored them in AWS S3 using AWS EC2, and then structured and loaded the data into AWS Redshift.

Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, PostgreSQL, DataFrames, OpenShift, Talend, and pair RDDs.

Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
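
A small pandas sketch of that preprocessing/feature-engineering step (hypothetical file and columns):

    import pandas as pd

    df = pd.read_csv("usage_raw.csv", parse_dates=["event_ts"])   # hypothetical extract

    # Basic cleaning: drop duplicates and fill missing numeric values.
    df = df.drop_duplicates()
    df["bytes_used"] = df["bytes_used"].fillna(0)

    # Feature engineering: calendar features plus a per-customer rolling average.
    df["day_of_week"] = df["event_ts"].dt.dayofweek
    df["is_weekend"] = df["day_of_week"].isin([5, 6]).astype(int)
    df = df.sort_values(["customer_id", "event_ts"])
    df["bytes_7d_mean"] = (df.groupby("customer_id")["bytes_used"]
                             .transform(lambda s: s.rolling(7, min_periods=1).mean()))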

Generated reports on predictive analytics using Python and Tableau, including visualizations of model performance and prediction results.

Good knowledge on Spark components like Spark SQL, MLlib, Spark Streaming and GraphX.

Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.

Used Python to implement Data Processing pipelines.

Involved in big data requirement analysis and in developing and designing solutions for ETL and Business Intelligence platforms.

Worked on reading multiple data formats on HDFS using Python.

Implemented Spark using Python (PySpark) and Spark SQL for faster testing and processing of data.

Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with Azure Monitor.

Managed databases and Azure Data Platform services (Azure Data Lake Storage (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL Data Warehouse, HDInsight/Databricks, NoSQL databases), along with SQL Server, Oracle, and data warehouses; built multiple data lakes.

Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.

Improved the performance of DataStage jobs by tuning them, changing job designs, and revising SQL queries.

Built Azure Data Warehouse Table Data sets for Tableau Reports.

Developed customized classes for serialization and De-serialization in Hadoop.

Analyzed the SQL scripts and designed solutions to implement them using Python.

Developed Shell, Perl and Python scripts to automate and provide Control flow to Pig scripts.

Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.

Performed data profiling and transformation on the raw data using Pig and Python.

Created Hive external tables to stage data and then moved the data from staging to main tables.
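
A sketch of that staging pattern, here issued through Spark's Hive support rather than the Hive CLI (hypothetical database names, columns, and paths; assumes a partitioned curated.orders table already exists):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-staging")
             .enableHiveSupport()
             .getOrCreate())

    # External table over raw files landed in HDFS (data stays in place).
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_raw (
            order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 'hdfs:///data/staging/orders/'
    """)

    # Move the staged rows into the partitioned main table.
    spark.sql("""
        INSERT INTO TABLE curated.orders PARTITION (load_date = '2024-01-01')
        SELECT order_id, customer_id, amount, order_ts FROM staging.orders_raw
    """)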

Worked with ETL tools to migrate data from various OLTP and OLAP databases to the data mart.

Imported and exported data between MySQL and HDFS using Sqoop and managed data coming from different sources.

Used Reverse Engineering approach to redefine entities, relationships and attributes in the data model.

Implemented Kafka producers, created custom partitions, configured brokers, and implemented high-level consumers to build the data platform.

Automated data flow between nodes using Apache NiFi.

Environment: ETL, Kafka 1.0.1, Hadoop 3.0, HDFS, Agile, MS Azure, Apache NiFi, PySpark, Spark 2.3, SQL, Python, Erwin 9.8, CI/CD, Hive 2.3, NoSQL, HBase 1.2, Pig 0.17, OLTP, MySQL, Sqoop 1.4, OLAP.

Client: Wipro Pvt. Ltd, Pune India Dec 2020 – Aug 2022

Role: Data Engineer/Data Analyst

Key Responsibilities:

Worked with the analysis teams and management teams and supported them based on their requirements.

Generated PL/SQL scripts for data manipulation, validation, and materialized views for remote instances.

Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.

Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.

Developed live reports in a drill down mode to facilitate usability and enhance user interaction.

Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.

Wrote Python scripts to parse XML documents and load the data into a database.

Used Python to extract weekly information from XML files.
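
A minimal sketch of such an XML-parse-and-load script (hypothetical XML layout, and SQLite as a stand-in for the actual target database):

    import sqlite3                      # stand-in for the actual target database
    import xml.etree.ElementTree as ET

    def load_xml(xml_path: str, db_path: str) -> None:
        # Parse <record> elements and pull out a few hypothetical fields.
        root = ET.parse(xml_path).getroot()
        rows = [(rec.findtext("id"), rec.findtext("name"), rec.findtext("amount"))
                for rec in root.iter("record")]

        conn = sqlite3.connect(db_path)
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT, name TEXT, amount TEXT)")
            conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
        conn.close()

    if __name__ == "__main__":
        load_xml("weekly_extract.xml", "staging.db")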

Developed Python scripts to clean the raw data.

Developed Python scripts to transfer data via REST APIs and extract data from on-premises systems to AWS S3. Implemented a microservices-based cloud architecture using Spring Boot.
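
And a sketch of the on-premises-to-S3 transfer via a REST API, assuming the requests and boto3 libraries and hypothetical endpoint and bucket names:

    import json
    import boto3
    import requests

    API_URL = "https://example.internal/api/v1/records"   # hypothetical endpoint
    BUCKET = "onprem-extracts"                             # hypothetical bucket

    def extract_to_s3() -> None:
        # Pull the latest records from the on-premises API.
        response = requests.get(API_URL, timeout=30)
        response.raise_for_status()
        payload = response.json()

        # Land the raw extract in S3 for downstream Glue/Lambda processing.
        boto3.client("s3").put_object(
            Bucket=BUCKET,
            Key="extracts/records.json",
            Body=json.dumps(payload).encode("utf-8"),
        )

    if __name__ == "__main__":
        extract_to_s3()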

Worked on ingesting data through cleansing and transformation steps, leveraging AWS Lambda and AWS Glue.

Worked on AWS CLI to aggregate clean files in Amazon S3 and on Amazon EC2 Clusters to deploy files into Buckets.

Used AWS CLI with IAM roles to load data to Redshift cluster.

Responsible for in-depth data analysis and creation of data extract queries in both Netezza and Teradata databases.

Used Scala to write code for all Spark use cases.

Assigned names to each of the columns using the case class option in Scala.

Developed multiple Spark SQL jobs for data cleaning.

Developed Spark SQL to load tables into HDFS to run select queries on top.

Developed analytical component using Scala, Spark, and Spark Stream.

Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.

Designed a custom Spark REPL application to handle similar datasets. Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data between different sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including a write-back tool.

Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing. Created Lambda jobs and configured roles using the AWS CLI.

Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, and designed and developed POCs using Scala, Spark SQL, and MLlib libraries.

Worked on a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.

Responsible for the design and development of Python programs/scripts to prepare, transform, and harmonize data sets in preparation for modeling.

Extensive development in Netezza platform using PL/SQL and advanced SQLs.

Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell) and various reporting procedures.

Designed reports in SSRS to create, execute, and deliver tabular reports using shared and specified data sources; also debugged and deployed reports in SSRS.

Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.

Used Hive, Impala and Sqoop utilities and Oozie workflows for data extraction and data loading.

Development of routines to capture and report data quality issues and exceptional scenarios.

Creation of Data Mapping document and data flow diagrams.

Developed Linux Shell scripts by using Nzsql/Nzload utilities to load data from flat files to Netezza database.

Created dashboard-style reports using QlikView components such as list boxes, sliders, buttons, charts, and bookmarks.

Worked on QA of the data and on adding data sources, snapshots, and caching to the reports.

Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.

Environment: SAS, SQL, Teradata 13, Oracle 11c, PL/SQL, UNIX, XML, Python, AWS, SSRS, TSQL, Hive 2.1, Sqoop 1.2

Client: Data Matics Tech, Hyderabad India Sep 2019 – Nov 2020

Role: Data Analyst

Key Responsibilities:

Conducted sessions with the Business Analysts and Technical Analysts to gather the requirements.

Used SQL joins, aggregate functions, analytical functions, group by, order by clauses and interacted with DBA and developers for query optimization and tuning.

Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.

Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure services; knowledge of U-SQL.

Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.

Worked on data verification and validation to ensure that the data generated according to the requirements was appropriate and consistent.

Performed data analysis and data profiling using complex SQL on various source systems.

Utilized a diverse array of technologies and tools as needed, to deliver insights.

Developed complex PL/SQL procedures and packages using views and SQL joins.

Optimized the data environment to efficiently access data Marts and implemented efficient data extraction routines for the delivery of data.

Collected, analyzed, and interpreted complex data for reporting and performance trend analysis.

Extracted data from different sources performing Data Integrity and quality checks.

Developed documents and dashboards of predictions in MicroStrategy and presented them to the business intelligence team.

Used MS Excel and PowerPoint to process data, create reports, analyze metrics, implement verification procedures, and fulfill client requests for information.

Designed Ad-hoc reports as per business analyst, operation analyst, and project manager data requests.

Environment: DB2, PL/SQL, T-SQL, SAS, SQL Server, MS Access, MySQL, Crystal Reports

Client: Computer Science Corporation, India Aug 2018 – Aug 2019

Role: Data Analyst

Key Responsibilities:

Worked extensively in data analysis by querying in SQL and generating various PL/SQL objects.

Involved in creating new stored procedures and optimizing existing queries and stored procedures.

Worked on scripting some complex stored procedures using T-SQL in creating metrics for the data.

Worked on data verification and validation to ensure that the data generated according to the requirements was appropriate and consistent.

Analyzed and built proofs of concept to convert SAS reports into Tableau or use SAS datasets in Tableau.

Performed thorough data analysis for the purpose of overhauling the database using SQL Server.

Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP, and OLAP.

Used SQL Server Integration Services (SSIS) for extraction, transformation, and loading of data into the target system from multiple sources.

Gathered and documented the Audit trail and traceability of extracted information for data quality.

Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.

Involved in Data profiling and performed Data Analysis based on the requirements, which helped in catching many Sourcing Issues upfront.

Produced dashboard-style SSRS reports under report server projects and published the SSRS reports to the report server.

Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.

Developed the financing reporting requirements by analyzing the existing business objects reports.

Wrote multiple SQL queries to analyze the data and presented the results using Excel, Access, and Crystal reports.

Performed ad-hoc analyses as needed.

Environment: SQL, PL/SQL, T-SQL, SAS, Tableau 9.0, OLTP, OLAP, SSIS, SSRS, Excel, Crystal reports 14.0

Certification:

Certified Microsoft Azure Data Engineer Associate

Education:

Master of Science, Computer Information Science

New England College, Henniker, New Hampshire, USA.

Bachelor of Technology, Computer Science

KITS Affiliated to Jawaharlal Nehru Technological University, Kakinada, India.


