
Junior Data Engineer

Location:
Cincinnati, OH
Posted:
April 21, 2023


BHAGATH BABU

adwoki@r.postjobfree.com | 513-***-****

EXPERIENCE SUMMARY

Around * years of experience in the IT industry, including big data environments, the Hadoop ecosystem, and the design, development, and maintenance of various applications.

Around 1+ years of experience in data warehousing ETL using Talend Open Studio.

Worked with various data technologies, with experience in building data pipelines, data integration, and data warehousing. Deep understanding of data modeling and data management best practices, with the ability to analyze complex data sets to identify patterns and trends.

Experience in developing custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HQL (HiveQL).
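For illustration only, a minimal sketch of the Python-for-Hive pattern: a streaming script invoked from HiveQL via the TRANSFORM clause. The script name, columns, and query are hypothetical, not taken from a specific project.

```python
# clean_urls.py -- hypothetical Python streaming script for Hive, invoked as:
#   SELECT TRANSFORM (id, url) USING 'python clean_urls.py' AS (id, domain) FROM raw_events;
# Hive passes rows on stdin as tab-separated text and reads rows back from stdout.
import sys
from urllib.parse import urlparse

for line in sys.stdin:
    row_id, url = line.rstrip("\n").split("\t", 1)
    domain = urlparse(url).netloc.lower()  # keep only the host part of the URL
    print(f"{row_id}\t{domain}")
```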

Expertise in Big Data architectures such as Hadoop distributed systems (Hortonworks and Cloudera distributions).

Developed Spark-based applications to load streaming data with low latency using Kafka and PySpark.
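A minimal PySpark Structured Streaming sketch of this kind of low-latency Kafka ingestion; the broker address, topic, and output paths are placeholders, and the spark-sql-kafka connector package is assumed to be available on the classpath.

```python
# Sketch: read a Kafka topic with Structured Streaming and land it as Parquet files.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("kafka-ingest-sketch")
         .getOrCreate())

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
          .option("subscribe", "events")                      # hypothetical topic name
          .load()
          .selectExpr("CAST(value AS STRING) AS payload"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/landing/events")              # illustrative output path
         .option("checkpointLocation", "/data/checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```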

Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.

Experience in designing, developing, deploying, and/or supporting data pipelines using Databricks.

Hands-on experience in the Hadoop ecosystem, including Spark, Kafka, HBase, Pig, Hive, Sqoop, Oozie, Flume, and related big data technologies.

Experience in working with MapReduce programs using Apache Hadoop.

Strong hands-on experience with AWS services, including but not limited to EMR, S3, EC2, RDS, Glue, and Athena, which provide fast and efficient processing for big data analytics.
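As a small illustration of the Athena-over-S3 pattern, a boto3 sketch for submitting a query; the bucket, database, and query below are invented for the example.

```python
# Submit an Athena query and print its execution id (all names are placeholders).
import boto3

athena = boto3.client("athena", region_name="us-east-1")

response = athena.start_query_execution(
    QueryString="SELECT txn_date, COUNT(*) AS txns FROM events GROUP BY txn_date",
    QueryExecutionContext={"Database": "analytics_db"},            # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
print(response["QueryExecutionId"])
```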

Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.

Worked on Spark and Spark Streaming, using the core Spark API to explore Spark features and build data pipelines.

Experienced in working with different scripting technologies like Python, UNIX shell scripts.

Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2. Successfully loaded files into HDFS from Oracle, SQL Server, and Teradata using Sqoop.

Experience in using PostgreSQL with big data visualization tools such as Tableau.

Experience in designing, implementing, and maintaining complex data pipelines in Snowflake using SnowSQL.
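A hedged sketch of the kind of load step such a pipeline automates, using the Snowflake Python connector; credentials, stage, and table names are placeholders.

```python
# Load staged files into a Snowflake table (object names are illustrative only).
import snowflake.connector

conn = snowflake.connector.connect(
    user="ETL_USER", password="...", account="example_account",
    warehouse="ETL_WH", database="ANALYTICS", schema="STAGE",
)
try:
    cur = conn.cursor()
    cur.execute("""
        COPY INTO stage.orders
        FROM @orders_stage
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
    print(cur.fetchall())  # one status row per loaded file
finally:
    conn.close()
```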

Strong experience in migrating other databases to Snowflake. In-depth knowledge of Snowflake Database, Schema and Table structures.

Hands-on experience with dimensional modeling using star and snowflake schemas.

Excellent knowledge of Big Data infrastructure: distributed file systems (HDFS) and parallel processing (the MapReduce framework).

Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.

Experience in working with different data sources like Flat files, XML files, and Databases.

Experience in database design, entity relationships and database analysis, programming SQL.

Performed AML/KYC compliance and risk reporting on a regular and as-needed basis.

Worked with AML technology platforms like Actimize.

Able to communicate effectively with both technical and non-technical stakeholders and to present complex data insights in a clear and concise manner.

TECHNICAL SKILLS

Programming Languages: Python

Querying Languages: SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL, HBase

Integration Tools: Sqoop, Flume, Spark, PySpark, Talend

Reporting Tools: Tableau

Source Control & Collaboration: Team Foundation Server (TFS), SharePoint

Databases: SQL Server, MySQL, Hive, SnowSQL, DB2

Cloud Services: AWS, Databricks, Snowflake

SDLC: Agile, Kanban, Scrum, Waterfall

EXPERIENCE DETAILS

FIS 01/20 – Present

Big Data Engineer

As a Data Engineer on the Data Platform team, responsible for designing, architecting, and implementing secure batch and real-time applications using Snowflake, PySpark, and Spark.

Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.

Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.

Designing, developing, and maintaining data pipelines, ETL jobs, data streaming, and data warehousing solutions to support the bank's reporting, analytics, and compliance requirements.

Collaborating with data architects, data analysts, data scientists and other teams to design, implement and maintain scalable data architectures, data models, and data warehousing solutions.

Building and maintaining data pipelines to extract, transform, and load data from various sources, spanning structured, semi-structured, and unstructured data, using big data technologies such as Kafka, Hadoop, Spark, and Snowflake.

Optimizing data pipeline performance and troubleshooting data-related issues as they arise and maintaining documentation for data pipelines, data models, and data warehousing solutions.

Providing guidance and mentoring to junior data engineers on the team.

Assessing and managing data risks and ensuring data quality across systems.

Participating in Agile development processes and working closely with the development team to ensure the timely delivery of data solutions.

Writing efficient and maintainable code in Python, PySpark, and SQL.

Experience with YARN, Airflow, AWS, and Databricks for scheduling, monitoring, and management of data pipelines.
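A minimal Airflow DAG sketch of this scheduling pattern; the DAG id, schedule, and task callables are illustrative placeholders rather than an actual production job.

```python
# Two-step extract/load DAG scheduled daily (all names are made up for the sketch).
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull source data")        # placeholder for a real extract step

def load():
    print("write to the warehouse")  # placeholder for a real load step

with DAG(
    dag_id="daily_ingest_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```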

Experience with Snowflake's unique features such as Time Travel, Streams, Tasks, and Snowpipe for real-time data processing.
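For illustration, a sketch of the Streams + Tasks pattern issued through the Snowflake Python connector; all object names, the schedule, and the merge logic are invented for the example.

```python
# Create a change-capture stream on a raw table and a task that merges changes downstream.
import snowflake.connector

conn = snowflake.connector.connect(
    user="ETL_USER", password="...", account="example_account",
    warehouse="ETL_WH", database="ANALYTICS",
)
statements = [
    # Capture inserts/updates on the raw table.
    "CREATE OR REPLACE STREAM RAW.ORDERS_STREAM ON TABLE RAW.ORDERS",
    # Apply captured changes to the core table every five minutes.
    """CREATE OR REPLACE TASK CORE.MERGE_ORDERS
         WAREHOUSE = ETL_WH
         SCHEDULE = '5 MINUTE'
       AS
         MERGE INTO CORE.ORDERS t
         USING RAW.ORDERS_STREAM s ON t.ORDER_ID = s.ORDER_ID
         WHEN MATCHED THEN UPDATE SET t.STATUS = s.STATUS
         WHEN NOT MATCHED THEN INSERT (ORDER_ID, STATUS) VALUES (s.ORDER_ID, s.STATUS)""",
    "ALTER TASK CORE.MERGE_ORDERS RESUME",
]
try:
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
finally:
    conn.close()
```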

Proficiency in writing complex SQL queries for data transformation, aggregation, and filtering using SnowSQL.

Improved data quality, governance, and compliance by implementing robust data pipelines, data models and data warehousing solutions.

Scheduled workflows in Tivoli Workload Scheduler (TWS) through Oozie.

Improved data pipeline performance and reduced data-related issues resulting in better data availability and accessibility.

Improved data-driven decision making and reporting by providing accurate and timely data to business teams.

Reduced data storage and processing costs.

Awards:

Received "Best Data Engineering Team" award for implementing a scalable and cost-effective data pipeline solution for the Actimize fraud detection project.

Environment: Hadoop 3.0, MapReduce, Hive 3.0, Agile, HBase 1.2, PySpark, NoSQL, AWS, Kafka, Pig 0.17, HDFS, Hortonworks, Spark, PL/SQL, Python, Notebook, Databricks, Snowflake, TWS.

Sears Hometown & Outlets 01/19 – 01/20

Big Data Developer

The project involved ETL development to support the business office for analytics, providing monthly and 12-month rolling data for growing needs to report transaction activity from the issuer and merchant perspectives. The ETL process involved processing HDFS source data using Hive scripts, loading Hive tables, and ingesting Hive data into Snowflake for reporting purposes.

Worked closely with Baxter's business to gather BI requirements and with the solution architect team to design and develop the ETL framework. Worked with source systems such as Teradata and MySQL to extract, transform, and load the data into the enterprise data warehouse (Hadoop/Hive) based on business requirements. Upon completion of the project, the Baxter business would be provided with enterprise data used to generate highly critical reports and visualizations for business decisions. Made extensive use of ETL methodology to support data extraction, transformation, and loading in a corporate-wide ETL solution using big data technologies.
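A hedged sketch of the Hive-to-Snowflake handoff described above: Hive-managed data is rolled up with Spark SQL and staged as files that a downstream Snowflake load (COPY INTO or Snowpipe) picks up for reporting. Database, table, and path names are illustrative.

```python
# Roll up Hive transaction data by month and stage it for the Snowflake load.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-snowflake-staging")
         .enableHiveSupport()
         .getOrCreate())

monthly = spark.sql("""
    SELECT merchant_id,
           trunc(txn_date, 'MM') AS txn_month,
           SUM(amount)           AS monthly_amount
    FROM edw.transactions
    GROUP BY merchant_id, trunc(txn_date, 'MM')
""")

# Write to an external stage location; the Snowflake side ingests these files.
monthly.write.mode("overwrite").parquet("s3://example-stage/monthly_rollup/")
```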

Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.

Worked on migrating data from the legacy network to our network. Migrated Hive tables from HDFS to the staging layer by writing shell scripts, and used FileZilla to transfer files from the staging layer to local systems.

Contributed to the development of key data integration and advanced analytics solutions leveraging Apache Hadoop and other big data technologies for leading organizations using major Hadoop Distributions like Cloudera.

Collected the requirements for the given project and did complete analysis to come up with best possible optimized solution. Interacted with Solution/Technical Architects for solution review and approval.

Applied knowledge of programming techniques to document component design and specifications and solution based on business need and use case.

Implemented and delivered different application modules using Spark, HDFS, Hive, HQL, Impala, and Oozie. Also used shell scripting to wrap the applications for automation.

Designed and implemented incremental imports into Hive tables.
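A minimal PySpark sketch of one common incremental-import approach, assuming a high-water-mark column; the table, column, and path names are hypothetical.

```python
# Pull only rows newer than the current high-water mark of the target Hive table, then append them.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("incremental-import-sketch")
         .enableHiveSupport()
         .getOrCreate())

# High-water mark from the already-loaded Hive table (column name is illustrative).
max_loaded = (spark.table("warehouse.transactions")
              .agg({"updated_at": "max"})
              .collect()[0][0])

source = spark.read.parquet("/landing/transactions")  # illustrative source path
delta = source if max_loaded is None else source.filter(source.updated_at > max_loaded)

# insertInto matches columns by position, so the source schema must align with the table.
delta.write.mode("append").insertInto("warehouse.transactions")
```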

Expert in implementing advanced procedures such as text analytics and processing using in-memory computing with Apache Spark in Python. Expertise in implementing PySpark and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.

Developed Spark programs using Scala and Java APIs and performed transformations and actions on RDDs.

Responsible for data extraction and data integration from different data sources into the Hadoop data lake by creating ETL pipelines using Spark, MapReduce, Pig, and Hive.

Developed Spark programs with Scala and applied principles of functional programming to process the complex unstructured and structured data sets.

Processed data with Spark from the Hadoop Distributed File System.

Collaborated with Architects to design Spark model for the existing MapReduce model and migrated them to Spark models using Scala. Worked on writing Scala Programs using Spark-SQL in performing aggregations.

Developed web services in the Play framework using Scala while building a streaming data platform.

Reviewed technical solution documents and peer code, and provided review comments and improvement suggestions.

Maintained and modified existing application to correct errors, improve performance or enhance its usability and functionality.

Developed and designed data models, data structures and ETL jobs for data acquisition and manipulation purposes

Developed ETL pipelines using notebooks, Spark DataFrames, Spark SQL, and Python scripting.
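A short notebook-style sketch combining DataFrame transformations and Spark SQL of the kind described; the table and column names are invented for the example.

```python
# Aggregate settled card transactions by merchant and day, then roll up network-wide with Spark SQL.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("etl-notebook-sketch")
         .enableHiveSupport()
         .getOrCreate())

raw = spark.table("stage.card_transactions")  # hypothetical staged Hive table

daily = (raw
         .filter(F.col("status") == "SETTLED")
         .groupBy("merchant_id", F.to_date("txn_ts").alias("txn_date"))
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("txn_count")))

daily.createOrReplaceTempView("daily_merchant_totals")
spark.sql("""
    SELECT txn_date, SUM(total_amount) AS network_volume
    FROM daily_merchant_totals
    GROUP BY txn_date
""").write.mode("overwrite").saveAsTable("mart.network_daily_volume")
```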

Performed error analysis (debugging), code corrections, and unit testing of application modules.

Analyzed performance metrics generated by QA Team and determined performance bottlenecks and resolution.

Analyzed and provided input to execute efforts to improve the efficiency and quality of Application as part of continuous improvement process.

Prepared Installation Guides, User Manuals, Standard Operational Procedure, Troubleshooting Guides for Operations/Support and Service Management.

Arranged and took part in daily meetings, SCRUM calls with team members working at different geographical locations. Interacted with onsite-offshore teams and stakeholders.

Investigated and analyzed reported defects in a timely manner and recommended solutions, including code changes, data updates, or configuration modifications, while using defect tracking tools like Atlassian JIRA and ServiceNow to manage the defect cycle.

Implemented Build and Deployment scripts using Python, Shell.

Provided technical support to Operations and Infrastructure teams for various deployment environments like UAT, Production.

Worked with Data Mapping Team to understand the source to target mapping rules.

Analyzed the requirements and framed the business logic for the ETL process using Talend.

Involved in the ETL design and its documentation.

Developed jobs in Talend Open Studio from stage to source, intermediate, conversion and Target.

Extracted data from Oracle as one of the source databases.

Worked on Talend ETL to load data from various sources to Oracle DB. Used tMap, tReplicate, tFilterRow, tSort, tWaitForFile, and various other components in Talend.

Worked on Talend ETL and used features such as context variables, database components like tOracleInput and tOracleOutput, file components, ETL components, etc.

Worked on Proof of Concepts for integration with emerging technologies like Docker. Arranged demonstration to display usability which helped in decision making for further integration.

Ensured the application was implemented to the highest standard by following industry best practices.

Awards:

Recognition for outstanding contributions to the development and implementation of Migration solutions.

Awards for implementing innovative data management solutions that have had a positive impact on customers.

Environment: Pig, Hive, AWS, MapReduce, HDFS, Cloudera, Python, Spark 2.x, PySpark, SQL, Talend.

BDM Innovative Solutions 06/14 – 12/14

Software Engineer

Involved in designing and developing modules at both Client and Server Side.

Developed the UI using JSP, JavaScript and HTML.

Responsible for validating the data at the client side using JavaScript.

Interacted with external services to get the user information using SOAP web service calls

Developed web components using JSP, Servlets and JDBC.

Technical analysis, design, development and documentation with a focus on implementation and agile development.

Developed a Web based reporting system with JSP, DAO and Apache Struts-Validator using Struts framework.

Developed user and technical documentation.

Wrote test cases using Junit and coordinated with testing team for integration tests.

Developed business objects, request handlers and JSPs for this project using Java Servlets and XML.

Installed and configured the Apache web server and deployed JSPs and Servlets in Tomcat Server.

Environment: Java, Servlets, JSP, JavaScript, JDBC, Unix shell scripting, HTML, Eclipse, Oracle 8i, WebLogic.

EDUCATION

Northwestern Polytechnic University

Master of Science in Computer Science

RMK CET, affiliated with Anna University

Bachelor's in Electrical & Electronics Engineering

CERTIFICATION

AWS Developer – Associate

Credential ID AWS02066084


