
Data Engineer - SQL Server

Location: Newark, DE
Posted: May 17, 2024


Resume:

Chandrasekhar Simmana

ad5rvj@r.postjobfree.com

240-***-****

Newark, DE

LinkedIn: linkedin.com/in/chandra-simmana-25167088

SUMMARY:

• Over 11 years of IT experience in the design, development, and implementation of large-scale applications across platforms such as ETL and Hadoop.

• Good knowledge of Spark and its internal architecture; capable of processing large sets of structured and semi-structured data.

• Worked with various Hadoop distributions, including Cloudera CDH (5.2, 6.1) and Cloudera CDP (7.1), and on migrations between them.

• Specializes in data modeling, relational database modeling, ETL methodology, business intelligence, infrastructure building, data migration, and relational database systems development under both Waterfall and Agile SDLC models.

• Solid hands-on experience developing PL/SQL, stored procedures, and Unix shell scripts.

• Leveraged Hadoop ecosystem knowledge to design and develop solution capabilities using Spark, YARN, Scala, Hive, and NoSQL databases.

• Worked on data ingestion using Sqoop and streaming tools, importing data into HDFS, Hive, and HBase from various data sources, and performed transformations using Hive and Spark.

• Experience in optimizing Hive and Spark jobs.

• Loaded data from different data sources (Teradata, Oracle, MS SQL, and DB2) into HDFS using Sqoop and loaded it into partitioned Hive tables.

• Good knowledge of different file formats such as CSV, delimited, fixed-width, Parquet, and JSON files.

• Good knowledge of cloud migration projects from HDFS to AWS.

• Good knowledge of cloud migration from Teradata, Oracle, and MySQL servers.

• Expertise in writing Hive ETL scripts in an optimized way for performing large-scale operations.

• Experience building ETL applications in Apache Spark using Scala and Python.

• Expert in supporting and optimizing Hive, Spark, and ETL jobs running in production.

• Working experience building automated scripts in Shell and Python as per requirements.

• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.

• Good knowledge of databases such as DB2, MySQL, MS Access, MS SQL Server, Teradata, and Vertica.

• Good knowledge of the CI/CD process, GitHub, and the code migration process.

Technical Skill Set

Hadoop Ecosystem: Cloudera (CDH/CDP), HDFS, Hive, Oozie, Spark, PySpark, Impala, Hue, Splunk

Databases: Teradata, Oracle, MS SQL, RDS, Aurora SQL

Programming: Core Java, Scala, Shell Scripting, Python

Scheduling Tools: Control Center, Control-M, Maestro, Autosys

Reporting Tools: Tableau

Operating Systems: Windows, Unix, Linux

Version Controls: GitHub, Jenkins, and Ansible

Professional Experience

July 2022 – Present Bank of America Corporate, Newark, DE

Big Data Engineer

• Understanding business requirements and implementing logic to process and load data into Hadoop.

• Worked on Spark SQL, reading and writing data in different file formats: text, Parquet, and JSON.
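
A minimal sketch of this kind of multi-format read/write in Spark SQL, assuming PySpark; the paths, column names, and partition key are hypothetical:

```python
from pyspark.sql import SparkSession

# Sketch only: read text, JSON, and Parquet sources, then persist curated
# output as partitioned Parquet. Paths and columns are placeholders.
spark = SparkSession.builder.appName("format-io-sketch").getOrCreate()

raw_text = spark.read.text("/data/raw/logs/")                   # one 'value' column per line
raw_json = spark.read.json("/data/raw/transactions_json/")      # schema inferred from JSON
raw_parq = spark.read.parquet("/data/raw/transactions_parquet/")

# Combine the two transaction sources (allowMissingColumns requires Spark 3.1+).
combined = raw_json.unionByName(raw_parq, allowMissingColumns=True)
combined.write.mode("overwrite").partitionBy("txn_date").parquet("/data/curated/transactions/")
```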

• Extensively worked on code migration across Development, UAT, and Production environments.

• Executed ad hoc queries for trend analysis and for finding various transaction patterns.

• Created Hive tables, loaded data, and wrote HiveQL queries to further analyze the data.
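
A short sketch of this table-creation and HiveQL pattern, run through Spark's Hive support; the database, table, and column names are invented:

```python
from pyspark.sql import SparkSession

# Sketch: create a partitioned Hive table, load it from a staging table, and
# run a HiveQL-style aggregate. All object names are placeholders.
spark = SparkSession.builder.appName("hive-ddl-sketch").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.transactions (
        txn_id  STRING,
        account STRING,
        amount  DOUBLE
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS PARQUET
""")

# Allow dynamic partition inserts for the load below.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE analytics.transactions PARTITION (txn_date)
    SELECT txn_id, account, amount, txn_date
    FROM analytics.transactions_staging
""")

spark.sql("""
    SELECT txn_date, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    FROM analytics.transactions
    GROUP BY txn_date
""").show()
```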

• Developed MapReduce programs for data analysis and data cleaning.

• Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig and Hive jobs.

• Experience working with internal data ingestion tools to load data into the Hadoop data lake.

• Analyzed and optimized Hive/Spark ETL production jobs to meet SLAs.
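
Representative of the kind of tuning involved, a hedged PySpark sketch; the configuration values, paths, and join key are illustrative assumptions, not production settings:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch of common SLA-recovery levers: right-sizing shuffle partitions,
# raising the broadcast threshold, and forcing a broadcast join.
spark = (
    SparkSession.builder
    .appName("etl-tuning-sketch")
    .config("spark.sql.shuffle.partitions", "400")                         # illustrative value
    .config("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))
    .getOrCreate()
)

facts = spark.read.parquet("/data/curated/transactions/")   # large fact table
dims = spark.read.parquet("/data/reference/accounts/")      # small dimension table

# Broadcasting the small side avoids a full shuffle of the fact table.
enriched = facts.join(F.broadcast(dims), "account")
enriched.write.mode("overwrite").parquet("/data/marts/transactions_enriched/")
```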

• Experience in data modeling and architecture design for business requirements.

• Experience using Hue for data analysis and validation.

• Experience compacting data in HDFS to reduce the small-files burden on the cluster.
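
A minimal compaction sketch in PySpark, assuming Parquet data; the paths and target file count are hypothetical:

```python
from pyspark.sql import SparkSession

# Sketch: rewrite a directory containing many small files into a handful of
# larger ones, easing NameNode and scan pressure. Paths are placeholders.
spark = SparkSession.builder.appName("hdfs-compaction-sketch").getOrCreate()

events = spark.read.parquet("/data/landing/events/")        # many small files
events.coalesce(16).write.mode("overwrite").parquet("/data/landing/events_compacted/")
```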

• Experience in working with SQL, Hive, Spark SQL, and shell scripts to support User/Customer requirements.

• Experience in optimizing Hive and Spark jobs.

• Involved in loading data from UNIX/Linux file system to HDFS.

• Experience troubleshooting performance issues using Cloudera Manager (CM).

• Experience working on cluster migration from Cloudera CDH (5.2, 6.1) to Cloudera CDP (7.1).

• Experience monitoring Cloudera services using Cloudera Manager (CM).

• Experienced with batch system scheduling and processing using Control-M, Tivoli, Autosys, etc.

• Worked on Hive for further analysis and for generating/transforming files from various analytical formats to text files.

• Change management: worked on the production code deployment process involving CRQs, work orders, and tasks in the BMC Remedy tool.

• Worked on building monitoring tools which can help improve cluster performance.

• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
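
For illustration, the same conversion idea sketched with PySpark DataFrames (the project work used Scala and RDDs); the HiveQL, table, and columns are invented:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Original (hypothetical) HiveQL:
#   SELECT account, SUM(amount) AS total
#   FROM analytics.transactions
#   WHERE txn_date >= '2022-01-01'
#   GROUP BY account;
spark = SparkSession.builder.appName("hql-to-spark-sketch").enableHiveSupport().getOrCreate()

totals = (
    spark.table("analytics.transactions")
    .where(F.col("txn_date") >= "2022-01-01")
    .groupBy("account")
    .agg(F.sum("amount").alias("total"))
)
totals.show()
```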

• Responsible for handling multiple user tickets and supporting their requirements.

• Worked closely with Project Managers for releases and carried out deployments and builds across various environments using continuous integration tools.

• Created Hive queries that helped market analysts spot emerging trends by comparing incremental data.

Environment: Cloudera CDH/CDP, HDFS, Hive, Spark, PySpark, Oozie, Sqoop, Impala, YARN, Pepperdata (cluster monitoring tool), Splunk (log monitoring), Unix/Linux, shell scripting, Autosys, GitHub, Jenkins, and Ansible.

March 2021 to June 2022 AMFAM, Madison, WI.

Hadoop Developer.

Description:

American Family Insurance (AmFam) is a private mutual company that focuses on property, casualty, and auto insurance, but also offers commercial insurance, life, health, and homeowner’s coverage, as well as investment and retirement-planning products.

Roles and Responsibilities:

• Developed various Sqoop ingestion scripts to load data into Hive and HDFS for daily ingestion.

• Involved in the analysis and design phases and in requirement gathering for data ingestion.

• Developed streaming and batch applications in Apache Spark using Scala and Python.

• Worked closely with the business team on requirement gathering and feature model validation.

• Created complex DataFrames, applied transformations, and wrote out attributes as Parquet files.

• Wrote configuration files that serve as input to the Spark framework.

• Created a reusable template for generating the configuration files that serve as input to the Spark framework.
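
A hedged sketch of what such a configuration template and its driver might look like; the file layout, keys, and table names are assumptions for illustration only:

```python
import json
from pyspark.sql import SparkSession

# Example shape of a job configuration consumed by a reusable Spark framework.
CONFIG_TEMPLATE = {
    "source_path": "/data/raw/claims/",
    "target_table": "analytics.claims",
    "columns": ["claim_id", "policy_id", "claim_amount"],
    "write_mode": "overwrite",
}

def run_job(config_path: str) -> None:
    """Load the JSON config and run a simple, fully config-driven load."""
    with open(config_path) as fh:
        cfg = json.load(fh)
    spark = SparkSession.builder.appName("config-driven-load").enableHiveSupport().getOrCreate()
    df = spark.read.parquet(cfg["source_path"]).select(*cfg["columns"])
    df.write.mode(cfg["write_mode"]).saveAsTable(cfg["target_table"])

if __name__ == "__main__":
    # Write the template out and run the job against it.
    with open("/tmp/claims_load.json", "w") as fh:
        json.dump(CONFIG_TEMPLATE, fh)
    run_job("/tmp/claims_load.json")
```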

• Wrote complex Hive SQL to analyze data and validate transformations.

• Wrote HQLs applying the transformation and business rules.

• Experience troubleshooting performance issues using Cloudera Manager (CM).

• Used Apache Drill / Drill Explorer to query HDFS data for analysis and to validate output files.

• Identified areas of improvement in the existing business by unearthing insights from vast amounts of data using Hive and PySpark.

• Analyzed various sources and consolidated them all in the data foundry.

• Built feature data models using PySpark that served as input for the data science team to build chatbots and other applications.
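
As a sketch of the feature-model idea, assuming a per-customer aggregation in PySpark; the source table and feature names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch: roll interaction history up to one feature row per customer,
# the shape a data science team could consume. Names are placeholders.
spark = SparkSession.builder.appName("feature-model-sketch").enableHiveSupport().getOrCreate()

interactions = spark.table("analytics.customer_interactions")
features = (
    interactions.groupBy("customer_id")
    .agg(
        F.count("*").alias("interaction_count"),
        F.countDistinct("channel").alias("channels_used"),
        F.max("interaction_ts").alias("last_interaction_ts"),
    )
)
features.write.mode("overwrite").saveAsTable("analytics.customer_features")
```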

• Expert in building Hive ETL scripts in optimized ways using various techniques.

• Worked with different data file formats such as Parquet, ORC, and Avro.

• Automated daily Hadoop jobs using Apache Oozie in an optimized way.

• Imported and exported structured data between different relational databases and HDFS/Hive using Sqoop.
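
Sqoop itself is a command-line tool; as a sketch in the same language as the other examples, the command can be assembled and launched from Python. The connection details, credentials path, and table names are placeholders:

```python
import subprocess

# Typical Sqoop import into a Hive table, driven from a Python wrapper.
sqoop_import = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",   # placeholder JDBC URL
    "--username", "etl_user",
    "--password-file", "/user/etl/.oracle_pwd",
    "--table", "CLAIMS",
    "--hive-import",
    "--hive-table", "analytics.claims",
    "--num-mappers", "4",
]
subprocess.run(sqoop_import, check=True)
```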

March 2019 to Feb 2021 Client: Marico

Hadoop Developer

Roles and Responsibilities:

• Responsible for working with architects to translate functional and technical requirements into detailed architecture and design.

• Compared the results of the traditional system with the Hadoop environment to identify any differences and fixed them by finding the root cause.

• Responsible for validating transactional and profile data from RDBMS sources that are transformed and loaded into the data lake using Hadoop big data technologies.

• Automated the process to copy files in the Hadoop system for testing purposes at regular intervals.

• Actively involved in monitoring server health using the Splunk monitoring and alerting tool.

• Developed MapReduce and Hive scripts to pull data into the Hadoop environment and validated it against the data in the files and reports.

• Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala, and wrote Spark scripts using Scala shell commands as per the requirement.

• Processed schema-oriented and non-schema-oriented data using Scala and Spark.

• Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
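
A hedged sketch of that Kafka-to-Spark pipeline using Structured Streaming; broker addresses, the topic, and output paths are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch: consume portal events from Kafka and land them as Parquet.
spark = SparkSession.builder.appName("kafka-ingest-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "portal-events")
    .load()
    .select(F.col("value").cast("string").alias("payload"), "timestamp")
)

query = (
    events.writeStream
    .format("parquet")
    .option("path", "/data/landing/portal_events/")
    .option("checkpointLocation", "/data/checkpoints/portal_events/")
    .start()
)
query.awaitTermination()
```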

• Worked as SME for Java, Big Data, and Splunk technologies.

• Designed and built the reporting application, which uses Spark SQL to fetch HBase table data and generate reports.

• Developed Oozie workflows to run multiple Hive and Spark jobs.

• Good experience in troubleshooting production-level issues in the cluster and its functionality.

• Ingested structured data into appropriate schemas and tables to support the rules and analytics.

• Developed custom user-defined functions (UDFs) in Hive to transform large volumes of data according to business requirements.
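
Those Hive UDFs would typically be written in Java; for consistency with the other examples, here is the analogous pattern of registering a Python UDF that Spark SQL (with Hive support) can call. The function and table names are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").enableHiveSupport().getOrCreate()

def mask_account(account_id):
    """Mask all but the last four characters of an account identifier."""
    if account_id is None:
        return None
    return "*" * max(len(account_id) - 4, 0) + account_id[-4:]

# Register the function so it can be used inside SQL/HiveQL statements.
spark.udf.register("mask_account", mask_account, StringType())
spark.sql("SELECT mask_account(account) AS masked_account FROM analytics.transactions").show()
```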

• Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.

• Responsible for building scalable distributed data solutions using Hadoop.

• Implemented scripts for loading data from the UNIX file system to HDFS.

• Worked on tuning the performance of Pig scripts.

• Mentored the analyst and test teams in writing Hive queries.

• Installed the Oozie workflow engine to run multiple MapReduce jobs.

• Loaded and transformed large sets of structured, semi-structured, and unstructured data.

• Analyzed large amounts of data sets to determine the optimal way to aggregate and report on it.

• Built Spark Scripts by utilizing Scala shell commands depending on the requirement.

March 2016 to Feb 2019 Client: ERICSSON

Software Tester

Roles and Responsibilities:

• Requirement gathering and analysis.

• Prepared the RTM, mapping the URS and FRS to test cases.

• Prepared test data for the assigned modules.

• Ran the jobs/workflows for the ETL process.

• Prepared and ran SQL queries.

• Designed & executed test cases.

• Verified the data in the target database after the ETL process.

• Performed column data mapping between the source and target databases; analyzed and reported defects in QC.

• Interacted with the BA and Dev teams to resolve issues.

Sep 2012 to Feb 2016 Value IT Labs, Hyderabad, India.

Software Developer

Roles and Responsibilities:

• Actively participated in object-oriented analysis and design sessions for the project, which is based on MVC architecture using the Spring Framework.

• Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.

• Generated POJO classes to map to the database table.

• Used the ORM tool Hibernate to represent entities and to define fetching strategies for optimization.

• Implemented transaction management in the application using Spring Transaction and Spring AOP methodologies.

• Wrote SQL queries and stored procedures for the application to communicate with the database.

• Used the JUnit framework for unit testing of the application.

• Used Maven to build and deploy the application.

• Created continuous integration builds using Maven and SVN version control.

Education

• Bachelor’s in Computer Science Engineering, JNTU University, May 2012


