
data engineer

Location:
United States
Salary:
$75.00
Posted:
March 26, 2021

Resume:

Kantamani

*************@*****.***

775-***-****

Professional Summary

8+ years of experience as a Big Data Engineer/Data Engineer and Data Analyst, including designing and developing solutions using Big Data technologies.

Experience working with ETL tools such as SSIS and Informatica, and with reporting tools such as SQL Server Reporting Services (SSRS), Cognos, and Business Objects.

Knowledge of and working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.

Hands-on experience in Normalization (1NF, 2NF, 3NF, and BCNF) and Denormalization techniques for effective and optimal performance in OLTP and OLAP environments.

Experience in Text Analytics and Data Mining solutions to various business problems, generating data visualizations using SAS and Python, and creating dashboards using tools such as Tableau.

Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.

Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.

Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snow-Flake Modeling for FACT and Dimensions Tables) using Analysis Services.

Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export using ETL tools such as Informatica PowerCenter.

Experience in designing, building, and implementing a complete Hadoop ecosystem comprising HDFS, Hive, Pig, Sqoop, Oozie, HBase, and MongoDB.

Experience with Client-Server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.

Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.

Extensive experience in using ER modeling tools such as Erwin and ER/Studio, Teradata, BTEQ, MLDM and MDM.

Experienced in using Python for statistical computing.

Extensive experience in loading and analyzing large datasets with Hadoop framework (MapReduce, HDFS, PIG, HIVE, Flume, Sqoop), NoSQL databases like MongoDB, HBase, Cassandra.

Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.

Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.

Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.

Experience working with analysis tools such as Tableau for regression analysis, pie charts, and bar graphs.

Skills:

Big Data technologies: HBase 1.2, HDFS, Sqoop 1.4, Hadoop 3.0, Hive 2.3, Pig 0.17, Oozie 5.1

Cloud Architecture: Amazon AWS, EC2, S3, Elasticsearch, Elastic Load Balancing & basic MS Azure

Data Modeling Tools: ER/Studio V17, Erwin 9.7, Sybase PowerDesigner.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9/7

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, R, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Testing and defect tracking Tools: HP/Mercury (Quality Center, WinRunner, QuickTest Professional, Performance Center, Requisite), MS Visio & Visual SourceSafe

Operating System: Windows, Unix & Linux.

ETL/ Data warehouse Tools: Informatica 9.6 and Tableau 10.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.

Professional Experience:

Securonix Inc, Dallas, TX Dec 2019 - Present

Sr. Big Data Engineer (AWS)

Roles & Responsibilities

The objective of this project is to migrate all services from in-house infrastructure to the cloud (AWS). This includes building a data lake as a cloud-based solution in AWS using Amazon S3 and making it the single source of truth.

Provided meaningful and valuable information for better decision-making.

The data migration covers various data types, including streaming, structured, and unstructured data from multiple sources, as well as legacy data migration.

Utilized AWS services with a focus on big data analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, and flexibility.

Designed the AWS architecture and cloud migration, covering AWS EMR, DynamoDB, Redshift, and event processing using Lambda functions.

Built a NoSQL solution for unstructured data using the AWS DynamoDB service.

Built data warehousing solutions for analytics/reporting using the AWS Redshift service.

Developed Python programs to consume data from APIs as part of several data extraction processes and store the data in AWS S3.

Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.

Developed the code for importing and exporting data into HDFS and Hive using Sqoop.

Created Hive External tables to stage data and then move the data from Staging to main tables.

Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.

Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.

Migrated data from Hive to Presto for faster query execution and data retrieval times.

Used Presto in-built connectors for Redshift and Hive to prepare datasets for applying advanced analytics (ML) on certain use cases.

Implemented performance optimizations on the Presto SQL Queries for improving query retrieval times.

Used Query execution plans in Presto for tuning the queries that are integrated as data sources for Tableau dashboards.

Designed the Redshift data model and worked on Redshift performance improvements that speed up query retrieval and improve the dependent reporting/analytics layers.

Developed data transition programs from DynamoDB to AWS Redshift (ETL process) using AWS Lambda, creating Python functions triggered by specific events based on use cases (see the illustrative sketch below).
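
The bullet above describes the DynamoDB-to-Redshift ETL only at a high level; the following is a minimal, illustrative Python sketch of that pattern, not the actual project code: an AWS Lambda handler that stages DynamoDB Stream records to S3 and loads them into Redshift with a COPY issued through the Redshift Data API. All bucket, cluster, table, and IAM role names are hypothetical placeholders.

import json
from datetime import datetime

import boto3
from boto3.dynamodb.types import TypeDeserializer

s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")
deserializer = TypeDeserializer()

STAGING_BUCKET = "example-staging-bucket"   # placeholder bucket name
TARGET_TABLE = "analytics.events"           # placeholder Redshift table

def handler(event, context):
    # Convert the DynamoDB-typed NewImage records from the stream event into plain dicts.
    rows = []
    for record in event.get("Records", []):
        image = record.get("dynamodb", {}).get("NewImage")
        if image:
            rows.append({k: deserializer.deserialize(v) for k, v in image.items()})
    if not rows:
        return {"loaded": 0}

    # Stage the batch in S3 as JSON lines (Decimal values are stringified for JSON).
    key = f"staging/{datetime.utcnow():%Y/%m/%d/%H%M%S}.json"
    body = "\n".join(json.dumps(row, default=str) for row in rows)
    s3.put_object(Bucket=STAGING_BUCKET, Key=key, Body=body.encode("utf-8"))

    # Load the staged file into Redshift; the IAM role must be allowed to read the bucket.
    redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",    # placeholder cluster
        Database="analytics",
        DbUser="etl_user",
        Sql=f"COPY {TARGET_TABLE} FROM 's3://{STAGING_BUCKET}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleCopyRole' "
            "FORMAT AS JSON 'auto';",
    )
    return {"loaded": len(rows)}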

Fidelity, Boston, MA Feb 2018- Nov 2019

Sr. Data Engineer

Roles & Responsibilities

Worked with the analysis teams and management teams and supported them based on their requirements.

Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.

Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.

Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.

Developed live reports in drill-down mode to facilitate usability and enhance user interaction.

Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.

Wrote Python scripts to parse XML documents and load the data into the database (a sketch of this pattern follows this section).

Used Python to extract weekly information from XML files.

Developed Python scripts to clean the raw data.

Worked with the AWS CLI to aggregate cleaned files in Amazon S3 and with Amazon EC2 clusters to deploy files into S3 buckets.

Used the AWS CLI with IAM roles to load data into the Redshift cluster.

Responsible for in depth data analysis and creation of data extract queries in both Netezza and Teradata databases.

Extensive development on the Netezza platform using PL/SQL and advanced SQL.

Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell) and various reporting procedures.

Designed reports in SSRS to create, execute, and deliver tabular reports using shared and specified data sources; also debugged and deployed reports in SSRS.

Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.

Used Hive and Sqoop utilities and Oozie workflows for data extraction and data loading.

Development of routines to capture and report data quality issues and exceptional scenarios.

Creation of Data Mapping document and data flow diagrams.

Involved in generating dual-axis bar chart, Pie chart and Bubble chart with multiple measures and data blending in case of merging different sources.

Developed dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to explore the data on the fly with quick filters for on-demand information.

Created dashboard-style reports using QlikView components such as List Boxes, Sliders, Buttons, Charts, and Bookmarks.

Coordinated with Data Architects and Data Modelers to create new schemas and views in Netezza to improve report execution times, and worked on creating optimized Data Mart reports.

Worked on QA of the data and on adding data sources, snapshots, and caching to the reports.

Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.
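
As referenced above, a minimal sketch of the XML-to-database loading pattern: parsing XML with the Python standard library and inserting rows through the DB-API. The file path, element names, and table schema are illustrative placeholders, and sqlite3 stands in for the actual target database only so the example is self-contained.

import sqlite3
import xml.etree.ElementTree as ET

def load_xml(path, conn):
    # Parse the XML file and pull one row per <record> element (placeholder tag names).
    tree = ET.parse(path)
    rows = [
        (rec.findtext("id"), rec.findtext("name"), rec.findtext("amount"))
        for rec in tree.getroot().iter("record")
    ]
    conn.executemany(
        "INSERT INTO staging_records (id, name, amount) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)

if __name__ == "__main__":
    con = sqlite3.connect("warehouse.db")    # placeholder database
    con.execute(
        "CREATE TABLE IF NOT EXISTS staging_records (id TEXT, name TEXT, amount TEXT)"
    )
    print(load_xml("weekly_feed.xml", con), "rows loaded")    # placeholder input file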

DXC Technologies, India Sep 2014 - Dec 2016

Data Engineer

Roles & Responsibilities

Analyzed business information to identify process improvements and made recommendations to modify existing applications to correct errors, upgrade interfaces and query optimization.

Coordinated with clients to gather and review business requirements.

Executed SQL queries to access and analyze data from the relational database for data modelling and reporting.

Performed exploratory data analysis: identified, cleaned, and prepared data to ensure data quality and derived business insights.

Extracted data from the legacy system and loaded/integrated into another database through the ETL process.

Wrote complex queries for extensive data validation (a sample set of such checks is sketched after this section).

Analyzed business requirements, system requirements, data mapping specifications.

Created ad-hoc reports and dashboards using Tableau and Power BI.

Created meaningful data visualizations/interactive dashboards to communicate data findings to business owners using Tableau.

Performed statistical operations like data analytics, data modelling, data interpretation, data visualization to provide business insights to the client.

Tested data mappings and specifications to ensure data formats and schemas matched the data warehouse per the business requirements, and to detect any changes in the underlying data.

Initiated and designed enterprise system for program execution and support operations.

Performed ad-hoc analysis and provided relevant KPIs, metrics to management and the business partners which improved business productivity by 25%.

Carried out quality assurance tests, integration testing while discovering errors and optimizing usability.

Led team to discuss the design process, development on any proposed changes or challenges, and outlined schedules to ensure deliverables on time.

Created customer-facing web application to automate data retrieval which allowed users to monitor report status.
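
The data-validation work mentioned above can be expressed as a small set of SQL checks run from Python; below is a hedged sketch of that idea. The table and column names are invented for illustration, and sqlite3 is used only so the example is self-contained; in practice the same checks would run against the actual warehouse connection.

import sqlite3

# Each check is a query that returns 1 (pass) or 0 (fail).
CHECKS = {
    "row_count_matches_source":
        "SELECT (SELECT COUNT(*) FROM stg_orders) = (SELECT COUNT(*) FROM dw_orders)",
    "no_null_business_keys":
        "SELECT COUNT(*) = 0 FROM dw_orders WHERE order_id IS NULL",
    "no_orphan_customer_keys":
        "SELECT COUNT(*) = 0 FROM dw_orders o "
        "LEFT JOIN dw_customers c ON o.customer_id = c.customer_id "
        "WHERE c.customer_id IS NULL",
}

def run_checks(conn):
    # Run every validation query and report pass/fail per check name.
    return {name: bool(conn.execute(sql).fetchone()[0]) for name, sql in CHECKS.items()}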

Sutherland, India Jul 2013 – Aug 2014

Sr. Data Analyst

Roles & Responsibilities

Analyzed and reported client/customer data using large data sets like transactional and analytical data to meet business objectives.

Responsible for all aspects of management, administration, and support of IBM's internal Linux/UNIX cloud-based infrastructure as the premier hosting provider.

Worked with SQL and performed computations, log transformations, and data exploration to identify insights and draw conclusions from complex data using R.

Used SPSS for data cleaning and reporting, and developed efficient and modifiable statistical scenarios.

Extracted data from SQL Server using Talend to load it into a single data warehouse repository.

Utilized Digital analytics data from Heap in extracting business insights and visualized the trends from the customer events tracked.

Extensively used Star and Snowflake Schema methodologies.

Worked on different types of projects such as migration projects, ad-hoc reporting, and exploratory research to guide predictive modelling.

Applied concepts such as R-squared, RMSE, and p-value in the evaluation stage to extract findings through comparisons (a small worked example follows this section).

Worked on the entire CRISP-DM life cycle and actively involved in all the phases of project life cycle including data acquisition, data cleaning, data engineering.

Extensively used Azure Machine Learning to set up experiments and create web services for predictive analytics.

Wrote complex SQL queries for data analysis using window functions and joins, improving performance by creating partitioned tables.

Prepared dashboards with drill down functions such as date filters, parameters, actions using Tableau to reflect the data behavior over time.
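
As a small worked example of the evaluation metrics named above (RMSE and R-squared), the NumPy snippet below computes both on made-up predictions; the arrays are placeholders, not project data.

import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 9.0])   # placeholder observed values
y_pred = np.array([2.8, 5.4, 7.0, 9.3])   # placeholder model predictions

# Root mean squared error: the typical size of the prediction errors.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# R-squared: 1 minus the ratio of residual variance to total variance.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"RMSE = {rmse:.3f}, R-squared = {r_squared:.3f}")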

Client: Avida Technologies, India June 2012 – Jun 2013

Role: Research Analyst

Responsibilities:

Programmed socio-economic experiments in z-Tree (C++); created and analyzed surveys in Behavery and Qualtrics, and was responsible for spot modifications and on-call debugging.

Maintained the subject pool database (hroot and MySQL) and troubleshot the lab server (Linux and Ruby on Rails).

Automated several manual data reports in MS Excel using Macros, VBA and reduced turn-around times for reporting from days to minutes.

Demonstrated abilities to extract, transform, load (ETL) operational data using SQL queries shortening the average processing time by 10 minutes.

Created financial reports and dashboards in Tableau using advanced functionality such as animations and interactive charts. Accomplished data collection, cleansing, data modelling, data profiling, and data queries to analyze data from different sources.

Performed exploratory and descriptive data analysis using MS Excel (Pivot tables, formulas, functions) and reported to the senior management team.

Involved in addressing a wide range of challenging problems using techniques from applied statistics, machine learning, and data mining. Explored and analyzed customer-specific features using Matplotlib and Seaborn in Python (see the sketch after this section) and implemented dashboards in Tableau.

Speaker at “Essex Summer School 2016 in Social Science Data Analytics” on zTree Programming, A/B Testing and Survey Analytics.
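
A brief, illustrative sketch of the Matplotlib/Seaborn feature exploration mentioned above; the CSV path and column names are hypothetical placeholders, not the actual customer data.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load a (placeholder) customer feature extract and plot one feature split by segment.
df = pd.read_csv("customer_features.csv")
sns.histplot(data=df, x="monthly_spend", hue="segment", kde=True)
plt.title("Monthly spend by customer segment")
plt.tight_layout()
plt.savefig("monthly_spend_by_segment.png")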

Education:

Master's in Computer Science from the University of Illinois, 2017.

Bachelor's in Information Technology from Osmania University, 2011.


