Data Engineer - SQL Server

Location:
Columbia, MO, 65203
Posted:
July 06, 2023

Prashanth Kumar

Email: adx4zm@r.postjobfree.com Phone: +1-804-***-****

PROFESSIONAL SUMMARY:

Over 8 years of experience in cloud computing with strong expertise in the data warehousing (DWH) domain and the AWS platform.

Experience working on database design, data modelling, performance tuning, and query optimization with large volumes of multi-dimensional data.

Good experience with Amazon Redshift, S3, EC2, EBS, and AWS Lambda, as well as GCP services such as BigQuery, Compute Engine (VM creation), App Engine, Datastore, and Cloud Functions.

Expertise in implementing automations such as an email analyzer and a daily data fetch from a campaign manager tool using Python scripts.

Hands-on experience implementing many pipelines on Airflow and strong expertise in troubleshooting and optimizing the issues related to them.

Led an offshore team of five, training them on developing ETLs and monitoring jobs on Airflow.

Experienced in LAMP (Linux, Apache, MySQL, and Python/PHP) and WAMP (Windows, Apache, MySQL, Python/PHP) architectures.

Extensive experience working on AWS Cloud Services like IAM, EC2, S3, EBS, AWS Config, AWS Glue, CloudWatch, and RDS.

Proficient in creating multiple VPCs and public and private subnets as per requirement and distributing them as groups into various availability zones of the VPC.

Good knowledge of writing MapReduce jobs through Pig, Hive, and Sqoop.

Experience in moving data between cloud and on-premises Hadoop using DistCp and a proprietary ingest framework.

Successfully drove Project ATARI for the client NerdWallet from offshore and managed its full life cycle until it was stabilized.

Experience working on NoSQL databases including HBase and Cassandra.

Expertise in developing Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.

Experienced in the development and support of Oracle SQL, PL/SQL, and T-SQL queries.

Experience in setting up and configuring AWS EMR clusters; used Amazon IAM to grant users fine-grained access to AWS resources, and wrote Pig scripts to generate MapReduce jobs and perform ETL procedures on the data in HDFS.

Good experience in designing and implementing data structures and in using common business intelligence tools for data analysis.

EDUCATION:

Bachelor's in Computer Science Engineering, JNTUH, May 2015

CERTIFICATION:

AWS Certified Solutions Architect - Associate

TECHNICAL SKILLS:

Languages

SQL, Python, C, JavaScript, HTML

Python Libraries

NumPy, Pandas, NLTK, Matplotlib, Seaborn

Python Frameworks

Flask, Django

Big Data Tools

Apache Spark, Apache Hadoop, Scala, Hive, Pig, Flume, HDFS, HBase, Sqoop, Oozie, MapReduce, Spark SQL

Cloud Technologies

Amazon Web Services (AWS), Google Cloud Platform (GCP)

Applications and Tools

Airflow, Jenkins, Gradle, Maven, Git, JIRA, Tableau, Data Warehousing, GitHub

IDEs

Eclipse, IntelliJ, Visual Studio

Databases

MS SQL Server, Oracle, MySQL Workbench, Amazon Redshift, DynamoDB, MongoDB, PostgreSQL

Operating Systems

Windows, UNIX, LINUX

Microsoft Tools

Microsoft Project, Access, Excel, Word, Visio, PowerPoint, SharePoint, Publisher, Outlook, MS Office Suite

Analysis/Methodologies

Agile/Scrum, Waterfall

PROFESSIONAL EXPERIENCE:

AT&T, Dallas, Texas Sep 2022 to Present

Sr. Data Engineer

Responsibilities:

Built database models, views, and APIs using Python for interactive web-based solutions.

Involved in web-services backend development using Python (Flask, SQLAlchemy).

Used the Pandas library for statistical analysis and worked on the Python OpenStack API.

Wrote Python modules to connect to and query the Apache Cassandra instance.

Set up and managed Windows servers on the AWS platform using IAM, Security Groups, EC2, EBS, and RDS.

Set up databases on Amazon RDS and EC2 instances as per requirements.

Enabled data pipelines using a combination of S3 and Redshift clusters.

Responsible for configuration and maintenance of Amazon Virtual Private Cloud (VPC) resources (e.g., subnets, security groups, permission policies) and for making connections between different zones, blocking suspicious IPs/subnets via ACLs.

Implemented AWS security, privacy, performance, and monitoring solutions, including automated responses.

Worked with Cloud Engineers to analyse and design all aspects of AWS environments and topologies.

Assessed automation opportunities from an architectural perspective and prioritized proposed solutions.

Responsible for importing data from DynamoDB to Redshift in batches using AWS Batch, scheduled with Airflow, and built a CI/CD pipeline using Jenkins.
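
A minimal sketch of what such a scheduled batch import might look like in Airflow, assuming Airflow 2.x; the DAG name, schedule, and task bodies are hypothetical placeholders rather than the actual production pipeline:

    # Hypothetical sketch: a daily Airflow DAG that batches DynamoDB data and loads it into Redshift.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def export_dynamodb_batch(**context):
        # Placeholder: export/scan a DynamoDB table to S3 in batches (e.g., via boto3 or AWS Batch).
        pass


    def copy_into_redshift(**context):
        # Placeholder: issue a Redshift COPY from the staged S3 files.
        pass


    with DAG(
        dag_id="dynamodb_to_redshift_batch",  # hypothetical DAG name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
    ) as dag:
        export_task = PythonOperator(task_id="export_dynamodb", python_callable=export_dynamodb_batch)
        load_task = PythonOperator(task_id="copy_into_redshift", python_callable=copy_into_redshift)

        export_task >> load_task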

Developed Python scripts to collect Redshift CloudWatch metrics and automate loading the data points into the Redshift database.
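
A minimal sketch of pulling one Redshift CloudWatch metric with boto3, assuming the standard AWS/Redshift namespace; the region and cluster identifier are placeholders:

    # Hypothetical sketch: pull a Redshift CloudWatch metric with boto3 so the datapoints
    # can be written back into a metrics table in Redshift.
    from datetime import datetime, timedelta

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region

    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Redshift",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": "analytics-cluster"}],  # hypothetical cluster
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )

    # Each datapoint could then be inserted into a monitoring table in Redshift.
    for point in sorted(resp["Datapoints"], key=lambda d: d["Timestamp"]):
        print(point["Timestamp"], point["Average"])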

Developed scripts for loading application call logs to S3 and used AWS Glue ETL to load into Redshift for the data analytics team.

Responsible for performance tuning, code promotion and testing of application changes.

Conducted Exploratory Data Analysis using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.

Developed MapReduce jobs in both Pig and Hive for data cleaning and pre-processing.

Installed and configured Hive, Flume and Oozie on the Hadoop cluster.

Developed Sqoop scripts for loading data into HDFS from Oracle and pre-processed with AWS DMS.
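
A sketch of how such a Sqoop import might be invoked from Python, assuming the standard Sqoop CLI; the JDBC URL, credentials path, table, and target directory are hypothetical:

    # Hypothetical sketch: invoking a Sqoop import from Python to land an Oracle table in HDFS.
    import subprocess

    sqoop_cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB",  # assumed JDBC URL
        "--username", "etl_user",
        "--password-file", "/user/etl/.oracle_pwd",  # password kept in HDFS, not on the command line
        "--table", "SALES.ORDERS",                   # hypothetical source table
        "--target-dir", "/data/raw/orders",
        "--num-mappers", "4",
    ]

    subprocess.run(sqoop_cmd, check=True)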

Automated the tasks of loading the data into HDFS and pre-processing with Pig by developing workflows using Oozie.

Loaded data from the UNIX file system into HDFS and wrote Hive user-defined functions (UDFs).

Used Sqoop to load data from Oracle to HBase for faster querying and performance optimization.

Worked on streaming to collect this data from Flume and performed real-time and batch processing.

Developed Hive scripts for implementing dynamic partitions.
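
A minimal sketch of a dynamic-partition load in Hive, submitted here via the Hive CLI from Python; the table and column names are placeholders:

    # Hypothetical sketch: enable dynamic partitioning and load a partitioned Hive table
    # from a staging table.
    import subprocess

    hql = """
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;

    INSERT OVERWRITE TABLE events_partitioned PARTITION (event_date)
    SELECT user_id, event_type, payload, event_date
    FROM events_staging;
    """

    # Runs the statements with the Hive CLI; Beeline with -e would work similarly.
    subprocess.run(["hive", "-e", hql], check=True)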

Developed Python scripts to find SQL injection vulnerabilities by testing SQL queries.

Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using a testing library.

Collected log data from web servers and integrated it into HDFS using Flume.

Developed ETL workflows using Scala to process the obtained data in HDFS and HBase, orchestrated with Oozie.

Wrote ETL jobs to visualize the data and generate reports from MySQL database using DataStage.

Environment: Python, Pandas, Matplotlib, Seaborn, Flask, AWS IAM, Security Groups, EC2, EBS, RDS, MapReduce, Sqoop, Hadoop, Oozie, HDFS, Hive, Flume, Pig, HBase, Scala, DataStage, MySQL, SQLAlchemy, Apache Cassandra, DynamoDB, Jenkins, Unix

Eli Lilly, Indianapolis, IN Aug 2021 to Aug 2022

Software Engineer

Responsible for developing, monitoring, and maintaining Amazon AWS cloud infrastructure and its solutions for the internal cloud-based framework application.

Used Django Framework to develop the application and build all database mapping classes using Django models.

Wrote Python scripts to parse XML documents and load the data in the database.

Utilized PyUnit, the Python unit test framework, for all Python applications and used Django configuration to manage URLs and application parameters.

Designed and developed the UI of the website using HTML, CSS, and JavaScript.

Wrote functional API test cases for testing REST APIs with Postman and integrated with the Jenkins server to run build scripts.

Created methods (GET, POST, PUT, DELETE) to make requests to the API server and tested the RESTful API using Postman.

Troubleshot issues seen in the API gateway and configured cloud-native API gateways and ingress controllers.

Designed, built, and edited Spark jobs in PySpark for various tasks, e.g., reading a Kinesis stream using Spark Streaming, joining and aggregating huge data sets, and integrating with third-party data sources.
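
A small sketch of the join-and-aggregate portion of such a PySpark job (the streaming source is omitted); the S3 paths and column names are assumed for illustration:

    # Hypothetical sketch: join event data to account data and compute daily aggregates.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("join_and_aggregate").getOrCreate()

    events = spark.read.json("s3://example-bucket/events/")         # assumed input location
    accounts = spark.read.parquet("s3://example-bucket/accounts/")  # assumed input location

    daily_totals = (
        events.join(accounts, on="account_id", how="left")
        .groupBy("account_id", F.to_date("event_ts").alias("event_date"))
        .agg(F.count("*").alias("event_count"), F.sum("amount").alias("total_amount"))
    )

    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/aggregates/daily/")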

Created Python scripts to automate daily networking tasks like auto-config generation, retrieving information from network devices, parsing data, etc.

Generated Python Django Forms to record data of online users.

Used Amazon EC2 command-line interface and Bash/Python to automate repetitive work.

Used IAM to create roles, users, and groups and implemented Multi-Factor Authentication (MFA) to provide additional security to AWS accounts and their resources.

Created S3 buckets, managed bucket policies, utilized S3 Glacier storage for archival and backup data, and provided AWS service support.

Configured S3 buckets with various life cycle policies to archive the infrequently accessed data to storage classes based on requirements.
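
A sketch of one such lifecycle rule applied with boto3; the bucket name, prefix, and transition thresholds are hypothetical examples:

    # Hypothetical sketch: a lifecycle rule that moves infrequently accessed objects
    # to cheaper storage and then to Glacier, expiring them after a year.
    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_lifecycle_configuration(
        Bucket="example-app-logs",  # hypothetical bucket
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-old-logs",
                    "Status": "Enabled",
                    "Filter": {"Prefix": "logs/"},
                    "Transitions": [
                        {"Days": 30, "StorageClass": "STANDARD_IA"},
                        {"Days": 90, "StorageClass": "GLACIER"},
                    ],
                    "Expiration": {"Days": 365},
                }
            ]
        },
    )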

Environment: Python, Django, AWS, EC2, S3, IAM, MFA, Spark, PySpark, PyUnit, XML, HTML, CSS, JavaScript, REST API, Postman, Jenkins

JP Morgan Chase, New York, NY Jan 2020 to Jul 2021

Data Engineer

Responsibilities:

Participated in software and system performance analysis and tuning, service capacity planning and demand forecasting.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.

Worked on and maintained pipelines and scripts for migration to Redshift from SQL Server and MySQL databases.

Responded to and resolved emergent service problems and designed tools and automation to prevent problem recurrence.

Worked on the credit card team, focusing on volumes and monthly transactions.

Worked on volume reports and usage of credit cards.

Wrote and automated tools and scripts to increase departmental efficiency and automate repeatable tasks.

Managed end-to-end complex data migration, conversion and data modeling.

Delivered innovative BI & Data warehouse solutions.

Created an AWS Lambda architecture to monitor AWS S3 buckets, with triggers for processing source data.
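
A minimal sketch of an S3-triggered Lambda handler of this kind; the processing step is a placeholder:

    # Hypothetical sketch: a Lambda handler invoked by S3 event notifications that
    # hands each new object off to a processing step.
    import boto3

    s3 = boto3.client("s3")


    def lambda_handler(event, context):
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            # Placeholder processing: fetch the object and route it onward
            # (e.g., validate, transform, or stage it for a downstream load).
            obj = s3.get_object(Bucket=bucket, Key=key)
            print(f"Processing s3://{bucket}/{key} ({obj['ContentLength']} bytes)")

        return {"status": "ok"}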

Worked with standard Python data science packages (Pandas, NumPy, SciPy) and methodologies.

Developed data transition programs from DynamoDB to AWS Redshift (ETL Process) using AWS Lambda by creating functions in Python for specific events based on use cases.

Created and launched EC2 instances using Linux and Windows AMIs and wrote shell scripts to bootstrap instances.

Used AWS Elastic Beanstalk for deploying and scaling web applications and services.

Responsible for Continuous Integration and Continuous Delivery process implementation using Jenkins and Python, and Shell scripts to automate routine jobs.

Used Python programming and Django for the backend development, Bootstrap and Angular for frontend connectivity, and MongoDB for the database.

Worked on AWS Glue objects such as jobs, tables, crawlers, connections, and workflows, and wrote AWS Glue ETL scripts.

Connected the continuous integration system to the GitLab version control repository so builds ran continually as check-ins came from developers, and performed continuous integration with Jenkins.

Implemented a full CI/CD pipeline by integrating SCM (Git) with the automated testing tool Gradle, deploying with Jenkins and Dockerized containers in production, and used DevOps tools such as Ansible, Chef, AWS CloudFormation, AWS CodePipeline, EKS, ECR, Kubernetes, and infrastructure automation tools (Terraform).

Used SonarQube for continuous inspection of code quality.

Wrote SQL queries implementing functions, triggers, cursors, object types, sequences, indexes, etc.

Worked in Agile/Scrum development team to deliver an end-to-end continuous integration and deployment in SDLC.

Participated in weekly release meetings with stakeholders to identify and mitigate potential risks associated with the releases.

Environment: Python, Django, Pandas, NumPy, SciPy, AWS S3, Lambda, Redshift, AWS Glue, Docker, SonarQube, Git, Gradle, Angular, SQL Server, MongoDB, MySQL, DynamoDB, Linux, Windows, Jenkins

Deloitte, Hyderabad, India Oct 2017 to Nov 2019

Technical Consultant

Responsibilities:

Project 1:

Worked as Data Engineer, designed data flow to migrate data pipelines from Jenkins to Airflow.

Helped teams that needed design support or assistance with AWS and GCP infrastructure-related queries.

Developed Python scripts to find SQL injection vulnerabilities by testing SQL queries.

Connected to Amazon Redshift through Tableau to extract live data for real time analysis.

Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.
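
A sketch of how such WLM queues could be defined through the cluster parameter group with boto3; the parameter group name, query groups, and concurrency/memory values are illustrative assumptions:

    # Hypothetical sketch: two WLM queues, a fast dashboard queue and an ad hoc queue,
    # set via the wlm_json_configuration parameter.
    import json

    import boto3

    redshift = boto3.client("redshift", region_name="us-east-1")  # assumed region

    wlm_config = [
        {"query_group": ["dashboard"], "query_concurrency": 5, "memory_percent_to_use": 60},
        {"query_group": ["adhoc"], "query_concurrency": 2, "memory_percent_to_use": 40},
    ]

    redshift.modify_cluster_parameter_group(
        ParameterGroupName="analytics-wlm",  # hypothetical parameter group
        Parameters=[
            {
                "ParameterName": "wlm_json_configuration",
                "ParameterValue": json.dumps(wlm_config),
                "ApplyType": "static",
            }
        ],
    )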

Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.

Implemented and managed ETL solutions and automated operational processes.

Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.

Collaborated with team members and stakeholders in the design and development of the data environment.

Prepared associated documentation for specifications, requirements, and testing.

Led a team of 8 people and guided them on designing the pipelines in python.

Acted as a communication bridge between the team and the client and handled all scrum meetings to assign tasks within the team.

Environment: Python, Amazon Web Services (AWS), Amazon Redshift, Google Cloud Platform (GCP), Tableau, SAS Visual Analytics, SQL

Project 2:

Developed on Airflow and worked with the core community to understand and adopt new features for our custom product.

Installed many features in Airflow per vendor requests and others based on platform feedback.

Extracted data from various paid marketing platforms, modelled it effectively, loaded it into Redshift, and scheduled the jobs using Airflow (a Python-based workflow tool).

Used DevOps tools extensively and was responsible for maintaining the DevOps process, which included setting up the architecture for the first time and automating wherever possible.

Environment: Python, Airflow, DevOps, Jenkins

NerdWallet, Hyderabad, India Jun 2015 to Sep 2017

Cloud Data Engineer

Responsibilities:

Worked on creating ETLs for affiliate transactions and revenue generation.

Received data from different sources in different formats, which was converted into a Redshift-compatible JSON format using Pandas in Python.

Ingested new partners into 'ATARI' (Affiliate Transactions and Revenue Ingestion), an internal framework.

Developed SQL insert scripts for ingesting the data into the Redshift DB, used the COPY command to load the data after data munging with Python, and developed Python scripts for integrating the transform scripts written in SQL.
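
A minimal sketch of a Redshift COPY load driven from Python with psycopg2; the cluster endpoint, credentials, table, S3 path, and IAM role are placeholders:

    # Hypothetical sketch: load munged JSON from S3 into Redshift with a COPY command.
    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="etl_user",
        password="***",
    )

    copy_sql = """
    COPY affiliate_transactions
    FROM 's3://example-bucket/atari/transactions/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS JSON 'auto'
    TIMEFORMAT 'auto';
    """

    # The connection context manager commits the transaction on success.
    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)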

Declared and used Python parameters for table names and schema names to adhere to DRY principles; developed shell scripts for invoking the Python scripts, DQ scripts, and SQL scripts as needed; and built the ETL pipeline in Airflow using the SSH Operator to load the data into the Redshift database.

Connected Tableau with Airflow to generate reports.

Project on GCP:

Worked on Prediction analysis and lead generation for the client websites.

Performed data analytics using BigQuery; handled data cleansing and ingestion using different services chosen per scenario, taking cost and pre-processing techniques into account.

Derived features through BigQuery analytics by creating conversion and history windows before building a data model.

Prepared data by verifying the history data and performing feature engineering and weighted conversion contribution.

Set up infrastructure such as VPCs, DNS, load balancers, and administrator accounts.

Automations:

Generated an automation script for organization emails to calculate response times, set alerts for Basecamp tasks, build the pipeline for dumping the data into BigQuery, and create dashboard queries for displaying the metrics.

Created an automation script for fetching data through the campaign manager API and appending it to a Google Sheet.

Environment: Python, Pandas, Amazon Web Services (AWS), Redshift, Google Cloud Platform (GCP), Airflow, Tableau, BigQuery, VPCs, DNS, JSON, SQL


