
Quality Assurance Big Data

Location:
Atlanta, GA
Posted:
November 13, 2024

Professional Summary

Over *+ years of experience in SDLC design, development, analytics, quality assurance, build and release, integration, support, and maintenance of large-scale enterprise applications.

Excellent experience in data ingestion, analysis, and governance using Big Data tools and technologies – Hadoop HDFS, MapReduce, Pig, Hive, Sqoop, Kafka, Airflow, Flume, HBase, ZooKeeper, REST, YARN, Scala, Cassandra, CockroachDB, Spark (RDDs, Streaming), Dataiku, data warehouses, Impala, Oozie, WebLech, Talend

Excellent experience with cloud environments for Big Data – AWS, Cloudera/Hortonworks, Google Cloud, and Azure.

Extensive experience writing applications and scripts using Scala, Flink, shell script, Python, SQL, MapReduce, and Pig Latin.

Experience working with ETL tools and platforms – Talend, Informatica, Snowflake, data lakes, data warehouses and data marts, Snowpark, PySpark, Scala, Java

Experience working with version control systems and tools such as CVS, Subversion (SVN), Git, PuTTY, Git Bash, and GitHub.

Excellent experience in multi-cloud environments (AWS, Google Cloud, Azure) using Terraform, HCL, Ansible playbooks, Docker, Kubernetes, and kubectl to manage pods and deploy environment configurations

Deep experience with data schema architecture, ETL design, building and scaling existing cloud infrastructure, and performance tuning

Modern data science methodologies including, but not limited to, classification algorithms, natural language processing, and assessment of bias and uncertainty in those methods

Experience with AWS (EC2, S3, RDS, CloudWatch, DMS, security services, Lambda, ECS, Glue, CLI/SDK, EBS, EMR, Glacier, IAM, CDK, Redshift, VPC, NAT, ACLs, DNS, proxies, firewalls, VPC endpoints, OpenSearch, Direct Connect, VPN, etc.), Azure (Data Factory/ADF, Functions, Synapse, Notebooks, Databricks, Delta, Cosmos DB), and GCP (BigQuery, Cloud Storage, databases, Functions, gsutil, gcloud CLI/SDK, Console, Dataproc)

Expertise in multi-cloud setups using Terraform, HCL, Docker, Airflow, Kubernetes, and kubectl to manage pods, deploy environment configurations, and install software.

Good experience with caching, object and block storage, scaling, load balancing, CDNs, and networking features

Work experience supporting multiple platforms such as Ubuntu, Fedora, and Windows for production, test, and development servers.

Excellent experience with Python and its libraries – OOP, APIs, text processing, Pandas, NumPy, SciPy, Matplotlib, Seaborn, NLP, TensorFlow, PyTorch, etc.

Good understanding of modern data science methodologies, including classification algorithms and natural language processing

Good understanding of ML models, statistical methods, and research design such as linear, logistic, and conditional regression modeling, parametric and non-parametric statistics, and cross-sectional and longitudinal studies

Experience creating XML, ANT, shell, Ruby, Go, YAML, real-time, Perl, Iceberg, Hudi, PowerShell, HTML, DHTML, JSP, and JavaScript scripts.

Good exposure to application servers – Tomcat, Apache, Nginx, WebLogic, IBM WebSphere, JBoss, Jetty, IIS.

Experience writing PL/SQL packages, T-SQL, stored procedures, functions, data models, and complex SQL queries for databases – Oracle, MySQL, Sybase, DB2, Teradata, PostgreSQL.

Extensive experience with J2EE, Java, web services, Spring Boot, microservices, SOAP, REST, EJB, JMS, Spring Cloud, and security.

Application integration with IDEs such as Eclipse, RAD, WSAD, WTX, and JDeveloper, with deployments to application servers

Experience in the Finance, Insurance, Banking, Healthcare, Transport, Telecom, and Accounting domains

Extensive knowledge of the Construction domain

Experience with methodologies and architectures – SOA, OOP, and Agile (Scrum, Kanban, Lean).

Excellent problem-solving, communication, and interpersonal skills

SKILL SET:

Operating systems: Linux (Ubuntu, Red Hat, Fedora, CentOS), Unix, Windows, MS-DOS

Cloud environment: Cloudera, AWS (EC2, ECS, RDS, Lambda, DynamoDB, CloudWatch), Azure (Data Factory, Synapse, Cosmos DB, Data Lake), BigQuery, GC stack

Big Data/ETL: Java, Python, R, MongoDB, Hadoop, MapReduce, Pig, Sqoop, Flume, HDFS, Hive, HBase, Storm, Spark MLlib, Spark Streaming, WebLech, Kafka, Cassandra, Talend Studio, Power BI, Hudi, Iceberg, Airflow, MS Office (Excel, Word, PowerPoint, Access)

Python libraries: OOP, APIs, text processing, Pandas, NumPy, SciPy, Matplotlib, Seaborn, TensorFlow, PyTorch

J2EE: Spring Boot, microservices, React, Struts, Spring, Hibernate, EJB, Node

Automation tools for continuous integration, delivery, and deployment: Terraform, Puppet, Chef, Ansible, Docker, Kubernetes, Jenkins, GitHub

Application servers: WebLogic, Tomcat, WebSphere, OC4J, JBoss, WESB

Databases: Oracle, DB2/UDB, Sybase, SQL Server, MariaDB

Design tools: OOAD, design patterns, UML, Talend Studio, MySQL, Teradata

Scripts: JSON, YAML, JavaScript, XML, HTML5, CSS3, Perl, Ajax, PHP, Python, XSL, XSLT, DOM, SAX, Python (sqlite3, Pandas, NTDP, OOP, TensorFlow, Datasets, Matplotlib)

Professional Details:

Project: Repave

Client: JPMC, Plano, TX

Period: Jan 2021 – till date

Role: Sr. Technical Consultant

Responsibilities:

Collaborated with stakeholders, dependency teams, and team members on requirements, issues, and project dependencies

Worked on relational and non-relational database architecture, design, and modeling, and migrated SQL and NoSQL data from on-prem to the cloud

Worked on AWS services – CloudFormation templates, EMR, Bitbucket, Artifactory, Jenkins, CloudWatch, EC2, DynamoDB, NoSQL, Airflow, MongoDB, RDS, AWS Glue, Athena, Step Functions, serverless, ECS, S3 (Standard, Glacier), Aurora DB, Lambda, SQS, SNS, API Gateway, Route 53, Kinesis, CloudFront, VPC, NAT, ACLs, AWS CDK/CLI, EKS

Worked on Java, Python, Scala, Sqoop, Spark, Git, shell script, YAML, JSON, Hive, HDFS, Teradata, Kafka, and ETL tools Informatica IICS, Talend, and Snowflake

Worked on services such as caching, object and block storage, scaling, load balancing, and aggregation

Worked on scripting with Python libraries – OOP, APIs, text processing, Pandas, Gen2, NumPy, SciPy, Matplotlib – for data partitioning, building data pipelines, machine learning models, dashboards, and Spark jobs over text files, CSV, and JSON.
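
A minimal pandas/NumPy sketch of the kind of scripting described above; the file names, columns, and lookup data are illustrative assumptions, not project artifacts.

# Illustrative only: load a CSV extract and a JSON lookup, cleanse, derive a
# metric, and write month-partitioned output for downstream pipelines.
import json
import numpy as np
import pandas as pd

df = pd.read_csv("transactions_sample.csv")               # hypothetical extract
with open("region_lookup.json") as fh:
    regions = json.load(fh)

df = df.dropna(subset=["account_id"]).drop_duplicates()   # basic cleansing
df["amount_log"] = np.log1p(df["amount"])                 # derived feature
df["region_name"] = df["region_code"].map(regions)        # enrichment

df["month"] = pd.to_datetime(df["txn_date"]).dt.to_period("M").astype(str)
for month, part in df.groupby("month"):                   # simple partitioning
    part.to_csv(f"out/transactions_{month}.csv", index=False)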

Worked on ETL/ELT architecture, ETL operations, and query performance optimization techniques – FastLoad, MultiLoad, transformations, cleansing, archiving.

Worked on Data Modeling, DB Performance Tuning, Data Security & Governance, Cloud Integration and Management, Cloud Infrastructure, Automation and DevOps, Data Warehousing and Analytics, Security and Compliance

Developed data transformation processes, including data cleansing, normalization, aggregation, and enrichment, to prepare data for analytics and reporting
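
A hedged PySpark sketch of such a cleansing/normalization/aggregation/enrichment step; the table locations and column names are made up for illustration.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transform-sketch").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/raw/orders/")               # hypothetical path
cleansed = (
    raw.withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))   # normalize dates
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))       # normalize types
       .filter(F.col("amount").isNotNull())                               # drop bad rows
)

# Aggregate for reporting and enrich with a reference dimension.
customers = spark.read.parquet("s3://example-bucket/ref/customers/")
report = (
    cleansed.groupBy("customer_id", "order_date")
            .agg(F.sum("amount").alias("daily_total"))
            .join(customers, on="customer_id", how="left")
)
report.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_totals/")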

Worked on environment creation, maintenance, and deployment using Terraform, Docker, HashiCorp scripts, Teradata utilities, pipelines, registries, and images

Created and maintained data models, defining data structures, relationships, and data storage requirements using techniques like entity-relationship and data flow diagrams

Identified and resolved performance bottlenecks in data processing and storage systems, optimizing query performance and improving overall data pipeline efficiency

Monitored data pipelines, diagnosing and troubleshooting issues and performing system upgrades and maintenance tasks to ensure data reliability and availability

Built and maintained automated data pipelines, identified existing data gaps, and provided automated solutions to deliver analytical capabilities and enriched data to applications
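
A short Airflow 2-style sketch of an automated pipeline; the DAG id, schedule, and task callables are assumptions for illustration, not the project's actual pipeline.

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull data from source systems")       # placeholder logic

def transform():
    print("cleanse and enrich the extract")      # placeholder logic

def load():
    print("load curated data to the warehouse")  # placeholder logic

with DAG(
    dag_id="example_enrichment_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> transform_task >> load_task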

Tracked and effectively communicated sprint/release progress to all affected teams and management

Worked on multi-cloud environments (AWS, Google Cloud, Azure) using Terraform, HCL, Ansible playbooks, Docker, Kubernetes, and kubectl to manage pods, deploy environment configurations, install software and patches, create clusters, instances, subnets, and databases, and load data into the target database.

Worked with Teradata utilities, enhancing Teradata performance by using S3 for staging data and integrating with Redshift for analytics

Worked on Tableau to analyze performance and forecasting, producing bi-weekly reports, bar charts, and monthly performance reports

Developed several complex Teradata and relational SQL queries, PL/SQL stored procedures, jobs, packages, and functions

Environment: Shell script, Jira, Excel, CloudFormation, Lambda, Step Functions, Teradata, YAML, EMR, Bitbucket, Jenkins, CloudWatch, EC2, RDS, Terraform, Docker, EKS, DynamoDB, MongoDB, Talend, GitLab, AWS Glue, Athena, Avro, Parquet, data pipelines, Hive, HDFS, PostgreSQL, Devshell, Python, Kafka, Tableau, Oracle, Tomcat, Java 11, SQL, ServiceNow, Spark, PL/SQL

Project: LIBOR Transition

Client: Citigroup, Tampa, FL

Period: Jan 2020 to Nov 2020

Role: Big Data Engineer

Responsibilities:

Worked on data requirements analysis, ingestion, creation, manipulation, transformation, and deployments

Worked on Spark scripts, shell scripts, and creating DDL, DDF, schemas, SQL scripts, and aggregations.

Worked on Databricks – Delta tables, workflows, analytics, notebooks, PySpark, pipelines, deployments, and caching
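
A minimal Delta Lake sketch, assuming a Databricks/Delta-enabled Spark session; the mount points and table name are hypothetical.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-sketch").getOrCreate()

rates = spark.read.json("/mnt/raw/rates/")                          # hypothetical source
rates.write.format("delta").mode("overwrite").save("/mnt/curated/rates_delta")

# Register the Delta table and query it from a notebook cell or workflow task.
spark.sql("CREATE TABLE IF NOT EXISTS rates USING DELTA LOCATION '/mnt/curated/rates_delta'")
spark.sql("SELECT benchmark, COUNT(*) AS n FROM rates GROUP BY benchmark").show()
rates.cache()                                                       # cache for repeated reads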

Worked on data requirements – data ingestion, analysis, and processing; developing code, deploying to pipelines, and monitoring pipeline performance

Created the VPC, configured the subnets, attached the gateway and route tables to the subnets, and deployed EC2 instances into the subnets.
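
A hedged boto3 sketch of the VPC/subnet/EC2 setup described above; the CIDR blocks, region, and AMI id are placeholders, not project values.

import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")                       # new VPC
subnet = vpc.create_subnet(CidrBlock="10.0.1.0/24")                 # one subnet

igw = ec2.create_internet_gateway()                                 # attach an internet gateway
vpc.attach_internet_gateway(InternetGatewayId=igw.id)

route_table = vpc.create_route_table()                              # route table for the subnet
route_table.create_route(DestinationCidrBlock="0.0.0.0/0", GatewayId=igw.id)
route_table.associate_with_subnet(SubnetId=subnet.id)

ec2.create_instances(                                               # EC2 instance in the subnet
    ImageId="ami-12345678",                                         # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    SubnetId=subnet.id,
)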

Worked on AWS – CloudFormation templates, YAML, Data Vault, Bitbucket, Artifactory, Jenkins, CloudWatch, EC2, RDS, Terraform, Docker, Kubernetes, Ansible, AWS Glue, Athena, Step Functions, serverless, and deployments

Worked on Google Cloud – Dataproc, Cloud Storage, BigQuery, messaging, gsutil, gcloud, Linux platforms, storage, IAM, security, roles, and policies

Worked on services such as caching, object and block storage, scaling, load balancing, CDNs, and networking

Worked on infrastructure asset management initiatives, including web servers and database servers

Created and maintained data models, defining data structures, relationships, and data storage requirements using techniques like entity-relationship and data flow diagrams

Developed data transformation processes, including data cleansing, normalization, aggregation, and enrichment, to prepare data for analytics and reporting

Implemented data quality assurance processes, validating data pipelines and resolving data quality issues

Worked on multiple warehouses, data lakes, databases, and file formats, especially Snowflake, BigQuery, PostgreSQL, Parquet, and Avro

Worked on Python, Scala, real-time Spark, MapReduce, Spark transformations, Spark RDDs, Spark Streaming, and Spark SQL in Python – memory management, performance management, and parallelism
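
A small PySpark sketch touching RDD transformations, DataFrames, and Spark SQL; the data is dummy data for illustration only.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-sql-sketch").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([("USD", 1.0), ("EUR", 0.9), ("USD", 2.5)])    # RDD example
totals = rdd.reduceByKey(lambda a, b: a + b)                        # transformation

df = totals.toDF(["currency", "total"])                             # same data as a DataFrame
df.createOrReplaceTempView("totals")
spark.sql("SELECT currency, total FROM totals ORDER BY total DESC").show()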

Worked on Citi-specific tools like Autosys and DTS – aggregations, joins, and testing.

Deployed code to Dev, Staging, SIT, and Test environments using shell scripts and Spark scripts.

Worked on creating Sqoop jobs, Hive external and managed tables, DDL, DML, and aggregations, deploying to Bitbucket.
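
A sketch of Hive external-table DDL and a simple DML load issued through Spark SQL; the database, table, and HDFS paths are illustrative assumptions.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-ddl-sketch").enableHiveSupport().getOrCreate()

# External table over files landed in HDFS (e.g., by a Sqoop import).
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.trades (
        trade_id  STRING,
        notional  DECIMAL(18,2),
        trade_dt  DATE
    )
    STORED AS PARQUET
    LOCATION 'hdfs:///data/staging/trades'
""")

# Simple DML: refresh a managed reporting table from the staging data.
spark.sql("""
    INSERT OVERWRITE TABLE reporting.trades_daily
    SELECT trade_id, notional, trade_dt
    FROM staging.trades
    WHERE trade_dt = current_date()
""")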

Created Avro schemas and Spark code, maintained Bitbucket repositories, and triggered Jenkins jobs.

Used Talend to design, build, and test data integration jobs for large-scale, data-intensive applications

Worked on Spark jobs, improving performance by changing code and spark-submit options as needed.

Environment: Python, Spark, shell script, Azure Synapse, AWS, SNS, SQS, Lambda, EC2, S3, Route 53, Snowflake, RDS, REST API, Java 11, Spring Boot API, Terraform, Docker, HashiCorp, Maven, Hadoop 2, Sqoop, Hive, DMS, PyCharm, Oozie, HBase, Impala, Cloudera, Arcadia, SQL, Linux, Red Hat, MySQL, YARN, NoSQL, Oracle, Bitbucket, Eclipse 4.5, Git, Tectia, Shell, ETL, Jenkins.

Project: Loadplan Builder

Client: FedEx, Pittsburgh, PA

Period: Feb 2019 – Dec 2019

Role: Sr. Software Consultant/Hadoop Consultant

Description: This application provides information about aircraft maintenance – aircraft regions, airport stations, workers, employee maintenance, employee timings, airport gates, departments, employee shifts, employees' available hours, and the hours required for particular work – as well as payroll for employees, aircraft maintenance, regions, gates, and stations. It also covers customer activities: data mining, analysis, filtering, moving data to pipelines, refining it, and moving it to permanent storage. My brief responsibilities were:

Responsibilities:

Worked on implementing Avro, ORC, Parquet, and text data formats for computations to handle custom business requirements.
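
A hedged PySpark sketch of reading text/CSV data and writing the Parquet, ORC, and Avro formats mentioned above; the Avro write assumes the spark-avro package is available, and all paths are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("formats-sketch").getOrCreate()

df = spark.read.option("header", "true").csv("hdfs:///data/raw/shifts.csv")     # text/CSV in

df.write.mode("overwrite").parquet("hdfs:///data/out/shifts_parquet")           # columnar outputs
df.write.mode("overwrite").orc("hdfs:///data/out/shifts_orc")
df.write.mode("overwrite").format("avro").save("hdfs:///data/out/shifts_avro")  # needs spark-avro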

Worked on retrieving transaction data from RDBMS into HDFS and saving the output in Hive tables per user requirements using MapReduce jobs.

Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.

Worked on Hive to create tables, load data, and write queries, using buckets, maps, search, and caching

Worked on Oozie operational services for batch processing and dynamically scheduling workflows

Worked on Kafka, Spark Streaming, consumers and producers, and source XML-type columns to dynamically extract, integrate, and load data into the target schema using Talend
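
An illustrative Spark Structured Streaming consumer for a Kafka topic (assumes the spark-sql-kafka package is on the classpath); the broker address, topic, and sink paths are placeholders.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

stream = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
         .option("subscribe", "loadplan-events")              # placeholder topic
         .load()
)

# Kafka delivers bytes; cast the value and append it to a Parquet sink.
events = stream.select(F.col("value").cast("string").alias("payload"))
query = (
    events.writeStream.format("parquet")
          .option("path", "hdfs:///data/streams/loadplan")
          .option("checkpointLocation", "hdfs:///checkpoints/loadplan")
          .start()
)
query.awaitTermination()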

Worked on Spark – memory management, performance management, parallelism – and SQL – aggregations, explain plans, joins, sort-merge, nested loops

Worked on Terraform, Docker, JSON, YAML, and Kubernetes to create containers and pipelines, and to maintain pipelines and registries

Worked on Python libraries – APIs, Matplotlib, text processing, Pandas, NumPy – methods, functions, data partitioning, multithreading, and transformations

Worked on Azure services – Data Lake Storage, Data Factory, SQL Data Warehouse, and Synapse Analytics

Worked on REST web services and microservices development, deployments, and deployment testing.

Worked on performance tuning, monitoring and testing configurations and code changes.

Worked on Talend for data models, data formats (XML, JSON, CSV), and batch jobs using MapReduce and Spark

Worked on ELK configurations and writing JSON scripts, facets, and documents – update, create, index, search, and queries.
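
A minimal sketch with the Elasticsearch Python client (8.x-style API); the host, index name, and document fields are illustrative assumptions.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")                        # placeholder host

# Index a JSON document, then run a simple match query.
es.index(index="image-events", id="1", document={"camera": "gate-3", "status": "ok"})
hits = es.search(index="image-events", query={"match": {"status": "ok"}})
for hit in hits["hits"]["hits"]:
    print(hit["_source"])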

Worked on creating automated cluster deployment configurations using CloudFormation templates for AWS components.

Worked on creating the cluster environment and testing scale-in and scale-out, load balancing, monitoring, and troubleshooting.

Environment: Java, AWS, EC2, EMR, S3, Route 53, SNS, SQS, Lambda, RDS, Redshift, version control, microservices, REST WS, Java 8, Maven, Hadoop 2, Pig, Sqoop, ELK, Synapse, Elasticsearch, Hive, Spark Streaming, PySpark, Oozie, HBase, Python, SQL, Linux, Red Hat, Talend, MySQL, YARN, NoSQL, Git, Shell, ETL, Oracle, DB2, WebSphere AS, Jenkins.

Project: Image Processing

Client: Microsoft, Redmond

Period: Feb 2018 to Oct 2018

Role: Sr. Technical Consultant

Description: This project was developed for Microsoft Corporation's internal security. The aim is to capture single images or streams of human images and their movements – face, hand movements, legs, and body gestures – and store them in a file system, along with other functions such as image comparison, object grabbing, and directions.

Responsibilities:

Worked on creating software scripts to automate test, staging, and production service deployments.

Worked on Jenkins for automating builds and deployments, maintained with shell scripts.

Performed performance tuning and debugging of Java code and database environments to ensure acceptable database performance in production.

Worked on CloudFormation stack templates, auto scaling, and load balancing to automate and deploy AWS resources and configuration changes
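
A minimal boto3 CloudFormation sketch; the template file, stack name, and region are placeholders, not the project's actual templates.

import boto3

cf = boto3.client("cloudformation", region_name="us-west-2")

with open("autoscaling-stack.yaml") as fh:                          # placeholder template
    template_body = fh.read()

cf.create_stack(
    StackName="image-processing-dev",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Block until the stack finishes creating before deploying application code.
cf.get_waiter("stack_create_complete").wait(StackName="image-processing-dev")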

Worked on configuration management tool Ansible for continuous delivery.

Created playbooks for new environments and modified existing plays to provision.

Worked on testing environment, debugging, monitoring, performance tuning, security vulnerabilities.

Worked on TypeScript, HTML, JavaScript, CSS, JSON, JSP pages.

Worked on configuring the continuous integration and build process using Jenkins as the CI tool.

Worked on Docker images, compose, containerization.

Leveraged AWS cloud services such as EC2, auto scaling, and VPC to build secure environments

Environment: Java, Spring Boot, REST, Bitbucket, AWS, Kubernetes, Cassandra, Java 1.8, MongoDB, NoSQL, S3, RDS, Git, Linux, Redshift, Ubuntu, EC2, Lambda, Python, Oracle 10g, LDAP, shell script, Maven, Tomcat WS


