Data Analyst Engineer

Location:
Dallas, TX
Posted:
March 18, 2025

Harika K

*************@*****.***

+1-940-***-****

www.linkedin.com/in/harika-k-14a664329

Senior Data Engineer/Big Data Analyst/Data Analyst/Data Architect

PROFESSIONAL SUMMARY:

Senior Cloud Data Engineer with 10+ years of IT experience building data pipelines across all phases (design, development, and testing).

Expertise in data modeling (including dimensional modeling), data analysis, and data integration (ETL/ELT using AWS and Azure data services).

Experience building ETL data pipelines to ingest, cleanse, standardize, transform, and enrich data using PySpark, Databricks, and AWS and Azure IaaS and PaaS services.

Worked on developing and migrating analytics platforms, data warehouses, data stores, and third-party systems.

Experienced in Python, Spark, Java, REST APIs, Azure Data Factory, Databricks, Azure DevOps, Azure Synapse Analytics (SQL data warehouse), Hive, AWS S3, AWS Glue, AWS Lambda, AWS RDS, and Azure Functions.

Involved in setting up AD, ACLs, and service principals for security.

Used ETL/ELT tools such as Informatica and Oracle, together with PySpark, to develop data pipelines for extracting, cleaning, transforming, and loading data into a data warehouse.

Expertise in Hadoop ecosystem components such as Spark, HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Flume, Oozie, Impala, Zookeeper, Hive, NiFi, and Kafka for scalability, distributed computing, and high-performance computing.

Excellent understanding of Hadoop architecture, Hadoop daemons, and various components such as HDFS, YARN, ResourceManager, NodeManager, NameNode, DataNode, and the MapReduce programming paradigm.

Implemented a proof of concept of deploying the products in AWS S3 Bucket and Snowflake.

Good understanding of Apache Spark, Apache Flink, Kafka, Storm, NiFi, Talend, RabbitMQ, Elasticsearch, Elastic Stack, Apache Solr, Splunk, and BI tools such as Tableau.

Worked on import and export of data between RDBMS and HDFS using Sqoop.

Good knowledge of containers, Docker, and Kubernetes as the runtime environment for CI/CD systems to build, test, and deploy.

Hands-on experience in setting up workflow using Apache Airflow and Oozie workflow engine for managing and scheduling Hadoop jobs.

Hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.

Highly knowledgeable in developing data marts in the big data world, in BigQuery or on-premises Hadoop clusters.

Used packages such as NumPy, Pandas, Matplotlib, and Plotly in Python for exploratory data analysis.
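
A minimal sketch of the kind of exploratory analysis described above (file, path, and column names are hypothetical):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Load a raw extract (hypothetical path and columns).
    df = pd.read_csv("orders.csv", parse_dates=["order_date"])

    # Basic profiling: shape, null counts, and summary statistics.
    print(df.shape)
    print(df.isna().sum())
    print(df.describe(include="all"))

    # Simple distribution plot for a numeric column.
    df["order_amount"].plot.hist(bins=50)
    plt.xlabel("order_amount")
    plt.savefig("order_amount_hist.png")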

Hands-on development experience with RDBMS, including writing complex SQL scripting, Stored procedures, and triggers.

Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.

Senior-level expertise in AWS Glue and Python, specializing in building and optimizing ETL pipelines.

Hands-on experience with Sequence files, Combiners, Counters, Dynamic Partitions, and Bucketing as ETL best practices for performance improvement.

Experienced in the Informatica ETL developer role on data warehouse projects, including enterprise data warehouses, OLAP, and data modeling.

Hands-on experience with Apache Hudi, including implementing Change Data Capture (CDC) for real-time data accuracy.

Proven track record in migrating databases from Teradata to Amazon Redshift, optimizing data architecture for cloud environments.

Experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other services of the AWS family.

Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage. Capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs.

Proficient in automating and orchestrating workflows with AWS Lambda and AWS EventBridge, enhancing operational efficiencies.
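
A minimal sketch of this pattern, assuming a hypothetical Lambda handler invoked on an EventBridge schedule that starts a downstream Glue job (the job name is a placeholder):

    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        """Triggered by an EventBridge schedule; starts a downstream ETL job."""
        response = glue.start_job_run(JobName="nightly-orders-etl")  # hypothetical job name
        return {"JobRunId": response["JobRunId"]}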

Exposure to DevOps technologies such as Chef, Puppet, Ansible, Jenkins, and Ansible Tower.

Experience with software development methodologies such as Agile and Waterfall.

Capable of using Amazon S3 to transfer data over SSL, with the data encrypted automatically once it is uploaded.

Ingested data from different sources such as Oracle, Teradata, and SQL Server.

Utilized ODBC connectivity to Teradata to retrieve data automatically from the Teradata database.

Proficient in SQL databases (MS SQL Server, MySQL, Oracle) and NoSQL databases (MongoDB, graph databases).

Designed and developed Spark workflows in Scala to pull data from AWS S3 buckets and Snowflake and apply transformations to it.

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Flume, Sqoop, Hive, Morphline, Kafka, Oozie, Spark, NiFi, Zookeeper, Elasticsearch, Apache Solr, Cloudera Manager, RStudio, Confluent

NoSQL: HBase, Couchbase, MongoDB, Cassandra

Machine Learning: Decision Tree, LDA, K-NN, K-Means, Neural Networks, ANN & RNN, PCA, SVM, Deep Learning

Programming: SQL, Python, C++, Shell scripting, R

Databases: Oracle, DB2, MS SQL Server, MySQL, MS Access, Teradata

Cloud Technologies: AWS, Azure

Build Tools: Maven, Scala Build Tool (SBT), Ant

Code Repository Tools: GitLab, SVN (TortoiseSVN), TFS, StarTeam

Operating Systems: Windows, Sun Solaris, Red Hat Linux, Ubuntu

Application Servers: Apache Tomcat, JDBC, ODBC

BI Tools: Power BI, Tableau, Talend

Other Tools: Visual Studio, MS Office, WinSCP, PuTTY

EDUCATION:

Bachelor of Technology in Mechatronics Engineering

Mahatma Gandhi Institute of Technology, Telangana, India – 2013

CERTIFICATIONS:

AWS Certified Solutions Architect – Associate

AWS Certified Data Engineer – Associate – June 2024

PROFESSIONAL EXPERIENCE:

CVS Health, Dallas, TX Oct 2022-Present

Sr. Data Engineer

Responsibilities:

Involved in all SDLC phases: requirements gathering, analysis, design, development, and testing of applications using Agile methodology.

Designed and optimized batch and near-real-time ETL/ELT pipelines using Azure Data Factory and Databricks. Responsible for the execution of big data analytics, predictive analytics, and machine learning initiatives.

Created Linked Services for multiple source systems (e.g., Azure SQL Server, ADLS, Blob Storage, REST APIs).

Created a pipeline to extract data from on-premises source systems to Azure Data Lake Storage, and implemented error handling through the Copy activity.

Exposure to Azure Data Factory activities such as Lookup, Stored Procedure, If Condition, ForEach, Set Variable, Append Variable, Get Metadata, Filter, and Wait.

Utilized Azure Synapse Analytics and ADLS for scalable data storage and transformation, ensuring data integration from multiple sources. Optimized existing data pipelines for performance, scalability, and reliability.

Configured and implemented the Azure Data Factory Triggers and scheduled the pipelines; monitored the scheduled Azure Data Factory pipelines and configured alerts for pipeline failures.

Worked closely with Analysts and Senior Architects to support the re-architecture of the Analytics Engineering layer, ensuring efficiency and scalability.

Developed, maintained, and optimized ETL workflows using Apache Airflow and DBT for business analytics and data processing.

Designed and implemented scalable, high-performance database solutions using Snowflake and Amazon Redshift.

Applied indexing, partitioning, and query-tuning techniques to optimize SQL queries and improve database performance.

Ensured data consistency and integrity by integrating multiple data sources into a centralized data platform.

Assisted in defining data governance policies, data lineage tracking, and metadata management for compliance and quality assurance.

Evaluated and recommended modern data infrastructure tools, including data cataloging and monitoring solutions.

Collaborated with cross-functional teams to align data models with business requirements and advanced analytics use cases. Implemented delta logic extractions for various sources using control tables.

Implemented event-driven cloud architectures to support scalable data applications on Azure, leveraging Azure Event Grid and Service Bus. Utilized Azure DevOps and other Microsoft Azure features for existing business applications.

Proficient in writing Python scripts to build ETL pipelines and Directed Acyclic Graph (DAG) workflows using Airflow.
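
A minimal Airflow DAG sketch illustrating this kind of workflow (DAG, task, and function names are hypothetical):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # Pull data from the source system (placeholder logic).
        pass

    def transform_and_load():
        # Cleanse the extract and load it into the warehouse (placeholder logic).
        pass

    with DAG(
        dag_id="daily_sales_etl",          # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="transform_and_load", python_callable=transform_and_load)
        extract_task >> load_task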

Prepared ETL test strategy, designs, and test plans for ETL and BI systems.

Built visualizations and dashboards using Kibana.

Worked on migration of data from On-prem SQL Server to Cloud databases. Enhanced data pipelines by integrating Google BigQuery, writing optimized SQL queries for large datasets, and performing ETL transformations for reporting needs.

Monitored and fine-tuned ETL processes to handle increasing data volumes and reduce processing times.

Experience in scheduling jobs using Ansible Tower. Leveraged GCP tools like BigQuery and Cloud Storage for efficient data extraction, modeling, and analysis.

Automated delivery pipeline using GIT, Jenkins, Ansible, and Ansible Tower.

Experience writing T-SQL (DDL, DML) and developing new database objects such as tables, views, stored procedures, and triggers.

Deployed codes to multiple environments using the CI/CD process and resolved code defects during SIT and UAT testing.

Responsible for automated identification of application server and database server using Ansible and managing them in a production environment.

Wrote several Teradata SQL queries using Teradata SQL Assistant. Created dashboards using Tableau and Power BI to visualize key metrics for customer reporting and meter-related insights.

Used different file formats like Text files, pipe-delimited files, Parquet, and JSON.

Created Teradata objects like tables and views and created multiset and volatile tables in the Teradata database.

Utilized ODBC for connectivity to the Teradata Database.
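
A minimal sketch of this kind of ODBC connectivity, assuming a preconfigured Teradata DSN and hypothetical credentials and table names:

    import pyodbc

    # DSN, credentials, and table are hypothetical placeholders.
    conn = pyodbc.connect("DSN=teradata_prod;UID=etl_user;PWD=***", autocommit=True)
    cursor = conn.cursor()
    cursor.execute("SELECT claim_id, claim_amount FROM edw.claims SAMPLE 10")
    for row in cursor.fetchall():
        print(row.claim_id, row.claim_amount)
    conn.close()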

Created UNIX scripts that use BTEQ to access the Teradata Database.

Played a key role in Migrating Teradata objects into the Cloud Environment.

Wrote UNIX shell scripts for processing/cleansing incoming text files and automating logs, backups, and daily checks.

Involved in creating Unix shell scripts for Informatica workflow execution.

Expertise in Reference Data Management (RDM) concepts and methodologies, and building data warehouses using RDM/MDM tools.

Developed and maintained scalable data pipelines and built new API integrations to support growing data volume and complexity. Designed data integrations and data quality frameworks.

Responsible for CI/CD process integration using Jenkins along with PowerShell to automate routine jobs.

Developed Spark (Scala) notebooks to transform and partition data in ADLS.

Worked on Azure Databricks to run Spark/Python notebooks through ADF pipelines.

Used Databricks widget utilities to pass parameters at runtime from ADF to Databricks.
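
A minimal Databricks notebook sketch of this parameter passing (the parameter name and data path are hypothetical; dbutils and spark are provided inside Databricks notebooks):

    # Declare the widget and read the value that ADF supplies as a base
    # parameter at runtime (parameter name is a hypothetical placeholder).
    dbutils.widgets.text("run_date", "")
    run_date = dbutils.widgets.get("run_date")

    # Filter a curated dataset to the requested run date (hypothetical path and columns).
    df = spark.read.format("delta").load("/mnt/curated/sales")
    daily = df.filter(df.sale_date == run_date)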

Created Triggers, PowerShell scripts, and parameter JSON files for deployments. Partnered with QA teams, electric quality teams, and leadership to identify data reporting gaps and align data solutions with business goals.

Reviewed work on ingesting data into the Azure data lake and provided feedback based on reference architecture, naming conventions, guidelines, and best practices.

Implemented End-to-end logging frameworks for Data factory pipelines.

Implemented best practices for data extraction, transformation, and loading to improve efficiency.

Environment: Azure Data Factory, Azure Databricks, PolyBase, Azure DW, ADLS, Azure DevOps, Blob Storage, Azure SQL Server, Git, Jenkins, Ansible, Informatica, Azure Synapse, Ansible Tower.

Charles Schwab, Westlake, TX Jan 2021- Sep 2022

Sr. Data Engineer

Responsibilities:

Involved in Requirement gathering, Business Analysis, Design and Development, testing, and implementation of business rules.

Managed the AWS cloud platform and code in Git (version control); deployed and operated AWS services, specifically VPC, EC2, S3, EBS, IAM, ELB, CloudFormation, and CloudWatch, using the AWS console and AWS CLI.

Built scalable data pipelines using PySpark on Databricks and optimized performance for large-scale data ingestion.

Worked extensively on Azure tools like ADF and Databricks to deliver both batch and near real-time ETL solutions.

Delivered reliable cloud-based data solutions using Azure services such as ADLS and Azure Synapse, supporting event-driven architectures.

Designed and implemented automated ETL pipelines to support enterprise data warehousing initiatives.

Optimized and tuned queries in Snowflake and Redshift to enhance data processing performance.

Led efforts to improve data integration workflows using AWS Glue and DBT.

Maintained scalable data lakes on AWS S3, supporting structured and semi-structured data ingestion.

Worked closely with cross-functional teams, including Engineering, Data Science, and Business, to improve data accessibility.

Collaborated with business stakeholders to ensure that data engineering solutions met business objectives.

Worked in all areas of Jenkins: setting up CI for new branches, build automation, plugin management, securing Jenkins, and setting up master/slave configurations.

Created projects in the OpenShift console with quotas for non-prod and prod environments, and troubleshot the OpenShift EFK stack and ELK with LMA for centralized logging.

Utilized Google BigQuery and GCP ETL tools to streamline data transformations, ensuring timely and accurate data availability for business reporting.

Regularly collaborated with business stakeholders to gather requirements and deliver actionable insights through Tableau and Power BI.

Monitored and ensured data quality across datasets, proactively resolving discrepancies for consistent reporting.

Used Ansible playbooks to set up a Continuous Delivery pipeline, including Jenkins, Sonar server, Maven, and other supporting software.

Expertise in AWS Lambda and API Gateway, submitting data via API Gateway endpoints backed by Lambda functions.

Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
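
A minimal sketch of the S3-to-Redshift load pattern, issuing a Redshift COPY from Python (the cluster endpoint, credentials, IAM role, bucket, and table names are hypothetical):

    import psycopg2

    conn = psycopg2.connect(
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical endpoint
        dbname="analytics", user="etl_user", password="***", port=5439)
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY public.orders
            FROM 's3://example-bucket/staging/orders/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS PARQUET;
        """)
    conn.close()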

Experience in cloud databases and data warehouses (SQL Azure and Redshift/RDS).

Extensive usage of BI tools like SSIS, SSAS, SSRS, and Power BI.

Excellent experience in database design and dimensional data modeling (star and snowflake schema).

Involved in writing Flink jobs to parse real-time data and then push to Hive.

Worked with GIT to store the code and integrate it into Ansible Tower to deploy the playbook.

Utilized AWS services focusing on big data analytics, enterprise data warehouse, and business intelligence solutions to ensure optimal architecture, scalability, and flexibility.

Designed AWS architecture, cloud migration, AWS EMR, DynamoDB, Redshift, and event processing using the Lambda function.

Developed real-time applications using PySpark, Apache Flink, Kafka, and Hive on a distributed Hadoop Cluster.

Experience in IaaS managing AWS infrastructure with automation and configuration management using Ansible.

Developed microservices using EMR, Lambda, API Gateway, DynamoDB, and RDS.

Worked on Snowflake schemas and data warehousing and processed batch and streaming data load pipeline using Snowpipe.

Designed and developed data marts following star and snowflake schema methodology using data modeling tools like Erwin.

Created external tables with partitions using Hive, Athena, and Redshift.

Gathered data from various sources such as Google AdWords, Apple Search Ads, Facebook Ads, Bing Ads, Snapchat Ads, Omniture data, and CSG using their APIs.

Created inventory, job templates, and scheduling jobs using Ansible Tower and expertise in writing Ansible playbooks.

Performed data modeling in Erwin, designing target models for the enterprise data warehouse.

Expertise in Erwin data modeling tools for relational and dimensional modeling.

Created Informatica mappings with T-SQL procedures to build business rules to load data.

Wrote stored procedures and triggers in PL/SQL.

Strong experience in AWS cloud services like EC2, S3, EBS, RDS, VPC, and IAM.

Defined and documented Reference Data Management business processes and workflows.

Designed and managed public/private cloud infrastructures using AWS.

Managed and configured AWS cloud services such as EC2, S3 buckets, security groups, RDS, EBS, ELB, Auto Scaling, AMIs, Elasticsearch, and IAM through the AWS console and API integration.

Proficient in Elasticsearch data modeling and querying, including log aggregation, data extraction, and reporting using Elasticsearch and Kibana.

Implemented token-based authentication for REST APIs.

Prepared conceptual, logical, and physical ER data models using Erwin Data Modeler.

Involved in creating Hive tables, loading data into them, and writing Hive queries to analyze the data.

Involved in data ingestion into HDFS using Sqoop for full load and Flume for incremental load from various sources.

Parsed data from S3 through Python API calls via Amazon API Gateway to generate batch processing jobs.
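
A minimal sketch of reading and parsing an S3 object from Python for batch processing (bucket, key, and batch size are hypothetical):

    import json
    import boto3

    s3 = boto3.client("s3")

    # Bucket and key are hypothetical placeholders.
    obj = s3.get_object(Bucket="example-bucket", Key="offers/2021/01/offers.json")
    records = json.loads(obj["Body"].read())

    # Split the records into fixed-size batches for downstream processing.
    batches = [records[i:i + 500] for i in range(0, len(records), 500)]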

Experience writing custom aggregate functions in Spark SQL and performing interactive querying.

Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.

Designed and deployed new ELK clusters.

Created UDFs to calculate the pending payment for the given customer data based on the last day of every month and used them in Hive scripts.
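
An analogous PySpark sketch of that month-end calculation (the original UDFs were used from Hive scripts; table and column names here are hypothetical):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    payments = spark.table("customer_payments")  # hypothetical table

    # Pending payment as of the last day of each month: amount due minus amount paid.
    pending = (
        payments
        .withColumn("month_end", F.last_day(F.col("due_date")))
        .groupBy("customer_id", "month_end")
        .agg((F.sum("amount_due") - F.sum("amount_paid")).alias("pending_payment"))
    )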

Used Elasticsearch and MongoDB for storing and querying offer and non-offer data.

Responsible for data modeling in MongoDB to load structured and unstructured data.

Analyzed large data sets to apply Machine Learning techniques and develop predictive models.

Demonstrated Key Performance Indicator (KPI) dashboards using Tableau.

Environment: Hadoop, HDFS, Flume, Hive, MapReduce, Sqoop, LINUX, MapR, Big Data, Erwin Data Modeler, UNIX Shell Scripting, TWS, Python, SQL Server, Flink, Tableau, PySpark, Cassandra, Snowflake, Elasticsearch, Kibana.

Cox Communications, Atlanta, Georgia Nov 2018 -Dec 2020

Data Engineer

Responsibilities:

Requirement gathering, Business Analysis, Design, Development, testing, migration, implementation, and documentation for data warehousing projects.

Managed cloud computing tools AWS and GCP: experience with VPC, EC2, S3, EBS, IAM, ELB, CloudFormation, CloudWatch, BigQuery, Dataflow, Pub/Sub, and GCS.

Extensive backend programming experience using Transact-SQL and PL/SQL, including stored procedures, functions, and triggers.

Developed and optimized data models to support advanced analytics and business reporting requirements on cloud platforms like Azure.

Used Azure Data Factory and Azure Synapse Analytics to design and implement ETL solutions for data integration from on-premises and cloud-based systems.

Developed robust data models following best practices in dimensional modeling and data warehousing.

Optimized ETL workflows using Apache Airflow and Databricks for efficient data ingestion and transformation.

Created and maintained data lineage documentation, ensuring transparency in data flow and transformations.

Assisted in integrating metadata management solutions to improve data discovery and governance.

Delivered advanced analytics solutions using GCP services, including BigQuery and Dataflow, for processing and analyzing large datasets.

Developed data models to support customer-related reporting, integrating GCP tools for seamless data management.

Built interactive dashboards and reports using Tableau and Power BI, focusing on call center data and customer insights.

Proficient in PySpark and Python for transforming and processing large datasets across cloud platforms, including Azure and Databricks.

Involved in the cloud migration of large datasets to Azure, ensuring seamless data ingestion and transformation.

Developed Bash scripts, T-SQL, and PL/SQL scripts for automation and data manipulation.

Coordinated with the Data Science Team to implement Advanced Analytical Models in Hadoop clusters over large datasets.

Worked on downloading BigQuery data into Pandas or Spark DataFrames for advanced ETL capabilities.
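
A minimal sketch of pulling BigQuery results into a Pandas DataFrame (the project, dataset, and query are hypothetical, and application default credentials are assumed):

    from google.cloud import bigquery

    client = bigquery.Client()  # uses application default credentials

    sql = """
        SELECT call_date, region, COUNT(*) AS call_volume
        FROM `example-project.call_center.calls`
        GROUP BY call_date, region
    """
    df = client.query(sql).to_dataframe()
    print(df.head())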

Utilized Google Data Catalog and other Google Cloud APIs for monitoring, querying, and billing analysis.

Created POCs for utilizing ML models and Cloud ML for batch process quality analysis.

Demonstrated the use of Ansible and Ansible Tower for automating software development processes and infrastructure management.

Expertise in designing and deploying Hadoop clusters and using Pig, Hive, SQOOP, and Apache Spark with Cloudera distribution.

Built efficient pipelines for moving data between GCP and Azure using Azure Data Factory.

Extensive use of Cloud Shell SDK in GCP for configuring and deploying services like Data Proc, Storage, and BigQuery.

Developed multi-cloud strategies using GCP (for PaaS) and Azure (for SaaS).

Used data analytics and visualization tools including Elasticsearch and Kibana to build application metrics.

Designed, built, and managed ELK clusters for centralized logging and search functionalities.

Built data pipelines in Airflow in GCP for ETL jobs using various Airflow operators.

Used Ansible to document infrastructure as code in version control.

Proficient in calculating measures and members of SQL Server Analysis Services (SSAS) using multi-dimensional expressions.

Experience with SSIS script tasks, look-up transformations, and data flow using T-SQL.

Developed PL/SQL stored programs for data transformations, integrated with Informatica.

Installed, configured, and administered Jenkins CI on Windows and Linux.

Administered CI/CD tools stack including Jenkins and Chef for administration and maintenance.

Worked with different file formats like Text files, Sequence Files, Avro, and ORC.

Ingested real-time and near-real-time (NRT) streaming data into HDFS using Flume.

Created and managed TWS Jobs and Job Streams for automated scheduling.

Worked with Golden Gate replication tool for data migration into HDFS.

Used HBase to support enterprise production and data loading via SQOOP.

Collected and aggregated large log data using Apache Flume, staging data in HDFS for analysis.

Exported data from Avro files and indexed documents in ORC format.

Created dashboards in MS Excel and Tableau with ODBC connections to BigQuery and Presto SQL engine.

Implemented a Continuous Delivery pipeline with Docker, Jenkins, GitHub, and GCP for automated builds and deployments.

Engaged in data modeling and populating business rules for Metadata Management.

Hands-on experience with cloud technologies such as Azure HDInsight, Azure Data Lake, AWS EMR, Athena, Glue, and S3.

Experience orchestrating API integration use cases involving multiple systems and complex business logic.

Designed and built REST APIs for various applications.

Developed UDFs for Hive scripts to calculate pending payments based on customer data.

Wrote shell scripts to run jobs in parallel for improved performance.

Processed and transformed data using Spark with Python (PySpark).

Involved in running TWS jobs for processing large datasets using ITG.

Analyzed systems for enhancements and performed impact analysis for ETL changes.

Acquired and maintained data from primary and secondary sources, checking for invalid/out-of-range data.

Environment: Core Java, J2EE, Hadoop, HDFS, Flume, Hive, Airflow, MapReduce, Sqoop, LINUX, MapR, Big Data, GoldenGate, UNIX Shell Scripting, TWS, HP ALM 12, ITG, HBase, Tableau.

Costco, Issaquah, WA Aug 2016-Oct 2018

Big Data Analyst

Responsibilities:

Experience in analysis, design, and development of data warehousing and business intelligence applications.

Enhanced existing data models to include memory-related components.

Designed and developed robust and scalable data pipelines using Azure Data Factory and Databricks for real-time and batch processing.

Implemented data modeling and transformations in Azure Synapse and ADLS to meet business reporting and analytics requirements.

Built and optimized event-driven architectures using Azure Event Hub and Azure Functions for data integration across cloud platforms.

Designed scalable data solutions to analyze large data sets related to customer and meter insights, optimizing performance and storage using BigQuery and GCP tools.

Created customer-facing dashboards using Tableau and Power BI, presenting complex data in an understandable format for business decisions.

Worked closely with leadership to gather requirements and implement data strategies aligning with business objectives.

Worked with Azure Data Factory to integrate and transform data from on-premises (MySQL, Cassandra) and cloud sources (Blob Storage, Azure SQL DB) and loaded it into Azure Synapse.

Developed Spark Scala functions for real-time data mining, configured Spark Streaming to process data from Apache Flume, and stored stream data in Azure Table.

Utilized Azure Data Lake for data storage, processing, and analytics.

Ingested data into Azure Blob Storage and processed it using Databricks, including writing Spark Scala scripts and UDFs.

Expertise in Dimensional Modeling, Data Analysis, and ETL/ELT processes using AWS and Azure services.

Configured and managed Puppet Master Server, updating modules and pushing them to clients.

Designed and developed ETL processes in AWS Glue to migrate data from S3 to AWS Redshift.
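
A minimal AWS Glue job sketch of this kind of S3-to-Redshift migration (bucket, connection, database, and table names are hypothetical):

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw Parquet from S3 (path is a hypothetical placeholder).
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://example-bucket/raw/sales/"]},
        format="parquet",
    )

    # Write to Redshift through a Glue catalog connection (names are hypothetical).
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=source,
        catalog_connection="redshift-connection",
        connection_options={"dbtable": "public.sales", "database": "analytics"},
        redshift_tmp_dir="s3://example-bucket/tmp/",
    )
    job.commit()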

Created complex stored procedures, triggers, cursors, table views, and T-SQL queries.

Worked with Spark DataFrames to create datasets, applying business transformations and data cleansing in DataBricks Notebooks.

Proficient in writing Python scripts for ETL pipelines and Directed Acyclic Graph (DAG) workflows using Airflow and Apache NiFi.

Created conceptual models for data warehousing using Erwin Data Modeler, including DFD and ERD diagrams.

Hands-on experience with AWS services: Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, and IAM.

Utilized JSON schema for defining table and column mappings from S3 to Redshift.

Experienced with AWS CloudFormation for creating IAM Roles and end-to-end architecture deployment.

Managed application deployments, orchestration, and automation using Ansible.

Distributed tasks to Celery workers for managing communication between services. Monitored Spark clusters using Log Analytics and Ambari Web UI.

Developed data ingestion pipelines on Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL. Worked with Cosmos DB (SQL and Mongo APIs).

Designed custom-built input adapters using Spark, Hive, and Sqoop for data ingestion into HDFS.

Leveraged advanced AWS Glue techniques to design and optimize ETL processes.

Implemented Apache Hudi for managing datasets on AWS, including setting up Change Data Capture (CDC) for real-time synchronization.
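
A minimal PySpark sketch of an upsert-style Hudi write of the kind used for CDC (the Hudi Spark bundle is assumed on the classpath; paths, keys, and table names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    changes = spark.read.parquet("s3://example-bucket/cdc/orders/")  # captured inserts/updates

    hudi_options = {
        "hoodie.table.name": "orders_cdc",
        "hoodie.datasource.write.recordkey.field": "order_id",
        "hoodie.datasource.write.precombine.field": "updated_at",
        "hoodie.datasource.write.operation": "upsert",
    }
    (changes.write.format("hudi")
        .options(**hudi_options)
        .mode("append")
        .save("s3://example-bucket/hudi/orders_cdc/"))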

Developed and maintained Python and PySpark scripts for automating data processing tasks.

Managed AWS S3 data storage solutions, implementing robust ingestion and retrieval mechanisms.

Automated and orchestrated workflows using AWS Lambda and AWS EventBridge.

Successfully migrated a large-scale database from Teradata to Amazon Redshift, optimizing performance and reducing costs.

Tuned T-SQL queries and SSIS packages for performance improvements.

Worked on loading data from Web Servers and Teradata using Sqoop, Flume, and Spark Streaming API.

Managed resources and scheduling across clusters using Azure Kubernetes Service (AKS).

Extensively used Kubernetes for handling online and batch workloads for analytics and machine learning applications.

Utilized Azure DevOps and VSTS for CI/CD, Active Directory for authentication, and Apache Ranger for authorization.

Experience in Spark applications, optimizing batch interval times, parallelism, and memory usage.

Used Scala for concurrency support, parallelizing the processing of large datasets.

Environment: Scala, Spark SQL, Hive, Data Migration, Data Warehouse, Snowflake, AWS, NoSQL, Teradata, T-SQL, ETL, MapReduce, Microsoft Azure, Cassandra, MongoDB, Apache Hadoop, Git.

Concentrix Inc., Hyderabad, India Nov 2013-Jun 2016

Data Analyst

Responsibilities:

Represented the system in hierarchical form by defining its components and subcomponents using Python, and developed a set of library functions over the system based on user needs.

Developed tools using Python, Shell scripting, and XML to automate tasks.

Enhanced the system by adding Python XML SOAP request/response handlers to add accounts, modify trades, and apply security updates.

Built and deployed code artifacts into the respective environments in the Confidential Azure cloud.

Implemented RESTful web services for sending and receiving data between multiple systems.

Developed scalable and reusable database processes and integrated them.

Worked on SQL and PL/SQL for backend data transactions and validations.

Installed and automated applications using the configuration management tools Puppet and Chef.

Used the Python NLTK toolkit for smart MMI interactions.

Supported development of web portals, completed database modeling in PostgreSQL, and provided front-end support in HTML/CSS and jQuery.

Assessed the infrastructure needs for each application and deployed it on the Azure platform.

Involved in various phases of the project Analysis, Design, Development, and Testing.

Developed a Front-End GUI as a stand-alone Python application.

Debugged application deployments at multiple levels.

Involved in the development of Web Services using SOAP for sending and getting data from the external interface in the XML format.

Trained and documented initial deployment and supported product stabilization/debugging at the deployment stage.

Designed and maintained databases using Python and developed a Python-based REST API (web service) using Flask, SQLAlchemy, and PostgreSQL.
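
A minimal sketch of a Flask/SQLAlchemy REST endpoint of this kind (the model, route, and connection string are hypothetical; the Flask-SQLAlchemy extension is used here for brevity):

    from flask import Flask, jsonify
    from flask_sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:***@localhost/portal"  # hypothetical
    db = SQLAlchemy(app)

    class Account(db.Model):
        id = db.Column(db.Integer, primary_key=True)
        name = db.Column(db.String(80), nullable=False)

    @app.route("/accounts/<int:account_id>")
    def get_account(account_id):
        account = Account.query.get_or_404(account_id)
        return jsonify({"id": account.id, "name": account.name})

    if __name__ == "__main__":
        app.run()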

Used


