Swathi
Email: *************@*****.***
Phone: 980-***-****
LinkedIn: https://www.linkedin.com/in/swathi-k-528672249/
Professional Summary:
9+ years of expertise in Big Data application development using Hadoop/YARN, Spark, Hive, Sqoop, and Oozie.
Hands-on programming experience in Python, Scala, Java, Shell, and SQL.
Experience in importing and exporting data between RDBMS servers like MySQL, Oracle, and Teradata and HDFS/Hive using Sqoop and Spark JDBC.
Experience in using Sqoop & Flume for data integration.
Developed MapReduce programs in Java for data cleansing and data aggregations and used optimization techniques like Combiner & Partitioner to enhance the performance.
Experienced in analyzing the data using PIG Latin scripts.
Good knowledge of join, grouping, and aggregation concepts; resolved performance issues in Hive and Spark.
Strong experience in working with Azure data services, including Azure Data Factory, Azure Data Lake, Azure SQL Database.
Hands-on experience visualizing data using Qlik Sense, Power BI, Tableau, and Python (pandas, Matplotlib, NumPy, SciPy).
Experience with data modeling, data warehousing, and ETL/ELT processes.
Able to operate effectively in parallel across the GCP and AWS clouds.
Hands-on experience in GCP, BigQuery, and Cloud Functions.
Hands-on experience working with data integration tools like Talend, Informatica PowerCenter, and SSIS.
Hands-on experience with ETL and ELT tools such as Airflow, Kafka, Flink, NiFi, and AWS Glue.
Experience in developing Hive UDFs and Spark UDFs to implement various business rules.
Experience in designing tables and views for reporting using Impala.
Created dynamic DAGs in Apache Airflow and used various AWS, GCP, and PySpark operators as part of ETL pipelines (a brief illustrative sketch follows this summary).
Experience in writing complex queries for reporting using Impala, Hive and SparkSQL.
Expertise in writing Spark RDD transformations, actions, DataFrames, and case classes for the required input data; performed data transformations using Spark Core and converted RDDs to DataFrames and Datasets.
Experience working in the Azure environment.
Created data flows using ADF and loaded data into Azure Data Lake and other Azure services.
Integrated Azure Databricks with Power BI and created dashboards.
Ingested data into Azure services and processed it in Azure Databricks.
Experience with cloud-based data migration tools and validations.
Experience delivering client-facing projects, including data migration consultancy engagements.
Experience in migrating MapReduce and Hive jobs to Spark jobs.
Experience with Docker, Kubernetes, Git, and deploying applications on AWS.
Good understanding of orchestration frameworks like Apache Oozie and Apache Airflow.
Workflow orchestration and scheduling using Oozie, Airflow, and the UC4 Automation Engine.
Experience working with NoSQL Database (HBase & MongoDB) application development.
Experienced with relational database management systems like Teradata, DB2, Oracle, and SQL Server.
Experience in agile testing approaches, frameworks and implementations.
Experience with UNIX commands and deploying applications to servers.
Worked with varied data formats like JSON, XML, Avro, Parquet, RC and ORC.
Established a centralized logging and monitoring framework for Kubernetes clusters across AWS, Azure, and GCP, and developed Kubernetes-based application hosting patterns on all three clouds.
Used compression codecs like Snappy & BZIP2 to improve the storage efficiency.
Expertise in implementing REST APIs using Python Flask, integrating big data technologies, relational databases, and NoSQL databases (HBase and MongoDB).
Experienced in developing and productionizing Flask APIs with Gunicorn and Nginx, setting up an Apigee layer between the client and the API, dockerizing the service, and setting up a continuous integration and deployment pipeline to deploy on the OCP/OpenShift platform.
Developed APIs in Python, dockerized them, and deployed them to an AWS EKS cluster through a CI/CD pipeline.
Experience in developing ETL jobs in IBM DataStage and migrating DataStage jobs to the Hadoop platform.
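The dynamic Airflow DAG pattern mentioned above can be summarized with a minimal sketch. This is illustrative only: the table list, the load_table callable, and the daily schedule are assumed placeholders rather than details from any specific project; it uses the standard apache-airflow PythonOperator.

    # Minimal sketch: generate one Airflow DAG per source table from a config list.
    # The table names and the load_table body are hypothetical placeholders.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    TABLES = ["orders", "customers", "payments"]  # assumed source tables

    def load_table(table_name, **context):
        # Placeholder for the actual PySpark/ETL load step.
        print(f"Loading {table_name}")

    def build_dag(table_name):
        dag = DAG(
            dag_id=f"load_{table_name}",
            start_date=datetime(2023, 1, 1),
            schedule_interval="@daily",
            catchup=False,
        )
        PythonOperator(
            task_id=f"load_{table_name}",
            python_callable=load_table,
            op_kwargs={"table_name": table_name},
            dag=dag,
        )
        return dag

    # Register each generated DAG at module level so the Airflow scheduler discovers it.
    for _table in TABLES:
        globals()[f"load_{_table}"] = build_dag(_table)

The same loop can substitute AWS or GCP transfer operators for the PythonOperator when the load step is cloud-native.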
Technical Skills:
Big Data skills: Hadoop, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Oozie, Kafka, Flume, Apache Spark, NiFi
Monitoring & Automation: Cloudera Manager, Control-M, Crontab, Airflow
Programming: Python, Java, Scala, Shell, HiveQL, SQL
Databases: MySQL, SQL Server, Teradata, HBase (NoSQL), MongoDB, Cassandra, Snowflake (data warehouse)
Cloud Infrastructure: AWS (EC2, EMR, S3, Lambda, Glue, Redshift, Pipeline, Athena); Azure (Databricks, Dataflow, Synapse, Data Factory, Data Lake Storage, SQL Database); GCP (Cloud Functions, BigQuery, Datastore, Dataflow, Cloud Pub/Sub)
Others: Eclipse, Maven, SBT, IntelliJ, PyCharm, VS Code, GitHub, Flask, Gunicorn, Nginx, Apigee, Docker
Education:
Bachelor's in Electrical Engineering, VGNT, India, April 2014
Certifications:
Python with IIT Certification
Professional Experience
Florida Department of Business and Professional Regulation, Orlando, FL (Remote)
Data Analyst Feb 2023 - Present
Responsibilities:
Extensive knowledge of virtual data mart, data modeling and data visualization in Qlik Sense Enterprise.
Possesses working knowledge of Relational Database Management Systems (RDBMS) and data warehouse front-end tools.
Analyzed business requirements, specifications, and sources; extracted data from the physical data model and developed the transform-and-load process to build Qlik Sense applications.
Developed Qlik Sense Dashboards by extracting data from different sources such as SQL Server, Oracle, QVD, CSV, XML, Excel Files.
Built QVDs from source systems like Oracle, SQL Server, and Excel files; experienced in Qlik scripting, set analysis, and section access.
Extensively worked on objects like straight tables, pivot tables, containers, line charts, bar charts, multi boxes, buttons, and KPIs.
Used advanced SQL and the Qlik scripting language; proficient in fixing Qlik Sense dashboard issues.
Scheduled jobs for daily refresh and created tasks in QMC.
Experienced in using SQL Tables, Views, Joins, sub-queries and Indexes.
Responsible for prototyping solutions, preparing test scripts, and conducting tests and for data replication, extraction, loading, cleansing, and data modeling for data warehouses.
Experience working with multiple teams across various DBPR setups simultaneously.
Understands the methodologies and technologies that depict the flow of data within and between technology systems and business functions/operations.
Maintains knowledge of software tools, languages, scripts, and shells that effectively support the data warehouse environment in different operating system environments.
Environment: SQL, Qlik Language, ORACLE, SAP, SQL Server, QLIK Sense, CSV, XML, Excel
Comcast, Philadelphia, PA (Remote)
Sr. Data Engineer Jan 2022 – Jan 2023
Responsibilities:
Developed and implemented large-scale data pipelines, from source identification through completion. Created Hive tables and loaded data incrementally and in full using dynamic partitioning.
Lead data migration and integration projects from legacy scripting languages to modern data architectures in the cloud.
Proficient in data migration pipeline process, data ingestion pipeline process, exception handling and metadata management from multiple source formats such as Mainframe, relational sources, flat files, etc. to Azure ADLS as well as Hadoop HDFS.
Executed data migration runs.
Performed data migration using the Azure data migration stack.
Developed a Spark Streaming job to consume data from Kafka topics of different source systems and push it into HDFS locations (see the illustrative sketch at the end of this section).
Developed shell scripts and Spark wrappers as part of framework development to load data in batch and real-time processes.
Built complex ETL pipelines with the help of Apache Flink and Kafka.
Extensively worked with Flink to maintain data stream processing in support of modern application needs.
Wrote complex scripts and data pipelines to process structured and unstructured data using Apache Flink.
Responsible for migrating data from Legacy systems to AWS cloud which was running on XMBI data lake
Developed an Automation Tool using Python to validate schema, duplicate check, record count and data validation for different layers by taking varying input parameters from the JSON file.
Highly experienced in installation, upgrades, migration, memory management, troubleshooting, database design, and information security tasks in Oracle instances.
Responsible for Continuous Integration and Continuous Delivery process implementation using Jenkins along with SQL scripts to automate routine jobs.
Designed and built a membership-based REST-API for Claims Adjudication process using Spring Boot by pulling the data from Cassandra.
Analyzed existing databases, tables and other objects to prepare to migrate to Azure Synapse.
Closely worked with ETL to implement Copy activity and custom Azure Data Factory pipeline activities for on-cloud ELT processing.
Worked on shell scripting and automated the spark jobs using schedulers.
Developed dashboards using Tableau and Splunk to visualize data and for logs.
Built a data pipeline to read data from Kafka topics and write into tables in Snowflake.
Developed HQL and Spark jobs to perform transformations and joins based on the business requirements provided in the mapping document.
Worked on a microservices architecture written in Python, using Kafka for distributed data transfer with JSON as the data exchange format.
Developed database scripts and code and designed table structures. Performed data ingestion and data migration from databases, and ETL transformations to load data into Hadoop and other RDBMSs.
Performance analysis and optimization for data processing. Install and perform data quality checks.
Prepared workflows using the Automation Engine tool to schedule data processing into Hive, Hadoop, and ADLS.
Followed Agile/Scrum methodologies.
Automated various ETL job processes using crontab as the scheduler.
Designed and developed data solutions using Azure technologies such as Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics.
Implemented and managed data pipelines using Azure Data Factory.
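A minimal sketch of the Kafka-to-HDFS Spark Streaming job described in this section, written with Structured Streaming; the broker address, topic name, and HDFS paths are illustrative assumptions rather than project values.

    # Minimal sketch: consume a Kafka topic with Spark Structured Streaming and
    # land the raw records in HDFS as Parquet. Broker, topic, and paths are assumed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

    stream_df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")   # assumed broker
        .option("subscribe", "source_topic")                 # assumed topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers key/value as binary; cast to strings before writing.
    records = stream_df.select(
        col("key").cast("string"),
        col("value").cast("string"),
        col("timestamp"),
    )

    query = (
        records.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/raw/source_topic")             # assumed location
        .option("checkpointLocation", "hdfs:///checkpoints/source_topic")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()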
Common Spirit Health – Phoenix, AZ
Sr. Big Data Application Developer May 2020 – Dec 2021
Responsibilities:
Hands on development and maintenance of the Hadoop Platform and various associated components for data ingestion, transformation and processing.
Installed and configured Hadoop MapReduce, Apache Spark, and HDFS; developed multiple Spark jobs in Scala for data cleaning and preprocessing.
Importing and exporting data into HDFS and Hive using Sqoop.
Developed HiveQL scripts to analyze pre-pay claims data.
Wrote HiveQL scripts to create, load, and query Hive tables.
Migrated data from Oracle and MySQL to Hadoop to build advanced data analytics with better performance.
Experienced in defining job flows.
Developed on GCP and BigQuery, automating feature engineering processes with Python scripts.
Transferred data from an Oracle database to the Google Cloud Platform (GCP).
Experienced in managing and reviewing Hadoop log files
Responsible to manage data coming from different sources
Involved in creating Hive tables, loading with data and writing hive queries which will run internally in map reduce way.
Performed several optimization techniques to improve the performance of hive queries.
Developed generic wrapper scripts that accept jobs and process them in MapReduce, Impala, and Spark.
Developed a generic Spark SQL execution framework in Scala that accepts Hive SQL and executes it from an HQL file.
Develop, build, and implement unique enterprise GIS apps and data integration solutions.
Was involved in setting up of Apache airflow service in GCP
Converted Hive queries and MapReduce jobs to Spark, improving performance.
Developed Spark jobs to identify the claim IDs that are eligible for pre-pay claims.
Developed Oozie workflows for Sqoop, Hive, Spark, and shell scripts, and automated them using the Oozie coordinator.
Developed a web service application in Python that integrated with Apache HBase and SQL Server.
Designed AWS CloudFormation templates to create VPCs, subnets, and NAT to ensure successful deployment of web applications and database templates.
Created S3 buckets, managed S3 bucket policies, and utilized S3 and Glacier for storage and backup on AWS.
Design and Develop ETL Processes in AWS Glue to migrate Campaign data from external sources like S3, ORC/Parquet/Text Files into AWS Redshift.
Created GCP BigQuery authorized views for row-level security and for exposing data to other teams.
Data Extraction, aggregations and consolidation of data within AWS Glue using PySpark.
Wrote AWS Lambda functions to extract data from an API and load it into DynamoDB (see the illustrative sketch at the end of this section).
Performed manual testing on the Hive scripts and Spark jobs.
Developed new plugins to support new Terraform capabilities and used Terraform and CloudFormation to manage and maintain EC2 instances.
Maintained different versions of code using Bitbucket.
Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
Environment: Apache Hadoop, Cloudera, AWS, S3, Redshift, GCP, MapReduce, Hive, Sqoop, Apache Spark, Java, Oozie, Oracle, MySQL, Scala, Bitbucket, Bamboo, Jira, and UNIX Shell Scripting.
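A minimal sketch of the Lambda-to-DynamoDB load referenced in this section; the API endpoint, table name, and record shape are assumptions, and it presumes each record already carries the table's key attributes.

    # Minimal sketch: an AWS Lambda handler that pulls records from an HTTP API
    # and writes them to DynamoDB. Endpoint and table name are assumed values.
    import json
    import urllib.request

    import boto3

    TABLE_NAME = "campaign_records"                  # assumed DynamoDB table
    API_URL = "https://api.example.com/records"      # assumed API endpoint

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table(TABLE_NAME)

    def lambda_handler(event, context):
        # Pull the payload from the source API.
        with urllib.request.urlopen(API_URL) as response:
            records = json.loads(response.read())

        # batch_writer handles chunking into DynamoDB batch-write requests;
        # each record is assumed to include the table's partition key.
        with table.batch_writer() as batch:
            for record in records:
                batch.put_item(Item=record)

        return {"statusCode": 200, "loaded": len(records)}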
Bank of America –Charlotte, NC
Sr. Data Engineer August 2018 – April 2020
Responsibilities:
Experience in implementing advanced procedures like text analytics using MapReduce, Pig, and Hive, and processing with the in-memory computing capabilities of Apache Spark written in Scala.
Experience in importing data from various data sources like Mainframes, Teradata, Oracle, and Netezza using Sqoop and SFTP; performed transformations using Hive, Pig, and Spark and loaded data into HDFS.
Worked extensively on Spark and the Hadoop/HDFS API, MapReduce, Hive, Pig, Sqoop, Oozie, etc.
Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded data into HDFS using Sqoop.
Developed PIG scripts for source data validation and transformation.
Analyzed Azure Data Factory and Azure Databricks to build a new ETL process in Azure.
Converted legacy reports from SAS, Looker, Access, Excel, and SSRS into Azure BI and Tableau.
Involved in creating Hive tables, loading with data and writing hive queries to process the data.
Worked on partitioning and bucketing in Hive tables and set tuning parameters to improve performance (see the illustrative sketch at the end of this section).
Developed complex SQL/U-SQL/PySpark code for data engineering pipelines in Azure Data Lake Analytics and Azure Data Factory.
Developed Pig and Hive UDFs per business use cases.
Experienced in using Impala for ad-hoc queries.
Experienced working with different file formats - Avro, Parquet, ORC and JSON.
Experience in using Gzip, LZO, Snappy and Bzip2 compressions.
Experienced in using NoSQL databases like HBase.
Experience in Oozie workflow scheduler template to manage various jobs like Sqoop, MR, Pig, Hive, Shell scripts, etc.
Experience in using Spark API over Hadoop YARN to perform analytics on data.
Migrated Pig scripts and MapReduce jobs into the Spark DataFrames API and Spark SQL to improve performance.
Experienced in implementing Spark RDD transformations and actions for business analysis.
Extensively worked on Spark DataFrames, Spark data sources, Spark SQL, and Streaming using Scala.
Environment: Cloudera Distribution, Hadoop, Map Reduce, Pig, Hive, Sqoop, Oozie, Scala, MySQL, SQL Server and Unix
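A minimal PySpark sketch of the Hive partitioning and bucketing approach described above; the database, table, column, and path names are illustrative assumptions.

    # Minimal sketch: write a Spark DataFrame as a partitioned, bucketed Hive table.
    # Table, column, and path names are illustrative placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("partitioned_bucketed_write")
        .enableHiveSupport()
        .getOrCreate()
    )

    df = spark.read.parquet("hdfs:///data/staging/transactions")  # assumed input

    (
        df.write
        .partitionBy("load_date")          # prune partitions on date filters
        .bucketBy(16, "account_id")        # co-locate rows by account for joins
        .sortBy("account_id")
        .format("parquet")
        .mode("overwrite")
        .saveAsTable("analytics.transactions")   # assumed Hive database.table
    )

Note that bucketBy/sortBy require saveAsTable, so the table must be managed through the Hive metastore.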
GEICO – Maryland, MD
Hadoop Developer May 2017 - July 2018
Responsibilities:
Worked with the Teradata analysis team to gather the business requirements.
Worked extensively on importing data using Sqoop and Flume.
Responsible for creating complex tables using Hive.
Created partitioned tables in Hive.
Transported data to HBase using Hive.
Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
Involved in source system analysis, data analysis, data modeling to ETL (Extract, Transform and Load).
Experience with HIVE DDLs and Hive Query language (HQLs).
Worked on dashboards that internally use Hive queries to perform analytics on structured, Avro, and JSON data.
Developed data acquisition in Python using pandas, NumPy, and file I/O (see the illustrative sketch at the end of this section).
Imported real-time data into Hadoop using Kafka and Flink, and implemented a daily Oozie job.
Wrote multiple MapReduce procedures to power data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats.
Handling structured and unstructured data and applying ETL processes.
Develop Hive queries for the analysts.
Prepare Developer (Unit) Test cases and execute Developer Testing.
Create/Modify shell scripts for scheduling various data cleansing scripts and ETL loading process.
Supported and assisted QA engineers in understanding, testing, and troubleshooting.
Wrote build scripts using Ant and participated in the deployment of one or more production systems.
Production Rollout Support which includes monitoring the solution post go-live and resolving any issues that are discovered by the client and client services teams.
Documented operational problems, following standards and procedures, using the JIRA software reporting tool.
Environment: Apache Hadoop, Java, Flat files, Oracle, MySQL, UNIX, Sqoop, Hive, Oozie.
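A minimal sketch of the pandas/NumPy data acquisition step mentioned in this section; the file path, column names, and cleansing rules are illustrative assumptions.

    # Minimal sketch: load a CSV with pandas, apply basic cleansing, and aggregate.
    # Path, columns, and sentinel values are hypothetical placeholders.
    import numpy as np
    import pandas as pd

    def load_claims(path="claims.csv"):
        df = pd.read_csv(path)

        # Basic cleansing: drop fully empty rows and normalize column names.
        df = df.dropna(how="all")
        df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

        # Replace sentinel values with NaN so aggregations ignore them.
        df = df.replace({-999: np.nan})

        # Simple aggregation: claim count and average amount per policy.
        summary = df.groupby("policy_id").agg(
            claim_count=("claim_id", "count"),
            avg_amount=("claim_amount", "mean"),
        )
        return df, summary

    if __name__ == "__main__":
        cleaned, summary = load_claims()
        print(summary.head())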
HDFC Bank, India
Software Developer May 2014 - February 2017
Responsibilities:
Involved in the full software development life cycle of the project from analysis and design to testing and deployment.
Involved in preparing use-case diagrams, sequence diagrams and class diagrams using Rational Rose, UML.
Extensive use of core Java Collections, Generics, Exception Handling, and Design Patterns for functionality, such as portfolio summary and user information.
Designed E-R diagrams and relationships among database tables, and created tables with low coupling.
Involved in writing complex SQL queries, Stored Procedures, Triggers to access the data from Relational database.
Good knowledge of data warehousing, relational databases, and the IBM DataStage ETL tool.
Developed ETL jobs using IBM DataStage.
Environment: Core Java, JDBC, Web Services, XML, JavaScript, CSS, HTML, PL/SQL, UNIX, Servlets, Oracle 10g, Eclipse.