Professional Summary
●* years of experience in Data Analysis, Data Engineering, and Data Warehousing on cloud platforms such as GCP, AWS, and Azure.
●Extensive experience developing data engineering pipelines using Python, SQL, and Spark on platforms such as Azure and Databricks.
●Experience building data lakes and data warehouses using Azure Data Lake Storage, ADF Data Flows, ADF Pipelines, and Azure Synapse.
●Extensively worked on the Databricks platform using components such as the Databricks CLI, Databricks Jobs and Workflows, Databricks Secrets, Databricks Compute, and Databricks Notebooks.
●Expertise in Data Engineering and Data Analysis using extensive automation and metadata-driven development (see the sketch after this list).
●Extensively wrote SQL queries handling joins, aggregations, sorting, ranking, etc. in databases such as PostgreSQL, Azure SQL, Azure Cosmos DB, Oracle, Azure Synapse, and Databricks.
●Good experience in Test-Driven Development, Unit Testing, and Data Validation.
●Well versed in Continuous Integration and Continuous Deployment (CI/CD) and code refactoring techniques.
●Experience in development methodologies such as Agile and Waterfall.
●Provided production support and bug fixes in production environments.
●Experience using IDE tools such as Visual Studio Code for Python-based application development, pgAdmin and Toad for running SQL queries, and Git-based tools for version control.
●Excellent technical, analytical, and problem-solving skills, strict attention to detail, and the ability to work independently or within a team environment.
●Excellent communication skills, strong team player, disciplined, and an adaptive learner.
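Illustrative sketch of the metadata-driven approach noted above, where table-level details live in a config list rather than in code; all paths and table names are hypothetical assumptions, and a Databricks-style environment with the Delta format is assumed:

# Metadata-driven loader: each source is described by metadata, not hard-coded logic.
# All paths and table names below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("metadata-driven-loader").getOrCreate()
spark.sql("CREATE DATABASE IF NOT EXISTS bronze")

# In practice this metadata typically comes from a control table or a JSON/YAML file.
SOURCES = [
    {"path": "/mnt/raw/customers.csv", "format": "csv", "target": "bronze.customers"},
    {"path": "/mnt/raw/orders.json", "format": "json", "target": "bronze.orders"},
]

for entry in SOURCES:
    df = (
        spark.read.format(entry["format"])
        .option("header", "true")  # used by CSV, ignored by JSON
        .load(entry["path"])
    )
    # One generic write path handles every source listed in the metadata.
    df.write.format("delta").mode("overwrite").saveAsTable(entry["target"])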
Technical Skills
Programming Languages
Python, Java
Data Processing Frameworks
Pandas, Spark, SQL
Cloud Technologies
Azure, GCP, AWS
Databases
MySQL, PostgreSQL, Azure SQL (SQL Server), Azure Cosmos DB
Data Warehousing
Azure Synapse Analytics, Databricks SQL Warehouse
Web Technologies
Flask, HTML, CSS, JavaScript, SOAP, REST
IDEs and Tools
Visual Studio Code, pgAdmin, Toad, Git based tools
Education:
Bachelor's in Computer Science and Engineering, GPREC, India (2015)
Professional Experience:
American Airlines, Dallas, TX Feb 2021 – Present
Role: Data Engineer
American Airlines is a major US-based airline. It is the world's largest airline measured by fleet size, scheduled passengers carried, and revenue passenger miles. I am part of the ticketing management team, where I am actively involved in building data pipelines using technologies such as Azure data analytics services and Azure Databricks.
Responsibilities:
●Developed and maintained end-to-end ETL/data engineering pipelines to process large-scale data on Azure as the cloud platform.
●Developed data engineering pipelines utilizing Python, Spark, Databricks, Airflow, and other technologies.
●Researched and implemented various ADF components such as pipelines, activities, mapping data flows, datasets, linked services, triggers, and control flow.
●Built ETL pipelines using Azure Data Factory with Azure SQL as the source and Azure Synapse Analytics as the data warehouse.
●Built ETL pipelines using Databricks with Azure Data Lake Storage (Gen2) as the source and Azure Cosmos DB as the destination.
●Created parameterized ADF pipelines that run dynamically, and used Filter and Aggregate transformations to shape the data as per requirements.
●Extensively worked on data analysis using SQL-based queries in Azure SQL, Azure Databricks, and Azure Synapse Analytics.
●Built ADF data flows using Azure SQL and Azure Synapse Analytics.
●Orchestrated complex data pipelines using ADF; the pipelines include Copy Data activities, ADF Data Flow activities (with Azure SQL as the source and Azure Synapse Analytics as the target), ForEach loops for baseline loads, etc.
●Tuned the performance of ADF data flows and pipelines using custom integration runtimes and by reducing shuffle partitions (see the sketch after this list).
●Used Git (add, commit, push) to maintain code versions and collaborate with teammates.
●Performed extensive debugging, data validation, error handling, transformation, and data cleanup analysis on large datasets.
●Used CI/CD processes for application software integration and deployment with Git.
●Developed sample test cases and performed unit testing to validate the application against requirements, with proper documentation of the results.
●Followed Agile methodology with 2-week sprints, including backlog refinement and sprint planning.
●Served as a liaison between business users and the development team and created technical specifications based on business requirements.
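A minimal PySpark sketch of the shuffle-partition tuning and Filter/Aggregate-style transformations referenced above; the table paths, column names, and partition count are illustrative assumptions, not actual project values, and a Databricks-style environment with Delta available is assumed:

# Reduce shuffle partitions (default 200) to cut small-task overhead on modest volumes.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("ticketing-aggregation")
    .config("spark.sql.shuffle.partitions", "32")
    .getOrCreate()
)

# Hypothetical silver-layer table of issued tickets.
tickets = spark.read.format("delta").load("/mnt/datalake/silver/tickets")

daily_revenue = (
    tickets
    .filter(F.col("status") == "ISSUED")                     # Filter transformation
    .groupBy("route", F.to_date("issued_at").alias("day"))   # Aggregate transformation
    .agg(F.sum("fare_amount").alias("total_fare"))
)

daily_revenue.write.format("delta").mode("overwrite").save("/mnt/datalake/gold/daily_revenue")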
Excentus (now PDI Technologies), Dallas, TX Feb 2020 to Jan 2021
Role: Software Engineer
Excentus Corporation develops and designs fuel-pump controls and cross-marketing technology. The company offers PC-based fuel controllers that enable pay-at-the-pump capabilities, and cross-marketing technology that lets users implement loyalty programs as incentives for shoppers to return.
Responsibilities:
●Designed and implemented end-to-end data engineering pipelines for ERP and retail solutions using ADF Data Flows and Spark on Azure Databricks.
●Handled ingestion of data from various data sources into Azure Storage using ADF Data Flows.
●Created a PySpark-based application to convert data from one format to another, such as CSV to Parquet.
●Designed and developed PySpark applications to read CSV files and dynamically create the corresponding tables in Azure storage.
●Implemented ETL logic in PySpark and Spark SQL notebooks on Azure Databricks, and built the orchestrating workflow for it using ADF pipelines.
●Implemented data ingestion from source RDBMS databases such as PostgreSQL and Azure SQL using Spark over JDBC on Azure Databricks; the solution uses Databricks Secrets and PySpark (see the sketch at the end of this section).
●Developed the Spark SQL statements required to create databases and tables per the medallion architecture using different providers and file formats such as Delta, Parquet, CSV, and JSON.
●Developed and deployed scheduled Databricks Workflows/Jobs built on PySpark and Spark SQL notebooks, and orchestrated Databricks tasks using Databricks Jobs.
●Worked with Python collections and used Pandas to read, process, and analyze JSON files.
●Extensively wrote queries for ad hoc analysis of data based on business requirements.
●Performed unit testing and data validation by running basic to complex SQL queries, and automated data validations using Python.
●Tuned long-running Spark SQL queries using techniques such as partitioning.
●Performed extensive debugging, data validation, error handling, transformation, and data cleanup analysis on large datasets.
●Used Git for version control with colleagues.
●Actively worked with the business to translate requirements into technical specifications and coordinated with offshore teams.
Environment: PySpark, Python, SQL, Spark, Databricks, ADF, Databricks Jobs and Workflows, Git, Databricks Secrets
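A minimal sketch of the JDBC ingestion with Databricks Secrets described above; the secret scope, key, host, and table names are illustrative assumptions (in a Databricks notebook, spark and dbutils are provided automatically):

# Pull the database password from a Databricks secret scope instead of hard-coding it.
jdbc_password = dbutils.secrets.get(scope="ingestion-scope", key="postgres-password")

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://example-host:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", jdbc_password)
    .option("fetchsize", "10000")
    .load()
)

# Land the raw extract in the bronze layer of the medallion architecture as Delta.
orders.write.format("delta").mode("append").save("/mnt/datalake/bronze/orders")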
Bean Infosystems, India Jan 2017 – Jan 2020
Role: Data Engineer
Bean Infosystems is a leading technology development company delivering an array of cutting-edge technology solutions and products. It provides a full range of software development services to help clients address their most important business and operational challenges. At Bean Infosystems, I worked on a project to build an end-to-end application for a food chain industry client.
Responsibilities:
●Developed APIs for a backend bulk order management system to extract and transform order details.
●Created a Python-based application to convert data from one format to another, such as CSV to Parquet (see the sketch at the end of this section).
●Worked with the Gmail API and related services to retrieve data from received emails.
●Converted the data received via email into the required file formats and processed it.
●Worked with Python collections and used Pandas to read, process, and analyze JSON files.
●Extensively wrote queries for ad hoc analysis of data based on business requirements.
●Loaded data into non-production environments using database import tools and utilities.
●Used Git (add, commit, push) to maintain code versions and collaborate with teammates.
●Performed unit testing and data validation by running basic to complex SQL queries, and automated data validations using Python.
●Used CI/CD processes for application software integration and deployment with Git.
●Followed Agile methodology with 2-week sprints, including backlog refinement and sprint planning.
●Troubleshot and debugged database connectivity issues in tools such as Tableau, Power BI, and Toad for business analysts and other non-technical users.
●Served as a liaison between business users and the development team and created technical specifications based on business requirements.
●Acted as onsite coordinator between developers, testers, and business analysts spread across the globe.
Environment: Python, SQL, Gmail API, Databricks Secrets, Agile methodology, PySpark
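A minimal Python sketch of the CSV-to-Parquet conversion and Pandas-based JSON processing referenced above; the file names and columns are illustrative assumptions:

import pandas as pd

# Convert a CSV extract to Parquet for cheaper storage and faster downstream reads.
orders = pd.read_csv("orders.csv")
orders.to_parquet("orders.parquet", index=False)  # needs pyarrow or fastparquet installed

# Read a line-delimited JSON file pulled from email attachments, clean it, and summarize.
events = pd.read_json("order_events.json", lines=True)
events["order_total"] = events["order_total"].fillna(0)
print(events.groupby("store_id")["order_total"].sum().head())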
Bibox Labs (Evobi Automations Pvt Ltd), India April 2015 – Dec 2016
Role: Software Developer and Tinkering Mentor
Responsibilities:
●Created a Python-based application to convert data from one format to another, such as CSV to Parquet.
●Worked with Python collections and used Pandas to read, process, and analyze JSON files.
●Extensively wrote queries for ad hoc analysis of data based on business requirements.
●Followed Agile methodology with 2-week sprints, including backlog refinement and sprint planning.
●Performed extensive debugging, data validation, error handling, transformation, and data cleanup analysis on large datasets.
●Built IoT projects using microprocessors and microcontrollers.
●Guided students in building IoT projects using Arduino and Raspberry Pi.
●Worked with humanoid robots and programmed them in Python to perform certain actions.
●Used Git for version control with colleagues.