Senior Data Engineer

Location:
Dallas, TX, 75215
Posted:
November 17, 2023


Resume:

PRANAV M

SENIOR DATA ENGINEER

Phone 469-***-****

E-mail ad08mp@r.postjobfree.com

LinkedIn https://www.linkedin.com/in/mpranav761/

• Around 9 years of IT experience as a Data Engineer creating and maintaining data pipelines on Azure and AWS clouds, with a strong background in Big Data technologies.

• Built real-time pipelines using AWS and Azure services such as AWS Glue, AWS Kinesis, Azure Stream Analytics, Kafka, and Spark Streaming.

• Worked proficiently with visualization tools like Power BI, Tableau, and Looker.

• Expertise working with Big Data technologies such as Hadoop, Hive, Kafka, Scala, Sqoop, and Spark.

• Familiar with a range of SQL and NoSQL databases.

• Hands-on experience with CI/CD and version control tools such as Git, Jenkins, and Azure DevOps.

• Experienced in scheduling Airflow jobs using cron syntax and writing Python DAG scripts.

• Led the migration of a legacy on-premises data warehouse to Snowflake, resulting in a 50% reduction in query response times and operational costs.

• Proficient in Python, with proven expertise in designing, implementing, and maintaining robust software solutions.

• Involved in all phases of the Software Development Life Cycle, including design, analysis, specification, and testing, using Agile methodologies.

Skills

AWS: EC2, S3, DBT, Glue, EMR, Kinesis, CodePipeline, CodeDeploy, SQS, Athena, RDS, Redshift, CloudWatch.

Azure: Azure Data Factory, Azure Databricks, Data Lake Storage, Azure SQL Database, Azure DevOps, Azure Key Vault, Synapse Analytics, Azure Blob Storage, AKS.

Big Data: Hadoop, Hive, Kafka, Sqoop, Flume.

Languages: Python, Java, C

Databases: Snowflake, DynamoDB, MongoDB, Cosmos DB, MySQL, Oracle, PostgreSQL, Teradata.

Other: Git, GitHub, Jenkins, Apache Airflow, Snowflake, Splunk, EHR, HL.

Certifications

2023-03 AWS Certified Data Analytics – Specialty

2023-07 AWS Certified Solutions Architect – Associate

Work History

2022-03 -

Current

Senior Data Engineer

ULab Systems, Memphis, TN

ULab Systems is a medical equipment manufacturing company that manufactures aligners for customers. Our team's objective was to develop a data pipeline for payments and doctor appointment details, and a streaming pipeline for customer data.

Responsibilities:

• Collected patient data for manufactured aligners from S3 and created an AWS Glue job to transform the data through the batch pipeline (a minimal Glue sketch follows this section's tech stack).

• Used an AWS Glue crawler to catalog patient data in the S3 bucket, performed the transformations using AWS Glue, and converted CSV and JSON files into relational datasets.

• Performed data cleaning, data processing, and data mapping transformations using PySpark scripts.

• Created a serverless architecture using Lambda with Glue and Kinesis.

• Stored transformed data in Snowflake and built visualizations and reports in Tableau for a dynamic dashboard using Snowflake data.

• Experience working with AWS services like EC2, RDS, and Lambda to build reliable and scalable applications.

• Led the migration of a legacy on-premises data warehouse to Snowflake, resulting in a 50% reduction in query response times and operational costs.

• Familiarity with AWS database migration services, including schema and data migration from on-premises databases to RDS instances.

• Designed and implemented data warehousing solutions using Amazon Redshift, optimizing schema design and query performance for large datasets.

• Used Apache Airflow DAGs to schedule the pipelines from AWS S3 to Snowflake (a minimal DAG sketch also follows the tech stack below).

• Implemented a CI/CD pipeline for code deployment using Jenkins.

• Proficient in administering Amazon RDS instances (MySQL, PostgreSQL, SQL Server) for optimal performance, security, and availability.

• Designed and implemented high-volume data processing pipelines using an Oracle 19c database and Oracle data integration tools.

• Utilized Agile methodology for iterative application development, weekly sprints, and customer reporting backlogs.

Tech Stack: S3, EC2, AWS Glue, Glue Crawler, PySpark, Lambda, Kinesis, Snowflake, RDS, Redshift, Tableau, Airflow, GitHub, DAGs, Batch, CI/CD, CloudWatch, Oracle 19c, EMR, Jenkins.
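
A minimal sketch of the Glue batch step described in the bullets above, assuming hypothetical catalog database, table, column, and bucket names rather than actual project values:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql.functions import col

# Standard Glue job bootstrap
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the crawled patient data from the Glue Data Catalog
# (database, table, and bucket names below are illustrative placeholders)
patients = glue_context.create_dynamic_frame.from_catalog(
    database="patients_db", table_name="raw_patient_events"
).toDF()

# Basic cleaning / mapping before loading downstream
cleaned = (
    patients.dropDuplicates(["patient_id"])
    .withColumnRenamed("appt_dt", "appointment_date")
    .filter(col("patient_id").isNotNull())
)

# Write relational-style Parquet back to S3 for the Snowflake load step
cleaned.write.mode("overwrite").parquet("s3://example-curated-bucket/patients/")

job.commit()
```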
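
The Airflow scheduling bullet can likewise be sketched as a small DAG; the DAG id, cron expression, Snowflake connection id, stage, and table names are assumptions for illustration only:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="s3_to_snowflake_payments",          # placeholder DAG id
    schedule_interval="0 6 * * *",              # daily at 06:00, expressed in cron syntax
    start_date=datetime(2022, 3, 1),
    catchup=False,
) as dag:
    copy_payments = SnowflakeOperator(
        task_id="copy_payments_from_s3",
        snowflake_conn_id="snowflake_default",  # assumes a configured Snowflake connection
        sql="""
            COPY INTO analytics.payments
            FROM @payments_s3_stage
            FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
        """,
    )
```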

2021-01 -

2022-02

Data Engineer II

Texas Fair Plan Association, Austin, TX

Texas Fair Plan Association is an American association that provides residential property insurance to Texas policyholders. The "TFPAIC" project covers property insurance workloads spanning fraud detection, claim data processing, and geographic information systems, and implemented a set of predefined pipeline functionalities to reduce redundant customer claims.

Responsibilities:

• Designed and implemented an ETL pipeline using Azure Databricks to process and store large amounts of property insurance data from Azure Data Lake Storage (a minimal sketch follows this section's tech stack).

• Developed Apache Spark ETL application using PySpark and extracted property insurance data from different sources like Vertica and SQL Server.

• Used PySpark to clean, transform, and aggregate data with appropriate file and compression formats, as per requirements, before writing data to Azure Data Lake Storage.

• Migrated property insurance data from Azure Data Lake Storage to Azure SQL Database using Azure Data Factory and Azure Databricks notebooks.

• Used Power BI to generate dynamic visualizations for business purposes.

• Created and maintained an optimal data pipeline architecture in Azure using Azure Data Factory.

• Expertise in troubleshooting and resolving issues in a Splunk environment.

• Developed CI/CD pipelines to filter the data based on the application and deploy the application to a higher environment.

• Scheduled pipelines using cron syntax in Airflow and created the workflow using Azure Logic Apps.

• Expertise working with Azure Kubernetes Service (AKS) for integrating CI/CD pipelines.

• Experienced in creating and configuring Kubernetes clusters on AKS using the Azure Portal and Azure CLI.

• Proficiency in writing queries and manipulating data in NoSQL databases using the appropriate APIs and Tools.

• Implemented MapReduce jobs in Python for data cleaning and data processing.

• Worked in SCRUM methodology for designing, analyzing, and developing the pipelines, and testing the use cases for the business.

Tech Stack: Azure Data Factory, Azure Data Lake, Azure Databricks, PySpark, Python, Power BI, MapReduce, Airflow, Azure Logic Apps, NoSQL, cron, Azure CLI, Azure Portal, AKS, Kubernetes clusters.
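
A minimal sketch of the Databricks ETL path from Azure Data Lake Storage to Azure SQL Database; the storage account, container, column, table, and credential names are hypothetical, and in practice the JDBC password would come from Azure Key Vault or a Databricks secret scope rather than being hard-coded:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

# On Databricks a SparkSession named `spark` already exists; built here for self-containment.
spark = SparkSession.builder.appName("claims_etl").getOrCreate()

# Read raw claim files from ADLS Gen2 (storage account, container, and path are placeholders)
claims = spark.read.option("header", True).csv(
    "abfss://raw@examplestorage.dfs.core.windows.net/claims/*.csv"
)

# Clean and type the data before loading it into Azure SQL Database
curated = (
    claims.dropDuplicates(["claim_id"])
    .withColumn("claim_date", to_date(col("claim_date")))
    .filter(col("policy_id").isNotNull())
)

# Write over JDBC; connection details below are illustrative placeholders
(
    curated.write.format("jdbc")
    .option("url", "jdbc:sqlserver://example-sql.database.windows.net:1433;database=claims")
    .option("dbtable", "dbo.curated_claims")
    .option("user", "etl_user")
    .option("password", "<from-key-vault>")
    .mode("append")
    .save()
)
```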

2018-01 -

2020-12

Data Engineer

VPhrase Analytics Solutions Pvt. Ltd, India

VPhrase Analytics Solutions Pvt. Ltd is an India-based company that provides business intelligence solutions using NLP and Big Data technologies. Implemented Hadoop methodologies to reduce data misinterpretation and derive meaningful insights from complex datasets.

Responsibilities:

• Installed/Configured/Maintained Apache Hadoop clusters for application development based on the requirements.

• Extracted files from Hadoop and dropped them on an hourly basis into S3.

• Wrote Pig Scripts to generate MapReduce jobs and performed ELT procedures on the data in HDFS.

• Proficient with Elasticsearch Query DSL and related search technologies, such as Lucene and Kibana.

• Used Sqoop to transfer data between different RDBMS sources and HDFS.

• Developed real-time data ingestion application using Flume and Kafka.

• Wrote code and created Hive jobs to parse logs and structure them in tabular format for effective querying (see the log-parsing sketch after this section's tech stack).

• Understood how NoSQL databases fit into the larger context of modern software architectures, including microservices, and cloud computing.

• Experience with integrating NoSQL databases with other technologies such as Apache Kafka, Apache Spark, and Elasticsearch.

• Experienced in designing and implementing database solutions using PostgreSQL, including database schema design, query optimization, and performance tuning for customers' data.

• Skilled in writing SQL queries and creating models using DBT's templating language.

• Used Python for loading data from different sources to a data warehouse to perform some data aggregations for business intelligence.

• Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats.

• Used Amazon QuickSight for analysis of applications for business purposes.

Tech Stack: Hadoop, Hive, AWS, Kafka, Python, PySpark, Sqoop, Flume, MapReduce, QuickSight, Splunk, NoSQL, Elasticsearch, ELK, PostgreSQL, DBT.
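
A minimal sketch of the log-parsing and Spark SQL aggregation flow described above; the HDFS path, log pattern, and table names are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

# Hive support lets the parsed logs be saved as a queryable table
spark = (
    SparkSession.builder.appName("log_parser").enableHiveSupport().getOrCreate()
)

# Raw application logs in HDFS (path and log layout are hypothetical)
logs = spark.read.text("hdfs:///data/app_logs/*.log")

pattern = r'^(\S+) (\S+) \[(.*?)\] "(\S+) (\S+).*" (\d{3})'
parsed = logs.select(
    regexp_extract("value", pattern, 1).alias("host"),
    regexp_extract("value", pattern, 3).alias("ts"),
    regexp_extract("value", pattern, 4).alias("method"),
    regexp_extract("value", pattern, 5).alias("endpoint"),
    regexp_extract("value", pattern, 6).cast("int").alias("status"),
)

# Structure the logs as a view and aggregate with Spark SQL
parsed.createOrReplaceTempView("parsed_logs")
endpoint_errors = spark.sql(
    """
    SELECT endpoint, COUNT(*) AS error_count
    FROM parsed_logs
    WHERE status >= 500
    GROUP BY endpoint
    """
)

# Persist as a Hive table (assumes the target database already exists)
endpoint_errors.write.mode("overwrite").saveAsTable("analytics.endpoint_errors")
```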

2015-01 -

2017-12

Python Developer

E-commerce, India

LimeRoad is an India-based e-commerce company that provides customers with products at the best prices. Implemented Python scripts to build the application and stored the data in a data warehouse.

Responsibilities:

• Developed custom reports using HTML, Python, and MySQL and also developed monitoring tools using Python.

• Used the NumPy, SciPy, and Matplotlib libraries for n-dimensional representation of data and plotting graphs (a small plotting sketch follows this section's tech stack).

• Experienced working with PostgreSQL, for developing SQL reports that meet client expectations for the application.

• Worked on Python OpenStack APIs and developed tools using Python, shell scripting, and XML to automate some of the menial tasks.

• Designed and developed the UI of the website using HTML5, XHTML, AJAX, CSS3, and JavaScript.

• Experienced working with SQL, NoSQL, MongoDB, DynamoDB, and PostgreSQL.

• Worked with JSON-based REST web services and Amazon Web Services (AWS).

• Involved in Sprint planning sessions and participated in the daily Agile SCRUM meetings and monitored JIRA (Agile).

Tech Stack: Python, MySQL, HTML5, JavaScript, Jira, XML, Shell scripting, AJAX, CSS3, NoSQL, PostgreSQL, Oracle.
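
A small illustration of the kind of NumPy/Matplotlib reporting plot described above, using synthetic data in place of the actual order figures:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic daily order counts standing in for data pulled from the reporting database
days = np.arange(1, 31)
orders = np.random.poisson(lam=120, size=days.size)

# 7-day moving average to smooth the series
window = 7
moving_avg = np.convolve(orders, np.ones(window) / window, mode="valid")

plt.figure(figsize=(8, 4))
plt.plot(days, orders, label="daily orders")
plt.plot(days[window - 1:], moving_avg, label="7-day average")
plt.xlabel("Day of month")
plt.ylabel("Orders")
plt.title("Order volume report")
plt.legend()
plt.savefig("order_report.png")
```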

Education

Master of Science: Computer and Information Sciences, Western Illinois University - Macomb, IL

Bachelor of Technology: Computer Science and Engineering, BML Munjal University - HR, India


