Data Engineer Google Cloud

Location:

Dallas, TX

Salary:

90000

Posted:

October 21, 2023


Resume:

Ramarakshit Biradar Patil Dallas, TX +1-945-***-**** *************@*****.***

linkedin.com/in/ramarakshit-biradar-patil-2a3a09135 | https://github.com/Rakshith816?tab=repositories

EDUCATION

The University of Texas at Dallas | M.S., Business Analytics | GPA 3.66 | Dec 2023

PES Institute of Technology, Bengaluru | B.E., Electronics and Communication | GPA 3.9 | May 2018

CERTIFICATIONS & TECHNICAL SKILLS

Certifications: AWS Certified Solutions Architect – Associate; Microsoft Certified: Azure Data Engineer Associate (DP-203)

Programming languages: SQL, Python (pandas, NumPy, scikit-learn), Spark, PySpark, R, Java, C (basics), Unix/Linux shell scripting

Business intelligence tools: Tableau, Power BI, Amazon QuickSight, MS Excel (financial functions, VLOOKUP/HLOOKUP, PivotTables, VBA)

Cloud technologies: Snowflake, AWS Lambda, Amazon Redshift, Amazon EC2, Amazon S3, Amazon RDS, Amazon Aurora, AWS Glue, AWS data lakes, Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Databricks, Azure Data Factory, Azure PowerShell, Google Cloud Platform

Big data technologies: HDFS, Apache Hadoop, Apache Airflow, Apache Kafka, Apache Spark, Apache Hive, MapReduce

Databases: MySQL, PostgreSQL, HBase, Oracle Database, Cassandra, MongoDB, Teradata

Data modeling and analysis: snowflake schema, star schema, data analysis, machine learning algorithms, data modeling

Soft skills: collaboration, decision-making, problem-solving, communication, innovation, attention to detail

PROFESSIONAL EXPERIENCE

Data Engineer | NG Talent Tech Group | ETL, SQL, Python, Spark, Tableau | Jan 2023 – May 2023

• Built a scalable ETL pipeline that ingested transactional and event data from a web app with thousands of daily active users into a data warehouse, saving $30K–$40K annually in external vendor costs.

• Automated the ETL process across millions of rows of data, reducing manual workload.

• Achieved 100% automation, eliminating manual errors and dependencies.

• Communicated progress at each project stage to managers and analysts, improving efficiency and transparency.

• Used Spark and Python to speed up ingestion and processing of large streaming datasets by 60%.

Data Engineer II | Temenos India Pvt. Ltd. | SQL, Tableau, ETL, T24, AWS Glue | Jul 2020 – Nov 2021

• Mentored four new data engineers, improving their efficiency and cutting their training time by one week.

• Designed and implemented optimized AWS Glue data processing pipelines for financial applications such as securities and derivatives trades, reducing manual intervention and ensuring error-free processing.

• Scheduled and managed daily close-of-business (COB) operations using cron jobs on Amazon EC2 instances.

Data Engineer I | Temenos India Pvt. Ltd. | S3, Lambda, Glue Crawler, Athena, Glue Data Catalog, IAM roles | Jul 2018 – Jul 2020

• Orchestrated AWS Lambda functions to convert JSON to Parquet, automating storage of cleaned data on S3, and reduced transformation time with Lambda triggers.

• Set up AWS Glue crawlers to build a Glue Data Catalog so that the transformed data could be queried in Athena for insights.

• Developed a Spark ETL job to convert CSV files to Parquet, significantly reducing processing time.

• Built an ETL pipeline job for the reporting layer that joined multiple processed tables, partitioned the data, and registered it in a new Glue Data Catalog; stored the enriched data in analytical S3 storage and accessed it through Amazon QuickSight for reporting visuals.

Data Engineer Intern | Temenos India Pvt. Ltd. | SQL, Excel | Dec 2017 – Jul 2018

• Created and maintained documentation for data pipelines and workflows, ensuring smooth handovers.

• Performed data transformation, filtering, and aggregation in SQL to enrich and prepare data for analysis.

• Learned big data best practices and architectural principles by pairing with senior data engineers.

PROJECTS

Azure Synapse SQL Pool – Implementing PolyBase | ADLS Gen2, Azure Synapse SQL pool, T-SQL, SQL, PolyBase | May 2023

• Used PolyBase in an Azure Synapse SQL pool to run SQL queries against data stored in Azure Data Lake Storage, Azure Blob Storage, and HDFS.

• Created an Azure Synapse workspace, an ADLS Gen2 account, and an Azure Synapse SQL pool in Synapse Studio; configured PolyBase with T-SQL; and retrieved data from the ADLS storage account using SQL queries.

End-to-End PySpark Real-Time Data Pipeline | Spark, PostgreSQL, Python, HDFS, Azure, Google Cloud, AWS, PyCharm | Oct 2022

• Implemented a real-time, end-to-end PySpark ETL pipeline in the U.S. healthcare domain, generating comprehensive reports for healthcare insights and prescriber analysis.

• Extracted data from Google Cloud and transferred it to HDFS to optimize performance, expose file information for analyzing corrupted blocks, and reduce request round-trip latency by 10 ms.

• Incorporated data quality checks and error handling, used Spark for data transformation and efficient report analysis, and stored the reports in Amazon S3 and Azure Blob Storage.

ACCOMPLISHMENTS

• Placed in the top 5 of 170 teams in the University of Texas at Dallas data visualization competition, using tools such as Tableau to create impactful visualizations of Texas gas emission data that communicated patterns and insights to the audience.

ADDITIONAL INFORMATION: Eligible to work full-time in the U.S. under OPT for 36 months.
