
Data Engineer Solutions Architect

Location:
Houston, TX
Posted:
October 29, 2024


Resume:

AKSHAY SINGH THAKUR

New York, USA | Ph: +1-346-***-**** | LinkedIn | Email: *******************@*****.***

SUMMARY

With over 5 years of IT experience, I specialize in Big Data technologies, including Hadoop (Cloudera, Hortonworks), AWS, Azure, and GCP. I have extensive expertise in Hadoop/Spark environments (HDFS, YARN, Kafka) and cloud-native solutions such as AWS EMR and GCP Dataproc. Skilled in Scala, Java, Python, and ETL tools (AWS Glue, SSIS), I excel in Spark Core, MLlib, and streaming technologies. I also have strong experience with RDBMS/NoSQL integration, cloud data services (AWS Redshift, Azure Data Lake, GCP BigQuery), and data visualization using Tableau, Power BI, and Looker.

SKILLS

Hadoop Distributions: Cloudera, Hortonworks

Hadoop Ecosystem: HDFS, Hive, Pig, YARN, Spark, Spark SQL, MapReduce, Kafka, Sqoop

Data Processing: Apache Spark, Spark MLlib, Spark Streaming, Spark GraphX

Machine Learning: Linear/Logistic Regression, Naïve Bayes, Decision Trees, Random Forest, KNN, SVM, Gradient Boosting, PCA, LDA, Time Series Analysis

Languages: Python, Scala, Java, SQL, PL/SQL, UNIX Shell Scripting

Operating System: Linux, Unix, Windows, Mac OS

RDBMS & NoSQL: Teradata, Oracle, DB2, SQL Server, MySQL, MongoDB, Cassandra, HBase

Deep Learning: TensorFlow, Keras, Neural Networks (CNN, RNN), NLP (SpaCy, NLTK, Gensim)

Cloud Databases: Amazon Redshift, Snowflake, PostgreSQL

Visualization Tools: Tableau, Power BI

Cloud Platform: AWS, Azure, GCP

Python Visualization: Matplotlib, Seaborn

ETL Tools: IBM InfoSphere, SQL Server Integration Services (SSIS), Apache Sqoop

Development: PyCharm, Jupyter Notebook, IntelliJ IDEA, Eclipse, Visual Studio

Orchestration: Apache Airflow, AWS Glue, ADF, Autosys

Version Control: Git, GitHub, JIRA

Certification: AWS Solutions Architect Associate

Other Tools: MS Office Suite, Google Analytics

WORK EXPERIENCE

DATA ENGINEER GOLDMAN SACHS GLOBAL NEW YORK JUN 2023 – Current

• Implemented Apache Airflow dynamic DAGs to retrieve data from multiple APIs and load it via Amazon S3 into Databricks Delta Live Tables using Auto Loader, and eventually into Redshift. Leveraged Databricks Unity Catalog to oversee data governance and security.

• Built scalable and fault-tolerant pipelines using Apache Kafka and Amazon Kinesis for real-time data ingestion into Redshift. Reduced data processing latency by 25% by incorporating parallel processing. Administered Prometheus to track pipeline performance metrics.

• Orchestrated an end-to-end migration from PostgreSQL to AWS Redshift through S3 using Talend, transferring 10 TB of data within a tight three-week deadline. Incorporated efficient data validation and mapping, securing 99% data accuracy alongside refined data governance.

• Maintained SCD Type 1 and Type 2 dimensions using Talend, accommodating historical changes and ensuring data integrity. Conducted performance tuning to resolve bottlenecks in the SCD implementation and reduced Talend job execution time by optimizing SCD processing.

• Utilized Apache Flink's windowing and event-time processing in conjunction with Databricks Delta Lake, and performed time-series analysis in Tableau to identify customer retention and churn trends, leading to a 10% reduction in churn rate.

• Implemented performance tuning techniques on Databricks to optimize query execution times. Reduced query response times by 40% through efficient data partitioning and caching strategies.

• Integrated AWS Lambda with Simple Queue Service (SQS) for data enrichment workflows, processing 100,000+ daily messages, and designed event-driven Lambda functions to boost user engagement by 20% through personalized real-time notifications via SNS.

• Gathered business requirements and worked closely with business users, project leaders, and developers. Created logical and physical data models for OLAP, using a star schema for fact and dimension tables, in ERwin.

• Collaborated with the development team to refactor SQL code: optimized queries by minimizing the use of joins and subqueries, analyzed query execution plans, and designed indexing strategies, significantly improving performance.

• Aligned efforts with the DevOps team and maintained highly scalable AWS environments spanning multiple availability zones using Jenkins, Terraform, and CloudFormation. Wrote Terraform scripts to spin up Docker and Kubernetes clusters and to configure CloudWatch alerts.

• Applied robust data security measures to protect personal and financial information, ensuring compliance with GDPR and HIPAA.

DATA ENGINEER COFORGES HYD, INDIA JUL 2018 – DEC 2021

• Built real-time streaming solutions using Apache Kafka, Azure Stream Analytics, and Google Pub/Sub for time-sensitive data ingestion, processing, and analysis.

• Managed Azure services like Azure VMs, Azure Blob Storage, Azure Synapse Analytics, and GCP services like BigQuery and Cloud Storage, enhancing scalability and data availability while reducing operational costs by 20%.

• Migrated data environments from Azure to GCP, improving system performance and reducing costs by 20%.

• Designed data lake architectures using Azure Data Lake Storage and Google Cloud Storage, supporting the efficient retrieval and processing of large datasets, enhancing data accessibility by 30%.

• Leveraged SSIS to extract data from APIs, flat files, and CSV files and load it into the healthcare database. Performed efficient mapping and validation of the data. Used parallel processing and batch loading to achieve efficient data loading operations.

• Deployed Apache Airflow on GCP to orchestrate ETL workflows, improving scheduling efficiency and reducing execution times by 30%.

• Led the deployment of an ML model for fraud detection, reducing fraudulent transactions by 15% using GCP AI Platform and real-time data processing. Mentored junior engineers, leading to a 30% improvement in onboarding speed and skill development.

EDUCATION

University of Houston Houston, Texas JAN 2022 - MAY 2023 Master of Science, Data Science

ICFAI University Hyderabad, India AUG 2014 - MAY 2018 Bachelor of Technology, Electronics and Communication Engineering


