Lenin Ch
*****.****@*****.*** +1-469-***-**** Plano, Texas www.linkedin/in/lenin-ch40
PROFESSIONAL SUMMARY:
Around 8 years of experience in IT as a Sr. Data Engineer with expertise in cloud platforms such as Azure and AWS, used to design, analyze, and develop ETL data pipelines. Experienced in developing both batch and real-time streaming data pipelines using cloud services, Python, and PySpark scripts.
Experienced in using AWS services such as IAM, EC2, Glue, Glue Jobs, Glue Crawlers, Redshift, Lambda, SQS, CloudWatch, CodeArtifact, Athena, CodeCommit, CodeBuild, RDS, and QuickSight.
Utilized Lambda to set up triggers and destinations, used SQS to monitor the pipeline, and CloudWatch to check logs and alarms.
Streamlined ETL data pipelines using Azure Data Factory to extract, transform, and load data from sources such as Azure Data Lake Storage, Azure SQL, and Azure Blob Storage.
Excellent understanding and knowledge of Azure services such as Azure Databricks, Azure Stream Analytics, Azure Synapse Analytics, Azure Log Analytics, Azure Security Center, Azure Event Hubs, HDInsight, Azure Logic Apps Triggers, and Azure Cosmos DB.
Built real-time data pipelines using Amazon Kinesis and Apache Kafka on AWS, and Kafka with Azure Stream Analytics on Azure.
Used Azure Security Center, Azure Log Analytics, and Splunk to monitor, automate, and secure the data pipelines.
Experienced in working with Hadoop components like HDFS, Map Reduce, Spark, HBase, Oozie, Pig, Flume, and Kafka.
Used Scala to connect to and retrieve data from NoSQL databases such as MongoDB, Cassandra, and HBase, as well as from Hive and Pig.
Analyzed SQL scripts and created solutions using Scala, and utilized SQL queries, stored procedures, triggers, and functions for MySQL databases.
Implemented partitioning and bucketing on Hive tables for Hive query optimization. Expertise in converting Hive/SQL queries to Spark transformations using Spark RDDs and PySpark.
Created DAGs in Airflow to schedule and run the jobs and automate the job process.
Experienced in using data visualization tools such as Amazon QuickSight, Tableau, and Power BI to generate and develop reports, dashboards, and worksheets.
Built and managed CI/CD pipelines using AWS CodePipeline and Azure DevOps.
Worked with Azure Boards, an Agile project management tool for tracking and managing tasks; its Kanban boards and sprint-planning features help the team work together effectively. Strong understanding of the Software Development Lifecycle (SDLC).
TECHNICAL SKILLS:
AWS Services: IAM, EC2, Glue, Glue Jobs, Glue Crawlers, Redshift, Lambda, SQS, CloudWatch, Amazon Aurora, Amazon QuickSight, CodePipeline
Azure Services: Azure Data Factory, Azure Databricks, Azure Stream Analytics, Azure Storage Accounts, Azure Synapse Analytics, Azure Log Analytics, Azure Security Center, Azure Event Hubs, HDInsight, Azure Logic Apps Triggers, Azure Cosmos DB
Big Data Technologies: Apache Hadoop, Hive, Sqoop, Spark RDDs, Scala, HDFS, MapReduce
Data Warehouses: Redshift, Azure Synapse Analytics
Programming Languages and Databases: Python, PySpark, SQL, Azure SQL, PostgreSQL, Cosmos DB, Linux, Bash, MongoDB, Cassandra, HBase
CI/CD Tools: CodePipeline, Azure DevOps
Other Technologies: Amazon QuickSight, Tableau, Power BI
PROFESSIONAL EXPERIENCE:
Client: Citibank, New Castle, Delaware. September 2022 – Present
Role: Sr. AWS Data Engineer
Citibank is an American multinational investment bank and financial services corporation. Our project deals with the banking sector, which handles large amounts of customer data. I work as a Sr. AWS Data Engineer on call-data analytics for banking-transaction issues, transforming real-time data received from APIs such as Member Account Information and Recent Transactions. Responsible for integrating data and building ETL data pipelines with both batch compute processing jobs and real-time processing jobs.
Using Spark, I created data processing jobs that read data from multiple sources, integrate and enrich the results, and load them into data warehouses.
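A minimal PySpark sketch of this kind of batch processing job; the bucket paths, column names, and aggregation logic are illustrative placeholders rather than the actual pipeline:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transaction-enrichment").getOrCreate()

# Read raw transactions and account reference data (bucket paths are illustrative)
transactions = spark.read.parquet("s3://example-raw-bucket/transactions/")
accounts = spark.read.parquet("s3://example-raw-bucket/accounts/")

# Integrate and enrich: join on account id and derive a per-account daily summary
enriched = (
    transactions.join(accounts, "account_id", "left")
    .withColumn("txn_date", F.to_date("txn_timestamp"))
    .groupBy("account_id", "txn_date")
    .agg(F.sum("amount").alias("daily_amount"), F.count(F.lit(1)).alias("txn_count"))
)

# Stage the result for the warehouse load (e.g. a Redshift COPY from S3)
enriched.write.mode("overwrite").parquet("s3://example-curated-bucket/daily_txn_summary/")
```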
Developed Spark applications to ingest batch data from RDBMS sources such as SQL databases and Teradata, and used Amazon Kinesis and Apache Kafka to ingest real-time streaming data into S3 buckets.
Utilized S3 buckets and Glacier for storage and backup on AWS, created EC2 instances to access data in S3 buckets, and used IAM roles to grant access to AWS services.
Created ETL pipelines using AWS Glue by extracting data from S3, integrating and transforming it, and storing the enriched data in Redshift.
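A simplified Glue job sketch of this S3-to-Redshift flow; the catalog database, table, connection names, and column mappings are placeholders, not the actual job definition:

```python
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext.getOrCreate())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the source dataset already cataloged by a Glue crawler (names are illustrative)
source = glueContext.create_dynamic_frame.from_catalog(
    database="example_raw_db", table_name="transactions"
)

# Light transformation: keep and retype the needed columns
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("txn_id", "string", "txn_id", "string"),
        ("account_id", "string", "account_id", "string"),
        ("amount", "double", "amount", "double"),
    ],
)

# Load the enriched data into Redshift through a Glue connection
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="example-redshift-connection",
    connection_options={"dbtable": "public.transactions_curated", "database": "analytics"},
    redshift_tmp_dir="s3://example-temp-bucket/glue-tmp/",
)
job.commit()
```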
Utilized Glue Jobs to build ETL data pipelines and Glue Crawlers to catalog incoming files as queryable datasets. Developed AWS Lambda functions with triggers and destinations to invoke the Glue job as soon as a new file is available in the inbound S3 bucket.
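A hedged sketch of that Lambda pattern, reacting to an S3 event notification and starting a Glue job via boto3; the job name and argument key are illustrative:

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Invoked by an S3 event notification when a new file lands in the inbound bucket."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Start the Glue ETL job and pass the new object as a job argument
        response = glue.start_job_run(
            JobName="example-inbound-etl-job",  # illustrative job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        print(f"Started Glue job run {response['JobRunId']} for s3://{bucket}/{key}")
    return {"status": "ok"}
```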
Utilized SQS (Simple Queue Service) to monitor the data pipeline and used CloudWatch to check the logs and alarms.
Performed SQL queries against databases built on platforms such as MySQL, PostgreSQL, Amazon Aurora, and MariaDB. Used Amazon Aurora, which provided up to five times the throughput of standard MySQL and three times that of standard PostgreSQL.
Developed dashboards and stories using Amazon QuickSight and worked on Tableau.
Created Airflow DAGs (Directed Acyclic Graphs) in Python to execute tasks in the specified order and to provide flexibility in scheduling and automating the ETL pipelines.
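An illustrative Airflow DAG showing this pattern of ordered tasks on a schedule; the DAG id, schedule, and task bodies are placeholders:

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    pass  # pull data from the source APIs / buckets (placeholder)

def transform():
    pass  # clean and enrich the extracted data (placeholder)

def load():
    pass  # load the result into the warehouse (placeholder)

default_args = {"retries": 2, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Tasks run in the specified order
    t_extract >> t_transform >> t_load
```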
Performed CI/CD operations with CodeCommit, CodeBuild, and CodePipeline.
Used CodeArtifact to create and manage packages in repositories and CodeCommit to track and manage the code.
Used CodeBuild for continuous integration, building artifacts and deploying them to production.
Strong understanding of Software Development Lifecycle (SDLC) and used Kanban boards for Agile Methodology.
Tech Stack: IAM, EC2, Glue, Glue Jobs, Glue Crawlers, Redshift, Lambda, SQS, CloudWatch, CodeArtifact, CodeCommit, CodeBuild, SQL, Amazon Aurora, MariaDB, PostgreSQL, Amazon QuickSight, Tableau, Airflow DAGs, CodePipeline.
Client: Barclays Group Services, Whippany, New Jersey. August 2021 – September 2022
Role: Azure Data Engineer
Barclays is a British universal bank that offers life and health insurance, mortgages, credit cards, loans, and bank accounts. I worked as an Azure Data Engineer to manage and process large customer datasets, integrating data from multiple sources and storing it in Azure Storage. We identified and cleaned inaccurate and null data, building ETL data pipelines that produced filtered data for reporting.
Ingested real-time streaming data into Azure Storage accounts using Apache Kafka and Azure Stream Analytics, and used it as the source for downstream pipelines.
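A minimal Spark Structured Streaming sketch of this ingestion pattern, assuming a Kafka source and an Azure storage landing zone; the broker, topic, storage account, and container names are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("stream-ingest").getOrCreate()

# Read the real-time stream from Kafka (broker and topic names are illustrative)
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "example-broker:9092")
    .option("subscribe", "customer-events")
    .option("startingOffsets", "latest")
    .load()
    .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
)

# Land the raw events in the storage account for the downstream pipelines
query = (
    stream.writeStream.format("parquet")
    .option("path", "abfss://raw@examplestorage.dfs.core.windows.net/customer-events/")
    .option("checkpointLocation", "abfss://raw@examplestorage.dfs.core.windows.net/_checkpoints/customer-events/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```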
Created and managed Azure Data Factory pipelines that extracted data from Azure Storage accounts into a source database, then transformed and loaded it into Azure Synapse Analytics.
Used Azure Security Center, Azure Log Analytics, and Splunk to monitor, automate, and secure the data pipelines.
Used Azure Databricks to develop, deploy and monitor the ETL pipelines and worked on integrating Azure Databricks with Azure Services such as Azure SQL Database and Azure Data Lake Storage.
Used Azure Event Hubs, which provided a highly scalable solution for data ingestion, collection, and storage for real-time data processing.
Developed and maintained Azure Logic Apps triggers for orchestrating ETL workflows.
Created clusters in Azure Databricks and worked with HDInsight clusters to handle big data in Azure, including landing streaming data from Apache Kafka into Apache HBase.
Proficient in writing complex SQL queries and developing stored procedures for data analysis.
Developed Spark applications for data extraction, transformation, and aggregation across numerous file formats using PySpark and Spark SQL in Databricks, modifying and aggregating source data before importing it into Azure Synapse Analytics for reporting.
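A condensed Databricks-style PySpark/Spark SQL sketch of this aggregate-before-load step; the paths, schemas, and the downstream Synapse load are assumptions for illustration:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided as `spark` in a Databricks notebook

# Source data arrives in more than one format (paths are illustrative)
orders_csv = spark.read.option("header", True).csv("abfss://raw@examplestorage.dfs.core.windows.net/orders_csv/")
orders_json = spark.read.json("abfss://raw@examplestorage.dfs.core.windows.net/orders_json/")

# Normalize both sources to a common shape and register a temp view for Spark SQL
cols = [F.col("order_id"), F.col("customer_id"), F.col("amount").cast("double").alias("amount")]
orders = orders_csv.select(*cols).unionByName(orders_json.select(*cols))
orders.createOrReplaceTempView("orders")

# Aggregate with Spark SQL before the Synapse load
summary = spark.sql("""
    SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
""")

# Stage the curated output; the load into Azure Synapse Analytics happens downstream
summary.write.mode("overwrite").parquet(
    "abfss://curated@examplestorage.dfs.core.windows.net/customer_order_summary/"
)
```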
Designed and developed user-defined functions, stored procedures, and triggers for Azure Cosmos DB, which was used to store catalog data.
Generated reports and dashboards using Azure Synapse Analytics and visualization tools such as Power BI and Tableau.
Developed, managed, and deployed CI/CD pipelines using Azure DevOps pipelines and services.
Worked with Azure Boards, an Agile project management tool that allows tracking and managing tasks. Azure Boards provides features such as Kanban boards and sprint planning, which help the team work together effectively.
Tech Stack: Azure Data Factory, Azure Databricks, Apache Kafka, Azure Stream Analytics, Azure Storage Accounts, Azure Synapse Analytics, Azure Log Analytics, Azure Security Center, Azure Event Hubs, HDInsight, Spark, Azure Logic Apps Triggers, Azure Cosmos DB, Power BI, Azure DevOps, and Azure Boards.
Client: GSPANN Technologies Inc., Hyderabad, India. March 2019 – August 2021
Role: Hadoop Developer
GSPANN Technologies Inc. is a technology services firm that offers digital transformation, data analytics, and e-commerce solutions. As a Hadoop Developer, I implemented comprehensive data and analytical model solutions to satisfy the demands of a fast-paced corporate environment. We assisted teams in gathering, analyzing, and transforming data to support business objectives and maximize the value of big data practices and technology.
Installed and configured Apache Hadoop to test the maintenance of log files and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
Involved in converting Hive/SQL queries and MapReduce jobs into Spark transformations using Spark RDDs and PySpark concepts.
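An example of the kind of conversion referred to here, expressing a HiveQL-style aggregation both through Spark SQL and as an equivalent DataFrame transformation; the table and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

# Original HiveQL-style aggregation run through Spark SQL
sql_result = spark.sql("""
    SELECT category, COUNT(*) AS cnt, AVG(amount) AS avg_amount
    FROM sales
    GROUP BY category
""")

# Equivalent rewrite as a DataFrame transformation
df_result = (
    spark.table("sales")
    .groupBy("category")
    .agg(F.count(F.lit(1)).alias("cnt"), F.avg("amount").alias("avg_amount"))
)
```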
Developed code to import data from SQL Server into HDFS and created Hive tables, which reduced the complexity of MapReduce programs in HDFS by using Spark in Python.
Utilized Hadoop MapReduce to process large data sets. Imported and exported data between MySQL/Oracle and Hive using Sqoop.
Used Hive partitioning and bucketing, as well as several types of joins on Hive tables, and created Hive external tables to execute ETL on daily produced data, which was then integrated and ingested into Hive for further filtering and cleansing.
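A sketch of the partitioned and bucketed Hive table pattern, issued here through Spark SQL; the table names, bucket count, storage format, locations, and sample partition value are illustrative assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# External table over the daily-produced files, partitioned by load date
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        event_id STRING,
        user_id  STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS ORC
    LOCATION '/data/raw_events'
""")

# Register the latest day's files as a new partition for further filtering and cleansing
spark.sql("""
    ALTER TABLE raw_events
    ADD IF NOT EXISTS PARTITION (load_date = '2021-01-01')
    LOCATION '/data/raw_events/load_date=2021-01-01'
""")

# Bucketed table used to speed up joins on user_id
spark.sql("""
    CREATE TABLE IF NOT EXISTS events_bucketed (
        event_id STRING,
        user_id  STRING,
        amount   DOUBLE
    )
    CLUSTERED BY (user_id) INTO 16 BUCKETS
    STORED AS ORC
""")
```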
Used Apache Flume to collect and aggregate large volumes of log data and used HDFS to stage the data for further analysis.
Used Scala to connect to and retrieve data from NoSQL databases such as MongoDB, Cassandra, and HBase, as well as from Hive and Pig.
Tech Stack: Apache Hadoop, Hive, Sqoop, Spark RDDs, PySpark, Scala, SQL, HDFS, MapReduce, HiveQL, MongoDB, Cassandra, and HBase.
Client: Risk Edge Solutions, Hyderabad, India. April 2017 – March 2019
Role: SQL Developer
Risk Edge is a risk and predictive analytics firm that uses complex algorithms to deliver better business insights. We work on Risk Edge applications and provide customers with the best facilities possible to solve their business problems. We worked with vast data sets where we broke down information, gathered relevant insights, and solved advanced business problems using SQL.
Developed SQL packages, Procedures, and Functions for loading data into database tables based on user requirements.
Developed PL/SQL and shell scripts to automate processes.
Utilized joins and subqueries to simplify complicated queries involving several tables, and optimized procedures and triggers used in production.
Utilized complex stored procedures, triggers, tables, and SQL joins for applications. Wrote and executed T-SQL to store data and maintain the database efficiently in MS SQL Server.
Designed and built high-performance ETL packages to migrate and manipulate data from Oracle, MS Excel, and SQL Server using SSIS.
Transferred data from OLTP (Online Transaction Processing) databases to the staging area and then into the data warehouse using SSIS (SQL Server Integration Services) and T-SQL stored procedures.
Experienced in creating interactive dashboards, worksheets, and scorecards using Power BI and Tableau.
Tech Stack: SQL, T-SQL, SSIS, Stored Procedures, SQL Joins, OLTP, Power BI, and Tableau.
Client: LatentView Analytics, Hyderabad, India. August 2015 – April 2017
Role: Data Analyst
LatentView Analytics is a global leader in analytics and decision sciences, providing solutions that help businesses drive digital transformation and use data to create a competitive advantage. Developed and implemented high-impact analytics solutions for a wide range of business challenges involving product, marketing, and customer insights.
Designed, produced, tested, and maintained functional Tableau reports based on user needs.
Extracted transactional data into metrics using SQL and Python and loaded it into Tableau visualization dashboards, which improved working efficiency by 75% through automation.
Analyzed requirements, provided optimized system solutions and system design documentation, and developed detailed project plans by gathering business requirements from end users and documenting them for developers.
Created Stored Procedures, Views, and SQL queries to import data from the SQL server to Tableau.
Used Pandas and SQL to clean, transform, prepare, and normalize data for analysis.
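A small pandas-plus-SQL sketch of this cleaning and preparation step; the connection string, table, and column names are hypothetical:

```python
import pandas as pd
import sqlalchemy

# Connection string is illustrative
engine = sqlalchemy.create_engine("postgresql://user:password@localhost:5432/analytics")

# Pull the raw transactional data with SQL
raw = pd.read_sql("SELECT order_id, customer_id, amount, order_date FROM orders", engine)

# Clean and prepare: drop duplicates, parse dates, remove rows missing a customer id
clean = (
    raw.drop_duplicates(subset="order_id")
       .assign(order_date=lambda d: pd.to_datetime(d["order_date"]))
       .dropna(subset=["customer_id"])
)

# Normalize the amount column for analysis
clean["amount_norm"] = (clean["amount"] - clean["amount"].mean()) / clean["amount"].std()

# Export an extract that Tableau can consume
clean.to_csv("orders_clean.csv", index=False)
```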
Created and implemented actions, filters, parameters, and calculated fields in Tableau to generate dashboards and worksheets.
Created ad-hoc reports as requested by clients and management.
Tech Stack: Tableau, SQL, Python, and Stored Procedures.