Big Data Engineer

Location:

United States

Posted:

November 07, 2023

Resume:

SUMMARY:

Over **+ years of professional IT experience, including 8+ years of experience designing, developing, and implementing strategic methods to efficiently solve Big Data processing requirements and orchestrate data pipelines.

AWS Certified Solutions Architect Associate and Microsoft Certified Azure Data Engineer Associate.

Proven ability to collaborate effectively with cross-functional teams, including data scientists, business analysts, and software engineers to deliver data solutions that meet business needs.

Designed and implemented data solutions on AWS, Azure and Snowflake cloud for collecting, processing and analyzing large datasets from multiple sources.

Strong background in designing and implementing complex data pipelines and ETL processes to extract, transform, and load data from various sources into data warehouses.

Expertise in developing and deploying big data technologies to manage and process large volumes of data.

Experience in building scalable and highly available data architectures, including data modeling, data governance, and data security.

Deep knowledge of distributed computing, data partitioning, and data sharding, along with experience in optimizing and scaling data systems.

Demonstrated ability to provide technical guidance and training and to drive continuous improvement in data engineering practices.

Experience designing cloud models for Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS).

Experience in application migration and data migration from on-premises systems to cloud services.

Experience in data storage and management with Amazon S3 for object storage, Amazon DynamoDB for NoSQL databases, and Amazon RDS for relational databases.

Experience in designing serverless architectures using AWS Lambda and API Gateway, enabling event-driven and cost-efficient application development.

Experienced in using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, the DataFrame API, Spark Streaming, and MLlib; worked specifically with PySpark and Scala.

Experience in the design and development of real-time stream processing applications using Spark, Kafka, Scala, and Hive to perform streaming ETL and apply machine learning.

Proficient in DevOps practices: automating the build, deployment, and release of code from one environment to another, with working knowledge of branching and merging code lines in Git and resolving merge conflicts.

Experience integrating NoSQL databases such as HBase with MapReduce to move bulk data into HBase.

Extensive experience developing conceptual, logical, and physical ER data models for Big Data, transactional (OLTP) systems, and data warehouse (OLAP) systems.

Strong understanding of data warehouse concepts, with data modeling experience in normalization, business process analysis, reengineering, and physical and logical data modeling.

Experience in Object-Oriented Analysis and Design (OOAD) and software development using UML and design patterns.

Experience building proofs of concept, microservice architectures, reusable frameworks, and Continuous Integration/Continuous Deployment (CI/CD).

Ashok Vinoda rao

Azure Data Architect

LinkedIn: https://www.linkedin.com/in/ashok-vinodarao

Phone: 480-***-****

Email: *************@*****.***

AREA OF EXPERTISE:

Programming Languages: Python, SQL, C#, Java, Apex

Big Data Ecosystems: Hadoop, MapReduce, Spark, HDFS, HBase, Hive, Oozie

Cloud Computing: AWS, Azure, Snowflake

ETL/ELT: Apache NiFi, Apache Airflow, DataStage

AWS Services: Amazon Athena, Amazon Aurora, AWS Batch, AWS CDK, Amazon CloudWatch, AWS CodeDeploy, AWS Data Pipeline, AWS Elastic Beanstalk, Amazon EFS, EC2, Amazon EMR, AWS Glue, AWS Glue DataBrew, AWS Lake Formation, AWS Lambda, Amazon OpenSearch Service, Amazon Redshift, Amazon RDS, Amazon S3, Amazon Kinesis, Amazon QuickSight.

Azure Services: Azure Blob Storage, Azure Cosmos DB, Azure Data Factory, Azure Databricks, Azure Data Lake Storage, Azure Functions, Azure Monitor, Azure Synapse Analytics, Azure Stream Analytics, Azure HDInsight, Azure SQL, Power BI Embedded.

Scripting Languages: UNIX Shell Scripting, Python, JavaScript

Data Visualization: Tableau, Power BI

Web Technologies: REST API, SOAP Web Services, C#.NET, Salesforce, Postman

Version Control: Visual SourceSafe, Git, CVS, SVN

Middleware Technologies: Kafka, MQ, MuleSoft

Business Intelligence Tools: Tableau, SQL Server Reporting Services, ActiveReports, Crystal Reports

IDEs: Eclipse, VS Code

Build Tools: Jenkins, Copado, AutoRABIT, Maven, Ant

Databases: Hive, HBase, SQL Server, IBM DB2, Oracle

Methodology: Agile, Waterfall, Object-Oriented Analysis/Design, UML

Operating Systems: UNIX, Linux, Windows

EDUCATION:

Master of Information Technology (Software Development), Griffith University, Australia, 2001

CERTIFICATIONS:

AWS Certified Solutions Architect Associate

AWS Certified Data Analytics Specialty

Microsoft Certified Azure Data Engineer Associate

PROFESSIONAL EXPERIENCE:

Sr Big Data Solutions Architect, American Express, Mar 2018 – July 2023

Responsibilities:

Designed and implemented ETL pipelines for a petabyte-scale dataset using AWS Glue, reducing data processing time by 50% and increasing data consistency and integrity.

Utilized AWS Glue DataBrew to cleanse, normalize, and transform the incoming data, resulting in a significant improvement in data quality and accuracy.

Developed and deployed AWS Lambda functions to handle real-time data processing and automate data transformations, increasing data processing efficiency by 30%.

Built a data pipeline using S3 and AWS Data Pipeline to transfer and process data between disparate systems, enabling seamless data integration and reducing manual intervention.

Configured CloudWatch alarms to monitor and alert on pipeline failures and bottlenecks, improving pipeline reliability, reducing downtime, and helping ensure high availability and performance of data processing pipelines and systems.

Implemented complex transformations on streaming data using AWS Glue and AWS Lambda, improving real-time data processing speed by 40% and reducing pipeline latency.

Developed custom data connectors using AWS Glue to extract data from diverse data sources, such as SQL databases and NoSQL databases, improving data accessibility and increasing data source compatibility.

Leveraged AWS Glue to extract, transform, and load data from unstructured data sources such as log files, enabling the organization to analyze and derive insights from previously untapped data sources.

Utilized Amazon OpenSearch Service for efficient indexing and searching of large volumes of unstructured data, improving data accessibility and query performance.

Built a data warehousing solution using Amazon Redshift, enabling efficient and cost-effective storage and analysis of large volumes of structured data.

Configured Amazon RDS to store and manage transactional data, providing a highly available and scalable relational database solution for mission-critical applications.

Built real-time analytics dashboards using Amazon QuickSight, providing actionable insights for business stakeholders and enabling data-driven decision-making.

Worked with multiple data formats, including JSON, CSV, and Parquet, to build data pipelines and ensure data consistency and compatibility across different systems and tools.

Designed and implemented data ingestion processes for multiple data sources, including APIs, databases, and file systems, using AWS Lake Formation and AWS Kinesis, ensuring seamless data.

Performed various benchmarking steps to optimize the performance of Spark jobs and improve overall batch processing.

Implemented continuous deployment pipelines using AWS CodeDeploy, enabling seamless and automated software updates and releases.

Configured and maintained Amazon EFS for highly available and scalable shared file storage, enabling seamless data access and sharing across different compute instances and workloads.

Developed NiFi ingestion flows to read from Kafka queues and transform XML data for loading into HBase.

Used Kafka APIs and tools such as Kafka Connect, Kafka Streams, KSQL, etc. to implement data pipelines and stream processing applications.

Collaborated with DevOps teams to automate and streamline the deployment and management of data infrastructure and applications, utilizing tools such as Ansible, Terraform, and Kubernetes, resulting in improved efficiency and reduced downtime.

Implemented ORC file format and employed various optimization techniques, such as partitioning and bucketing in Hive, to enhance query performance for improved data processing and analysis.

Designed and implemented a CI/CD pipeline utilizing GitLab/Jenkins, ensuring efficient and seamless deployment of code changes.

Developed an ingestion framework using Apache NiFi to ingest files from SFTP to HDFS, enabling the efficient transfer of financial data and facilitating better data processing.

Used AWS Command Line Interface (CLI) commands to transfer files efficiently to and from Amazon S3.

Utilized Glue jobs to perform data transformations via PySpark scripts and stored the transformed data in S3, ensuring data consistency and integrity.

Optimized EMR usage by establishing highly scalable and fault-tolerant ETL pipelines to extract data from diverse sources and store it in the S3 data lake, resulting in a significant boost in output efficiency.

Implemented AWS CodeDeploy to automate the deployment of pipeline updates, ensuring consistency and reliability across all instances.

Migrated data from different databases, such as Amazon RDS and Amazon Aurora, to the pipeline, ensuring data consistency and maintaining data integrity during the migration process.

Used Python packages like Pandas and NumPy for data modification purposes.

Conducted performance tuning and optimization activities on data migration workflows to improve migration speed, reduce costs, and enhance the user experience.

Configured and managed Airflow clusters, optimizing performance and improving scalability.

Collaborated with cross-functional teams to develop custom DAGs (Directed Acyclic Graphs) and workflows to automate ETL (Extract, Transform, Load) processes, resulting in increased efficiency and reduced manual effort.

Created custom plugins and operators to extend Airflow’s functionality and enable seamless integration with other tools and services.

Maintained Airflow metadata database and ensured data integrity and security, utilizing best practices for database management and access control.

Worked with Snowflake's SnowSQL command-line interface and Snowflake Web UI to interact with data and manage Snowflake resources.

Implemented Snowflake's automation and orchestration features, such as Snowflake Tasks and Snowflake Streams, to streamline data processing workflows and reduce manual effort.

Environment: Hadoop, Amazon Web Services (AWS), Microsoft Azure Services, Databricks, Snowflake, Apache NiFi, Apache Airflow, Python, Java, HDFS, HiveQL, PySpark, MySQL, Kafka, Spark Streaming, UNIX Shell Scripting, Oozie, Docker, Jenkins, Ansible, Terraform, Kubernetes, and Tableau

Sr Data Engineer, State Street Corporation, Boston MA, Feb 2017 - Mar 2018

Responsibilities:

Designed and developed ETL pipelines using Azure Data Factory to extract data from various sources, transform it as per business requirements, and load it into the target database for analytics purposes.

Created Azure Stream Analytics jobs to process real-time data and generate alerts based on predefined criteria.

Utilized Azure Databricks to build and deploy Spark clusters to perform data processing and analytics tasks.

Developed data models using Azure Cosmos DB, Azure SQL, and Azure Synapse Analytics to support analytics and reporting requirements.

Configured and managed Azure Data Lake Storage to store and manage large volumes of structured and unstructured data.

Implemented Azure Functions to automate data processing tasks and trigger events based on predefined criteria.

Utilized Azure HDInsight to run Hadoop clusters and process big data workloads at scale.

Monitored and optimized data solutions using Azure Monitor to identify and resolve performance issues.

Created pipeline jobs, scheduling triggers, and mapping data flows using Azure Data Factory (V2), with Azure Key Vault used to store credentials.

Provided production support for cluster maintenance and tuning.

Triggered workflows based on time or availability of data using Oozie.

Proficient in using Databricks notebooks for data exploration, data cleaning, and data transformation.

Designed ETL/ELT processes to move large internal and external data sets through the data processing pipeline in a Hadoop environment, alongside the DataStage ETL tool.

Developed and deployed machine learning models using Databricks MLflow and other frameworks such as Scikit-learn and TensorFlow.

Implemented real-time data processing and streaming solutions using Databricks Structured Streaming and Kafka.

Conducted performance tuning and optimization of Databricks clusters and Spark jobs for efficient and scalable data processing.

Developed a framework to read data via API calls and process it in Python.

Set up Kafka brokers, producers, and consumers, and orchestrated data pipelines into Hadoop with Kafka.

Worked with DataStage tools such as DataStage Designer and DataStage Director on data migration and data cleansing processes.

Developed a data ingestion pipeline using Avro as the serialization format to efficiently handle large volumes of data.

Supported MapReduce programs running on the cluster; managed and reviewed Hadoop log files to identify bugs.

Scheduled Oozie workflows and Spring Batch jobs to configure workflows for different job types such as Hive and MapReduce.

Implemented Agile development practices in data engineering workflows, including source control management, testing, and deployment automation.

Environment: Hadoop, Microsoft Azure Services, Databricks, DataStage 8.5, HDFS, Hive, MapReduce, Oozie, Java, Python, Snowflake, Talend, MySQL, PostgreSQL, Spark SQL, HiveQL, Power BI, Node.js, Avro, UNIX Shell Scripting, Kafka.

Big Data Developer, American Express, Phoenix AZ, Mar 2015 - Feb 2017

Responsibilities:

Worked on a Hadoop cluster running on a MapR distribution.

Created Hive tables, loaded them with data, and wrote Hive queries using HiveQL.

Used Sqoop to ingest data from various source systems into HDFS.

Wrote and executed Pig scripts using the Grunt shell and performed big data analysis using Pig and user-defined functions (UDFs).

Worked with HiveQL on big data of logs to perform a trend analysis of user behavior on various online modules.

Developed UDFs to implement complex transformations on Hadoop. Worked on optimizing the shuffle and sort phase in MapReduce.

Helped move all log files generated from various sources into HDFS through Flume for further processing.

Migrated existing data from RDBMS (DB2 and SQL Server) to Hadoop using Sqoop for processing.

Analyzed weblog data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited page on the site.

Participated in data visualization and provided the files the team required by analyzing the data in Hive, and developed Pig scripts for advanced analytics on the data.

Developed transformations and aggregated the data for large datasets using Pig and Hive scripts.

Used partitioning and bucketing in Hive tables and ran scripts in parallel to improve performance.

Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing it with Pig.

Created Pig Latin scripts to sort, group, join, and filter enterprise-wide data.

Developed Oozie workflows and scheduled them on a monthly basis.

Used Sqoop to import the data from RDBMS to HDFS and later analyzed the imported data using Hadoop components.

Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.

Installed the Oozie workflow engine to run multiple Hive and Pig jobs that are triggered independently by time and data availability.

Developed PySpark scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.

Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark.

Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.

Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response.

Explored using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.

Managed and reviewed Hadoop log files to identify issues when jobs fail.

Exported the analyzed data to a relational database (DB2) using Sqoop for visualization and to generate reports for the BI team.

Environment: Hadoop 2.0, MapReduce, Spark, Scala, Python, HDFS, Hive 0.10.0, Pig 0.11.1, Sqoop 1.4.3, Flume 1.2.0, Hue, Java, Eclipse, SVN, Oozie 3.3.2, DB2, Linux

Tech Lead, American Express, Phoenix AZ, Nov 2010 - Mar 2015

Responsibilities:

Participated in architecture, design and code reviews.

Performed code reviews to maintain reusability, maintainability, centralization, and coding best practices.

Participated in daily onsite/offshore meetings and provided knowledge transfer and code reviews to offshore team members.

Designed the SQL Server database model for the project and was involved in writing stored procedures and database views.

Extensively used UML in the design process: use cases, process diagrams, sequence diagrams, and class diagrams.

Used ASP.NET Microsoft AJAX Toolkit controls such as ScriptManager and UpdatePanel within some web pages, reducing the number of round trips to the server per page and improving the user experience.

Used ADO.NET objects such as DataSet, DataView, DataAdapter, and DataReader for connecting to the database and retrieving and modifying its data.

Wrote stored procedures, triggers, and functions with optimized queries to improve application throughput.

Performed unit testing using tools such as the NUnit framework, and integration testing with the QA team.

Environment: Visual Studio 2005, .NET 2.0, C#, SQL Server 2000, ADO.NET, ASP, ADO, T-SQL, XML, JavaScript, HTML/DHTML, PVCS

Sr .NET/SSIS/SSRS Developer, Center to Promote Health Care Access Inc, Apr 2006 – Oct 2010

Responsibilities:

Designed the user interface and a data layer that accesses the database through Entity Framework with a database-first approach, and developed the business layer, where all business objects and rules are written using C#.NET, ADO.NET, and .NET Framework 3.0.

Used ASP.NET session state variables to store and retrieve values for a user as the user navigates ASP.NET pages in the application.

Implemented exception handling using try/catch/finally blocks for custom error output.

Developed SSIS packages to extract, transform, and load data between heterogeneous sources and destinations such as flat files, CSV, and SQL Server, using the tasks and transformations provided by SSIS.

Used SSRS to generate reports and integrated them with ASP.NET web pages using the report viewer.

Developed test cases and performed unit testing to identify and resolve functional and usability issues.

Environment: Windows 2000 Server, Microsoft .NET Framework 3.5/4.0, C#, ASP.NET, WCF, SOAP, ADO.NET, HTML, XML, XSL, XSLT, VBScript, JavaScript, UML, ADO, Visual Studio 2008, IIS 7.5, SQL Server 2008 R2, SVN, SSIS, SSRS

.NET Developer, Suncorp Metway, Brisbane, Australia, Jan 2004 - Feb 2006

Responsibilities:

Designed and developed the UI in the Visual Studio .NET 2005 environment.

Created various reusable classes and components using C#. Designed front-end applications using HTML/DHTML, validated with ASP.NET validation controls along with VBScript and JavaScript.

Used DataAdapter, DataSet, and DataReader objects of ADO.NET to manipulate data in the SQL Server database.

Environment: Visual Studio 2005, .NET 2.0, C#, SQL Server 2000, ADO.NET, ASP, ADO, T-SQL, XML, JavaScript, HTML/DHTML, PVCS
