
Big Data Cloud Engineer

Location:
Rodessa, LA, 71069
Posted:
September 10, 2025


Resume:

Teja Garikapati

****************@*****.***

603-***-****

SUMMARY

• Dynamic and motivated IT professional with around 8 years of experience as a Big Data Cloud Engineer, with expertise in designing data-intensive applications using the Hadoop Ecosystem and in Big Data Analytics, Cloud Data engineering, Data Warehouse / Data Mart, Data Visualization, Reporting, and Data Quality solutions.

• In-depth knowledge of Hadoop architecture and its components like YARN, HDFS, Name Node, Data Node, Job Tracker, Application Master, Resource Manager, Task Tracker, and the MapReduce programming paradigm.

• Extensive experience in Hadoop led to the development of enterprise-level solutions utilizing Hadoop components such as Apache Spark, MapReduce, HDFS, Sqoop, PIG, Hive, HBase, Oozie, Flume, NiFi, Kafka, Zookeeper, and YARN.

• Profound knowledge of distributed systems architecture and parallel processing, as well as an in-depth understanding of the MapReduce programming paradigm and the Spark execution framework. Strong experience performing Data Ingestion and Data Processing (transformations, enrichment, and aggregations).

• Experienced with improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrame API, Spark Streaming, MLlib, and Pair RDDs, and worked explicitly on PySpark and Scala.

• Handled ingestion of data from different data sources into HDFS using Sqoop and Flume, performed transformations using Hive and MapReduce, and then loaded the data into HDFS. Managed importing of streaming data into HDFS using Flume sources and Flume sinks and transforming the data using Flume interceptors. Experience using Sqoop jobs with incremental load to populate Hive external tables.

• Experience in Oozie and workflow scheduling to manage Hadoop jobs by a Directed Acyclic Graph (DAG) of actions with control flows.

• Implemented the security requirements for Hadoop and integrated them with Kerberos authentication infrastructure: KDC server setup, realm/domain creation, and management.

• Experience with Partitions and bucketing concepts in Hive, and designing both Managed and External tables in Hive to optimize performance. Experience with different file formats like Avro, Parquet, ORC, JSON, and XML.

• Expertise in creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie. Experienced using the most common Operators in Airflow – Python Operator, Bash Operator, Google Cloud Storage Download Operator, Google Cloud Storage Object Sensor, and GoogleCloudStorageToS3Operator.
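
As an illustration of the Airflow usage described above, a minimal DAG sketch using the Python and Bash operators; the DAG id, schedule, and task logic are hypothetical placeholders, not a production pipeline.

# Minimal Airflow 2.x DAG sketch; dag_id, schedule, and task bodies are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract_data():
    # Placeholder callable; a real task would pull data from a source system.
    print("extracting data")


with DAG(
    dag_id="example_daily_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_data)
    load = BashOperator(task_id="load", bash_command="echo 'load step placeholder'")

    extract >> load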

• Hands-on experience handling database issues and connections with SQL and NoSQL databases such as MongoDB, HBase, Cassandra, SQL Server, and PostgreSQL. Created Java apps to handle data in MongoDB and HBase. Used Phoenix to create an SQL layer on HBase.

• Experience designing and creating RDBMS Tables, Views, Data Types, Indexes, Stored Procedures, Cursors, Triggers, and Transactions.

• Expert in designing ETL data flows by creating mappings/workflows to extract data from SQL Server, and in Data Migration and Transformation from Oracle/Access/Excel Sheets using SQL Server SSIS.

• Expert in designing Parallel jobs using various stages like Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.

• Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Amazon Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SES, SQS, Lambda, EMR, and other services of the AWS family.

• Created and configured a new batch job in Denodo scheduler with email notification capabilities, implemented cluster settings for multiple Denodo nodes, and created load balancing to improve performance.

• Instantiated, created, and maintained CI/CD (continuous integration & deployment) pipelines and applied automation to environments and applications. Worked on various automation tools like GIT, Terraform, and Ansible.

• Experienced in dimensional modeling (Star schema, Snowflake schema), transactional modeling, and SCD (Slowly Changing Dimension).

• Experienced with JSON-based RESTful web services and XML/QML-based SOAP web services. I also worked on various applications using Python-integrated IDEs like Sublime Text and PyCharm.

• Developed web-based applications using Python, DJANGO, QT, C++, XML, CSS3, HTML5, DHTML, JavaScript, and jQuery.

• Experience in using various packages in Python and R like ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, Reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, Rpy2, SQLAlchemy, PyTest, and PyQT.

• Building and productionizing predictive models on large datasets by utilizing advanced statistical modeling, machine learning, or other data mining techniques.

• Developed intricate algorithms based on deep-dive statistical analysis and predictive data modeling to deepen relationships, strengthen longevity, and personalize customer interactions.

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, Yarn, HDFS, Apache Spark, Spark MLlib, Sqoop, Mahout, PIG, Hive, Hbase, Oozie, Flume, NiFi, Kafka, Zookeeper

Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL, Teradata, Cosmos DB

Programming: PySpark, Scala, Java, C, C++, Shell script, Perl script, SQL

Cloud Technologies: AWS, Microsoft Azure

Frameworks: Django REST framework, MVC, Hortonworks

Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, TOAD, SQL Navigator, Query Analyzer, SQL Server Management Studio, SQL Assistance, Postman

Versioning tools: SVN, Git, GitHub

Operating Systems: Windows 7/8/XP/2008/2012, Ubuntu Linux, MacOS

Network Security: Kerberos

Database Modelling: Dimension Modeling, ER Modeling, Star Schema Modeling, Snowflake Modeling

Monitoring Tool: Apache Airflow

Visualization/Reporting: Tableau, ggplot2, matplotlib, SSRS, and Power BI

Machine Learning Techniques: Linear & Logistic Regression, Classification and Regression Trees, Random Forest, Associative Rules, NLP, and Clustering

PROFESSIONAL EXPERIENCE

State of Maryland, Maryland Oct 2024 to Present

AWS Data Engineer

Responsibilities:

• Designed and set up an Enterprise Data Lake to support various use cases, including analytics, processing, storing, and reporting of voluminous, rapidly changing data.

• Responsible for maintaining quality reference data in source by performing operations such as cleaning and transformation, and ensuring integrity in a relational environment by working closely with the stakeholders & solution architect.

• Designed and developed a Security Framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.

• Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce cluster access for new users.

• Performed end-to-end architecture and implementation assessment of various AWS services like Amazon EMR, Redshift, and S3.

• Implemented machine learning algorithms using Python to predict the quantity a user might want to order for a specific item, so suggestions can be made automatically using Kinesis Firehose and the S3 data lake.

• Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

• Used Spark SQL for the Scala and Python interfaces that automatically convert RDD case classes to schema RDD.

• Import data from different sources, such as HDFS/Hbase, into Spark RDD, perform computations using PySpark, and generate the output response.
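
A minimal PySpark sketch of this pattern, assuming a hypothetical HDFS path and column names; real jobs would read from the actual HDFS/Hbase sources.

# Hypothetical sketch: load data from HDFS into Spark and aggregate with PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hdfs-aggregation").getOrCreate()

# Path and schema are illustrative placeholders.
orders = spark.read.csv("hdfs:///data/orders.csv", header=True, inferSchema=True)

daily_totals = (
    orders.groupBy("order_date")
          .agg(F.sum("amount").alias("total_amount"),
               F.count("*").alias("order_count"))
)
daily_totals.show()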

• Creating Lambda functions with Boto3 to deregister unused AMIs in all application regions to reduce the cost for EC2 resources.
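
A simplified Boto3 sketch of this kind of cleanup; the tag-based filter and region list below are hypothetical stand-ins for the real "unused AMI" detection logic.

# Sketch only: deregister AMIs tagged as unused in each region to cut EC2/EBS cost.
import boto3

REGIONS = ["us-east-1", "us-west-2"]  # illustrative region list


def deregister_unused_amis():
    for region in REGIONS:
        ec2 = boto3.client("ec2", region_name=region)
        images = ec2.describe_images(
            Owners=["self"],
            Filters=[{"Name": "tag:Status", "Values": ["unused"]}],  # hypothetical tag rule
        )["Images"]
        for image in images:
            print(f"Deregistering {image['ImageId']} in {region}")
            ec2.deregister_image(ImageId=image["ImageId"])


if __name__ == "__main__":
    deregister_unused_amis()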

• Importing & exporting databases using SQL Server Integration Services (SSIS) and Data Transformation Services (DTS Packages).

• Coded Teradata BTEQ scripts to load and transform data, fix defects like SCD 2 date chaining, and clean up duplicates.

• Developed a reusable framework for future migrations that automates ETL from RDBMS systems to the Data Lake utilizing Spark Data Sources and Hive data objects.

• Conducted data blending and preparation using Alteryx and SQL for Tableau consumption and published data sources on the Tableau server.

• Developed Kibana Dashboards based on the Logstash data and integrated different source and target systems into Elasticsearch for near real-time log analysis to monitor end-to-end transactions.

• Implemented AWS Step Functions to automate and orchestrate the Amazon SageMaker-related tasks, such as publishing data to S3, training the ML model, and deploying it for prediction.
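
As a sketch of triggering such an orchestration from Python with Boto3; the state machine ARN and input payload are hypothetical.

# Sketch: start a Step Functions state machine that orchestrates SageMaker steps
# (publish to S3 -> train -> deploy). The ARN and input below are placeholders.
import json
import boto3

sfn = boto3.client("stepfunctions")

response = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:ml-pipeline",
    input=json.dumps({
        "training_data": "s3://example-bucket/train/",  # illustrative S3 prefix
        "model_name": "demand-forecast",                # illustrative model name
    }),
)
print(response["executionArn"])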

• Integrated Apache Airflow with AWS to monitor multi-stage ML workflows with the tasks running on Amazon SageMaker.

Environment: Kafka, HIVE, SQOOP, AWS EMR, MapReduce, S3, RDS, Snowflake, Redshift, Lambda, Boto3, Apache Pig, Python, DynamoDB, SSRS, Tableau, Amazon SageMaker, Apache Spark, Apache Hbase

AT&T, Atlanta, GA May 2022 to Sep 2024

Azure Data Engineer

Responsibilities:

• I worked on Azure Data Factory to integrate data from on-prem (MySQL, Cassandra) and cloud (Blob storage, Azure SQL DB) sources and apply transformations to load it back to Azure Synapse.

• Managed, configured, and scheduled resources across the cluster using Azure Kubernetes Service.

• Monitored the Spark cluster using Log Analytics and Ambari Web UI. Transitioned log storage from Cassandra to Azure SQL Data Warehouse and improved the query performance.

• Involved in developing data ingestion pipelines on an Azure HDInsight Spark cluster using Azure Data Factory and Spark SQL. I also worked with Cosmos DB (SQL API and Mongo API).

• Developed dashboards and visualizations to help business users analyze data and provide insight to upper management, with a focus on Microsoft products like SQL Server Reporting Services (SSRS) and Power BI.

• Performed data migration of extensive data sets to Databricks (Spark), created and administered the cluster, loaded data, configured the pipelines, and loaded data from ADLS Gen2 to Databricks using ADF pipelines.

• Created various pipelines to load the data from the Azure data lake into Staging SQL DB, followed by Azure SQL DB.

• Created Databricks notebooks to streamline and curate the data for various business use cases and mounted blob storage on Databricks.
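
A hedged sketch of mounting Blob storage and curating data in a Databricks notebook; dbutils and spark are provided by the Databricks runtime, and the storage account, secret scope, and table names are hypothetical.

# Databricks notebook sketch: mount an Azure Blob Storage container, then curate data.
# dbutils and spark come from the Databricks runtime; names below are placeholders.
storage_account = "examplestorageacct"
container = "raw"

dbutils.fs.mount(
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net/",
    mount_point="/mnt/raw",
    extra_configs={
        f"fs.azure.account.key.{storage_account}.blob.core.windows.net":
            dbutils.secrets.get(scope="example-scope", key="storage-key")
    },
)

# Curate: read raw files from the mount, de-duplicate, and write a Delta table.
raw_df = spark.read.json("/mnt/raw/events/")
curated_df = raw_df.dropDuplicates(["event_id"])
curated_df.write.format("delta").mode("overwrite").saveAsTable("curated.events")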

• Utilized Azure Logic Apps to build workflows to schedule and automate batch jobs by integrating apps, ADF pipelines, and other services like HTTP requests, email triggers, etc.

• Worked extensively on Azure Data Factory, including data transformations, Integration Runtimes, Azure Key Vaults, Triggers, and migrating data factory pipelines to higher environments using ARM Templates.

• Ingested data in mini-batches and performed RDD transformations on those mini-batches of data using Spark Streaming to perform streaming analytics in Databricks.

Environment: Azure HDInsight, Blob storage, Azure SQL DW, Databricks, Apache Spark, Azure Synapse, Cosmos DB, ADF, SSRS, Power BI, Azure Data Lake, ARM

Nata Consultancy Services Pvt Ltd, Hyderabad Sep 2020 to Dec 2022

Big Data Engineer / Hadoop Developer

Responsibilities:

• Interacted with business partners, Business Analysts, and product owners to understand requirements and build scalable distributed data solutions using the Hadoop ecosystem.

• Developed complete Spark Streaming programs to process near real-time data from Kafka and process data with stateless and stateful transformations.
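
A minimal Structured Streaming sketch of this pattern; the broker address, topic, and console sink are hypothetical, and the actual programs may have used the DStream API instead.

# Sketch: consume a Kafka topic with Spark Structured Streaming and keep a running count per key.
# Requires the spark-sql-kafka connector package on the classpath; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
)

# Stateful aggregation: running count per key, written to the console sink.
counts = (
    events.select(F.col("key").cast("string"))
          .groupBy("key")
          .count()
)

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()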

• Worked with HIVE data warehouse infrastructure: created tables, distributed data by implementing partitioning and bucketing, and wrote and optimized the HQL queries.
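
For illustration, a sketch of a partitioned, bucketed table and an aggregation query issued through Spark SQL; table and column names are hypothetical, and the DDL uses Spark's USING syntax (the pure-Hive equivalent would use STORED AS ORC).

# Sketch: create a partitioned, bucketed table and query it via Spark SQL (names are placeholders).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-ddl").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS sales (
        order_id BIGINT,
        customer_id BIGINT,
        amount DOUBLE,
        order_date STRING
    )
    USING ORC
    PARTITIONED BY (order_date)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
""")

# Aggregation that benefits from partition pruning on order_date.
spark.sql("""
    SELECT order_date, SUM(amount) AS daily_total
    FROM sales
    WHERE order_date >= '2024-01-01'
    GROUP BY order_date
""").show()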

• Built and implemented automated procedures to split large files into smaller batches of data to facilitate FTP transfer, which reduced 60% of execution time.

• Worked on developing ETL processes (Data Stage Open Studio) to load data from multiple data sources to HDFS using FLUME and SQOOP, and performed structural modifications using MapReduce and HIVE.

• Developing Spark scripts and UDFs using Spark DSL and Spark SQL queries for data aggregation, querying, and writing data back into RDBMS through Sqoop.

• Written numerous MapReduce Jobs using Java API, Pig, and Hive for data extraction, transformation, and aggregation from multiple file formats including Parquet, Avro, XML, JSON, CSV, ORCFILE, and other compressed file formats with codecs like gZip, Snappy, and Lzo.

• Strong understanding of partitioning and bucketing concepts in Hive, and designed both Managed and External tables in Hive to optimize performance.

• Developed PIG UDFs for manipulating the data according to Business Requirements and also worked on creating custom PIG Loaders.

• Developing ETL pipelines in and out of the data warehouse using Python and Snowflake's SnowSQL. Writing SQL queries against Snowflake.
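
A minimal sketch of such a pipeline step using the snowflake-connector-python package; the account, credentials, stage, and table names are hypothetical placeholders.

# Sketch: load staged data into Snowflake and validate it with a query.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical account identifier
    user="etl_user",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Load previously staged files into a target table.
    cur.execute("COPY INTO staging.orders FROM @orders_stage FILE_FORMAT = (TYPE = CSV)")
    # Validate the load with a simple SQL query against Snowflake.
    cur.execute("SELECT COUNT(*) FROM staging.orders")
    print(cur.fetchone()[0])
finally:
    conn.close()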

• Experience in report writing using SQL Server Reporting Services (SSRS) and creating reports like drill-down, parameterized, Cascading, Conditional, Table, Matrix, Chart, and sub-reports.

• Used DataStax Spark connector to store the data in the Cassandra database or get the data from the Cassandra database.

• Wrote Oozie scripts and set up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.

• Worked on implementing a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka and Zookeeper-based log collection platform.

• Used Hive to analyze data ingested into Hbase by using Hive-Hbase integration and computed various metrics for reporting on the dashboard.

• Transformed the data using AWS Glue dynamic frames with PySpark; cataloged the transformed data using Crawlers and scheduled the job and crawler using the workflow feature.
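
A sketch of a Glue job body following this pattern; it only runs inside the AWS Glue job environment, and the catalog database, table, and S3 path are hypothetical.

# Sketch: read a cataloged table as a DynamicFrame, transform with PySpark, write back to S3.
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"  # hypothetical catalog names
)

# Use plain PySpark for the transformation, then convert back to a DynamicFrame.
cleaned_df = source.toDF().dropDuplicates(["order_id"])
cleaned = DynamicFrame.fromDF(cleaned_df, glue_context, "cleaned")

glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/orders/"},
    format="parquet",
)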

• Worked on installing the cluster, commissioning & decommissioning of data nodes, name node recovery, capacity planning, and slots configuration.

• Developed data pipeline programs with Spark Scala APIs, data aggregations with Hive, and data formatting (JSON) for visualization and generation.

Environment: ETL, UDF, MapReduce, AWS, Cassandra, Snowflake, PySpark, Apache Pig, Python, Apache Spark, Hbase, Java, SSRS, Apache Kafka, HIVE, SQOOP, FLUME, Apache Oozie, Zookeeper

Navit Software Solutions Pvt Ltd, Hyderabad Apr 2017 to Aug 2020

AWS Python Developer

Responsibilities:

• Implemented end-to-end solution for hosting the web application on AWS cloud with integration to S3 buckets.

• Worked on AWS CLI Auto Scaling and CloudWatch monitoring creation and updates.

• Allotted permissions, policies, and roles to users and groups using AWS Identity and Access Management (IAM).

• Developed a fully automated continuous integration system using Git, Jenkins, MySQL, and custom tools developed in Python and Bash, which saved $85K YOY.

• Worked on AWS Elastic Beanstalk to quickly deploy various applications developed with Java, PHP, Node.js, and Python on familiar servers such as Apache.

• Developed server-side software modules and client-side user interface components and deployed them entirely in the Compute Cloud of Amazon Web Services (AWS).

• Implemented Lambda to configure the DynamoDB Autoscaling feature and implemented a Data Access Layer to access AWS DynamoDB data.

• Automated the nightly build to run quality control using Python with the BOTO3 library to ensure the pipeline does not fail, reducing the effort by 70%.

• I worked on AWS services like AWS SNS to send automated emails and messages using BOTO3 after the nightly run.
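
A minimal Boto3 sketch of such a notification; the topic ARN and message text are hypothetical.

# Sketch: publish a status notification to an SNS topic after the nightly run.
import boto3

sns = boto3.client("sns")

sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:nightly-run-status",  # placeholder ARN
    Subject="Nightly run completed",
    Message="The nightly quality-control run finished successfully.",
)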

• I worked on developing tools that automate AWS server provisioning, automated application deployments, and the implementation of basic failover among regions through the AWS SDK.

• Created AWS Lambda and EC2 instance provisioning on the AWS environment, implemented security groups, and administered Amazon VPCs.

• Used Jenkins pipelines to drive all micro-services builds out to the Docker registry and then deployed to Kubernetes; created Pods and managed them using Kubernetes.

• Involved with developing Ansible playbooks with Python and SSH as a wrapper for managing AWS node configurations and testing playbooks on AWS instances.

• Developed Python AWS serverless Lambda with concurrency and multi-threading to make the process faster and asynchronously execute the callable.
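
A sketch of that pattern, assuming a hypothetical I/O-bound callable; the handler fans work out across a thread pool so items are processed concurrently rather than one at a time.

# Sketch of a Lambda handler that executes callables concurrently with a thread pool.
from concurrent.futures import ThreadPoolExecutor


def work(item):
    # Placeholder for an I/O-bound callable (API call, S3 read, etc.).
    return f"processed {item}"


def lambda_handler(event, context):
    items = event.get("items", [])
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(work, items))
    return {"results": results}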

• Implemented CloudTrail to capture the events related to API calls made to AWS infrastructure.

• Monitored containers in AWS EC2 machines using the Datadog API and ingested and enriched data into the internal cache system.

• Chunking up the data, converting it from more extensive data sets to smaller chunks using Python scripts, will help speed up data processing.

Environment: CloudWatch, Docker, AWS, S3, Kubernetes, EC2, Lambda, EBS, IAM, Datadog, CloudTrail, CLI, Ansible, MySQL, Python, Git, Jenkins, DynamoDB

Education

• Master of Science in Computer Information Systems, New England College, Henniker, New Hampshire.

• Bachelor of Engineering in Civil Engineering, Bapatla Engineering College, Bapatla, India.

Certifications

• https://www.credly.com/badges/bb728a82-6d35-4999-9140-b2bb3f6c4b23/linked_in_profile

• https://www.credly.com/badges/8b13eea2-a508-4b59-8c0a-3d3341a16e0d/linked_in_profile


