Abhishek Saraswathi Bhatla
Big Data Developer
*********@*****.***
Professional Summary:
10+ years of experience in IT, specializing in software development, with 6+ years as a Big Data Developer. Adept at leveraging cloud solutions, automation, and scalable infrastructure to support data-driven decision-making.
●Extensive expertise in Terraform for Infrastructure as Code (IaC), automating the deployment and management of AWS infrastructure to enhance consistency and reduce manual errors. This includes deploying EC2 instances, S3 buckets, and other AWS resources, resulting in a 35% reduction in manual configuration tasks.
●Developed and maintained Terraform modules for automating cloud resource management, ensuring reusable infrastructure setups for multi-environment deployments.
●Strong experience in building CI/CD pipelines using Git, Jenkins, and Terraform, enabling efficient and automated delivery of infrastructure and application changes.
●Proficient in utilizing AWS Cloud services such as Lambda, S3, EC2, DynamoDB, Glue, Redshift, and CloudFormation for developing and managing cloud-based applications, ensuring high availability and performance.
●Implemented AWS Identity and Access Management (IAM) policies and roles to ensure secure and controlled access to resources, enhancing the security posture of applications.
●Designed and deployed serverless architectures using AWS Lambda, enabling cost-effective and scalable solutions for real-time data processing and event-driven applications (see the sketch following this summary).
●Experienced in configuring and optimizing Amazon RDS for relational database management, ensuring high availability, automated backups, and performance tuning.
●Utilized Amazon CloudFront for content delivery, optimizing application performance and improving user experience through caching and low-latency access.
●Solid experience with DynamoDB, implementing data models, managing throughput, and optimizing queries to handle large-scale data workloads effectively.
●Strong understanding of Agile methodologies, participating in sprints and collaborating with cross-functional teams to drive project success and deliver high-quality solutions.
●Proven track record in data migration and infrastructure optimization across AWS, enhancing operational efficiency and reducing costs.
●Proficient in Git for version control, effectively managing collaborative codebases and ensuring seamless integration across development teams.
●Experienced in implementing Git workflows such as feature branching, pull requests, and code reviews to streamline collaboration and enhance code quality.
●Developed automated scripts for continuous integration and deployment, integrating Git with CI/CD tools to improve deployment cycles and reduce downtime.
●Strong analytical and problem-solving skills, capable of adapting to changing requirements in fast-paced environments and managing multiple tasks effectively.
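The serverless work noted above typically reduces to small event-driven handlers wired to sources such as S3. A minimal Python sketch of that pattern, with illustrative bucket/key handling rather than any specific production handler:

import json
import urllib.parse

import boto3

s3 = boto3.client("s3")


def lambda_handler(event, context):
    # Illustrative S3-triggered handler: read each newly created object and log its size.
    # The event shape is the standard S3 notification; downstream processing is omitted.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        size = len(obj["Body"].read())
        print(json.dumps({"bucket": bucket, "key": key, "bytes": size}))
    return {"statusCode": 200}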
Technical Skills:
Cloud Technologies: AWS Kinesis, Lambda, EMR, EC2, SNS, SQS, DynamoDB, Step Functions, Glue, Athena, CloudWatch, Azure Data Factory, Azure Data Lake, Azure Functions, Azure SQL Data Warehouse, Databricks, and HDInsight
DBMS: Amazon Redshift, PostgreSQL, Oracle 9i, SQL Server, IBM DB2, and Teradata
ETL Tools: DataStage, Talend, and Ab Initio
Reporting Tools: Power BI, Tableau, QlikView, and Qlik Sense
Deployment Tools: Git, Jenkins, Terraform, and CloudFormation
Programming Languages: Python, Scala, PL/SQL, and Java
Scripting: Unix Shell and Bash scripting
Professional Experience:
State Farm, Bloomington, IL August 2021 – Present
Data Engineer/Big Data Engineer
Responsibilities:
●Developed Spark data pipelines to process data from various sources, transforming it to multiple targets.
●Created streams using Spark and processed real-time data into RDDs and DataFrames, generating analytics with Spark SQL.
●Designed a Redshift-based data delivery layer for business intelligence tools to operate directly on AWS S3 (see the sketch at the end of this section).
●Implemented Kinesis data streams to read real-time data and load it into S3 for downstream processing.
●Automated the provisioning of AWS infrastructure using Terraform, ensuring reproducible and scalable cloud environments across EC2, S3, and networking services.
●Created reusable Terraform modules to standardize the infrastructure deployment process, reducing errors and improving development speed by 30%.
●Designed and implemented Terraform scripts to manage VPCs, subnets, and security groups, ensuring secure and highly available infrastructure for data processing applications.
●Optimized infrastructure cost and performance by using Terraform to manage scaling policies for auto-scaling groups, balancing load and resource utilization across cloud resources.
●Managed Terraform state files to ensure consistent infrastructure deployment, while collaborating with DevOps teams to integrate Terraform into CI/CD pipelines for continuous cloud resource management.
●Integrated Terraform with AWS Lambda for automating serverless deployments, enabling infrastructure as code for event-driven applications.
●Wrote MapReduce jobs to process data from AWS S3 to DynamoDB and HBase.
●Created databases and tables in Redshift and DynamoDB, and wrote complex EMR scripts to process terabytes of data into AWS S3.
●Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
●Actively participated in client meetings, gathering requirements, and providing insights.
●Worked within an Agile framework, understanding and implementing requirements from user stories.
●Wrote ETL workflows to efficiently process large-scale data transformations.
●Involved in designing and optimizing ETL processes for seamless data migration across AWS and Hadoop ecosystems.
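The Redshift/S3 delivery layer referenced above can be illustrated with a core-PySpark sketch: raw data is read from S3, shaped with Spark SQL, and written back as partitioned Parquet that Redshift Spectrum external tables (or Athena) can query in place. The bucket paths, table name, and columns below are hypothetical:

from pyspark.sql import SparkSession, functions as F

# Hypothetical bucket paths; a real job parameterizes these per environment.
SOURCE_PATH = "s3://example-raw-bucket/transactions/"
TARGET_PATH = "s3://example-curated-bucket/transactions_curated/"

spark = SparkSession.builder.appName("s3-delivery-layer").getOrCreate()

# Read raw JSON landed in S3, derive a partition column, and aggregate with Spark SQL.
raw = spark.read.json(SOURCE_PATH)
raw.withColumn("event_date", F.to_date("event_ts")).createOrReplaceTempView("transactions")

daily = spark.sql("""
    SELECT event_date,
           customer_id,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_amount
    FROM transactions
    GROUP BY event_date, customer_id
""")

# Write partitioned Parquet so an external table over S3 can be queried in place by BI tools.
daily.write.mode("overwrite").partitionBy("event_date").parquet(TARGET_PATH)

spark.stop()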
AT&T, Dallas, TX June 2019 – August 2021
Data Engineer/Big Data Engineer
Responsibilities:
●Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
●Conferred with data scientists and other Qlik stream developers to obtain information on limitations and capabilities for data processing projects.
●Designed and developed automation test scripts using Python.
●Created data pipelines using Azure Data Factory.
●Automated jobs using Python.
●Created tables and loaded data in Azure Database for MySQL.
●Created Azure Functions and Logic Apps to automate data pipelines using Blob triggers.
●Analyzed SQL scripts and designed solutions to implement them using PySpark.
●Developed Spark code using Python (PySpark) for faster processing and testing of data.
●Used the Spark API to perform analytics on data in Hive.
●Optimized and tuned Hive and Spark queries using data layout techniques such as partitioning and bucketing (see the sketch at the end of this section).
●Performed data cleansing, integration, and transformation using Pig.
●Involved in exporting and importing data from the local file system and RDBMS to HDFS.
●Designed and coded the pattern for inserting data into the data lake.
●Moved data from on-prem HDP clusters to Azure.
●Built, installed, upgraded, and migrated petabyte-scale big data systems.
●Fixed data-related issues.
●Monitored big data and messaging systems such as Hadoop, Kafka, and Kafka MirrorMaker to ensure they operated at peak performance at all times.
●Created Hive tables, and loaded and analyzed data using Hive queries.
●Read and translated data models, queried data, identified data anomalies, and provided root cause analysis.
●Supported Qlik Sense reporting to gauge the performance of various KPIs/facets and assist top management in decision-making.
●Engaged in project planning and delivered on commitments.
●Conducted POCs on new technologies (e.g., Snowflake) available in the market to determine the best fit for the organization's needs.
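A minimal PySpark sketch of the partitioning and bucketing layout tuning mentioned above, assuming Hive support is enabled; the table and column names are illustrative:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-layout-tuning")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical source table; the real pipelines worked against project-specific Hive schemas.
events = spark.table("staging.web_events")

# Persist as a partitioned, bucketed Hive table so downstream queries prune partitions
# by date and avoid shuffles when joining or aggregating on the bucketed key.
(events.write
       .mode("overwrite")
       .partitionBy("event_date")
       .bucketBy(64, "user_id")
       .sortBy("user_id")
       .saveAsTable("curated.web_events_bucketed"))

# A query filtering on the partition column reads only the matching partitions.
spark.sql("""
    SELECT user_id, COUNT(*) AS page_views
    FROM curated.web_events_bucketed
    WHERE event_date = DATE '2021-01-15'
    GROUP BY user_id
""").show()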
MTA NYCT (Metropolitan Transportation Authority), New York Aug 2017 – May 2019
Data Engineer/Big Data Engineer
Responsibilities:
●Set up and built AWS infrastructure with various services by writing CloudFormation templates (CFTs) in JSON and YAML.
●Developed CloudFormation scripts to build EC2 instances on demand.
●Used IAM to create roles, users, and groups, and attached policies to provide minimum required access to resources.
●Updated bucket policies with IAM roles to restrict user access.
●Configured AWS Identity and Access Management (IAM) groups and users for improved login authentication.
●Created topics in SNS to send notifications to subscribers as per requirements.
●Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
●Moved data from Oracle to HDFS using Sqoop.
●Performed periodic data profiling on critical tables to check for abnormalities.
●Created Hive tables, loaded transactional data from Oracle using Sqoop, and worked with highly unstructured and semi-structured data.
●Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data.
●Created and ran Sqoop jobs with incremental load to populate Hive external tables.
●Wrote scripts to distribute queries for performance-test jobs in the Amazon data lake.
●Developed optimal strategies for distributing the web log data over the cluster importing and exporting the stored web log data into HDFS and Hive using Sqoop.
●Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
●Worked on CDC (Change Data Capture) tables, using a Spark application to load data into dynamic-partition-enabled Hive tables (see the sketch at the end of this section).
●Analyzed SQL scripts and designed the solution to implement them using PySpark.
●Responsible for developing a data pipeline on AWS to extract data from weblogs and store it in HDFS.
●Uploaded streaming data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
●Supported data analysis projects using Elastic MapReduce (EMR) on the AWS cloud; performed export and import of data to and from S3.
●Involved in designing HBase row keys to store text and JSON as key values, structured so that rows can be retrieved and scanned in sorted order.
●Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
●Designed and implemented static and dynamic partitioning and bucketing in Hive.
●Monitored workload, job performance and capacity planning using Cloudera Manager.
●Built applications using Maven and integrated them with CI servers such as Jenkins to run build jobs.
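A minimal PySpark sketch of the CDC-to-Hive pattern mentioned above: deduplicate the change feed to the latest record per key, derive the partition column, and insert with dynamic partitioning enabled. The table names, operation flag, and columns are illustrative assumptions:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = (SparkSession.builder
         .appName("cdc-to-hive-dynamic-partitions")
         .enableHiveSupport()
         .getOrCreate())

# Non-strict dynamic partitioning lets Hive resolve partition values from the data itself.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

# Hypothetical CDC feed: one row per change with an operation flag and a change timestamp.
changes = spark.table("staging.orders_cdc")

# Keep only the most recent change per business key and drop deletes.
w = Window.partitionBy("order_id").orderBy(F.col("change_ts").desc())
latest = (changes
          .withColumn("rn", F.row_number().over(w))
          .filter((F.col("rn") == 1) & (F.col("op") != "D"))
          .drop("rn")
          .withColumn("load_date", F.to_date("change_ts")))

# insertInto is position-based and assumes the partitioned target table already exists
# with columns in the same order; partitions are created dynamically on write.
latest.write.insertInto("curated.orders", overwrite=True)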
FCA, Auburn Hills, MI Jan 2017 – Jul 2017
Big Data Engineer/Hadoop Developer
Responsibilities:
●Involved in the full project life cycle: design, analysis, logical and physical architecture modeling, development, implementation, and testing.
●Conferred with data scientists and other Qlik stream developers to obtain information on limitations and capabilities for data processing projects.
●Designed and developed automation test scripts using Python.
●Created data pipelines using Azure Data Factory.
●Automated jobs using Python.
●Created tables and loaded data in Azure Database for MySQL.
●Created Azure Functions and Logic Apps to automate data pipelines using Blob triggers (see the sketch at the end of this section).
●Analyzed SQL scripts and designed solutions to implement them using PySpark.
●Developed Spark code using Python (PySpark) for faster processing and testing of data.
●Used the Spark API to perform analytics on data in Hive.
●Optimized and tuned Hive and Spark queries using data layout techniques such as partitioning and bucketing.
●Performed data cleansing, integration, and transformation using Pig.
●Involved in exporting and importing data from the local file system and RDBMS to HDFS.
●Designed and coded the pattern for inserting data into the data lake.
●Moved data from on-prem HDP clusters to Azure.
●Built, installed, upgraded, and migrated petabyte-scale big data systems.
●Fixed data-related issues.
●Loaded data into the DB2 database using DataStage.
●Monitored big data and messaging systems such as Hadoop, Kafka, and Kafka MirrorMaker to ensure they operated at peak performance at all times.
●Created Hive tables, and loaded and analyzed data using Hive queries.
●Communicated regularly with business teams to ensure that any gaps between business requirements and technical requirements were resolved.
●Read and translated data models, queried data, identified data anomalies, and provided root cause analysis.
●Supported Qlik Sense reporting to gauge the performance of various KPIs/facets and assist top management in decision-making.
●Engaged in project planning and delivered on commitments.
●Conducted POCs on new technologies (e.g., Snowflake) available in the market to determine the best fit for the organization's needs.
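A minimal sketch of the blob-triggered Azure Function pattern mentioned above, using the Python v1 programming model (the blob path and storage connection live in the accompanying function.json binding); the names and downstream step are illustrative:

import logging

import azure.functions as func


# Blob-triggered entry point; runs whenever a new blob lands in the bound container.
def main(inputblob: func.InputStream) -> None:
    logging.info("New blob landed: %s (%s bytes)", inputblob.name, inputblob.length)

    # A real pipeline would validate the file here and then hand off to the next
    # stage, e.g. by queueing a message or triggering a Data Factory pipeline.
    payload = inputblob.read()
    logging.info("First bytes: %r", payload[:100])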
Computer Sciences Corporation, Hyderabad, India Jul 2013 – Jul 2015
Hadoop Engineer
Responsibilities:
●Participated in SDLC requirements gathering, analysis, design, development, and testing of the application, developed using Agile methodology.
●Developed managed, external, and partitioned tables as per requirements.
●Ingested structured data into appropriate schemas and tables to support the rules and analytics.
●Developed custom User Defined Functions (UDFs) in Hive to transform large volumes of data per business requirements (see the sketch at the end of this section).
●Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
●Responsible for building scalable distributed data solutions using Hadoop.
●Involved in loading data from the edge node to HDFS using shell scripting.
●Implemented scripts for loading data from the UNIX file system to HDFS.
●Loaded and transformed large sets of structured, semi-structured, and unstructured data.
●Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
●Actively participated in Object-Oriented Analysis and Design sessions for the project, which was based on MVC architecture using the Spring Framework.
●Developed the presentation layer using HTML, CSS, JSPs, Bootstrap, and AngularJS.
●Adopted J2EE design patterns such as DTO, DAO, Command, and Singleton.
●Implemented object-relational mapping in the persistence layer using the Hibernate framework in conjunction with Spring functionality.
●Generated POJO classes to map to the database table.
●Configured Hibernate's second level cache using EHCache to reduce the number of hits to the configuration table data.
●Used the ORM tool Hibernate to represent entities and define fetching strategies for optimization.
●Implemented transaction management in the application by applying Spring Transaction and Spring AOP methodologies.
●Wrote SQL queries and stored procedures for the application to communicate with the database.
●Used the JUnit framework for unit testing of the application.
●Used Maven to build and deploy the application.
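The Hive UDF work above was row-level transformation logic; since the actual UDFs were project-specific, the sketch below shows equivalent logic as a Python streaming script invoked through Hive's SELECT TRANSFORM clause rather than a compiled UDF. All names are illustrative:

#!/usr/bin/env python
# Row-level transformation for Hive's SELECT TRANSFORM clause. Illustrative usage:
#
#   ADD FILE normalize_phone.py;
#   SELECT TRANSFORM (customer_id, phone)
#          USING 'python normalize_phone.py'
#          AS (customer_id, phone_normalized)
#   FROM customers;
#
import re
import sys

for line in sys.stdin:
    # Hive streams rows as tab-separated fields on stdin; this sketch assumes two columns.
    customer_id, phone = line.rstrip("\n").split("\t")
    # Keep digits only so downstream joins match on a canonical phone format.
    digits = re.sub(r"\D", "", phone)
    print("\t".join([customer_id, digits]))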
Education:
Master's in Computer Science from Virginia International University, Fairfax, VA
Bachelor's in Computer Science from Ganapathy College of Engineering, Warangal, India