
Sai Naga Raju Sakhamuri

Data Engineer

*********@***********.*** +1-940-***-**** Irving, TX

SUMMARY

● Data Engineer with around 5 years of experience developing and optimizing enterprise-level applications using a diverse range of technologies and tools.

● Hands-on expertise in scripting languages such as Korn Shell, Batch Scripts, PL/SQL, T-SQL, and Python to automate ETL pipelines, process large-scale unstructured datasets, and handle real-time data ingestion workflows.

● Experienced in transitioning existing AWS infrastructure to a Serverless architecture, leveraging services like AWS Lambda, API Gateway, Route 53, Kinesis, and S3 buckets for scalable and cost-effective solutions.

● Skilled in cloud computing technologies including AWS, Azure, and GCP, with a strong focus on configuration management, infrastructure automation, and CI/CD pipelines for seamless application deployment.

● Deep understanding and application of Agile development methodologies, ensuring iterative delivery and collaboration across teams.

● Extensive experience with Big Data ecosystems, utilizing tools such as Hadoop, Spark, HiveQL, Sqoop, Pig Latin, and Python scripting, combined with expertise in NoSQL databases for robust data solutions.

SKILLS

Methodology: SDLC, Agile, Waterfall

Programming Languages: Python, R, SQL, PySpark, Scala

IDEs: PyCharm, Jupyter Notebook, Databricks Notebook, VS Code

ML Algorithms: Linear Regression, Logistic Regression, Decision Trees, Supervised Learning, Unsupervised Learning, Classification, SVM, Random Forests, Naïve Bayes, KNN, K-Means, CNN, Hyperparameter Tuning, ANOVA

Big Data Ecosystem: Hadoop, MapReduce, HiveQL, Apache Spark, Pig, Flink, Apache NiFi, Apache Beam, Talend, Informatica

Cloud Services: AWS (S3, EC2, Redshift, Lambda, Athena, AWS Glue), Azure (ADLS Gen2, ANSI SQL, KQL, IBM DataStage, Synapse Analytics), GCP (BigQuery, Dataflow)

Frameworks & ETL/Reporting Tools: Kafka, Airflow, Snowflake, Docker, Kubernetes, Tableau, Power BI, SSRS, SSIS, DBT, Jenkins

Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, TensorFlow

Databases: MongoDB, MySQL, PostgreSQL

Other Tools: Git, MS Office, Atlassian Jira, Confluence, Jenkins, PeopleSoft, ALM, Postman

Other Skills: Data Cleaning, Data Wrangling, Communication Skills, Presentation Skills, Problem-Solving and Critical Thinking

Operating Systems: Windows, Linux, Unix

EXPERIENCE

Intuit, USA | Data Engineer | Feb 2024 - Current

● Applied Agile methodologies throughout the Software Development Life Cycle, ensuring efficient and iterative delivery of software projects.

● Worked with Amazon Web Services (AWS) cloud infrastructure services and was involved in ETL, data integration, and migration.

● Designed and developed a critical ingestion pipeline to process over 100TB of data into a Data Lake.
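
A minimal PySpark sketch of this kind of data-lake ingestion step; the S3 paths, JSON landing format, and event_time field are hypothetical assumptions for illustration, not details confirmed by the resume:

    # Illustrative PySpark ingestion job (paths, schema, and fields are assumptions).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("datalake-ingestion").getOrCreate()

    # Read raw landing-zone files (example path, not the actual pipeline source).
    raw = spark.read.json("s3://example-landing-zone/events/2024/*/*.json")

    # Light standardization before landing in the lake: load metadata plus a partition column.
    curated = (
        raw.withColumn("load_ts", F.current_timestamp())
           .withColumn("event_date", F.to_date("event_time"))
    )

    # Write partitioned Parquet into the curated zone consumed by downstream jobs.
    (curated.write
            .mode("append")
            .partitionBy("event_date")
            .parquet("s3://example-data-lake/curated/events/"))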

● Integrated Hadoop into traditional ETL workflows, accelerating the extraction, transformation, and loading of unstructured data.

● Created a Docker container for the ML model, packaging the model and its dependencies for consistent deployment across environments.

● Developed SSIS packages to compensate for Informatica deficiencies.

● Conducted comprehensive data analysis using SQL, PySpark, MongoDB, and RESTful APIs, driving data-driven decision-making and delivering actionable insights that improved operational efficiency by 10%.

● Designed, developed, tested, and maintained Tableau functional reports based on user requirements.

● Streamlined data integration by designing Kafka connectors, improving data flow efficiency by 30% through seamless integration of diverse data sources into Kafka topics.
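
An illustrative Python sketch of registering a source connector through the Kafka Connect REST API; the worker endpoint, connector class (Confluent JDBC source), connection settings, and topic prefix are all assumptions for demonstration, not the actual connectors built:

    # Register a hypothetical JDBC source connector with a Kafka Connect worker.
    import requests

    connect_url = "http://localhost:8083/connectors"  # assumed Kafka Connect endpoint

    connector = {
        "name": "example-orders-source",  # made-up connector name
        "config": {
            "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
            "connection.url": "jdbc:postgresql://example-host:5432/orders_db",
            "connection.user": "etl_user",
            "connection.password": "********",
            "mode": "incrementing",
            "incrementing.column.name": "order_id",
            "topic.prefix": "orders-",
            "tasks.max": "2",
        },
    }

    resp = requests.post(connect_url, json=connector)
    resp.raise_for_status()
    print("Connector created:", resp.json()["name"])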

Iview Labs Pvt Ltd, India | Data Engineer | Jan 2019 - July 2022

● Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.

● Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
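
A hedged Databricks/PySpark sketch of the lake-to-serving flow described above; the storage account, container, columns, and target table are placeholders, and the Spark session is normally provided by the Databricks runtime:

    # Illustrative ADLS Gen2 -> Delta flow (all names are hypothetical).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # already available on Databricks

    # Read raw files landed in ADLS Gen2 by Azure Data Factory (placeholder abfss path).
    sales_raw = (spark.read
                      .option("header", "true")
                      .csv("abfss://raw@examplestorageacct.dfs.core.windows.net/sales/"))

    # Basic cleansing and typing before serving the data to BI consumers.
    sales_clean = (sales_raw
                   .withColumn("sale_date", F.to_date("sale_date"))
                   .withColumn("amount", F.col("amount").cast("double"))
                   .dropDuplicates(["order_id"]))

    # Persist as a Delta table that downstream Synapse/Power BI layers can query.
    sales_clean.write.format("delta").mode("overwrite").saveAsTable("curated.sales")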

● Created and led an application DBA team for SQL Server, Oracle, and DB2 UDB.

● Developed data pipelines using Spark, Hive, Pig, Python, Impala, and HBase to ingest customer data.

● Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
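
A small hedged example of this kind of Hive-to-Spark conversion, with made-up table and column names:

    # Port a Hive-style aggregation to a Spark DataFrame transformation (illustrative only).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

    # Original Hive-style query:
    #   SELECT customer_id, SUM(amount) AS total_spend
    #   FROM transactions
    #   WHERE txn_date >= '2022-01-01'
    #   GROUP BY customer_id;

    # Equivalent Spark DataFrame transformation.
    total_spend = (spark.table("transactions")
                        .filter(F.col("txn_date") >= "2022-01-01")
                        .groupBy("customer_id")
                        .agg(F.sum("amount").alias("total_spend")))

    total_spend.write.mode("overwrite").parquet("/tmp/output/total_spend/")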

● Worked on DirectQuery in Power BI to compare legacy data with current data, and generated reports and dashboards.

● Designed SSIS packages to ETL existing data into SQL Server from different environments for the SSAS (OLAP) cubes.

● Worked extensively with ETL tools like Informatica PowerCenter and Talend to extract, transform, and load data into data warehouses.

● Set up and coordinated legacy data loads from flat files into SAS staged data repositories using UNIX shell scripts and cron.

EDUCATION

Master of Science in Information Systems and Technology, University of North Texas, Denton, TX

Bachelor of Science in Information Technology, Vignan University, Vadlamudi, Guntur, India

CERTIFICATION

Google Data Analytics


