Vijith Reddy
Data Engineer - Mobile - +1-805-***-****
Email id: ***********@*****.***
LinkedIn: linkedin.com/in/vijith-reddy-32b8591b2
Professional Summary:
Highly skilled Data Engineer and Database Administrator with around 8 years of experience designing, developing, and optimizing scalable data solutions. Proficient in ETL pipelines, data warehousing, data modeling, and data visualization tools such as Power BI and Tableau. Expertise in BigQuery, Dataflow, Dataproc, Kinesis, Looker, Cloud Functions, Triggers, GCS, and other GCP services. Strong foundation in SQL, Python (Pandas, NumPy, lambda functions, Matplotlib), PySpark, and Bash scripting. Experienced with cloud-native technologies such as Docker and Kubernetes, data orchestration with Apache Airflow, and streaming technologies (Kafka, Spark). Experienced in API integration, customer data platforms (CDP), and secure infrastructure automation.
Data Engineer, US Bank (Irvine, CA) (Feb 2024 – Present)
● Migrated Oracle databases to BigQuery, leveraging its scalability and performance, and utilized Power BI to create comprehensive, interactive reports.
● Designed and automated ETL pipelines using GCP Cloud Composer, Apache Airflow, and Python scripts, ensuring efficient, error-free data processing.
● Revamped existing ETL processes using Python scripts and Apache Airflow schedules, delivering reliable hourly updates that increased reporting frequency.
● Applied Pandas, Spark DataFrames, and sophisticated data cleansing routines to maintain data accuracy and reliability across all systems.
● Engineered advanced data transformations in pipelines by employing operators like Joiner, Lookup, Aggregator, and Sequence Generator, enhancing data usability.
● Deployed applications on GKE and OpenShift using Docker containers, implementing Kubernetes CI/CD automation for seamless integration and deployment.
● Optimized SQL queries and implemented data compression techniques in BigQuery, reducing storage and query costs by 25%.
● Developed Kafka producers and consumers with Spark to handle high-volume event streams, ensuring robust real-time data processing.
● Created BigQuery views with row-level security and leveraged Google Cloud APIs for monitoring, cost analysis, and effective data governance.
● Integrated data sources into a customer data platform (CDP) on GCP, enabling real-time data flows and creating unified customer profiles to support analytics.
Data Science Intern, Target, CA (Remote) (Mar 2022 – Dec 2022)
● Migrated on-premises Hadoop systems to GCP, utilizing Hive SQL, Presto SQL, and Python with Spark, ensuring a seamless transition and data integrity.
● Designed scalable ETL processes and last-mile delivery pipelines for customer-facing data stores, optimizing data delivery.
● Built real-time data pipelines using Kafka and Spark to enable efficient ingestion and transformation of web data.
● Managed Docker containers and Kubernetes clusters on GCP, deploying, scaling, and load-balancing workloads for optimized performance.
● Used Oracle Enterprise Manager (OEM) and other tools to monitor database performance.
● Applied data governance and security standards, implemented customer data protection measures, and ensured compliance with industry regulations.
● Built complex physical and logical data models with ERWIN, managed metadata repositories, and optimized data workflows using advanced SQL procedures and functions.
● Configured Kubernetes clusters and managed containerized applications on OpenShift, using Docker and Helm for streamlined deployments and efficient resource management.
● Executed robust security measures such as role-based access control, data encryption, and multi-factor authentication to safeguard sensitive data assets.
● Analyzed and resolved slow queries, deadlocks, and resource utilization issues.
ETL Developer, InnoEye Technologies (Hyderabad, India) (Oct 2019 – Aug 2021)
● Analyzed, designed, and developed ETL strategies, authored ETL specifications, and conducted Informatica development and administration to meet project requirements effectively.
● Developed Informatica processes to extract data from internal systems such as check issue platforms and leveraged Informatica PowerExchange to extract data from operational systems such as Datacom.
● Built, published, and scheduled customized interactive reports and dashboards using Tableau Desktop and Tableau Server to support business analytics and decision-making.
● Enhanced Hive queries and improved performance using Hadoop, YARN, Python, and Spark, adhering to best practices and configuring appropriate parameters.
● Set up Informatica B2B Data Exchange, including Endpoint creation, Scheduler, Partner setup, Profile setup, and Event attributes, to facilitate seamless data integration and management.
● Enhanced and fine-tuned advanced SQL scripts to improve performance and coordinated complex data workflows across multiple systems.
Database Administrator, Inmar Technologies (Hyderabad, India) (June 2016 – Oct 2019)
● Gathered, understood, and analyzed client requirements to perform descriptive analytics, enabling data-driven decision-making processes.
● Conducted data analysis, pre-processing, and feature engineering to build a comprehensive and enriched feature set for analytic models.
● Developed analytic models and solutions using Python 3, leveraging libraries and frameworks to address complex business challenges.
● Contributed to data warehouse projects by performing logical and physical data modeling in an Oracle environment and delivering accurate DDL to development teams.
● Created real-time visualization graphs using Matplotlib, facilitating actionable insights from high-volume data for team focus and efficiency.
● Performed large-scale data ingestion by importing transaction logs into HDFS using Flume.
Education:
● Master’s in Computer Science, California State University, San Bernardino – 2023
● Bachelor’s in Electrical and Electronics Engineering, BVRIT Hyderabad – 2016
Certificates:
● GCP Cloud Data Engineer Certificate.
● Microsoft Server 2016 Administration Certificate.
Technical Skills:
Big Data Ecosystem: HDFS, MapReduce, Hive, HBase, Kafka, Airflow, Zookeeper, Sqoop.
Languages: Python, Scala, Shell Scripting.
Software Methodologies: Agile, SDLC Waterfall.
Databases: MySQL, Oracle, PostgreSQL, Databricks, Snowflake.
ETL/BI: Power BI, Tableau, Informatica.
Version Control: Git, Bitbucket
Cloud Technologies: GCP (BigQuery, Dataflow, Pub/Sub, Cloud Storage)