What is Data Fusion in Google Cloud Platform (GCP)?

Last Updated : 09 Jul, 2024

Let's start with an introduction to Cloud Data Fusion. Cloud Data Fusion is a fully managed data integration service on Google Cloud. It provides a user-friendly graphical interface and APIs that save time and reduce complexity, letting you build data pipelines with no code.

  1. It supports parallel query execution, which significantly speeds up processing of large volumes of data.
  2. You can use existing templates and connectors for Google Cloud and other cloud service providers.
  3. A variety of built-in transformations helps you bring the data into the quality and format you need.
  4. Cloud Data Fusion is extensible; for example, it can be integrated with Apache Airflow, a SQL engine, and many other tools.
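The idea behind parallel execution can be illustrated with a small, hedged sketch: records are split into partitions, every partition is transformed concurrently, and the results are merged. This is only an analogy; Cloud Data Fusion itself runs pipelines on Dataproc clusters using engines such as Apache Spark, not on a Python thread pool. The record fields (`amount_cents`, `amount_usd`) and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record type for illustration: each record is a plain dict.
Record = dict


def split_into_partitions(records: list[Record], num_partitions: int) -> list[list[Record]]:
    """Distribute records round-robin across num_partitions partitions."""
    partitions: list[list[Record]] = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def transform_partition(partition: list[Record]) -> list[Record]:
    """Apply the same (hypothetical) transformation to every record in one partition."""
    return [{**r, "amount_usd": round(r["amount_cents"] / 100, 2)} for r in partition]


def run_in_parallel(records: list[Record], num_partitions: int = 4) -> list[Record]:
    """Transform all partitions concurrently, then merge the per-partition results."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    partitions = split_into_partitions(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(transform_partition, partitions)
        # Flatten the per-partition outputs back into a single list.
        return [record for part in results for record in part]
```

Because the merged output is grouped by partition, the original record order is not preserved; real distributed engines make the same trade-off unless a sort is requested.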

Benefits of Using Data Fusion

The following are the benefits of using Data Fusion:

  1. It reduces complexity by providing a simplified graphical user interface.
  2. It supports multiple triggers and extensions to integrate multiple sources.
  3. It supports multi-core processing, which speeds up query execution.

Primary Terminologies Related to Data Fusion in GCP

The following are the primary terms used in GCP Data Fusion:

  1. Transformations (Transform)
  2. Sink
  3. Source
  4. Error Handlers
  5. Wranglers

1. Transformations

In a Data Fusion pipeline, a transformation changes the source data by applying rules that convert it into the desired result.

Example: CSV Formatter, Compressor.

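To make the idea of a transformation concrete, here is a minimal sketch of what a CSV-formatting step does to one record: it turns a structured record into a single delimited text line. This is not Data Fusion's CSV Formatter plugin; the function name `csv_format` and the output field name `body` are assumptions chosen for illustration.

```python
import csv
import io


def csv_format(record: dict, fields: list[str], delimiter: str = ",") -> dict:
    """Hypothetical transform: return a record whose 'body' field holds one CSV line."""
    buffer = io.StringIO()
    # lineterminator="" keeps the output to a single line without a trailing newline.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="")
    # Missing fields become empty strings; values containing the delimiter are quoted.
    writer.writerow([record.get(field, "") for field in fields])
    return {"body": buffer.getvalue()}
```

For example, `csv_format({"id": 1, "name": "Ada, L.", "city": "London"}, ["id", "name", "city"])` yields `{"body": '1,"Ada, L.",London'}`; the comma inside the name is quoted automatically by the `csv` module.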

2. Sink

  • In Data Fusion, a sink is the target object to which a pipeline writes its output. Sinks can be of different types.

Example: BigQuery, Cloud Storage (GCS)

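The role of a sink can be sketched as an interface with interchangeable implementations, which is why sinks "can be of different types". The sketch below is hypothetical: the `Sink` base class and the `JsonLinesSink` stand-in (which writes newline-delimited JSON to any text stream) only mimic what a real BigQuery or Cloud Storage sink does and are not part of any Data Fusion API.

```python
import io
import json
from abc import ABC, abstractmethod
from typing import Iterable


class Sink(ABC):
    """Hypothetical sink interface: a destination that receives pipeline output."""

    @abstractmethod
    def write(self, records: Iterable[dict]) -> int:
        """Write records and return how many were written."""


class JsonLinesSink(Sink):
    """Writes each record as one JSON line, a local stand-in for a GCS or BigQuery sink."""

    def __init__(self, stream):
        self.stream = stream  # any object with a write(str) method

    def write(self, records: Iterable[dict]) -> int:
        count = 0
        for record in records:
            self.stream.write(json.dumps(record) + "\n")
            count += 1
        return count


if __name__ == "__main__":
    buffer = io.StringIO()
    written = JsonLinesSink(buffer).write([{"id": 1}, {"id": 2}])
    print(written, buffer.getvalue(), sep="\n")
```

Swapping in another subclass (for example, one that batches rows for a warehouse) would not change the pipeline code that calls `write`.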

3. Source

  • In Data Fusion, a source is the object from which a pipeline reads its input. Sources can be of different types.

Example: Excel, Bigtable

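A source reads raw input and emits records that match an output schema. The minimal sketch below uses CSV text as a stand-in for a file-based source such as Excel (reading real spreadsheets would need an extra library); the `csv_source` function and the `schema` mapping of field names to Python types are assumptions for illustration, not Data Fusion's schema format.

```python
import csv
import io
from typing import Iterator


def csv_source(text: str, schema: dict[str, type]) -> Iterator[dict]:
    """Hypothetical source: yield one typed record per CSV row, using the header row as field names."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # Cast each field to the type declared in the (hypothetical) output schema.
        yield {name: field_type(row[name]) for name, field_type in schema.items()}
```

For example, `list(csv_source("id,name\n1,Ada\n2,Grace\n", {"id": int, "name": str}))` produces `[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]`. Declaring the schema at the source lets downstream transformations rely on typed fields.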

4. Error Handlers

  • Error handlers in Data Fusion deal with errors that occur in a pipeline, which makes data processing and query execution more robust.
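The core idea of error handling in a pipeline is that a bad record should be set aside with a reason rather than stop the whole run. The sketch below illustrates that pattern in plain Python; `run_with_error_handler` and the example `parse_age` transform are hypothetical names, and Data Fusion's own error-handling plugins are configured in the UI rather than written this way.

```python
from typing import Callable


def run_with_error_handler(
    records: list[dict],
    transform: Callable[[dict], dict],
) -> tuple[list[dict], list[dict]]:
    """Apply transform to each record; route failures to an error list instead of failing the run."""
    good: list[dict] = []
    errors: list[dict] = []
    for record in records:
        try:
            good.append(transform(record))
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the failing record together with a description of what went wrong.
            errors.append({"record": record, "error": f"{type(exc).__name__}: {exc}"})
    return good, errors


def parse_age(record: dict) -> dict:
    """Hypothetical transform: convert the 'age' field to an integer."""
    return {**record, "age": int(record["age"])}
```

With the input `[{"age": "42"}, {"age": "abc"}, {}]`, one record passes through and two land in the error list (a `ValueError` for the non-numeric age and a `KeyError` for the missing field), so they can be inspected or written to a separate sink later.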

5. Wranglers

  • Wrangler in Data Fusion provides tools for data preparation: cleaning, structuring, and enriching raw data so it reaches the desired format quickly.
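The three wrangling activities (cleaning, structuring, enriching) can be shown on a couple of raw rows. This is a hedged sketch in plain Python, not Wrangler's directive language; the field names `name` and `email`, the rule that rows without an email are dropped, and the derived `domain` field are all hypothetical choices, and every input value is assumed to be a string.

```python
def wrangle(rows: list[dict]) -> list[dict]:
    """Clean, structure, and enrich raw rows using hypothetical rules."""
    prepared = []
    for raw in rows:
        # Cleaning: trim surrounding whitespace from every (string) value.
        row = {key: value.strip() for key, value in raw.items()}
        if not row.get("email"):
            continue  # drop rows that lack the required email field
        # Structuring: split the full name into first and last name.
        first, _, last = row.get("name", "").partition(" ")
        email = row["email"].lower()
        # Enriching: derive a new field from existing data.
        domain = email.split("@")[-1]
        prepared.append({"first_name": first, "last_name": last, "email": email, "domain": domain})
    return prepared
```

For example, the row `{"name": " Ada Lovelace ", "email": " ADA@Example.com "}` becomes `{"first_name": "Ada", "last_name": "Lovelace", "email": "ada@example.com", "domain": "example.com"}`, while a row whose email is blank is dropped.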

How to Use Data Fusion in the Google Cloud Console?

Step 1: In the Cloud Console, select Data Fusion from the Navigation menu.


Step 2: Click the Create an Instance link at the top of the section to create a Cloud Data Fusion instance.

  • On the Create Data Fusion instance page that loads, enter the instance details (such as the instance name and region) and create the instance.
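Instances can also be created programmatically through the Cloud Data Fusion REST API. The sketch below only builds the HTTP request (it does not send it), assuming the v1 endpoint `https://datafusion.googleapis.com/v1/projects/{project}/locations/{location}/instances` with an `instanceId` query parameter and an instance body whose `type` field selects the edition. The project, region, and instance names are placeholders, and the access token would come from your own credentials (for example, `gcloud auth print-access-token`). Check the official API reference before relying on these details.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://datafusion.googleapis.com/v1"
EDITIONS = {"DEVELOPER", "BASIC", "ENTERPRISE"}


def build_create_instance_request(
    project: str, location: str, instance_id: str, edition: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request that asks the API to create an instance."""
    if edition not in EDITIONS:
        raise ValueError(f"unknown edition: {edition}")
    parent = f"projects/{project}/locations/{location}"
    query = urllib.parse.urlencode({"instanceId": instance_id})
    url = f"{API_ROOT}/{parent}/instances?{query}"
    body = json.dumps({"type": edition}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request (for example, with `urllib.request.urlopen`) starts a long-running operation; instance creation typically takes several minutes, just as it does in the console.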


Step 3: A pictorial representation of the pipeline appears in the Studio, Cloud Data Fusion's graphical interface for developing data integration pipelines.

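Behind the graphical view, a pipeline is a directed acyclic graph: stages (sources, transformations, sinks) joined by connections. The sketch below models a pipeline configuration as plain Python data with `stages` and `connections` lists, loosely resembling an exported pipeline definition, and derives a valid execution order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The stage names and `type` values are hypothetical and simplified.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified pipeline configuration for illustration only.
pipeline = {
    "stages": [
        {"name": "ReadOrders", "type": "batchsource"},
        {"name": "CleanOrders", "type": "transform"},
        {"name": "WriteToBigQuery", "type": "batchsink"},
    ],
    "connections": [
        {"from": "ReadOrders", "to": "CleanOrders"},
        {"from": "CleanOrders", "to": "WriteToBigQuery"},
    ],
}


def execution_order(config: dict) -> list[str]:
    """Return stage names ordered so that every stage comes after all of its inputs."""
    # Map each stage to the set of stages it depends on (its predecessors).
    graph: dict[str, set[str]] = {stage["name"]: set() for stage in config["stages"]}
    for edge in config["connections"]:
        graph[edge["to"]].add(edge["from"])
    # static_order() raises graphlib.CycleError if the connections form a cycle.
    return list(TopologicalSorter(graph).static_order())
```

For this linear pipeline the only valid order is `["ReadOrders", "CleanOrders", "WriteToBigQuery"]`; with branching pipelines, several orders can be valid.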

Step 4: From the options in the top-right menu, click Deploy. This submits the pipeline to Cloud Data Fusion.

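Deployment can also be scripted. Cloud Data Fusion exposes the underlying CDAP REST API, where a pipeline is deployed with a `PUT` to `{api_endpoint}/v3/namespaces/{namespace}/apps/{pipeline_name}` carrying the pipeline JSON. The sketch below only builds that request without sending it; the endpoint URL is a placeholder (a real instance reports its own API endpoint), `default` is assumed as the namespace, and the pipeline spec would typically be a JSON definition such as one exported from the Studio. Treat these details as assumptions to verify against the official documentation.

```python
import json
import urllib.parse
import urllib.request


def build_deploy_request(
    api_endpoint: str, namespace: str, pipeline_name: str, pipeline_spec: dict, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT request that deploys a pipeline definition."""
    url = (
        f"{api_endpoint.rstrip('/')}/v3/namespaces/{urllib.parse.quote(namespace)}"
        f"/apps/{urllib.parse.quote(pipeline_name)}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(pipeline_spec).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
        method="PUT",
    )
```

For example, with the placeholder endpoint `https://my-instance.example.com/api`, namespace `default`, and pipeline name `orders-pipeline`, the request targets `https://my-instance.example.com/api/v3/namespaces/default/apps/orders-pipeline`.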


What are the Alternatives to Data Fusion in GCP?

The following services can be used as alternatives to Data Fusion:

  1. Dataproc
  2. Dataflow

1. Dataproc

Cloud Data Fusion lets you create ETL jobs through its graphical pipeline UI, whereas Dataproc lets you run manually written Spark, Hadoop, or Hive jobs, depending on your requirements. If your focus is data transformation and wrangling with a low-code or no-code solution, Data Fusion is the better choice.

2. Dataflow

Dataflow is a Google Cloud service that provides unified stream and batch data processing at scale. If your systems depend on Hadoop, it is wiser to choose Dataproc over Dataflow.


diksha_chourasiya
Article Tags :
  • Google Cloud Platform
  • DevOps
