
Google Cloud Platform - Working with External Data in BigQuery

Last Updated : 30 Mar, 2023

BigQuery can also query data that is stored outside BigQuery. You can leave the data where it lives and use BigQuery purely as the query engine. Such sources are called external (or federated) data sources, and this article looks at how to work with them. This functionality is currently supported for data residing in Google Drive, Cloud Storage, Cloud SQL, and Bigtable.

Before we look closer at how to query these sources, let's discuss a few notable differences you'll experience with external data sources:

  • First, query performance for external data sources may not match that of data stored in BigQuery. If query speed is a priority, consider loading the data into BigQuery instead.
  • Second, when querying an external source, BigQuery cannot estimate in advance how much data will be processed. You will only know once the query has run.
  • Finally, results are not cached as they would be for data stored in BigQuery. Caching saves cost and improves performance on repeated queries when the underlying data hasn't changed.

Overall, this feature is best for short-term, less frequently accessed data. For example, you could use external data sources to support loading and transforming your data in just a single pass. In this workflow, you query the external source, transform the data as part of the query, and then write the results as a permanent table in BigQuery storage.
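The single-pass load-and-transform workflow described above can be sketched as one SQL statement. The snippet below is only an illustration: the helper name, the table names, and the `carrier` and `dep_delay` columns are assumptions, not part of the original walkthrough. It builds the statement as a string so that it can be inspected before being run in BigQuery.

```python
def build_load_and_transform_sql(external_table: str, destination_table: str) -> str:
    """Build a single statement that reads an external table, transforms the
    rows inside the query, and writes the result to a permanent table."""
    return (
        f"CREATE OR REPLACE TABLE `{destination_table}` AS\n"
        "SELECT\n"
        "  UPPER(carrier) AS carrier,  -- hypothetical column\n"
        "  SAFE_CAST(dep_delay AS INT64) AS dep_delay_minutes  -- hypothetical column\n"
        f"FROM `{external_table}`"
    )


# Hypothetical table names, used only for illustration.
sql = build_load_and_transform_sql("my_dataset.raw_external", "my_dataset.cleaned")
print(sql)
```

Because the transformation happens inside the `SELECT`, the external files are read once and only the cleaned result is kept in BigQuery storage.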

Another use case is joining small amounts of frequently changing data with data stored in BigQuery. By keeping the frequently changing data as an external data source, it does not need to be reloaded into BigQuery every time it is updated. An example here is querying data that lives in a Google Sheet. Even as the Sheet is edited in real time, you can run queries over the data and the results will reflect the live, up-to-date information.

Now let's look at how to set up an external data source in BigQuery. In this example, we'll run a query over a collection of JSON files located in a Cloud Storage bucket. These files contain flight performance data for all domestic flights in the United States in 2014.

Step 1: Starting in the console, create a new data set. Highlight your project name in the left-hand nav and click Create Data Set. Name the data set flight_performance, choose the US for the location, and click Create Data Set.

Step 2: Now highlight your new data set and click Create Table.

Under Source, choose Google Cloud Storage.

Note: If your data were located in Google Drive (for example, in a Google Sheet) or in Bigtable, you would choose that source from the dropdown instead.

The flight performance data is located in a public bucket that any GCP user can access through its URI. Paste the URI into the GCS bucket field. The URI contains a wildcard character, which tells BigQuery to include all the JSON files that follow the specified naming convention. Next, under Destination, make sure you set the table type to external. Name the table 2014. You can let BigQuery auto-detect the schema in this case. Finally, click Create Table.
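The same table definition can also be expressed as a SQL DDL statement instead of console clicks. The sketch below assembles a `CREATE EXTERNAL TABLE` statement as a string. The helper name and the bucket path are placeholders (the article's actual bucket URI is not reproduced here), and leaving out the column list is meant to mirror the auto-detect option chosen in the console.

```python
def build_external_table_ddl(dataset: str, table: str, uri: str,
                             source_format: str = "NEWLINE_DELIMITED_JSON") -> str:
    """Build a CREATE EXTERNAL TABLE statement for files in Cloud Storage."""
    if not uri.startswith("gs://"):
        raise ValueError("expected a Cloud Storage URI starting with gs://")
    return (
        # Backticks are needed because the table name starts with a digit.
        f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
        "OPTIONS (\n"
        f"  format = '{source_format}',\n"
        f"  uris = ['{uri}']\n"
        ")"
    )


# Placeholder URI: substitute the real bucket path, keeping the wildcard.
ddl = build_external_table_ddl(
    "flight_performance", "2014", "gs://example-bucket/flights/*.json"
)
print(ddl)
```

The `*` in the URI plays the same role as in the console field: every object whose name matches the pattern becomes part of the table.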

Since there is no data ingestion involved, the table appears under the data set immediately. Open the table details to see the external configuration that you just set up and a table size of 0 bytes, since an external table does not use any BigQuery storage.

Now you can run a query that references the external table. In our query, we're selecting a count of all flights by carrier. As we discussed earlier, you cannot see the amount of data processed until after the query completes. In this example, you created a permanent table for the external data source. However, you can also query an external source using a temporary table, which is useful for one-time ad hoc queries or for ETL processes.
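A count of flights per carrier over the external table might look like the query built below. This is a sketch under assumptions: the column name `carrier` is a guess at the auto-detected schema and may differ in the real data, and the helper name is made up for illustration.

```python
def build_flights_by_carrier_sql(table: str, carrier_column: str = "carrier") -> str:
    """Build a query that counts flights per carrier, largest count first."""
    return (
        f"SELECT {carrier_column}, COUNT(*) AS flight_count\n"
        f"FROM `{table}`\n"
        f"GROUP BY {carrier_column}\n"
        "ORDER BY flight_count DESC"
    )


# The table name follows the walkthrough; the column name is an assumption.
query = build_flights_by_carrier_sql("flight_performance.2014")
print(query)
```

The printed statement can be pasted into the BigQuery query editor. The external table is referenced exactly like a native one, so nothing in the SQL reveals that the data lives in Cloud Storage.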


devisavita