Google Cloud Platform - Loading Data to BigQuery
Last Updated : 30 Mar, 2023
In this article, we will look into how to load and analyze your own data in BigQuery. As it is better to understand the concept with examples, we will be answering the age-old question "Which is better, cats or dogs?"
If you want to analyze data that is not already available through the public data sets program or hosted publicly by another BigQuery user, you'll need to load your own data into BigQuery. How you load the data depends on your analytics needs and your data pipeline. If your data changes slowly or only needs to be loaded once for a one-time analysis, loading it into BigQuery in batch is usually enough. But if you need to ingest and analyze data in close to real time, you may need to stream your data into BigQuery.
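For the streaming case, here is a minimal sketch built around the Python client library's `insert_rows_json` method. It assumes the `google-cloud-bigquery` package, configured credentials, and a table that already exists; the helper names and the batch size of 500 are illustrative choices, not requirements.

```python
from typing import Any, Dict, Iterable, List


def chunked(rows: List[Dict[str, Any]], size: int) -> Iterable[List[Dict[str, Any]]]:
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def stream_rows(client: Any, table_id: str, rows: List[Dict[str, Any]],
                batch_size: int = 500) -> List[Any]:
    """Stream rows into an existing table and collect any per-row errors.

    `client` is expected to be a google.cloud.bigquery.Client. Its
    insert_rows_json() method returns a list of error mappings, which is
    empty when every row in the batch was accepted.
    """
    errors: List[Any] = []
    for batch in chunked(rows, batch_size):
        errors.extend(client.insert_rows_json(table_id, batch))
    return errors
```

With a real client this could be called as `stream_rows(bigquery.Client(), "my-project.my_dataset.my_table", rows)`, where the table ID is a placeholder. An empty return value means no row was rejected.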

BigQuery has options for loading data that cover both of these scenarios. Let's get started by loading data in a batch. What data do we need in BigQuery in order to officially crown the winner of the cats versus dogs battle? We're going to determine the champion by analyzing college basketball tournament games to see who wins when dog mascots and cat mascots go head to head.
To run this analysis, we'll need two things. First, we'll need the tournament results data, which is already available in BigQuery as part of the NCAA basketball public data set. Second, we'll need a list of teams with dog and cat mascots, which we have available as a local CSV file. To join these two data sets together for analysis, we'll need to load the mascot CSV file into BigQuery and create a table.
First, create a home for the mascot table in BigQuery. BigQuery organizes data into containers called data sets. These data sets function somewhat like top-level folders that manage underlying tables.
Now follow the steps below to load the data:
Step 1: To create a new data set, select the project name on the left-hand nav and click the Create Data Set button.
Step 2: Give the data set a name and decide on a location. In this case, we need to co-locate the data set with the NCAA public data set, which is located in the US multi-region. We'll need to reference both tables in one query by performing a join, and this can only be done with tables residing in the same geographical location. You can always view a data set's location by clicking on the Details tab in the web UI. Click Create Data Set and the new data set will appear in the left-hand nav.
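The same data set can also be created from code. The sketch below uses the Python client library and assumes the `google-cloud-bigquery` package and configured credentials. The data set name `basketball` is taken from the table path used in the query later in this article, while the project ID in the usage note is a placeholder.

```python
def qualified_dataset_id(project_id: str, dataset_name: str) -> str:
    """Build the 'project.dataset' identifier the client library expects."""
    return f"{project_id}.{dataset_name}"


def create_dataset(project_id: str, dataset_name: str, location: str = "US"):
    """Create a data set in the given location and return it.

    The default location of "US" matches the multi-region that hosts the
    NCAA public data set, so tables in both can be joined in one query.
    """
    # Imported here so the helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(qualified_dataset_id(project_id, dataset_name))
    dataset.location = location
    # exists_ok=True makes the call safe to repeat.
    return client.create_dataset(dataset, exists_ok=True)
```

For example, `create_dataset("my-project", "basketball")` would create `my-project.basketball` in the US multi-region. A data set's location is fixed at creation time, so it is worth getting right here.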
Step 3: Now it's time to create a new table within the data set by loading the mascot CSV file. Highlight the data set and click Create Table.
This dialog allows us to directly upload files from our local machine, as long as they are up to 10 megabytes in size and contain fewer than 16,000 rows. If you have something larger, you can upload it to Cloud Storage and then select it from there. Since our CSV file is pretty small, we can skip that and use the browse functionality to select the file from our local machine.
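Before picking an upload route, you can check a file against the limits quoted above. This is a small standard-library sketch; whether the header line counts toward the row limit is not stated here, so the code assumes it does not.

```python
import csv
import os

MAX_BYTES = 10 * 1024 * 1024  # 10 MB, the size limit quoted above
MAX_ROWS = 16_000             # the row limit quoted above


def fits_local_upload(path, max_bytes=MAX_BYTES, max_rows=MAX_ROWS):
    """Return True if the CSV at `path` is within both limits."""
    if os.path.getsize(path) > max_bytes:
        return False
    with open(path, newline="") as handle:
        # Count data rows; the header line is excluded (an assumption).
        row_count = sum(1 for _ in csv.reader(handle)) - 1
    return row_count < max_rows
```

If the function returns False, staging the file in Cloud Storage first is the route described above.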
Step 4: Give the table a name and then define the schema. The schema is a list of each column and its data type. We can define the schema manually by clicking on Add Field or check the box to have BigQuery auto-detect it.
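If you define the schema manually, it can help to write it down as data first. The sketch below builds the name/type/mode layout that BigQuery uses for JSON schema definitions (the same layout the `bq` command-line tool accepts as a schema file). The three column names come from the query used later in this article. The `STRING` types are an assumption, and a real mascot file probably has more columns, all of which would need an entry.

```python
import json


def build_schema(columns):
    """Turn (name, type) pairs into BigQuery's JSON schema layout."""
    return [
        {"name": name, "type": column_type, "mode": "NULLABLE"}
        for name, column_type in columns
    ]


# Column names are taken from the query in this article; types are assumed.
MASCOT_SCHEMA = build_schema([
    ("id", "STRING"),
    ("tax_family", "STRING"),
    ("tax_genus", "STRING"),
])


def schema_as_json(schema):
    """Serialize the schema so it can be saved to a file or pasted as text."""
    return json.dumps(schema, indent=2)
```

Auto-detection is the quicker option for a small file like this one. An explicit schema is mainly useful when you want to pin down column types yourself.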
Step 5: Click Create Table and a load job will be created. Once the data has finished loading, you can navigate to view the table details, review the schema, and preview the data right in the console. Our mascots table is ready to query.
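The load job that the console creates in this step can be approximated with the Python client library. This is a sketch under assumptions: the `google-cloud-bigquery` package is installed, credentials are configured, the CSV has a single header row, and the file path and table ID are supplied by you.

```python
def load_options(autodetect: bool = True) -> dict:
    """Keyword arguments for bigquery.LoadJobConfig describing our CSV."""
    return {
        "source_format": "CSV",   # same value as bigquery.SourceFormat.CSV
        "skip_leading_rows": 1,   # assumes one header row in the file
        "autodetect": autodetect, # let BigQuery infer column names and types
    }


def load_csv(path: str, table_id: str):
    """Load a local CSV into `table_id` and return the resulting table."""
    # Imported here so load_options() works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(**load_options())
    with open(path, "rb") as source_file:
        job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )
    job.result()  # wait for the load job; raises if the job failed
    return client.get_table(table_id)
```

The returned table object exposes `num_rows` and `schema`, which correspond to what the console shows in the table details once the load has finished.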
Step 6: We'll paste in a query that uses the mascot table to analyze cat versus dog tournament game match-ups. In our query, we've started with our table of tournament games and then used a series of subqueries against the mascots table to look up an animal classification for the winning and losing teams. We then sum up the number of wins for cats and the number of wins for dogs in the specific cat versus dog match-up games. Our query will be as given below:
#standardSQL
WITH matchups AS (
  SELECT
    g.win_team_id,
    g.lose_team_id,
    (SELECT win_masc.tax_genus
     FROM `analytics-testing-321.basketball.mascots` win_masc
     WHERE win_masc.id = g.win_team_id) AS tax_genus_winner,
    (SELECT lose_masc.tax_family
     FROM `analytics-testing-321.basketball.mascots` lose_masc
     WHERE lose_masc.id = g.lose_team_id) AS tax_family_loser,
    (SELECT win_masc.tax_family
     FROM `analytics-testing-321.basketball.mascots` win_masc
     WHERE win_masc.id = g.win_team_id) AS tax_family_winner,
    (SELECT lose_masc.tax_genus
     FROM `analytics-testing-321.basketball.mascots` lose_masc
     WHERE lose_masc.id = g.lose_team_id) AS tax_genus_loser
  FROM `bigquery-public-data.ncaa_basketball.mbb_historical_tournament_games` g
)
SELECT
  SUM(IF(tax_family_winner = "Felidae" AND tax_genus_loser = "Canis", 1, 0)) AS num_cat_wins,
  SUM(IF(tax_genus_winner = "Canis" AND tax_family_loser = "Felidae", 1, 0)) AS num_dog_wins
FROM matchups
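To make the counting logic of the query above concrete, here is the same idea in plain Python on a handful of made-up rows. The team IDs, mascot classifications, and game results are all hypothetical and only illustrate how the two `SUM(IF(...))` expressions behave.

```python
# Hypothetical mascot lookup: team id -> (tax_family, tax_genus).
MASCOTS = {
    "team_a": ("Felidae", "Panthera"),  # a cat mascot
    "team_b": ("Canidae", "Canis"),     # a dog mascot
    "team_c": ("Ursidae", "Ursus"),     # neither cat nor dog
}

# Hypothetical games as (win_team_id, lose_team_id) pairs.
GAMES = [
    ("team_a", "team_b"),  # cat beats dog
    ("team_b", "team_a"),  # dog beats cat
    ("team_b", "team_a"),  # dog beats cat
    ("team_c", "team_a"),  # not a cat versus dog game
    ("team_b", "team_c"),  # not a cat versus dog game
]


def count_wins(games, mascots):
    """Count cat-over-dog and dog-over-cat wins, mirroring the SQL query."""
    cat_wins = dog_wins = 0
    for win_id, lose_id in games:
        # Teams missing from the lookup behave like NULLs in the query.
        win_family, win_genus = mascots.get(win_id, (None, None))
        lose_family, lose_genus = mascots.get(lose_id, (None, None))
        if win_family == "Felidae" and lose_genus == "Canis":
            cat_wins += 1
        if win_genus == "Canis" and lose_family == "Felidae":
            dog_wins += 1
    return {"num_cat_wins": cat_wins, "num_dog_wins": dog_wins}
```

Games where either team is not a cat or a dog contribute nothing to either total, which is why only head-to-head match-ups are counted.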
Step 7: Now run the query. And there we have it: with 43 of the wins, dog mascots come out ahead in college basketball tournament games. You can also see how dogs and cats perform on other metrics, or join this data with other data sets to test new ideas.
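If you would rather run the query from code than from the console, a minimal sketch with the Python client library could look like this. It assumes a `google.cloud.bigquery.Client` is passed in and that the query returns a single summary row, as the cats versus dogs query does; the function name is illustrative.

```python
def run_single_row_query(client, sql):
    """Run `sql` and return its first result row as a plain dict.

    `client` is expected to be a google.cloud.bigquery.Client. An empty
    dict is returned if the query produces no rows.
    """
    rows = list(client.query(sql).result())  # result() waits for the job
    return dict(rows[0].items()) if rows else {}
```

Called with the query text from Step 6, the returned dict would have the keys `num_cat_wins` and `num_dog_wins`.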