You need an All Access Membership on GCP Study Hub to access labs. Click here to purchase if you don't have one.
In this lab, you'll learn how to use BigQuery, Google Cloud's serverless data warehouse, to analyze data at scale. You'll create a dataset, load sample e-commerce sales data from a CSV file, write SQL queries to analyze it, and save query results. BigQuery can process billions of rows in seconds, making it perfect for data analysis and business intelligence.
lab_dataset existssales_data exists with at least 1,000 rowsquery_results exists (query results saved)How this works: You need a Google account that uses the same email address as your GCP Study Hub account. When you sign into that Google account, the system will be able to grant you temporary access to a GCP project when you start the lab.
Click the button below to start your lab:
Start LabOnce your lab starts, use the "Open GCP Console" button from the lab portal to open the Google Cloud Console with your temporary lab project already selected.
Alternatively, search for "BigQuery" in the search bar at the top of the console.
A dataset is a container for tables in BigQuery, similar to a database schema. Some teams organize datasets by department (sales, marketing), data source (app_logs, external_api), or environment (prod, staging), though the optimal organization strategy depends on your team structure and access control requirements.
lab_datasetYou should now see lab_dataset appear under your project in the Explorer panel.
First, download the sample e-commerce sales data CSV file to your computer.
The file will be saved to your Downloads folder. You'll use this in the next step.
Now you'll create a table in BigQuery and upload the CSV data. BigQuery supports loading data from multiple sources: Cloud Storage, local files, streaming inserts, or other BigQuery tables. For larger datasets, Cloud Storage is generally preferred over direct file uploads due to better reliability and the ability to parallelize the load process.
lab_datasetsample_sales_data.csv file you just downloadedlab_dataset (should be pre-filled)sales_dataOnce complete, you'll see the sales_data table under lab_dataset with 2,000 rows loaded.
Before running queries, it's good practice to preview your data to understand its structure and spot any data quality issues.
sales_data table in the Explorerquantity as INTEGER, revenue as FLOAT, order_date as STRING (could be DATE with proper formatting), and text fields as STRING.order_id - Unique order identifierproduct_category - Type of product sold (5 categories: Electronics, Clothing, Home & Garden, Books, Sports & Outdoors)quantity - Number of items per order (1-5)revenue - Total revenue for the order in dollarsorder_date - When the order was placed (2024 data)region - Geographic region (North, South, East, West)This dataset contains a year of e-commerce transactions across different regions and product categories, which provides a reasonable basis for practicing basic aggregation queries.
This query calculates revenue by region using standard SQL aggregation functions.
SELECT region, COUNT(*) as order_count, SUM(revenue) as total_revenue, ROUND(AVG(revenue), 2) as avg_revenue FROM `lab_dataset.sales_data` GROUP BY region ORDER BY total_revenue DESC
About columnar storage: BigQuery uses columnar storage rather than row-based storage. When querying specific columns (like region and revenue), only those columns are read from disk. This design can significantly reduce I/O for analytical queries, particularly on wide tables where you're selecting a small subset of columns. The performance benefit scales with dataset size, though actual query times depend on many factors including query complexity, data distribution, and available compute resources.
Sometimes you want to save query results as a permanent table instead of just viewing them. This is useful for creating summary tables, building data pipelines, or caching expensive query results to avoid re-scanning large datasets. Let's create a table showing product category performance.
SELECT product_category, COUNT(*) as total_orders, SUM(quantity) as total_units_sold, ROUND(SUM(revenue), 2) as total_revenue FROM `lab_dataset.sales_data` GROUP BY product_category ORDER BY total_revenue DESC
lab_datasetquery_resultsquery_results table appears under lab_dataset. This summary table can be queried directly without re-aggregating the raw data.Note on materialized aggregations: Some data teams maintain pre-aggregated summary tables to reduce query costs and latency for frequently-accessed metrics. This approach trades storage costs for reduced compute costs, and works best when the aggregation pattern is well-understood and relatively stable. The trade-off depends on your specific query patterns and data refresh requirements.
Click the button below to open the lab checker in a new window. Sign in with Google to verify your work and get your completion token.
Open Lab CheckerPaste your token below and click Validate. If valid, a button will appear to mark the lesson complete.
If your validation fails, check that:
lab_dataset (case-sensitive)sales_dataquery_results table by saving query results (not manually creating it)If the CSV file won't upload: