What is Google Cloud Spanner?
Cloud Spanner is a relational database that offers transactional consistency at scale. While reading about and learning Cloud Spanner, I realized there was a need for a short overview of it. Based on my notes and my understanding of the documentation, I put together the following overview in the hope that it will help others interested in this technology. Google Cloud Spanner is fully managed and uses automatic sharding (the shards are called splits) and replication to scale up to millions of nodes and trillions of database rows while remaining highly available. At Google, Spanner is used for large-scale, mission-critical applications that require strong consistency, including Google AdWords.
When to use Google Cloud Spanner
Google Cloud Spanner can be used to meet the following requirements for your application:
- OLTP (Online Transaction Processing)
- Global scale
- Relational data model
- ACID transactions with strong (external) consistency
- Low latency
- Fully managed and highly available
- Automatic replication
Google Cloud Spanner has been used for critical, high-load transactional workloads in the following areas:
- Financial trading
- Telecom and billing
- Global call centers
- Supply-chain management and manufacturing
- Logistics and Transportation
- E-Commerce (High Availability)
Google Cloud Spanner is fully managed and requires very little administration. It offers the following availability SLAs. Multi-region deployments provide a 99.999% availability SLA, which equates to about 5.3 minutes of downtime per year. Single-region deployments provide a 99.99% availability SLA, which is about 52.6 minutes of downtime per year. Database administrators only need to choose the region configuration when the instance is created, and then resize compute resources (nodes) to manage performance at scale. Other administrative functions, such as replication and sharding, are managed for you, and system updates happen transparently without database outages.
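The yearly downtime figures follow directly from the SLA percentages. The sketch below assumes a 365-day year and ignores how the SLA actually measures downtime. The function name is illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year

def max_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year permitted by an availability percentage."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction

for label, sla in [("multi-region", 99.999), ("regional", 99.99)]:
    print(f"{label}: {sla}% -> {max_downtime_minutes(sla):.2f} min/year")
```

Running it prints 5.26 min/year for 99.999% and 52.56 min/year for 99.99%.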
Because Spanner is fully managed, transient infrastructure failures are handled internally and do not need to be accounted for in the application layer. Transaction failures still need to be handled in the application layer, typically by retrying the transaction. These include aborts caused by potential deadlocks or lock contention.
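A common way to handle aborted transactions is to re-run the whole transaction function with backoff. The official client libraries ship their own retry helpers, such as `run_in_transaction` in the Python client, and those should be preferred in practice. This is a library-agnostic sketch of the same idea. `AbortedError` and `run_with_retries` are hypothetical names, not part of any Spanner API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class AbortedError(Exception):
    """Hypothetical stand-in for the error raised when a transaction is aborted."""

def run_with_retries(
    txn_fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Re-run txn_fn from the start whenever it aborts, with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except AbortedError:
            if attempt == max_attempts:
                raise  # give up and surface the abort to the caller
            # Back off 0.05s, 0.1s, 0.2s, ... plus random jitter to avoid retry storms.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
    raise AssertionError("unreachable")

attempts = {"n": 0}

def flaky_transfer() -> str:
    # Simulates a transaction that aborts twice before committing.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise AbortedError("lock conflict")
    return "committed"

print(run_with_retries(flaky_transfer, sleep=lambda s: None))  # committed
```

The whole transaction body is retried, not just the failed statement. This matters because reads made before the abort may be stale by the time of the retry.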
Instances can be either regional or multi-regional. In a regional instance, data stays within that region, which provides locality. Multi-regional instances use Paxos-based replication, TrueTime, and leader election to provide global consistency and higher availability. Google Cloud Spanner instances have:
- At least three read-write replicas of the database each in a different zone
- Each zone is a separate fault-isolation domain
- The Paxos distributed consensus protocol is used for writes and transaction commits
- Synchronous replication of writes across zones and regions: a write commits once a majority of voting replicas has acknowledged it
- Database is available even if one zone fails (99.999% availability SLA for multi-region and 99.99% availability SLA for regional)
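Three replicas in three zones survive a single zone failure because Paxos commits a write once a majority of voting replicas agree. The sketch below covers only the majority-quorum arithmetic. It does not model Spanner's leader election, witness replicas, or read-only replicas, and the function names are hypothetical.

```python
def majority_quorum(voting_replicas: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return voting_replicas // 2 + 1

def tolerated_failures(voting_replicas: int) -> int:
    """How many voting replicas can be lost while a majority can still be formed."""
    return voting_replicas - majority_quorum(voting_replicas)

def can_commit(acks: int, voting_replicas: int) -> bool:
    """A write commits once a majority of voting replicas has acknowledged it."""
    return acks >= majority_quorum(voting_replicas)

for n in (3, 5):
    print(f"{n} voting replicas: quorum={majority_quorum(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")

# Regional example: three zones, one zone down, the other two still acknowledge.
print(can_commit(acks=2, voting_replicas=3))  # True
```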
Google Cloud Spanner provides security through IAM integration, with permissions and access configurable for groups and users at the instance and database level. Data stored in Google Cloud Spanner is encrypted at rest. Comprehensive audit logging is provided for both Admin Activity and Data Access. Admin Activity logs include any operation that modifies the configuration or metadata of a resource. Data Access logs contain API calls that read the configuration or metadata of resources, as well as user-driven API calls that create, modify, or read database content.
Replication is used for both global availability and geographic locality, and failover between replicas is transparent to the client. Transactions are replicated using the Paxos distributed consensus protocol, which ensures that a transaction has reached enough replicas before it is committed. Google Cloud Spanner automatically reshards data into splits and migrates splits across machines (even across datacenters) to balance load and to respond to failures. Spanner's sharding takes the parent-child relationships of interleaved tables into account, so related rows are kept and migrated together to preserve query performance.
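Interleaving keeps related data together because a child row's primary key starts with its parent's key. As a result, child rows sort directly after their parent. This toy model uses hypothetical Singers/Albums keys written as tuples. It cuts splits only before a parent row and at a fixed row count. Spanner's real split decisions are internal and based on size and load, so treat this as an illustration only.

```python
from typing import List, Tuple

Key = Tuple[int, ...]  # (singer_id,) for a parent row, (singer_id, album_id) for a child row

def assign_splits(keys: List[Key], max_rows_per_split: int) -> List[List[Key]]:
    """Group rows into contiguous key ranges, cutting only before a parent row."""
    splits: List[List[Key]] = []
    current: List[Key] = []
    # Tuple ordering places each child (1, 10) right after its parent (1,),
    # mirroring how interleaved child rows are stored next to their parent.
    for key in sorted(keys):
        is_parent_row = len(key) == 1
        if is_parent_row and len(current) >= max_rows_per_split:
            splits.append(current)
            current = []
        current.append(key)
    if current:
        splits.append(current)
    return splits

rows = [(2,), (1, 20), (1,), (2, 5), (1, 10), (3,)]
for i, split in enumerate(assign_splits(rows, max_rows_per_split=2)):
    print(i, split)
```

Each singer lands in exactly one split together with all of its albums. A query that joins a parent to its children can therefore be served from a single split.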
Google Cloud Spanner is exposed to applications through multiple channels:
- Client libraries
- REST API, including support for instance and database management, as well as CRUD operations and more
- JDBC driver
- Google Cloud console, for full administration support, as well as control plane and data plane operations
- The gcloud command line tool, including instance and database management, full CRUD and also operations management
Data in Google Cloud Spanner can be queried and modified using SQL (ANSI 2011), Data Manipulation Language (DML), and mutations. Schema updates are applied to the live database without requiring any downtime.
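Mutations and DML differ in one visible way. Mutations are buffered on the client and applied atomically at commit, so reads in the same transaction do not see them. DML changes, by contrast, are visible to later statements in the same transaction. The in-memory sketch below models only the mutation side. `Mutation`, `ToyTransaction`, and the op names are hypothetical and are not the Spanner client API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

VALID_OPS = {"upsert", "delete"}  # hypothetical op names for this toy model

@dataclass(frozen=True)
class Mutation:
    op: str
    key: int
    value: Optional[str] = None

class ToyTransaction:
    """Hypothetical model of mutation buffering; not the Spanner client API."""

    def __init__(self, table: Dict[int, str]):
        self._table = table
        self._buffer: List[Mutation] = []

    def read(self, key: int) -> Optional[str]:
        # Reads see committed data only, never this transaction's buffered mutations.
        return self._table.get(key)

    def buffer(self, mutation: Mutation) -> None:
        if mutation.op not in VALID_OPS:
            raise ValueError(f"unknown op {mutation.op!r}")
        self._buffer.append(mutation)

    def commit(self) -> None:
        # Ops were validated in buffer(), so applying them here cannot fail halfway.
        for m in self._buffer:
            if m.op == "upsert":
                self._table[m.key] = m.value
            else:
                self._table.pop(m.key, None)
        self._buffer.clear()

table = {1: "Alice"}
txn = ToyTransaction(table)
txn.buffer(Mutation("upsert", 2, "Bob"))
txn.buffer(Mutation("delete", 1))
print(txn.read(2))  # None: the buffered insert is not visible yet
txn.commit()
print(table)        # {2: 'Bob'}
```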
Google Cloud Spanner provides external consistency for all transactions, a guarantee stronger than serializability. It achieves this using TrueTime, a globally distributed clock with high accuracy and availability. TrueTime is backed by synchronised GPS receivers and atomic clocks, which keep drift to a minimum. It is used to assign monotonically increasing commit timestamps, which allows globally consistent reads across the database at a given timestamp without taking locks. These timestamps reflect the real-time order in which transactions commit, so every transaction gets a place in a single global serialization order. This minimises transaction contention and failures caused by timestamp clashes.
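The published Spanner paper describes the TrueTime API in two parts. First, `now()` returns an interval `[earliest, latest]` guaranteed to contain the true time. Second, a "commit wait" step delays making a commit visible until its timestamp is certainly in the past. The toy simulation below illustrates that idea using integer milliseconds and a manually advanced clock. The class and function names are hypothetical, and this is not how Spanner is implemented internally.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TTInterval:
    earliest: int  # milliseconds
    latest: int

class ToyTrueTime:
    """Simulated TrueTime: the true time is within +/- epsilon of the local clock."""

    def __init__(self, epsilon_ms: int):
        self.epsilon_ms = epsilon_ms
        self.local_ms = 0  # simulated local clock, advanced by the caller

    def now(self) -> TTInterval:
        return TTInterval(self.local_ms - self.epsilon_ms, self.local_ms + self.epsilon_ms)

    def after(self, t: int) -> bool:
        # True only once t is definitely in the past for every possible true time.
        return self.now().earliest > t

def commit_with_wait(tt: ToyTrueTime, tick_ms: int = 1) -> Tuple[int, int]:
    """Pick a commit timestamp, then wait until it has certainly passed."""
    commit_ts = tt.now().latest          # no earlier than the true time at commit
    waited_ms = 0
    while not tt.after(commit_ts):       # commit wait
        tt.local_ms += tick_ms           # simulate time passing instead of sleeping
        waited_ms += tick_ms
    return commit_ts, waited_ms

tt = ToyTrueTime(epsilon_ms=5)
print(commit_with_wait(tt))  # (5, 11): roughly twice epsilon
```

The wait is about twice the clock uncertainty. This is why keeping TrueTime's uncertainty small matters for write latency.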
When reading data in Google Cloud Spanner, in either a read-only transaction or a single read call, you can set a timestamp bound. The bound tells Google Cloud Spanner how to choose the timestamp at which to read the data. The Cloud Spanner documentation on reads describes the types of reads and when each is useful.
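Spanner's documentation describes strong reads as well as exact-staleness and bounded-staleness timestamp bounds. The sketch below models how each kind of bound might map to a read timestamp. It assumes integer seconds and a hypothetical `replica_safe_time`, the newest timestamp a replica has fully applied. `TimestampBound` and `choose_read_timestamp` are illustrative names, not the client library types.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TimestampBound:
    kind: str                        # "strong", "exact_staleness", "read_timestamp", "max_staleness"
    staleness: int = 0               # seconds, used by the staleness kinds
    timestamp: Optional[int] = None  # absolute time, used by "read_timestamp"

def choose_read_timestamp(bound: TimestampBound, now: int, replica_safe_time: int) -> int:
    """Hypothetical model of picking a read timestamp for a given bound."""
    if bound.kind == "strong":
        return now  # see every transaction committed before the read started
    if bound.kind == "exact_staleness":
        return now - bound.staleness
    if bound.kind == "read_timestamp":
        if bound.timestamp is None:
            raise ValueError("read_timestamp bound needs a timestamp")
        return bound.timestamp
    if bound.kind == "max_staleness":
        # Prefer the newest data this replica already has, if it is fresh enough.
        oldest_allowed = now - bound.staleness
        if replica_safe_time >= oldest_allowed:
            return min(replica_safe_time, now)
        # This toy gives up; a real system could wait or read from another replica.
        raise LookupError("local replica is staler than the bound allows")
    raise ValueError(f"unknown bound kind {bound.kind!r}")

now, safe = 100, 97  # replica has applied everything up to t=97
print(choose_read_timestamp(TimestampBound("strong"), now, safe))               # 100
print(choose_read_timestamp(TimestampBound("exact_staleness", 15), now, safe))  # 85
print(choose_read_timestamp(TimestampBound("max_staleness", 10), now, safe))    # 97
```

Stale reads let a nearby replica answer without waiting to catch up. This is the main reason to accept some staleness.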
How Cloud Spanner compares to traditional databases
Google Cloud Spanner differs from traditional databases in some key ways and is similar in others:
- Relational tables: Relational, with optimisation for interleaved tables
- Foreign keys: Foreign keys and interleaved tables
- SQL: SQL, DML and mutations
- JDBC: Supported with 2 drivers
- Client Libraries: Client Libraries for: C#, Go, Java, Node.js, PHP, Python, Ruby
- Stored procedures: Not supported; Cloud Functions can be used to manage regular or long-running transactions
- Triggers: Triggers are not currently supported
- Cursors: Paged results if required
- Views: Not currently supported
- Data Definition Language: Supported, using DDL statements for schema changes
- Performance Schemas: Audit and Performance Schemas
- ACID compliance: External consistency (a stronger guarantee than serializability)
- SSL support: SSL supported out of the box
- Query caching: Query resume tokens, with some caching
- Sub-SELECTs: Supports “sub-selects” and “with” clauses
- Replication support: Fully managed replication with zero-downtime failover
- Partitioned tables: Automatic table splits and sharding for performance and failover
- Clustering: Fully managed multi-zone replication
- Multiple storage engines: Fully managed optimised storage
- Schema updates with potential downtime: Live schema updates with no downtime