Good question! Request Units are provisioned per second (RU/s). If you provision 1000 RU/s, that budget is available to your container or database (depending on where you've provisioned your throughput) every second. Autoscale throughput works a little differently. You define the maximum throughput you can scale up to, and your 'resting' throughput will be 10% of that. For example, if the maximum is 1000 RU/s, the minimum will be 100 RU/s. Depending on how many requests are being performed against the container or database, you will scale between those two values. The Cosmos DB team has docs on how this works: learn.microsoft.com/en-us/azure/cosmos-db/provision-throughput-autoscale
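To make the autoscale arithmetic concrete, here is a rough Python sketch. The function names (`autoscale_range`, `scaled_throughput`) are made up for illustration and are not part of any Cosmos DB SDK, and the "throughput follows demand" model is a simplification of what the service actually does.

```python
def autoscale_range(max_rus: int) -> tuple[int, int]:
    """Return (minimum, maximum) RU/s for a given autoscale maximum.

    The minimum is 10% of the configured maximum.
    """
    return max_rus // 10, max_rus


def scaled_throughput(demand_rus: int, max_rus: int) -> int:
    """Simplified model: throughput follows demand, clamped to the range."""
    low, high = autoscale_range(max_rus)
    return min(max(demand_rus, low), high)


if __name__ == "__main__":
    print(autoscale_range(1000))          # resting and maximum RU/s
    print(scaled_throughput(50, 1000))    # quiet period: stays at the minimum
    print(scaled_throughput(450, 1000))   # moderate load: scales to demand
    print(scaled_throughput(5000, 1000))  # heavy load: capped at the maximum
```

Anything above the cap in the last case is the demand that would be rate-limited.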
Hi John! The request (insert or read) will fail. A 429 (Too Many Requests) error will be returned, so on an insert the data won't be written to Cosmos DB, and on a read the results won't be returned to the user.
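If you want to handle the 429 yourself rather than surface it to the user, the usual approach is to wait and try again. Below is a minimal sketch in plain Python. `RateLimitedError` and `with_retries` are hypothetical names I've made up to stand in for a throttled response and a retry helper; they are not the Cosmos DB SDK's API. As far as I know the official SDKs already retry throttled requests a few times by default, so treat this as an illustration of the idea only.

```python
import time


class RateLimitedError(Exception):
    """Hypothetical stand-in for a 429 (Too Many Requests) response."""

    def __init__(self, retry_after_ms: int = 0):
        super().__init__("429 Too Many Requests")
        self.status_code = 429
        self.retry_after_ms = retry_after_ms


def with_retries(operation, max_attempts: int = 5, sleep=time.sleep):
    """Call operation(), waiting and retrying when it is rate-limited.

    Re-raises the error once max_attempts calls have all been throttled.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimitedError as err:
            if attempt == max_attempts:
                raise  # give up: the write or read did not happen
            # Wait for the interval suggested by the (simulated) service.
            sleep(err.retry_after_ms / 1000)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_insert():
        # Simulated insert that is throttled twice before succeeding.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RateLimitedError(retry_after_ms=10)
        return "written"

    print(with_retries(flaky_insert), "after", attempts["count"], "attempts")
```

The `sleep` parameter is injectable only so the helper can be exercised without real delays.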