Thanks for the video! At 2:48, would it be recommended to use a transaction to first check whether the eventId was already processed, then do the actual work in the cloud function, and then write the eventId? This would mean a single transaction for checking the eventId, doing the actual work (including reading and writing), and then writing the eventId after processing. This would ensure atomicity. Is this recommended?
I think the problem you describe can occur. When you first read and check for the unique event ID, it appears unprocessed, so you proceed to update the system state in your function code. But by the time that update actually reaches the database, the same event may already have been processed by another invocation. A transaction is what prevents that second, conflicting update from going through.
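For anyone who wants to see the shape of this, here is a minimal sketch using a Firestore transaction. The trigger, the processedEvents collection, and the counter write standing in for the real work are all illustrative assumptions, not anything shown in the video:

```ts
import * as admin from "firebase-admin";
import * as functions from "firebase-functions";

admin.initializeApp();
const db = admin.firestore();

// Illustrative trigger; the same pattern works for any background function
// that exposes context.eventId.
export const onOrderCreated = functions.firestore
  .document("orders/{orderId}")
  .onCreate(async (snap, context) => {
    const marker = db.collection("processedEvents").doc(context.eventId);

    await db.runTransaction(async (tx) => {
      const seen = await tx.get(marker);
      if (seen.exists) {
        // Another invocation already handled this event; do nothing.
        return;
      }
      // Stage the real work and the idempotency marker in the same
      // transaction so they commit (or fail) together.
      const counterRef = db.collection("stats").doc("orders");
      tx.set(
        counterRef,
        { count: admin.firestore.FieldValue.increment(1) },
        { merge: true }
      );
      tx.set(marker, {
        processedAt: admin.firestore.FieldValue.serverTimestamp(),
      });
    });
  });
```

Because Firestore transactions require all reads to happen before writes and re-run the whole block on contention, the check and the update either both take effect or neither does.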
Apart from the at-least-once semantics that apply when retries are enabled, there is another case. When you rename a function, or change its region or trigger while it is handling production traffic, the same unique event can be processed by both the old and the new version of the function.
HTTP functions don't have retries. Does that mean they're guaranteed to execute only once? Or might they still be called multiple times (at-least-once), just without a way for me to request retries?
HTTP functions have an at-most-once guarantee, which is different from background functions, which have at-least-once. If the function fails, the client will find out via an error status code, and it will have to retry. cloud.google.com/functions/docs/concepts/exec#execution_guarantees
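In practice that means the retry loop lives on the caller. A rough sketch of a client-side retry, where the URL, attempt count, and backoff are placeholder assumptions (and the server still has to be idempotent, since a "failed" call may have partially executed):

```ts
// Generic retry wrapper around an HTTP function call.
async function callWithRetry(url: string, maxAttempts = 3): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url, { method: "POST" });
      if (res.ok) return res; // succeeded; at-most-once means we stop here
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network failure: the call may or may not have run
    }
    // Simple linear backoff before the next attempt.
    await new Promise((resolve) => setTimeout(resolve, attempt * 500));
  }
  throw lastError;
}
```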
My onCall functions always log 'finished with status code: 204' before 'finished with status code: 200'. The two entries have different eventIds, so am I being billed double?
Sorry, I don't understand why your callable function would run twice or return a 204 (which means "no content"). But if it gets invoked twice for any reason, then you will be billed for both invocations.
What are the chances of a function being triggered twice? Are we talking 1 out of 10 or more like 1 out of 1000 triggers? This can have an impact on the urgency of implementing or refactoring the code (given it's not written idempotently from the start, which is obviously best). Might it also be an idea to put this in the Firebase documentation? And is this something that will be resolved as soon as Cloud Function triggers go out of beta?
My personal impression is that there is not a predictable, known probability of a retry happening. It likely depends on a variety of factors, in particular, how loaded Cloud Functions is overall, and how many transient errors are currently being observed. During high load and more frequent error conditions, I would expect more retries to happen. Also, it might depend on the individual product generating the events. In my personal experience, I have not observed any retries when my functions are correctly implemented. However, I have also not put massive production load on them. There is no way to fully solve the underlying issue. It's inherent in systems that communicate asynchronously and require coordination. The issue is referred to, in computer science, as the Two Generals problem. en.wikipedia.org/wiki/Two_Generals%27_Problem
Will "enable retries" apply to functions that complete with a connection error? We recently saw this problem after upgrading some critical functions to Node 8; it was causing data inconsistency and other problems for us, so we downgraded. If retries applied in this case, it would solve it, but I was a little scared to enable them as I wasn't sure of the implications. The code for all of these functions is fine and had been running for over a year with no errors or timeouts before the Node 8 update started causing the problem; I just mention that so I don't get the "make sure all functions return a promise" response ;) thanks
Retries (enabled in the Cloud console) apply whenever the function terminates with an error, and it doesn't matter what kind of error it is. It could be a rejected promise (rejected for any reason), an uncaught exception, or even just a timeout. The function has to indicate successful completion (a resolved promise) in order not to be retried. Even if the function completes successfully, that success signal could get lost on its way back to the system, and the function could still be retried (because the system would think it timed out). I've never had to migrate any functions to a different runtime, but I can imagine it could lead to either some brief downtime or some duplicated events, depending on how you choose to perform the migration.
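To make the completion signal concrete, here's a sketch of a background function written with "retry on failure" enabled; transientNetworkCall and isPermanent are hypothetical stand-ins, not real APIs:

```ts
import * as functions from "firebase-functions";

// Hypothetical helpers, stubbed out for illustration.
const transientNetworkCall = async (uid: string): Promise<void> => {
  /* e.g. call a flaky third-party API */
};
const isPermanent = (err: unknown): boolean => false;

export const onUserCreated = functions.auth.user().onCreate(async (user) => {
  try {
    await transientNetworkCall(user.uid);
    // Returning normally resolves the promise, success is reported,
    // and no retry happens.
  } catch (err) {
    if (isPermanent(err)) {
      // Errors you never want retried (e.g. bad input): log and resolve
      // instead of rethrowing.
      console.error("Permanent failure, not retrying", err);
      return;
    }
    // Rethrow transient errors so this invocation is marked failed
    // and gets retried.
    throw err;
  }
});
```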
Doug Stevenson ok thanks, I see. So basically the only case where it WON'T retry is a "function completed successfully" status. This is good to know, as it gives us a bit of a crutch for this migration (unfortunately the newer versions of a time parsing library we use have a bug when running on Cloud Functions under Node 6), so we really want to be able to make the move. The constant connection errors were just making it impossible to leave the functions deployed in production. Thanks for clarifying.
What if the Firebase function triggers multiple times before the first markEventProcessed() call has fulfilled? Let's say markEventProcessed() takes 10 seconds lol and the function is triggered several times within that window.