Dave Does Demos
I work in the UK data team at Microsoft and have created this channel to show customers how to achieve various things on the Azure platform. I'll also occasionally show other relevant technologies as part of these demos. Please don't take these as best practice; they are generally a simple introduction to a feature or concept, and it is up to you to design and implement to your own requirements.

The opinions and views expressed in this blog are those of the author and do not necessarily state or reflect those of Microsoft.
Fashion App Series - Static Web App
5:44
3 months ago
Fashion App Series - Intro
18:05
3 months ago
Product ingest to the fashion app (short)
2:32
4 months ago
ChatGPT4 Vision Data Augmentation
10:27
5 months ago
Dave Does Answers Ep1
6:10
1 year ago
The Truth About Data
11:11
1 year ago
GDPR and cookie dialogs
11:14
2 years ago
Data Engineers need this lineage trick!
16:09
2 years ago
What is "Production Data"?
16:46
3 years ago
What is a Data Lake?
10:50
3 years ago
Long term GitHub traffic statistics
17:07
3 years ago
DataOps - Databricks demo fix
1:47
4 years ago
Comments
@1009reaper 11 days ago
This MDM split between Ops & Analytics just explained the confusion I’ve been experiencing 😂 thanks!
@Max-rl5mt 1 month ago
Question, how can you add two images into one url? Thanks!
@thigmotrope 1 month ago
You describe a map that's created on the operational side. I think you answered this in another comment, but it seems obvious to sync that map to the analytics tier so analytics doesn't have to get into all this fuzzy matching for deterministic cases. By the way, great video; it really helped me a lot. Content like this is somewhat hard to find because there's so much vendor-created content. Not sure if you've considered doing an identity resolution deep dive; this video sort of is that in a way, but I think it's really trying to talk about the MDM space for identity.
@DaveDoesDemos 1 month ago
Thanks for the comment. Yes in an ideal world the mappings would just work. In the real world the operational systems are never good/complete enough to just pull in, and often the work happens in parallel. For instance, after a merger it's more important to show merged analytics than to join up the operational systems which will just carry on working separately. Often merging on the ops side is pointless since a migration will take place later on. Don't underestimate how much effort you might waste in trying to create a single thing - many have tried and outside of Amazon who wrote the whole stack from scratch, few have succeeded. Identity mapping is more art than science in my opinion. If you own all systems then just use your own IDs to link things. If you don't own all systems then it's back to black magic and prayer that you're actually linking the right accounts in many cases. Beware the GDPR here, it's very hard to do this legally and prosecutions are on the rise.
@meriemretiel1852 1 month ago
I'm planning to take the Salesforce Data Architect exam, and I wanted to understand in a simple way what master data management is all about and what problems it deals with. Your video really helped me, thank you.
@StartDataLate 2 months ago
Love the song Soft Kitty ❤ and the DevOps pipeline actually helps me get started with deployment.
@StartDataLate 2 months ago
I cracked up so much when I saw the "!False" T-shirt, and this one says "the idiots are coming"? 🤣🤣🤣
@Josemartinez-oz4es 3 months ago
Very helpful! Nice video.
@valmirmeneses 3 months ago
Great ideas seeded here. Thanks for sharing.
@Lilninj3 3 months ago
Thanks Dave, very well and simply explained!
@paulvanputten2009 4 months ago
Hi Dave, thanks for the great video. I am trying to conceptually understand MDM, and I have one question. In the video you mention that there is one system that is the single source of truth; in your example, this is the CRM system. Is it possible to design the ESB in such a way that if a record is created in a system that is not the 'main' system, the ESB creates a record in the main system? So, a person creates an account in the web system and this person is not yet in the CRM system. Is it possible that a record is automatically created in the CRM system? The CRM would also add the email and phone number from the web system to this record. I hope the question is clear :)
@DaveDoesDemos 3 months ago
Hi Paul, thanks for the question. It sounds like you've fully understood the purpose of enterprise integration and ESBs already! Yes, absolutely, that's kind of the purpose. The CRM is the system that owns the truth, but we achieve that by making sure all updates go to it regardless of where they start. If I add a customer account in the web platform, the ESB takes that data and creates (or matches!) the account in the CRM system, which will then update any other systems that may need those details. If you do this well, there is less work matching records when you ingest for analytics, since you already know the data is matched and consistent across systems. What we're avoiding here is the customer having different accounts in different systems, and the same for other data like sales, stock, product catalog etc. In retail, product and offer SKUs in particular need to be consistent between systems, and this can be very challenging between logistics and distribution, which deal with a pallet of X, the stock system, which may deal with a tray of X, and the store system, which deals with a single X. All the same product SKU in theory, but plenty of work to do to make the numbers match up. Long story short - your comment was spot on.
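A minimal sketch of the flow described in this reply, where an account created in any system is mastered in the CRM and fanned out to the rest. This is not Dave's actual code; every function and system name here is a hypothetical stand-in.

# Stub: the CRM matches or creates the account and returns its master ID
function Set-CrmAccount {
    param([string]$Email, [string]$Phone)
    Write-Host "CRM: matched or created account for $Email"
    [guid]::NewGuid()
}

# Stub: a per-system connector that applies the mastered record
function Send-SystemUpdate {
    param([string]$System, [guid]$CrmId, [string]$Email)
    Write-Host "${System}: updated account $CrmId"
}

function Publish-AccountCreated {
    param([hashtable]$Event)
    # The CRM owns the truth, so it is always updated first...
    $crmId = Set-CrmAccount -Email $Event.Email -Phone $Event.Phone
    # ...then the mastered record fans out to every other subscriber
    foreach ($system in @('web','billing','support') | Where-Object { $_ -ne $Event.Source }) {
        Send-SystemUpdate -System $system -CrmId $crmId -Email $Event.Email
    }
}

Publish-AccountCreated @{ Source = 'web'; Email = 'jane@example.com'; Phone = '0123' }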
@paulvanputten2009 3 months ago
@@DaveDoesDemos Hi Dave, thanks for the response. So conceptually speaking, if you update data in system x this will also get updated in system y, even though system y is the system that 'owns the truth'. Does this also work in practice? We are currently implementing MDM in my organisation, and currently the ESB is being developed in such a way that only system y can update data, and these updates will be communicated to other systems. If you update a field in system x, this will be denied. I am not sure I agree with this method. Doesn't this go against the idea of MDM, or is this a viable solution?
@Hellya38 4 months ago
That's a very good talk. Just a question about 7:06, where you mentioned that an update only happens at a single point when there is a need to replace one of the services. I think that makes sense if the event payloads stay the same, but that's probably not the case most of the time, where you still have to update the logic in the other services in order to publish/consume the new event payloads. Or did I misunderstand what you were trying to describe?
@DaveDoesDemos 4 months ago
Thanks for the question. Usually you'd put a translation layer between the service and the ESB to make the data generic and usable. If you directly integrate then you need to make translation layers for all integrated services, but with the ESB you just write one translation layer to the bus, and the interfaces from the bus to other services remain the same. Take a point of sale system: when a basket is processed it might have several fields, one being pID (productID). We might have several systems with fields called p_ID, product_ID, product, productName, but if we translate on the way IN to the service bus to our common productID, each of them will have a standard connector to translate to their own language on the way OUT of the ESB. If we go direct, we need to rewrite them all if we replace the POS system. This is a very simple example, but the same is true of data structure/schema too, and we can translate into something generic and extensible then back again. Hope that makes sense?
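A minimal sketch of that translation-layer idea. The field names pID, p_ID, product_ID and productName come from the example above; the mapping table and function names are assumptions for illustration.

# Each system gets one connector that translates its dialect to the bus's
# common schema on the way IN, and back again on the way OUT.
$fieldMap = @{
    pos       = 'pID'
    warehouse = 'p_ID'
    stock     = 'product_ID'
    web       = 'productName'
}

# Translate a system-specific record into the bus's common schema
function ConvertTo-BusMessage {
    param([string]$System, [hashtable]$Record)
    @{ productID = $Record[$fieldMap[$System]]; source = $System }
}

# Translate a bus message back into a target system's dialect
function ConvertFrom-BusMessage {
    param([string]$System, [hashtable]$Message)
    $out = @{}
    $out[$fieldMap[$System]] = $Message.productID
    $out
}

# Replace the POS system and only its one connector needs rewriting:
$msg = ConvertTo-BusMessage -System pos -Record @{ pID = 'SKU-123' }
ConvertFrom-BusMessage -System stock -Message $msg   # => @{ product_ID = 'SKU-123' }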
@lxn7404 4 months ago
Maybe my question is stupid, but wouldn't you plug your analytics into your ESB?
@DaveDoesDemos 4 months ago
Many people try and fail. ESB is for operational live information. Analytics is for historical information, and the two are very different in terms of the answers they provide. Live data shows the current state, which is often different from what has happened. While it is possible to take that live feed and process it onto the lake, this often leads to errors in data and is very expensive since you end up replicating your business rules in the analytics solution, doubling the required processing (and therefore cost). As I said though, people continuously try to make this work but I've yet to see it done successfully at scale.
@thommck 5 months ago
You can't say you're "not a fashion person" with that awesome t-shirt on ;)
@contactbryson 5 months ago
The insight the model can provide is so impressive! Great demo, thanks for putting it together.
@illiakaltovich 8 months ago
Thank you for the high-quality video, it was really interesting and insightful
@wubinmatthew 8 months ago
Very clearly explained, thanks for the information.
@jamesholloway9332 9 months ago
Great video! 'Single source of truth' and 'too much Excel everywhere' being a related one. As you say, it sounds very good and gets projects spun up, but even after a new central BI platform is built I've rarely seen different departments building their own reports from a shared dataset. I've seen an IT director take away "single source of truth" as an action after boardroom arguments along the lines of "my figures are different therefore your figures are wrong", which was more of a cultural issue in the boardroom than anything.
@amj8986 10 months ago
Well explained! Thank you so much.
@ashishsangwan5925 1 year ago
@dave - Can you share code for a foreach loop for deploying multiple files from a folder? It would be a great help.
@vinodnoel 1 year ago
Thank you so much!
@bunnihilator 1 year ago
How can I put all the files in a new folder every time, with the datetime as the name?
@balawalali679 1 year ago
@dave I am trying to build an analytical DB. Data will be collected from multiple sources, and each is connected to the others with some master ID. I have just learned about master data management, but I wonder how I should design my analytics system according to MDM.
@DaveDoesDemos 1 year ago
Thanks for the comment. The how is pretty easy: you just need IDs to link things together. The difficult part is the business rules that you use to master the data - deciding what to keep, what's overlap or duplication, and what format you want each column in. Start with the system of record generally and work back from there; if there's a valid record then use that and match other systems to it. If you have a record in another system that doesn't have a match, you can create a new record (so don't use the SoR ID key in analytics!). You then optionally feed back that there's a mismatch, while dealing with it gracefully in analytics. You may choose to drop such records and call them invalid, but make sure you document that this is happening so that the data is trustworthy. A lot of this won't be the data team's job to complete; you'll need to work with business owners to understand what they need to see in the end result.
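A minimal sketch of that matching flow under assumed record shapes: the system of record is the baseline, other systems match against it, and analytics mints its own surrogate key so unmatched records can still land without reusing the SoR's ID.

# Hypothetical master-data matching; all IDs and fields are made up.
$sor   = @( @{ SorId = 101; Email = 'jane@example.com' } )
$other = @(
    @{ WebId = 'w9';  Email = 'jane@example.com' },
    @{ WebId = 'w10'; Email = 'unknown@example.com' }
)

$mastered = foreach ($rec in $other) {
    $match = $sor | Where-Object { $_.Email -eq $rec.Email }
    [pscustomobject]@{
        AnalyticsKey = [guid]::NewGuid()   # never the SoR ID itself
        SorId        = $(if ($match) { $match.SorId } else { $null })
        Email        = $rec.Email
        Status       = $(if ($match) { 'matched' } else { 'unmatched - feed back to the business owner' })
    }
}
$mastered | Format-Table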
@balawalali679 1 year ago
Hi, very informative. I love this video.
@sameerr8849 1 year ago
A simple diagram or flow chart would help really well to keep things in mind for a longer time, so I was hoping that would help.
@MrFCBruges4ever 1 year ago
Great insights! Thanks a lot!
@guillaumeneyret7978 1 year ago
Hello Dave, thank you for your documentation and all your work. I'm currently working on a project where I need to send realtime data (even with a minor delay) from multiple Garmin watches' (Venu Sq 2) sensors (HR, HRV, skin temperature, stress level...) to my PC. My project is to collect different "wellness" data from different users during a meditation, to make a data visualisation of the meditation for each user just afterwards. So I need to send all my watches' sensor data to my computer. As I am using multiple watches at the same time, I think that I won't be able to use a mobile phone as a proxy device linking the watches to the computer. Do you have a solution in mind? Thank you in advance.
@DaveDoesDemos 1 year ago
Hi, thanks for the comment. In theory you could set up multiple watches to do this, as the phone is just there to provide them Internet access. There will be a limit on this though, and I don't know what that limit is in terms of numbers. You could use something like the NPE Wasp to connect sensors direct to the computer; @dcrainmakerblog may have some thoughts on that method. If you get watches with wifi they would all be able to connect directly using the API. If you use my method, your data will end up in a database which you can then use either with PowerBI for visualisation and dashboards, or you could connect Excel or similar to it. Given the scenario I'd use PowerBI and give each watch an ID so you can see each user in real time.
@guillaumeneyret7978 1 year ago
@@DaveDoesDemos Well, thanks a lot for your quick and clear answer! As I would be using a new Garmin watch with wifi, it is such good news to hear that I won't have to use a phone. I will check the NPE Wasp method and also try using your method. I'll let you know if I manage to make it work (or not ;))! Once again: thank you Dave!
@DaveDoesDemos 1 year ago
@@guillaumeneyret7978 please do check the API docs as I'm not 100% certain it works over wifi but think it does
@DaveDoesDemos 1 year ago
@@guillaumeneyret7978 The docs say it works on wifi but I have not tested it personally: developer.garmin.com/connect-iq/api-docs/Toybox/Communications.html#makeWebRequest-instance_function
@Taletherapper 1 year ago
Thank you for this!
@user-wn6fw9bv3q 1 year ago
Excellent, Dave. Thanks, I love your videos. Could you help me with how I can use an open-source MDM platform for my company?
@DaveDoesDemos 1 year ago
Hi, thanks for the feedback. Unfortunately I'm only familiar with the Microsoft tooling so don't really know the open source options. They all work in a similar way though, so the skills are transferable.
@user-wn6fw9bv3q 1 year ago
@@DaveDoesDemos Thanks for your reply. Is it possible to help me with how I can set up MDM in your manner (with an ESB)? Is the architecture you define in the video called a Data Hub?
@maskgirl7769 1 year ago
Can you please do a quick video on the reverse process, i.e. ADLS to Box via ADF?
@DaveDoesDemos 1 year ago
Hi thanks for the comment. Box doesn't have a connector for ADF so you'd be left with using the API. Generally speaking I wouldn't use ADF for this activity, it's designed to orchestrate your data lake. Instead, you should have an integration layer that updates Box, for instance using a Logic App triggered by a service bus which gets a message when new data is ready (look up Enterprise Service Bus for general info on this approach). I am assuming that Box is being used to deliver data to a customer or partner organisation in this instance, if not feel free to share more detail.
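A minimal sketch of the trigger half of the integration pattern named in this reply: dropping a "new data ready" message on a Service Bus queue for a Logic App to pick up, via the Service Bus REST send-message endpoint. The namespace, queue name, SAS token and payload are all placeholders.

# Notify the integration layer that a file is ready; a Logic App subscribed
# to the queue would then push it to Box. All values below are placeholders.
$namespace = 'contoso-esb'
$queue     = 'data-ready'
$sasToken  = 'SharedAccessSignature sr=...'   # pre-generated SAS token
$message   = @{ path = 'adls://container/outbound/file.csv' } | ConvertTo-Json

Invoke-RestMethod -Method Post `
    -Uri "https://$namespace.servicebus.windows.net/$queue/messages" `
    -Headers @{ Authorization = $sasToken; 'Content-Type' = 'application/json' } `
    -Body $message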
@adebolaopeyemi1039 1 year ago
Well done Dave! Just what I needed.
@muralijonna5238 1 year ago
Thanks for sharing such a wonderful demo. Can you please create a demo on how to create a CI/CD pipeline for an Azure AD access token with a service principal?
@petergamma741 1 year ago
The Meditation Research Institute Switzerland (MRIS) would like to thank Dave Does Demos for his great demos on his YouTube channel with the Garmin watch we offered him. He was one of the pioneers who solved the challenging problem of accessing sensor data from Garmin watches. Unfortunately we have to tell him that we have now found a solution with the Apple Watch.
@DaveDoesDemos 1 year ago
Glad you found a solution in the end Peter, I hope the research goes well.
@AlejoBohorquez960307 1 year ago
Thanks for sharing such a valuable piece of information. Quick question: what if my workspace is not accessible over the public network and my Azure DevOps is using a Microsoft-hosted pipeline? Any thoughts?
@DaveDoesDemos 1 year ago
In that case you'd need to set up private networking with vnets. The method would be the same, you just have a headache getting the network working. Usually there's no reason to do this though, I would recommend using cloud native networking, otherwise you're just adding operational cost for no benefit (unless you work for the NSA or a nuclear power facility...).
@AlejoBohorquez960307 1 year ago
Yeah! We are facing that scenario (customer requirement). Basically, the Azure DevOps Microsoft-hosted agent (and because of that the release pipeline), wherever it gets deployed on demand, needs to be able to reach our private Databricks cluster URL passing through our Azure firewall. So far I haven't got any strategy working on this. I would appreciate it if you know of some documentation to take a glimpse at. Thanks for answering. New subscriber!
@DaveDoesDemos 1 year ago
@@AlejoBohorquez960307 Sorry I missed the hosted agent part. Unfortunately I think you need to use a self hosted agent on your vnet to do this, or reconfigure the Databricks to use a public endpoint. It's very normal to use public endpoints on Databricks, we didn't even support private connections until last year and many large global businesses used it quite happily. I often argue that hooking it up to your corporate network poses more of a risk since attacks would then be targeted rather than random (assuming you didn't make your url identifiable, of course).
@daverook3346 1 year ago
It feels odd to see so much data duplicated (in the operations side). I wonder what the advantage is of having duplicated/synced data vs references to a single source of truth - it also has a familiar feeling with Domain driven design (if I've understood it right). Thank you
@DaveDoesDemos 1 year ago
Data is always replicated on the operational systems. If you were starting from scratch and writing your own software then maybe you'd get away with it, but in the real world that doesn't happen (Amazon might be an exception there when they originally set up the book shop). As such, your warehouse system, stock system and POS system would all have their own lists of products as an example, and they usually can't use an external source for this. The ESB then gets used to ensure they all get up to date information as it changes - update any one system and the others get the changes. Single source of truth is more of a mantra than a reality, and it often causes more work than dealing with multiple copies of information. We may sometimes keep a reference set of data which would be the source of truth, but this is usually also updated by ESB. Some people then leap to a conclusion that systems should talk directly for updates, but this would multiply out the number of touchpoints and cause more work in the long run, hence we use an ESB to abstract each connection to a middleman system (the ESB) and then create a connector to each other system. We can then change out systems easily without rewriting code all over the place. The approach is also useful in larger businesses or after merger activities where you may have several of each type of system - nobody ever tidies up an environment fully! Hope that made sense, happy to add more detail.
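To put a rough number on the "multiply out the number of touchpoints" point above: direct point-to-point integration needs a translator for roughly every pair of systems, about n(n-1)/2 of them, while a bus needs one connector per system. A quick illustrative calculation (the system count is arbitrary):

# Connection count for n systems: point-to-point vs ESB
$n = 8
$pointToPoint = $n * ($n - 1) / 2   # every pair integrated directly = 28
$withEsb      = $n                  # one connector per system onto the bus = 8
"Direct: $pointToPoint interfaces; ESB: $withEsb connectors"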
@maheshkumarsomalinga1455 7 months ago
@@DaveDoesDemos Thanks for this fantastic video. Talking about SSOT, could you help clarify the below (quite a few queries...)
1) How is MDM different from SSOT?
2) Is MDM focused only on master data such as customers, locations, products etc., whereas an SSOT can also contain transactional data?
3) I have come across articles mentioning SSOT as an aggregated version of the data. What does that mean exactly?
4) If the EDW was considered an SSOT earlier, why is it not so now?
5) It would be great if you could bring up a video on SSOT too in the future. Thank you.
@DaveDoesDemos 7 months ago
@@maheshkumarsomalinga1455 MDM means different things, but ultimately it ends up with SSOT one way or another. Sometimes you may also see "master data" created from other sources as reference data, separately from systems, and this is another valid use of the term, but generally this is used as a reference to check against, or a purer source, rather than actively used. For instance you may have a master data list of your stores, which wouldn't include unopened new ones or ones that have permanently closed, but is a current master list of open active stores. You may choose to have multiple master data lists with different purposes too, such as a store list including those that have closed or are yet to open.

SSOT is not usually aggregated; it's just the single place you go to for the truth. That might mean aggregation sometimes, but it could mean that sales system 1 is the SSOT for subsidiary 1 and sales system 2 is the SSOT for subsidiary 2, while you may also have a data warehouse which is the SSOT for both subsidiaries for reporting purposes. In all scenarios the SSOT is the defined place which has the correct version of data for the defined use-case. As explained in my other video (truth about data), the sales system might not have "the truth" that a CFO is looking for when speaking to the markets, since sales data can and does change over time with returns, refunds etc.

The EDW can be an SSOT for reporting purposes, but never make the mistake of thinking it's a single SSOT. The systems of record are SSOTs for current live data; the EDW is an SSOT for historical facts. Importantly, if you have an item returned in retail a month after purchase, your EDW data will change retrospectively and therefore the truth will change. The EDW may also report different truths - if you have an item sold then returned, you did still make a sale, so marketing need to know a sale was made. You also had a return, so you'd want to know there was a return so you could do analytics on that. You also didn't make money, so did you make a sale or not? There are lots of truths in data depending on your perspective, but the sales system will only care about the truth right now - you didn't make a sale. Then there's the stock system - is the returned item in stock? It was sold, so no. It was returned, so yes. It may be damaged, so... maybe? Check out my other video at ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-J9FdMuQutN8.html&t
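A tiny worked illustration of those competing truths, with made-up numbers: one item sold and then returned gives marketing one sale, the returns team one return, and finance zero net revenue, all from the same two rows.

# One item sold for 20, then returned: three different "truths" from the same data
$events = @(
    [pscustomobject]@{ Type = 'sale';   Amount = 20 },
    [pscustomobject]@{ Type = 'return'; Amount = -20 }
)
$sales   = ($events | Where-Object Type -eq 'sale').Count      # marketing: 1 sale
$returns = ($events | Where-Object Type -eq 'return').Count    # returns team: 1 return
$net     = ($events | Measure-Object Amount -Sum).Sum          # finance: 0 net
"Sales: $sales; Returns: $returns; Net revenue: $net"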
@maheshkumarsomalinga1455 7 months ago
@@DaveDoesDemos Thanks Dave for the detailed explanation! In a way, it has made me think differently (rather more broadly) about SSOT now, leading to more questions. Let me read the details again to digest them further... Your efforts are greatly appreciated. I went through your other video (truth about data) too... found it helpful.
@Ikilledthebanks 1 year ago
Dev data should be representative; it's always missing fields for us.
@vkincanada5781 1 year ago
Can you please make a video on "Databricks Code Promotion using DevOps CI/CD" using the pipeline artifact YAML method?
@DaveDoesDemos 1 year ago
Hi, the methods would be identical using YAML so in theory you should be able to Google for the code examples. I have a strong preference against YAML for data pipelines in any team that doesn't have a dedicated pipeline engineer. Data teams simply don't need the stress of learning yet another markup language just to achieve something there's a perfectly good GUI for. Data deployment pipelines don't change often enough to make YAML worthwhile in my opinion. The time is better spent doing data transformation and modelling work.
@murataydian 1 year ago
Thank you, Dave!
@optimastic811 1 year ago
Please cover the rest of the data management applications, like data lineage, reference data management and metadata. Thanks in advance!
@Me-op9zm 1 year ago
Can you teach how to redirect the website after submitting the form? Thanks!
@DaveDoesDemos 1 year ago
Hi, if you wanted to do this I would recommend using JavaScript to submit the data rather than HTML; there are lots of examples of JavaScript forms which don't redirect the page when submitted. The demo was showing that very basic HTML forms can submit data in a very modern way, but I wouldn't necessarily do it this way in the real world. Redirecting isn't possible here, since the form fires into Logic Apps and there's no way to then respond with a redirect. I created the demo to help people understand the connection between HTTP and APIs, as many see them as very separate things, but in reality they are all really basic fundamentals of the web. Thanks for the comment; I can't believe you're the first to mention this in three years, as it's a really important topic and affects usability. I knocked up the original demo while on the stand at SQL Bits to show real-time data processing and let the crowd submit data; it was clunky but people loved the simplicity.
@omarrose1196 1 year ago
Dave, I could kiss you. Thank you!
@nitindhingra2925 1 year ago
Excellent Dave, many of my queries got resolved. Keep it up.
@runilkumar3127 1 year ago
Hi Dave, thanks a lot. Can you please help me? When I import the notebook from the Databricks UAT environment using the code below, I get an error. If I comment out the code below, the notebook is created without code. Please advise.
# Open and import the notebook
$BinaryContents = [System.IO.File]::ReadAllBytes($fileName)
$EncodedContents = [System.Convert]::ToBase64String($BinaryContents)
@shibashishvlogging 1 year ago
I am facing an issue while loading the data into Power BI from Cosmos Gremlin. This is the error I am facing: "This query does not have any columns with the supported data types. It will be disabled from being loaded to the model." Any suggestions why it is happening? I followed the same steps you showed.
@tarunacharya1337 1 year ago
Awesome demo Dave, thanks a lot. I have replicated this and it works OK with one notebook in the same environment, but the file name is hardcoded: $fileName = "$(System.DefaultWorkingDirectory)/_Build Notebook Artifact/NotebooksArtifact/DemoNotebookSept.py". How can I generalise this for all the files and folders in the main branch, and what happens to $newNotebookName in this case?
@DaveDoesDemos 1 year ago
Hi glad you enjoyed the demo. I'd recommend looking at using the newer Databricks methods which I've not had a chance to demo yet. These allow you to open a whole project at a time. For my older method you'd want to list out the contents of the folder and iterate through an array of filenames. In theory since you'll want your deploy script to be explicit you could even list them in the script using copy and paste, although this may get frustrating in a busy environment.
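A minimal sketch of the iterate-the-folder approach from this reply, posting each file to the Databricks Workspace Import API (POST /api/2.0/workspace/import). The artifact path reuses the one from the question above; $databricksUrl, $token and the /Shared target path are placeholders, and $(System.DefaultWorkingDirectory) is expanded by Azure DevOps before the script runs.

# Deploy every .py notebook in the artifact folder instead of one hardcoded file,
# reusing the base64 encoding from the demo. $databricksUrl and $token are placeholders.
$folder  = "$(System.DefaultWorkingDirectory)/_Build Notebook Artifact/NotebooksArtifact"
$headers = @{ Authorization = "Bearer $token" }

foreach ($file in Get-ChildItem -Path $folder -Filter *.py -Recurse) {
    $bytes   = [System.IO.File]::ReadAllBytes($file.FullName)
    $content = [System.Convert]::ToBase64String($bytes)
    $body = @{
        path      = "/Shared/$($file.BaseName)"   # replaces the single $newNotebookName
        format    = 'SOURCE'
        language  = 'PYTHON'
        content   = $content
        overwrite = $true
    } | ConvertTo-Json
    Invoke-RestMethod -Method Post -Uri "$databricksUrl/api/2.0/workspace/import" `
        -Headers $headers -Body $body
}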
@sudheershadows1032 2 years ago
Could you please explain more about the binary contents in the PowerShell script?
@gayathrivenkata621 2 years ago
I just want to know if I can replace the SFTP with another source. Is that possible? Actually, the condition is that you shouldn't use blob storage. Is there any other way to replace the SFTP so that I can upload my data automatically to Azure? Could you please help me with this?
@greatladyp6632 2 years ago
Do you do training?
@goofydude02 2 years ago
18K+ views but only 900 subscribers, why? If you are watching the content, there's no harm in subscribing, right?
@JohnMusicbr 2 years ago
Awesome. Thanks, Dave.