This is a really powerful solution compared to using the API for streaming live data. You get the best of both worlds: historical data and real-time updates without hammering the API. The last thing would be to keep a rolling history per timeframe, for example a rolling 20 years for monthly data, or 6 months for weekly data, etc. Thanks for sharing. I have it all working except that last idea. It's really neat to see this working so nicely.
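A minimal sketch of that rolling-history idea, assuming interval-suffixed tables like AAPL_1mo with a "Date" column as in the video; the RETENTION mapping and prune() helper are hypothetical:

```
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine('sqlite:///stocks.db')

# Hypothetical retention windows per interval, matching the comment's example
RETENTION = {'1mo': pd.DateOffset(years=20), '1wk': pd.DateOffset(months=6)}

def prune(table, interval):
    # Delete everything older than the retention window for this timeframe
    cutoff = pd.Timestamp.now('UTC').tz_localize(None) - RETENTION[interval]
    with engine.begin() as conn:
        conn.execute(text(f'DELETE FROM "{table}" WHERE "Date" < :cutoff'),
                     {'cutoff': str(cutoff)})

prune('AAPL_1mo', '1mo')
```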
Just gonna leave a comment here. Back when I was learning Python, I thought this would be the most efficient way for me to do backtesting. Now I've got the tutorial. Thanks man
Thank you so much. I've been going through your videos since day 1 and I have learned so much from you, more than from other schools like Codecademy and other "boot camp" schools. I learn best when I am applying the skills to something practical, and that is how I remember. Thank you for all that you do, and I hope you keep on providing value! Much love, my friend.
I personally have a deep need for database-oriented videos. I really appreciate your kind efforts, please keep on making these videos. These topics are so useful to us and the way you teach is very enjoyable. If you added websockets to these DB videos it would be PERFECT. Thank you...
Great work. I've been watching all your content since I subscribed. I'm working on something similar: websockets + a Postgres DB as a background service, then just using a pandas dataframe on the real-time updated table. I could have had Postgres listen on a socket, but this way I can include the EMA/AMA and RSI calculations that I want in Python, which is a lot easier to do.
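A sketch of that indicator step, assuming a 'close' column and a BTCUSDT table as in the video; the Postgres DSN is a placeholder:

```
import pandas as pd
from sqlalchemy import create_engine

# Placeholder DSN: replace with your own Postgres connection string
engine = create_engine('postgresql://user:pass@localhost:5432/markets')
df = pd.read_sql('BTCUSDT', engine)

# 20-period exponential moving average
df['ema20'] = df['close'].ewm(span=20, adjust=False).mean()

# Simple 14-period RSI (moving-average variant)
delta = df['close'].diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
df['rsi14'] = 100 - 100 / (1 + gain / loss)
```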
For splits and dividends we could add a check on the LAST day in the database: compare the close saved on disk with the one we download (adjusted close). If they differ, there have been dividends, splits, etc., so we should download the whole time series again and save it from scratch with the fresh data.
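A sketch of that check, assuming yfinance and one table per symbol with a "Date" column as in the video; needs_full_refresh and full_refresh are hypothetical helper names:

```
import numpy as np
import pandas as pd
import yfinance as yf
from sqlalchemy import create_engine

engine = create_engine('sqlite:///stocks.db')

def needs_full_refresh(symbol):
    # The last close we saved on disk
    stored = pd.read_sql(f'SELECT "Date", "Close" FROM "{symbol}" '
                         'ORDER BY "Date" DESC LIMIT 1', engine)
    last_date = pd.to_datetime(stored['Date'].iloc[0])
    # Freshly downloaded, split/dividend-adjusted close for that same day
    fresh = yf.Ticker(symbol).history(start=last_date, auto_adjust=True)
    return not np.isclose(stored['Close'].iloc[0], fresh['Close'].iloc[0])

def full_refresh(symbol):
    # History changed retroactively: rebuild the table from scratch
    hist = yf.Ticker(symbol).history(period='max', auto_adjust=True)
    hist.to_sql(symbol, engine, if_exists='replace')

if needs_full_refresh('AAPL'):
    full_refresh('AAPL')
```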
@@Algovibes When I am done testing, I will spin up a MySQL instance in my Kubernetes cluster, have my scripts run on a daily schedule, and report to a Discord channel :)
This is perfect, thanks for the video! May I ask a noob question? For financial data (stock prices, financial statements), what kind of DB should we use, SQL or NoSQL? I tested SQLite before but I got worried because it doesn't support async writes. Also, NoSQL DBs like MongoDB can actually do SQL as well, but on the other hand SQL is difficult to scale horizontally. Given the factors above, the NoSQL choice is a no-brainer in my mind... but listening to a lot of opinions, that's not the case, which confuses me. Also, please keep on making DB-oriented videos... this content is super awesome, especially in conjunction with finance and trading topics! Cheers!
Thanks a lot for your comment, man. Appreciate it! Well, it's really hard to answer questions like that, as it depends. I personally wouldn't use a NoSQL DB here, but my view is biased, as I primarily work with SQL DBs, both privately and professionally. But could you elaborate on the async topic? Does it not work because the database is locked, or what exactly are you referring to?
@@Algovibes Sorry for my lack of clarity... I meant to say "it does not support concurrent writes", and yes, it's due to lock contention. I've come across NewSQL (~ distributed SQL) recently and I'm trying the free tier of CockroachDB. It looks interesting, as it has the best of both worlds: the structure of SQL and the horizontal scaling capability of NoSQL.
Is there any way to offset the time in the df/database to the timezone I am currently in? For example, in Amsterdam it is UTC + 1 hour. I found a workaround by adding a line to the 'getdata' function: frame.index = frame.index + pd.DateOffset(hours=1). This works for the pandas df, but I get this error when the 'crypto_SQL_updater' function is executed: ValueError: Length mismatch: Expected axis has 0 elements, new values have 5 elements. So the dataframe is being created correctly, but for some reason it does not want to write to the SQL DB. Full code:

def getdata(symbol, start):
    frame = pd.DataFrame(client.get_historical_klines(symbol, '1m', start))
    frame = frame.iloc[:, :5]
    frame.columns = ['date', 'open', 'high', 'low', 'close']
    frame.set_index('date', inplace=True)
    frame.index = pd.to_datetime(frame.index, unit='ms')
    # frame.index = frame.index + pd.DateOffset(hours=1)
    frame = frame.astype(float)
    return frame

# df = getminutedata('DOGEUSDT', '2022-11-12').to_sql('DOGEUSDT', engine)
print(pd.read_sql('DOGEUSDT', engine))

def crypto_SQL_updater(symbol):
    max_date = pd.read_sql(f'SELECT MAX(DATE) FROM {symbol}', engine).values[0][0]
    print('MAX Date from DB before update is: ' + max_date)
    new_data = getdata(symbol, max_date)
    print(new_data)
    print('-----------------------------')
    new_rows = new_data[new_data.index > max_date]
    new_rows[:-1].to_sql(symbol, engine, if_exists='append')
    print(str(len(new_rows[:-1])) + ' new rows imported to DB')

crypto_SQL_updater('DOGEUSDT')
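The likely cause of that ValueError: with the +1h offset, MAX(date) stored in the DB runs ahead of Binance's UTC clock, so get_historical_klines returns an empty list and assigning 5 column names to a 0-column frame fails. A sketch of a fix, guarding the empty case and undoing the offset before calling the API (reusing the client/engine names from the comment's code):

```
import pandas as pd
from binance.client import Client
from sqlalchemy import create_engine

client = Client()
engine = create_engine('sqlite:///crypto.db')   # as in the video

def getdata(symbol, start):
    frame = pd.DataFrame(client.get_historical_klines(symbol, '1m', start))
    if frame.empty:  # no new candles yet -> return instead of crashing
        return frame
    frame = frame.iloc[:, :5]
    frame.columns = ['date', 'open', 'high', 'low', 'close']
    frame.set_index('date', inplace=True)
    frame.index = pd.to_datetime(frame.index, unit='ms') + pd.DateOffset(hours=1)
    return frame.astype(float)

def crypto_SQL_updater(symbol):
    max_date = pd.read_sql(f'SELECT MAX(date) FROM {symbol}', engine).values[0][0]
    # The DB stores local (offset) times; shift back to UTC for the API call
    start = str(pd.to_datetime(max_date) - pd.DateOffset(hours=1))
    new_data = getdata(symbol, start)
    if new_data.empty:
        return
    new_rows = new_data[new_data.index > pd.to_datetime(max_date)]
    new_rows[:-1].to_sql(symbol, engine, if_exists='append')
```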
First of all, amazing content... thanks as always! My question is regarding stocks that have splits and dividends, which change the OHLC data historically (i.e., data already in the DB). If it's not too much trouble, could you talk a bit about that as well? Thanks 🙏
Thank you very much, mate! Very good question. I actually thought about covering that in this video. In some cases you might be interested in finding rows that have changed and updating them with new data. Also interesting, but I am not sure if it's worth making a video on that.
A must-watch indeed. But what are the advantages of having a database? If you can pull all the data you need with the Yahoo APIs, why do you need to store it?
I touched on that topic at the beginning of the video: you don't want to exceed API limits. Furthermore, let's say you want to backtest 500 stocks. You would need to run 500 API requests, which is time-consuming.
Cool video! Do you know which performs better, a database or a CSV? I currently have a Docker container running that reads a CSV and then appends the new data. I also delete the last entry beforehand, since I pull minute data as well. Thanks
You can just pull the interval instead. Or, if you want to work with multiple intervals, you have to add an argument for the interval and change the table names to something like BTC_1m. Hope that helps!
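A sketch of that suggestion: parameterize the interval and key the table name on it. The store() helper name is hypothetical; the rest mirrors the video's getdata pattern:

```
import pandas as pd
from binance.client import Client
from sqlalchemy import create_engine

client = Client()                      # public klines need no API keys
engine = create_engine('sqlite:///crypto.db')

def getdata(symbol, interval, start):
    frame = pd.DataFrame(client.get_historical_klines(symbol, interval, start))
    frame = frame.iloc[:, :5]
    frame.columns = ['date', 'open', 'high', 'low', 'close']
    frame.set_index('date', inplace=True)
    frame.index = pd.to_datetime(frame.index, unit='ms')
    return frame.astype(float)

def store(symbol, interval, start):
    table = f'{symbol}_{interval}'     # e.g. BTCUSDT_1m, BTCUSDT_1h
    getdata(symbol, interval, start).to_sql(table, engine, if_exists='append')

store('BTCUSDT', '1m', '2023-01-01')
store('BTCUSDT', '1h', '2023-01-01')
```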
Sir, very nice video, my previous problem is resolved. Sir, one problem is that with TvDatafeed (the TradingView API), you request bars by bar count instead of a start date. Please take up this case and make a video on it. Thanks
Isn't there some better method for checking whether the table already exists? I mean, try/except works well, but it may also suppress any other error that occurs when updating the table and then wrongly create a new one...
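One way to avoid the bare try/except is to ask SQLAlchemy's inspector whether the table exists (available in SQLAlchemy 1.4+); a minimal sketch:

```
from sqlalchemy import create_engine, inspect

engine = create_engine('sqlite:///crypto.db')

def table_exists(name):
    # Queries the database schema instead of catching arbitrary errors
    return inspect(engine).has_table(name)

print(table_exists('BTCUSDT'))
```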
Hi, much appreciated that you took the effort to explore this topic. Actually, some time ago I tried to achieve this, but I took another approach (and failed, haha): I was trying to save minute data from a finance websocket live stream. The main issue was that the CPU was hammered the whole time, with usage between 70-80% (on an i7 CPU), and my Raspberry Pi was under 99% load. So I dropped the idea. I can share the code (it's essentially your code from the unicorn_binance tutorial ;-) if you would like to take a look and maybe improve on it.
Great intro tutorial. I have always wondered why you stick to yfinance. I have moved to yahooquery, as it's way more performant IMHO. What do you think?
Do you know of a function call in the Binance API that can retrieve the date/time of the very first (oldest) data point available, in order to request all available 1m klines from the beginning of any pair's price history?
Not exactly, but you can construct that: just pull data back to before cryptos were a thing and check the first row of each pulled crypto. With that you get the first tradable date for a bunch of cryptocurrencies.
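A sketch of that approach with python-binance: request a single 1m kline with a startTime far in the past, and Binance returns the earliest candle it has, whose open time marks the start of the pair's history. The first_kline_date helper name is hypothetical:

```
import pandas as pd
from binance.client import Client

client = Client()  # public endpoints, no API keys needed

def first_kline_date(symbol):
    # startTime=0 asks for data "since the epoch"; Binance answers with
    # the oldest kline it actually has for this pair
    kline = client.get_klines(symbol=symbol, interval='1m',
                              startTime=0, limit=1)
    return pd.to_datetime(kline[0][0], unit='ms')

print(first_kline_date('BTCUSDT'))  # e.g. 2017-08-17 for BTCUSDT
```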
Nice work, thank you! Instead of using try/except, can't I just use ['append' if symbol in engine.table_names() else 'create'][0] for the if_exists argument?
Thank you for making this video. Do you also have a video where you describe how I can get Python running on "localhost"? I see your Google Cloud environment, but not how to set up a localhost. I want to start trying to create crypto bots :) Thank you!
@@Algovibes In your browser it says localhost at the top. How can I create such an environment to program in on my own network? And how can I run a Python bot 24/7 without renting a server?
Sir, great video again, very well explained. Sir, I request you to please create a video on trailing stop losses with live market data and how to apply them while the market is live. I would be very thankful, as I'm not finding details on the internet.
That was such a good and useful video, thank you! But it would be perfect if you had used a websocket to update the database. I would appreciate it if you added this content to your video...
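A hedged sketch of that websocket idea using python-binance's ThreadedWebsocketManager (not covered in the video): append each *closed* 1m candle to the symbol's table as it arrives.

```
import pandas as pd
from binance import ThreadedWebsocketManager
from sqlalchemy import create_engine

engine = create_engine('sqlite:///crypto.db')

def handle_kline(msg):
    k = msg['k']
    if not k['x']:            # only store the candle once it has closed
        return
    row = pd.DataFrame([{'open': float(k['o']), 'high': float(k['h']),
                         'low': float(k['l']), 'close': float(k['c'])}],
                       index=[pd.to_datetime(k['t'], unit='ms')])
    row.index.name = 'date'
    row.to_sql(msg['s'], engine, if_exists='append')

twm = ThreadedWebsocketManager()
twm.start()
twm.start_kline_socket(callback=handle_kline, symbol='BTCUSDT', interval='1m')
twm.join()                    # block so the stream keeps running
```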
Hey algovibes, I have tried trial and error but I keep getting a TypeError: Input type must be a str. (Maybe because I am using MySQL as my DB?) The problem seems to stem from here: max_date returns a numpy.datetime64('2023-08-28 ...') object, so I am unable to compare the times between max_date and new_data.index. Can anyone please kindly figure out a solution? Any help would be much appreciated!

max_date = pd.read_sql(f'SELECT MAX(TIME) FROM BTCUSDT', engine).values[0][0]
print(max_date)
I solved the error! Thanks for challenging a beginner like me! If you're using MySQL, you must insert:
```
max_date = pd.read_sql(f'SELECT MAX(TIME) FROM {symbol}', engine).values[0][0]
max_date_str = np.datetime_as_string(max_date, unit='ms', timezone='UTC')
print(max_date)
new_data = getdata(symbol, max_date_str)
new_rows = new_data[new_data.index > max_date]
```