Like most things, if I think it’s going to be easy, I usually find some stones I need to hop over before I make any progress. Last time, we left off with me using TradingView’s websockets to grab some quote data. The result looks a bit like this:
{"i":0,"v":[1648224000.0,44345.51290507,44590.0,44050.0,44452.0,541.3003276299921]},
{"i":1,"v":[1648238400.0,44452.74643167,44636.0,44275.0,44337.0,318.5197989300029]},
{"i":2,"v":[1648252800.0,44336.0,44477.0,44112.0,44441.0,193.30515490000215]},
{"i":3,"v":[1648267200.0,44434.07106376,44559.28387061,44379.0,44534.02163765,236.48303291000155]},
{"i":4,"v":[1648281600.0,44521.0,44598.0,44329.0,44335.0,158.88092032999873]},
{"i":5,"v":[1648296000.0,44335.0,44406.0,44165.0,44241.0,146.44034252999973]}],
"ns":{"d":"","indexes":[]},"t":"s1","lbs":{"bar_close_time":1648310400}}},
What the eff does that stuff mean? Let me elaborate.
Each entry’s "v" array holds six values. Taking the first one, 1648224000.0,44345.51290507,44590.0,44050.0,44452.0,541.3003276299921, they are, in order: Unix timestamp, Open, High, Low, Close, Volume.
That timestamp works out to Friday, March 25th, 2022 at 16:00 UTC, which is 11am in my local daylight-saving time (UTC-5). There are a lot of digits after the decimal point and I only need two, so that entry would really read:
Open: 44345.51, High: 44590.00, Low: 44050.00, Close: 44452.00, Volume: 541.30
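Here is a minimal sketch of that decoding step in Python. The function name decode_bar and the dictionary keys are made up for illustration; the only thing taken from the data is the field order (timestamp, open, high, low, close, volume).

```python
from datetime import datetime, timezone

# One entry copied from the sample above: "i" is the bar index, "v" the values.
bar = {"i": 0, "v": [1648224000.0, 44345.51290507, 44590.0, 44050.0, 44452.0, 541.3003276299921]}

def decode_bar(entry):
    """Turn one raw entry into a dict with a UTC datetime and 2-decimal values.

    decode_bar and its keys are hypothetical names, not part of any API.
    """
    ts, open_, high, low, close, volume = entry["v"]
    return {
        "time": datetime.fromtimestamp(ts, tz=timezone.utc),
        "open": round(open_, 2),
        "high": round(high, 2),
        "low": round(low, 2),
        "close": round(close, 2),
        "volume": round(volume, 2),
    }

decoded = decode_bar(bar)
print(decoded["time"].isoformat())
print(
    f"Open: {decoded['open']:.2f}, High: {decoded['high']:.2f}, "
    f"Low: {decoded['low']:.2f}, Close: {decoded['close']:.2f}, "
    f"Volume: {decoded['volume']:.2f}"
)
```

Keeping the datetime in UTC and only converting to local time when printing avoids surprises when daylight saving changes.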
I was able to specify 240-minute (4-hour) bars, which makes it pretty easy to get a whole day in just a few entries, since 6 bars = 24 hours. Seems easy, right? All I need to do now is parse that and write it in a way that makes sense for my encoder.
Here it is looking a bit more formatted:
Index,Date,Open,High,Low,Close,Volume
0,"03/25/2022, 03:00:00",43911.22112839,44654.67307908,43606.0,44600.0,915.298048519977
1,"03/25/2022, 07:00:00",44601.0,45082.0,44236.0038521,44346.0,1818.5557992299166
2,"03/25/2022, 11:00:00",44345.51290507,44590.0,44050.0,44452.0,541.3003276299921
3,"03/25/2022, 15:00:00",44452.74643167,44636.0,44275.0,44337.0,318.5197989300029
4,"03/25/2022, 19:00:00",44336.0,44477.0,44112.0,44441.0,193.30515490000215
5,"03/25/2022, 23:00:00",44434.07106376,44559.28387061,44379.0,44534.02163765,236.48321299000156
6,"03/26/2022, 03:00:00",44521.0,44598.0,44329.0,44335.0,158.88092032999873
7,"03/26/2022, 07:00:00",44335.0,44406.0,44165.0,44211.0,196.8329842700001
8,"03/26/2022, 11:00:00",44207.0,44472.0,44152.89142732,44367.0,138.23807569000022
9,"03/26/2022, 15:00:00",44364.0,44785.0,44257.0,44473.0,372.51797744998487
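A rough sketch of how rows like these could be produced with Python’s csv module. It assumes a fixed UTC-5 offset for the local time and a strftime pattern chosen to match the dates shown; bars_to_csv and LOCAL_TZ are illustrative names, not what my script actually uses.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Assumption: local time is a fixed UTC-5 offset (no automatic DST handling).
LOCAL_TZ = timezone(timedelta(hours=-5))

# Two "v" arrays copied from the raw sample earlier in the post.
bars = [
    [1648224000.0, 44345.51290507, 44590.0, 44050.0, 44452.0, 541.3003276299921],
    [1648238400.0, 44452.74643167, 44636.0, 44275.0, 44337.0, 318.5197989300029],
]

def bars_to_csv(bars, tz=LOCAL_TZ):
    """Render bars as CSV text with a header row (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Index", "Date", "Open", "High", "Low", "Close", "Volume"])
    for index, (ts, *ohlcv) in enumerate(bars):
        stamp = datetime.fromtimestamp(ts, tz=tz).strftime("%m/%d/%Y, %H:%M:%S")
        # The date contains a comma, so csv.writer quotes that field for us.
        writer.writerow([index, stamp, *ohlcv])
    return buf.getvalue()

csv_text = bars_to_csv(bars)
print(csv_text)
```

The index here simply counts from zero for whatever bars are passed in, so it will not line up with the indexes in the table above, which includes two earlier bars.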
Next steps will be getting more than one quote at a time, which should be possible. In the end I’ll have quite a few of them interleaved with each other, which means I need to lean hard on regular expressions to sift through the sea of data.
(Some time later)
I’m deep into Regular Expressions, a way to sift through the alphabet soup of data and pick out the things that I want. There are some nice tools out there to help, like regex101 dot com, but it’s still pretty arcane syntax-wise.
Making some progress, but it’s getting tricky. Let me explain. I’m using websockets, so I see the data coming from the server as it gets dumped to the console. The problem is that the regex parses whatever it gets its grubby little hands on, which means it can also be influenced by the debug messages I dump to the console, a bit like double-dipping into a stream.
So I have to figure out how to debug the program without messing up the data source. Or at least I think I do at this point. It’s messing with my head 🙂
I might be able to get ahead of it by flagging my debug messages so that the parser ignores them but still works on the rest of the data. Maybe… or… split out the results and save them to a file so they don’t “pollute” the same stream of data I’m trying to parse.
(Which is really what I should be doing, I think.)
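One way to keep the two apart, sketched with Python’s standard logging module: debug messages go to their own stream (stderr by default, or a file) and only the raw data is written to the stream that gets parsed. The logger name, the "[debug]" prefix and the helper functions are all made up for this example.

```python
import io
import logging
import sys

def make_debug_logger(stream=sys.stderr):
    """Build a logger that writes to `stream`, never to the data stream."""
    logger = logging.getLogger("ws_debug")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep messages away from the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[debug] %(message)s"))
    logger.addHandler(handler)
    return logger

def handle_frame(raw, data_out, log):
    """Record a debug note, then pass the raw frame on untouched."""
    log.debug("got %d characters", len(raw))
    data_out.write(raw + "\n")

# In-memory streams stand in for the console and a file in this demo.
data_stream = io.StringIO()
debug_stream = io.StringIO()
log = make_debug_logger(debug_stream)

frame = '~m~65~m~{"m":"quote_completed","p":["qs_yngjrgxzshkg","BITFINEX:BTCUSD"]}'
handle_frame(frame, data_stream, log)

print("data :", data_stream.getvalue(), end="")
print("debug:", debug_stream.getvalue(), end="")
```

Swapping the StreamHandler for a logging.FileHandler would send the debug output to a file instead, which is the "save them to a file" option.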
Hoo boy, my head hurts. But I think I have it finally.
This is the data that I’ve been dealing with — just so you have an idea what it looks like raw from the websocket itself:
quote_session ID generated qs_yngjrgxzshkg
chart_session ID generated cs_vfiuozaqmkwh
~m~361~m~{"session_id":"<0.18544.193>_sfo-charts-18-webchart-5@sfo-compute-18_x","timestamp":1648480416,"timestampMs":1648480416379,"release":"registry.xtools.tv/tvbs_release/webchart:release_205-53","studies_metadata_hash":"79c6b847bdfc53283f5b5f6e28f71f7baa91e9f2","protocol":"json","javastudies":"javastudies-3.61_2183","auth_scheme_vsn":2}
~m~484~m~{"m":"qsd","p":["qs_yngjrgxzshkg",{"n":"BITFINEX:BTCUSD","s":"ok","v":{"volume":4537.63954264,"update_mode":"streaming","type":"crypto","short_name":"BTCUSD","rtc":null,"rchp":null,"pro_name":"BITFINEX:BTCUSD","pricescale":10,"original_name":"BITFINEX:BTCUSD","minmove2":0,"minmov":1,"lp_time":1648480412,"lp":47748.0,"is_tradable":true,"fractional":false,"exchange":"BITFINEX","description":"Bitcoin / Dollar","current_session":"market","currency_code":"USD","chp":2.0,"ch":935.0}}]}~m~65~m~{"m":"quote_completed","p":["qs_yngjrgxzshkg","BITFINEX:BTCUSD"]}
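Looking at that dump, each message seems to be wrapped as ~m~, a number, ~m~, and then a JSON payload, and the number matches the payload length (the last payload above is exactly 65 characters). Assuming that holds in general, the messages can be split apart before any regex gets involved. This is only a sketch based on the sample; split_frames is a hypothetical helper, and the frame is repeated just to show two messages arriving back to back.

```python
import json
import re

# Header that appears before every payload in the sample: ~m~<length>~m~
FRAME_HEADER = re.compile(r"~m~(\d+)~m~")

def split_frames(raw):
    """Yield each payload from a string of back-to-back frames.

    Assumes <length> is the payload length in characters, as in the sample.
    """
    pos = 0
    while pos < len(raw):
        header = FRAME_HEADER.match(raw, pos)
        if header is None:
            break  # not a frame header; stop rather than guess
        start = header.end()
        length = int(header.group(1))
        yield raw[start:start + length]
        pos = start + length

# The quote_completed frame from the sample, repeated for the demo.
raw = (
    '~m~65~m~{"m":"quote_completed","p":["qs_yngjrgxzshkg","BITFINEX:BTCUSD"]}'
    '~m~65~m~{"m":"quote_completed","p":["qs_yngjrgxzshkg","BITFINEX:BTCUSD"]}'
)

messages = [json.loads(payload) for payload in split_frames(raw)]
for message in messages:
    print(message["m"], message["p"])
```

Once each payload is valid JSON on its own, json.loads can do a lot of the work that would otherwise need a regex.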
So after many attempts that failed, I finally came up with some regex that could filter it into this:
BITFINEX:BTCUSD
Volume: 4537.63979476
Price: 47748.46935417
BITFINEX:BTCUSD
Volume: 4537.67328922
Price: 47748.0
BITFINEX:BTCUSD
Volume: 4537.94068011
Price: 47734.0
BITFINEX:BTCUSD
Volume: 4538.05830511
Price: 47737.0
BITFINEX:BTCUSD
Volume: 4538.90654622
Price: 47732.0
BITFINEX:BTCUSD
Volume: 4539.06445621
Price: 47727.62184819
It took quite a bit to get that all working. Here’s a sample of some of the regex I used:
priceRegex = r'"lp":(\d+\.\d+)'
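To show a pattern like that in action, here is a small sketch run against a trimmed-down copy of the raw frame from earlier. The dot is escaped so it only matches a literal decimal point. The volume and symbol patterns are written by analogy and are assumptions, as is the extract_quote helper.

```python
import re

PRICE_RE = re.compile(r'"lp":(\d+\.\d+)')
VOLUME_RE = re.compile(r'"volume":(\d+\.\d+)')  # assumed, by analogy with the price
SYMBOL_RE = re.compile(r'"n":"([^"]+)"')        # assumed: "n" holds the symbol

# Trimmed from the raw "qsd" frame shown earlier in the post.
sample = (
    '{"n":"BITFINEX:BTCUSD","s":"ok","v":{"volume":4537.63954264,'
    '"lp_time":1648480412,"lp":47748.0}}'
)

def extract_quote(text):
    """Return symbol, volume and price from one frame, or None if any is missing."""
    symbol = SYMBOL_RE.search(text)
    volume = VOLUME_RE.search(text)
    price = PRICE_RE.search(text)
    if not (symbol and volume and price):
        return None
    return {
        "symbol": symbol.group(1),
        "volume": float(volume.group(1)),
        "price": float(price.group(1)),
    }

quote = extract_quote(sample)
if quote is not None:
    print(quote["symbol"])
    print("Volume:", quote["volume"])
    print("Price:", quote["price"])
```

Note that the price pattern includes the closing quote after lp, so it does not accidentally match the "lp_time" field.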
Make sense to you? Me neither, which is why I’m super-glad that sites like regex101 dot com exist. Next, I’ll have to figure out the duration between two dates in order to calculate how much historical quote data to ask for.
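As a first stab at that calculation, assuming the request is expressed as a number of bars and sticking with the 240-minute bars from earlier, something like this might do. bars_between is a made-up name and rounding up is just one reasonable choice.

```python
import math
from datetime import datetime, timezone

def bars_between(start, end, bar_minutes=240):
    """Number of bars needed to cover the span from start to end (rounded up)."""
    seconds = (end - start).total_seconds()
    return math.ceil(seconds / (bar_minutes * 60))

start = datetime(2022, 3, 25, 16, 0, tzinfo=timezone.utc)
end = datetime(2022, 3, 26, 16, 0, tzinfo=timezone.utc)

count = bars_between(start, end)
print(count)  # 24 hours of 4-hour bars
```

For a full day this gives 6, which matches the "6 bars = 24 hours" arithmetic from before.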
Until next time…