There I was, puttering right along and getting some quote data, when the unthinkable happened: Tradingview got wise to my quote-scraping ways, and the historical data I had been pulling successfully turned into error messages from the server. Dang.
Oh, cruel fate.
I had to regroup and try to salvage something. I had a bunch of regular expressions for filtering Tradingview’s price data, and I didn’t want them to go to waste! After a bit of thinking, I decided to do it the old-fashioned way: scrape the Tradingview site directly.
This would require learning yet another skill: using a Python module called “Selenium”. This clever library lets you dig through a page’s source for what you need, or open web pages and even simulate user “clicks” and data entry (for, say, logging in).
I got to work, and soon I had something going:
    # Imports for this snippet; browser, url, user and password come from the earlier setup
    import time
    import pyautogui
    from selenium.webdriver.common.by import By

    browser.get(url)
    browser.implicitly_wait(3)

    ## Have to use CSS selector when class names have spaces - replace with '.'
    login_button_class = "tv-header__user-menu-button"
    user_menu_class = "item-4TFSfyGO"
    sign_in_email_class = "tv-signin-dialog__toggle-email"
    user_name = "username"

    browser.find_element(by=By.CLASS_NAME, value=login_button_class).click()  # Click on user login
    time.sleep(1)
    browser.find_element(by=By.CLASS_NAME, value=user_menu_class).click()  # Click on dropdown for email
    time.sleep(1)
    browser.find_element(by=By.CLASS_NAME, value=sign_in_email_class).click()  # Click on email for User/Pass dialog
    time.sleep(1)

    input_username = browser.find_element(by=By.NAME, value=user_name)
    input_username.click()  # Put the cursor in the username field
    pyautogui.typewrite(user)  # works
    pyautogui.press("tab")  # Tab to next field
    pyautogui.typewrite(password)
    pyautogui.press("enter")
    time.sleep(1)
There’s more setup involved prior to these actions, but I wanted to show the core of what I was doing: clicking buttons, signing in, all of that. There was a way to cache the login so I didn’t have to sign in every single time, but for some reason that code didn’t work for me. I plowed ahead, undaunted.
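For reference, one common way to cache a login with Selenium is to save the browser’s cookies after signing in and load them back on the next run. The sketch below is only that, a sketch: the helper names `save_cookies` and `load_cookies` are made up for illustration, and whether Tradingview’s session actually survives a cookie reload is an assumption, not something verified here. It relies only on `get_cookies()` and `add_cookie()`, which are standard Selenium WebDriver methods.

```python
import json


def save_cookies(driver, path):
    """Write the driver's current cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path):
    """Add cookies from a JSON file to the driver; return how many were added.

    Selenium only accepts cookies for the domain that is currently loaded,
    so call driver.get(...) on the site before calling this.
    """
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)
```

In use, that would mean calling `save_cookies(browser, "cookies.json")` right after a successful login, and on later runs calling `browser.get(url)`, then `load_cookies(browser, "cookies.json")`, then `browser.refresh()`.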
A few bazillion log-ins later, and a bunch of “did you just log in from a new device” emails, I was at the page where you could add symbols to a watchlist, and it would helpfully display them on the right-hand side of the page. I was set! (Or so I thought.)
No, my friends, no such luck. Turns out the Tradingview chaps are quite resourceful. Let me explain.
In the “old days” you could look at a website’s source and find its data embedded right in the page, like “lastprice=46624.50”, which was trivial to scrape.
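To make “trivial” concrete, here is roughly what that kind of scrape looks like. The page text is a made-up stand-in built around the `lastprice=46624.50` example, and `extract_last_price` is a hypothetical helper name.

```python
import re

# Made-up stand-in for an old-style page with the data sitting in the source.
PAGE_SOURCE = '<script>var symbol="BTCUSD"; var lastprice=46624.50;</script>'


def extract_last_price(source):
    """Return the number following 'lastprice=' as a float, or None if absent."""
    match = re.search(r"lastprice=(\d+(?:\.\d+)?)", source)
    return float(match.group(1)) if match else None


print(extract_last_price(PAGE_SOURCE))
```

One pattern, one capture group, done. That is the baseline the rest of this story is measured against.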
Well, websites are now reactive and do all kinds of things, which means what I was searching for was deep in the source. And I mean DEEP. Take a look at this relative path here:
And that is just for ONE quote, mind you. (20-plus levels deep!)
Even if you got down there, Tradingview made sure to make it as hard as possible. How? Well, if you weren’t paying much attention, you’d pull up your watchlist and it would have some symbols with prices, like this:
DXY 98.62
BTCUSD 46639.25
So just dive down into the source and get it, right? Well, it’s more complicated than that. They don’t just display the price in one piece. Oh no. Some evil genius over there decided that on every up/down tick, a RANDOM portion of the quote gets colored green or red.
Which means a simple quote of 46105.2 ends up in the page source as:
<span class="inner-ghhqKDrt">4610<span class="minus-ghhqKDrt">5.2</span></span>
So what, right?
It turns out that it’s monumentally harder to scrape a quote when the style of that quote changes on a whim. At any given moment, part of it is white and part of it is red or green. Because the quote is split apart at a random point, the regular expressions you’d use to grab it only get the “normal” fraction:
4610 -- instead of -- 46105.20
And since I wouldn’t know which part of the quote would be colored at any given moment, I couldn’t write regular expressions that reliably captured the whole thing. This is what is known as a “needle-in-a-colored-haystack” kind of problem.
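To see the problem in code, the sketch below runs a naive regular expression against the markup from the example above and gets only the uncolored part. It then shows one possible workaround, offered as an assumption rather than something tested against the live site: ignore the tags entirely and join the text pieces before parsing. The function names are hypothetical, and the class names are simply the ones from the snippet.

```python
import re
from html.parser import HTMLParser

QUOTE_HTML = '<span class="inner-ghhqKDrt">4610<span class="minus-ghhqKDrt">5.2</span></span>'


def naive_quote(fragment):
    """Grab the digits right after the outer span, as a simple regex would."""
    match = re.search(r'class="inner-[^"]*">([\d.]+)', fragment)
    return match.group(1) if match else None


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def full_quote(fragment):
    """Join all text inside the fragment, wherever the color split falls."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return "".join(collector.chunks).strip()


print(naive_quote(QUOTE_HTML))  # the uncolored part only
print(full_quote(QUOTE_HTML))   # the whole quote
```

With Selenium itself, reading an element’s `.text` property should return the visible text of the element including its children, which may sidestep the split in the same way. That was not tried here.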
But not all is lost. I learned a LOT about grabbing things from pages, so I’m sure that skill will come in handy down the line. After realizing that scraping the Tradingview site was a non-starter, I did some digging and found that the charting program I use stores its data locally on my drive in a format I could parse.
You live and learn, I suppose.
All I can say though is — whoever designed Tradingview’s quote display system is an evil bastard genius.
And I’d buy them a beer.
More to come…