Take the following HTML file for example:
There are obviously two A links there, but one of them is modified by JS. This kind of dynamic modification of elements is extremely common today. So what happens if you use the old method of getting A links?
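As a rough illustration, assume the "old method" means scanning the raw page source with a regular expression. The sample page, the `example.com` URLs, and the names `SAMPLE_HTML` and `regex_links` below are all made up for this sketch and are not the original file or script.

```python
import re

# Hypothetical sample page: two A links, the second one rewritten by JS on load.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
  <body>
    <a href="https://example.com/static">Static link</a>
    <a id="dynamic" href="#">Dynamic link</a>
    <script>
      // Rewrites the second link once the DOM is ready.
      document.addEventListener("DOMContentLoaded", function () {
        document.getElementById("dynamic").href = "https://example.com/dynamic";
      });
    </script>
  </body>
</html>
"""

# Matches the href attribute of an opening <a ...> tag in the raw text.
A_HREF = re.compile(r'<a\s[^>]*?href="([^"]*)"', re.IGNORECASE)


def regex_links(html):
    """Return href values found in the raw source; no JS is executed."""
    return A_HREF.findall(html)


print(regex_links(SAMPLE_HTML))
# -> ['https://example.com/static', '#']
```

The second link comes back as the placeholder `#`, because the regex only ever sees the markup as it was written, not as the script leaves it.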
Well, that obviously didn't work. How about using the BeautifulSoup Python module?
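A static parser has the same blind spot, since it parses the markup but never runs the script. To keep the sketch dependency-free, it uses the standard library's `html.parser` as a stand-in for BeautifulSoup (an assumption on my part; BeautifulSoup does not execute JS either). The sample page and the names `LinkCollector` and `static_links` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical sample page: two A links, the second one rewritten by JS on load.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
  <body>
    <a href="https://example.com/static">Static link</a>
    <a id="dynamic" href="#">Dynamic link</a>
    <script>
      document.addEventListener("DOMContentLoaded", function () {
        document.getElementById("dynamic").href = "https://example.com/dynamic";
      });
    </script>
  </body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> start tag in the static markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def static_links(html):
    """Parse the markup without executing any script and return the hrefs."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.hrefs


print(static_links(SAMPLE_HTML))
# -> ['https://example.com/static', '#']
```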
Also no. The best way I've found to do it is to have a browser engine parse the entire file and execute the JS, and then grab all the A links by issuing a command to the JS interpreter. I wrote the following script to do exactly that. It uses the Chrome browser in headless mode to perform all the parsing and then, via Selenium, issues a JS statement to grab all the A links:
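A minimal sketch of that approach, assuming Selenium 4 with a local Chrome install that Selenium can drive. The names `GET_LINKS_JS`, `to_url`, and `rendered_links` are illustrative, not the author's, and the JS statement is just one plausible way to collect the links.

```python
import sys
from pathlib import Path

# JS run inside the page after it has loaded; returns every link's resolved href.
GET_LINKS_JS = "return Array.from(document.querySelectorAll('a')).map(a => a.href);"


def to_url(target):
    """Pass URLs through unchanged; turn a local file path into a file:// URL."""
    if "://" in target:
        return target
    return Path(target).resolve().as_uri()


def rendered_links(target):
    """Load the page in headless Chrome, let its JS run, and return all A links."""
    # Imported here so the helpers above stay usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(to_url(target))  # returns once the page has loaded
        return driver.execute_script(GET_LINKS_JS)
    finally:
        driver.quit()  # always shut the browser down


if __name__ == "__main__" and len(sys.argv) > 1:
    for link in rendered_links(sys.argv[1]):
        print(link)
```

Saved under a name of your choosing, it takes a file path or URL as its only argument and prints one link per line.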
Running this results in:
That's more like it.
PS. This is still not "perfect", since some frameworks change content from event handlers. This approach catches changes made by handlers that fire while the page loads (e.g. DOMContentLoaded), but not by handlers that need user interaction (e.g. onclick events). You kind of just have to live with that. Making a script that triggers every event handler to identify such changes would likely be extremely risky.