Should we be worried about what the internet knows about us?

Diane Hall


We all know that our devices are constantly listening to us, and that cookies built into websites track our cursor’s every move. It’s how marketing unfolds in 2021, and how AI and algorithms tailor the advertising content we see.

Now, think about your website and social media profiles and what someone could glean from them. I know that people wouldn’t have to dig too far to learn that I’m married, or to find my husband’s name and those of my children, where we live, my job role and employers, and my age. Maybe these things aren’t ground-breaking, but there’s probably more… information that could be valuable to someone looking to sell to me, or steal from me.

A few weeks ago, a hacker gleaned such ‘harmless’ details from LinkedIn and created a database containing 700 million users’ information. He sold this list for less than £4,000 a time (to more than one buyer).

The term for this is ‘data scraping’. Whereas hackers are assumed to be criminals who power through firewalls and break into cyber security systems, this particular hacker did nothing illegal: the information he collected, all 700 million entries, was in the public domain. He broke no protocols and did nothing that anyone else couldn’t do. He simply ‘scraped’ the info from the public-facing side of the web rather than the back end that criminals usually target.
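For readers curious about what ‘scraping’ looks like in practice, here is a minimal sketch in Python. It is not the method used in the LinkedIn case, which the source does not describe. The sample page, its class names and the `ProfileScraper` helper are all invented for illustration, and the code reads a made-up snippet of HTML rather than contacting any real website.

```python
from html.parser import HTMLParser

# Hypothetical public profile page. The markup and class names are invented
# for this example and do not correspond to any real site.
SAMPLE_PAGE = """
<html><body>
  <h1 class="name">Jane Example</h1>
  <span class="location">Leeds, UK</span>
  <span class="job-title">Editor</span>
  <a class="email" href="mailto:jane@example.com">Email me</a>
</body></html>
"""


class ProfileScraper(HTMLParser):
    """Collects the text of elements whose class attribute is in FIELDS."""

    FIELDS = {"name", "location", "job-title"}

    def __init__(self):
        super().__init__()
        self.profile = {}
        self._current = None  # field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if css_class in self.FIELDS:
            self._current = css_class
        # Email addresses are often exposed through mailto: links.
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("mailto:"):
            self.profile["email"] = href[len("mailto:"):]

    def handle_data(self, data):
        # Store the first non-empty text inside a field we care about.
        if self._current and data.strip():
            self.profile[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def scrape_profile(html):
    """Return a dict of the publicly visible details found in the page."""
    parser = ProfileScraper()
    parser.feed(html)
    return parser.profile


profile = scrape_profile(SAMPLE_PAGE)
print(profile)
```

Nothing here breaks into anything: the program only reads what any visitor’s browser would be sent. Repeat that across millions of public pages and you have a database.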

Though the hacker couldn’t be drawn on what the information he farmed would specifically be used for, he did admit that it would likely be part of ‘malicious hacking campaigns’.

According to Dave McKay, writing for Cloud Savvy IT, ‘In 2020, the number of personal records scraped from YouTube was 4 million. The figure for TikTok was over ten times higher, at 42 million. That same year, 191 million personal records were scraped from Instagram. All of these platforms prohibit the scraping of data.’

Should we be worried about what strangers can learn about us?

Probably, although experts say that scraping isn’t always malicious. It can be used simply to tailor advertising content even further, in the hope that we buy more.

And, whilst phishing scams are nothing new, an influx of personalised data could make them appear more genuine. Knowing more about us can also make it easier to guess our passwords.

The hacker mentioned above was able to harvest users’ location data, email addresses, phone numbers and usernames for other social media platforms from LinkedIn. If nothing else, such data provides more methods and channels that scammers can exploit to contact us.

Apart from specialist bots that continually trawl your webpages like spiders, looking for scrapers, there’s not much anyone can do to prevent this happening. You could keep such information to yourself, but as most social media platforms require some, or all, of this kind of data to create your profile, it’s difficult to use and interact on them without giving it away. Of course, you could come off social media altogether and remove every mention of yourself, but there’s that old saying that what’s put on the web never dies. Where would be the fun in that anyway, in 2021? Interacting with family, friends, brands, and even strangers, online is ingrained in our lives nowadays.
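The article doesn’t say how those scraper-hunting bots work, so the following is only a sketch of one common heuristic: flagging any visitor that requests pages far faster than a person could. The function name `flag_fast_clients`, the thresholds and the sample log are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def flag_fast_clients(requests, max_requests=5, window_seconds=10.0):
    """Flag clients that make more than max_requests within window_seconds.

    requests: iterable of (timestamp_in_seconds, client_id), sorted by time.
    The thresholds are illustrative defaults, not recommended values.
    """
    recent = defaultdict(deque)  # client_id -> timestamps inside the window
    flagged = set()
    for timestamp, client in requests:
        window = recent[client]
        window.append(timestamp)
        # Discard requests that have fallen out of the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_requests:
            flagged.add(client)
    return flagged


# Made-up traffic: one client requests a page every half second,
# another every thirty seconds.
log = [(i * 0.5, "bot") for i in range(20)]
log += [(i * 30.0, "human") for i in range(5)]
log.sort()

flagged = flag_fast_clients(log)
print(flagged)
```

A patient scraper can simply slow down to stay under a limit like this, which is one reason such defences only go so far.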

This is what Erik Fair, Software and Network Engineer, said on the matter: ‘By the very act of putting data on a web server with a public IP address, anyone and everyone can copy (and thus process) it.

You could make your entire website a series of images (GIF, JPEG, PNG, whatever) with image map links for navigation. However, you would likely incur rather greater web service bandwidth costs with this approach (quite aside from how annoying it would be). Also, your website would effectively be invisible to web search engines.

Fundamentally, if you have Intellectual Property (IP) rights to your data, this becomes a patent or copyright enforcement problem, just as in book publishing: anyone can photocopy a book, bind it, and sell it. That's a violation of the publisher's copyright, but it's up to the publisher to enforce his rights. For IP rights holders, the problem with the internet and digital media in general is that it's much easier and cheaper to copy ‘bits’ than paper, hence the copyright violations problem is worse.’
