Semalt Expert Tells How Web Data Scraping Was Legalized With A Court Ruling
While scraping data from a website without the owner's explicit permission may be illegal, a judge recently ruled otherwise under certain circumstances. hiQ Labs filed a lawsuit against LinkedIn for preventing it from extracting data from LinkedIn pages.
It came as a rude shock to many people that LinkedIn was told to give the startup free access to its web pages. hiQ uses its algorithms to detect when a LinkedIn user is looking for a job, based on the changes the user makes to their public profile.
The algorithms run on data extracted from the LinkedIn web pages. As expected, LinkedIn didn't like it and countermeasures were put in place to prevent hiQ from further data extraction. Apart from the technical barriers that were put in place, strongly worded legal warnings were issued too.
The startup had no choice but to seek legal redress. It wanted LinkedIn ordered to remove its technical barriers, and it wanted its data extraction from LinkedIn declared legal.
Fortunately for the startup, it got what it wanted. The ruling was in favor of hiQ. LinkedIn was ordered to remove all the countermeasures hindering hiQ from scraping LinkedIn's web pages and to give hiQ a free hand, on the ground that the act is legal. The judge hinged his ruling on the fact that what hiQ wants to scrape is data that has been displayed for public view.
The judge not only ordered the defendant to remove all the preventive mechanisms put in place against hiQ, but also ordered the defendant to desist from such acts in the future.
Promoting open web data
While the ruling is still a temporary injunction, it is heartwarming to see it confirm that the law supports open web data and free access to information on the Internet. Even if the final decision ends up favoring the defendant, this point has already been made.
The judge promoted this policy by shutting down virtually all of LinkedIn's arguments. When LinkedIn tried to establish that the plaintiff was breaching its users' privacy, the judge countered that the defendant is also selling the data.
When that argument did not hold water, the defendant also stated that hiQ's act was in gross violation of the Computer Fraud and Abuse Act (CFAA) because the startup accessed its servers to harvest data illegally. Again, the argument was punctured. It was rejected on the ground that hiQ was only scraping content on public, non-protected pages.
The judge compared the case to someone walking into an open store during business hours. Such a person cannot be said to be trespassing, so hiQ was not trespassing either. Interestingly, the judge went further to explain why his ruling is in the public interest.
In a nutshell, the court accepted that it is in the public interest to allow data to be crawled, extracted, and analyzed. It would therefore be a detrimental policy to encourage placing barriers to the free flow of information.
What you should learn from the ruling
While you may have no reason to extract data directly from LinkedIn, you should learn from the ruling. It is better to play safe by reading and respecting the robots.txt file of every website you scrape. Remember, the ruling is still a temporary injunction. It could eventually go in favor of LinkedIn.
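As a rough illustration of what respecting robots.txt can look like in practice, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the sample rules, the bot name example-bot, and the example.com URLs are all assumptions made up for this example. They are not taken from the case or from any real site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules: every crawler is barred from /private/ and nothing else
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for page in ("https://example.com/public/page",
                 "https://example.com/private/data"):
        verdict = is_allowed(SAMPLE_ROBOTS_TXT, "example-bot", page)
        print(page, "allowed" if verdict else "disallowed")
```

Against a live site, the same parser can download the file itself: call set_url() with the address of the site's robots.txt, then read(), before asking can_fetch(). Passing this check only means the site's published crawling rules do not forbid the request. It says nothing about whether the scraping is legal.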
While the ruling may not affect you directly, it is encouraging that a federal court has upheld the policy of keeping the web open to the public. Information should be available and accessible to those who can find it and make good use of it.
Web data is extremely useful to many people, especially media analysts, developers, data scientists, and other professionals. As such, the ruling is a welcome development.