Building a No-Code Web Scraper
No matter what your role is, we're all trying to do more with less. Waves of automation are rippling through the workplace, starting with engineering teams, and now making its way into sales, marketing and finance departments. Unfortunately, many who work in traditionally "non-technical" fields believe they're not skilled enough to apply this sort of leverage to their work day.
The rise of the software platform changes all this, dramatically lowering the bar on the technical skills required to build powerful automations and workflows. As these platforms get more powerful, sales and marketing teams are waking up to realise they don't need to be developers to automate things like lead generation and competitor analysis. This can all be done with a web scraper, which thanks to platforms like Datagrab, require no coding skill whatsoever.
Platforms like Datagrab provide a visual editor, where you can simply point, click and extract the data you need, when you need it. Better yet, as a cloud platform, you don't need to install anything or work about managing servers to deliver information to you. In today's post we’re going to be taking Datagrab for a spin, building a web scraper for the crowdfunding website Seeders.com in under 5 minutes, with no code required whatsoever.
Tracking Investment Opportunities and Competitors with Seeders.com
Seeders is a crowdsourced investment platform that pairs companies looking to raise capital, with individuals, looking to make an investment. Startups are increasingly raising finance in this way, which make crowdfunding sites like Seeders a great place to find potential investment opportunities, or even monitor the site for competitors looking to raise capital. A well capitalised competitor is definitely worth getting advanced warning of if you're established in the market, but rather than visiting the site each week or even each day, we can automate this data collection with Datagrab.
Building a Web Scraper with No Code
To get started, we’ll open a free Datagrab account and click to create our first scraper. Step 1 in the process is to tell DataGrab which website to visit to extract data from. This step takes a bit of thinking through, because tools like Datagrab, you won’t be able to navigate through the website with clicks or scrolling (yet).
In order to find a good URL to start with for the Seeders website, you can click on “Invest” and view the investment opportunities by scrolling down. As you begin to scroll down, you’ll notice the URL in your browser starts to include new search parameters.
As we scroll down the page, the Seeders site is adding “current page”, “last page”, “sort” and “view” parameters onto the end of the URL, and we can use this to construct a good starting URL for Datagrab. As it turns out, the default view and sort parameters work well here, however we will want to change the "current page" and "last page parameters. If we change both of these to a higher number, in this case 10, we can force the Seeders site to load all the results when we navigate to the site using this link. Entering in the Start URL, we can click next to begin the setup.
In Step 2, Datagrab will visit the website and create a view that we can use to select which elements to extract from the page. Using the tools on the left hand side, we can can tell Datagrab what type of element we’re selecting, whether it is Text, a Date or Time element, a Link, or a link to another page that Datagrab should visit and extract data from.
Setting Up the Template
To build the Seeders scraper, firstly we’ll select the Text tool, and begin by clicking on the names of the businesses looking to raise funding on the Seeders platform. You can see a helper box pops up, which shows the CSS selector used to select these. You don’t need to select all the business names on the page, however when multiple results exist on the page, I tend to at least select 2 results. Datagrab will infer from your choices that you want to select all business names on the page and extract all of these for you.
Next, we’ll select the blurb, days left for raise, valuation, funding target and number of investors to give us the complete listing. Each of these can be added one by one using the Text selector tool.
Finally we’ll use the Link Tool to select and extract the link for the individual page listing. We may decide to visit these page later, or we could even use the Detail Page tool and set up a further web scraping template to extract information on those pages. For the purposes of this tutorial we’ll finish here and run a preview extraction using the Preview option.
Running the Preview, we can see the type of data we would get if this extraction were to run. With all of this looking good, we can move on to the next step and see up our extraction frequency.
Finishing Off The Datagrab Scraper
In the final step, we’ll tell Datagrab whether there are any limits on the number of results we want from the page, as well as how often we want the scrape to run. The scape can run hourly, daily, weekly or monthly.
Lastly, we’ll tell Datagrab where we want the data sent to, the format and whether we want to start the extraction right away after submitting. With that, our scraper is complete, in under 5 minutes, without writing a single line of code.
Final Thoughts
Automating tasks like collecting data on the web doesn’t need to be hard, and doesn’t require code with the rise of platforms that can take a lot of this complexity away. Of course, with reduced complexity, there are always tradeoffs. Datagrab isn't guaranteed for every site on the web, particularly as anti-web scraping is becoming more common. However, with its ease of use and speed, Datagrab makes a great tool for simple web scraping that lets you build and test scrapers fast.