The Best Open-Source Web Crawling Frameworks in 2020

On my hunt for the right back-end crawler for my startup, I took a look at several open-source systems. After some initial research, I narrowed the choice down to the 10 systems that seemed to be the most mature and widely used: Scrapy (Python), Heritrix (Java), Apache Nutch (Java), …