Scrapy 0.9 Documentation
... (such as Amazon Associates Web Services [http://aws.amazon.com/associates/]) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ... /tor/\d+. For extracting data we’ll use XPath [http://www.w3.org/TR/xpath] to select the part of the document where the data is to be extracted. Let’s take one of those torrent pages: http://www.mininova.org/tor/2657665 ... using pickle [http://docs.python.org/library/pickle.html]:
    import pickle
    class StoreItemPipeline(object):
        def process_item(self, spider, item):
            torrent_id = item['url'].split('/')[-1]
            ...
204 pages | 447.68 KB | 1 year ago
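The excerpt gives the link pattern /tor/\d+ and takes the torrent id from the last path segment of the URL. How the pattern is wired into the crawler is not shown in the excerpt, so the sketch below applies the same two ideas with only the standard `re` module. The helper names `links_to_follow` and `torrent_id` are hypothetical.

```python
import re

# Pattern from the excerpt: torrent pages live under /tor/<digits>.
TORRENT_LINK = re.compile(r"/tor/\d+")


def links_to_follow(urls):
    """Return only the URLs containing /tor/ followed by one or more digits."""
    return [u for u in urls if TORRENT_LINK.search(u)]


def torrent_id(url):
    """Last path segment, as in item['url'].split('/')[-1] from the excerpt."""
    return url.split('/')[-1]


if __name__ == "__main__":
    urls = [
        "http://www.mininova.org/tor/2657665",
        "http://www.mininova.org/today",
        "http://www.mininova.org/tor/abc",  # no digits, so it is skipped
    ]
    print(links_to_follow(urls))
    print(torrent_id(urls[0]))
```

Only the first URL matches, because /tor/abc has no digits after the slash. Note that `search` matches anywhere in the string. Anchor the pattern if stricter matching is needed.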
Scrapy 0.9 Documentation
... (such as Amazon Associates Web Services) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ... for the links to follow: /tor/\d+. For extracting data we’ll use XPath to select the part of the document where the data is to be extracted. Let’s take one of those torrent pages: http://www.mininova.org/tor/2657665 ... serializes and stores the extracted item into a file using pickle:
    import pickle
    class StoreItemPipeline(object):
        def process_item(self, spider, item):
            torrent_id = item['url'].split('/')[-1]
            f = open("torrent-%s ...
156 pages | 764.56 KB | 1 year ago
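The excerpt cuts off inside the `open(...)` call, so the full pipeline is not visible. Below is a minimal, self-contained sketch of a pipeline that pickles each item into its own file. It keeps the 0.9 argument order `(spider, item)` from the excerpt. Later Scrapy releases use `(item, spider)`. The output directory, the `.pickle` suffix, and the use of a plain dict as the item are assumptions, not the original code.

```python
import os
import pickle
import tempfile


class StoreItemPipeline(object):
    """Pickle each item into a file named after its torrent id.

    Assumptions: the output directory defaults to the system temp dir,
    and the '.pickle' file suffix is hypothetical (the excerpt is truncated).
    """

    def __init__(self, out_dir=None):
        self.out_dir = out_dir or tempfile.gettempdir()

    def process_item(self, spider, item):
        # Same id extraction as the excerpt: last segment of the item's URL.
        torrent_id = item['url'].split('/')[-1]
        path = os.path.join(self.out_dir, "torrent-%s.pickle" % torrent_id)
        # Pickle data is binary, so open the file in "wb" mode.
        with open(path, "wb") as f:
            pickle.dump(item, f)
        # Return the item so any later pipeline stage can still process it.
        return item


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    pipeline = StoreItemPipeline(out_dir)
    item = {"url": "http://www.mininova.org/tor/2657665", "name": "example"}
    pipeline.process_item(None, item)  # no real spider needed for this sketch
    with open(os.path.join(out_dir, "torrent-2657665.pickle"), "rb") as f:
        print(pickle.load(f) == item)
```

The `with` block closes the file even if pickling fails, which the bare `f = open(...)` style in the excerpt does not guarantee.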
Scrapy 0.24 Documentation
... (such as Amazon Associates Web Services) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ... start URLs.
• parse() is a method of the spider, which will be called with the downloaded Response object of each start URL. The response is passed to the method as the first and only argument. This method ... meanings:
• /html/head/title: selects the <title> element, inside the <head> element of an HTML document
• /html/head/title/text(): selects the text inside the aforementioned <title> element.
222 pages | 988.92 KB | 1 year ago
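The excerpt describes a call contract: for each start URL, the framework downloads the page and calls the spider's parse() with the Response as its only argument. The dependency-free sketch below mimics that contract. `FakeResponse`, `TitleSpider`, `run`, and `fetch` are hypothetical stand-ins, not Scrapy classes. Real Scrapy schedules downloads asynchronously rather than in a simple loop.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a downloaded response: a URL and a body."""
    url: str
    body: str


class TitleSpider:
    """Hypothetical spider: a list of start URLs plus a parse() callback."""

    start_urls = [
        "http://example.com/a",
        "http://example.com/b",
    ]

    def parse(self, response):
        # The response is the first and only argument after self.
        return {"url": response.url, "length": len(response.body)}


def run(spider, fetch):
    """Download each start URL with 'fetch' and hand the response to parse()."""
    results = []
    for url in spider.start_urls:
        response = fetch(url)
        results.append(spider.parse(response))
    return results


if __name__ == "__main__":
    pages = {
        "http://example.com/a": "<html>a</html>",
        "http://example.com/b": "<p>b</p>",
    }
    print(run(TitleSpider(), lambda u: FakeResponse(u, pages[u])))
```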
Scrapy 0.24 Documentation
... (such as Amazon Associates Web Services [http://aws.amazon.com/associates/]) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ... start URLs.
parse() is a method of the spider, which will be called with the downloaded Response object of each start URL. The response is passed to the method as the first and only argument. This method ... their meanings:
/html/head/title: selects the <title> element, inside the <head> element of an HTML document
/html/head/title/text(): selects the text inside the aforementioned <title> element.
//td: selects ...
298 pages | 544.11 KB | 1 year ago
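The XPath expressions listed above can be tried with the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset and needs well-formed markup. This is not how Scrapy evaluates XPath. The sketch maps each expression onto its ElementTree equivalent. The sample page and the `select_examples` helper are hypothetical. In standard XPath, //td selects every <td> element anywhere in the document.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed page; ElementTree cannot parse messy real-world HTML.
PAGE = """<html>
  <head><title>Example page</title></head>
  <body>
    <table>
      <tr><td>first</td><td>second</td></tr>
    </table>
  </body>
</html>"""


def select_examples(markup):
    root = ET.fromstring(markup)  # root is the <html> element itself

    # /html/head/title: ElementTree paths are relative to the root (<html>),
    # so the equivalent path is "head/title".
    title = root.find("head/title")

    # /html/head/title/text(): ElementTree has no text() step; read .text.
    title_text = title.text

    # //td: every <td> element at any depth below the root.
    cells = [td.text for td in root.findall(".//td")]

    return title.tag, title_text, cells


if __name__ == "__main__":
    print(select_examples(PAGE))
```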
Scrapy 0.22 Documentation
... (such as Amazon Associates Web Services) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ... start URLs.
• parse() is a method of the spider, which will be called with the downloaded Response object of each start URL. The response is passed to the method as the first and only argument. This method ... meanings:
• /html/head/title: selects the <title> element, inside the <head> element of an HTML document
• /html/head/title/text(): selects the text inside the aforementioned <title> element.
• //td: ...
199 pages | 926.97 KB | 1 year ago
Scrapy 0.20 Documentation
... (such as Amazon Associates Web Services) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ... start URLs.
• parse() is a method of the spider, which will be called with the downloaded Response object of each start URL. The response is passed to the method as the first and only argument. This method ... meanings:
• /html/head/title: selects the <title> element, inside the <head> element of an HTML document
• /html/head/title/text(): selects the text inside the aforementioned <title> element.
197 pages | 917.28 KB | 1 year ago
Scrapy 0.14 Documentation
... (such as Amazon Associates Web Services [http://aws.amazon.com/associates/]) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ...
Installation guide
This document describes how to install Scrapy on Linux, Windows and Mac OS X.
Requirements
Python [http://www ...
... start URLs.
parse() is a method of the spider, which will be called with the downloaded Response object of each start URL. The response is passed to the method as the first and only argument. This method ...
235 pages | 490.23 KB | 1 year ago
Scrapy 0.14 Documentation
... (such as Amazon Associates Web Services) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ... read the tutorial and join the community. Thanks for your interest!
2.2 Installation guide
This document describes how to install Scrapy on Linux, Windows and Mac OS X.
2.2.1 Requirements
• Python 2 ...
... start URLs.
• parse() is a method of the spider, which will be called with the downloaded Response object of each start URL. The response is passed to the method as the first and only argument. This method ...
179 pages | 861.70 KB | 1 year ago
Scrapy 0.12 Documentation
... (such as Amazon Associates Web Services [http://aws.amazon.com/associates/]) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ...
Installation guide
This document describes how to install Scrapy on Linux, Windows and Mac OS X.
Requirements
Python [http://www ...
... start URLs.
parse() is a method of the spider, which will be called with the downloaded Response object of each start URL. The response is passed to the method as the first and only argument. This method ...
228 pages | 462.54 KB | 1 year ago
Scrapy 0.12 Documentation
... (such as Amazon Associates Web Services) or as a general purpose web crawler. The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide ... read the tutorial and join the community. Thanks for your interest!
2.2 Installation guide
This document describes how to install Scrapy on Linux, Windows and Mac OS X.
2.2.1 Requirements
• Python 2 ...
... start URLs.
• parse() is a method of the spider, which will be called with the downloaded Response object of each start URL. The response is passed to the method as the first and only argument. This method ...
177 pages | 806.90 KB | 1 year ago
62 results in total