Scrapy 0.18 Documentation (Release 0.18.4)
>>> proc = Compose(lambda v: v[0], str.upper)
>>> proc(['hello', 'world'])
'HELLO'
Each function can optionally receive a loader_context parameter... (3.6. Item Loaders, p. 53)
201 pages | 929.55 KB | 1 year ago
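The loader_context note in that excerpt is the non-obvious part; below is a minimal sketch of how a composed function can receive it, assuming the 0.18-era import path and a hypothetical add_prefix helper that is not from the docs:

>>> from scrapy.contrib.loader.processor import Compose
>>> def add_prefix(value, loader_context):
...     # receives the active context because it declares a 'loader_context' argument
...     return loader_context.get('prefix', '') + value
...
>>> proc = Compose(lambda v: v[0], str.upper, add_prefix)
>>> proc(['hello', 'world'], loader_context={'prefix': '>> '})
'>> HELLO'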
Scrapy 1.5 Documentation (Release 1.5.2)
...XML response body, returning a list of Selector objects (i.e. a SelectorList object):
sel.xpath("//product")
2. Extract all prices from a Google Base XML feed, which requires registering a namespace... (3.3. Selectors, p. 53)
285 pages | 1.17 MB | 1 year ago
Scrapy 1.3 Documentation (Release 1.3.3)
Extract all prices from a Google Base XML feed, which requires registering a namespace:
sel.register_namespace("g", "http://base.google.com/ns/1.0")
(3.3. Selectors, p. 53)
272 pages | 1.11 MB | 1 year ago
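The excerpt stops right after the register_namespace() call; a minimal sketch of the query that typically follows, assuming a `sel` Selector built from the feed response and the Google Base price element g:price:

>>> sel.register_namespace("g", "http://base.google.com/ns/1.0")
>>> sel.xpath("//g:price").extract()   # list of price strings; contents depend on the feed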
Scrapy 1.2 Documentation (Release 1.2.3)
price = scrapy.Field()
stock = scrapy.Field()
last_updated = scrapy.Field(serializer=str)
Note: Those familiar with Django will notice that Scrapy Items... (3.4. Items, p. 53; TOC: 3.5 Item Loaders)
266 pages | 1.10 MB | 1 year ago
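The field lines above are a fragment of the docs' Product example; a minimal sketch of the full declaration they imply (the name field is an assumption based on the surrounding docs):

import scrapy

class Product(scrapy.Item):
    name = scrapy.Field()
    price = scrapy.Field()
    stock = scrapy.Field()
    last_updated = scrapy.Field(serializer=str)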
Scrapy 0.24 Documentation (Release 0.24.6)
...data from the given value using the extract_regex() method, applied before processors.
Examples:
>>> from scrapy.contrib.loader.processor import TakeFirst
(3.5. Item Loaders, p. 53)
222 pages | 988.92 KB | 1 year ago
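The import in that excerpt is cut off before the example that follows it; a minimal sketch of TakeFirst, which returns the first non-null/non-empty value (the input list is illustrative):

>>> from scrapy.contrib.loader.processor import TakeFirst
>>> proc = TakeFirst()
>>> proc(['', 'one', 'two', 'three'])
u'one'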
Scrapy 0.20 Documentation (Release 0.20.2)
>>> proc = Join('<br>')
>>> proc(['one', 'two', 'three'])
u'one<br>two<br>three'
class scrapy.contrib.loader.processor.Compose(*functions, **default_loader_context)
(3.6. Item Loaders, p. 53)
197 pages | 917.28 KB | 1 year ago
Scrapy 0.9 Documentation (Release 0.9)
Get global stat value:
>>> stats.get_value('spiders_crawled')
8
Get all global stats (i.e. not particular to any spider):
>>> stats...
(4.2. Stats Collection, p. 53; TOC: 4.3 Sending e-mail)
156 pages | 764.56 KB | 1 year ago
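The excerpt is cut off mid-call; a minimal sketch of the surrounding usage, assuming the 0.9-era scrapy.stats.stats singleton (the returned dict is illustrative):

>>> import socket
>>> from scrapy.stats import stats
>>> stats.set_value('hostname', socket.gethostname())  # set a global stat
>>> stats.get_value('spiders_crawled')                  # read a single stat
8
>>> stats.get_stats()                                   # all global stats as a dict
{'hostname': 'localhost', 'spiders_crawled': 8}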
Scrapy 1.4 Documentation (Release 1.4.0)
...Let's show an example that illustrates this with the GitHub blog atom feed.
First, we open the shell with the url we want to scrape: $ scrapy ...
(3.3. Selectors, p. 53)
281 pages | 1.15 MB | 1 year ago
Scrapy 0.22 Documentation (Release 0.22.0)
class SiteSpecificLoader(ProductLoader):
    name_in = MapCompose(strip_dashes, ProductLoader.name_in)
Another case where extending Item Loaders can be very helpful... (3.5. Item Loaders, p. 53)
199 pages | 926.97 KB | 1 year ago
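A minimal self-contained sketch of that override pattern, assuming 0.22-era (Python 2) import paths; strip_dashes and the ProductLoader base are stand-ins for code the excerpt omits:

from scrapy.contrib.loader import ItemLoader
from scrapy.contrib.loader.processor import MapCompose

def strip_dashes(value):
    # hypothetical helper: drop leading/trailing dashes from a single value
    return value.strip('-')

class ProductLoader(ItemLoader):
    name_in = MapCompose(unicode.strip)

class SiteSpecificLoader(ProductLoader):
    # prepend strip_dashes to the parent's input processor chain
    name_in = MapCompose(strip_dashes, ProductLoader.name_in)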
Scrapy 1.0 Documentation (Release 1.0.7)
...copy()
>>> print product3
Product(name='Desktop PC', price=1000)
Creating dicts from items:
>>> dict(product)  # create a dict from all populated values
{'price': ...
(3.4. Items, p. 53)
244 pages | 1.05 MB | 1 year ago
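A minimal sketch of the dict round trip the excerpt describes, assuming the Product item defined earlier in those docs (the printed dict is illustrative):

>>> product = Product(name='Desktop PC', price=1000)
>>> dict(product)           # create a dict from all populated values
{'name': 'Desktop PC', 'price': 1000}
>>> Product(dict(product))  # and back: create an item from a dict
Product(name='Desktop PC', price=1000)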
58 results in total