Scraped fields print but do not populate the XML file

I have a problem where the scraped fields print correctly, but the XML file is not populated with any content.

The output in the terminal is this:

[u'Tove'] [u'Jani'] [u'Reminder'] [u"Don't forget me this weekend!"]

However, the generated site_products.xml contains this (which is wrong, there is no data):

<?xml version="1.0" encoding="utf-8"?>
<items></items>

spider.py

from scrapy.contrib.spiders import XMLFeedSpider
from crawler.items import CrawlerItem

class SiteSpider(XMLFeedSpider):
    name = 'site'
    allowed_domains = ['www.w3schools.com']
    start_urls = ['http://www.w3schools.com/xml/note.xml']
    itertag = 'note'

    def parse_node(self, response, selector):
        to = selector.xpath('//to/text()').extract()
        who = selector.xpath('//from/text()').extract()
        heading = selector.xpath('//heading/text()').extract()
        body = selector.xpath('//body/text()').extract()
        return item

pipelines.py

from scrapy import signals
from scrapy.contrib.exporter import XmlItemExporter

class XmlExportPipeline(object):

    def __init__(self):
        self.files = {}

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
        crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
        return pipeline

    def spider_opened(self, spider):
        file = open('%s_products.xml' % spider.name, 'w+b')
        self.files[spider] = file
        self.exporter = XmlItemExporter(file)
        self.exporter.start_exporting()

    def spider_closed(self, spider):
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

items.py

import scrapy


class CrawlerItem(scrapy.Item):
    to = scrapy.Field()
    who = scrapy.Field()
    heading = scrapy.Field()
    body = scrapy.Field()
    pass

settings.py

BOT_NAME = 'crawler'
SPIDER_MODULES = ['crawler.spiders']
NEWSPIDER_MODULE = 'crawler.spiders'
ITEM_PIPELINES = {'crawler.pipelines.XmlExportPipeline': 300,}

Any help with this would be greatly appreciated.


J.Zil, 24.04.2015

Comments

Your spider isn't populating the fields in your item (where do you define item?). - Blender, 25.04.2015

@Blender I'm not sure how to do this. Previously I had return item, but it didn't work when I tried it. Any help would be appreciated. - J.Zil, 25.04.2015


Answers (1)


You need to instantiate a CrawlerItem in your parse_node() method and assign the extracted values to its fields:

def parse_node(self, response, selector):
    item = CrawlerItem()
    item['to'] = selector.xpath('//to/text()').extract()
    item['who'] = selector.xpath('//from/text()').extract()
    item['heading'] = selector.xpath('//heading/text()').extract()
    item['body'] = selector.xpath('//body/text()').extract()
    return item
alecxe, 24.04.2015
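
To illustrate why the exported file stays empty until parse_node() returns a populated item, here is a minimal sketch that uses only the Python standard library. It is not Scrapy code: the plain dict stands in for CrawlerItem, export_items() is a hypothetical stand-in for the export step and makes no claim to match the exact output of XmlItemExporter, and NOTE_XML is an assumed inline copy of the note document, reconstructed from the terminal output in the question.

```python
import xml.etree.ElementTree as ET

# Assumed inline copy of the <note> document, reconstructed from the
# values printed in the question, so the sketch runs without network access.
NOTE_XML = """<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

# Item field name -> XML tag, mirroring the CrawlerItem fields.
FIELD_TAGS = {"to": "to", "who": "from", "heading": "heading", "body": "body"}


def parse_node(node):
    """Build and return a populated item (a plain dict here) from one <note>."""
    item = {}  # stands in for CrawlerItem()
    for field, tag in FIELD_TAGS.items():
        # Relative lookup: only direct children of this node are considered.
        item[field] = node.findtext(tag)
    return item


def export_items(items):
    """Serialise items into an <items> document (hypothetical stand-in exporter)."""
    root = ET.Element("items")
    for item in items:
        element = ET.SubElement(root, "item")
        for field, value in item.items():
            ET.SubElement(element, field).text = value
    return ET.tostring(root, encoding="unicode")


note = ET.fromstring(NOTE_XML)
item = parse_node(note)
xml_out = export_items([item])
print(item)
print(xml_out)
```

The point carried over to the spider is the same: values that are only extracted into local variables never reach the pipeline, so each one has to be assigned to a field of the item that parse_node() returns.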