New to Python, coming from PHP. I want to scrape some sites using Scrapy and have gotten through the tutorials and some simple scripts just fine. Now, writing the real deal, I get this error:
Traceback (most recent call last):
  File "C:\Users\Naltroc\Miniconda3\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Users\Naltroc\Documents\Python Scripts\tutorial\tutorial\spiders\quotes_spider.py", line 52, in parse
    self.dispatcher[site](response)
TypeError: thesaurus() missing 1 required positional argument: 'response'
Scrapy automatically instantiates the object when the shell command scrapy crawl words is called.
From what I understand, self is the first parameter of any class method. When calling a class method, you don't pass self as an argument; Python instead binds it to the instance variable for you.
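As an illustration of that binding (my own minimal sketch, not part of the spider above):

```python
# When a method is looked up on an instance, Python binds the instance
# to `self` automatically; the explicit form passes it by hand.
class Greeter:
    def greet(self, name):
        return "hello " + name

g = Greeter()
print(g.greet("world"))           # `g` is passed as `self` implicitly
print(Greeter.greet(g, "world"))  # equivalent explicit call
```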
First, this is called:
# Scrapy automatically provides `response` to `parse()` when coming from `start_requests()`
def parse(self, response):
    site = response.meta['site']
    # same as "site = 'thesaurus'"
    self.dispatcher[site](response)
    # same as "self.dispatcher['thesaurus'](response)"
Then:
def thesaurus(self, response):
    filename = 'thesaurus.txt'
    words = ''
    ul = response.css('.relevancy-block ul')
    for idx, u in enumerate(ul):
        if idx == 1:
            break
        words = u.css('.text::text').extract()
    self.save_words(filename, words)
In PHP, this should be the same as calling $this->thesaurus($response). parse is clearly sending response as a variable, but Python says it is missing. Where did it go?
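Here is my attempt at a minimal reproduction of the behavior, stripped of Scrapy (the class and method names are just placeholders): functions stored in a class-level dict are looked up as plain functions, not bound methods, so self is not filled in automatically.

```python
# Minimal sketch: a dict lookup does not bind `self`, so the single
# argument we pass lands in the `self` slot and `response` goes unfilled.
class Spider:
    def thesaurus(self, response):
        return response

    dispatcher = {'thesaurus': thesaurus}

    def parse(self, response):
        # returns the raw function from the dict, then calls it with ONE arg
        return self.dispatcher['thesaurus'](response)

s = Spider()
try:
    s.parse("some response")
except TypeError as e:
    # prints a TypeError about the missing 'response' argument
    print(e)
```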
Full code here:
import scrapy

class WordSpider(scrapy.Spider):
    def __init__(self, keyword = 'apprehensive'):
        self.k = keyword

    name = "words"

    # Utilities
    def make_csv(self, words):
        csv = ''
        for word in words:
            csv += word + ','
        return csv

    def save_words(self, words, fp):
        with open(fp, 'w') as f:
            f.seek(0)
            f.truncate()
            csv = self.make_csv(words)
            f.write(csv)

    # site specific parsers
    def thesaurus(self, response):
        filename = 'thesaurus.txt'
        words = ''
        print("in func self is defined as ", self)
        ul = response.css('.relevancy-block ul')
        for idx, u in enumerate(ul):
            if idx == 1:
                break
            words = u.css('.text::text').extract()
        print("words is ", words)
        self.save_words(filename, words)

    def oxford(self):
        filename = 'oxford.txt'
        words = ''

    def collins(self):
        filename = 'collins.txt'
        words = ''

    # site/function mapping
    dispatcher = {
        'thesaurus': thesaurus,
        'oxford': oxford,
        'collins': collins,
    }

    def parse(self, response):
        site = response.meta['site']
        self.dispatcher[site](response)

    def start_requests(self):
        urls = {
            'thesaurus': 'http://www.thesaurus.com/browse/%s?s=t' % self.k,
            #'collins': 'https://www.collinsdictionary.com/dictionary/english-thesaurus/%s' % self.k,
            #'oxford': 'https://en.oxforddictionaries.com/thesaurus/%s' % self.k,
        }
        for site, url in urls.items():
            print(site, url)
            yield scrapy.Request(url, meta={'site': site}, callback=self.parse)