I am trying to save a basketball team's schedule to a CSV file using Scrapy. I have written the following code in these files:

settings.py

BOT_NAME = 'test_project'

SPIDER_MODULES = ['test_project.spiders']
NEWSPIDER_MODULE = 'test_project.spiders'

FEED_FORMAT = "csv"
FEED_URI = "cportboys.csv"

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'test_project (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = True

khsaabot.py

import scrapy


class KhsaabotSpider(scrapy.Spider):
    name = 'khsaabot'
    allowed_domains = ['https://scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978']
    start_urls = ['http://https://scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/']

    def parse(self, response):
        date = response.css('.mdate::text').extract()
        opponent = response.css('.opponent::text').extract()
        place = response.css('.schedule-loc::text').extract()

        for item in zip(date, opponent, place):
            scraped_info = {
                'date': item[0],
                'opponent': item[1],
                'place': item[2],
            }

            yield scraped_info

Now, I'm not sure what's going wrong here. When I run it from the terminal with "scrapy crawl khsaabot", it gives no errors and appears to work fine. However, in case the problem lies in what's happening in the terminal, I've included the output I got there:

2017-12-27 17:21:49 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: test_project)
2017-12-27 17:21:49 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'test_project', 'FEED_FORMAT': 'csv', 'FEED_URI': 'cportboys.csv', 'NEWSPIDER_MODULE': 'test_project.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['test_project.spiders']}
2017-12-27 17:21:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.feedexport.FeedExporter',
 'scrapy.extensions.logstats.LogStats']
2017-12-27 17:21:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-12-27 17:21:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-12-27 17:21:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-12-27 17:21:49 [scrapy.core.engine] INFO: Spider opened
2017-12-27 17:21:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-12-27 17:21:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://https/robots.txt> (failed 1 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://https/robots.txt> (failed 2 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://https/robots.txt> (failed 3 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://https/robots.txt>: DNS lookup failed: no results for hostname lookup: https.
Traceback (most recent call last):
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/endpoints.py", line 954, in startConnectionAttempts
    "no results for hostname lookup: {}".format(self._hostStr)
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://https//scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/> (failed 1 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://https//scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/> (failed 2 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://https//scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/> (failed 3 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.core.scraper] ERROR: Error downloading <GET http://https//scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/>
Traceback (most recent call last):
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/endpoints.py", line 954, in startConnectionAttempts
    "no results for hostname lookup: {}".format(self._hostStr)
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.core.engine] INFO: Closing spider (finished)
2017-12-27 17:21:49 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 6,
 'downloader/exception_type_count/twisted.internet.error.DNSLookupError': 6,
 'downloader/request_bytes': 1416,
 'downloader/request_count': 6,
 'downloader/request_method_count/GET': 6,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 12, 27, 23, 21, 49, 579649),
 'log_count/DEBUG': 7,
 'log_count/ERROR': 2,
 'log_count/INFO': 7,
 'memusage/max': 50790400,
 'memusage/startup': 50790400,
 'retry/count': 4,
 'retry/max_reached': 2,
 'retry/reason_count/twisted.internet.error.DNSLookupError': 4,
 'scheduler/dequeued': 3,
 'scheduler/dequeued/memory': 3,
 'scheduler/enqueued': 3,
 'scheduler/enqueued/memory': 3,
 'start_time': datetime.datetime(2017, 12, 27, 23, 21, 49, 323652)}
2017-12-27 17:21:49 [scrapy.core.engine] INFO: Spider closed (finished)

The output looks right to me, but I'm still new to Scrapy, so I must be missing something.

Thanks, everyone

Hunter, 28 Dec 2017, 02:32

1 Answer

Best answer

You are getting twisted.internet.error.DNSLookupError messages in the log. Look at your start_urls list: the first item begins with "http://https://", so the URL parser treats the literal string "https" as the hostname, and that is exactly what the DNS lookup in your log is failing on.
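You can confirm this with Python's standard urlparse (a quick illustration of the parsing, not part of the fix):

from urllib.parse import urlparse

bad = 'http://https://scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/'
# Everything between 'http://' and the next '/' is taken as the network
# location, so the parsed hostname is the literal string 'https'
print(urlparse(bad).hostname)  # -> 'https'

So change: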

start_urls = ['http://https://scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/']

to:

start_urls = ['https://scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/']
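
For completeness, here is a minimal sketch of the corrected spider, based on your code. I have also trimmed allowed_domains down to a bare domain name, since Scrapy expects domains there rather than full URLs (otherwise the offsite middleware can filter your requests), and dropped the stray trailing slash after the query string. The CSS selectors are taken from your code and assumed to match the page:

import scrapy


class KhsaabotSpider(scrapy.Spider):
    name = 'khsaabot'
    # Bare domain names only, not full URLs
    allowed_domains = ['scoreboard.12dt.com']
    start_urls = ['https://scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978']

    def parse(self, response):
        # Grab the parallel columns of the schedule table
        dates = response.css('.mdate::text').extract()
        opponents = response.css('.opponent::text').extract()
        places = response.css('.schedule-loc::text').extract()

        # Zip the columns back together, one dict per game
        for date, opponent, place in zip(dates, opponents, places):
            yield {'date': date, 'opponent': opponent, 'place': place}

With the FEED_FORMAT and FEED_URI settings you already have, scrapy crawl khsaabot should then write the scraped items to cportboys.csv.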
crookedleaf, 28 Dec 2017, 03:21