unable to scrape myntra API data using scrapy framework 307 redirect error


Below is the spider code:

import scrapy

class MyntraSpider(scrapy.Spider):
    custom_settings = {
        'HTTPCACHE_ENABLED': False,
        'dont_redirect': True,
        #'handle_httpstatus_list': [302, 307],
        #'CRAWLERA_ENABLED': False,
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
    }

    name = "heytest"
    allowed_domains = ["www.myntra.com"]
    start_urls = ["https://www.myntra.com/web/v2/search/data/duke"]

    def parse(self, response):
        self.logger.debug('Parsed jabong.com')

“Parsed jabong.com” is never logged; in fact, the callback method (parse) is not getting called at all. Kindly advise.
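For reference, `dont_redirect` and `handle_httpstatus_list` are `Request.meta` keys rather than settings, so placing them in `custom_settings` has no effect. A minimal sketch of passing them per request, so the 307 response reaches the callback instead of being followed (same URL assumed; untested against Myntra):

import scrapy

class MyntraSpider(scrapy.Spider):
    name = "heytest"
    allowed_domains = ["www.myntra.com"]

    def start_requests(self):
        # dont_redirect / handle_httpstatus_list only work as Request.meta keys;
        # with them set, the 307 is delivered to parse() instead of redirected
        yield scrapy.Request(
            "https://www.myntra.com/web/v2/search/data/duke",
            meta={'dont_redirect': True, 'handle_httpstatus_list': [302, 307]},
            callback=self.parse,
        )

    def parse(self, response):
        # log the status and the redirect target (if any) for diagnosis
        self.logger.debug('status: %s location: %s',
                          response.status, response.headers.get('Location'))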

Please find the error logs from Scrapinghub (attached as a screenshot).

See also the Postman screenshot.


Answer 1:

I ran this code (only a few times) and had no problem getting the data.

It looks similar to your code, so I don't know why you have a problem.

Maybe they are blocking you for some reason.

#!/usr/bin/env python3

import scrapy
import json


class MySpider(scrapy.Spider):
    name = 'myspider'
    allowed_domains = ['www.myntra.com']
    start_urls = ['https://www.myntra.com/web/v2/search/data/duke']

    #def start_requests(self):
    #    for tag in self.tags:
    #        for page in range(self.pages):
    #            url = self.url_template.format(tag, page)
    #            yield scrapy.Request(url)

    def parse(self, response):
        print('url:', response.url)
        #print(response.body)

        data = json.loads(response.body)
        print('data.keys():', data.keys())
        print('meta:', data['meta'])
        print("data['data']:", data['data'].keys())

        # download files
        #for href in response.css('img::attr(href)').extract():
        #    url = response.urljoin(href)
        #    yield {'file_urls': [url]}

        # download images and convert to JPG
        #for src in response.css('img::attr(src)').extract():
        #    url = response.urljoin(src)
        #    yield {'image_urls': [url]}


# --- it runs without a project and saves in `output.csv` ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    #'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',

    # save in CSV or JSON
    'FEED_FORMAT': 'csv',      # or 'json'
    'FEED_URI': 'output.csv',  # or 'output.json'

    # download files to `FILES_STORE/full`
    # it needs `yield {'file_urls': [url]}` in `parse()`
    #'ITEM_PIPELINES': {'scrapy.pipelines.files.FilesPipeline': 1},
    #'FILES_STORE': '/path/to/valid/dir',

    # download images and convert to JPG
    # it needs `yield {'image_urls': [url]}` in `parse()`
    #'ITEM_PIPELINES': {'scrapy.pipelines.images.ImagesPipeline': 1},
    #'IMAGES_STORE': '/path/to/valid/dir',

    #'HTTPCACHE_ENABLED': False,
    #'dont_redirect': True,
    #'handle_httpstatus_list': [302, 307],
    #'CRAWLERA_ENABLED': False,
})
c.crawl(MySpider)
c.start()
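Save it as, say, `myspider.py` and run it directly with `python3 myspider.py`: because it uses `CrawlerProcess`, it runs standalone without a Scrapy project, and anything yielded from `parse()` would land in `output.csv`.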
