How to output nested JSON in Scrapy?

I am building a Scrapy project and realised that I need to nest the JSON to get the output I want for further use. Until now, I have been saving it as a flat list of items without any nesting:

[{"Title": "Cukrus RIMI, 1 kg", "Price": "0.89", "Image": "https://rimibaltic-res.cloudinary.com/image/upload/b_white,c_fit,f_auto,h_480,q_auto,w_480/d_ecommerce:backend-fallback.png/MAT_801045_PCE_LT", "Link": "https://www.rimi.lt/e-parduotuve/lt/produktai/bakaleja/cukrus-ir-saldikliai/baltasis-cukrus-/cukrus-rimi-1-kg/p/801045"},
{"Title": "Pomidorų padažas su bazilikais BARILLA, 400 g", "Price": "2.69", "Image": "https://rimibaltic-res.cloudinary.com/image/upload/b_white,c_fit,f_auto,h_480,q_auto,w_480/d_ecommerce:backend-fallback.png/MAT_106498_PCE_LT", "Link": "https://www.rimi.lt/e-parduotuve/lt/produktai/bakaleja/padazai-garstycios-krienai/padazai-maisto-ruosimui-ir-makaronams/pomidoru-padazas-su-bazilikais-barilla-400g/p/106498"},
{"Title": "Padažas makaronams RIMI su bazilikais, 390 g", "Price": "1.65", "Image": "https://rimibaltic-res.cloudinary.com/image/upload/b_white,c_fit,f_auto,h_480,q_auto,w_480/d_ecommerce:backend-fallback.png/MAT_810787_PCE_LT", "Link": "https://www.rimi.lt/e-parduotuve/lt/produktai/bakaleja/padazai-garstycios-krienai/padazai-maisto-ruosimui-ir-makaronams/padazas-makaronams-su-bazilikais-rimi-390-g/p/810787"},
{"Title": "Ekologiški raudonieji lęšiai I LOVE ECO, 400 g", "Price": "2.79", "Image": "https://rimibaltic-res.cloudinary.com/image/upload/b_white,c_fit,f_auto,h_480,q_auto,w_480/d_ecommerce:backend-fallback.png/MAT_141700_PCE_LT", "Link": "https://www.rimi.lt/e-parduotuve/lt/produktai/bakaleja/ankstiniai/lesiai/ekologiski-raudonieji-lesiai-i-love-eco-400g/p/141700"}]

Now I am trying to nest it by adding the details of the shop I am scraping at the top of the JSON.

Example (desired output):

{
   "shop" : {
      "sid" : 1,
      "name" : "Barbora",
      "domain" : "https://barbora.lt",
      "image_url" : ""
   },
   "products" : [
      {
         "Image" : "https://cdn.barbora.lt/products/1d747537-6760-4098-ab24-8c658d1f9491_m.png",
         "Link" : "/produktai/bananai-1-kg",
         "Price" : "€1,39",
         "Title" : "Bananai, 1 kg"
      },
      {
         "Image" : "https://cdn.barbora.lt/products/9d38e2e4-8106-4e8e-9b26-dec28b4eed96_m.png",
         "Link" : "/produktai/suris-rokiskio-ekstra-45-proc-rieb-s-m-1-kg",
         "Price" : "€8,79",
         "Title" : "Sūris ROKIŠKIO EKSTRA, 45% rieb. s. m., 1 kg"
      },...
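The target shape is an ordinary mapping with one "shop" object and a "products" list, so outside of Scrapy it can be sketched with plain Python dicts and the standard json module. The helper name build_catalog is hypothetical, and the sample values are copied from the example above.

```python
import json


def build_catalog(shop, products):
    """Wrap a list of product dicts in the shop envelope (hypothetical helper)."""
    return {"shop": dict(shop), "products": list(products)}


shop = {"sid": 1, "name": "Barbora", "domain": "https://barbora.lt", "image_url": ""}
products = [
    {
        "Image": "https://cdn.barbora.lt/products/1d747537-6760-4098-ab24-8c658d1f9491_m.png",
        "Link": "/produktai/bananai-1-kg",
        "Price": "€1,39",
        "Title": "Bananai, 1 kg",
    },
]

catalog = build_catalog(shop, products)
# ensure_ascii=False keeps characters such as "€" and "ū" readable in the output.
text = json.dumps(catalog, ensure_ascii=False, indent=3)
print(text)
```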

I have tried putting the items into a list, but I get an error (Scrapy asks me to return an item, not a list). So now I have tried combining Scrapy items to build the structure myself. Here is what I have tried so far, but it does not seem to be working:

import scrapy
from pbl.items import PblSpider
from pbl.items import ShopCard

SHOP_ID = 1
SHOP_NAME = 'Asorti'
shop = ShopCard()

shop['id'] = SHOP_ID
shop['name'] = SHOP_NAME
shop['domain'] = 'https://www.assorti.lt'
#shop['imageurl'] = response.xpath()

class SpiderasortiSpider(scrapy.Spider):
    name = 'spiderAsorti'
    allowed_domains = ['www.assorti.lt']
    start_urls = ['https://www.assorti.lt/katalogas/maistas/']

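    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup before declaring the XPaths.
        super().__init__(*args, **kwargs)
        self.declare_xpath()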

    def declare_xpath(self):
        self.getAllItemsXpath = '//*[@id="products_wrapper"]/div[2]/div/a/@href'
        self.TitleXpath  = '//*[@id="products_detailed"]/div[1]/div/div/div[2]/h1/text()'
        self.ImageXpath = '//*[@id="products_photos"]/div[1]/img/@src'
        self.PriceXpath = '//*[@id="products_add2cart"]/form/div/div[1]/div/div[1]/span/span[1]/text()'

    def parse(self, response):

        for href in response.xpath(self.getAllItemsXpath):
            url = response.urljoin(href.extract())
            yield scrapy.Request(url,callback=self.parse_item)

        next_page = response.xpath('//*[@id="products_wrapper"]/div[3]/div[2]/ul/li/a[contains(@class, "pagination_link")]/@href').extract()
        # Compare by value ("is not" tests identity) and guard against a short list.
        if len(next_page) > 1 and next_page[1] != '#':
            print('-' * 70)
            print(next_page[1])
            print('-' * 70)
            url = response.urljoin(next_page[1])
            yield scrapy.Request(url, callback=self.parse)

    def parse_item(self, response):
        shop = ShopCard()
        
        shop['product'] = PblSpider()

        Title = response.xpath(self.TitleXpath).extract_first()
        Link = response.url
        Image = response.xpath(self.ImageXpath).extract_first()
        Price = response.xpath(self.PriceXpath).extract_first()

        shop['product']['Title'] = Title
        shop['product']['Link'] = Link
        shop['product']['Image'] = Image
        shop['product']['Price'] = Price

        return shop

What am I doing wrong? Is there another way to build nested JSON files in Scrapy, or can it only produce flat, non-nested output like the first example?
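One possible approach, sketched under assumptions rather than taken from this project: let parse_item return a flat product item as before, and have an item pipeline collect the products and write the nested document once, when the spider closes. Scrapy item pipelines are plain classes that can define open_spider, process_item and close_spider. The class name NestedJsonPipeline, the output file name and the hard-coded shop values below are all hypothetical.

```python
import json


class NestedJsonPipeline:
    """Collects scraped products and writes one nested JSON document."""

    # Hypothetical shop metadata and output path; adjust per spider.
    shop = {
        "sid": 1,
        "name": "Asorti",
        "domain": "https://www.assorti.lt",
        "image_url": "",
    }
    output_path = "asorti.json"

    def open_spider(self, spider):
        # Start with an empty product list for this crawl.
        self.products = []

    def process_item(self, item, spider):
        # dict() accepts plain dicts as well as scrapy.Item instances.
        self.products.append(dict(item))
        return item

    def close_spider(self, spider):
        # Wrap everything in the shop envelope and write it in one go.
        document = {"shop": self.shop, "products": self.products}
        with open(self.output_path, "w", encoding="utf-8") as fh:
            json.dump(document, fh, ensure_ascii=False, indent=3)
```

To try it, the class would need to be enabled in settings.py, for example with ITEM_PIPELINES = {"pbl.pipelines.NestedJsonPipeline": 300}; the module path is an assumption based on the pbl package name used in the imports above. The trade-off is that all products are held in memory until the crawl ends, which is usually acceptable for a single shop catalogue.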



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
