Scrapy POST request is not working: it returns 200 but an invalid response

I am trying to send a POST request to this website. You can select any state and court and click on "Suchen" at the bottom of the page.

The request returns 200 in Scrapy, but the data in the results table is missing.

I have tried replacing the FormRequest body with parsed data via request.replace(body=...) to replicate the browser's behavior, but I get the same result.

The request does not work in Postman either, when I copy the request as cURL from Chrome and paste it into Postman.

Please let me know if anything else is required to track down the issue.

def start_requests(self):
    cookies = {'JSESSIONID': 'Y9b9BC9pTwubWIpg9A4Fzs3CadkjtlhOeUVnxfn5.node-086'}
    headers = {
        'Connection': 'keep-alive',
        'Cache-Control': 'max-age=0',
        'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"macOS"',
        'Upgrade-Insecure-Requests': '1',
        'Origin': 'https://neu.insolvenzbekanntmachungen.de',
        'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Sec-Fetch-Site': 'same-origin',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-User': '?1',
        'Sec-Fetch-Dest': 'document',
        'Referer': 'https://neu.insolvenzbekanntmachungen.de/ap/suche.jsf',
        'Accept-Language': 'en-US,en;q=0.9',
    }

    data = {
        'frm_suche': 'frm_suche',
        'frm_suche:lsom_bundesland:lsom2': '0',
        'frm_suche:lsom_gericht:lsom2': '0',
        'frm_suche:ldi_datumVon:tag': '19',
        'frm_suche:ldi_datumVon:monat': '01',
        'frm_suche:ldi_datumVon:jahr': '2022',
        'frm_suche:ldi_datumBis:tag': '02',
        'frm_suche:ldi_datumBis:monat': '02',
        'frm_suche:ldi_datumBis:jahr': '2022',
        'frm_suche:lsom_wildcard:lsom2': '0',
        'frm_suche:litx_firmaNachName:text': '',
        'frm_suche:litx_vorname:text': '',
        'frm_suche:litx_sitzWohnsitz:text': '',
        'frm_suche:iaz_aktenzeichen:itx_abteilung': '',
        'frm_suche:iaz_aktenzeichen:som_registerzeichen': '--',
        'frm_suche:iaz_aktenzeichen:itx_lfdNr': '',
        'frm_suche:iaz_aktenzeichen:som_jahr': '--',
        'frm_suche:lsom_gegenstand:lsom2': '-- Alle Gegenst\xE4nde innerhalb des Verfahrens --',
        'frm_suche:ireg_registereintrag:som_registergericht': '--',
        'frm_suche:ireg_registereintrag:som_registerart': '',
        'frm_suche:ireg_registereintrag:itx_registernummer': '',
        'frm_suche:ireg_registereintrag:ihd_validator': 'true',
        'frm_suche:cbt_suchen': 'Suchen',
        'javax.faces.ViewState': '-7754325005107570566:706046929639379693'
    }
    url_second_website = 'https://neu.insolvenzbekanntmachungen.de/ap/suche.jsf'
    yield FormRequest(url_second_website, headers=headers, cookies=cookies, formdata=data, callback=self.parse_data)


Solution 1:[1]

Scrapy handles cookies very well. The key is to start scraping from the first page that sets the cookie; Scrapy then takes care of the rest.

Also, hard-coding any kind of session ID (JSESSIONID in this case) will not work; you will need to read it dynamically. This should not be necessary if you start the scrape from the page that sets the cookie.

Finally, if you still can't figure out the flow of the cookies, set COOKIES_DEBUG = True in settings.py or in the spider's custom_settings.
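For reference, a minimal settings fragment; with this enabled Scrapy logs the cookie headers on every request and response:

```python
# settings.py (or an entry in the spider's custom_settings dict)
COOKIES_DEBUG = True  # log Cookie / Set-Cookie headers sent and received
```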

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source

Solution 1: Upendra