PySpark: Dynamically update column positions of a DataFrame according to another DataFrame

I need to change column positions frequently. Instead of changing the code each time, I created a temporary DataFrame, Index_df, in which I update the column positions; those changes should then be reflected in the actual DataFrame, sample_df.

sample_df

F_cDc  F_NHY  F_XUI  F_NMY  P_cDc  P_NHY  P_XUI  P_NMY
415    258    854    245    478    278    874    235
405    197    234    456    567    188    108    267
315    458    054    375    898    978    677    134

Index_df

col    position
F_cDc  1
F_NHY  3
F_XUI  5
F_NMY  7
P_cDc  2
P_NHY  4
P_XUI  6
P_NMY  8

sample_df should be reordered according to Index_df.

Expected output:

F_cDc  P_cDc  F_NHY  P_NHY  F_XUI  P_XUI  F_NMY  P_NMY
415    478    258    278    854    874    245    235
405    567    197    188    234    108    456    267
315    898    458    978    054    677    375    134

Here the column positions have changed according to the positions I set in Index_df.

I could do sample_df.select("<column order>"), but I have more than 70 columns, so hard-coding the order is not a good way to handle this.
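To make the requirement concrete, here is a minimal sketch of what I am after, assuming Index_df has exactly the two columns shown above (col and position). The helper names ordered_columns and reorder are hypothetical, made up for illustration; the only DataFrame calls used are collect(), columns and select().

```python
def ordered_columns(index_rows):
    """Return column names sorted by their target position.

    index_rows is an iterable of (column_name, position) pairs,
    e.g. the rows of Index_df. (Hypothetical helper.)
    """
    pairs = [(str(name).strip(), int(pos)) for name, pos in index_rows]
    positions = [pos for _, pos in pairs]
    # Two columns cannot share one position.
    if len(set(positions)) != len(positions):
        raise ValueError("duplicate positions in index rows")
    return [name for name, _ in sorted(pairs, key=lambda p: p[1])]


def reorder(sample_df, index_df):
    """Reorder sample_df's columns using the rows of index_df.

    Assumes index_df has columns named 'col' and 'position'.
    (Hypothetical helper.)
    """
    # Index_df holds one row per column, so collecting it to the driver is cheap.
    rows = [(r["col"], r["position"]) for r in index_df.collect()]
    order = ordered_columns(rows)
    # Fail loudly if a column of sample_df has no position assigned.
    missing = set(sample_df.columns) - set(order)
    if missing:
        raise ValueError(f"columns without a position: {sorted(missing)}")
    return sample_df.select(*order)
```

With this, reorder(sample_df, Index_df) would return the columns in the expected order, and changing a position in Index_df would change the result without touching the code.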



Solution 1:[1]

Add external jars to local .m2 (for local development)

This approach is not distributable: it only puts the JARs into the local .m2 repository on your machine and nothing more.

To add a JAR to the local Maven repository:

mvn install:install-file -Dfile=<path-to-file> -DgroupId=<group-id> -DartifactId=<artifact-id> -Dversion=<version>

<path-to-file> is the path to the JAR.

Create a new maven repo and distribute it within a project

This approach creates a new Maven repository that contains only the external JARs. The repository is placed in the project root, added to git, and referenced by the project's pom.

Anyone who downloads the project will then have all Maven dependencies in place without extra steps.

To add a JAR to the new Maven repo:

mvn deploy:deploy-file -Dfile=<path-to-file> -DgroupId=<group-id> -DartifactId=<artifact-id> -Dversion=<version> -Dpackaging=jar -Durl=file:./.m2/repository/ -DrepositoryId=project-internal -DupdateReleaseInfo=true

Then reference the repo in your pom:

<repositories>
    ...
    <repository>
        <id>project-internal</id>
        <url>file:///${project.basedir}/.m2/repository</url>
    </repository>
</repositories>

Reference

https://www.google.com/amp/s/roufid.com/3-ways-to-add-local-jar-to-maven-project/amp/

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1