Optimal design patterns for reading and restructuring datasets with different formats?

Problems:

  1. (I want to solve this) Homogenize image datasets that come in different formats: HDF5, folders of images (with different structures), etc.

  2. (Just to give you context) Then, the datasets are concatenated, preprocessed by client code and stored in an HDF5 file with a defined/fixed structure.

My solution to 1:

Use the template pattern as the following pseudo-UML shows:

pseudo-UML diagram

Noticed drawbacks of this solution to 1:

  1. The client code has to change every time a new dataset comes into play, because it has to know which ConcreteStructurizer to use for a given dataset. That is, the client does something like this:
if dataset is dataset_0:    # use ConcreteStructurizerFolder
    ConcreteStructurizerFolder(cfg_dataset_0).reorganize()
.
.
.
if dataset is dataset_n:    # use ConcreteStructurizerHDF5
    ConcreteStructurizerHDF5(cfg_dataset_n).reorganize()

Could you propose a better/optimal approach/design pattern?

PS: I am learning software design (my background is in physics), so I'd be grateful for a pedagogical, well-explained answer. Thanks.



Solution 1:[1]

You can adopt the chain-of-responsibility pattern in your design.

You already have different specializations of Structurizer, which is a good start. Your design defines what a Structurizer does with a certain type of dataset. Let's make it more powerful by adding logic that defines which types of dataset each Structurizer can handle, and then connect all the Structurizers into a chain.

We pass a dataset to the chain. If the first Structurizer can handle it, it processes it. If it cannot, it passes the dataset on to the next link, and so on until some handler accepts the dataset or the chain ends.

My example introduces a new interface, DatasetHandler, which adds the ability to

  1. set its next link: setNextHandler(DatasetHandler)
  2. handle a dataset if it is of a supported type: handle(Object dataset). The boolean return value indicates whether the dataset was handled successfully.

I am not good at PHP, but I think the design pattern applies to any OO language. The examples below are in Java.

Interfaces


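public interface Structurizer {

    // The restructuring steps that every dataset format must implement.
    void reorganize();

    void createDest();

    void makeMetaDataJson();

    void moveToFiles();

}

public interface DatasetHandler {

    // Appends a handler to the chain.
    void setNextHandler(DatasetHandler handler);

    // Returns true if this handler, or one further down the chain, handled the dataset.
    boolean handle(Object dataset);

}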

Structurizer implementation


public abstract class AbstractStructurizer implements DatasetHandler, Structurizer {

    private String datasetPath;
    private String destPath;
    private boolean overwrite;
    private String metadataFn;
    private boolean possiblyOtherAttributes;
    private DatasetHandler nextHandler;

    // Appends the handler at the tail of the chain: if this link already
    // has a successor, the request is forwarded to that successor.
    @Override
    public void setNextHandler(DatasetHandler handler) {
        if (getNextHandler() == null) {
            this.nextHandler = handler;
        } else {
            this.nextHandler.setNextHandler(handler);
        }
    }

    // Delegates to the next link, or returns false when the chain ends here.
    public boolean tryNextHandler(Object dataset) {
        if (getNextHandler() == null) {
            return false;
        } else {
            return getNextHandler().handle(dataset);
        }
    }

    public String getDatasetPath() {
        return datasetPath;
    }

    public void setDatasetPath(String datasetPath) {
        this.datasetPath = datasetPath;
    }

    public String getDestPath() {
        return destPath;
    }

    public void setDestPath(String destPath) {
        this.destPath = destPath;
    }

    public boolean isOverwrite() {
        return overwrite;
    }

    public void setOverwrite(boolean overwrite) {
        this.overwrite = overwrite;
    }

    public String getMetadataFn() {
        return metadataFn;
    }

    public void setMetadataFn(String metadataFn) {
        this.metadataFn = metadataFn;
    }

    public boolean isPossiblyOtherAttributes() {
        return possiblyOtherAttributes;
    }

    public void setPossiblyOtherAttributes(boolean possiblyOtherAttributes) {
        this.possiblyOtherAttributes = possiblyOtherAttributes;
    }

    public DatasetHandler getNextHandler() {
        return nextHandler;
    }

}

import java.io.File;

public class ConcreteStructurizerFolder extends AbstractStructurizer {

    @Override
    public void reorganize() {
        System.out.println("reorganizing folder dataset...");
    }

    @Override
    public void createDest() {
        System.out.println("creating folder destination...");
    }

    @Override
    public void makeMetaDataJson() {
        System.out.println("making folder metadata json...");
    }

    @Override
    public void moveToFiles() {
        System.out.println("moving folders...");
    }

    @Override
    public boolean handle(Object dataset) {
        // Accept only File objects that point to an existing directory.
        if (dataset instanceof File && ((File) dataset).isDirectory()) {
            reorganize();
            createDest();
            makeMetaDataJson();
            moveToFiles();
            return true;
        }
        // Not a folder dataset: pass it down the chain.
        return tryNextHandler(dataset);
    }

}

import java.io.File;

public class ConcreteStructurizerHDF5 extends AbstractStructurizer {

    @Override
    public boolean handle(Object dataset) {
        // Accept only File objects whose name ends in ".hdf5" (case-insensitive).
        // The leading dot keeps names such as "nothdf5" from matching.
        if (dataset instanceof File
                && ((File) dataset).getName().toLowerCase().endsWith(".hdf5")) {
            reorganize();
            createDest();
            makeMetaDataJson();
            moveToFiles();
            return true;
        }
        // Not an HDF5 dataset: pass it down the chain.
        return tryNextHandler(dataset);
    }

    @Override
    public void reorganize() {
        System.out.println("reorganizing HDF5 dataset...");
    }

    @Override
    public void createDest() {
        System.out.println("creating HDF5 destination...");
    }

    @Override
    public void makeMetaDataJson() {
        System.out.println("making HDF5 metadata json...");
    }

    @Override
    public void moveToFiles() {
        System.out.println("moving HDF5 files...");
    }

}



public class ConcreteStructurizerUnknown extends AbstractStructurizer {

    @Override
    public boolean handle(Object dataset) {
        // Last link in the chain: nothing matched, so report the dataset and give up.
        System.out.println(String.format("unknown dataset :%s", dataset.getClass()));
        return false;
    }

    @Override
    public void reorganize() {
    }

    @Override
    public void createDest() {
    }

    @Override
    public void makeMetaDataJson() {
    }

    @Override
    public void moveToFiles() {
    }

}
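
Each concrete handle() above repeats the same pattern: check the type, run the four steps, otherwise fall back to the next handler. Since the question already uses the template pattern, one possible refinement is to write that pattern once in the base class and let subclasses answer only "can I handle this?". The sketch below is an assumption of mine rather than part of the design above: the canHandle hook, the append method and the class names ChainedStructurizer, FolderStructurizer, Hdf5Structurizer and TemplateChainDemo are all hypothetical.

```java
import java.io.File;

// Hypothetical base class: the chain logic and the fixed step order live here once.
abstract class ChainedStructurizer {
    private ChainedStructurizer next;

    // Appends a handler at the tail of the chain; returns this for fluent wiring.
    ChainedStructurizer append(ChainedStructurizer handler) {
        if (next == null) {
            next = handler;
        } else {
            next.append(handler);
        }
        return this;
    }

    // Template method: run the fixed steps if supported, otherwise delegate.
    final boolean handle(Object dataset) {
        if (canHandle(dataset)) {
            reorganize();
            createDest();
            makeMetaDataJson();
            moveToFiles();
            return true;
        }
        return next != null && next.handle(dataset);
    }

    // Hypothetical hook: the only decision a subclass has to make.
    protected abstract boolean canHandle(Object dataset);
    protected abstract void reorganize();
    protected abstract void createDest();
    protected abstract void makeMetaDataJson();
    protected abstract void moveToFiles();
}

class FolderStructurizer extends ChainedStructurizer {
    @Override protected boolean canHandle(Object dataset) {
        return dataset instanceof File && ((File) dataset).isDirectory();
    }
    @Override protected void reorganize() { System.out.println("reorganizing folder dataset..."); }
    @Override protected void createDest() { System.out.println("creating folder destination..."); }
    @Override protected void makeMetaDataJson() { System.out.println("making folder metadata json..."); }
    @Override protected void moveToFiles() { System.out.println("moving folders..."); }
}

class Hdf5Structurizer extends ChainedStructurizer {
    @Override protected boolean canHandle(Object dataset) {
        return dataset instanceof File
                && ((File) dataset).getName().toLowerCase().endsWith(".hdf5");
    }
    @Override protected void reorganize() { System.out.println("reorganizing HDF5 dataset..."); }
    @Override protected void createDest() { System.out.println("creating HDF5 destination..."); }
    @Override protected void makeMetaDataJson() { System.out.println("making HDF5 metadata json..."); }
    @Override protected void moveToFiles() { System.out.println("moving HDF5 files..."); }
}

class TemplateChainDemo {
    public static void main(String[] args) {
        ChainedStructurizer chain = new FolderStructurizer().append(new Hdf5Structurizer());
        chain.handle(new File("dataset.HDF5")); // handled by Hdf5Structurizer
        chain.handle("not a file");             // no handler accepts it; returns false
    }
}
```

With this variant a new format needs only a canHandle test plus its four steps, and the delegation logic cannot be forgotten or written inconsistently in a subclass.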

Client

import java.io.File;

public class Client {

    public static void main(String[] args) {

        // Build the handler chain: folder -> HDF5 -> unknown (catch-all).
        DatasetHandler handlerChain = new ConcreteStructurizerFolder();
        handlerChain.setNextHandler(new ConcreteStructurizerHDF5());
        handlerChain.setNextHandler(new ConcreteStructurizerUnknown());

        // Let the chain handle different types of dataset.
        System.out.println("==== test HDF5 dataset ====");
        handlerChain.handle(new File("dataset.HDF5"));
        System.out.println("==== test txt dataset ====");
        handlerChain.handle(new File("Untitled.txt"));
        System.out.println("==== test folder dataset ====");
        // "C:\\" exists only on Windows; use any existing directory on other systems.
        handlerChain.handle(new File("C:\\"));
        System.out.println("==== test unknown type dataset ====");
        handlerChain.handle("this is an unknown type");
    }

}

Output

==== test HDF5 dataset ====
reorganizing HDF5 dataset...
creating HDF5 destination...
making HDF5 metadata json...
moving HDF5 files...
==== test txt dataset ====
unknown dataset :class java.io.File
==== test folder dataset ====
reorganizing folder dataset...
creating folder destination...
making folder metadata json...
moving folders...
==== test unknown type dataset ====
unknown dataset :class java.lang.String

Basically, what we are doing here is breaking up the n-way if chain. Each if condition is encapsulated inside its own DatasetHandler. If a new dataset type is added in the future, you implement a new Structurizer for that type and add it to the chain. The client becomes much more manageable without the long run of if statements.
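
To illustrate that last point, here is a minimal sketch of adding support for one more format. ZIP archives are a hypothetical example format that I picked for illustration. Handler, BaseHandler, FolderHandler, ZipHandler and ExtensionDemo are illustrative stand-ins for DatasetHandler, AbstractStructurizer and the concrete structurizers, trimmed down so that the snippet compiles on its own.

```java
import java.io.File;

// Stand-in for DatasetHandler.
interface Handler {
    void setNextHandler(Handler handler);
    boolean handle(Object dataset);
}

// Stand-in for AbstractStructurizer, reduced to the chain logic.
abstract class BaseHandler implements Handler {
    private Handler next;

    @Override
    public void setNextHandler(Handler handler) {
        if (next == null) {
            next = handler;
        } else {
            next.setNextHandler(handler);
        }
    }

    protected boolean tryNextHandler(Object dataset) {
        return next != null && next.handle(dataset);
    }
}

// An existing handler; it is not modified when a new format is added.
class FolderHandler extends BaseHandler {
    @Override
    public boolean handle(Object dataset) {
        if (dataset instanceof File && ((File) dataset).isDirectory()) {
            System.out.println("handling folder dataset...");
            return true;
        }
        return tryNextHandler(dataset);
    }
}

// The only new class needed for the hypothetical ZIP format.
class ZipHandler extends BaseHandler {
    @Override
    public boolean handle(Object dataset) {
        if (dataset instanceof File
                && ((File) dataset).getName().toLowerCase().endsWith(".zip")) {
            System.out.println("handling ZIP dataset...");
            return true;
        }
        return tryNextHandler(dataset);
    }
}

class ExtensionDemo {
    static Handler buildChain() {
        Handler chain = new FolderHandler();
        chain.setNextHandler(new ZipHandler()); // the only client-side change
        return chain;
    }

    public static void main(String[] args) {
        Handler chain = buildChain();
        chain.handle(new File("images.zip")); // accepted by ZipHandler
        chain.handle(new File("notes.txt"));  // no handler accepts it; returns false
    }
}
```

The existing handlers stay untouched, and the client changes in exactly one place: where the chain is assembled.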

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
