PySpark: increment the timestamp column based on row_number value

I am pulling data from Event Hub. Each packet contains 10 records and carries a single timestamp. I want to explode each packet into its 10 records and add the packet timestamp to each record, incremented by 1 second per record, with the row number computed over a partition by EnqueuedTimeUtc and vehicleid.

Below is the intermediate data that I have in the dataframe.


df.show()

+-------------------+---------------+-------------------+
|    EnqueuedTimeUtc|      vehicleid|   datetime_pkt    |
+-------------------+---------------+-------------------+
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|
+-------------------+---------------+-------------------+

Expected output:

+-------------------+---------------+-------------------+-------------------+
|    EnqueuedTimeUtc|      vehicleid|   datetime_pkt    | nw_datetime_pkt   |
+-------------------+---------------+-------------------+-------------------+
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:43|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:44|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:45|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:46|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:47|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:48|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:49|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:50|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:51|
|5/1/2022 7:19:46 AM|86135903910    |2022-05-01 07:19:43|2022-05-01 07:19:52|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:48|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:49|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:50|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:51|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:52|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:53|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:54|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:55|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:56|
|5/1/2022 7:19:49 AM|86135903910    |2022-05-01 07:19:48|2022-05-01 07:19:57|
+-------------------+---------------+-------------------+-------------------+


Solution 1:[1]

Firstly, it looks like your first overload is unnecessary. It says that when the id is a string or a number, the returned object is either a Partial<O> or a full O. Any O is also assignable to Partial<O>, so you can simply declare the return type as Partial<O>.
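To illustrate why a single Partial<O> return type covers both cases, here is a minimal sketch. The Photo interface matches the one used elsewhere in this thread; the lookup function and the store variable are hypothetical stand-ins, since the asker's original overloads aren't shown here.

```typescript
interface Photo {
  id: number;
  name: string;
}

const full: Photo = { id: 1, name: "photo-1" };

// Every Photo is also a valid Partial<Photo>, so no cast is needed.
const asPartial: Partial<Photo> = full;

// Hypothetical lookup with one signature in place of two overloads.
function lookup(store: Record<string, Photo>, id: string | number): Partial<Photo> {
  const hit: Photo | undefined = store[String(id)];
  if (hit === undefined) {
    return {}; // nothing stored under this id
  }
  return hit; // a full Photo, returned as Partial<Photo>
}

const store: Record<string, Photo> = { "1": full };
```

Both branches type-check against the same return type, which is the reason the separate overload adds nothing.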

With regard to the inference: if you let TypeScript infer the type, it will use your input to infer the output of the function. What you seem to be asking for is that the function's second argument MUST be of type Photo. That isn't inference, and it can't be inferred unless you pass a variable that is already of type Photo. For TS to infer it, you'd need to replace your last line with something like:

const myPhoto: Photo = { id: 4, name: 'my-photo' };
const extendedPhotos = extendWith<Photo>(photos, myPhoto);

So that TypeScript can use the information from the input values to infer the output value.
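As a rough sketch of that point, the example below uses a hypothetical generic helper, addItem, which is an assumption made for illustration and not the asker's function. Because myPhoto is annotated as Photo before the call, the type parameter is inferred as Photo without an explicit type argument.

```typescript
interface Photo {
  id: number;
  name: string;
}

// Hypothetical stand-in for the function under discussion.
function addItem<T extends { id: number }>(
  items: Record<number, T>,
  item: T
): Record<number, T> {
  // Copy first so the input record is left untouched.
  const result: Record<number, T> = { ...items };
  result[item.id] = item;
  return result;
}

const photos: Record<number, Photo> = {
  1: { id: 1, name: "photo-1" },
  2: { id: 2, name: "photo-2" },
};

// The annotation fixes the value's type before the call,
// so T is inferred as Photo from the arguments alone.
const myPhoto: Photo = { id: 4, name: "my-photo" };
const extendedPhotos = addItem(photos, myPhoto);
```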

Solution 2:[2]

This can be done (though this example will need extending):

type Index = string | number;
type ObjectWithIdIndex = { id: Index };

function extendWith<O extends Record<Index, ObjectWithIdIndex>>(obj: O, v: O[keyof O]): O {
    (obj as Record<Index, ObjectWithIdIndex>)[v.id] = v;

    return obj;
}

The types of these parameters can be reduced to:

  • requiring that obj is an object with values that have id properties. We can constrain this by only accepting types that extend Record<Index, ObjectWithIdIndex>. So the following should give a type error:
extendWith(null, { id: 3, name: "photo-4" }); 
extendWith({ abc: 1 }, { id: 3, name: "photo-4" });
  • requiring that v is of the same type as the values of obj. We can constrain this with O[keyof O]. As keyof O is a union of the property keys of obj, O[keyof O] is a union of the types of the values of those properties. The following should also give a type error:
interface Photo { id: number; name: string; }

const photos: Record<number, Photo> = {
  1: { id: 1, name: "photo-1" },
  2: { id: 2, name: "photo-2" }
};

extendWith(photos, { id: 4, name: "photo-4", abc: 2 });
extendWith(photos, { id: 4, name: 1 });
extendWith(photos, { name: "abc" });
extendWith(photos, null);

When calling extendWith, TypeScript will infer a more specific type for obj than Record<Index, ObjectWithIdIndex>. This has two consequences:

  • We can use this type to infer the types of obj's values and then constrain v.
  • We can no longer assign new properties to obj directly, as TypeScript doesn't know whether obj is still an extensible Record type (e.g. { a: 1 } is a subtype of Record<string, number>, but new properties can't be assigned to it). We can, however, cast obj back to its more generic type and then extend it.
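The two points above can be seen in a short usage sketch. It repeats the extendWith definition from this answer so that it is self-contained. The photos literal is deliberately left unannotated (an assumption made for this illustration) so that the inferred O is a closed object type and not a generic Record.

```typescript
type Index = string | number;
type ObjectWithIdIndex = { id: Index };

function extendWith<O extends Record<Index, ObjectWithIdIndex>>(obj: O, v: O[keyof O]): O {
  // O may be a closed object type, so widen it before adding a new key.
  (obj as Record<Index, ObjectWithIdIndex>)[v.id] = v;
  return obj;
}

// No annotation: O is inferred from the literal's own keys (1 and 2).
const photos = {
  1: { id: 1, name: "photo-1" },
  2: { id: 2, name: "photo-2" },
};

// v is constrained to the value type of photos: { id: number; name: string }.
const result = extendWith(photos, { id: 4, name: "photo-4" });

// extendWith(photos, { id: 4, name: 1 }); // rejected: name must be a string

// The static type of result still lists only keys 1 and 2, so reading
// the new entry needs the same widening.
const added = (result as Record<Index, ObjectWithIdIndex>)[4];
```

Note that extendWith mutates and returns the same object, so photos and result refer to one value at runtime even though the static type does not mention the new key.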

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Tom
Solution 2    c-richard