Spark DataFrame Minus Minutes Operation in Scala
How to perform a minus operation on a date or timestamp column.
Assume that you have the following data set and you would like to perform a minus/plus operation on the date/timestamp field.
id | cr_date
1 | 2017-03-17 11:12:00
2 | 2017-03-17 15:10:00
You first convert the field to a Unix timestamp, then call the minus (or plus) operation on it, and finally convert the result back to the appropriate format:
df.select(from_unixtime(unix_timestamp(col("cr_date")).minus(5 * 60), "YYYY-MM-dd HH:mm:ss"))
The result appears as below:
id | cr_date
1 | 2017-03-17 11:07:00
2 | 2017-03-17 15:05:00
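The convert-subtract-reformat mechanic above can be sketched outside Spark with the plain java.time API. This is a minimal illustration of what unix_timestamp, .minus, and from_unixtime do under the hood, assuming UTC for the epoch conversion; the sample value is taken from the table above:

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object MinusMinutesDemo {
  def main(args: Array[String]): Unit = {
    val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

    // Mirror unix_timestamp: parse the string to epoch seconds (UTC assumed)
    val epoch = LocalDateTime
      .parse("2017-03-17 11:12:00", fmt)
      .toEpochSecond(ZoneOffset.UTC)

    // Mirror .minus(5 * 60): subtract five minutes, expressed in seconds
    val shifted = epoch - 5 * 60

    // Mirror from_unixtime: format the epoch seconds back to a string
    val result = LocalDateTime
      .ofEpochSecond(shifted, 0, ZoneOffset.UTC)
      .format(fmt)

    println(result) // 2017-03-17 11:07:00
  }
}
```

The same arithmetic works for a plus operation: add seconds instead of subtracting them before formatting.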
There is another important point to remember while performing the minus operation. For example, suppose you have a data frame with the timestamp "2015-01-01 00:00:00". Applying:
df.select(from_unixtime(unix_timestamp(col("cr_date")).minus(5 * 60), "YYYY-MM-dd HH:mm:ss"))
yields "2015-12-31 23:55:00", whereas the expected result is "2014-12-31 23:55:00". This is because "YYYY" denotes the week-based year, while "yyyy" denotes the calendar year. Making this change:
df.select(from_unixtime(unix_timestamp(col("cr_date")).minus(5 * 60), "yyyy-MM-dd HH:mm:ss"))
gives the result we are looking for.
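The week-year pitfall can be reproduced without Spark, since Spark's date patterns historically followed java.text.SimpleDateFormat semantics. A small sketch (the date is from the example above; locale defaults determine exactly which late-December dates fall into next year's first week):

```scala
import java.text.SimpleDateFormat
import java.util.Calendar

object WeekYearDemo {
  def main(args: Array[String]): Unit = {
    // Build 2014-12-31 23:55:00, the value produced by the minus operation
    val cal = Calendar.getInstance()
    cal.set(2014, Calendar.DECEMBER, 31, 23, 55, 0)

    // "YYYY" is the week-based year: Dec 31 2014 belongs to the week
    // containing Jan 1 2015, so it formats with year 2015
    val weekYear = new SimpleDateFormat("YYYY-MM-dd HH:mm:ss")
    // "yyyy" is the calendar year: it formats with year 2014 as expected
    val calYear = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")

    println(weekYear.format(cal.getTime)) // 2015-12-31 23:55:00
    println(calYear.format(cal.getTime))  // 2014-12-31 23:55:00
  }
}
```

The rule of thumb: use "yyyy" for dates; "YYYY" only makes sense together with week-of-year patterns such as "ww".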