2016-10-18 10 views

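Deleting rows that are not within a specific time interval

I have a dataset with 100,000 rows in 40 columns that I need to filter/reduce/thin out. I want to delete all orders before 1.10.2014 and after 20.8.2016 (the time span I want to keep in the table is 1.10.2014-20.8.2016). How can I do this, so that the unneeded old data is simply removed from the table? Here's an example: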

DB <- data.frame(orderID = c(1,2,3,4,5,6,7,8,9,10),  
orderDate = c("01.07.2014 05:11","12.08.2014 12:39","09.09.2015 09:14","04.10.2014 16:15","02.11.2015 07:04", "10.11.2015 16:52","20.02.2016 08:08","12.04.2016 14:07","24.07.2016 17:04","09.09.2016 06:04"), 
itemID = c(2,3,2,5,12,4,2,3,1,5), 
size = c("m", "l", 42, "xxl", "m", 42, 39, "m", "m", 44), 
color = c("green", "red", "blue", "yellow", "red", "yellow", "blue", "red", "green", "black"), 
manufacturer = c("11", "12", "13", "12", "13", "13", "12", "11", "11", "13") 
customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 1, 1) 

Expected result:

DB <- data.frame(orderID = c(3,4,5,6,7,8,9),  
orderDate = c("09.09.2015 09:14","04.10.2014 16:15","02.11.2015 07:04", "10.11.2015 16:52","20.02.2016 08:08","12.04.2016 14:07","24.07.2016 17:04"), 
itemID = c(2,5,12,4,2,3,1), 
size = c(42, "xxl", "m", 42, 39, "m", "m"), 
color = c("blue", "yellow", "red", "yellow", "blue", "red", "green"), 
manufacturer = c("13", "12", "13", "13", "12", "11", "11") 
customerID = c(3, 1, 1, 3, 2, 2, 1) 

Have a look at [this post](http://stackoverflow.com/questions/23622338/subset-a-dataframe-between-2-dates-in-r-betterway). You can also use lubridate and `dmy_hm` to format your dates – etienne

Answers

The sample code that defines the data is missing commas and a closing parenthesis.

After fixing that, the data definition looks like this (as generated by dput):

structure(list(orderID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), orderDate = structure(c(1L, 
8L, 4L, 3L, 2L, 6L, 9L, 7L, 10L, 5L), .Label = c("01.07.2014 05:11", 
"02.11.2015 07:04", "04.10.2014 16:15", "09.09.2015 09:14", "09.09.2016 06:04", 
"10.11.2015 16:52", "12.04.2016 14:07", "12.08.2014 12:39", "20.02.2016 08:08", 
"24.07.2016 17:04"), class = "factor"), itemID = c(2, 3, 2, 5, 
12, 4, 2, 3, 1, 5), size = structure(c(5L, 4L, 2L, 6L, 5L, 2L, 
1L, 5L, 5L, 3L), .Label = c("39", "42", "44", "l", "m", "xxl" 
), class = "factor"), color = structure(c(3L, 4L, 2L, 5L, 4L, 
5L, 2L, 4L, 3L, 1L), .Label = c("black", "blue", "green", "red", 
"yellow"), class = "factor"), manufacturer = structure(c(1L, 
2L, 3L, 2L, 3L, 3L, 2L, 1L, 1L, 3L), .Label = c("11", "12", "13" 
), class = "factor"), customerID = c(1, 2, 3, 1, 1, 3, 2, 2, 
1, 1)), .Names = c("orderID", "orderDate", "itemID", "size", 
"color", "manufacturer", "customerID"), row.names = c(NA, -10L 
), class = "data.frame") 

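And a possible solution is:

custom_format <- "%d.%m.%Y"
# Keep only the date part (first 10 characters) and parse it as a Date
order_date <- as.Date(substr(DB$orderDate, 1, 10), format = custom_format)
# Keep orders from 2014-10-01 through 2016-08-20, both bounds inclusive
subset(DB, order_date >= as.Date("2014-10-01") & order_date <= as.Date("2016-08-20"))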
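
For reference, here is a self-contained base-R sketch of the same idea that can be run without the full dataset. It is only an illustration under a few assumptions: it uses just the `orderID` and `orderDate` columns from the question, treats both ends of the interval as inclusive, and wraps the filter in a helper called `filter_by_date`, which is a hypothetical name and not part of the original answer.

```r
# Reduced copy of the question's data: only the two columns needed here
DB <- data.frame(
  orderID   = 1:10,
  orderDate = c("01.07.2014 05:11", "12.08.2014 12:39", "09.09.2015 09:14",
                "04.10.2014 16:15", "02.11.2015 07:04", "10.11.2015 16:52",
                "20.02.2016 08:08", "12.04.2016 14:07", "24.07.2016 17:04",
                "09.09.2016 06:04"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose order date lies in [from, to], inclusive
filter_by_date <- function(df, from, to, date_col = "orderDate") {
  # as.character() makes this work for factor columns as well
  d <- as.Date(substr(as.character(df[[date_col]]), 1, 10), format = "%d.%m.%Y")
  # Rows with unparseable dates (NA) are dropped rather than kept
  keep <- !is.na(d) & d >= as.Date(from) & d <= as.Date(to)
  df[keep, , drop = FALSE]
}

DB_kept <- filter_by_date(DB, "2014-10-01", "2016-08-20")
DB_kept$orderID
```

With the sample data this keeps order IDs 3 to 9, which matches the expected result in the question. To actually drop the old rows from the table, assign the result back, for example `DB <- filter_by_date(DB, "2014-10-01", "2016-08-20")`.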