
from data table, randomly select one row per group



By : Tuncai
Date : December 01 2020, 05:00 PM
The OP provided only a single column in the example. Assuming the original dataset has multiple columns, we group by 'z', sample one position from the sequence of rows in each group (sample(.N, 1)), use it to pick that group's row index in the full table (.I), extract the resulting index column ($V1), and use it to subset the rows of 'dt'.
code :
dt[dt[ , .I[sample(.N,1)] , by = z]$V1]
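For readers who do not use data.table, the same index-then-subset idea can be sketched in plain Python. This is only an illustration under assumptions: the function name sample_one_per_group, the rows list and its columns are made up for the example and do not come from the original question.

```python
import random
from collections import defaultdict

def sample_one_per_group(rows, key, rng=None):
    """Return one randomly chosen row (a dict) for each distinct value of `key`."""
    rng = rng or random.Random()
    # Collect the row indices of each group (the role .I plays with by = z).
    indices_by_group = defaultdict(list)
    for i, row in enumerate(rows):
        indices_by_group[row[key]].append(i)
    # Pick one index per group, then subset the original rows with those indices.
    chosen = [rng.choice(idx) for idx in indices_by_group.values()]
    return [rows[i] for i in chosen]

# Hypothetical data with a grouping column "z" and one extra column "x".
rows = [
    {"z": "a", "x": 1}, {"z": "a", "x": 2},
    {"z": "b", "x": 3}, {"z": "b", "x": 4}, {"z": "b", "x": 5},
]
picked = sample_one_per_group(rows, "z", random.Random(0))
```

Sampling indices first and subsetting afterwards keeps every column of the chosen rows, which is the point of the .I approach above.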


Randomly select a group of doubles



By : benyamin .p.m
Date : March 29 2020, 07:55 AM
I'd use a Collection of Double, say an ArrayList<Double> rather than an array of double, and then call Collections.shuffle(myArrayList) on it. Bingo, randomized. To get 10 of these values, call remove(0) on the list ten times, checking each time that the list is not empty.
This is not too different from shuffling a deck of cards and selecting 10 random cards.
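The shuffle-then-draw idea, sketched in Python rather than Java. The helper name draw_without_replacement and the sample values are assumptions made for illustration; it takes a slice of the shuffled copy instead of calling remove(0) repeatedly, which gives the same result without mutating the caller's list.

```python
import random

def draw_without_replacement(values, k, rng=None):
    """Shuffle a copy of `values` and return the first k items (fewer if the pool is small)."""
    rng = rng or random.Random()
    pool = list(values)   # copy, so the caller's list is left untouched
    rng.shuffle(pool)     # counterpart of Collections.shuffle
    return pool[:min(k, len(pool))]

# Hypothetical pool of 25 distinct doubles.
doubles = [0.5 * i for i in range(25)]
hand = draw_without_replacement(doubles, 10, random.Random(1))
```

Because the values are drawn from a shuffled pool, no value is picked twice, just as no card is dealt twice from a shuffled deck.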
randomly select a fixed number of rows in each group in SQL server table



By : Murat Ipek
Date : March 29 2020, 07:55 AM
You can use ROW_NUMBER with NEWID() to generate a random order within each group:
EDIT: I replaced CHECKSUM(NEWID()) with NEWID(), since I cannot prove which is faster and NEWID() is, I think, the more commonly used.
code :
-- Select up to 2 random rows per id
WITH CTE AS(
    SELECT *,
        RN = ROW_NUMBER() OVER(PARTITION BY id ORDER BY NEWID())
    FROM tbl
)
SELECT
    id, value1, value2
FROM CTE
WHERE RN <= 2;

-- Or insert the sampled rows into a new table
INSERT INTO yourNewTable(id, value1, value2)
    SELECT
        id, value1, value2
    FROM (
        SELECT *,
            RN = ROW_NUMBER() OVER(PARTITION BY id ORDER BY NEWID())
        FROM tbl
    )t
    WHERE RN <= 2;
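The logic of the query (random order inside each partition, then keep the first two row numbers) can be sketched outside the database in plain Python. The function name sample_n_per_group and the tbl rows below are hypothetical and only mirror the column names used in the query.

```python
import random
from collections import defaultdict

def sample_n_per_group(rows, key, n, rng=None):
    """Keep at most n randomly chosen rows for each distinct value of `key`."""
    rng = rng or random.Random()
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = []
    for members in groups.values():
        # Random order within the partition, like ORDER BY NEWID() ...
        shuffled = rng.sample(members, len(members))
        # ... then keep the rows whose row number is <= n.
        result.extend(shuffled[:n])
    return result

# Hypothetical table: id 1 has 3 rows, id 2 has 1 row, id 3 has 4 rows.
tbl = [{"id": i, "value1": v, "value2": v * 10}
       for i, count in [(1, 3), (2, 1), (3, 4)]
       for v in range(count)]
sampled = sample_n_per_group(tbl, "id", 2, random.Random(7))
```

Groups with fewer than n rows are returned whole, which matches the behaviour of the RN <= 2 filter.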
Flag randomly selected N rows by group in data.table



By : toli
Date : March 29 2020, 07:55 AM
In a data.table, I want to flag N randomly selected rows per group (C1) in column C3. Several similar questions have already been asked on SO, but based on their answers I still cannot work out a solution for my task.
code :
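# Flag 2 random rows per group (or all rows if the group has fewer than 2)
dt[, C3 := 1:.N %in% sample(.N, min(.N, 2)), by = C1]
# Equivalent: shuffle the positions within the group and take the first 2
dt[, C3 := 1:.N %in% head(sample(.N), 2) , by = C1]
# A different number of flags per group, looked up by group counter .GRP
flagsz <- c(2, 1, 2, 3)
dt[, C3 := 1:.N %in% sample(.N, min(.N, flagsz[.GRP])), by = C1]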
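As a language-neutral check of the flagging logic, here is a rough Python sketch. The name flag_n_per_group and the c1 values are made up for illustration; the function returns one boolean per input row, which plays the role of column C3.

```python
import random
from collections import defaultdict

def flag_n_per_group(keys, n, rng=None):
    """Return a list of booleans, True for up to n random positions within each group."""
    rng = rng or random.Random()
    positions = defaultdict(list)
    for i, k in enumerate(keys):
        positions[k].append(i)
    flags = [False] * len(keys)
    for idx in positions.values():
        # min() guards against groups smaller than n, like min(.N, 2) above.
        for i in rng.sample(idx, min(len(idx), n)):
            flags[i] = True
    return flags

# Hypothetical grouping column C1.
c1 = ["a", "a", "a", "b", "c", "c"]
c3 = flag_n_per_group(c1, 2, random.Random(42))
```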
randomly ordering across groups (not within group) in data.table



By : wayne
Date : March 29 2020, 07:55 AM
Let's say I want to order the iris dataset (as a data.table) by Species, keeping observations grouped by species but putting the species themselves in random order. Alternatively, you could do:
code :
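# d is the iris data as a data.table; e has one row per Species
e <- d[, .N, Species]
# draw one random key per species
e[, g2 := runif(.N)]
# join the key back onto every row; sorting d by g2 then gives a random species order
d <- e[, .(Species, g2)][d, on = 'Species']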
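The same effect can be sketched in plain Python: give each group one random rank and sort the rows by it. This is an assumption-laden illustration, not the answer's method; shuffle_groups and the tiny rows list are hypothetical, and the rank comes from shuffling the distinct keys (a tie-free stand-in for the runif key).

```python
import random

def shuffle_groups(rows, key, rng=None):
    """Reorder rows so groups appear in random order while each group stays together."""
    rng = rng or random.Random()
    # Distinct group values in first-seen order, then shuffled.
    order = list(dict.fromkeys(row[key] for row in rows))
    rng.shuffle(order)
    rank = {value: i for i, value in enumerate(order)}
    # sorted() is stable, so rows keep their original relative order within a group.
    return sorted(rows, key=lambda row: rank[row[key]])

# Hypothetical stand-in for a few iris rows.
rows = [
    {"Species": "setosa", "id": 1}, {"Species": "setosa", "id": 2},
    {"Species": "versicolor", "id": 3}, {"Species": "versicolor", "id": 4},
    {"Species": "virginica", "id": 5}, {"Species": "virginica", "id": 6},
]
shuffled = shuffle_groups(rows, "Species", random.Random(3))
```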
Randomly select a row from each group using pandas



By : user1840460
Date : March 29 2020, 07:55 AM
I have a pandas dataframe df with columns Month, Day and mnthShape. Use groupby with apply to select a row at random per group.
code :
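import numpy as np
import pandas as pd

# Option 1: pick one value per group with np.random.choice
np.random.seed(0)
df.groupby(['Month', 'Day'])['mnthShape'].apply(np.random.choice).reset_index()

# Output:
#    Month  Day  mnthShape
# 0      1    1   1.016754
# 1      1    2   0.963912
# 2      1    3   1.099451

# Option 2: use Series.sample, which keeps the original row index
np.random.seed(0)
(df.groupby(['Month', 'Day'])['mnthShape']
   .apply(pd.Series.sample, n=1)
   .reset_index(level=[0, 1]))

# Output:
#    Month  Day  mnthShape
# 2      1    1   0.963912
# 3      1    2   1.016754
# 6      1    3   1.016754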
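If whole rows are wanted rather than a single column, recent pandas versions also offer a sample method directly on the groupby object. The sketch below assumes pandas 1.1 or later, and the dataframe contents are invented for the example, not the OP's data.

```python
import pandas as pd

# Hypothetical data: two rows for each (Month, Day) pair.
df = pd.DataFrame({
    "Month": [1, 1, 1, 1, 1, 1],
    "Day": [1, 1, 2, 2, 3, 3],
    "mnthShape": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
})

# One random row per (Month, Day) group; random_state makes the draw repeatable.
picked = df.groupby(["Month", "Day"]).sample(n=1, random_state=0)
```

The result keeps all columns and the original row index, so it can be used to look the sampled rows up in df again.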