Creating a unified time-series, with dates coming from different (natural) languages

By : meste
Date : November 22 2020, 09:00 AM
Create a lookup table, tab, that maps each language's month abbreviations to the English ones, then use subscripting to do the translation. The code below seems to work for me on Windows, provided the abbreviations in your input match the standard ones R generates for each locale. The precise locale names ("German", etc.) may vary depending on your system; see ?Sys.setlocale for more information. If the abbreviations in your input differ from the ones generated here, you will have to add them to tab yourself, e.g. tab <- c(tab, Juli = "Jul").
code :
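langs <- c("French", "German", "English")
old_locale <- Sys.getlocale("LC_TIME")  # remember the current setting

# For each language, map its month abbreviations to the English ones
tab <- unlist(lapply(langs, function(lang) {
  Sys.setlocale("LC_TIME", lang)
  nms <- format(ISOdate(2000, 1:12, 1), "%b")
  setNames(month.abb, nms)
}))

Sys.setlocale("LC_TIME", old_locale)  # restore the original setting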

x <- c("18:00 - 10 Juli 2014", "18:00 - 10 Mai 2014") # test input

# add "Juli" by hand in case it is not among the generated abbreviations
tab <- c(tab, Juli = "Jul")

source_month <- gsub("[^[:alpha:]]", "", x)
mapply(sub, source_month, tab[source_month], x, USE.NAMES = FALSE)
#> [1] "18:00 - 10 Jul 2014" "18:00 - 10 May 2014"
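For comparison, here is a minimal Python sketch of the same lookup-table idea. It does not use the system locale at all: the table TAB is a hypothetical hand-written mapping (not generated from locales as in the R code above), and the helper names translate and parse are made up for this illustration.

```python
import re
from datetime import datetime

ENGLISH = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical hand-written table: foreign spelling -> English abbreviation.
TAB = {"Juli": "Jul", "Mai": "May", "Okt": "Oct", "Dez": "Dec"}


def translate(text):
    """Swap every alphabetic word found in TAB for its English abbreviation."""
    return re.sub(r"[^\W\d_]+",
                  lambda m: TAB.get(m.group(0), m.group(0)),
                  text)


def parse(text):
    """Parse 'HH:MM - D Mon YYYY' without relying on the system locale."""
    clock, rest = translate(text).split(" - ")
    day, mon, year = rest.split()
    hour, minute = (int(p) for p in clock.split(":"))
    return datetime(int(year), ENGLISH.index(mon) + 1, int(day), hour, minute)


print(translate("18:00 - 10 Juli 2014"))
print(parse("18:00 - 10 Mai 2014"))
```

Words that are not in the table pass through unchanged, so input that is already English is left alone.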


Which natural languages are supported by Google Cloud Natural Language API?

By : Manoj Patil
Date : March 29 2020, 07:55 AM
Your discovery is correct. The page states: "The Cloud Natural Language API currently supports English, Spanish, and Japanese for sentiment analysis, entity analysis, and syntax analysis."
The "Google Cloud Natural Language API" web page states under multi-lingual support: "Combine the API with our Google Cloud Speech API and extract insights from audio conversations". The languages supported by the "Google Cloud Speech API" service are listed here. Russian, Polish, and Italian are supported.

Build Time Series of Natural Gas Forward Price Strips with Pandas

By : ThomasECH
Date : March 29 2020, 07:55 AM
You can do what you want by creating your season label and then using pd.pivot_table() with the mean as the aggregation function.
code :
import numpy as np
import pandas as pd

conds = [df.Month.dt.month<=3, df.Month.dt.month.between(4,10), df.Month.dt.month > 10]
choices = [(df.Month.dt.year-1).astype(str).str[2:] + df.Month.dt.year.astype(str).str[2:],
           df.Month.dt.year.astype(str).str[2:],
           (df.Month.dt.year).astype(str).str[2:] + (df.Month.dt.year+1).astype(str).str[2:]]

df['syear'] = np.select(conds, choices)
df['Season'] =  df.Month.dt.month.between(4,10).map({False: 'XH', True: 'JV'}) + df.syear
print(df.head(7))
#  Location      Month       Date  Price syear  Season
#0        a 2017-11-01  11/1/2017      1  1718  XH1718
#1        a 2017-12-01  11/1/2017      1  1718  XH1718
#2        a 2018-01-01  11/1/2017      1  1718  XH1718
#3        a 2018-02-01  11/1/2017      1  1718  XH1718
#4        a 2018-03-01  11/1/2017      1  1718  XH1718
#5        a 2018-04-01  11/1/2017      2    18    JV18
#6        a 2018-05-01  11/1/2017      2    18    JV18
df2 = pd.pivot_table(df, index=['Date'], columns=['Location', 'Season'],
                     values='Price', aggfunc='mean')
df2.index.name = None
print(df2)
#Location     a                  b
#Season    JV18 XH1718 XH1819 JV18 XH1718 XH1819
#11/1/2017    2      1      3    5      4      6
#11/2/2017    8      7      9   11     10     12
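The season-labelling rule above can also be written out for a single date, which makes the three cases easier to check. This is only a plain-Python sketch of the rule as the code above appears to apply it (January to March belong to the winter strip that began the previous year, April to October to the summer strip, November and December to the winter strip that ends next year). The function name season_label is an assumption for illustration.

```python
from datetime import date


def season_label(d):
    """Return the strip label for a delivery month, e.g. 'XH1718' or 'JV18'."""
    yy = d.year % 100
    if d.month <= 3:
        # Jan-Mar: winter strip that started in the previous year
        return f"XH{(d.year - 1) % 100:02d}{yy:02d}"
    if d.month <= 10:
        # Apr-Oct: summer strip of the current year
        return f"JV{yy:02d}"
    # Nov-Dec: winter strip that runs into the next year
    return f"XH{yy:02d}{(d.year + 1) % 100:02d}"


for d in (date(2017, 11, 1), date(2018, 3, 1), date(2018, 4, 1)):
    print(d, season_label(d))
```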

R - Sample consecutive series of dates in time series without replacement?

By : MartinW
Date : March 29 2020, 07:55 AM
This is tricky, as you anticipated, because of the requirement to sample without replacement. The working solution below draws a random sample and runs fast on a problem of the scale given in your toy example. It should also be fine with more observations, but it will get very slow if you need to pick a lot of points relative to the number of rows.
The basic premise is to pick n = 10 start points, generate the 10 vectors running forwards from these points, and, if any of the vectors overlap, ditch them all and pick again. This is simple and works fine given that 10*n << nrow(df). If you wanted to get 15 subvectors out of your 200 observations, this would be a good deal slower.
code :
library(tidyverse)
library(lubridate)

date_data <- tibble(dates = c(seq(ymd("2015-03-22"),
                                  ymd("2015-07-03"),
                                  by = "days"),
                              seq(ymd("2015-08-09"),
                                  ymd("2015-10-01"),
                                  by = "days"),
                              seq(ymd("2015-11-12"),
                                  ymd("2016-01-03"),
                                  by = "days")),
                    sample_id = 0L)

# A function that picks n indices, projects them forward 10,
# and if any of the segments overlap resamples
pick_n_vec <- function(df, n = 10, out = 10) {
  points <- sample(nrow(df) - (out - 1), n, replace = FALSE)
  vecs <- lapply(points, function(i) i:(i + (out - 1)))

  while (max(table(unlist(vecs))) > 1) {
    points <- sample(nrow(df) - (out - 1), n, replace = FALSE)
    vecs <- lapply(points, function(i) i:(i + (out - 1)))
  }

  vecs
}

# demonstrate
set.seed(42)
indices <- pick_n_vec(date_data)

for (i in seq_along(indices)) {
  date_data$sample_id[indices[[i]]] <- i
}

date_data[indices[[1]], ]
#> # A tibble: 10 x 2
#>         dates sample_id
#>        <date>     <int>
#>  1 2015-05-31         1
#>  2 2015-06-01         1
#>  3 2015-06-02         1
#>  4 2015-06-03         1
#>  5 2015-06-04         1
#>  6 2015-06-05         1
#>  7 2015-06-06         1
#>  8 2015-06-07         1
#>  9 2015-06-08         1
#> 10 2015-06-09         1
table(date_data$sample_id)
#> 
#>   0   1   2   3   4   5   6   7   8   9  10 
#> 111  10  10  10  10  10  10  10  10  10  10
# Alternative: instead of tabulating every index, check that the sorted
# start points are at least `out` apart
pick_n_vec2 <- function(df, n = 10, out = 10) {
  points <- sample(nrow(df) - (out - 1), n, replace = FALSE)
  while (min(diff(sort(points))) < out) {
    points <- sample(nrow(df) - (out - 1), n, replace = FALSE)
  }
  lapply(points, function(i) i:(i + (out - 1)))
}
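The same pick-and-reject idea can be sketched in Python with only the standard library. This is an illustration, not a port of the R code: indices are 0-based, the names has_overlap and pick_windows are made up here, and the seed parameter is an added convenience. As with the R version, it assumes the requested windows cover only a modest share of the rows; otherwise the loop can take a long time.

```python
import random


def has_overlap(starts, out):
    """True if any two windows of length `out` share an index."""
    s = sorted(starts)
    return any(b - a < out for a, b in zip(s, s[1:]))


def pick_windows(n_rows, n=10, out=10, seed=None):
    """Pick n non-overlapping runs of `out` consecutive row indices."""
    rng = random.Random(seed)
    candidates = range(n_rows - out + 1)  # valid start positions
    starts = rng.sample(candidates, n)
    while has_overlap(starts, out):       # reject the whole draw and retry
        starts = rng.sample(candidates, n)
    return [list(range(s, s + out)) for s in starts]


windows = pick_windows(211, seed=42)
print(len(windows), "windows of", len(windows[0]), "rows each")
```

Rejecting the whole draw, rather than patching individual windows, keeps every valid arrangement equally likely.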

How to use Pandas Series to plot two Time Series of different lengths/starting dates?

By : cshwhale
Date : March 29 2020, 07:55 AM
I really don't see where you're having problems. I tried to recreate a piece of the dataframe, and it plotted with no problems.

Split hourly time-series in pandas DataFrame into specific dates and all other dates

By : maria
Date : October 03 2020, 06:00 PM
Create a boolean mask, either with numpy.in1d on the dates converted to strings or with Index.isin to test membership:
code :
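import numpy as np

# dates is assumed to be a list of date strings such as '2018-03-20'
# Each line below is an alternative way to build the mask; use any one of them
m = np.in1d(df.index.date.astype(str), dates)
m = df.index.to_series().dt.date.astype(str).isin(dates)
m = df.index.strftime('%Y-%m-%d').isin(dates)
m = df.index.normalize().isin(dates)
# alternative
# m = df.index.floor('d').isin(dates)
df1 = df[m]   # rows on the listed dates
df2 = df[~m]  # all other rows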
print (df1)
                        value
2018-03-20 00:00:00  0.348010
2018-03-20 01:00:00  0.406394
2018-03-20 02:00:00  0.944569
2018-03-20 03:00:00  0.425583
2018-03-20 04:00:00  0.586190
                      ...
2018-07-14 19:00:00  0.710710
2018-07-14 20:00:00  0.403660
2018-07-14 21:00:00  0.949572
2018-07-14 22:00:00  0.629871
2018-07-14 23:00:00  0.363081

[72 rows x 1 columns]
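To see what the mask does without pandas, here is a small standard-library sketch of the same idea: reduce each hourly timestamp to its calendar date, test membership in the list of wanted dates, and split on the result. The data are made up for the illustration (three days of hourly rows), and the function name split_by_dates is an assumption.

```python
from datetime import datetime, timedelta


def split_by_dates(rows, dates):
    """Split (timestamp, value) rows into those on the given dates and the rest."""
    wanted = set(dates)
    mask = [ts.strftime("%Y-%m-%d") in wanted for ts, _ in rows]
    selected = [row for row, keep in zip(rows, mask) if keep]
    others = [row for row, keep in zip(rows, mask) if not keep]
    return selected, others


# Made-up data: 72 hourly rows starting 2018-03-19 00:00
start = datetime(2018, 3, 19)
rows = [(start + timedelta(hours=h), h) for h in range(72)]

selected, others = split_by_dates(rows, ["2018-03-20"])
print(len(selected), len(others))
```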