
h2o: iterate through rows


By : kicc
Date : November 22 2020, 02:42 PM
You can do a row-wise apply: iris.apply(foo, 1), where foo is some lambda that h2o understands (there are some limits on what can go in there, but all basic math ops should work fine).
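For comparison, pandas exposes the same row-wise apply shape; a minimal pandas sketch of the pattern (h2o's H2OFrame.apply mirrors this interface, with the lambda restrictions noted above; the sample frame is hypothetical):

```python
import pandas as pd

# Pandas analogue of the h2o row-wise apply pattern: H2OFrame.apply(foo, 1)
# corresponds to DataFrame.apply(foo, axis=1). Hypothetical sample frame.
iris = pd.DataFrame({"sepal_len": [5.0, 4.0], "sepal_wid": [3.5, 3.0]})

# foo should be a lambda built from basic math ops -- the same restriction
# h2o imposes, since it translates the lambda into its own expression language.
row_sums = iris.apply(lambda row: row["sepal_len"] + row["sepal_wid"], axis=1)
print(row_sums.tolist())   # [8.5, 7.0]
```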
Iterate through rows in pandas dataframe in a cleaner way using .iterrows() and keep track of rows in between specific values



By : Prakash Sharma
Date : March 29 2020, 07:55 AM
I've got a pandas dataframe in Python 2.7 and I want to iterate through the rows and get the time between two types of events, as well as the count of other types of events in between (given certain conditions). You could remove the rows where Var1 equals 0 using:
code :
df = df.loc[df['Var1'] != 0]
mask = df['EvntType']==1
# 0     False
# 1      True
# ...
# 9      True
# 10    False
# Name: EvntType, dtype: bool
times = df.loc[mask, 'Time']
# 1    19
# 7    31
# 9    36
# Name: Time, dtype: int64
idx = np.flatnonzero(mask)
# array([1, 6, 8])
In [56]: times[:-1]
Out[56]: 
1    19
7    31
Name: Time, dtype: int64
In [55]: np.diff(times)
Out[55]: array([12,  5])
In [57]: np.diff(idx)-1
Out[57]: array([4, 1])
import numpy as np
import pandas as pd

df = pd.DataFrame({'EvntType': [2, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2],
                   'Time': [15, 19, 21, 23, 25, 26, 28, 31, 33, 36, 39],
                   'Var1': [1, 1, 6, 3, 0, 2, 3, 5, 1, 5, 1],
                   'Var2': [17, 45, 43, 65, 76, 35, 25, 16, 25, 36, 21]})

# Remove rows where Var1 equals 0
df = df.loc[df['Var1'] != 0]

mask = df['EvntType']==1
times = df.loc[mask, 'Time']
idx = np.flatnonzero(mask)

result = pd.DataFrame(
    {'start_time': times[:-1],
     'time_inbetween': np.diff(times),
     'event_count': np.diff(idx)-1})

print(result)
   event_count  start_time  time_inbetween
1            4          19              12
7            1          31               5
Excel Interop iterate through rows and conditionally delete entire row missing some rows



By : Nick Furfaro
Date : March 29 2020, 07:55 AM
Pretty sure Tim Williams has answered your question, but I will add a little clarification on the matter.
When you run a for-loop through the list of rows from top to bottom and, say, row 10 fits your criteria to be deleted, deleting row 10 shifts every row below it up by one; the loop then jumps to index + 1, so the row that just moved into position 10 is never checked. That is why some rows are missed; iterating from the bottom row upward avoids the problem.
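The same mechanism can be shown with a plain Python list (a sketch with hypothetical data, not Excel): deleting while iterating forward skips the element that slides into the freed slot, while iterating in reverse does not.

```python
rows = [1, 2, 2, 3, 2, 4]          # goal: delete every 2

# Forward pass: after del forward[i], the next element shifts into slot i,
# but i still advances, so the second of two adjacent 2s is skipped.
forward = list(rows)
i = 0
while i < len(forward):
    if forward[i] == 2:
        del forward[i]
    i += 1                          # bug: always advances, even after a delete

# Reverse pass: deletions only shift elements we have already visited.
backward = list(rows)
for i in range(len(backward) - 1, -1, -1):
    if backward[i] == 2:
        del backward[i]

print(forward)   # [1, 2, 3, 4]  -- one adjacent 2 survived
print(backward)  # [1, 3, 4]     -- all 2s removed
```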
Excel iterate through rows in 2 sheets and capturing the data on specific rows



By : user739576
Date : March 29 2020, 07:55 AM
I tried to write some code to achieve your task, though I don't fully understand your question:
code :
Sub myFunc()

    Dim i As Long 'Long rather than Integer, so row counts past 32,767 do not overflow
    Dim j As Long
    Dim myValue As String
    i = 1
    j = 1

    Do While Worksheets("myFirstSheet").Cells(i, 1).value <> "" 'I assume that when a cell is empty then we reached the end of your table
        myValue = Worksheets("myFirstSheet").Cells(i, 1).value 'I save the value of the cell in a variable

        Do While Worksheets("mySecondSheet").Cells(j, 1).value <> "" 'I assume that when a cell is empty then we reached the end of your table
            If Worksheets("mySecondSheet").Cells(j, 1).value = myValue Then 'if that value equals one of the cells of your second sheet...
                doSomethingFunc 'we have to write it
            End If
            j = j + 1 'go to the next row in your second sheet
        Loop

        i = i + 1 'go to the next row in your first sheet
        j = 1 'now we are going to iterate the next row of the first sheet, so we want to reset the position in our second sheet
    Loop

End Sub
How to iterate grouped rows to produce multiple rows in spark structured streaming?



By : user2243601
Date : March 29 2020, 07:55 AM
You can use a flatMap operation on the dataframe and generate the required rows based on the conditions that you mentioned. Check this out:
code :
scala> val df = Seq((1,null,1),(1,"discard",0),(2,null,1),(2,null,2),(2,"max",0),(3,null,1),(3,null,1),(3,"list",0)).toDF("id","operation","value")
df: org.apache.spark.sql.DataFrame = [id: int, operation: string ... 1 more field]

scala> df.show(false)
+---+---------+-----+
|id |operation|value|
+---+---------+-----+
|1  |null     |1    |
|1  |discard  |0    |
|2  |null     |1    |
|2  |null     |2    |
|2  |max      |0    |
|3  |null     |1    |
|3  |null     |1    |
|3  |list     |0    |
+---+---------+-----+


scala> df.filter("operation is not null").flatMap( r=> { val x=r.getString(1); val s = x match { case "discard" => (0,0) case "max" => (1,2) case "list" => (2,1) } ; (0 until s._1).map( i => (r.getInt(0),null,s._2) ) }).show(false)
+---+----+---+
|_1 |_2  |_3 |
+---+----+---+
|2  |null|2  |
|3  |null|1  |
|3  |null|1  |
+---+----+---+
scala> val df2 = df.filter("operation is not null").flatMap( r=> { val x=r.getString(1); val s = x match { case "discard" => (0,0) case "max" => (1,2) case "list" => (2,1) } ; (0 until s._1).map( i => (r.getInt(0),null,s._2) ) }).toDF("id","operation","value")
df2: org.apache.spark.sql.DataFrame = [id: int, operation: null ... 1 more field]

scala> df2.show(false)
+---+---------+-----+
|id |operation|value|
+---+---------+-----+
|2  |null     |2    |
|3  |null     |1    |
|3  |null     |1    |
+---+---------+-----+


scala> val df =   Seq((0,null,1),(0,"discard",0),(1,null,1),(1,null,2),(1,"max",0),(2,null,1),(2,null,3),(2,"max",0),(3,null,1),(3,null,1),(3,"list",0)).toDF("id","operation","value")
df: org.apache.spark.sql.DataFrame = [id: int, operation: string ... 1 more field]

scala> df.createOrReplaceTempView("michael")

scala> val df2 = spark.sql(""" select *, max(value) over(partition by id) mx from michael """)
df2: org.apache.spark.sql.DataFrame = [id: int, operation: string ... 2 more fields]

scala> df2.show(false)
+---+---------+-----+---+
|id |operation|value|mx |
+---+---------+-----+---+
|1  |null     |1    |2  |
|1  |null     |2    |2  |
|1  |max      |0    |2  |
|3  |null     |1    |1  |
|3  |null     |1    |1  |
|3  |list     |0    |1  |
|2  |null     |1    |3  |
|2  |null     |3    |3  |
|2  |max      |0    |3  |
|0  |null     |1    |1  |
|0  |discard  |0    |1  |
+---+---------+-----+---+


scala> val df3 = df2.filter("operation is not null").flatMap( r=> { val x=r.getString(1); val s = x match { case "discard" => 0 case "max" => 1 case "list" => 2 } ; (0 until s).map( i => (r.getInt(0),null,r.getInt(3) )) }).toDF("id","operation","value")
df3: org.apache.spark.sql.DataFrame = [id: int, operation: null ... 1 more field]


scala> df3.show(false)
+---+---------+-----+
|id |operation|value|
+---+---------+-----+
|1  |null     |2    |
|3  |null     |1    |
|3  |null     |1    |
|2  |null     |3    |
+---+---------+-----+
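The expansion rule the first flatMap applies per marker row (discard emits 0 rows, max emits 1 row with value 2, list emits 2 rows with value 1) can be sketched in plain Python, independent of Spark (expand is a hypothetical helper, not a Spark API):

```python
def expand(row):
    """Emit 0, 1, or 2 output rows for a marker row, per its operation."""
    rid, op, _ = row
    rule = {"discard": (0, 0), "max": (1, 2), "list": (2, 1)}  # op -> (count, value)
    n, v = rule[op]
    return [(rid, None, v)] * n

# Same data as the first Scala example (None plays the role of null).
rows = [(1, None, 1), (1, "discard", 0),
        (2, None, 1), (2, None, 2), (2, "max", 0),
        (3, None, 1), (3, None, 1), (3, "list", 0)]

# filter("operation is not null") + flatMap, as a list comprehension.
out = [t for r in rows if r[1] is not None for t in expand(r)]
print(out)   # [(2, None, 2), (3, None, 1), (3, None, 1)]
```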


How to use SQL Server to iterate through similar rows with exists operator and output rows that meet certain running tot



By : user3076797
Date : March 29 2020, 07:55 AM
Without temp tables, here's an option that requires a couple of sub-queries.
You'll need LAG and DATEDIFF to get the number of minutes between tran_date and the previous row, then a running total over a flag for whether that difference is at least 30, so you know when to reset your numbering. You can then use ROW_NUMBER() and SUM(), partitioning by the acct and that "reset" indicator.
code :
DECLARE @TestData TABLE
    (
        [tran_date] DATETIME
      , [acct] INT
      , [amt] INT
    );

INSERT INTO @TestData
VALUES ( '2019-07-01 01:21:08', 1, 100 )
     , ( '2019-07-01 01:30:50', 1, 200 )
     , ( '2019-07-01 01:46:21', 1, 150 )
     , ( '2019-07-01 03:23:41', 1, 50 )
     , ( '2019-07-01 03:24:40', 1, 300 )
     , ( '2019-07-01 09:53:28', 2, 400 )
     , ( '2019-07-01 12:56:15', 2, 50 )
     , ( '2019-07-01 17:43:55', 2, 500 )
     , ( '2019-07-01 05:15:54', 3, 20 )
     , ( '2019-07-01 05:30:00', 3, 50 )
     , ( '2019-07-01 05:36:27', 3, 10 )
     , ( '2019-07-01 05:59:00', 3, 250 )
     , ( '2019-07-01 06:18:00', 3, 80 )
     , ( '2019-07-01 06:25:56', 3, 100 )
     , ( '2019-07-01 09:34:34', 4, 150 )
     , ( '2019-07-01 09:47:24', 4, 300 )
     , ( '2019-07-01 09:52:25', 4, 50 )
     , ( '2019-07-01 11:34:34', 4, 250 )
     , ( '2019-07-01 11:47:24', 4, 100 )
     , ( '2019-07-01 11:52:25', 4, 150 );


--Read comments from inner most sub query out

SELECT   [b].[tran_date]
        , [b].[acct]
        , [b].[amt]
        , ROW_NUMBER() OVER ( PARTITION BY [b].[acct], [b].[diffincrement] ORDER BY [b].[tran_date]) AS [RN]  --Third:  We can now partition on our acct and "reset" indicator(diffincrement) to get our row number.
        , [b].[Time_Diff]
        , SUM([b].[amt]) OVER ( PARTITION BY [b].[acct], [b].[diffincrement] ORDER BY [b].[tran_date]) AS [Running_Total] --Third:  We can now partition on our acct and "reset" indicator(diffincrement) to get our running total.
FROM     (   --Second:  Here we now evalute Time_Diff and sum to basically give a running total so we know when to reset based on that.
             SELECT *
                  , SUM(CASE WHEN [a].[Time_Diff] >= 30 THEN 1 ELSE 0 END
                       ) OVER ( PARTITION BY [a].[acct] ORDER BY [a].[tran_date]) AS [diffincrement]
                 FROM   (
                            --First:  Here we use LAG and DATEDIFF to find the difference in minutes from the previous row.
                            SELECT *
                                 , DATEDIFF(MINUTE, LAG([tran_date], 1, [tran_date]) OVER ( PARTITION BY [acct] ORDER BY [tran_date]), [tran_date]) AS [Time_Diff]
                            FROM   @TestData
                        ) AS [a]
         ) AS [b]
ORDER BY [b].[acct]
       , [b].[tran_date];
tran_date               acct        amt         RN                   Time_Diff   Running_Total
----------------------- ----------- ----------- -------------------- ----------- -------------
2019-07-01 01:21:08.000 1           100         1                    0           100
2019-07-01 01:30:50.000 1           200         2                    9           300
2019-07-01 01:46:21.000 1           150         3                    16          450
2019-07-01 03:23:41.000 1           50          1                    97          50
2019-07-01 03:24:40.000 1           300         2                    1           350
2019-07-01 09:53:28.000 2           400         1                    0           400
2019-07-01 12:56:15.000 2           50          1                    183         50
2019-07-01 17:43:55.000 2           500         1                    287         500
2019-07-01 05:15:54.000 3           20          1                    0           20
2019-07-01 05:30:00.000 3           50          2                    15          70
2019-07-01 05:36:27.000 3           10          3                    6           80
2019-07-01 05:59:00.000 3           250         4                    23          330
2019-07-01 06:18:00.000 3           80          5                    19          410
2019-07-01 06:25:56.000 3           100         6                    7           510
2019-07-01 09:34:34.000 4           150         1                    0           150
2019-07-01 09:47:24.000 4           300         2                    13          450
2019-07-01 09:52:25.000 4           50          3                    5           500
2019-07-01 11:34:34.000 4           250         1                    102         250
2019-07-01 11:47:24.000 4           100         2                    13          350
2019-07-01 11:52:25.000 4           150         3                    5           500
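The reset logic above (flag gaps of 30+ minutes, cumulatively sum the flags to get a session id, then number rows and total amounts within each session) can be sketched in pandas as an analogue of the SQL (hypothetical sample mirroring acct 1, with minutes since the first transaction given directly for brevity):

```python
import pandas as pd

# Minutes since the first transaction for one acct, plus amounts.
df = pd.DataFrame({"minute": [0, 9, 25, 122, 123],
                   "amt":    [100, 200, 150, 50, 300]})

# LAG + DATEDIFF equivalent: minutes since the previous row (0 for the first).
gap = df["minute"].diff().fillna(0)

# Running total of "gap >= 30" flags -> a session id that bumps at each reset.
session = (gap >= 30).cumsum()

df["rn"] = df.groupby(session).cumcount() + 1              # ROW_NUMBER()
df["running_total"] = df.groupby(session)["amt"].cumsum()  # SUM() OVER
print(df)
# rn: 1 2 3 1 2   running_total: 100 300 450 50 350  (matches acct 1 above)
```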