Converting Strings to Reals in a List

I am new to Mathematica and working with streamed data:

 str = OpenRead["https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-03.csv"]
memory=ReadList[str, Record,500];

I'd now like to convert the records from the streamed file into a new list, by converting all numbers to reals. It seems like the default storage method of the data is string for any entry. Correct? In particular, I would need to convert the following variables to reals: Vendor ID, passenger_count, trip_distance, RatecodeID, PULocationID, DOLocationID, payment_type, fare_amount, extra, mta_tax, tip_amount, tolls_amount, improvement_surcharge, total_amount and congestion_surcharge.

Unfortunately, I am struggling to do this - any help would be greatly appreciated!

Answers 2

  • For a test, I only read the first 5 records:

    str = OpenRead[
      "https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-03.csv"]
    mem = ReadList[str, Record, 5];
    

    Now we extract the header and the data and split them into the corresponding table elements:

    header = StringSplit[mem[[1]], ","]
    dat = StringSplit[Rest[mem], ","];
    

    The rows of dat consists of strings that represent a mixture of numbers and strings. To transform the strings representing numbers we need to to pick out the corresponding columns:

    (dat[[All, #]] = ToExpression[dat[[All, #]]]) & /@ {1, 4 ;; 6, 8 ;;}
    

    To get the final table we prepend the header to the data:

    PrependTo[dat, header]
    

    enter image description here

    If in addition you want also to change the date strings into numeric lists, you may say:

    dat[[All, 2]] = DateList /@ dat[[All, 2]]
    dat[[All, 3]] = DateList /@ dat[[All, 3]]
    

    Or with DateObject:

      dat[[All, 2]] = DateObject /@ dat[[All, 2]]
    

    enter image description here


  • The third argument according to documentation is for:

    reads only the first n objects of the specified types.

    This code will run on 3 record (first row is column):

    records = ReadList[str, Record, 3];
    

    the output is a list of strings, use SemanticImportString to convert strings to their data types (Reals, ...).SemanticImportString needs a string with \n as the record separator.

    Use StringRiffle to concatenate records with \n in between.

    SemanticImportString[#, Automatic, "Rows", Delimiters -> ","] &@
     [email protected][records, "\n"]
    

    Output:

    enter image description here

    Using "Dataset" form instead of "Rows":

    SemanticImportString[#, Automatic, "Dataset", Delimiters -> ","] &@
     [email protected][records, "\n"]
    

    Output:

    enter image description here

    The output is a list of records with different types (String, Reals, ...). See SemanticImport documentation to read more about other output forms.


Related Questions