Converting Strings to Reals in a List

I am new to Mathematica and working with streamed data:

 str = OpenRead[""]
memory=ReadList[str, Record,500];

I'd now like to convert the records from the streamed file into a new list, by converting all numbers to reals. It seems like the default storage method of the data is string for any entry. Correct? In particular, I would need to convert the following variables to reals: Vendor ID, passenger_count, trip_distance, RatecodeID, PULocationID, DOLocationID, payment_type, fare_amount, extra, mta_tax, tip_amount, tolls_amount, improvement_surcharge, total_amount and congestion_surcharge.

Unfortunately, I am struggling to do this - any help would be greatly appreciated!

Answers 2

  • For a test, I only read the first 5 records:

    str = OpenRead[
    mem = ReadList[str, Record, 5];

    Now we extract the header and the data and split them into the corresponding table elements:

    header = StringSplit[mem[[1]], ","]
    dat = StringSplit[Rest[mem], ","];

    The rows of dat consists of strings that represent a mixture of numbers and strings. To transform the strings representing numbers we need to to pick out the corresponding columns:

    (dat[[All, #]] = ToExpression[dat[[All, #]]]) & /@ {1, 4 ;; 6, 8 ;;}

    To get the final table we prepend the header to the data:

    PrependTo[dat, header]

    enter image description here

    If in addition you want also to change the date strings into numeric lists, you may say:

    dat[[All, 2]] = DateList /@ dat[[All, 2]]
    dat[[All, 3]] = DateList /@ dat[[All, 3]]

    Or with DateObject:

      dat[[All, 2]] = DateObject /@ dat[[All, 2]]

    enter image description here

  • The third argument according to documentation is for:

    reads only the first n objects of the specified types.

    This code will run on 3 record (first row is column):

    records = ReadList[str, Record, 3];

    the output is a list of strings, use SemanticImportString to convert strings to their data types (Reals, ...).SemanticImportString needs a string with \n as the record separator.

    Use StringRiffle to concatenate records with \n in between.

    SemanticImportString[#, Automatic, "Rows", Delimiters -> ","] &@
     [email protected][records, "\n"]


    enter image description here

    Using "Dataset" form instead of "Rows":

    SemanticImportString[#, Automatic, "Dataset", Delimiters -> ","] &@
     [email protected][records, "\n"]


    enter image description here

    The output is a list of records with different types (String, Reals, ...). See SemanticImport documentation to read more about other output forms.

Related Questions