Category "floating-point"

How to convert base 10 floating point input to base 2 in MIPS?

I have been tasked with writing a program in MIPS that converts a user-inputted base 10 floating point number to base 2. I am a relatively inexperienced program

Round 37.1-28.75 float calculation correctly to 8.4 instead of 8.3

I have a problem with floating point rounding. I want to calculate with floating point numbers and round them to a given N decimals. In this example I want to round to

How to detect non IEEE-754 float, and how to use them?

I'm writing classes for basic types, so that code is logically the same on multiple platforms and compilers (like int_least16_t for int). For fun! (I'm still a

Fast floating-point power of 2 on x86_64

Is there a fast way to raise 2.0 to some floating-point power x? I mean something faster than pow(2.0, x) and preferably something that vectorizes well with AVX2. The c

Convert float to int in Julia Lang

Is there a way to convert a floating number to int in Julia? I'm trying to convert a floating point number to a fixed precision number with the decimal part rep

How to solve - ValueError: could not convert string to float: ''

training_age_in = [1, 2, 3] training_salary_dep = [5, 9, 10] m = [(training_age_in[0], training_salary_dep[0]), (training_age_in[1], training_salary_dep[1]),

Python round a float to nearest 0.05 or to multiple of another float

I want to emulate this function. I want to round a floating point number down to the nearest multiple of 0.05 (or generally to the nearest multiple of anything)
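
One way the rounding described in this question could be approached is sketched below. It assumes the inputs are short decimal literals such as 1.03 and 0.05, and it goes through `decimal.Decimal` so that the division by the step is not disturbed by binary representation error. The helper name `round_to_multiple` is hypothetical and does not come from the question.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_UP

def round_to_multiple(value, step, rounding=ROUND_HALF_UP):
    """Round value to a multiple of step (hypothetical helper, not from the question).

    str() yields the shortest decimal repr of each float, so 0.05 is
    treated as exactly 0.05 rather than as its binary approximation.
    """
    v = Decimal(str(value))
    s = Decimal(str(step))
    # Count how many steps fit, round that count, then scale back.
    steps = (v / s).to_integral_value(rounding=rounding)
    return float(steps * s)

nearest = round_to_multiple(1.03, 0.05)               # nearest multiple
floored = round_to_multiple(1.09, 0.05, ROUND_FLOOR)  # round down
```

Passing `ROUND_FLOOR` gives the "round down" behaviour the question mentions, while the default rounds to the nearest multiple.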

DynamoDBNumberError on trying to insert floating point number using python boto library

Code snippet : conn = dynamo_connect() company = Table("companydb",connection=conn) companyrecord = {'company-slug':'www-google-com12','founding-year':1991,

Excel - float number cannot be summed

I have a bunch of values like 0.5, 1.0, 1.5 etc. I want to write a SUM function for those cells but the value is 0. If I use integers the summing works so the

How can I use a HashMap with f64 as key in Rust?

I want to use a HashMap<f64, f64>, for saving the distances of a point with known x and key y to another point. f64 as value shouldn't matter here, the fo

How to convert float to uint by float representation?

In C I will do this to convert float representation of number into DWORD. Take the value from the address and cast the content to DWORD. dwordVal = *(DWORD*)&a
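
The excerpt does not name the target language, so the following is only an illustration, in Python, of the same idea: reinterpreting the 32-bit IEEE 754 representation of a float as an unsigned integer. It uses the standard `struct` module, and the function name `float_bits` is hypothetical.

```python
import struct

def float_bits(x):
    """Return the 32-bit single-precision pattern of x as an unsigned int.

    Hypothetical helper: packs x as a little-endian float ("<f") and
    unpacks the same 4 bytes as a little-endian unsigned int ("<I").
    """
    return struct.unpack("<I", struct.pack("<f", x))[0]

one_bits = float_bits(1.0)  # sign 0, biased exponent 127, mantissa 0
```

Unlike the pointer cast in the C snippet, this copies the bytes rather than aliasing memory, so it does not rely on any aliasing behaviour.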

How can I remove ".0" of float numbers?

Say I have a float number. If it is an integer (e.g. 1.0, 9.0, 36.0), I want to remove the ".0 (the decimal point and zero)" and write to stdout. For example, t

Why wasn't a specifier for `float` defined in `printf`?

It looks like it could have been, there are (at least in C99) length modifiers that can be applied to int: %hhd, %hd, %ld and %lld mean signed char, short, long

Significant digits with IEEE 754 float

The Wiki Double-precision floating-point format says: This gives 15–17 significant decimal digits precision. If a decimal string with at most 15 signific

Why are numbers with many significant digits handled differently in C# and JavaScript?

If JavaScript's Number and C#'s double are specified the same (IEEE 754), why are numbers with many significant digits handled differently? var x = (long)123412

Why does printf round floating point numbers up?

I am trying to print some floating point numbers using printf. For example: int main() { printf("%.1f",76.75); return 0; } Output: 76.8 And I have som

Is there a functional difference between "2.00" and "2.00f"?

I ask because I am using the Box2D library, which calls for mostly float arguments. Although I see a lot of example code that uses the 0.00f format, I am not qu

Does a single floating point operation calculated at higher precision and immediately truncated always produce an identical result?

Does a single floating point operation (like a+b, a-b, a*b or a/b) calculated at higher precision (80 bits) and immediately truncated (to 32 bits) always produce

Algorithm for pow(float, float)

I need an efficient algorithm to compute a math::power function between two floats. Do you have any idea how to do this? (I need the algorithm, not to use the function

Getting maximum value of float in SQL programmatically

Is there a method for programmatically (in T-SQL) retrieving the maximum (and minimum) value of a datatype? It would act like float.MaxValue in C#. I wou