MathStat workflows
MathStat wraps the stateless helpers from sciwork.stats in a stateful,
method-oriented facade. Instantiate it with your data once, then call whichever
statistics or transforms you need without passing the data around repeatedly.
from sciwork.stats import MathStat, normal_round, moving_average
data = [1, 2, 3, 100]
ms = MathStat(data)
print(ms.describe()["mean"])  # 26.5
print(ms.percentiles([25, 50, 75]))  # array([...])
print(normal_round(1.25, decimals=1))  # 1.3 (half-up rounding)
print(moving_average(data, window_size=2))  # array([...]); the stateless helper takes the raw data
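To illustrate the facade idea, here is a minimal sketch of the pattern in plain NumPy. TinyStat, stateless_mean, and stateless_percentiles are hypothetical names that are not part of sciwork; the real MathStat coerces its input through coerce_vector and delegates to the helpers in sciwork.stats.

```python
import numpy as np


# Hypothetical stateless helpers: each one takes the data explicitly.
def stateless_mean(values):
    return float(np.nanmean(values))


def stateless_percentiles(values, qs):
    return np.nanpercentile(values, qs)


class TinyStat:
    """Hypothetical facade: store the data once, delegate to stateless helpers."""

    def __init__(self, data):
        # Coerce once at construction so every method sees the same float array.
        self._values = np.asarray(data, dtype=float)

    def mean(self):
        return stateless_mean(self._values)

    def percentiles(self, qs):
        return stateless_percentiles(self._values, qs)


ts = TinyStat([1, 2, 3, 100])
print(ts.mean())                              # 26.5
print(ts.percentiles([25, 50, 75]).tolist())  # [1.75, 2.5, 27.25]
```

The design keeps the numerical logic in plain functions, so the class only adds the convenience of not repeating the data argument.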
Supported inputs
MathStat accepts anything coerce_vector can handle:
- Sequences of numeric values (lists/tuples)
- NumPy arrays and Pandas Series
- Dictionaries or mappings (values are read in insertion order)
- Pandas DataFrames: pass column="name" or column=index to select a single column when necessary
When you pass a multi-column DataFrame, vector-based methods require the
column argument. describe_table is the only method so far that expects
a DataFrame and operates column-wise by default.
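The source of coerce_vector is not shown here, so the following is only a rough sketch of how such a coercion step might dispatch on the input types listed above. to_vector is a hypothetical name, and treating an integer column that is not a label as a positional index is an assumption about what column=index means.

```python
from collections.abc import Mapping

import numpy as np
import pandas as pd


def to_vector(data, column=None):
    """Hypothetical coercion helper, loosely modeled on the supported inputs."""
    if isinstance(data, pd.DataFrame):
        if column is None:
            if data.shape[1] != 1:
                raise ValueError("multi-column DataFrame: pass column=...")
            series = data.iloc[:, 0]
        elif column in data.columns:
            series = data[column]  # select by label
        elif isinstance(column, int):
            series = data.iloc[:, column]  # assumption: fall back to position
        else:
            raise KeyError(column)
        return series.to_numpy(dtype=float)
    if isinstance(data, pd.Series):
        return data.to_numpy(dtype=float)
    if isinstance(data, Mapping):
        # dicts preserve insertion order, so values come out in that order
        return np.asarray(list(data.values()), dtype=float)
    # lists, tuples, and NumPy arrays
    return np.asarray(data, dtype=float)


df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 4, 6]})
print(to_vector({"x": 1, "y": 2}).tolist())  # [1.0, 2.0]
print(to_vector(df, column="b").tolist())    # [2.0, 4.0, 6.0]
print(to_vector(df, column=0).tolist())      # [1.0, 2.0, 3.0]
```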
Frequently used helpers
describe
Produces a dictionary with the keys min, max, sum, mean, median, std, and
coeff_var. The method calls sciwork.stats.describe_1d internally, so the keys,
behavior, and NaN handling exactly match the stateless helper.
summary = ms.describe()
print(summary["std"]) # sample standard deviation (ddof=1)
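The body of describe_1d is not reproduced in this section. As a rough, hypothetical sketch, the same keys could be computed with NumPy's NaN-ignoring reductions; skipping NaNs and defining coeff_var as std / mean are assumptions, so check the real helper before relying on those details.

```python
import numpy as np


def describe_sketch(values):
    """Hypothetical stand-in for describe_1d: NaNs are skipped (assumption)."""
    arr = np.asarray(values, dtype=float)
    mean = np.nanmean(arr)
    std = np.nanstd(arr, ddof=1)  # sample standard deviation, as documented
    return {
        "min": float(np.nanmin(arr)),
        "max": float(np.nanmax(arr)),
        "sum": float(np.nansum(arr)),
        "mean": float(mean),
        "median": float(np.nanmedian(arr)),
        "std": float(std),
        "coeff_var": float(std / mean),  # assumption: plain std / mean ratio
    }


summary = describe_sketch([1, 2, 3, 100, np.nan])
print(summary["mean"], summary["median"])  # 26.5 2.5
```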
percentiles
Provide a list of percentile values and get a NumPy array back. Percentiles are
NaN-aware and rely on numpy.nanpercentile under the hood.
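For reference, this is the underlying NumPy call on its own. Whether MathStat.percentiles passes extra options (such as a non-default interpolation method) to numpy.nanpercentile is not stated here; the values below use NumPy's default linear interpolation.

```python
import numpy as np

values = np.array([1, 2, 3, 100, np.nan])
# The NaN is dropped before interpolating between neighboring sorted values.
quartiles = np.nanpercentile(values, [25, 50, 75])
print(quartiles.tolist())           # [1.75, 2.5, 27.25]
# The non-NaN-aware variant propagates the NaN instead.
print(np.percentile(values, 50))    # nan
```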
outliers
Detects outliers using the interquartile range (IQR) heuristic.
ms = MathStat([1, 2, 3, 100])
print(ms.outliers(threshold=1.5)) # array([100.])
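The exact rule is not spelled out above, but threshold=1.5 is the multiplier used by Tukey's fences: a value is flagged when it lies below Q1 - threshold * IQR or above Q3 + threshold * IQR. The sketch below assumes that rule and NumPy's default percentile interpolation; iqr_outliers is a hypothetical name, not the sciwork implementation.

```python
import numpy as np


def iqr_outliers(values, threshold=1.5):
    """Hypothetical Tukey-fence detector; NaNs are ignored (assumption)."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(arr, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - threshold * iqr, q3 + threshold * iqr
    # Comparisons with NaN are False, so NaNs never count as outliers here.
    return arr[(arr < lower) | (arr > upper)]


# Q1 = 1.75, Q3 = 27.25, IQR = 25.5, so the fences are -36.5 and 65.5.
print(iqr_outliers([1, 2, 3, 100]).tolist())  # [100.0]
```

Under these assumptions the sketch agrees with the documented result for [1, 2, 3, 100].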
Transforms: log, power, moving_average
These helpers provide quick preprocessing steps. All of them accept an optional
column argument and reuse the stateless implementations from sciwork.stats.transforms.
smoothed = ms.moving_average(window_size=3)
scaled = ms.power(base=10)
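How sciwork's moving_average aligns or pads its windows is not documented in this section. As one common definition, the sketch below computes a simple (unweighted) moving average over complete windows only, using np.convolve with mode="valid", so the output is window_size - 1 elements shorter than the input. moving_average_sketch is a hypothetical name.

```python
import numpy as np


def moving_average_sketch(values, window_size):
    """Hypothetical simple moving average over complete windows only."""
    arr = np.asarray(values, dtype=float)
    if not 1 <= window_size <= arr.size:
        raise ValueError("window_size must be between 1 and len(values)")
    kernel = np.ones(window_size) / window_size  # equal weights summing to 1
    return np.convolve(arr, kernel, mode="valid")


print(moving_average_sketch([1, 2, 3, 100], window_size=2).tolist())  # [1.5, 2.5, 51.5]
```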
describe_table
When you initialize MathStat with a DataFrame, you can get a multi-column
summary without specifying column selectors:
import pandas as pd
ms = MathStat(pd.DataFrame({"a": [1, 2, 3], "b": [2, 4, 6]}))
print(ms.describe_table().loc[:, ("mean", slice(None))])
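The exact layout of the describe_table result is only hinted at by the selection above. If you just need a comparable column-wise summary in plain pandas, DataFrame.agg builds a table with one row per statistic and one column per input column. This is a swapped-in pandas alternative, not what describe_table calls internally, and the coeff_var row (std / mean) is an assumption about that statistic's definition.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [2, 4, 6]})
# One row per statistic, one column per input column; pandas std uses ddof=1.
summary = df.agg(["min", "max", "sum", "mean", "median", "std"])
summary.loc["coeff_var"] = summary.loc["std"] / summary.loc["mean"]
print(summary.loc["mean"].tolist())  # [2.0, 4.0]
```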
Rounding helper: normal_round
normal_round is a small utility that implements half-up rounding semantics.
It is re-exported at the package root next to MathStat and moving_average
so you can keep your statistical helpers together.
from sciwork import normal_round
normal_round(0.5) # 1.0
normal_round(1.25, decimals=1) # 1.3
normal_round(-1.25, 1) # -1.2
Unlike Python's built-in round, which uses banker's rounding (ties go to the
nearest even digit), half-up rounding always resolves a tie upward, so
normal_round(0.5) gives 1.0 where round(0.5) gives 0. This is often preferable for reporting.
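One way to reproduce the three results listed above is to scale so the target digit sits in the ones place, add 0.5, and take the floor, which resolves every tie toward positive infinity (hence -1.25 becomes -1.2). half_up_round is a hypothetical name, and this is not necessarily how normal_round is implemented. Because the sketch works on binary floats, an input such as 1.005, which is stored slightly below its written value, can round down.

```python
import math


def half_up_round(x, decimals=0):
    """Hypothetical half-up rounding: ties always move toward +infinity."""
    factor = 10 ** decimals
    # Scale so the target digit sits in the ones place, add 0.5, then floor.
    return math.floor(x * factor + 0.5) / factor


print(half_up_round(0.5), round(0.5))          # 1.0 0
print(half_up_round(1.25, 1), round(1.25, 1))  # 1.3 1.2
print(half_up_round(-1.25, 1))                 # -1.2
```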