Latest Stories
Agentic Harness Engineering
Time is a construct but it can still break your software
Ryan welcomes Jason Williams, senior software engineer at Bloomberg and creator of Boa, a Rust-based JavaScript engine, to the show to dive into why date and time handling in JavaScript is so difficult and how the Temporal proposal aims to fix it.
Roboticist-Turned-Teacher Built a Life-Size Replica of ENIAC
DeepTutor: Towards Agentic Personalized Tutoring
arXiv:2604.26962v1 Announce Type: new Abstract: Education represents one of the most promising real-world applications for Large Language Models (LLMs). However, conventional tutoring systems rely on static pre-training knowledge that lacks adaptation to individual learners, while existing RAG-augmented systems fall short in delivering personalized, guided feedback. To bridge this gap, we present DeepTutor, an agent-native open-source framework for personalized tutoring where every feature s...
Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding
arXiv:2604.26961v1 Announce Type: new Abstract: Static program slicing is a fundamental software engineering technique for isolating code relevant to specific variables. While recent learning-based approaches using language models (LMs) show promise in automating slice prediction, they suffer from inaccurate dependency modeling and unconstrained generation, where LMs fail to capture precise data flow relations and produce slices containing hallucinated tokens and statements. To address these...
LLM Biases
arXiv:2604.26960v1 Announce Type: new Abstract: Transformer-based agentic AI is rapidly being deployed on major platforms to help users shop, watch, and navigate content with less effort. While these systems can deliver impressive performance, a key concern is whether they may be less reliable than they appear. We ask a simple but fundamental question: can the mechanisms that make transformer-based agents effective also induce systematic biases or distortions? We study this question ...
CareGuardAI: Context-Aware Multi-Agent Guardrails for Clinical Safety & Hallucination Mitigation in Patient-Facing LLMs
arXiv:2604.26959v1 Announce Type: new Abstract: Integrating large language models (LLMs) into patient-facing healthcare systems offers significant potential to improve access to medical information. However, ensuring clinical safety and factual reliability remains a critical challenge. In practice, AI-generated responses may be conditionally correct yet medically inappropriate, as models often fail to interpret patient context and tend to produce agreeable responses rather than challenge uns...
Designing Ethical Learning for Agentic AI: Toegye Yi Hwang's Ethical Emotion Regulation Framework
arXiv:2604.26958v1 Announce Type: new Abstract: Agentic AI systems capable of autonomous goal setting and proactive intervention introduce new challenges for regulating moral-emotional processes in learning environments. Existing frameworks typically treat emotion as reactive feedback or engagement optimization, overlooking the need for normative regulation across autonomous decision cycles. This paper proposes an ethical emotion regulation framework for agentic AI learning design inspired by...
Simulating Validity: Modal Decoupling in MLLM Generated Feedback on Science Drawings
arXiv:2604.26957v1 Announce Type: new Abstract: In science education, students frequently construct hand-drawn visual models of scientific phenomena. These drawings rely on a visual structure where information is encoded through visual objects, their attributes, and relationships. Multimodal large language models (MLLMs) are increasingly used to generate feedback on students' hand-drawn scientific models. However, the validity of such feedback depends on whether model claims are grounded in ...
Can AI be a moral victim? The role of moral patiency and ownership perceptions in ethical judgments of using AI-generated content
arXiv:2604.26956v1 Announce Type: new Abstract: The growing use of generative AI raises ethical concerns about authorship and plagiarism. This study examines how people judge the reuse of AI-generated content, focusing on moral patiency and ownership perceptions. In an experiment, participants evaluated two substantively similar manuscripts in which the original source was described as authored by a human, an AI system, or an AI agent with a human-like name. Results showed that copying AI-ge...
Policy-Governed LLM Routing with Intent Matching for Instrument Laboratories
arXiv:2604.26955v1 Announce Type: new Abstract: AI tutoring systems in engineering labs face a tension between providing sufficient assistance and preserving learning opportunities. Existing systems typically offer instructors limited control over assistance timing, content, or cost. This paper describes a routing and governance system for LLM-based lab assistance comprising two components: Routiium, an OpenAI-compatible gateway that manages multiple LLM backends with configurable prompt mod...
The Impact of LLM Self-Consistency and Reasoning Effort on Automated Scoring Accuracy and Cost
arXiv:2604.26954v1 Announce Type: new Abstract: Strategic model selection and reasoning settings are more effective than ensembling for optimizing automated scoring with large language models (LLMs). We examined self-consistency (intra-model majority voting) and reasoning effort for scoring conversation-based assessment items in high school mathematics, evaluating 900 student conversations against human-scored ground truths using frontier and low-cost models from OpenAI and Google. Temperatu...
A Randomized Controlled Trial and Pilot of Scout: an LLM-Based EHR Search and Synthesis Platform
arXiv:2604.26953v1 Announce Type: new Abstract: Clinical documentation and data retrieval within Electronic Health Records (EHRs) contribute substantially to clinician workload and burnout. To address this, we developed Scout, an LLM-based EHR search and synthesis platform that enables clinicians to query EHR data using natural language. Each response includes citations linking each claim to the original data source, facilitating easy verification of generated content. We conducted a prospec...
America's New Surveillance Dragnet
I Got Sick of Remembering Port Numbers
Smart App Control
Show HN: Winpodx – run Windows apps on Linux as native windows
Show HN: What happens when you load a webpage (Interactive)
ChatGPT Images 2.0 is a hit in India, but not a big winner elsewhere, yet
AI Skills as loader spec, not prompts – why the architecture changes everything