Latest Stories
Agentic Harness Engineering
Time is a construct but it can still break your software
Ryan welcomes Jason Williams, senior software engineer at Bloomberg and creator of Boa, a Rust-based JavaScript engine, to the show to dive into why date and time handling in JavaScript is so difficult and how the Temporal proposal aims to fix it.
Roboticist-Turned-Teacher Built a Life-Size Replica of ENIAC
DeepTutor: Towards Agentic Personalized Tutoring
arXiv:2604.26962v1 Announce Type: new Abstract: Education represents one of the most promising real-world applications for Large Language Models (LLMs). However, conventional tutoring systems rely on static pre-training knowledge that lacks adaptation to individual learners, while existing RAG-augmented systems fall short in delivering personalized, guided feedback. To bridge this gap, we present DeepTutor, an agent-native open-source framework for personalized tutoring where every feature s...
Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding
arXiv:2604.26961v1 Announce Type: new Abstract: Static program slicing is a fundamental software engineering technique for isolating code relevant to specific variables. While recent learning-based approaches using language models (LMs) show promise in automating slice prediction, they suffer from inaccurate dependency modeling and unconstrained generation, where LMs fail to capture precise data flow relations and produce slices containing hallucinated tokens and statements. To address these...
LLM Biases
arXiv:2604.26960v1 Announce Type: new Abstract: Transformer-based agentic AI is rapidly being deployed on major platforms to help users shop, watch, and navigate content with less effort. While these systems can deliver impressive performance, a key concern is whether they may be less reliable than they appear. We ask a simple but fundamental question: can the mechanisms that make transformer-based agents effective also induce systematic biases or distortions? We study this question ...
CareGuardAI: Context-Aware Multi-Agent Guardrails for Clinical Safety & Hallucination Mitigation in Patient-Facing LLMs
arXiv:2604.26959v1 Announce Type: new Abstract: Integrating large language models (LLMs) into patient-facing healthcare systems offers significant potential to improve access to medical information. However, ensuring clinical safety and factual reliability remains a critical challenge. In practice, AI-generated responses may be conditionally correct yet medically inappropriate, as models often fail to interpret patient context and tend to produce agreeable responses rather than challenge uns...
Designing Ethical Learning for Agentic AI: Toegye Yi Hwang's Ethical Emotion Regulation Framework
arXiv:2604.26958v1 Announce Type: new Abstract: Agentic AI systems capable of autonomous goal setting and proactive intervention introduce new challenges for regulating moral-emotional processes in learning environments. Existing frameworks typically treat emotion as reactive feedback or engagement optimization, overlooking the need for normative regulation across autonomous decision cycles. This paper proposes an ethical emotion regulation framework for agentic AI learning design inspired by...
Simulating Validity: Modal Decoupling in MLLM Generated Feedback on Science Drawings
arXiv:2604.26957v1 Announce Type: new Abstract: In science education, students frequently construct hand-drawn visual models of scientific phenomena. These drawings rely on a visual structure where information is encoded through visual objects, their attributes, and relationships. Multimodal large language models (MLLMs) are increasingly used to generate feedback on students' hand-drawn scientific models. However, the validity of such feedback depends on whether model claims are grounded in ...
Can AI be a moral victim? The role of moral patiency and ownership perceptions in ethical judgments of using AI-generated content
arXiv:2604.26956v1 Announce Type: new Abstract: The growing use of generative AI raises ethical concerns about authorship and plagiarism. This study examines how people judge the reuse of AI-generated content, focusing on moral patiency and ownership perceptions. In an experiment, participants evaluated two substantively similar manuscripts in which the original source was described as authored by a human, an AI system, or an AI agent with a human-like name. Results showed that copying AI-ge...
Policy-Governed LLM Routing with Intent Matching for Instrument Laboratories
arXiv:2604.26955v1 Announce Type: new Abstract: AI tutoring systems in engineering labs face a tension between providing sufficient assistance and preserving learning opportunities. Existing systems typically offer instructors limited control over assistance timing, content, or cost. This paper describes a routing and governance system for LLM-based lab assistance comprising two components: Routiium, an OpenAI-compatible gateway that manages multiple LLM backends with configurable prompt mod...
The Impact of LLM Self-Consistency and Reasoning Effort on Automated Scoring Accuracy and Cost
arXiv:2604.26954v1 Announce Type: new Abstract: Strategic model selection and reasoning settings are more effective than ensembling for optimizing automated scoring with large language models (LLMs). We examined self-consistency (intra-model majority voting) and reasoning effort for scoring conversation-based assessment items in high school mathematics, evaluating 900 student conversations against human-scored ground truths using frontier and low-cost models from OpenAI and Google. Temperatu...
A Randomized Controlled Trial and Pilot of Scout: an LLM-Based EHR Search and Synthesis Platform
arXiv:2604.26953v1 Announce Type: new Abstract: Clinical documentation and data retrieval within Electronic Health Records (EHRs) contribute substantially to clinician workload and burnout. To address this, we developed Scout, an LLM-based EHR search and synthesis platform that enables clinicians to query EHR data using natural language. Each response includes citations linking each claim to the original data source, facilitating easy verification of generated content. We conducted a prospec...
America's New Surveillance Dragnet
I Got Sick of Remembering Port Numbers
Smart App Control
Show HN: Winpodx – run Windows apps on Linux as native windows
Show HN: What happens when you load a webpage (Interactive)
ChatGPT Images 2.0 is a hit in India, but not a big winner elsewhere, yet
AI Skills as loader spec, not prompts – why the architecture changes everything